[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-09-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12910441#action_12910441
 ] 

Ankur commented on PIG-1229:


In the putNext() method, count is reset to 0 every time the number of tuples 
added to the batch exceeds 'batchSize'. The batch is then executed and its 
parameters cleared. There is currently 
an ExecException in the putNext() method that is being ignored. Can you try 
adding some debugging System.outs and checking the stdout/stderr of your 
reducers to see if that is the problem?
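For readers less familiar with the pattern being discussed, here is a minimal, in-memory sketch of that batching logic. The class and helper names are assumptions for illustration, not the actual DBStorage source; a real store function would call addBatch(), executeBatch(), and clearParameters() on a JDBC PreparedStatement instead of buffering strings.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batch-and-reset pattern described above (hypothetical names).
public class BatchWriter {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    private int count = 0;    // tuples accumulated in the current batch
    private int flushes = 0;  // how many times the batch has been executed

    public BatchWriter(int batchSize) { this.batchSize = batchSize; }

    // Analogous to putNext(): add the tuple to the batch, and once the batch
    // is full, execute it and reset the counter.
    public void putNext(String tuple) {
        pending.add(tuple);        // ps.addBatch() in a real JDBC store func
        count++;
        if (count >= batchSize) {
            flush();               // ps.executeBatch(); ps.clearParameters()
        }
    }

    private void flush() {
        // Any exception raised here must be surfaced, not swallowed --
        // silently ignoring it hides reducer-side write failures.
        pending.clear();
        count = 0;                 // reset so the next batch starts fresh
        flushes++;
    }

    public int getFlushes() { return flushes; }
    public int getPendingCount() { return count; }
}
```

The key point is that an exception raised during the flush must propagate; an ignored ExecException at exactly this spot is what the comment above suspects.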

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
> jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1060) MultiQuery optimization throws error for multi-level splits

2009-10-29 Thread Ankur (JIRA)
MultiQuery optimization throws error for multi-level splits
---

 Key: PIG-1060
 URL: https://issues.apache.org/jira/browse/PIG-1060
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur


Consider the following scenario :-
1. Multi-level splits in the map plan.
2. Each split branch further progressing across a local-global rearrange.
3. Output of each of these finally merged via a UNION.

MultiQuery optimizer throws the following error in such a case:
"ERROR 2146: Internal Error. Inconsistency in key index found during 
optimization."





[jira] Commented: (PIG-1060) MultiQuery optimization throws error for multi-level splits

2009-10-29 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771390#action_12771390
 ] 

Ankur commented on PIG-1060:


Here's a sample script to illustrate the issue. Note that the sample data isn't 
very important here, since the script fails during optimization and execution 
regardless of the data. 
=== test.pig 

data = LOAD 'dummy' as (name:chararray, freq:int);

filter1 = FILTER data BY freq < 5;
group1 = GROUP filter1 BY name;
proj1 = FOREACH group1 GENERATE FLATTEN(group), 'string1', SUM(filter1.freq);

filter2 = FILTER data by freq > 5;
group2 = GROUP filter2 BY name;
proj2 = FOREACH group2 GENERATE FLATTEN(group), 'string2', SUM(filter2.freq);

filter3 = FILTER filter2 by freq < 10;
group3 = GROUP filter3 By name;
proj3 = FOREACH group3 GENERATE FLATTEN(group), 'string3', SUM(filter3.freq);

filter4 = FILTER filter3 by freq > 7;
group4 = GROUP filter4 By name;
proj4 = FOREACH group4 GENERATE FLATTEN(group), 'string4', SUM(filter4.freq);

M1 = LIMIT proj1 10;
M2 = LIMIT proj2 10;
M3 = LIMIT proj3 10;
M4 = LIMIT proj4 10;

U = UNION M1, M2, M3, M4;

STORE U INTO 'res' USING PigStorage();

The dot output can be dumped via the command "explain -dot -script test.pig;" to 
visualize the scenario.
A surprising observation is that even after turning MultiQuery off using -M, the 
MultiQuery optimizer still appears to run and fails the script.




> MultiQuery optimization throws error for multi-level splits
> ---
>
> Key: PIG-1060
> URL: https://issues.apache.org/jira/browse/PIG-1060
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Ankur
>




[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772925#action_12772925
 ] 

Ankur commented on PIG-958:
---

Can we have an update on this, please?

> Splitting output data on key field
> --
>
> Key: PIG-958
> URL: https://issues.apache.org/jira/browse/PIG-958
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Ankur
> Attachments: 958.v3.patch, 958.v4.patch
>
>
> Pig users often face the need to split the output records into a bunch of 
> files and directories depending on the type of record. Pig's SPLIT operator 
> is useful when record types are few and known in advance. In cases where type 
> is not directly known but is derived dynamically from values of a key field 
> in the output tuple, a custom store function is a better solution.
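As a rough illustration of what such a custom store function does, here is a minimal in-memory sketch of the key-based routing. The class and method names are hypothetical; a real store function (such as piggybank's MultiStorage, exercised by TestMultiStorage below) writes per-key part files on HDFS rather than buffering records in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of splitting output records by the value of a key field
// (hypothetical names; illustrative stand-in for a real store function).
public class KeySplitter {
    // One output "file" per distinct key value, created on demand.
    private final Map<String, List<String>> outputs = new HashMap<>();
    private final int keyFieldIndex;

    public KeySplitter(int keyFieldIndex) { this.keyFieldIndex = keyFieldIndex; }

    // Route a tab-separated record to the bucket for its key field --
    // the key is derived dynamically from the record, not known in advance.
    public void putNext(String record) {
        String key = record.split("\t")[keyFieldIndex];
        outputs.computeIfAbsent(key, k -> new ArrayList<>()).add(record);
    }

    public List<String> recordsFor(String key) {
        return outputs.getOrDefault(key, new ArrayList<>());
    }
}
```

Unlike SPLIT, nothing here requires the set of key values to be enumerated in the script.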




[jira] Commented: (PIG-958) Splitting output data on key field

2009-11-03 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12773389#action_12773389
 ] 

Ankur commented on PIG-958:
---

> Can you explain this a little bit more - ..
In the earlier patch (958.v3.patch), after moving the results from the task's 
current working directory, I was manually deleting the directory. This was to 
ensure that empty part files don't get moved to the final output directory. But 
doing so causes Hadoop to complain that it can no longer write to the task's 
output dir, and the task fails.

> I saw compile errors while trying to run unit test: ...
Did you compile pig.jar and run the core tests first? This creates the 
necessary classes and jar files on the local machine required by the contrib 
tests.

On my local machine
gan...@grainflydivide-dr:pig_trunk$ ant 
...
buildJar:
 [echo] svnString 830456
  [jar] Building jar: 
/home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev-core.jar
  [jar] Building jar: 
/home/gankur/eclipse/workspace/pig_trunk/build/pig-0.6.0-dev.jar
 [copy] Copying 1 file to /home/gankur/eclipse/workspace/pig_trunk

gan...@grainflydivide-dr:pig_trunk$ ant test
...
test-core:
   [delete] Deleting directory 
/home/gankur/eclipse/workspace/pig_trunk/build/test/logs
[mkdir] Created dir: 
/home/gankur/eclipse/workspace/pig_trunk/build/test/logs
[junit] Running org.apache.pig.test.TestAdd
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.024 sec
[junit] Running org.apache.pig.test.TestAlgebraicEval
...
gan...@grainflydivide-dr:pig_trunk$ cd contrib/piggybank/java/
gan...@grainflydivide-dr:java$ ant test
...
test:
 [echo]  *** Running UDF tests ***
   [delete] Deleting directory 
/home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
[mkdir] Created dir: 
/home/gankur/eclipse/workspace/pig_trunk/contrib/piggybank/java/build/test/logs
[junit] Running org.apache.pig.piggybank.test.evaluation.TestEvalString
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.15 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.TestMathUDF
[junit] Tests run: 35, Failures: 0, Errors: 0, Time elapsed: 0.123 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.TestStat
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.114 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.datetime.TestDiffDate
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.105 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.decode.TestDecode
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.089 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.string.TestHashFNV
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.094 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.string.TestLookupInFiles
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 17.163 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.string.TestRegex
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.092 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.TestSearchQuery
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.093 sec
[junit] Running org.apache.pig.piggybank.test.evaluation.util.TestTop
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.099 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestDateExtractor
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.087 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestHostExtractor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.083 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchEngineExtractor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.091 sec
[junit] Running 
org.apache.pig.piggybank.test.evaluation.util.apachelogparser.TestSearchTermExtractor
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.1 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestCombinedLogLoader
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.535 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestCommonLogLoader
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.54 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestHelper
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.014 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestMultiStorage
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.964 sec
[junit] Running org.apache.pig.piggybank.test.storage.TestMyRegExLoader
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.452 sec
[junit] Running org.apache

[jira] Created: (PIG-1075) Error in Cogroup when key fields types don't match

2009-11-05 Thread Ankur (JIRA)
Error in Cogroup when key fields types don't match
--

 Key: PIG-1075
 URL: https://issues.apache.org/jira/browse/PIG-1075
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.5.0
Reporter: Ankur


When cogrouping 2 relations on multiple key fields, Pig throws an error if the 
corresponding key types don't match. 
Consider the following script:
A = LOAD 'data' USING PigStorage() as (a:chararray, b:int, c:int);
B = LOAD 'data' USING PigStorage() as (a:chararray, b:chararray, c:int);
C = CoGROUP A BY (a,b,c), B BY (a,b,c);
D = FOREACH C GENERATE FLATTEN(A), FLATTEN(B);
describe D;
dump D;

The complete stack trace of the error thrown is:

Pig Stack Trace
---
ERROR 1051: Cannot cast to Unknown

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1001: Unable to 
describe schema for alias D
at org.apache.pig.PigServer.dumpSchema(PigServer.java:436)
at 
org.apache.pig.tools.grunt.GruntParser.processDescribe(GruntParser.java:233)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:253)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:397)
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An 
unexpected exception caused the validation to stop
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at 
org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:83)
at org.apache.pig.PigServer.compileLp(PigServer.java:821)
at org.apache.pig.PigServer.dumpSchema(PigServer.java:428)
... 6 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
ERROR 1060: Cannot resolve COGroup output schema
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2463)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:372)
at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:45)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
... 11 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
ERROR 1051: Cannot cast to Unknown
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.insertAtomicCastForCOGroupInnerPlan(TypeCheckingVisitor.java:2552)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2451)
... 16 more

The error message does not help the user identify the issue clearly, especially 
if the Pig script is large and complex.





[jira] Commented: (PIG-1075) Error in Cogroup when key fields types don't match

2009-11-05 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774222#action_12774222
 ] 

Ankur commented on PIG-1075:


Pig should throw an error message that better identifies the cause of the 
problem.

> Error in Cogroup when key fields types don't match
> --
>
> Key: PIG-1075
> URL: https://issues.apache.org/jira/browse/PIG-1075
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.5.0
>Reporter: Ankur
>




[jira] Created: (PIG-1108) Incorrect map output key type in MultiQuery optimization

2009-11-24 Thread Ankur (JIRA)
Incorrect map output key type in MultiQuery optimization


 Key: PIG-1108
 URL: https://issues.apache.org/jira/browse/PIG-1108
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur


When trying to merge 2 split plans, one of which never progresses along an M/R 
boundary, Pig sets the map-output key type incorrectly, resulting in the 
following error:

java.io.IOException: Type mismatch in key from map: expected 
org.apache.pig.impl.io.NullableText, recieved 
org.apache.pig.impl.io.NullableTuple
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)

Here is a small script to be used as a reproducible test case:

rmf plan1
rmf plan2
A = LOAD 'data' USING PigStorage() as (a: int, b: chararray);
SPLIT A into plan1 IF (a>5), plan2 IF (a<5);
B = GROUP plan1 BY b;
C = FOREACH B {
  tmp = ORDER plan1 BY a desc;
  GENERATE FLATTEN(group) as b, tmp;
  };
D = FILTER C BY b is not null;
STORE D into 'plan1';
STORE plan2 into 'plan2';





[jira] Commented: (PIG-1108) Incorrect map output key type in MultiQuery optimization

2009-11-25 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782787#action_12782787
 ] 

Ankur commented on PIG-1108:


In my test run on the 0.6.0 branch, disabling MQ did not work. The Pig client 
logs showed that MQ was still kicking in, and the mappers failed with the same 
error message as in the description. It would be good if we could add a few 
points about "SecondaryKey" here: 
http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification

> Incorrect map output key type in MultiQuery optimization
> 
>
> Key: PIG-1108
> URL: https://issues.apache.org/jira/browse/PIG-1108
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Richard Ding
>




[jira] Created: (PIG-1112) FLATTEN eliminates the alias

2009-11-26 Thread Ankur (JIRA)
FLATTEN eliminates the alias


 Key: PIG-1112
 URL: https://issues.apache.org/jira/browse/PIG-1112
 Project: Pig
  Issue Type: Bug
Reporter: Ankur
 Fix For: 0.6.0


If the schema for a field of type 'bag' is only partially defined, then 
FLATTEN() incorrectly eliminates the field's alias and a later reference to it 
throws an error. 
Consider the following example:

A = LOAD 'sample' using PigStorage() as (first:chararray, second:chararray, 
ladder:bag{});  
B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second;   

C = GROUP B by (first,third);

This throws the error
 ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
Invalid alias: third in {first: chararray,second: chararray}





[jira] Created: (PIG-1113) Diamond query optimization throws error in JOIN

2009-11-26 Thread Ankur (JIRA)
Diamond query optimization throws error in JOIN
---

 Key: PIG-1113
 URL: https://issues.apache.org/jira/browse/PIG-1113
 Project: Pig
  Issue Type: Bug
Reporter: Ankur


The following script compiles to a single M/R job as a result of diamond query 
optimization, but the script fails.

set1 = LOAD 'set1' USING PigStorage as (a:chararray, b:chararray, c:chararray);
set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, c:bag{});

set2_1 = FOREACH set2 GENERATE a as f1, b as f2, (chararray) 0 as f3;
set2_2 = FOREACH set2 GENERATE a as f1, FLATTEN((IsEmpty(c) ? null : c)) as f2, 
(chararray) 1 as f3;

all_set2 = UNION set2_1, set2_2;

joined_sets = JOIN set1 BY (a,b), all_set2 BY (f2,f3);
dump joined_sets;

And here is the error:

org.apache.pig.backend.executionengine.ExecException: ERROR 1071: Cannot 
convert a bag to a String
at org.apache.pig.data.DataType.toString(DataType.java:739)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:625)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)






[jira] Commented: (PIG-1113) Diamond query optimization throws error in JOIN

2009-11-26 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782877#action_12782877
 ] 

Ankur commented on PIG-1113:


The script fails even if a complete schema is specified for c:bag{}, so the 
following change does not alleviate the problem:

set2 = LOAD 'set2' USING PigStorage as (a: chararray, b:chararray, 
c:bag{T:tuple(l:chararray)});

> Diamond query optimization throws error in JOIN
> ---
>
> Key: PIG-1113
> URL: https://issues.apache.org/jira/browse/PIG-1113
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>




[jira] Created: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-11-29 Thread Ankur (JIRA)
MultiQuery optimization throws error when merging 2 level splits


 Key: PIG-1114
 URL: https://issues.apache.org/jira/browse/PIG-1114
 Project: Pig
  Issue Type: Bug
Reporter: Ankur
Priority: Critical


Multi-query optimization throws an error when merging 2-level splits. The 
following script reproduces the error:

data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);

ids = FOREACH data GENERATE id;
allId = GROUP ids all;
allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
idGroup = GROUP ids by id;
idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
countTotal = cross idGroupCount, allIdCount;
idCountTotal = foreach countTotal generate
id,
count,
total,
(double)count / (double)total as proportion;
orderedCounts = order idCountTotal by count desc;
STORE orderedCounts INTO 'mq_problem/ids';

names = FOREACH data GENERATE name;
allNames = GROUP names all;
allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as 
total;
nameGroup = GROUP names by name;
nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as 
count;
namesCrossed = cross nameGroupCount, allNamesCount;
nameCountTotal = foreach namesCrossed generate
name,
count,
total,
(double)count / (double)total as proportion;
nameCountsOrdered = order nameCountTotal by count desc;
STORE nameCountsOrdered INTO 'mq_problem/names';







[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-11-29 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783553#action_12783553
 ] 

Ankur commented on PIG-1114:


The error thrown is:

java.io.IOException: Type mismatch in key from map: expected 
org.apache.pig.impl.io.NullableTuple, recieved 
org.apache.pig.impl.io.NullableText
at 
org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:807)
at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:466)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:249)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)



> MultiQuery optimization throws error when merging 2 level splits
> 
>
> Key: PIG-1114
> URL: https://issues.apache.org/jira/browse/PIG-1114
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>Priority: Critical
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-11-29 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12783554#action_12783554
 ] 

Ankur commented on PIG-1114:


The same script works with the -M (multi-query disabled) option, BUT surprisingly 
the run indicates that multi-query optimization is now being applied separately to 
the first STORE and the second STORE. This is just a workaround, but it also 
indicates that in cases like this, disabling multi-query actually DOES NOT 
disable it completely; it just makes it run on parts of the script.

> MultiQuery optimization throws error when merging 2 level splits
> 
>
> Key: PIG-1114
> URL: https://issues.apache.org/jira/browse/PIG-1114
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>Priority: Critical
>
> Multi-query optimization throws an error when merging 2 level splits. 
> Following is the script to reproduce the error
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
> id,
> count,
> total,
> (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as 
> total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as 
> count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
> name,
> count,
> total,
> (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-11-29 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1114:
---

Fix Version/s: 0.6.0

> MultiQuery optimization throws error when merging 2 level splits
> 
>
> Key: PIG-1114
> URL: https://issues.apache.org/jira/browse/PIG-1114
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>Priority: Critical
> Fix For: 0.6.0
>
>
> Multi-query optimization throws an error when merging 2 level splits. 
> Following is the script to reproduce the error
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
> id,
> count,
> total,
> (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as 
> total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as 
> count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
> name,
> count,
> total,
> (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-11-30 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1114:
---

Attachment: Pig_1114_Client.log

> MultiQuery optimization throws error when merging 2 level splits
> 
>
> Key: PIG-1114
> URL: https://issues.apache.org/jira/browse/PIG-1114
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>Assignee: Richard Ding
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: Pig_1114_Client.log
>
>
> Multi-query optimization throws an error when merging 2 level splits. 
> Following is the script to reproduce the error
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
> id,
> count,
> total,
> (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as 
> total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as 
> count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
> name,
> count,
> total,
> (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1114) MultiQuery optimization throws error when merging 2 level splits

2009-11-30 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784070#action_12784070
 ] 

Ankur commented on PIG-1114:


Richard,
 I ran the above script again with the -M option and confirmed that 
multi-query was not disabled; instead it worked on 2 separate parts of the 
script. I am attaching the pig client logs from the run for your reference.

> MultiQuery optimization throws error when merging 2 level splits
> 
>
> Key: PIG-1114
> URL: https://issues.apache.org/jira/browse/PIG-1114
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>Assignee: Richard Ding
>Priority: Critical
> Fix For: 0.6.0
>
> Attachments: Pig_1114_Client.log
>
>
> Multi-query optimization throws an error when merging 2 level splits. 
> Following is the script to reproduce the error
> data = LOAD 'data' USING PigStorage() AS (id:int, name:chararray);
> ids = FOREACH data GENERATE id;
> allId = GROUP ids all;
> allIdCount = FOREACH allId GENERATE group as allId, COUNT(ids) as total;
> idGroup = GROUP ids by id;
> idGroupCount = FOREACH idGroup GENERATE group as id, COUNT(ids) as count;
> countTotal = cross idGroupCount, allIdCount;
> idCountTotal = foreach countTotal generate
> id,
> count,
> total,
> (double)count / (double)total as proportion;
> orderedCounts = order idCountTotal by count desc;
> STORE orderedCounts INTO 'mq_problem/ids';
> names = FOREACH data GENERATE name;
> allNames = GROUP names all;
> allNamesCount = FOREACH allNames GENERATE group as namesAll, COUNT(names) as 
> total;
> nameGroup = GROUP names by name;
> nameGroupCount = FOREACH nameGroup GENERATE group as name, COUNT(names) as 
> count;
> namesCrossed = cross nameGroupCount, allNamesCount;
> nameCountTotal = foreach namesCrossed generate
> name,
> count,
> total,
> (double)count / (double)total as proportion;
> nameCountsOrdered = order nameCountTotal by count desc;
> STORE nameCountsOrdered INTO 'mq_problem/names';

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1152) bincond operator throws parser error

2009-12-14 Thread Ankur (JIRA)
bincond operator throws parser error


 Key: PIG-1152
 URL: https://issues.apache.org/jira/browse/PIG-1152
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur


Bincond operator throws a parser error when the true branch contains a constant 
bag with 1 tuple containing a single int field with a negative value. 

Here is the script to reproduce the issue

A = load 'A' as (s: chararray, x: int, y: int);
B = group A by s;
C = foreach B generate group, flatten(((COUNT(A) < 1L) ? {(-1)} : A.x));
dump C;


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1168) Dump produces wrong results

2009-12-21 Thread Ankur (JIRA)
Dump produces wrong results
---

 Key: PIG-1168
 URL: https://issues.apache.org/jira/browse/PIG-1168
 Project: Pig
  Issue Type: Bug
Reporter: Ankur


For a map-only job, dump just re-executes every Pig Latin statement from the 
beginning, assuming that they would produce the same result. That assumption is not 
valid if there are UDFs invoked. Consider the following script:-

raw = LOAD '$input' USING PigStorage() AS (text_string:chararray);
DUMP raw;

ccm = FOREACH raw GENERATE MyUDF(text_string);
DUMP ccm;

bug = FOREACH ccm GENERATE ccmObj;

DUMP bug;

The UDF MyUDF generates a tuple with one of the fields being a randomly 
generated UUID. So even though one would expect relations 'ccm' and 'bug' to 
contain identical data, they are different because of re-execution from the 
beginning. This breaks the application logic.
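The failure mode can be illustrated outside Pig. Any function whose output embeds a fresh UUID returns different results on re-execution, so re-running the pipeline for each DUMP cannot reproduce the earlier relation. A minimal Java sketch (a stand-in for the hypothetical MyUDF, not the actual implementation):

```java
import java.util.UUID;

public class ReexecutionSketch {
    // Stand-in for MyUDF: the output embeds a freshly generated UUID,
    // so two invocations on the same input produce different results.
    static String process(String text) {
        return UUID.randomUUID() + "\t" + text;
    }

    public static void main(String[] args) {
        String first = process("hello");   // what re-executing for DUMP ccm would print
        String second = process("hello");  // what re-executing for DUMP bug would print
        // The two supposedly identical relations disagree:
        System.out.println(first.equals(second)); // false
    }
}
```

This is why a cached result (rather than full re-execution from the first statement) is needed for relations derived from non-deterministic UDFs.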


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-761) ERROR 2086 on simple JOIN

2009-12-23 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794005#action_12794005
 ] 

Ankur commented on PIG-761:
---

Here is a very simple script to reproduce the issue:-

- Start -
data1 = LOAD 'data1' as (a:int, b:int, c:chararray);
proj1 = LIMIT data1 5;

data2 = LOAD 'data2' as (x:int, y:chararray, z:chararray);
proj2 = FOREACH data2 GENERATE x, y;

cogrouped = COGROUP proj1 BY a, proj2 BY x INNER PARALLEL 2;
joined = FOREACH cogrouped GENERATE FLATTEN(proj1), FLATTEN(proj2);

store joined into 'results';
- End 

The problem seems to be with the LIMIT operator on one of the relations 
participating in the join. It appears this causes the mismatch between the 
expected and found local rearrange operators.

> ERROR 2086 on simple JOIN
> -
>
> Key: PIG-761
> URL: https://issues.apache.org/jira/browse/PIG-761
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
> Environment: mapreduce mode
>Reporter: Vadim Zaliva
>
> ERROR 2086: Unexpected problem during optimization. Could not find all 
> LocalRearrange operators.org.apache.pig.impl.logicalLayer.FrontendException: 
> ERROR 1002: Unable to store alias 109
> doing pretty straightforward join in one of my pig scripts. I am able to 
> 'dump' both relationship involved in this join. when I try to join them I am 
> getting this error.
> Here is a full log:
> ERROR 2086: Unexpected problem during optimization. Could not find all
> LocalRearrange operators.
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable
> to store alias 109
>at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
>at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
>at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
>at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
>at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>at org.apache.pig.Main.main(Main.java:319)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
> 2043: Unexpected error during execution.
>at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
>at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
>at org.apache.pig.PigServer.execute(PigServer.java:691)
>at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
>... 5 more
> Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException:
> ERROR 2086: Unexpected problem during optimization. Could not find all
> LocalRearrange operators.
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
>at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
>at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
>at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.
> MapReduceLauncher.compile(MapReduceLauncher.java:198)
>at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
>at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
>... 8 more
> ERROR 1002: Unable to store alias 398
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable
> to store alias 398
>at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
>at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
>at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
>at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
>at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>at org.apache.pig.Main.main(Main.java:319)
> Caused by: java.lang.NullPointerException
>at 
> org.apache.pi

[jira] Created: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-14 Thread Ankur (JIRA)
POCast throws exception for certain sequences of LOAD, FILTER, FORACH
-

 Key: PIG-1191
 URL: https://issues.apache.org/jira/browse/PIG-1191
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
Priority: Blocker


When using a custom load/store function, one that returns complex data (map of 
maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig script 
throws an exception of the form -
 
org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine how to convert the bytearray to 

at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
...
Looking through the code of POCast, apparently the operator was unable to find 
the right load function for doing the conversion and consequently bailed out 
with the exception failing the entire pig script.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-14 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800609#action_12800609
 ] 

Ankur commented on PIG-1191:


Listed below are the identified cases. 

CASE 1: LOAD -> FILTER -> FOREACH -> LIMIT -> STORE
===

SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries = FILTER sds BY mapFields#'page_params'#'query' is NOT NULL;
queries_rand = FOREACH queries
   GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS 
query_string;
queries_limit = LIMIT queries_rand 100;
STORE queries_limit INTO 'out'; 

RESULT 

FAILS in reduce stage with the following exception

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine
how to convert the bytearray to string.
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.runPipeline(PigMapReduce.java:423)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.processOnePackageOutput(PigMapReduce.java:391)
at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:371)


CASE 2: LOAD -> FOREACH -> FILTER -> LIMIT -> STORE
===
Note that FILTER and FOREACH order is reversed

SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
queries_rand = FOREACH sds
   GENERATE (CHARARRAY) (mapFields#'page_params'#'query') AS 
query_string;
queries = FILTER queries_rand BY query_string IS NOT null;
queries_limit = LIMIT queries 100; 
STORE queries_limit INTO 'out';

RESULT
---
SUCCESS - Results are correctly stored. So if a projection is done before the 
FILTER, it receives the LoadFunc in the POCast operator and everything works.


CASE 3: LOAD -> FOREACH -> FOREACH -> FILTER -> LIMIT -> STORE
==

SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE 
  (map[]) (mapFields#'page_params') AS params;
queries = FOREACH params
  GENERATE (CHARARRAY) (params#'query') AS query_string;
queries_filtered = FILTER queries
   BY query_string IS NOT null;
queries_limit = LIMIT queries_filtered 100;
STORE queries_limit INTO 'out';

RESULT
---
FAILS in Map stage. Looks like the 2nd FOREACH did not get the loadFunc and 
bailed out with following stack trace

org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
bytearray from the UDF. Cannot determine
how to convert the bytearray to string. at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
 at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:95)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLimit.getNext(POLimit.java:85)
 at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
 at
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
 at

CASE 4: LOAD -> FOREACH -> FOREACH -> LIMIT -> STORE


SCRIPT
---
sds = LOAD '/my/data/location'
  USING my.org.MyMapLoader()
  AS (simpleFields:map[], mapFields:map[], listMapFields:map[]);
params = FOREACH sds GENERATE
  (map[]) (mapFields#'page_params') AS params;
queries = FOREACH

[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-15 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800610#action_12800610
 ] 

Ankur commented on PIG-1191:


I'll check and update the ticket

> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Priority: Blocker
> Attachments: PIG-1191-1.patch
>
>
> When using a custom load/store function, one that returns complex data (map 
> of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
> script throws an exception of the form -
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, apparently the operator was unable to 
> find the right load function for doing the conversion and consequently bailed 
> out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-15 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800636#action_12800636
 ] 

Ankur commented on PIG-1191:


Case 1, 2: Succeeds
Case 3 : Fails
Case 4,5: Empty results. Both of them are using consecutive projection of 
complex fields.

I'll add 1 more test case

> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Priority: Blocker
> Attachments: PIG-1191-1.patch
>
>
> When using a custom load/store function, one that returns complex data (map 
> of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
> script throws an exception of the form -
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, apparently the operator was unable to 
> find the right load function for doing the conversion and consequently bailed 
> out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-15 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800655#action_12800655
 ] 

Ankur commented on PIG-1191:


CASE 6:  In CASE 1 replace LIMIT with a GROUP BY followed by FOREACH 


Succeeds with the given patch.


> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Priority: Blocker
> Attachments: PIG-1191-1.patch
>
>
> When using a custom load/store function, one that returns complex data (map 
> of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
> script throws an exception of the form -
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, apparently the operator was unable to 
> find the right load function for doing the conversion and consequently bailed 
> out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur reassigned PIG-1191:
--

Assignee: Pradeep Kamath

> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Pradeep Kamath
>Priority: Blocker
> Attachments: PIG-1191-1.patch
>
>
> When using a custom load/store function, one that returns complex data (map 
> of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
> script throws an exception of the form -
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, apparently the operator was unable to 
> find the right load function for doing the conversion and consequently bailed 
> out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-15 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12800660#action_12800660
 ] 

Ankur commented on PIG-1191:


A small correction to the comment dated 15/Jan/10 09:39 AM:

Case 5: Still FAILS

> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Pradeep Kamath
>Priority: Blocker
> Attachments: PIG-1191-1.patch
>
>
> When using a custom load/store function, one that returns complex data (map 
> of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
> script throws an exception of the form -
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, apparently the operator was unable to 
> find the right load function for doing the conversion and consequently bailed 
> out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1191) POCast throws exception for certain sequences of LOAD, FILTER, FORACH

2010-01-17 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12801687#action_12801687
 ] 

Ankur commented on PIG-1191:


Verified that PIG-1191-2.patch successfully passes all 6 test cases with 
expected results.
So barring the increased number of release audit warnings,
+1 for commit

> POCast throws exception for certain sequences of LOAD, FILTER, FORACH
> -
>
> Key: PIG-1191
> URL: https://issues.apache.org/jira/browse/PIG-1191
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Pradeep Kamath
>Priority: Blocker
> Attachments: PIG-1191-1.patch, PIG-1191-2.patch
>
>
> When using a custom load/store function, one that returns complex data (map 
> of maps, list of maps), for certain sequences  of LOAD, FILTER, FOREACH pig 
> script throws an exception of the form -
>  
> org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a 
> bytearray from the UDF. Cannot determine how to convert the bytearray to 
> 
> at 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:639)
> ...
> Looking through the code of POCast, apparently the operator was unable to 
> find the right load function for doing the conversion and consequently bailed 
> out with the exception failing the entire pig script.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur reassigned PIG-1229:
--

Assignee: Ankur

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Attachments: DbStorage.java
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-02-08 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831337#action_12831337
 ] 

Ankur commented on PIG-1229:


Aaron, Thanks for the suggestions.
I'll have an updated patch coming soon.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Attachments: DbStorage.java
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1233) NullPointerException in AVG

2010-02-09 Thread Ankur (JIRA)
NullPointerException in AVG 


 Key: PIG-1233
 URL: https://issues.apache.org/jira/browse/PIG-1233
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
 Fix For: 0.6.0


The overridden method getValue() in AVG throws a NullPointerException when 
accumulate() is never called, leaving the variable 'intermediateCount' initialized 
to null. Java then throws the exception when it tries to unbox the value for a 
numeric comparison.
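The unboxing failure is easy to reproduce outside Pig. Below is a minimal sketch, not Pig's actual AVG class; the field names merely mirror the ones named above. Comparing a null boxed Long auto-unboxes it and throws, while an explicit null check does not:

```java
// Illustrative sketch of the bug described above; this is NOT Pig's
// actual AVG implementation, only the unboxing failure it describes.
class AvgSketch {
    // Mirrors the accumulator state named in the report.
    private Long intermediateCount = null;
    private Double intermediateSum = null;

    void accumulate(double v) {
        if (intermediateCount == null) {
            intermediateCount = 0L;
            intermediateSum = 0.0;
        }
        intermediateCount += 1;
        intermediateSum += v;
    }

    // Buggy: if accumulate() was never called, 'intermediateCount > 0'
    // auto-unboxes a null Long and throws NullPointerException.
    Double getValueBuggy() {
        return intermediateCount > 0 ? intermediateSum / intermediateCount : null;
    }

    // Fixed: an explicit null check guards the unboxing comparison.
    Double getValueFixed() {
        if (intermediateCount != null && intermediateCount > 0) {
            return intermediateSum / intermediateCount;
        }
        return null;
    }
}
```

With no accumulate() call, getValueBuggy() throws exactly as reported, while getValueFixed() returns null.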

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-09 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Attachment: jira-1233.patch

Attached is a simple patch that adds the required null checks. Since the code 
change is trivial, I don't think any new test cases are needed. 

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1233) NullPointerException in AVG

2010-02-09 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur reassigned PIG-1233:
--

Assignee: Ankur

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-09 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Status: Patch Available  (was: Open)

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Attachment: (was: jira-1233.patch)

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Status: In Progress  (was: Patch Available)

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Attachment: jira-1233.patch

Added a test case.

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Status: Patch Available  (was: In Progress)

Retrying Hudson after adding the suggested test case.

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229.patch

Updated code with added test case using HSQLDB (binary part of the patch).

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Attachments: jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Fix Version/s: 0.6.0
   Status: Patch Available  (was: Open)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-02-15 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: hsqldb.jar

Attaching hsqldb.jar separately, as including it in the patch does not work.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.6.0
>
> Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1233) NullPointerException in AVG

2010-02-15 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834075#action_12834075
 ] 

Ankur commented on PIG-1233:


The test report URLs don't work. Is this the correct one?
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/205/testReport/

Looks alright to me.

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.6.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1238) Dump does not respect the schema

2010-02-16 Thread Ankur (JIRA)
Dump does not respect the schema


 Key: PIG-1238
 URL: https://issues.apache.org/jira/browse/PIG-1238
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur


For complex data types and certain sequences of operations, dump produces results 
with a non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834151#action_12834151
 ] 

Ankur commented on PIG-1238:


Here is a script to reproduce the issue:

A = LOAD 'two.txt' USING PigStorage();
B = FOREACH A GENERATE ['a'#'12'] as b:map[], ['b'#['c'#'12']] as mapFields;
C = FOREACH B GENERATE(CHARARRAY) mapFields#'b'#'c' AS f1, RANDOM() AS f2;
D = ORDER C BY f2 PARALLEL 10;
E = LIMIT D 20;
F = FOREACH E GENERATE f1;
describe F;
dump F;

With the above script here is a snippet of the logs that might be useful
...
...
2010-02-16 10:42:44,814 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 90% complete
2010-02-16 10:42:55,966 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2010-02-16 10:42:55,981 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Successfully stored result in: 
"hdfs://mithrilblue-nn1.blue.ygrid.yahoo.com/tmp/temp-1870551954/tmp-470213889"
2010-02-16 10:42:55,991 [main] WARN  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Encountered Warning ACCESSING_NON_EXISTENT_FIELD 1 time(s).
2010-02-16 10:42:55,991 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Records written : 1
2010-02-16 10:42:55,991 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Bytes written : 14
2010-02-16 10:42:55,991 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Success!
(12,)

Note: if we remove "PARALLEL 10" from the ORDER BY, correct results are produced 
and no warning is thrown.

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> For complex data type and certain sequence of operations dump produces 
> results with non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834642#action_12834642
 ] 

Ankur commented on PIG-1238:


Daniel, the correct syntax is ['b'#['c'#'12']] as mapFields.

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> For complex data type and certain sequence of operations dump produces 
> results with non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834643#action_12834643
 ] 

Ankur commented on PIG-1238:


It seems the inner [] are making parts of it appear underlined. The correct syntax is
['b'# ['c'#'12'] ] as mapFields

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> For complex data type and certain sequence of operations dump produces 
> results with non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1238) Dump does not respect the schema

2010-02-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834644#action_12834644
 ] 

Ankur commented on PIG-1238:


Sigh. Enclose 'c'#'12' in square brackets, and then enclose 'b'# ... in another 
pair of square brackets.

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> For complex data type and certain sequence of operations dump produces 
> results with non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1233) NullPointerException in AVG

2010-02-16 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12834645#action_12834645
 ] 

Ankur commented on PIG-1233:


Olga,
   All queries that use AVG(), have null values for certain keys, and have the 
accumulator turned on are affected by this. Please see the test case for a 
sample query. The current workaround is to filter out the nulls before averaging.
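The filter-nulls workaround can be done in the script with a FILTER ... IS NOT NULL step before the GROUP/AVG. The same idea in plain Java, as an illustrative sketch:

```java
import java.util.List;
import java.util.Objects;

// Sketch of the workaround: drop nulls before averaging, so the
// aggregate never sees a null value to unbox. Illustrative only.
class FilterBeforeAvg {
    static double average(List<Double> values) {
        return values.stream()
                     .filter(Objects::nonNull)        // the FILTER step
                     .mapToDouble(Double::doubleValue)
                     .average()
                     .orElse(Double.NaN);             // nothing left to average
    }
}
```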

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-16 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Status: In Progress  (was: Patch Available)

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1233) NullPointerException in AVG

2010-02-16 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1233:
---

Status: Patch Available  (was: In Progress)

Retrying as suggested by Olga

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1233) NullPointerException in AVG

2010-02-17 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12835136#action_12835136
 ] 

Ankur commented on PIG-1233:


In the current code path we cannot have a situation where intermediateCount is 
NOT null but intermediateSum is null, so just checking the former is sufficient.

> NullPointerException in AVG 
> 
>
> Key: PIG-1233
> URL: https://issues.apache.org/jira/browse/PIG-1233
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Ankur
> Fix For: 0.7.0
>
> Attachments: jira-1233.patch
>
>
> The overridden method - getValue() in AVG throws null pointer exception in 
> case accumulate() is not called leaving variable 'intermediateCount'  
> initialized to null. This causes java to throw exception when it tries to 
> 'unbox' the value for numeric comparison.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1273) Skewed join throws error

2010-03-02 Thread Ankur (JIRA)
Skewed join throws error 
-

 Key: PIG-1273
 URL: https://issues.apache.org/jira/browse/PIG-1273
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur


When the sampled relation is too small or empty, skewed join fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1273) Skewed join throws error

2010-03-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840482#action_12840482
 ] 

Ankur commented on PIG-1273:


Here is a simple script to reproduce it:

a = load 'test.dat' using PigStorage() as (nums:chararray);
b = load 'join.dat' using PigStorage('\u0001') as 
(number:chararray,text:chararray);
c = filter a by nums == '7';
d = join c by nums LEFT OUTER, b by number USING 'skewed';
dump d;

=== test.dat ===
1
2
3
4
5

=== join.dat ===
1^Aone
2^Atwo
3^Athree

where ^A denotes the Control-A character used as a separator.
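The failure (see the stack trace in the following comment) comes from the sampler writing an empty partition file when the filtered relation is tiny. A possible defensive fix, sketched here with made-up names rather than the actual SkewedPartitioner/MapRedUtil code, would be to fall back to plain hash partitioning when no samples exist instead of throwing:

```java
// Hypothetical sketch of a fallback for an empty samples file; the
// real SkewedPartitioner/MapRedUtil code differs. The idea: with no
// skew information, degrade to hash partitioning rather than failing
// the job with "Empty samples file".
class SkewGuard {
    static int partition(long sampleFileSize, String key, int numReducers) {
        if (sampleFileSize == 0) {
            // No samples: plain hash partition, masked non-negative.
            return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
        }
        // ...otherwise consult the sampled key distribution (omitted).
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }
}
```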

> Skewed join throws error 
> -
>
> Key: PIG-1273
> URL: https://issues.apache.org/jira/browse/PIG-1273
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> When the sampled relation is too small or empty then skewed join fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1273) Skewed join throws error

2010-03-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840483#action_12840483
 ] 

Ankur commented on PIG-1273:


Complete stack trace of the error thrown by the 3rd M/R job in the pipeline:

java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at 
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at 
org.apache.hadoop.mapred.MapTask$OldOutputCollector.<init>(MapTask.java:448)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:159)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
... 6 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Empty 
samples file
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.configure(SkewedPartitioner.java:128)
... 11 more
Caused by: java.lang.RuntimeException: Empty samples file
at 
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil.loadPartitionFile(MapRedUtil.java:128)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.SkewedPartitioner.configure(SkewedPartitioner.java:125)
... 11 more


> Skewed join throws error 
> -
>
> Key: PIG-1273
> URL: https://issues.apache.org/jira/browse/PIG-1273
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> When the sampled relation is too small or empty then skewed join fails.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1274) Column pruning throws Null pointer exception

2010-03-02 Thread Ankur (JIRA)
Column pruning throws Null pointer exception


 Key: PIG-1274
 URL: https://issues.apache.org/jira/browse/PIG-1274
 Project: Pig
  Issue Type: Bug
Reporter: Ankur


When data has missing values for certain columns in a relation participating 
in a join, column pruning throws a null pointer exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1274) Column pruning throws Null pointer exception

2010-03-02 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12840514#action_12840514
 ] 

Ankur commented on PIG-1274:


Here is a script to reproduce the error:

=== pig script ===

R1 = load 'data1' as (a:chararray, b:chararray, c:chararray, d:chararray);
R2 = load 'data2' as (x:chararray, y:chararray, z:chararray);
joined = join R1 by c, R2 by z;
projected = FOREACH joined generate c, d;
dump projected;

=== data1 ===
a   b
=== data2 ===
a   b   c
a   t   d
a   x   e

The exception log is:


ERROR 1002: Unable to store alias projected

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias projected
at org.apache.pig.PigServer.openIterator(PigServer.java:482)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:552)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
...
...
Caused by: java.lang.NullPointerException
at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:149)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:234)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast.getNext(POCast.java:615)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:364)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:288)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:260)
at 
org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:256)
at 
org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup.accumulateData(POCogroup.java:177)
at 
org.apache.pig.backend.local.executionengine.physicalLayer.relationalOperators.POCogroup.getNext(POCogroup.java:96)
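The NullPointerException in DefaultTuple.get is consistent with the rows in data1 carrying only two fields against a four-column schema. A minimal sketch of the failure mode and a null-safe accessor (illustrative only, not Pig's actual DefaultTuple):

```java
import java.util.Arrays;
import java.util.List;

// Sketch: a row shorter than its declared schema leaves trailing
// slots absent; a null-safe get() returns null for them instead of
// blowing up. Not Pig's actual DefaultTuple.
class TupleSketch {
    private final List<String> fields;

    TupleSketch(String... fields) {
        this.fields = Arrays.asList(fields);
    }

    // Fields past the end of a short row read as null, matching
    // Pig's usual treatment of missing values.
    String get(int i) {
        return (i < fields.size()) ? fields.get(i) : null;
    }
}
```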

> Column pruning throws Null pointer exception
> 
>
> Key: PIG-1274
> URL: https://issues.apache.org/jira/browse/PIG-1274
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
>
> In case data has missing values for certain columns in a relation 
> participating in a join, column pruning throws null pointer exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-03 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12841049#action_12841049
 ] 

Ankur commented on PIG-1229:


Sure, I'll do that. Give me a couple of days.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (PIG-1238) Dump does not respect the schema

2010-03-19 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur reopened PIG-1238:



This does not work for me when I take a fresh checkout from the trunk. I still 
get the same error.

> Dump does not respect the schema
> 
>
> Key: PIG-1238
> URL: https://issues.apache.org/jira/browse/PIG-1238
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>Assignee: Richard Ding
> Fix For: 0.7.0
>
> Attachments: PIG-1238.patch
>
>
> For complex data type and certain sequence of operations dump produces 
> results with non-existent field in the relation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-21 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847909#action_12847909
 ] 

Ankur commented on PIG-1229:


@Ashutosh Chauhan 
I read the HSQLDB license and it looked OK to me, but I am not a lawyer :-). 
Besides that, Apache Cocoon uses it. I think we should be OK pulling it through 
ivy.

I'll make the ivy and load-store related changes and submit a new patch on 
Monday.

Sorry for the delay.
 

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.7.0
>
> Attachments: hsqldb.jar, jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1327) Incorrect column pruning after multiple JOIN operations

2010-03-25 Thread Ankur (JIRA)
Incorrect column pruning after multiple JOIN operations
---

 Key: PIG-1327
 URL: https://issues.apache.org/jira/browse/PIG-1327
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur


In a script with multiple JOIN and GROUP operations, the column pruner 
incorrectly removes some of the fields that it shouldn't. Here is a script that 
demonstrates the issue:

A = LOAD 'data1' USING PigStorage() AS (a:chararray, b:chararray, c:long);
B = LOAD 'data2' USING PigStorage() AS (x:chararray, y:chararray, z:long);
C = LOAD 'data3' using PigStorage() AS (d:chararray, e:chararray, f:chararray);

join1 = JOIN B by x, A by a;
filtered1 = FILTER join1  BY y == b;
InterimData = FOREACH filtered1 GENERATE a, b, c, y, z;
join2 = JOIN InterimData BY b LEFT OUTER, C BY d  PARALLEL 2;
proj = FOREACH join2 GENERATE a,b,y,z,e,f;
TopNPrj = FOREACH proj GENERATE a, (( e is not null and e != '') ? e : 'None') 
, z;
TopNDataGrp = GROUP TopNPrj BY (a, e) PARALLEL 2;
TopNDataSum = FOREACH TopNDataGrp GENERATE flatten(group) as (a, e), 
SUM(TopNPrj.z) as views;
TopNDataRegrp = GROUP TopNDataSum BY (a) PARALLEL 2;
TopNDataCount = FOREACH TopNDataRegrp { OrderedData = ORDER TopNDataSum BY 
views desc; LimitedData = LIMIT OrderedData 50; GENERATE LimitedData; }
TopNData = FOREACH TopNDataCount GENERATE flatten($0) as (a, e, views);
store TopNData into 'tmpTopN';
TopNData_stored = load 'tmpTopN' as (a:chararray, b:chararray, c:long);
joinTopNData = JOIN TopNData_stored BY (a,b) RIGHT OUTER, proj BY (a,b) 
PARALLEL 2;
describe joinTopNData;
STORE  joinTopNData  INTO 'output';

The column 'f' from relation 'C', participating in the 2nd JOIN, is missing from 
the final join output.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1327) Incorrect column pruning after multiple JOIN operations

2010-03-25 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849995#action_12849995
 ] 

Ankur commented on PIG-1327:


Yes, I verified that

> Incorrect column pruning after multiple JOIN operations
> ---
>
> Key: PIG-1327
> URL: https://issues.apache.org/jira/browse/PIG-1327
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Ankur
>
> In a script with multiple JOIN and GROUP operations, the column pruner 
> incorrectly removes some of the fields that it shouldn't. Here is a script 
> that demonstrates the issue
> A = LOAD 'data1' USING PigStorage() AS (a:chararray, b:chararray, c:long);
> B = LOAD 'data2' USING PigStorage() AS (x:chararray, y:chararray, z:long);
> C = LOAD 'data3' using PigStorage() AS (d:chararray, e:chararray, 
> f:chararray);
> join1 = JOIN B by x, A by a;
> filtered1 = FILTER join1  BY y == b;
> InterimData = FOREACH filtered1 GENERATE a, b, c, y, z;
> join2 = JOIN InterimData BY b LEFT OUTER, C BY d  PARALLEL 2;
> proj = FOREACH join2 GENERATE a,b,y,z,e,f;
> TopNPrj = FOREACH proj GENERATE a, (( e is not null and e != '') ? e : 
> 'None') , z;
> TopNDataGrp = GROUP TopNPrj BY (a, e) PARALLEL 2;
> TopNDataSum = FOREACH TopNDataGrp GENERATE flatten(group) as (a, e), 
> SUM(TopNPrj.z) as views;
> TopNDataRegrp = GROUP TopNDataSum BY (a) PARALLEL 2;
> TopNDataCount = FOREACH TopNDataRegrp { OrderedData = ORDER TopNDataSum BY 
> views desc; LimitedData = LIMIT OrderedData 50; GENERATE LimitedData; }
> TopNData = FOREACH TopNDataCount GENERATE flatten($0) as (a, e, views);
> store TopNData into 'tmpTopN';
> TopNData_stored = load 'tmpTopN' as (a:chararray, b:chararray, c:long);
> joinTopNData = JOIN TopNData_stored BY (a,b) RIGHT OUTER, proj BY (a,b) 
> PARALLEL 2;
> describe joinTopNData;
> STORE  joinTopNData  INTO 'output';
> The column 'f' from relation 'C', participating in the 2nd JOIN, is missing 
> from the final join output.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-v2.patch

Here is the updated patch that compiles against pig 0.7 branch and implements 
new load/store APIs. 

Note: I haven't used Hadoop's DBOutputFormat, as that code has not yet been 
moved to o.a.h.mapreduce.lib and hence there are compatibility issues.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch, jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: (was: hsqldb.jar)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch, jira-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: (was: jira-1229.patch)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Status: In Progress  (was: Patch Available)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-03-30 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Status: Patch Available  (was: In Progress)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-03-31 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12852243#action_12852243
 ] 

Ankur commented on PIG-1229:


Ashutosh,
   Thanks for the review comments. Accepting the store location via 
setStoreLocation() definitely makes sense. However, I am not sure about 
checking database reachability in checkOutputSpecs(), since that may be called 
on the client side as well, and the DB machine may not be reachable from the 
client machine. Isn't OutputFormat's setupTask() a better place to do a DB 
availability check?
This sounds like a reasonable ask before a commit. I will incorporate it and 
submit a new patch.

> Doing DataType.find() 
I assume this is what you have in mind:
1. Get the DB schema information for the table we are writing to.
2. Use the checkSchema() API to validate it against the Pig-supplied schema, 
and cache the result.
3. Use the cached information in the putNext() method.

This is more of a performance enhancement and looks like more work, so I would 
prefer that we track it as a separate JIRA for DBStorage.
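For illustration, the caching idea above can be sketched as follows. This is a minimal, self-contained Java sketch, not code from the patch: the class `TypeCache` and its method names are hypothetical, and Pig's DataType constants and the JDBC setters are only simulated here.

```java
// Hypothetical sketch (not from the patch): resolve each output column's
// type once up front, then reuse the cached array for every tuple, instead
// of calling something like DataType.find() per field in putNext().
public class TypeCache {
    // Simplified stand-ins for Pig's DataType byte codes.
    static final byte CHARARRAY = 55, LONG = 15, INTEGER = 10;

    // Steps 1-2: turn the validated schema into a type-code array, once.
    static byte[] resolveTypes(String[] declaredTypes) {
        byte[] types = new byte[declaredTypes.length];
        for (int i = 0; i < declaredTypes.length; i++) {
            switch (declaredTypes[i]) {
                case "chararray": types[i] = CHARARRAY; break;
                case "long":      types[i] = LONG;      break;
                case "int":       types[i] = INTEGER;   break;
                default: throw new IllegalArgumentException(declaredTypes[i]);
            }
        }
        return types;
    }

    // Step 3: putNext() would consult the cache to pick the right
    // PreparedStatement setter, without re-inspecting each value.
    static String setterFor(byte[] cachedTypes, int column) {
        switch (cachedTypes[column]) {
            case CHARARRAY: return "setString";
            case LONG:      return "setLong";
            case INTEGER:   return "setInt";
            default:        return "setObject";
        }
    }

    public static void main(String[] args) {
        byte[] t = resolveTypes(new String[]{"chararray", "long"});
        System.out.println(setterFor(t, 0)); // setString
        System.out.println(setterFor(t, 1)); // setLong
    }
}
```

The win is that the per-tuple path in putNext() becomes a plain array lookup, which matters when millions of tuples flow through a single writer.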

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-06 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12853843#action_12853843
 ] 

Ankur commented on PIG-1229:


So accepting the JDBC URL in setStoreLocation() exposes a flaw in Hadoop's Path 
class, which causes the test case to fail with the following exception:

java.net.URISyntaxException: Relative path in absolute URI: 
jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path 
in absolute URI: 
jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
at org.apache.hadoop.fs.Path.initialize(Path.java:140)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.pig.LoadFunc.getAbsolutePath(LoadFunc.java:238)
at 
org.apache.pig.StoreFunc.relToAbsPathForStoreLocation(StoreFunc.java:60)
at 
org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3587)
...
...
Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
at java.net.URI.checkPath(URI.java:1787)
at java.net.URI.<init>(URI.java:735)
at org.apache.hadoop.fs.Path.initialize(Path.java:137)

Looking at the code of Path.java, it seems to extract the scheme based on the 
first occurrence of ':'. This causes the authority and path to be extracted 
incorrectly, resulting in the above exception being thrown by java.net.URI. 
However, if I initialize a URI directly with the full URL string, no exception 
is thrown.
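The difference can be demonstrated with java.net.URI alone. In the sketch below the class name `PathSchemeDemo` is ours, and the rebuild step only mimics what Path.initialize() appears to do: the full JDBC URL parses fine as one opaque URI, while splitting at the first ':' and rebuilding from parts fails with the same "Relative path in absolute URI" error.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative sketch (class name is ours): why hadoop.fs.Path rejects a
// JDBC URL that java.net.URI itself parses without complaint.
public class PathSchemeDemo {
    static final String URL =
        "jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached";

    public static void main(String[] args) throws URISyntaxException {
        // Parsing the whole string yields an opaque URI with scheme "jdbc".
        URI whole = new URI(URL);
        System.out.println(whole.getScheme()); // jdbc
        System.out.println(whole.isOpaque());  // true

        // Roughly what Path.initialize() does: split off the scheme at the
        // FIRST ':' and rebuild with the multi-part URI constructor. The
        // leftover "hsqldb:file:/tmp/..." does not start with '/', so URI
        // rejects it as a relative path inside an absolute URI.
        int colon = URL.indexOf(':');
        String scheme = URL.substring(0, colon);
        String rest = URL.substring(colon + 1);
        try {
            new URI(scheme, null, rest, null, null);
        } catch (URISyntaxException e) {
            System.out.println(e.getReason()); // Relative path in absolute URI
        }
    }
}
```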

As for the DB reachability check, I think it is OK to check availability at 
runtime and fail if the DB is unreachable. We do this in prepareToWrite(). 
As for the performance enhancement, I think we can track that via a separate 
issue.

This patch has taken quite a while now and I wouldn't want to delay it further 
by depending on a Hadoop fix.

So if a reviewer does not find any blocking issues, then my suggestion is to go 
ahead with the commit.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-11 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855835#action_12855835
 ] 

Ankur commented on PIG-1229:


*Sigh*
The problem is with Hadoop's Path implementation, which does not understand 
JDBC URLs correctly, so overriding relToAbsPathForStoreLocation() does NOT 
help. The URISyntaxException is now propagated to the point of setting the 
output path for the job. Here is the new trace from the test execution failure 
with the suggested workaround:

org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected 
error during execution.
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:332)
at 
org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
at org.apache.pig.PigServer.execute(PigServer.java:828)
at org.apache.pig.PigServer.access$100(PigServer.java:105)
at org.apache.pig.PigServer$Graph.execute(PigServer.java:1080)
at org.apache.pig.PigServer.executeBatch(PigServer.java:288)
at 
org.apache.pig.piggybank.test.storage.TestDBStorage.testWriteToDB(Unknown 
Source)
Caused by: 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobCreationException:
 ERROR 2017: Internal error creating job configuration.
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:624)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.compile(JobControlCompiler.java:246)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
at 
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: 
Relative path in absolute URI: 
jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
at org.apache.hadoop.fs.Path.initialize(Path.java:140)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:45)
at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.getJob(JobControlCompiler.java:459)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
jdbc:hsqldb:file:/tmp/batchtest;hsqldb.default_table_type=cached;hsqldb.cache_rows=100
at java.net.URI.checkPath(URI.java:1787)
at java.net.URI.<init>(URI.java:735)
at org.apache.hadoop.fs.Path.initialize(Path.java:137)


  

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-13 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856761#action_12856761
 ] 

Ankur commented on PIG-1229:


Any updates?

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-04-15 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857253#action_12857253
 ] 

Ankur commented on PIG-1229:


So I read the complete thread and here are my thoughts:

- Speculative execution issue: With the recent move to Hadoop's I/O formats in 
Load/Store, DBStorage has been modified to commit the data to the DB in 
OutputCommitter's commitTask() method. Hadoop itself guarantees that this 
method will be called only for the first successful attempt, so it shouldn't 
matter whether speculative execution is on or off. BUT this does NOT solve the 
problem where certain tasks finished successfully but the JOB itself failed, in 
which case the data from the successful attempts should be rolled back.

- Writing to a temporary table: Even this does not handle the above case, since 
some of the tasks would already have moved their data to the actual table.

- Bulk loading: This is the most suitable option in my opinion if the data is 
large. However, for small to medium data sizes (like aggregate summaries), I 
found the DBStorage UDF to be most helpful. It eliminates one more layer of 
processing from the application; in fact, that is precisely the use case it was 
written for.

So in a nutshell, using a single mapper/reducer with this patch should be fine 
regardless of whether speculative execution is off or on. In the case of 
multiple mappers/reducers writing to the DB, it should be the application's
responsibility to clean up data ONLY IN CASE of job failure.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (PIG-1379) Jars registered from command line should override the ones present in the script

2010-04-15 Thread Ankur (JIRA)
Jars registered from command line should override the ones present in the 
script 
-

 Key: PIG-1379
 URL: https://issues.apache.org/jira/browse/PIG-1379
 Project: Pig
  Issue Type: Improvement
Reporter: Ankur
 Fix For: 0.7.0


Jars that are registered from the command line when executing the pig script 
should override the ones that are specified via 'register' in the pig script 
itself.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] Created: (PIG-1392) Parser fails to recognize valid field

2010-04-23 Thread Ankur (JIRA)
Parser fails to recognize valid field
-

 Key: PIG-1392
 URL: https://issues.apache.org/jira/browse/PIG-1392
 Project: Pig
  Issue Type: Bug
Reporter: Ankur


Using the script below, the parser fails to recognize a valid field in the 
relation and throws an error:

A = LOAD '/tmp' as (a:int, b:chararray, c:int);
B = GROUP A BY (a, b);
C = FOREACH B { bg = A.(b,c); GENERATE group, bg; } ;

The error thrown is

2010-04-23 10:16:20,610 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
1000: Error during parsing. Invalid alias: c in {group: (a: int,b: 
chararray),A: {a: int,b: chararray,c: int}}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1392) Parser fails to recognize valid field

2010-04-23 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1392:
---

Fix Version/s: 0.7.0

> Parser fails to recognize valid field
> -
>
> Key: PIG-1392
> URL: https://issues.apache.org/jira/browse/PIG-1392
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
> Fix For: 0.7.0
>
>
> Using the script below, the parser fails to recognize a valid field in the 
> relation and throws an error:
> A = LOAD '/tmp' as (a:int, b:chararray, c:int);
> B = GROUP A BY (a, b);
> C = FOREACH B { bg = A.(b,c); GENERATE group, bg; } ;
> The error thrown is
> 2010-04-23 10:16:20,610 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Invalid alias: c in {group: (a: int,b: 
> chararray),A: {a: int,b: chararray,c: int}}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1392) Parser fails to recognize valid field

2010-04-23 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12860218#action_12860218
 ] 

Ankur commented on PIG-1392:


This script works

A = LOAD '/tmp' as (a:int, b:chararray, c:int);
B = GROUP A BY (a, b);
C = FOREACH B {  GENERATE group, A.(b,c); } ;

> Parser fails to recognize valid field
> -
>
> Key: PIG-1392
> URL: https://issues.apache.org/jira/browse/PIG-1392
> Project: Pig
>  Issue Type: Bug
>Reporter: Ankur
> Fix For: 0.7.0
>
>
> Using the script below, the parser fails to recognize a valid field in the 
> relation and throws an error:
> A = LOAD '/tmp' as (a:int, b:chararray, c:int);
> B = GROUP A BY (a, b);
> C = FOREACH B { bg = A.(b,c); GENERATE group, bg; } ;
> The error thrown is
> 2010-04-23 10:16:20,610 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1000: Error during parsing. Invalid alias: c in {group: (a: int,b: 
> chararray),A: {a: int,b: chararray,c: int}}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1393) Bug in Nested FOREACH

2010-04-25 Thread Ankur (JIRA)
Bug in Nested FOREACH
-

 Key: PIG-1393
 URL: https://issues.apache.org/jira/browse/PIG-1393
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur
 Fix For: 0.8.0


The following script makes the parser throw an error:

A = load 'data' as ( a: int, b: map[]) ;
B = foreach A generate ((chararray) b#'url') as url;
C = foreach B { 
  urlQueryFields = url#'queryFields';
  result = (urlQueryFields is not null) ? urlQueryFields : 1;
  generate result;
};


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-04-26 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-v3.patch

Here you go ...

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch, jira-1229-v3.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1229) allow pig to write output into a JDBC db

2010-05-20 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12869552#action_12869552
 ] 

Ankur commented on PIG-1229:


Hi Ashutosh,
   Thanks for helping out here. The error that you see - 
"...The database is already in use by another process" is due to locking issues 
in hsqldb 1.8.0.7. Upgrading to 1.8.0.10 
alleviates the problem and the test passes successfully. A few changes that I 
made:

1. Added a placeholder record-writer, since PigOutputFormat calls close() on it 
and throws a NullPointerException if we return null from our output format.
2. Looks like you missed the ivy.xml and build.xml changes to pull in the 
correct hsqldb jar.
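Item 1 might look roughly like the following. This is a self-contained illustration, not the patch's code: the real placeholder would extend org.apache.hadoop.mapreduce.RecordWriter, which is mimicked here by a tiny local interface so the sketch compiles on its own.

```java
// Illustrative sketch of the placeholder record-writer idea: return a
// do-nothing writer so PigOutputFormat's unconditional close() call never
// hits a null. A local interface stands in for Hadoop's RecordWriter here.
public class NullWriterDemo {
    interface RecordWriter<K, V> {
        void write(K key, V value);
        void close();
    }

    // DBStorage does its real work via JDBC prepared-statement batches,
    // so this writer only needs to exist and be safely closeable.
    static class NullRecordWriter<K, V> implements RecordWriter<K, V> {
        boolean closed = false;
        public void write(K key, V value) { /* intentionally ignored */ }
        public void close() { closed = true; /* nothing to flush */ }
    }

    public static void main(String[] args) {
        NullRecordWriter<String, String> w = new NullRecordWriter<>();
        w.write("ignored", "ignored");
        w.close(); // with a null writer this call would have thrown an NPE
        System.out.println(w.closed); // true
    }
}
```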
 

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch, jira-1229-v3.patch, 
> pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-05-20 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: pig-1229.2.patch

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-v2.patch, jira-1229-v3.patch, 
> pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1462) No informative error message on parse problem

2010-06-22 Thread Ankur (JIRA)
No informative error message on parse problem
-

 Key: PIG-1462
 URL: https://issues.apache.org/jira/browse/PIG-1462
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur


Consider the following script

in = load 'data' using PigStorage() as (m:map[]);
tags = foreach in generate m#'k1' as (tagtuple: tuple(chararray));
dump tags;

This throws the following error message, which does not really say that the 
declaration is bad:

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
parsing. Encountered "" at line 2, column 38.
Was expecting one of:

at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
at 
org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:391)


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1462) No informative error message on parse problem

2010-06-22 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881551#action_12881551
 ] 

Ankur commented on PIG-1462:


Right, the JIRA is for adding a better error message that doesn't leave the 
user guessing.

> No informative error message on parse problem
> -
>
> Key: PIG-1462
> URL: https://issues.apache.org/jira/browse/PIG-1462
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ankur
>
> Consider the following script
> in = load 'data' using PigStorage() as (m:map[]);
> tags = foreach in generate m#'k1' as (tagtuple: tuple(chararray));
> dump tags;
> This throws the following error message, which does not really say that the 
> declaration is bad:
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing. Encountered "" at line 2, column 38.
> Was expecting one of:
> 
>   at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1170)
>   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
>   at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
>   at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
>   at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>   at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>   at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>   at org.apache.pig.Main.main(Main.java:391)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1482) Pig gets confused when more than one loader is involved

2010-07-06 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885840#action_12885840
 ] 

Ankur commented on PIG-1482:


Forgot to add: include this change as well for the above script to work:

G = FOREACH F GENERATE group.v1, group.a;

> Pig gets confused when more than one loader is involved
> ---
>
> Key: PIG-1482
> URL: https://issues.apache.org/jira/browse/PIG-1482
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ankur
>
> In the case of two relations loaded using different loaders, then joined, 
> grouped, and projected, Pig gets confused trying to find the appropriate 
> loader for the requested cast. Consider the following script:
> A = LOAD 'data1' USING PigStorage() AS (s, m, l);
> B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
> C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
> :0) as v3:int;
> D = LOAD 'data2' USING TextLoader() AS (a);
> E = JOIN C BY v1, D BY a USING 'replicated';
> F = GROUP E BY (v1, a);
> G = FOREACH F GENERATE (chararray)group.v1, group.a;
> 
> dump G;
> This throws an error, the stack trace of which is in the next comment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1482) Pig gets confused when more than one loader is involved

2010-07-06 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885839#action_12885839
 ] 

Ankur commented on PIG-1482:


Casting early alleviates the problem, so this makes the above script work:

C = FOREACH B GENERATE (chararray) v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 
== 'v3' ? 1 :0) as v3:int;

> Pig gets confused when more than one loader is involved
> ---
>
> Key: PIG-1482
> URL: https://issues.apache.org/jira/browse/PIG-1482
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ankur
>
> In the case of two relations loaded using different loaders, then joined, 
> grouped, and projected, Pig gets confused trying to find the appropriate 
> loader for the requested cast. Consider the following script:
> A = LOAD 'data1' USING PigStorage() AS (s, m, l);
> B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
> C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
> :0) as v3:int;
> D = LOAD 'data2' USING TextLoader() AS (a);
> E = JOIN C BY v1, D BY a USING 'replicated';
> F = GROUP E BY (v1, a);
> G = FOREACH F GENERATE (chararray)group.v1, group.a;
> 
> dump G;
> This throws an error, the stack trace of which is in the next comment.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1482) Pig gets confused when more than one loader is involved

2010-07-06 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12885838#action_12885838
 ] 

Ankur commented on PIG-1482:


ERROR 1065: Found more than one load function to use: [PigStorage, TextLoader]

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open 
iterator for alias K
at org.apache.pig.PigServer.openIterator(PigServer.java:521)
at 
org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
at 
org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
at 
org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
at org.apache.pig.Main.main(Main.java:391)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
Unable to store alias K
at org.apache.pig.PigServer.store(PigServer.java:577)
at org.apache.pig.PigServer.openIterator(PigServer.java:504)
... 6 more
Caused by: org.apache.pig.impl.plan.PlanValidationException: ERROR 0: An 
unexpected exception caused the validation to stop
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:104)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:40)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingValidator.validate(TypeCheckingValidator.java:30)
at 
org.apache.pig.impl.logicalLayer.validators.LogicalPlanValidationExecutor.validate(LogicalPlanValidationExecutor.java:89)
at org.apache.pig.PigServer.validate(PigServer.java:930)
at org.apache.pig.PigServer.compileLp(PigServer.java:884)
at org.apache.pig.PigServer.store(PigServer.java:568)
... 7 more
Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: 
ERROR 1053: Cannot resolve load function to use for casting from bytearray to 
chararray.
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:1775)
at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:67)
at org.apache.pig.impl.logicalLayer.LOCast.visit(LOCast.java:32)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.checkInnerPlan(TypeCheckingVisitor.java:2819)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.visit(TypeCheckingVisitor.java:2723)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:130)
at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:45)
at 
org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:69)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at 
org.apache.pig.impl.plan.PlanValidator.validateSkipCollectException(PlanValidator.java:101)
... 13 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1065: 
Found more than one load function to use: [PigStorage, TextLoader]
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3161)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3176)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3103)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3176)
at 
org.apache.pig.impl.logicalLayer.validators.TypeCheckingVisitor.getLoadFuncSpec(TypeCheckingVisitor.java:3103)


> Pig gets confused when more than one loader is involved
> ---
>
> Key: PIG-1482
> URL: https://issues.apache.org/jira/browse/PIG-1482
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Ankur
>
> When two relations loaded using different loaders are joined, grouped, and 
> projected, Pig gets confused trying to find the appropriate loader for the 
> requested cast. Consider the following script:
> A = LOAD 'data1' USING PigStorage() AS (s, m, l);
> B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
> C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
> :0) as v3:int;
> D = LOAD 'data2' USING TextLoader() AS (a);
> E = JOIN C BY v1, D BY a USING 'replicated';
> F = GROUP E BY (v1, a);
> G = FOREACH F GENERATE (chararray)group.v1, group.a;

[jira] Created: (PIG-1482) Pig gets confused when more than one loader is involved

2010-07-06 Thread Ankur (JIRA)
Pig gets confused when more than one loader is involved
---

 Key: PIG-1482
 URL: https://issues.apache.org/jira/browse/PIG-1482
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Ankur


When two relations loaded using different loaders are joined, grouped, and 
projected, Pig gets confused trying to find the appropriate loader for the 
requested cast. Consider the following script:

A = LOAD 'data1' USING PigStorage() AS (s, m, l);
B = FOREACH A GENERATE s#'k1' as v1, m#'k2' as v2, l#'k3' as v3;
C = FOREACH B GENERATE v1, (v2 == 'v2' ? 1L : 0L) as v2:long, (v3 == 'v3' ? 1 
:0) as v3:int;

D = LOAD 'data2' USING TextLoader() AS (a);
E = JOIN C BY v1, D BY a USING 'replicated';

F = GROUP E BY (v1, a);
G = FOREACH F GENERATE (chararray)group.v1, group.a;

dump G;

This throws the error, stack trace of which is in the next comment


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1516) finalize in bag implementations causes pig to run out of memory in reduce

2010-07-25 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12892176#action_12892176
 ] 

Ankur commented on PIG-1516:


Having a finalize method AT ALL for the purpose of deleting files when the 
object is garbage collected is NOT a good solution. Generally speaking, 
using finalizers to release non-memory resources like file handles should be 
avoided, as it can introduce an insidious bug. From the article "Object 
finalization and Cleanup" - http://www.javaworld.com/jw-06-1998/jw-06-techniques.html

"Don't rely on finalizers to release non-memory resources"

An example of an object that breaks this rule is one that opens a file in its 
constructor and closes the file in its finalize() method. Although this design 
seems neat, tidy, and symmetrical, it potentially creates an insidious bug. A 
Java program generally will have only a finite number of file handles at its 
disposal. When all those handles are in use, the program won't be able to open 
any more files.  
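The hazard, and the FileList-style fix described in PIG-1516 itself, can be sketched in plain Java. This is an illustrative sketch only; the class and method names here are not Pig's actual implementation:

```java
import java.io.File;
import java.util.ArrayList;

// Sketch: only the object that actually owns spill files carries a
// finalizer. The many small bags have no finalize() at all, so the GC can
// reclaim them directly instead of parking them on the finalization queue.
class SpillFileList extends ArrayList<File> {
    void deleteAll() {
        for (File f : this) {
            f.delete();                // best-effort removal of spill files
        }
    }

    @Override
    protected void finalize() {
        deleteAll();                   // last-resort cleanup only
    }
}

class SmallBag {                       // no finalize(): cheap to collect
    private SpillFileList spills;      // created lazily, only on spill

    void recordSpill(File f) {
        if (spills == null) {
            spills = new SpillFileList();
        }
        spills.add(f);
    }
}
```

The point is that a small bag that never spills allocates no SpillFileList at all, so nothing about it ever reaches the finalization queue.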

> finalize in bag implementations causes pig to run out of memory in reduce 
> --
>
> Key: PIG-1516
> URL: https://issues.apache.org/jira/browse/PIG-1516
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Thejas M Nair
>Assignee: Thejas M Nair
> Fix For: 0.8.0
>
>
> *Problem:*
> pig bag implementations that are subclasses of DefaultAbstractBag, have 
> finalize methods implemented. As a result, the garbage collector moves them 
> to a finalization queue, and the memory used is freed only after the 
> finalization happens on it.
> If the bags are not finalized fast enough, a lot of memory is consumed by the 
> finalization queue, and pig runs out of memory. This can happen if large 
> number of small bags are being created.
> *Solution:*
> The finalize function exists for the purpose of deleting the spill files that 
> are created when the bag is too large. But if the bags are small enough, no 
> spill files are created, and there is no use of the finalize function.
>  A new class that holds a list of files will be introduced (FileList). This 
> class will have a finalize method that deletes the files. The bags will no 
> longer have finalize methods, and the bags will use FileList instead of 
> ArrayList.
> *Possible workaround for earlier releases:*
> Since the fix is going into 0.8, here is a workaround -
> Disabling the combiner will reduce the number of bags getting created, as 
> there will not be the stage of combining intermediate merge results. But I 
> would recommend disabling it only if you have this problem as it is likely to 
> slow down the query .
> To disable combiner, set the property: -Dpig.exec.nocombiner=true

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-07-27 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Status: In Progress  (was: Patch Available)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
> jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-07-27 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-final.patch

Hope this one finally goes in.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
> jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-07-27 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Status: Patch Available  (was: In Progress)

Regenerated the patch as per Ashutosh's suggestion.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
> jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-03 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-final.test-fix.patch

Attaching the patch with fixes to the test case:
1. Starting the HSQLDB server manually - dbServer.start().
2. Supplying the user name and password when initializing DBStorage.

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
> jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-03 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: (was: jira-1229-final.test-fix.patch)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
> jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-03 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-final.test-fix.patch

Here is my understanding of what happens:

1. The main thread in the JVM executing the test initializes MiniDFSCluster, 
MiniMRCluster, and the HSQLDB server, all in different threads.
2. The test setUp() method is then executed to create table 'ttt', to which 
data will be written by DBStorage() in the test.
3. Pig statements are then executed that spawn the M/R job as a separate 
process, which tries to get a connection to the database and create a 
PreparedStatement for table 'ttt'. This sometimes fails because the DB thread 
does NOT get a chance to fully persist the table information, and the 
exception is thrown from the map tasks, as noted by Ashutosh.

The fix is to add a 5 sec sleep in the setUp() method to give the DB a chance 
to persist the table information. This alleviates the problem, and the test 
passes across repeated runs.

Note that the ideal fix would have been to busy-wait for table creation to 
complete, but I don't see a method in HSQLDB to do that.
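A hedged sketch of the busy-wait alternative mentioned above (names are illustrative, not from the patch): the readiness check can be abstracted as a condition, so that in the JDBC case the supplied condition would call conn.getMetaData().getTables(null, null, "TTT", new String[]{"TABLE"}) and report whether the table is visible yet.

```java
import java.util.function.BooleanSupplier;

// Illustrative sketch of a bounded busy-wait that could replace a fixed
// sleep in setUp(). Polls the condition every 50 ms until it holds or the
// timeout expires.
class TableWait {
    static boolean waitFor(BooleanSupplier condition, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;           // condition satisfied within timeout
            }
            Thread.sleep(50);          // back off briefly before retrying
        }
        return false;                  // timed out
    }
}
```

This bounds the wait instead of always paying a full 5-second sleep, and lets the test fail explicitly when the table never appears.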

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
> jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-04 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: (was: jira-1229-final.test-fix.patch)

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-v2.patch, 
> jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-08-04 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-1229:
---

Attachment: jira-1229-final.test-fix.patch

Aaron,
 Autocommit() was not the issue. It was the usage of the 
"jdbc:hsqldb:file:" URL in the STORE function that was the problem. Replacing 
it with "jdbc:hsqldb:hsql://localhost/dbname" solved the issue. Attaching the 
updated patch with the test case modification.

Really appreciate your help here. Thanks a lot :-)
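For context, the two URL forms behave very differently: a jdbc:hsqldb:file: URL runs the database engine in-process inside the connecting JVM, so a map task running in a separate process cannot share tables created by the test JVM, whereas jdbc:hsqldb:hsql:// connects to a standalone server that both processes can reach. A minimal illustration (the path and "dbname" are placeholders, not values from the patch):

```java
// Illustrative constants only; "dbname" and the path are placeholders.
class HsqldbUrls {
    // In-process mode: the DB engine runs inside the connecting JVM.
    // A second JVM (e.g. a map task) opening the same files would get
    // its own, possibly locked or stale, view of the database.
    static final String IN_PROCESS = "jdbc:hsqldb:file:/tmp/dbname";

    // Server mode: the test JVM and the M/R tasks all connect to one
    // shared server process over the network.
    static final String SERVER = "jdbc:hsqldb:hsql://localhost/dbname";
}
```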

> allow pig to write output into a JDBC db
> 
>
> Key: PIG-1229
> URL: https://issues.apache.org/jira/browse/PIG-1229
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Reporter: Ian Holsman
>Assignee: Ankur
>Priority: Minor
> Fix For: 0.8.0
>
> Attachments: jira-1229-final.patch, jira-1229-final.test-fix.patch, 
> jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.2.patch, pig-1229.patch
>
>
> UDF to store data into a DB

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-428) TypeCastInserter does not replace projects in inner plans correctly

2009-01-13 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-428:
--


I am still seeing this issue. What I do is load my data using a custom loader. 
One of the fields returned by the loader is of type Map.
When I retrieve a value from the map and group on it, I get this exception. 
Here is a snippet of my script.

raw =  LOAD '/mydata/*' USING MyLoader() ;
entry = FILTER raw BY (CUSTOMARGMAP#'keyOfInterest' is not null);
listing = FOREACH entry GENERATE CUSTOMARGMAP#'keyOfInterest' as keyGroup;
myGroup = GROUP listing BY (keyGroup);
unordered_results = FOREACH myGroup GENERATE group, COUNT(*);
results = ORDER unordered_results by $1 DESC;
STORE results INTO 'Results' USING PigStorage();

 



> TypeCastInserter does not replace projects in inner plans correctly
> ---
>
> Key: PIG-428
> URL: https://issues.apache.org/jira/browse/PIG-428
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Pradeep Kamath
> Fix For: types_branch
>
> Attachments: PIG-428.patch
>
>
> The TypeCastInserter tries to replace the Project's input operator in inner 
> plans with the new foreach operator it adds. However it should replace only 
> those Projects' input where the new Foreach has been added after the operator 
> which was earlier the input to Project.
> Here is a query which fails due to this:
> {code}
> a = load 'st10k' as (name:chararray,age:int, gpa:double);
> another = load 'st10k';
> c = foreach another generate $0, $1+ 10, $2 + 10;
> d = join a by $0, c by $0;
> dump d;
> {code}
> Here is the error:
> {noformat}
> 2008-09-11 23:34:28,169 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
> message from task (map) tip_200809051428_0045_m_00java.io.IOException: 
> Type mismatch in key from map: expected org.apache.pig.impl.io.NullableText, 
> recieved org.apache.pig.impl.io.NullableBytesWritable
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:419)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:83)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:172)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:158)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:75)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:219)
> at 
> org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-732) Utility UDFs

2009-03-25 Thread Ankur (JIRA)
Utility UDFs 
-

 Key: PIG-732
 URL: https://issues.apache.org/jira/browse/PIG-732
 Project: Pig
  Issue Type: New Feature
Reporter: Ankur
Priority: Minor
 Attachments: udf.v1.patch

Two utility UDFs and their respective test cases.

1. TopN - Accepts the number of tuples (N) to retain in the output, the field 
number (type long) to use for comparison, and a sorted/unsorted bag of tuples. 
It outputs a bag containing the top N tuples.

2. SearchQuery - Accepts an encoded URL from any of the 4 search engines 
(Yahoo, Google, AOL, Live) and extracts and normalizes the search query present 
in it.
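The core of the TopN logic can be sketched in plain Java, independent of the Pig UDF API (names and the long[] tuple encoding here are illustrative): a bounded min-heap retains the N largest values of the comparison field in a single pass over the bag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: retain the N tuples with the largest value in column `field`.
// Tuples are modeled as long[] for brevity; a real UDF would operate on
// Pig tuples and bags.
class TopNSketch {
    static List<long[]> topN(List<long[]> tuples, int n, int field) {
        PriorityQueue<long[]> heap =
                new PriorityQueue<>(Comparator.comparingLong(t -> t[field]));
        for (long[] t : tuples) {
            heap.offer(t);
            if (heap.size() > n) {
                heap.poll();           // evict the current minimum
            }
        }
        List<long[]> out = new ArrayList<>(heap);
        out.sort(Comparator.comparingLong((long[] t) -> t[field]).reversed());
        return out;
    }
}
```

For example, topN over the bag {(1,10), (2,30), (3,20)} with n=2 on field 1 keeps the tuples scoring 30 and 20.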


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-732) Utility UDFs

2009-03-25 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-732:
--

Attachment: udf.v1.patch

Since the UDFs are quite small, I combined them in a single patch instead of 
opening a separate JIRA for each UDF. However, if people believe a separate 
JIRA for each will help, I can split this into two.

> Utility UDFs 
> -
>
> Key: PIG-732
> URL: https://issues.apache.org/jira/browse/PIG-732
> Project: Pig
>  Issue Type: New Feature
>Reporter: Ankur
>Priority: Minor
> Attachments: udf.v1.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts number of tuples (N) to retain in output, field number 
> (type long) to use for comparison, and a sorted/unsorted bag of tuples. It 
> outputs a bag containing top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines 
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query 
> present in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-732) Utility UDFs

2009-03-25 Thread Ankur (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankur updated PIG-732:
--

Attachment: udf.v2.patch

> Utility UDFs 
> -
>
> Key: PIG-732
> URL: https://issues.apache.org/jira/browse/PIG-732
> Project: Pig
>  Issue Type: New Feature
>Reporter: Ankur
>Priority: Minor
> Attachments: udf.v1.patch, udf.v2.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts number of tuples (N) to retain in output, field number 
> (type long) to use for comparison, and a sorted/unsorted bag of tuples. It 
> outputs a bag containing top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines 
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query 
> present in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-732) Utility UDFs

2009-03-25 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12689104#action_12689104
 ] 

Ankur commented on PIG-732:
---

Olga,
Thanks for the quick review.
> (1) Pig already supports the limit operator 
I have a relation where I need to group by field-1 and retain the top-N 
occurrences of field-2. So I group by (field-1, field-2), generate counts, and 
flatten into tuples of the form (field-1, field-2, ). Now I group again on 
field-1 and retain just the top-N tuples. So I actually need to project bags of 
limited size. I don't think this can be done using LIMIT, as it is not allowed 
inside FOREACH.

> (2) Filtering UDFs are meant to be used as 
Moved TopN and SearchQuery UDFs to  piggyBank/evaluation/util. Also moved the 
test cases to the appropriate location.

> (3) Each file included needs to have Apache license header 
Done.



> Utility UDFs 
> -
>
> Key: PIG-732
> URL: https://issues.apache.org/jira/browse/PIG-732
> Project: Pig
>  Issue Type: New Feature
>Reporter: Ankur
>Priority: Minor
> Attachments: udf.v1.patch, udf.v2.patch
>
>
> Two utility UDFs and their respective test cases.
> 1. TopN - Accepts number of tuples (N) to retain in output, field number 
> (type long) to use for comparison, and a sorted/unsorted bag of tuples. It 
> outputs a bag containing top N tuples.
> 2. SearchQuery - Accepts an encoded URL from any of the 4 search engines 
> (Yahoo, Google, AOL, Live) and extracts and normalizes the search query 
> present in it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


