[jira] Assigned: (PIG-19) A=load causes parse error

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-19:
-

Assignee: Xuefu Zhang

 A=load causes parse error
 -

 Key: PIG-19
 URL: https://issues.apache.org/jira/browse/PIG-19
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Olga Natkovich
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 Parser expects spaces around =. This should be a minor change in 
 src/org/apache/pig/tools/grunt/GruntParser.jj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-313) Error handling aggregate of a computation

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-313:
--

Assignee: Alan Gates

 Error handling aggregate of a computation
 -

 Key: PIG-313
 URL: https://issues.apache.org/jira/browse/PIG-313
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.9.0


 Query which fails:
 {code}
 a = load ':INPATH:/singlefile/studenttab10k' as (name:chararray, age:int, 
 gpa:double);
 b = group a by name;
 c = foreach b generate group, SUM(a.age*a.gpa);
 store c into ':OUTPATH:';
 {code}
 Error output:
 {quote}
 2008-07-14 16:34:08,684 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: testhost.com:8020
 2008-07-14 16:34:08,741 [main] WARN  org.apache.hadoop.fs.FileSystem - 
 testhost.com:8020 is a deprecated filesystem name. Use 
 hdfs://testhost:8020/ instead.
 2008-07-14 16:34:08,995 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: testhost.com:50020
 2008-07-14 16:34:09,251 [main] WARN  org.apache.hadoop.fs.FileSystem - 
 testhost.com:8020 is a deprecated filesystem name. Use 
 hdfs://testhost:8020/ instead.
 2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Cannot 
 evaluate output type of Mul/Div Operator
 2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Problem 
 resolving LOForEach schema
 2008-07-14 16:34:09,559 [main] ERROR org.apache.pig.PigServer - Severe 
 problem found during validation 
 org.apache.pig.impl.plan.PlanValidationException: An unexpected exception 
 caused the validation to stop 
 2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 java.io.IOException: Unable to store for alias: c
 2008-07-14 16:34:09,560 [main] ERROR org.apache.pig.Main - 
 java.io.IOException: Unable to store for alias: c
 {quote}




[jira] Assigned: (PIG-333) MIN on strings (undeclared) gives strange error in store

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-333:
--

Assignee: Alan Gates  (was: Santhosh Srinivasan)

 MIN on strings (undeclared) gives strange error in store
 

 Key: PIG-333
 URL: https://issues.apache.org/jira/browse/PIG-333
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.9.0


 Script which causes error:
 {code}
 a = load '/user/pig/tests/data/singlefile/votertab10k' as (name, age, 
 registration, contribution);
 b = group a all;
 c = foreach b generate MIN(a.name), MAX(a.name);
 store c into '/tmp';
 {code}
 Error:
 {noformat}
 2008-07-23 11:31:15,415 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 0.0% 
 complete
 2008-07-23 11:31:19,167 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 50.0% 
 complete
 2008-07-23 11:31:43,431 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 100.0% complete
 2008-07-23 11:31:45,956 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - 
 Unsuccessful attempt. Completed 0.0% of the job
 2008-07-23 11:31:45,969 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
 message from task (map) tip_20080723_0002_m_00
 2008-07-23 11:31:45,974 [main] ERROR 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher - Error 
 message from task (reduce) tip_20080723_0002_r_00 
 java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:373)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:170)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:85)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
  java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:373)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:170)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:85)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
  java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:373)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:170)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Reduce.reduce(PigMapReduce.java:85)
 at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:391)
 at 
 org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)
  java.io.IOException: Cannot store a non-flat tuple using PigStorage
 at org.apache.pig.builtin.PigStorage.putNext(PigStorage.java:163)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:117)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:90)
 at 

[jira] Assigned: (PIG-144) The error message should be more meaningful when there is a typo in PIg script

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-144:
--

Assignee: Xuefu Zhang

 The error message should be more meaningful when there is a typo in PIg script
 --

 Key: PIG-144
 URL: https://issues.apache.org/jira/browse/PIG-144
 Project: Pig
  Issue Type: Bug
Reporter: Xu Zhang
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 When I ran the following Pig script on the command line {{pig -c mycluster 
 myscript.pig}}, I got the error: 
 2008-03-07 16:31:45,992 [main] ERROR org.apache.pig.tools.grunt.Grunt -
   
 {code}
 A = load '/user/pig/tests/data/singlefile/fileexists';
 B = foreach A generate $2, $1, $0;
 C = strean B through `awk '{print $3   $4 \t $2 \t $1}'`;
 store C into '/user/pig/tests/data/singlefile/results1';
 {code}
 The error message is not meaningful, and it took me a while to find out
 what was wrong: the word strean should have been stream.
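
 One way a parser could produce a more helpful message is to suggest the
 nearest known keyword. The sketch below is only an illustration in Python,
 not Pig's parser: the keyword list and both function names are hypothetical,
 and the similarity cutoff of 0.8 is an arbitrary assumption.

```python
import difflib

# Hypothetical keyword list for illustration; not the actual Pig grammar.
KEYWORDS = ["load", "foreach", "generate", "stream", "store", "filter", "group"]


def suggest_keyword(token):
    """Return the closest known keyword to `token`, or None if nothing is close."""
    matches = difflib.get_close_matches(token.lower(), KEYWORDS, n=1, cutoff=0.8)
    return matches[0] if matches else None


def describe_unknown_token(token):
    """Build an error message that includes a suggestion when one exists."""
    hint = suggest_keyword(token)
    if hint is None:
        return f"Unrecognized keyword '{token}'."
    return f"Unrecognized keyword '{token}'. Did you mean '{hint}'?"


print(describe_unknown_token("strean"))
```

 With a message of this kind, the typo in the script above would point
 directly at stream.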




[jira] Assigned: (PIG-356) map lookup on empty key should be disallowed at parse time

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-356:
--

Assignee: Xuefu Zhang

 map lookup on empty key should be disallowed at parse time
 --

 Key: PIG-356
 URL: https://issues.apache.org/jira/browse/PIG-356
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 Currently the following is allowed:
 {code}
 a = load 'testfile';
 b = foreach a generate $0#'apple', $0#'mango', $0#'', flatten($1#'orange');
 {code}
 A lookup on an empty key ($0#'') should be rejected at parse time.




[jira] Assigned: (PIG-548) ParseException involving as keyword

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-548:
--

Assignee: Xuefu Zhang

 ParseException involving as  keyword
 --

 Key: PIG-548
 URL: https://issues.apache.org/jira/browse/PIG-548
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Viraj Bhat
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0

 Attachments: assyntax.pig


 The enclosed Pig script throws the following error:
 =
 org.apache.pig.tools.pigscript.parser.ParseException: Encountered as at 
 line 13, column 11.
 Was expecting one of:
 EOF 
 cat ...
 cd ...
 cp ...
 copyFromLocal ...
 copyToLocal ...
 dump ...
 describe ...
 explain ...
 help ...
 kill ...
 ls ...
 mv ...
 mkdir ...
 pwd ...
 quit ...
 register ...
 rm ...
 rmf ...
 set ...
 illustrate ...
 scriptDone ...
  ...
 EOL ...
 ; ...
 
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.generateParseException(PigScriptParser.java:688)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.handle_invalid_command(PigScriptParser.java:515)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:356)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 =
 But the error seems to disappear if a few lines are moved around the 
 foreach and as keywords. 




[jira] Assigned: (PIG-435) wrong columns produced if incomplete definition provided during load

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-435:
--

Assignee: Alan Gates  (was: Pradeep Kamath)

 wrong columns produced if incomplete definition provided during load
 

 Key: PIG-435
 URL: https://issues.apache.org/jira/browse/PIG-435
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.9.0


 Script:
 A = load 'studenttab10k' as (name); -- note that data has more than 1 column
 B = load 'votertab10k' as (name, age, reg, contrib);
 D = COGROUP A by name, B by name;  
 E = foreach D generate flatten(A), flatten(B); 
 F = foreach E generate reg, contrib;
 dump F;
 The dump produces the wrong columns. This is because even though we declared
 only one column, we actually load all columns of A. So any place where we
 explicitly or implicitly use A.*, as is the case in flatten, we would produce
 the wrong results.
 The long-term solution is to push projections into the load. In the shorter
 term, the proposal is to notice if the script uses A.* and insert a projection
 after the load. Note that we don't need to do that if types are declared,
 because a casting foreach will already be there.
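
 The short-term proposal amounts to trimming each loaded row to the declared
 schema width before anything downstream sees it. The Python sketch below
 only illustrates that idea under stated assumptions: the function name and
 the sample rows are hypothetical, and it is not how Pig represents tuples or
 plans.

```python
def project_to_declared(rows, declared_fields):
    """Keep only as many leading columns as the schema declares."""
    width = len(declared_fields)
    return [tuple(row[:width]) for row in rows]


# Hypothetical data: the file has four columns, but the script declared only (name).
raw_a = [("alice", 20, 3.5, "x"), ("bob", 22, 3.1, "y")]
a = project_to_declared(raw_a, ["name"])
print(a)
```

 After such a projection, flatten(A) contributes exactly one column, so the
 columns of B land at the positions the script expects.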




[jira] Assigned: (PIG-673) several aggregate functions do not check the number of arguments and do not correctly check for a type bag

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-673:
--

Assignee: Daniel Dai

 several aggregate functions do not check the number of arguments and do not 
 correctly check for a type bag
 

 Key: PIG-673
 URL: https://issues.apache.org/jira/browse/PIG-673
 Project: Pig
  Issue Type: Bug
 Environment: i686 i386 GNU/Linux
Reporter: Araceli Henley
Assignee: Daniel Dai
 Fix For: 0.9.0


 DIFF expects two bags as its arguments. This negative test case breaks that
 in two ways:
 1) DIFF is given a single argument instead of two,
 2) the argument should be a bag but is an int.
 TEST: AggregateFunc_190
  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
 B =GROUP A ALL; 
 X =FOREACH B GENERATE  DIFF( A.Fint) + DIFF( A.Fint); 
 STORE X INTO 
 '/user/pig/tests/results/araceli.1234381533/AggregateFunc_190.out' USING 
 PigStorage();
 ERROR 1000: Error during parsing. Atomic field expected but found non-atomic 
 field
 TEST AggregateFunc_1901 
 A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
 B =GROUP A ALL; 
 X =FOREACH B GENERATE  DIFF( A.Fint, A.Fint) + DIFF( A.Fint, A.Fint);
  STORE X INTO 
 '/user/pig/tests/results/araceli.1234467894/AggregateFunc_1901.out' USING 
 PigStorage();
 ERROR 1000: Error during parsing. Atomic field expected but found non-atomic 
 field
 TEST AggregateFunc_1902 
  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
 B =GROUP A ALL; 
 X =FOREACH B GENERATE  DIFF( A.Fint, A.Fint + A.Fint); 
 STORE X INTO 
 '/user/pig/tests/results/araceli.1234467894/AggregateFunc_1902.out' USING 
 PigStorage();
 throws error: ERROR 1039: Incompatible types in Add Operator left hand 
 side:bag right hand side:bag




[jira] Assigned: (PIG-674) Improve errors in Pig parser

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-674:
--

Assignee: Xuefu Zhang

 Improve errors in Pig parser
 

 Key: PIG-674
 URL: https://issues.apache.org/jira/browse/PIG-674
 Project: Pig
  Issue Type: Bug
Reporter: Araceli Henley
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 These tests are for Aggregate Functions
 
Recommended msg - Should indicate that this is an invalid cast.
ERROR - MAX with int with invalid cast
TEST:  106,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE A.Fint, MAX( (invalid) A.Fint ); STORE X INTO 
 ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1000:.*Invalid alias: MAX,
 
Recommended msg -
ERROR: invalid use of foreach with multiple functions and positional 
 parameters
TEST:  107,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH A GENERATE  SUM( A.$0), AVG( A.$0), COUNT( A.$0), MAX(A.$0), 
 MIN( A.$0); STORE X INTO ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: FIX: improve msg,
 
Recommended msg - ERROR 1052: Cannot cast bag with schema.*: bag
ERROR: invalid use of MIN with int with valid cast
TEST:  108,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE A.Fint, MIN( (double) A.Fint ); STORE X INTO 
 ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1052: Cannot cast.*,
 
Recommended msg -
ERROR - AVG needs bag
TEST:  113,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) ); B = GROUP 
 A ALL; X =FOREACH B GENERATE  AVG( A.Fint); STORE X INTO ':OUTPATH:' USING 
 PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1052: Cannot cast bag with schema.*bag,
 
Recommended msg - This should indicate that there was an invalid cast.
ERROR - AVG with int with invalid cast
TEST:  115,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );B =GROUP A 
 ALL; X =FOREACH B GENERATE A.Fint, AVG( (invalid) A.Fint ); STORE X INTO 
 ':OUTPATH:' USING PigStorage();\,
   CURRENT ERROR MESSAGE: ERROR 1000:.*Invalid alias: AVG,
 
Recommended msg - This should indicate that COUNT expects a bag as its
 argument.
ERROR - COUNT needs bag
TEST:  118,
PIG SCRIPT:  A =LOAD ':INPATH:/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, 

[jira] Assigned: (PIG-671) typechecker does not throw an error when multiple arguments are passed to COUNT

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-671:
--

Assignee: Daniel Dai

 typechecker does not throw an error when multiple arguments are passed to 
 COUNT
 ---

 Key: PIG-671
 URL: https://issues.apache.org/jira/browse/PIG-671
 Project: Pig
  Issue Type: Bug
 Environment: i686 i386 GNU/Linux
Reporter: Araceli Henley
Assignee: Daniel Dai
Priority: Trivial
 Fix For: 0.9.0


 In this example, the aggregate function COUNT is passed multiple arguments
 and does not throw an error.
 TEST: Aggregate_184
  A =LOAD '/user/pig/tests/data/types/DataAll' USING PigStorage() AS ( 
 Fint:int, Flong:long, Fdouble:double, Ffloat:float, Fchar:chararray, 
 Fchararray:chararray, Fbytearray:bytearray, Fmap:map[], Fbag:BAG{ t:tuple( 
 name, age, avg ) }, Ftuple:( name:chararray, age:int, avg:float) );
 B =GROUP A ALL; 
 X =FOREACH B GENERATE COUNT ( A.$0, A.$0 ); 
 STORE X INTO 
 '/user/pig/tests/results/araceli.1234381533/AggregateFunc_184.out' USING 
 PigStorage();




[jira] Assigned: (PIG-709) Handling of NULL in Pig builtin functions needs to be reviewed

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-709:
--

Assignee: Daniel Dai  (was: Alan Gates)

 Handling of NULL in Pig builtin functions needs to be reviewed
 --

 Key: PIG-709
 URL: https://issues.apache.org/jira/browse/PIG-709
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Santhosh Srinivasan
Assignee: Daniel Dai
 Fix For: 0.9.0


 Pig builtin functions do not handle NULL consistently. One example is AVG,
 where the combiner and non-combiner paths differ. All the builtins need a
 review of the cases where NULL is handled.
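
 To see why this kind of inconsistency matters, consider two plausible ways of
 averaging a column that contains a NULL. The Python sketch below is purely
 illustrative: both function names are hypothetical, and it makes no claim
 about what either AVG code path in Pig actually does.

```python
def avg_skip_nulls(values):
    """Average over non-null values only; None if there are none."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) / len(non_null) if non_null else None


def avg_count_nulls(values):
    """Sum non-null values but divide by the total row count, nulls included."""
    if not values:
        return None
    total = sum(v for v in values if v is not None)
    return total / len(values)


data = [2, None, 4]
print(avg_skip_nulls(data))   # 3.0
print(avg_count_nulls(data))  # 2.0
```

 If two execution paths of the same function pick different conventions, the
 result depends on which path the plan happens to use.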




[jira] Assigned: (PIG-747) Logical to Physical Plan Translation fails when temporary alias are created within foreach

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-747:
--

Assignee: Alan Gates  (was: Daniel Dai)

 Logical to Physical Plan Translation fails when temporary alias are created 
 within foreach
 --

 Key: PIG-747
 URL: https://issues.apache.org/jira/browse/PIG-747
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Viraj Bhat
Assignee: Alan Gates
 Fix For: 0.9.0

 Attachments: physicalplan.txt, physicalplanprob.pig, PIG-747-1.patch


 Consider the following Pig script, which calculates a new column F inside the
 foreach:
 {code}
 A = load 'physicalplan.txt' as (col1,col2,col3);
 B = foreach A {
     D = col1/col2;
     E = col3/col2;
     F = E - (D*D);
     generate
         F as newcol;
 };
 dump B;
 {code}
 This gives the following error:
 ===
 Caused by: 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogicalToPhysicalTranslatorException:
  ERROR 2015: Invalid physical operators in the physical plan
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:377)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:63)
 at 
 org.apache.pig.impl.logicalLayer.LOMultiply.visit(LOMultiply.java:29)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalkerWOSeenChk.walk(DependencyOrderWalkerWOSeenChk.java:68)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:908)
 at 
 org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:122)
 at org.apache.pig.impl.logicalLayer.LOForEach.visit(LOForEach.java:41)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:246)
 ... 10 more
 Caused by: org.apache.pig.impl.plan.PlanException: ERROR 0: Attempt to give 
 operator of type 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.Divide
  multiple outputs.  This operator does not support multiple outputs.
 at 
 org.apache.pig.impl.plan.OperatorPlan.connect(OperatorPlan.java:158)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.plans.PhysicalPlan.connect(PhysicalPlan.java:89)
 at 
 org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:373)
 ... 19 more
 ===




[jira] Commented: (PIG-1017) Converts strings to text in Pig

2010-09-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909251#action_12909251
 ] 

Alan Gates commented on PIG-1017:
-

Are we really going to do this?  I doubt it now, as the backward 
incompatibility cost would be so high.  At the very least I don't think we'll 
do it for 0.9.

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Fix For: 0.9.0

 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per char. Text
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show
 significant reductions in memory use.
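
 The size difference is easy to check by encoding the same text both ways.
 The Python sketch below is only an illustration: the function name is
 hypothetical, and it counts encoded payload bytes, not the object overhead
 of a Java String or a Hadoop Text.

```python
def encoded_sizes(text):
    """Return (utf16_bytes, utf8_bytes) for the given text."""
    utf16 = len(text.encode("utf-16-le"))  # little-endian, no byte order mark
    utf8 = len(text.encode("utf-8"))
    return utf16, utf8


print(encoded_sizes("studenttab10k"))  # (26, 13)
```

 The saving holds for ASCII-heavy data. Text dominated by characters that
 need three bytes in UTF-8 (many CJK characters, for instance) would be
 larger in UTF-8 than in UTF-16.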




[jira] Assigned: (PIG-1092) Pig Latin Parser fails to recognize \n as a whitespace

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1092:
---

Assignee: Xuefu Zhang

 Pig Latin Parser fails to recognize \n as a whitespace
 

 Key: PIG-1092
 URL: https://issues.apache.org/jira/browse/PIG-1092
 Project: Pig
  Issue Type: Bug
  Components: grunt
 Environment: RHEL linux
Reporter: Yang Yang
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 The following Pig Latin script fails to parse:
 a = load 'input_file' as
 ( field1 : int );
 Note that there is no character after the as, so there is only one \n
 character between the as and the ( on the next line.
 Adding a whitespace character after as solves it.
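
 The expected behavior is that a newline counts as whitespace just like a
 space or a tab. The Python sketch below shows that behavior with a regular
 expression. It is a simplified, hypothetical pattern for illustration and
 has nothing to do with the real grammar in the Pig parser.

```python
import re

# Hypothetical simplified pattern: the keyword `as`, then any run of
# whitespace (space, tab, or newline), then an opening parenthesis.
AS_SCHEMA = re.compile(r"\bas\s*\(", re.IGNORECASE)


def has_as_clause(script):
    """Report whether the script contains an `as (` schema clause."""
    return AS_SCHEMA.search(script) is not None


print(has_as_clause("a = load 'input_file' as\n( field1 : int );"))
```

 Because \s matches \n, the line break between as and ( is accepted in the
 same way as a space.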




[jira] Assigned: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1341:
---

Assignee: Alan Gates  (was: Richard Ding)

 BinStorage cannot convert DataByteArray to Chararray and results in 
 FIELD_DISCARDED_TYPE_CONVERSION_FAILED
 --

 Key: PIG-1341
 URL: https://issues.apache.org/jira/browse/PIG-1341
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Alan Gates
 Fix For: 0.9.0

 Attachments: PIG-1341.patch


 Script reads in BinStorage data and tries to convert a column which is in 
 DataByteArray to Chararray. 
 {code}
 raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
 --filter out null columns
 A = filter raw by col1#'bcookie' is not null;
 B = foreach A generate col1#'bcookie'  as reqcolumn;
 describe B;
 --B: {reqcolumn: bytearray}
 X = limit B 5;
 dump X;
 B = foreach A generate (chararray)col1#'bcookie'  as convertedcol;
 describe B;
 --B: {convertedcol: chararray}
 X = limit B 5;
 dump X;
 {code}
 The first dump produces:
 (36co9b55onr8s)
 (36co9b55onr8s)
 (36hilul5oo1q1)
 (36hilul5oo1q1)
 (36l4cj15ooa8a)
 The second dump produces:
 ()
 ()
 ()
 ()
 ()
 It also throws an error message: FIELD_DISCARDED_TYPE_CONVERSION_FAILED 5 
 time(s).
 Viraj




[jira] Assigned: (PIG-1358) [piggybank] String functions should handle exceptions in a consistent manner

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1358:
---

Assignee: Daniel Dai

 [piggybank] String functions should handle exceptions in a consistent manner 
 -

 Key: PIG-1358
 URL: https://issues.apache.org/jira/browse/PIG-1358
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Daniel Dai
 Fix For: 0.9.0


 The String functions in piggybank handle exceptions differently. Some
 catch all exceptions, some catch only ClassCastException, while others
 catch only ExecException. The exception handling code in these functions
 should be consistent.




[jira] Assigned: (PIG-1538) isTwoLevelAccessRequired() returns false for nested relation

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1538:
---

Assignee: Alan Gates

 isTwoLevelAccessRequired() returns false for nested relation
 

 Key: PIG-1538
 URL: https://issues.apache.org/jira/browse/PIG-1538
 Project: Pig
  Issue Type: Wish
  Components: impl
Affects Versions: 0.7.0
Reporter: Justin Hu
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.9.0

 Attachments: testcase.tgz


 A user depends on the isTwoLevelAccessRequired() method in a UDF and wishes
 the method returned TRUE for a nested schema (for example, a relation with a
 nested tuple).




[jira] Assigned: (PIG-1499) Type error message does not include complex type

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1499:
---

Assignee: Xuefu Zhang

 Type error message does not include complex type
 

 Key: PIG-1499
 URL: https://issues.apache.org/jira/browse/PIG-1499
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
 Environment: Hadoop 0.20.104.3.1007030707
 Apache Pig version 0.7.0.20.100.1.1006041903 (r951530)
Reporter: Sherry Chen
Assignee: Xuefu Zhang
Priority: Minor
 Fix For: 0.9.0


 When loading data as a bag, if the schema specification is incorrect, the
 error message does not include useful information about the bag type.
 For example, take the input file input.txt, a working script working.pig, and
 a non-working script not_working.pig, as follows:
 input.txt
 {(2, 3)}
 {(4, 6)}
 {(5, 7)}
 not_working.pig
 A = LOAD 'input.txt' AS (f1:bag[T:tuple(t1, t2)]);
 describe A;
 dump A;
 working.pig
 A = LOAD 'input.txt' AS (f1:bag{T:tuple(t1, t2)});
 describe A;
 dump A;
 Running pig -latest -x local working.pig gives the result:
 ({(2, 3)})
 ({(4, 6)})
 ({(5, 7)})
 Running pig -latest -x local not_working.pig gives:
  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
 Encountered  bag bag  at line 1, column 29.
 Was expecting one of:
 int ...
 long ...
 float ...
 double ...
 chararray ...
 bytearray ...
 int ...
 long ...
 float ...
 double ...
 chararray ...
 bytearray ...
 Please include bag{}, map[], and tuple() in the error message so that the 
 error is easier to address.




[jira] Assigned: (PIG-1573) PIG shouldn't pass all input to a UDF if the UDF specify no argument

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1573:
---

Assignee: Daniel Dai  (was: Xuefu Zhang)

 PIG shouldn't pass all input to a UDF if the UDF specify no argument
 

 Key: PIG-1573
 URL: https://issues.apache.org/jira/browse/PIG-1573
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Xuefu Zhang
Assignee: Daniel Dai
 Fix For: 0.9.0


 Currently, if a Pig script uses a UDF with no argument, the Pig backend 
 assumes that the UDF takes all input, so at run time it passes all input as a 
 tuple to the UDF. This assumption is incorrect and causes conceptual 
 confusion. If a UDF takes all input, it can specify a star (*) as its 
 argument. If it specifies no argument at all, then we assume that it requires 
 no input data. 
 We need to differentiate between no input and all input for a UDF. Thus, when 
 a UDF specifies no argument, the backend should pass the UDF an empty tuple.
 See notes in PIG-1586 for more information.
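
The distinction between no input and all input can be sketched in plain Java. This is only an illustration under stated assumptions, not Pig's backend code: UdfInputSketch, ArgSpec and inputFor are hypothetical names, and a row is modelled as a List<Object> rather than a Pig Tuple.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model of how a backend could choose the input handed to a UDF.
class UdfInputSketch {
    // How the script invoked the UDF: f(), f(*), or f($i, $j, ...).
    enum ArgSpec { NONE, STAR, EXPLICIT }

    static List<Object> inputFor(ArgSpec spec, List<Object> row, int... columns) {
        switch (spec) {
            case NONE:
                // No argument: the UDF asked for no input, so pass an empty tuple.
                return Collections.emptyList();
            case STAR:
                // Star: the UDF asked for all input.
                return new ArrayList<>(row);
            default:
                // Explicit arguments: pass only the requested columns, in order.
                List<Object> selected = new ArrayList<>();
                for (int c : columns) {
                    selected.add(row.get(c));
                }
                return selected;
        }
    }
}
```

With this split, a UDF called with no argument can no longer observe the row at all, which is the behaviour the description asks for.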




[jira] Assigned: (PIG-1584) deal with inner cogroup

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1584:
---

Assignee: Alan Gates

 deal with inner cogroup
 ---

 Key: PIG-1584
 URL: https://issues.apache.org/jira/browse/PIG-1584
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Alan Gates
 Fix For: 0.9.0


 The current implementation of inner in the case of cogroup conflicts with 
 join. We need to decide whether to fix inner cogroup or just remove the 
 functionality if it is not widely used.




[jira] Assigned: (PIG-1577) support to variable number of arguments in UDF

2010-09-14 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1577:
---

Assignee: Daniel Dai

 support to variable number of arguments in UDF
 --

 Key: PIG-1577
 URL: https://issues.apache.org/jira/browse/PIG-1577
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.9.0


 In the current implementation, the functionality that maps arguments to 
 classes does not support functions with a variable number of arguments. It 
 also does not support functions that accept one of a fixed set of argument 
 counts. 
 This causes problems for string UDFs such as CONCAT, which can take an 
 arbitrary number of arguments, or TRIM, which can take 1, 2, or 3 arguments.
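
The two cases above (a fixed set of argument counts, and an arbitrary number of arguments) can be sketched as follows. This is a simplified illustration, not Pig's argument-to-class mapping: UdfSignature and UdfResolver are hypothetical names, the implementation class names are made up, and matching is done on argument count only, ignoring argument types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical descriptor for one implementation of a UDF.
class UdfSignature {
    final String implClass;
    final int minArgs;
    final boolean varArgs; // true if any count >= minArgs is accepted

    UdfSignature(String implClass, int minArgs, boolean varArgs) {
        this.implClass = implClass;
        this.minArgs = minArgs;
        this.varArgs = varArgs;
    }

    boolean accepts(int argCount) {
        return varArgs ? argCount >= minArgs : argCount == minArgs;
    }
}

// Hypothetical resolver holding all signatures registered for one UDF name.
class UdfResolver {
    private final List<UdfSignature> signatures = new ArrayList<>();

    void register(UdfSignature s) {
        signatures.add(s);
    }

    // Returns the class of the first signature that accepts argCount, or null.
    String resolve(int argCount) {
        for (UdfSignature s : signatures) {
            if (s.accepts(argCount)) {
                return s.implClass;
            }
        }
        return null;
    }
}
```

In this sketch TRIM would register three fixed-count signatures (1, 2 and 3 arguments), while CONCAT would register a single variable-count signature with a minimum of 2.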




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2010-09-14 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909330#action_12909330
 ] 

Yan Zhou commented on PIG-366:
--

Robert,

Could you write up step-by-step instructions on how to use this jar as an 
Eclipse plug-in?  Thanks.

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Robert Gibbon
Priority: Minor
 Attachments: org.apache.pig.pigpen-0.7.0.tar.gz, 
 org.apache.pig.pigpen-0.7.2.tar.gz, org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 org.apache.pig.pigpen_0.7.2.jar, pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI that can help users create 
 PigLatin scripts and see the example generator outputs on the fly and submit 
 the jobs to hadoop clusters.




[jira] Commented: (PIG-1479) Embed Pig in scripting languages

2010-09-14 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909345#action_12909345
 ] 

Julien Le Dem commented on PIG-1479:


Thanks Richard!

 Embed Pig in scripting languages
 

 Key: PIG-1479
 URL: https://issues.apache.org/jira/browse/PIG-1479
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem
 Attachments: PIG-1479.patch, pig-greek.tgz


 It should be possible to embed Pig calls in a scripting language and make 
 functions defined in the same script available as UDFs.
 This is a spin off of https://issues.apache.org/jira/browse/PIG-928 which 
 lets users define UDFs in scripting languages.




[jira] Updated: (PIG-1589) add test cases for mapreduce operator which use distributed cache

2010-09-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1589?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1589:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to 0.8 branch and trunk.


 add test cases for mapreduce operator which use distributed cache
 -

 Key: PIG-1589
 URL: https://issues.apache.org/jira/browse/PIG-1589
 Project: Pig
  Issue Type: Task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1589.1.patch, TestWordCount.jar


 '-files filename' can be specified in the parameters for the mapreduce 
 operator to send files to the distributed cache. We need to add test cases 
 for that.




[jira] Commented: (PIG-1609) 'union onschema' should give a more useful error message when schema of one of the relations has null column name

2010-09-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909377#action_12909377
 ] 

Thejas M Nair commented on PIG-1609:


All unit tests passed in my run. Patch is ready for review.

  

 'union onschema' should give a more useful error message when schema of one 
 of the relations has null column name
 -

 Key: PIG-1609
 URL: https://issues.apache.org/jira/browse/PIG-1609
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1609.1.patch


 A better error message needs to be given in this case -
 {code}
 grunt> l = load '/tmp/empty.bag' as (i : int);
 grunt> f = foreach l generate i+1;
 grunt> describe f;
 f: {int}
 grunt> u = union onschema l , f;
 2010-09-10 18:08:13,000 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Error merging
 schemas for union operator
 Details at logfile: /Users/tejas/pig_nmr_syn/trunk/pig_1284167020897.log
 {code}
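
One way to produce a more useful message for the case above is to check for unnamed columns before attempting the merge. The following is only a sketch of that idea and does not claim to reflect what PIG-1609.1.patch does: UnionSchemaCheck, requireColumnNames and the message wording are hypothetical, and a schema is modelled as a plain list of aliases.

```java
import java.util.List;

// Hypothetical pre-check run on each input relation of 'union onschema'.
class UnionSchemaCheck {
    // Fails early with a message that names the relation and the position
    // of the column whose alias is null.
    static void requireColumnNames(String relation, List<String> aliases) {
        for (int i = 0; i < aliases.size(); i++) {
            if (aliases.get(i) == null) {
                throw new IllegalArgumentException(
                    "Schema of relation '" + relation + "' has no name for column " + i
                    + "; 'union onschema' requires every column to have a name");
            }
        }
    }
}
```

In the session above, relation f has the schema {int} with no column name, so a check of this kind would point at column 0 of f instead of reporting a generic merge failure.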




[jira] Created: (PIG-1611) use enums for error code

2010-09-14 Thread Thejas M Nair (JIRA)
use enums for error code


 Key: PIG-1611
 URL: https://issues.apache.org/jira/browse/PIG-1611
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Fix For: 0.9.0


Pig code uses integer constants for error codes, and each error code value is 
reserved using 
http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification .
This process is cumbersome and error-prone.

It would be better to use enum values instead. The enum value can contain the 
error message and encapsulate the error code. 
For example -
{code}
Replace 
throw new SchemaMergeException("Error in merging schema", 2124, 
PigException.BUG); 
with
throw new SchemaMergeException(SCHEMA_MERGE_EX, PigException.BUG); 

{code}


Where SCHEMA_MERGE_EX belongs to an error codes enum. We can use the ordinal 
value of the enum and an offset to determine the error code. 
The error code will be passed through the constructor of the enum.
{code}
SCHEMA_MERGE_EX("Error in merging schema");
{code}
{code}

For documentation, the error code and error messages can be dumped using code 
that uses the enum error code class.
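
A minimal sketch of such an enum, assuming the message is held by the enum constant and the code is derived from the ordinal plus an offset. PigErrorCode, FIELD_NOT_FOUND_EX, the OFFSET value of 2000 and the dump() helper are hypothetical names and values chosen for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error-code enum; constants and offset are illustrative.
enum PigErrorCode {
    SCHEMA_MERGE_EX("Error in merging schema"),
    FIELD_NOT_FOUND_EX("Field not found in schema");

    // Assumed base value added to the ordinal to form the error code.
    private static final int OFFSET = 2000;

    private final String message;

    PigErrorCode(String message) {
        this.message = message;
    }

    // Error code derived from the ordinal value and the offset.
    int code() {
        return OFFSET + ordinal();
    }

    String message() {
        return message;
    }

    // Produces "code: message" lines, e.g. for generating documentation.
    static List<String> dump() {
        List<String> lines = new ArrayList<>();
        for (PigErrorCode e : values()) {
            lines.add(e.code() + ": " + e.message());
        }
        return lines;
    }
}
```

One consequence of deriving the code from ordinal() is that reordering or inserting constants changes the codes of the constants that follow, so new values would have to be appended at the end to keep existing codes stable.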






[jira] Commented: (PIG-1609) 'union onschema' should give a more useful error message when schema of one of the relations has null column name

2010-09-14 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909412#action_12909412
 ] 

Richard Ding commented on PIG-1609:
---

+1

 'union onschema' should give a more useful error message when schema of one 
 of the relations has null column name
 -

 Key: PIG-1609
 URL: https://issues.apache.org/jira/browse/PIG-1609
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1609.1.patch


 A better error message needs to be given in this case -
 {code}
 grunt> l = load '/tmp/empty.bag' as (i : int);
 grunt> f = foreach l generate i+1;
 grunt> describe f;
 f: {int}
 grunt> u = union onschema l , f;
 2010-09-10 18:08:13,000 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Error merging
 schemas for union operator
 Details at logfile: /Users/tejas/pig_nmr_syn/trunk/pig_1284167020897.log
 {code}




[jira] Created: (PIG-1612) error reporting: PigException needs to have a way to indicate that its message is appropriate for user

2010-09-14 Thread Thejas M Nair (JIRA)
error reporting: PigException needs to have a way to indicate that its message 
is appropriate for user
--

 Key: PIG-1612
 URL: https://issues.apache.org/jira/browse/PIG-1612
 Project: Pig
  Issue Type: Improvement
Reporter: Thejas M Nair
 Fix For: 0.9.0


The error message that Pig prints to the user is the message of the exception 
that is the 'root cause' in the getCause() chain of the exception that was 
thrown. But often the 'root cause' exception does not have enough context 
to make a good error message. It should be possible for a 
PigException to indicate to the code that determines the error message that its 
getMessage() string should be used instead of that of the 'cause' exception.

The following code in LogUtils.java is used to determine the exception that is 
the 'root cause' -
{code}
public static PigException getPigException(Throwable top) {
    Throwable current = top;
    Throwable pigException = top;

    while (current != null && current.getCause() != null) {
        current = current.getCause();
        if ((current instanceof PigException) &&
                (((PigException) current).getErrorCode() != 0)) {
            pigException = current;
        }
    }
    return (pigException instanceof PigException ?
            (PigException) pigException : null);
}
{code}
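
One possible shape for this, sketched with stand-in classes rather than Pig's own: an exception carries a flag saying that its message is suitable for the user, and the selection code stops at the first such exception in the chain. SketchPigException, isUserFacing() and ErrorMessagePicker are hypothetical names, and the fallback only approximates the existing behaviour of preferring the deepest exception with a non-zero error code.

```java
// Hypothetical stand-in for PigException; the userFacing flag is an assumption.
class SketchPigException extends Exception {
    private final int errorCode;
    private final boolean userFacing;

    SketchPigException(String message, int errorCode, boolean userFacing, Throwable cause) {
        super(message, cause);
        this.errorCode = errorCode;
        this.userFacing = userFacing;
    }

    int getErrorCode() {
        return errorCode;
    }

    boolean isUserFacing() {
        return userFacing;
    }
}

class ErrorMessagePicker {
    // Walks the cause chain from the top. Returns the first exception that
    // marks its message as user-appropriate; otherwise returns the deepest
    // exception with a non-zero error code, or null if there is none.
    static SketchPigException pick(Throwable top) {
        SketchPigException chosen = null;
        for (Throwable t = top; t != null; t = t.getCause()) {
            if (t instanceof SketchPigException) {
                SketchPigException pe = (SketchPigException) t;
                if (pe.isUserFacing()) {
                    return pe;
                }
                if (pe.getErrorCode() != 0) {
                    chosen = pe;
                }
            }
        }
        return chosen;
    }
}
```

With a flag like this, an exception higher in the chain that has more context can opt in to being reported, while chains without any flagged exception keep reporting the root cause.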








[jira] Commented: (PIG-1612) error reporting: PigException needs to have a way to indicate that its message is appropriate for user

2010-09-14 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909418#action_12909418
 ] 

Thejas M Nair commented on PIG-1612:


For example, in this exception stack trace -
{code}
Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Error 
merging schemas for union operator : Error merging schema: ({i: int,j: long}) 
with merged schema: ({l1::i: int,l1::j: long,l2::i: int,l2::j: long}) of 
schemas : [{l1::i: int,l1::j: long,l2::i: int,
l2::j: long}]
at 
org.apache.pig.impl.logicalLayer.parser.QueryParser.UnionClause(QueryParser.java:3409)
at 
org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1457)
at 
org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:1010)
at 
org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:797)
at 
org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1593)
... 13 more
Caused by: org.apache.pig.impl.logicalLayer.schema.SchemaMergeException: ERROR 
0: Error merging schema: ({i: int,j: long}) with merged schema: ({l1::i: 
int,l1::j: long,l2::i: int,l2::j: long}) of schemas : [{l1::i: int,l1::j: 
long,l2::i: int,l2::j: long}]
at 
org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchemasByAlias(Schema.java:1652)
at 
org.apache.pig.impl.logicalLayer.parser.QueryParser.UnionClause(QueryParser.java:3405)
... 18 more
Caused by: org.apache.pig.impl.logicalLayer.schema.SchemaMergeException: ERROR 
0: Caught exception finding FieldSchema for aliasi
at 
org.apache.pig.impl.logicalLayer.schema.Schema.getFieldSubNameMatchThrowSchemaMergeException(Schema.java:1787)
at 
org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchemaByAlias(Schema.java:1686)
at 
org.apache.pig.impl.logicalLayer.schema.Schema.mergeSchemasByAlias(Schema.java:1646)
... 19 more
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1025: 
Found more than one match: l1::i, l2::i
at 
org.apache.pig.impl.logicalLayer.schema.Schema.getField(Schema.java:819)
at 
org.apache.pig.impl.logicalLayer.schema.Schema.getFieldSubNameMatch(Schema.java:836)
at 
org.apache.pig.impl.logicalLayer.schema.Schema.getFieldSubNameMatchThrowSchemaMergeException(Schema.java:1783)
... 21 more

{code}
The Pig statement that results in this error is a union command -
u = union onschema f, l3;

The error message that is printed only says - 'Found more than one match: 
l1::i, l2::i'. It would be more useful for the user if we were able to say 
something along the lines of -
Error merging schema: ({i: int,j: long}) with merged schema: ({l1::i: 
int,l1::j: long,l2::i: int,l2::j: long}) of schemas : [{l1::i: int,l1::j: 
long,l2::i: int,l2::j: long}]. Found more than one match: l1::i, l2::i 
(assuming this was the message of the exception generated at Schema.java:1652)



 error reporting: PigException needs to have a way to indicate that its 
 message is appropriate for user
 --

 Key: PIG-1612
 URL: https://issues.apache.org/jira/browse/PIG-1612
 Project: Pig
  Issue Type: Improvement
Reporter: Thejas M Nair
 Fix For: 0.9.0


 The error message that Pig prints to the user is the message of the 
 exception that is the 'root cause' in the getCause() chain of the exception 
 that was thrown. But often the 'root cause' exception does not have 
 enough context to make a good error message. It should be 
 possible for a PigException to indicate to the code that determines the error 
 message that its getMessage() string should be used instead of that of the 
 'cause' exception.
 The following code in LogUtils.java is used to determine the exception that 
 is the 'root cause' -
 {code}
 public static PigException getPigException(Throwable top) {
     Throwable current = top;
     Throwable pigException = top;
     while (current != null && current.getCause() != null) {
         current = current.getCause();
         if ((current instanceof PigException) &&
                 (((PigException) current).getErrorCode() != 0)) {
             pigException = current;
         }
     }
     return (pigException instanceof PigException ?
             (PigException) pigException : null);
 }
 {code}




[jira] Updated: (PIG-1609) 'union onschema' should give a more useful error message when schema of one of the relations has null column name

2010-09-14 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1609:
---

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

Patch committed to 0.8 branch and trunk.


 'union onschema' should give a more useful error message when schema of one 
 of the relations has null column name
 -

 Key: PIG-1609
 URL: https://issues.apache.org/jira/browse/PIG-1609
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1609.1.patch


 A better error message needs to be given in this case -
 {code}
 grunt> l = load '/tmp/empty.bag' as (i : int);
 grunt> f = foreach l generate i+1;
 grunt> describe f;
 f: {int}
 grunt> u = union onschema l , f;
 2010-09-10 18:08:13,000 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Error merging
 schemas for union operator
 Details at logfile: /Users/tejas/pig_nmr_syn/trunk/pig_1284167020897.log
 {code}




[jira] Updated: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-14 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1608:
---

Attachment: PIG-1608_0.patch

This patch will include pig-default.properties with each pig jar file, by 
default.

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Attachments: PIG-1608_0.patch


 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1611) use enums for error code

2010-09-14 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12909493#action_12909493
 ] 

Dmitriy V. Ryaboy commented on PIG-1611:


+140

 use enums for error code
 

 Key: PIG-1611
 URL: https://issues.apache.org/jira/browse/PIG-1611
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Fix For: 0.9.0


 Pig code uses integer constants for error codes, and each error code value is 
 reserved using 
 http://wiki.apache.org/pig/PigErrorHandlingFunctionalSpecification .
 This process is cumbersome and error-prone.
 It would be better to use enum values instead. The enum value can contain the 
 error message and encapsulate the error code. 
 For example -
 {code}
 Replace 
 throw new SchemaMergeException("Error in merging schema", 2124, 
 PigException.BUG); 
 with
 throw new SchemaMergeException(SCHEMA_MERGE_EX, PigException.BUG); 
 {code}
 Where SCHEMA_MERGE_EX belongs to an error codes enum. We can use the ordinal 
 value of the enum and an offset to determine the error code. 
 The error code will be passed through the constructor of the enum.
 {code}
 SCHEMA_MERGE_EX("Error in merging schema");
 {code}
 For documentation, the error code and error messages can be dumped using code 
 that uses the enum error code class.




[jira] Updated: (PIG-1542) log level not propagated to MR task loggers

2010-09-14 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1542:
---

Status: Patch Available  (was: Open)

 log level not propagated to MR task loggers
 ---

 Key: PIG-1542
 URL: https://issues.apache.org/jira/browse/PIG-1542
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: niraj rai
 Fix For: 0.8.0

 Attachments: PIG-1542.patch, PIG-1542_1.patch


 Specifying -d DEBUG does not affect the logging of the MR tasks.
 This was fixed earlier in PIG-882.




[jira] Updated: (PIG-1608) pig should always include pig-default.properties and pig.properties in the pig.jar

2010-09-14 Thread niraj rai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

niraj rai updated PIG-1608:
---

Status: Patch Available  (was: Open)

 pig should always include pig-default.properties and pig.properties in the 
 pig.jar
 --

 Key: PIG-1608
 URL: https://issues.apache.org/jira/browse/PIG-1608
 Project: Pig
  Issue Type: Bug
Reporter: niraj rai
Assignee: niraj rai
 Attachments: PIG-1608_0.patch


 pig should always include pig-default.properties and pig.properties as a part 
 of the pig.jar file




[jira] Updated: (PIG-1479) Embed Pig in scripting languages

2010-09-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1479:
--

Attachment: PIG-1479_2.patch

In the previous patch, the executeScript method on ScriptPigServer returns a 
list of ExecJobs (one for each store statement in the script). Unfortunately, 
the order of ExecJobs in the list is indeterminate.  

This patch fixes this problem by making the executeScript method return a 
PigStats object. One can then retrieve the output result by the alias 
corresponding to the store statement.

Here is an example:

{code}
P = pig.executeScript("""
A = load '${input}';
... ...
store G into '${output}'; """)

output = P.result("G")  # an OutputStats object
iter = output.iterator()
if iter.hasNext():
    # do something
    pass
else:
    # do something else
    pass
{code} 

 Embed Pig in scripting languages
 

 Key: PIG-1479
 URL: https://issues.apache.org/jira/browse/PIG-1479
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem
 Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek.tgz


 It should be possible to embed Pig calls in a scripting language and make 
 functions defined in the same script available as UDFs.
 This is a spin off of https://issues.apache.org/jira/browse/PIG-928 which 
 lets users define UDFs in scripting languages.




[jira] Updated: (PIG-1479) Embed Pig in scripting languages

2010-09-14 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1479:
--

Attachment: pig-greek-test.tar

Attached the updated test program from Julien.

To run the example:

* tar -xvf pig-greek-test.tar
* java -cp pig.jar:<jython jar> org.apache.pig.Main -x local -g script/tc.py

 Embed Pig in scripting languages
 

 Key: PIG-1479
 URL: https://issues.apache.org/jira/browse/PIG-1479
 Project: Pig
  Issue Type: New Feature
Reporter: Julien Le Dem
 Attachments: PIG-1479.patch, PIG-1479_2.patch, pig-greek-test.tar, 
 pig-greek.tgz


 It should be possible to embed Pig calls in a scripting language and make 
 functions defined in the same script available as UDFs.
 This is a spin off of https://issues.apache.org/jira/browse/PIG-928 which 
 lets users define UDFs in scripting languages.




[jira] Updated: (PIG-1578) PigServer.executeBatch does not return status of failed job for native mapreduce statement

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1578:


Fix Version/s: (was: 0.8.0)

 PigServer.executeBatch does not return status of failed job for native 
 mapreduce statement
 --

 Key: PIG-1578
 URL: https://issues.apache.org/jira/browse/PIG-1578
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Richard Ding

 For a failed job, PigServer.executeBatch does not return an ExecJob. 
 ExecJobs are created using output statistics, and the output statistics for 
 jobs that failed do not seem to exist.
 The query I tried was a native mapreduce job, where the output file of the 
 native MR job already exists, causing that job to fail.
 {code}
 A = load ' + INPUT_FILE + ';
 B = mapreduce ' + jarFileName + '  +
 Store A into 'table_testNativeMRJobSimple_input' +
 Load 'table_testNativeMRJobSimple_output' +
 `WordCount table_testNativeMRJobSimple_input  + INPUT_FILE + 
 `;);
 Store B into 'table_testNativeMRJobSimpleDir';);
 {code}




[jira] Resolved: (PIG-815) misleading error message when streaming fails

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-815.


Resolution: Won't Fix

I don't think we have sufficient information to act on this

 misleading error message when streaming fails
 -

 Key: PIG-815
 URL: https://issues.apache.org/jira/browse/PIG-815
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Gunther Hagleitner
 Fix For: 0.9.0


 One of the users reported seeing a confusing message: Jobs not found in the 
 JobClient. Please try to use Local, Hadoop Distributed or Hadoop MiniCluster 
 modes instead of Hadoop LocalExecution ERROR 2055: Received Error while 
 processing the map plan: 'process.pl ' failed with exit status: 255 




[jira] Updated: (PIG-638) error handling - enforce error codes

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-638:
---

Fix Version/s: (was: 0.9.0)

 error handling - enforce error codes
 

 Key: PIG-638
 URL: https://issues.apache.org/jira/browse/PIG-638
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan

 We should not allow exceptions that don't set an error code, as such 
 exceptions are not helpful to users.




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2010-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1017:


Assignee: Thejas M Nair  (was: Sriranjan Manjunath)

We need to decide if this is something we should do for 0.9

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Thejas M Nair
 Fix For: 0.9.0

 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per char. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory.
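
The size difference can be checked with the JDK alone. This sketch compares only the encoded byte counts and ignores object overhead; it does not use the Hadoop Text class, and EncodingSize is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper comparing encoded sizes of the same string.
class EncodingSize {
    // Bytes needed when the string is encoded as UTF-16 (no byte-order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed when the string is encoded as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

For ASCII data such as "chararray" the UTF-8 form is half the size (9 bytes against 18). The saving depends on the data: characters outside ASCII need 2 to 4 bytes in UTF-8, so text dominated by, for example, CJK characters can be larger in UTF-8 than in UTF-16.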
