[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join

2009-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781415#action_12781415
 ] 

Hadoop QA commented on PIG-872:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425805/PIG_872.patch.1
  against trunk revision 882818.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/165/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/165/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/165/console

This message is automatically generated.

 use distributed cache for the replicated data set in FR join
 

 Key: PIG-872
 URL: https://issues.apache.org/jira/browse/PIG-872
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
 Attachments: PIG_872.patch.1


 Currently, the replicated file is read directly from DFS by all maps. If the 
 number of the concurrent maps is huge, we can overwhelm the NameNode with 
 open calls.
 Using the distributed cache will address the issue and might also give a 
 performance boost, since the file will be copied locally once and then reused 
 by all tasks running on the same machine.
 The basic approach would be to use cacheArchive to place the file into the 
 cache on the frontend; on the backend, the tasks would need to refer to 
 the data using the path from the cache.
 Note that cacheArchive does not work in Hadoop local mode. (Not a problem for 
 us right now as we don't use it.)
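
To make the NameNode argument above concrete, here is a rough back-of-the-envelope sketch in Python. It only counts how many times the replicated file would be opened on DFS; the function names and the job and cluster sizes are invented for illustration and do not come from the patch.

```python
# Illustrative arithmetic only: counts DFS opens of the replicated file.
# The job and cluster sizes used below are hypothetical, not measurements.

def dfs_opens_direct(num_maps: int) -> int:
    """Without the cache, every map task opens the replicated file on DFS."""
    return num_maps


def dfs_opens_with_cache(num_maps: int, num_nodes: int) -> int:
    """With a per-node cache, the file is fetched at most once per machine
    that runs a task; later tasks on that machine reuse the local copy."""
    return min(num_maps, num_nodes)


maps, nodes = 20_000, 500  # hypothetical job and cluster size
print("opens when reading DFS directly:", dfs_opens_direct(maps))
print("opens with a per-node cache    :", dfs_opens_with_cache(maps, nodes))
```

Under these assumed numbers the open calls drop from one per map to at most one per machine, which is the effect the description is after.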

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-872) use distributed cache for the replicated data set in FR join

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-872:
---

Status: Open  (was: Patch Available)




[jira] Updated: (PIG-872) use distributed cache for the replicated data set in FR join

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-872:
---

Status: Patch Available  (was: Open)

Resubmitting the patch. It looks like we had problems running the tests.




[jira] Updated: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1091:
--

Fix Version/s: 0.6.0

 [zebra] Exception when load with projection of map keys on a map column that 
 is not map split 
 --

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1091.patch


 With a schema of f1:string, f2:map and storage info of [f1]; [f2], a 
 projection of f2#{a} throws an exception.




[jira] Resolved: (PIG-524) ORDER (x,y) gives syntax error

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-524.


Resolution: Duplicate

This is a duplicate of PIG-900.

 ORDER (x,y) gives syntax error
 --

 Key: PIG-524
 URL: https://issues.apache.org/jira/browse/PIG-524
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Olga Natkovich

 In trunk, this is a valid notation
 A = load 'data' as (x, y, z);
 B = order A by (x,y);
 However, new code only allows
 B = order A by x,y;




[jira] Resolved: (PIG-807) PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop values iterator)

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-807.


Resolution: Won't Fix

The accumulator interface has been introduced for UDFs to solve this issue.

 PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the 
 Hadoop values iterator)
 

 Key: PIG-807
 URL: https://issues.apache.org/jira/browse/PIG-807
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.1
Reporter: Pradeep Kamath

 Currently all bags resulting from a group or cogroup are materialized as bags 
 containing all of the contents. The issue with this is that if a particular 
 key has many corresponding values, all these values get stuffed into a bag, 
 which may run out of memory and hence spill, causing a slowdown in performance 
 and sometimes memory exceptions. In many cases, the UDFs which use these bags 
 coming out of a group or cogroup only need to iterate over the bag in a 
 unidirectional, read-once manner. This can be implemented by having the bag 
 implement its iterator by simply iterating over the underlying Hadoop 
 iterator provided in the reduce. This kind of bag is also needed in 
 http://issues.apache.org/jira/browse/PIG-802, so the code can be reused for 
 this issue too. The other part of this issue is to have some way for the UDFs 
 to communicate to Pig that any input bags they need are read-once bags. 
 This can be achieved by having an interface, say UsesReadOnceBags, which 
 serves as a tag to indicate the intent to Pig. Pig can then rewire its 
 execution plan to use ReadOnceBags if feasible.
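
The read-once idea described above can be sketched outside of Pig. The Python below is only an analogy: `ReadOnceBag` and `count_values` are hypothetical names, and the wrapped Python iterator stands in for the Hadoop values iterator handed to the reducer.

```python
from typing import Iterable, Iterator


class ReadOnceBag:
    """Hypothetical bag that wraps an iterator instead of materializing it."""

    def __init__(self, values: Iterable):
        self._values = iter(values)
        self._consumed = False

    def __iter__(self) -> Iterator:
        # A second pass is impossible because the underlying stream is gone,
        # so fail loudly rather than silently yielding nothing.
        if self._consumed:
            raise RuntimeError("read-once bag has already been iterated")
        self._consumed = True
        return self._values


def count_values(bag) -> int:
    """A UDF-like function that needs only one forward pass over the bag."""
    return sum(1 for _ in bag)


bag = ReadOnceBag(x for x in range(1_000_000))
print(count_values(bag))  # never holds all values in memory at once
```

A UDF that needs a second pass would not be able to use such a bag, which is why the description asks for a tag that lets a UDF declare that one pass is enough.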




[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2009-11-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1088:
---

Attachment: PIG-1088.1.patch

Changes address Pradeep's comments.

All mergejoin test cases pass. Also ran test-commit test cases and ensured that 
they match results seen in PIG-1094.
test-patch results:
 [exec] -1 overall.
 [exec]
 [exec] +1 @author.  The patch does not contain any @author tags.
 [exec]
 [exec] -1 tests included.  The patch doesn't appear to include any new or modified tests.
 [exec] Please justify why no tests are needed for this patch.
 [exec]
 [exec] +1 javadoc.  The javadoc tool did not generate any warning messages.
 [exec]
 [exec] +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
 [exec]
 [exec] +1 findbugs.  The patch does not introduce any new Findbugs warnings.
 [exec]
 [exec] +1 release audit.  The applied patch does not increase the total number of release audit warnings.


 change merge join and merge join indexer to work with new LoadFunc interface
 

 Key: PIG-1088
 URL: https://issues.apache.org/jira/browse/PIG-1088
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1088.1.patch, PIG-1088.patch







[jira] Resolved: (PIG-843) PERFORMANCE: improvements in memory management

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-843.


Resolution: Fixed

I believe the memory issue has been sufficiently addressed.

 PERFORMANCE: improvements in memory management
 --

 Key: PIG-843
 URL: https://issues.apache.org/jira/browse/PIG-843
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich

 Currently, Pig uses way too much memory. We need to understand where the memory 
 goes and come up with a strategy to minimize the memory footprint.




[jira] Updated: (PIG-1078) [zebra] merge join with empty table failed

2009-11-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1078:
--

Fix Version/s: 0.7.0

 [zebra] merge join with empty table failed
 --

 Key: PIG-1078
 URL: https://issues.apache.org/jira/browse/PIG-1078
 Project: Pig
  Issue Type: Bug
Reporter: Jing Huang
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1078.patch


 Got indexOutOfBound exception. 
 Here is the pig script:
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 --a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 --a2 = load 'empty.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 --dump a1;
 --a1order = order a1 by a;
 --a2order = order a2 by a;
 --store a1order into 'a1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]');
 --store a2order into 'empty' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]');
 rec1 = load 'a1' using org.apache.hadoop.zebra.pig.TableLoader();
 rec2 = load 'empty' using org.apache.hadoop.zebra.pig.TableLoader();
 joina = join rec1 by a, rec2 by a using merge ;
 dump joina;
 ==
 Please note that tables a1 and empty are created correctly. 
 Here is the stack trace:
 Backend error message
 -
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478)
 at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Pig Stack Trace
 ---
 ERROR 6015: During execution, encountered a Hadoop error.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina
 at org.apache.pig.PigServer.openIterator(PigServer.java:481)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:386)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
 at .apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478)
 at .apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166)
 at .apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400)
 at .apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181)
 at .apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
 at .apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
 at .apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at .apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at .apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at .apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
 ... 10 more
 
  




[jira] Updated: (PIG-1074) Zebra store function should allow '::' in column names in output schema

2009-11-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1074:
--

Fix Version/s: 0.7.0
   0.6.0

 Zebra store function should allow '::' in column names in output schema
 ---

 Key: PIG-1074
 URL: https://issues.apache.org/jira/browse/PIG-1074
 Project: Pig
  Issue Type: Bug
Reporter: Pradeep Kamath
 Fix For: 0.6.0, 0.7.0


 the following script fails: 
  {noformat}
 a = load '/zebra/singlefile/studenttab10k' using org.apache.hadoop.zebra.pig.TableLoader() as (name, age, gpa);
 b = load '/zebra/singlefile/votertab10k' using org.apache.hadoop.zebra.pig.TableLoader() as (name, age, registration, contributions);
 c = filter a by age  20;
 d = filter b by age  20;
 store c into '/user/pig/out//ZebraMultiQuery_30.out.1' using org.apache.hadoop.zebra.pig.TableStorer('');
 store d into '/user/pig/out//ZebraMultiQuery_30.out.2' using org.apache.hadoop.zebra.pig.TableStorer('');
 e = cogroup c by name, d by name;
 f = foreach e generate flatten(c), flatten(d);
 store f into '/user/pig//ZebraMultiQuery_30.out.3' using org.apache.hadoop.zebra.pig.TableStorer('');
 {noformat}
 Here the schema of f has names like c::name, and it looks like the zebra 
 storefunc does not allow '::' in column names.
 The stack trace is:
  
 ERROR 2997: Unable to recreate exception from backend error: 
 java.io.IOException: ColumnGroup.Writer constructor failed : Partition 
 constructor failed :Encountered  : :  at line 1, column 3.
  




[jira] Updated: (PIG-1098) [zebra] Zebra Performance Optimizations

2009-11-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1098:
--

Fix Version/s: 0.7.0
   0.6.0

 [zebra] Zebra Performance Optimizations
 ---

 Key: PIG-1098
 URL: https://issues.apache.org/jira/browse/PIG-1098
 Project: Pig
  Issue Type: Improvement
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0


 Many in-core performance optimization opportunities exist in zebra, such as 
 removal of redundant precautionary checks, use of better collection types to 
 reduce levels of indirection to the memory objects, and ordering of input 
 splits by descending rather than ascending size. Prototyped improvements show 
 around a 10% improvement in wall-clock time.
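
The issue does not say why handing out splits in descending size order should help. One plausible reading, offered here purely as an assumption, is the usual scheduling effect: if the largest splits start first, the small ones fill in the gaps at the end instead of a large split starting last and stretching the tail. The Python sketch below simulates that with a toy greedy scheduler; `makespan`, the split sizes and the slot count are all invented for illustration and are not Zebra code.

```python
import heapq


def makespan(split_sizes, num_slots):
    """Hand splits, in the given order, to whichever slot frees up first.

    Processing time is assumed to be proportional to split size. Returns the
    time at which the last slot finishes.
    """
    slots = [0] * num_slots          # finish time of each slot
    heapq.heapify(slots)
    for size in split_sizes:
        earliest = heapq.heappop(slots)
        heapq.heappush(slots, earliest + size)
    return max(slots)


sizes = [1, 2, 3, 4, 5, 6, 7, 8]     # hypothetical split sizes
print("ascending :", makespan(sorted(sizes), 3))
print("descending:", makespan(sorted(sizes, reverse=True), 3))
```

With these made-up sizes and three slots, the ascending order finishes at 15 time units and the descending order at 13, against a lower bound of 12. Real gains depend on the actual split size distribution.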




[jira] Updated: (PIG-1095) [zebra] Schema support of anonymous fields in COLLECTION fails

2009-11-23 Thread Yan Zhou (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yan Zhou updated PIG-1095:
--

Fix Version/s: 0.7.0
   0.6.0

 [zebra] Schema support of anonymous fields in COLLECTION fails
 -

 Key: PIG-1095
 URL: https://issues.apache.org/jira/browse/PIG-1095
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0


 The schema parser fails on schemas of COLLECTION columns like 
 c:collection(int).




Re: TPC-H benchmark

2009-11-23 Thread Alan Gates
I don't know of any.  Officially Pig cannot publish a TPC-H number  
because it is not a transaction based store.  But I still think it  
would be very interesting to see the results if someone took the time  
to translate the queries.


Alan.

On Nov 22, 2009, at 6:20 PM, RichardGUO Fei wrote:



Hi,

Apart from Pig Performance and Pig Mix, do you know of any TPC-H 
benchmark rewritten for Pig?

Thanks,

Richard





[jira] Resolved: (PIG-844) PERFORMANCE: streaming data to the UDFs in foreach

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-844.



The accumulator interface took care of this.

 PERFORMANCE: streaming data to the UDFs in foreach
 --

 Key: PIG-844
 URL: https://issues.apache.org/jira/browse/PIG-844
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich

 Currently, Pig places the data passed to UDFs into a bag. This can cause the 
 process to use more memory than actually needed, as in many cases it would be 
 better to push the data one tuple at a time to the UDFs.
 For the case where the combiner is invoked, this might not be that important; 
 however, for non-algebraic UDFs as well as other cases where the combiner can't 
 be used, this can provide a significant memory improvement.
 Another possible use case is where the data is already grouped going into Pig 
 and we don't need to group it again.
 How this will affect the UDF interface needs to be discussed further.
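
The difference between handing a UDF a whole bag and pushing values to it one at a time can be illustrated with a small Python sketch. The class and function names here are hypothetical; the method names are loosely modeled on an accumulate / get-value style of interface and are not Pig's actual API.

```python
class RunningMax:
    """Hypothetical UDF that receives values one at a time."""

    def __init__(self):
        self._best = None

    def accumulate(self, value):
        # Only the current maximum is kept, never the whole group.
        if self._best is None or value > self._best:
            self._best = value

    def get_value(self):
        return self._best


def max_from_bag(bag):
    """Bag style: the caller must first collect every value into a list."""
    return max(bag) if bag else None


def max_streaming(values):
    """Streaming style: values are pushed to the UDF as they arrive."""
    udf = RunningMax()
    for value in values:
        udf.accumulate(value)
    return udf.get_value()


print(max_from_bag([3, 9, 4]))
print(max_streaming(iter([3, 9, 4])))
```

Both calls give the same answer, but the streaming version holds a single value in memory regardless of how many values belong to the key.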




[jira] Resolved: (PIG-856) PERFORMANCE: reduce number of replicas

2009-11-23 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-856.


Resolution: Won't Fix

We tried reducing the number of replicas, and the performance actually degraded, 
probably because there were fewer places to read the data from. 

 PERFORMANCE: reduce number of replicas
 --

 Key: PIG-856
 URL: https://issues.apache.org/jira/browse/PIG-856
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Olga Natkovich

 Currently, Pig uses the default number of replicas, which is 3, between MR 
 jobs. Given the temporary nature of the data, we should never need more 
 than 2 and should explicitly set it to improve performance and to be nicer 
 to the name node.




[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-598:
--

Status: Open  (was: Patch Available)

 Parameter substitution ($PARAMETER) should not be performed in comments
 ---

 Key: PIG-598
 URL: https://issues.apache.org/jira/browse/PIG-598
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.2.0
Reporter: David Ciemiewicz
Assignee: Thejas M Nair
 Attachments: PIG-598.patch


 Compiling the following code example will generate an error that 
 $NOT_A_PARAMETER is an Undefined Parameter.
 This is problematic as sometimes you want to comment out parts of your code, 
 including parameters so that you don't have to define them.
 Thus I think it would be really good if parameter substitution were not 
 performed in comments.
 {code}
 -- $NOT_A_PARAMETER
 {code}
 {code}
 -bash-3.00$ pig -exectype local -latest comment.pig
 USING: /grid/0/gs/pig/current
 java.lang.RuntimeException: Undefined parameter : NOT_A_PARAMETER
 at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:221)
 at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:106)
 at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:86)
 at org.apache.pig.Main.runParamPreprocessor(Main.java:394)
 at org.apache.pig.Main.main(Main.java:296)
 {code}
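
The requested behavior can be sketched in a few lines of Python. This is not Pig's preprocessor, only a toy model under stated assumptions: it handles `--` line comments only, ignores `/* ... */` block comments, and would misread a `--` that appears inside a quoted string. The function name `substitute_line` is made up; the error text mirrors the message in the trace above.

```python
import re

# $NAME where NAME is an identifier-like token (an assumption for this sketch).
PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute_line(line: str, params: dict) -> str:
    """Substitute $NAME in the code part of a line; leave a '--' comment as is."""
    code, sep, comment = line.partition("--")

    def lookup(match):
        name = match.group(1)
        if name not in params:
            raise RuntimeError("Undefined parameter : " + name)
        return params[name]

    return PARAM.sub(lookup, code) + sep + comment


print(substitute_line("-- $NOT_A_PARAMETER", {}))
print(substitute_line("A = load '$input'; -- uses $other", {"input": "data"}))
```

The first call no longer fails, because the only `$` reference sits in the comment part of the line.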




[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-598:
--

Patch Info:   (was: [Patch Available])




[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12781510#action_12781510
 ] 

Thejas M Nair commented on PIG-598:
---

bq. One issue I faced while working on PIG-928 was when trying to name 
variables in ruby bound to java variables.
Ashutosh,
You can use \ to escape parameter substitution. Use 'return 
\$input.split();' instead of 'return $input.split();'. After parameter 
substitution, it becomes 'return $input.split();'.
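
As a rough illustration of the escaping rule described above, the Python sketch below treats `\$` as a literal dollar sign and everything else of the form `$NAME` as a parameter. It is a toy model, not Pig's preprocessor; the function name and the identifier pattern are assumptions.

```python
import re

# Matches either an escaped dollar sign (\$) or a $NAME reference.
TOKEN = re.compile(r"\\\$|\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(text: str, params: dict) -> str:
    def replace(match):
        if match.group(0) == "\\$":
            return "$"                      # escaped: emit a literal dollar
        name = match.group(1)
        if name not in params:
            raise RuntimeError("Undefined parameter : " + name)
        return params[name]

    return TOKEN.sub(replace, text)


print(substitute(r"return \$input.split();", {}))
print(substitute("A = load '$input';", {"input": "data"}))
```

The escaped form passes through without needing a value for `input`, while the unescaped form is replaced as usual.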






[jira] Reopened: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2009-11-23 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath reopened PIG-1090:
-


Reopening since we need to implement LoadMetadata interface in BinStorage so as 
to implement the getSchema() method in that interface - this will depend on the 
decision for the comment - 
http://issues.apache.org/jira/browse/PIG-966?focusedCommentId=12780873&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12780873

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.




[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-598:
--

Attachment: PIG-598.1.patch

Additional changes in this patch:
* Fixed parsing in PigFileParser.jj
* Modified the test input file for testCommentWithParam(), inputComment.pig, to 
include comments within and at the end of statements




[jira] Updated: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-598:
--

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781546#action_12781546
 ] 

Ashutosh Chauhan commented on PIG-598:
--

I guess my question is what the behavior should be when a $ is specified in the 
script and no substitution for it is provided. There are two options: a) If Pig 
encounters a $ and doesn't find a substitution for it, it fails right there. 
b) Pig logs a warning message and continues, assuming the user wants a literal $ 
and not a substitution.

The advantage of b) is that there is no need for escaping. The disadvantage is 
that when the missing substitution was unintentional, Pig will fail later, 
possibly with a different error message.
The disadvantage of a) is that it forces the user to escape $ where it would be 
possible not to have such a requirement. The advantage is that a clear error 
message can be thrown if the missing substitution was unintentional.

Which option do you think we should choose?
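
To make the two options concrete, here is a minimal Python sketch (not Pig's preprocessor; the `substitute` function and its `strict` flag are hypothetical names). With `strict=True` it behaves like option a) and fails at the first undefined parameter; with `strict=False` it behaves like option b), warning and leaving the literal `$NAME` in the script.

```python
import re
import warnings

_PARAM = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(script, params, strict=True):
    """strict=True: option a), fail on the first undefined parameter.
    strict=False: option b), warn and keep the literal $NAME."""

    def repl(match):
        name = match.group(1)
        if name in params:
            return params[name]
        if strict:
            raise RuntimeError(f"Undefined parameter : {name}")
        warnings.warn(f"No substitution for ${name}; keeping it literally")
        return match.group(0)

    return _PARAM.sub(repl, script)


line = "C = filter B by day == '$date' and tag == '$USD';"
# Option b): $date is replaced, $USD is kept and a warning is issued.
print(substitute(line, {"date": "2009-11-23"}, strict=False))
```

In the lenient mode a typo in a parameter name would only surface later, when the unsubstituted script is parsed or run, which is the trade-off described above.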





[jira] Commented: (PIG-1091) [zebra] Exception when load with projection of map keys on a map column that is not map split

2009-11-23 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781551#action_12781551
 ] 

Alan Gates commented on PIG-1091:
-

Patch applied to the 0.6 branch.

 [zebra] Exception when load with projection of map keys on a map column that 
 is not map split 
 --

 Key: PIG-1091
 URL: https://issues.apache.org/jira/browse/PIG-1091
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1091.patch


 With a schema of f1:string, f2:map and storage info of [f1]; [f2], a 
 projection of f2#{a} throws an exception.




[jira] Commented: (PIG-872) use distributed cache for the replicated data set in FR join

2009-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781560#action_12781560
 ] 

Hadoop QA commented on PIG-872:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425805/PIG_872.patch.1
  against trunk revision 882818.

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no tests are needed for this patch.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/51/console

This message is automatically generated.

 use distributed cache for the replicated data set in FR join
 

 Key: PIG-872
 URL: https://issues.apache.org/jira/browse/PIG-872
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
 Attachments: PIG_872.patch.1


 Currently, the replicated file is read directly from DFS by all maps. If the 
 number of concurrent maps is huge, we can overwhelm the NameNode with 
 open calls.
 Using the distributed cache will address the issue and might also give a 
 performance boost, since the file will be copied locally once and then reused 
 by all tasks running on the same machine.
 The basic approach would be to use cacheArchive to place the file into the 
 cache on the frontend; on the backend, the tasks would need to refer to 
 the data using the path from the cache.
 Note that cacheArchive does not work in Hadoop local mode. (Not a problem for 
 us right now as we don't use it.)
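
The expected saving can be illustrated with a small counting model. This is only a sketch in Python and does not use any Hadoop API: the function names and the cluster layout (10,000 map tasks on 500 machines) are made-up assumptions. It compares one DFS open per task with one fetch per machine, which is the effect the distributed cache is meant to give.

```python
def dfs_opens_direct(task_hosts):
    """Every map task opens the replicated file on DFS itself."""
    return len(task_hosts)


def dfs_opens_cached(task_hosts):
    """The file is copied once per machine; tasks there reuse the local copy."""
    return len(set(task_hosts))


# Hypothetical layout: 10,000 map tasks spread evenly over 500 machines.
task_hosts = [f"node{i % 500:03d}" for i in range(10_000)]

print(dfs_opens_direct(task_hosts))  # 10000
print(dfs_opens_cached(task_hosts))  # 500
```

Under these assumptions the number of open calls scales with the number of machines instead of the number of concurrent maps.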




[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781562#action_12781562
 ] 

Thejas M Nair commented on PIG-598:
---

I prefer option (a). It is just a matter of putting a \ before the $ in the 
scripts :) 
I think that, compared to the cost of time spent debugging a weird error or 
unexpected output, the cost of (a) for the user is trivial.

Ideally, I think we should support an option where the user can change from the 
default behavior (a) to (b) using a command-line switch or a statement in the script.
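
The escaping mentioned above could look roughly like this. It is a Python sketch under the assumption that `\$` should come out as a literal `$`; the function name `substitute` is hypothetical and this is not the code in the attached patch.

```python
import re

# `\$` is an escaped dollar sign; `$NAME` is a parameter reference.
_TOKEN = re.compile(r"\\\$|\$([A-Za-z_][A-Za-z0-9_]*)")


def substitute(script, params):
    """Option (a) with escaping: undefined parameters fail, `\\$` stays literal."""

    def repl(match):
        name = match.group(1)
        if name is None:  # matched `\$`: emit a plain dollar sign
            return "$"
        if name not in params:
            raise RuntimeError(f"Undefined parameter : {name}")
        return params[name]

    return _TOKEN.sub(repl, script)


line = r"B = foreach A generate '\$USD', '$date';"
print(substitute(line, {"date": "2009-11-23"}))
```

Here `USD` is never looked up as a parameter because its `$` was escaped, while `$date` is substituted as usual.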




[jira] Assigned: (PIG-1078) [zebra] merge join with empty table failed

2009-11-23 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-1078:
---

Assignee: Yan Zhou

 [zebra] merge join with empty table failed
 --

 Key: PIG-1078
 URL: https://issues.apache.org/jira/browse/PIG-1078
 Project: Pig
  Issue Type: Bug
Reporter: Jing Huang
Assignee: Yan Zhou
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1078.patch


 Got an ArrayIndexOutOfBoundsException. 
 Here is the pig script:
 register /grid/0/dev/hadoopqa/jars/zebra.jar;
 --a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 --a2 = load 'empty.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]);
 --dump a1;
 --a1order = order a1 by a;
 --a2order = order a2 by a;
 --store a1order into 'a1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]');
 --store a2order into 'empty' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c];[d,e,f,r1,m1]');
 rec1 = load 'a1' using org.apache.hadoop.zebra.pig.TableLoader();
 rec2 = load 'empty' using org.apache.hadoop.zebra.pig.TableLoader();
 joina = join rec1 by a, rec2 by a using merge ;
 dump joina;
 ==
 Please note that the tables a1 and empty are created correctly. 
 Here is the stack trace:
 Backend error message
 -
 java.lang.ArrayIndexOutOfBoundsException: 0
 at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478)
 at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:159)
 Pig Stack Trace
 ---
 ERROR 6015: During execution, encountered a Hadoop error.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina
 at org.apache.pig.PigServer.openIterator(PigServer.java:481)
 at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:386)
 Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error.
 at org.apache.hadoop.zebra.mapred.TableInputFormat.getTableRecordReader(TableInputFormat.java:478)
 at org.apache.hadoop.zebra.pig.TableLoader.bindTo(TableLoader.java:166)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:400)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:181)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:247)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:238)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
 ... 10 more
 
  




[jira] Commented: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2009-11-23 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781608#action_12781608
 ] 

Pradeep Kamath commented on PIG-1088:
-

The patch did not include tests because TestMergeJoin already has tests, 
which I confirmed pass in the load-store branch with this patch.

 change merge join and merge join indexer to work with new LoadFunc interface
 

 Key: PIG-1088
 URL: https://issues.apache.org/jira/browse/PIG-1088
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Attachments: PIG-1088.1.patch, PIG-1088.patch







[jira] Updated: (PIG-1088) change merge join and merge join indexer to work with new LoadFunc interface

2009-11-23 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1088:


  Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]
  Status: Resolved  (was: Patch Available)

+1, Patch committed to load-store-redesign with the minor change made in 
consultation with Thejas:
DataType.isAtomic returns true for GENERIC_WRITABLECOMPARABLE and 
DataType.isComplex returns false for it.




[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-11-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781692#action_12781692
 ] 

Dmitriy V. Ryaboy commented on PIG-966:
---

LoadFunc has a method called determineSchema, not getSchema. This implies some 
sort of introspection, so I can see interpreting this as: if you are looking at 
the data, use determineSchema, and if you have a metadata store/repo, then 
implement LoadMetadata. 

But I agree this is clunky and potentially confusing. 

I am of two minds about this. On one hand, moving the method makes sense as it's 
metadata-related. On the other hand, it makes implementations that work with 
self-describing formats like Avro implement a heavy-looking interface, and 
requires further changes to existing LoadFunc implementations that will have to 
be ported. 

Another issue is that LoadMetadata.getSchema() returns a ResourceSchema, 
whereas LoadFunc.determineSchema() returns Pig's Schema. The two are compatible 
(I have a translation from one to the other in PIG-760), but not the same. 
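
A rough sketch of how a thin adapter could keep self-describing loaders light. Everything here is hypothetical and written in Python for brevity: the classes `Schema`, `ResourceSchema`, `SelfDescribingLoader` and `MetadataAdapter` are stand-ins, not Pig's Java types, and `to_resource_schema` is only a placeholder for the kind of translation mentioned in PIG-760.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Schema:
    """Stand-in for Pig's Schema: (alias, type name) pairs."""
    fields: List[Tuple[str, str]]


@dataclass
class ResourceSchema:
    """Stand-in for the proposal's ResourceSchema: one dict per field."""
    fields: List[dict]


def to_resource_schema(schema: Schema) -> ResourceSchema:
    """Hypothetical translation from one schema type to the other."""
    return ResourceSchema([{"name": n, "type": t} for n, t in schema.fields])


class SelfDescribingLoader:
    """Loader for a self-describing format: it reads the schema from the data."""

    def determine_schema(self, header: str) -> Schema:
        # Hypothetical header layout: "alias:type,alias:type"
        return Schema([tuple(col.split(":", 1)) for col in header.split(",")])


class MetadataAdapter:
    """Offers get_schema() on top of a loader that only has determine_schema()."""

    def __init__(self, loader: SelfDescribingLoader, header: str):
        self.loader = loader
        self.header = header

    def get_schema(self) -> ResourceSchema:
        return to_resource_schema(self.loader.determine_schema(self.header))


adapter = MetadataAdapter(SelfDescribingLoader(), "a:int,b:chararray")
print(adapter.get_schema())
```

The point of the sketch is only that the two return types can be bridged in one place, so existing loaders would not each have to implement the heavier interface.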

 Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
 ---

 Key: PIG-966
 URL: https://issues.apache.org/jira/browse/PIG-966
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates

 I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces 
 significantly.  See http://wiki.apache.org/pig/LoadStoreRedesignProposal for 
 full details




[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces

2009-11-23 Thread Dmitriy V. Ryaboy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781695#action_12781695
 ] 

Dmitriy V. Ryaboy commented on PIG-966:
---

Regarding Streaming:

We should support Typed Bytes as a binary protocol for streaming.  This was a 
huge performance win for Dumbo (and I think Hive, as well).

Here's a 7-slide intro: 
http://static.last.fm/johan/huguk-20090414/klaas-hadoop-1722.pdf

Patch/discussion here: https://issues.apache.org/jira/browse/HADOOP-1722
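
For readers unfamiliar with the format, here is a small Python sketch of the idea: each value is written as a one-byte type code followed by a big-endian payload. The type codes used (2 = boolean, 3 = 32-bit integer, 6 = double, 7 = length-prefixed UTF-8 string) are my recollection of the typed bytes format from HADOOP-1722 and should be treated as an assumption to check against the Hadoop documentation; the `encode` function itself is hypothetical and covers only a few types.

```python
import struct


def encode(value):
    """Encode one value as <1-byte type code><big-endian payload>."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return struct.pack(">bb", 2, int(value))
    if isinstance(value, int):
        return struct.pack(">bi", 3, value)  # 32-bit signed integer
    if isinstance(value, float):
        return struct.pack(">bd", 6, value)  # 64-bit double
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack(">bi", 7, len(data)) + data  # length, then bytes
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(encode(42))
print(encode("pig"))
```

Compared with tab-separated text, the streaming script does not have to parse or escape anything to recover the value and its type, which is where the performance win is expected to come from.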




[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781705#action_12781705
 ] 

Hadoop QA commented on PIG-598:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425862/PIG-598.1.patch
  against trunk revision 882818.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 48 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 213 javac compiler warnings (more 
than the trunk's current 211 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 361 release audit warnings 
(more than the trunk's current 356 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/52/console

This message is automatically generated.




[jira] Commented: (PIG-598) Parameter substitution ($PARAMETER) should not be performed in comments

2009-11-23 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-598?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781711#action_12781711
 ] 

Thejas M Nair commented on PIG-598:
---

bq. -1 javac. The applied patch generated 213 javac compiler warnings (more 
than the trunk's current 211 warnings).
The additional warnings are from code generated by javacc, which cannot be 
fixed in the .jj files.

bq. -1 release audit. The applied patch generated 361 release audit warnings 
(more than the trunk's current 356 warnings).
The release audit warnings are from new test input and benchmark files, because 
they don't have the Apache license header.





[jira] Commented: (PIG-1095) [zebra] Schema support of anonymous fields in COLLECTION fails

2009-11-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12781781#action_12781781
 ] 

Hadoop QA commented on PIG-1095:


+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12425897/PIG-1095.patch
  against trunk revision 883515.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 5 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/53/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/53/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/53/console

This message is automatically generated.

 [zebra] Schema support of anonymous fields in COLLECTION fails
 -

 Key: PIG-1095
 URL: https://issues.apache.org/jira/browse/PIG-1095
 Project: Pig
  Issue Type: Bug
Reporter: Yan Zhou
Assignee: Yan Zhou
Priority: Minor
 Fix For: 0.6.0, 0.7.0

 Attachments: PIG-1095.patch


 The schema parser fails on schemas of COLLECTION columns like 
 c:collection(int).




Re: TPC-H benchmark

2009-11-23 Thread Jeff Hammerbacher
Hey,

It's not Pig, but if you're looking for TPC-H on Hadoop, the Hive team has
run the TPC-H benchmarks: http://issues.apache.org/jira/browse/HIVE-600.

Regards,
Jeff

2009/11/23 Alan Gates ga...@yahoo-inc.com

 I don't know of any.  Officially Pig cannot publish a TPC-H number because
 it is not a transaction-based store.  But I still think it would be very
 interesting to see the results if someone took the time to translate the
 queries.

 Alan.


 On Nov 22, 2009, at 6:20 PM, RichardGUO Fei wrote:


 Hi,



 Apart from Pig Performance and Pig Mix, do you know any TPC-H benchmark
 rewritten for Pig?



 Thanks,

 Richard
