[jira] Commented: (PIG-1102) Collect number of spills per job
[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792399#action_12792399 ]

Hadoop QA commented on PIG-1102:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428356/PIG_1102.patch
against trunk revision 892125.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 400 release audit warnings (more than the trunk's current 397 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/139/console

This message is automatically generated.

Collect number of spills per job
Key: PIG-1102
URL: https://issues.apache.org/jira/browse/PIG-1102
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Sriranjan Manjunath
Fix For: 0.7.0
Attachments: PIG_1102.patch

Memory shortage is one of the main performance issues in Pig.
Knowing when we spill to the disk is useful for understanding query performance and also for seeing how certain changes in Pig affect it. Other interesting stats to collect would be average CPU usage and max memory usage, but I am not sure if this information is easily retrievable. Using Hadoop counters for this would make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
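The counter idea in the PIG-1102 description can be sketched in plain Java. This is only an illustration, not the code in PIG_1102.patch: the class `SpillCounterSketch`, the counter names `SPILL_COUNT` and `BYTES_SPILLED`, and the `recordSpill` hook are all hypothetical. In a real job the increments would presumably go to Hadoop's own counters (for example through `Reporter.incrCounter` in the old `mapred` API) rather than to an in-memory map.

```java
import java.util.EnumMap;
import java.util.Map;

public class SpillCounterSketch {
    /** Hypothetical counter names; not taken from the PIG-1102 patch. */
    public enum SpillCounter { SPILL_COUNT, BYTES_SPILLED }

    // In-memory stand-in for a per-job counter group.
    private final Map<SpillCounter, Long> counters = new EnumMap<>(SpillCounter.class);

    /** Adds amount to the named counter, creating it on first use. */
    public synchronized void increment(SpillCounter counter, long amount) {
        counters.merge(counter, amount, Long::sum);
    }

    /** Returns the current value, or 0 if the counter was never incremented. */
    public synchronized long get(SpillCounter counter) {
        return counters.getOrDefault(counter, 0L);
    }

    /** Hypothetical hook: called whenever a bag writes its contents to disk. */
    public void recordSpill(long bytesWritten) {
        increment(SpillCounter.SPILL_COUNT, 1);
        increment(SpillCounter.BYTES_SPILLED, bytesWritten);
    }

    public static void main(String[] args) {
        SpillCounterSketch job = new SpillCounterSketch();
        job.recordSpill(4096);
        job.recordSpill(1024);
        System.out.println("spills=" + job.get(SpillCounter.SPILL_COUNT)
                + " bytes=" + job.get(SpillCounter.BYTES_SPILLED));
    }
}
```

Keeping the spill count and the spilled byte volume as separate counters makes it possible to tell many small spills apart from a few large ones when comparing two versions of a query.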
[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792468#action_12792468 ]

Hadoop QA commented on PIG-1157:

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12428359/PIG-1157.patch
against trunk revision 892125.

+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/140/console

This message is automatically generated.

Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
---
Key: PIG-1157
URL: https://issues.apache.org/jira/browse/PIG-1157
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
Fix For: 0.6.0
Attachments: oomreplicatedjoin.pig, PIG-1157.patch, replicatedjoinexplain.log

Hi all,
I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS.
{code}
A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c);
A1 = FOREACH A GENERATE a;
B = GROUP A1 BY a;
C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y);
D = JOIN C BY x, B BY group USING replicated;
E = JOIN A BY a, D by x USING replicated;
dump E;
{code}
2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4
2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees.
2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees.
2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees.
2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2
2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread
Details at logfile: pig_1260990666148.log

Looking at the log file:

Pig Stack Trace
---
ERROR 2998: Unhandled internal error. unable to create new native thread
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:597)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773)
    at org.apache.pig.PigServer.store(PigServer.java:522)
    at org.apache.pig.PigServer.openIterator(PigServer.java:458)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)

If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output.
Viraj
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1157: -- Status: Open (was: Patch Available) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1157: -- Attachment: PIG-1157.patch Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1157: -- Status: Patch Available (was: Open) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log
[jira] Updated: (PIG-1110) Handle compressed file formats -- Gz, BZip with the new proposal
[ https://issues.apache.org/jira/browse/PIG-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1110: -- Attachment: PIG-1110.patch

The output from running ant test-patch target locally:
{code}
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 3 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
{code}

Handle compressed file formats -- Gz, BZip with the new proposal
Key: PIG-1110
URL: https://issues.apache.org/jira/browse/PIG-1110
Project: Pig
Issue Type: Sub-task
Reporter: Richard Ding
Assignee: Richard Ding
Attachments: PIG-1110.patch, PIG-1110.patch, PIG_1110_Jeff.patch
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792564#action_12792564 ]

Alan Gates commented on PIG-1117:

There seems to be a lot of code duplication between HiveColumnarLoader.setup(String, boolean, String) and HiveColumnarLoader.setup(String, boolean). Could these two functions be combined or the common code factored out?

Pig doesn't support BOOLEAN and BYTE as external types; we only use them internally. So these should be converted to something else in HiveColumnarLoader.findPigDataType.

You may want to implement fieldsToRead, as that allows Pig to tell your loader exactly what fields it requires for this query, without requiring the user to specify it.

In HiveColumnarLoader.readRowColumns it is good to use TupleFactory.newTuple(int) rather than TupleFactory.newTuple() when you know the size of the tuple you'll be creating. newTuple(int) plus Tuple.set() is more efficient than newTuple() + Tuple.append().

svn diff doesn't add jars to patch files, so you'll need to attach the hive-exec.jar separately to the jira so that we can run tests.

Also, please be aware that we are rewriting the entire load/store interface, and hope to release this soon, probably in 0.7. See PIG-966 for details. This obviously will affect your code. Hopefully it will make it much easier, as the need to write a separate slicer will go away.

Pig reading hive columnar rc tables
---
Key: PIG-1117
URL: https://issues.apache.org/jira/browse/PIG-1117
Project: Pig
Issue Type: New Feature
Reporter: Gerrit Jansen van Vuuren
Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch

I've coded a LoadFunc implementation that can read from Hive Columnar RC tables. This is needed for a project that I'm working on because all our data is stored using the Hive thrift serialized Columnar RC format. I have looked at the piggy bank but did not find any implementation that could do this.
We've been running it on our cluster for the last week and have worked out most bugs. There are still some improvements that I would like to make, such as setting the number of mappers based on date partitioning. It's been optimized to read only specific columns and can churn through a data set almost 8 times faster with this improvement because not all column data is read. I would like to contribute the class to the piggybank; can you guide me in what I need to do? I've used Hive-specific classes to implement this; is it possible to add this to the piggy bank build ivy for automatic download of the dependencies?
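One common way to remove the kind of duplication the review points out between the two `setup` overloads is to have the shorter overload delegate to the longer one. The sketch below is an assumption-laden illustration, not the actual HiveColumnarLoader code: the class name `ColumnarLoaderSketch`, the parameter names, and the meaning given to each argument are all hypothetical, since the thread only states the parameter types.

```java
public class ColumnarLoaderSketch {
    // Hypothetical fields; the real loader's state is not shown in the thread.
    private String table;
    private boolean applyPartitions;
    private String dateRange; // null is taken to mean "no date restriction"

    /** Two-argument form: delegates so the shared logic lives in one place. */
    public void setup(String table, boolean applyPartitions) {
        setup(table, applyPartitions, null);
    }

    /** Three-argument form: the only place where validation and assignment happen. */
    public void setup(String table, boolean applyPartitions, String dateRange) {
        if (table == null || table.isEmpty()) {
            throw new IllegalArgumentException("table must be non-empty");
        }
        this.table = table;
        this.applyPartitions = applyPartitions;
        this.dateRange = dateRange;
    }

    /** Small helper so the resulting configuration can be inspected. */
    public String describe() {
        return table + "|" + applyPartitions + "|" + (dateRange == null ? "all" : dateRange);
    }

    public static void main(String[] args) {
        ColumnarLoaderSketch loader = new ColumnarLoaderSketch();
        loader.setup("events", true);
        System.out.println(loader.describe());
    }
}
```

With this shape, a later change to validation or defaults only has to be made in the three-argument method.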
[jira] Commented: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792569#action_12792569 ]

Alan Gates commented on PIG-480:

What kind of performance gain do we get from this? The only PigMix query that looks like it would be directly affected is PigMix_3. It would be interesting to run that and a few other queries that we expect would benefit from this to measure the performance improvements.

PERFORMANCE: Use identity mapper in a chain of M-R jobs
---
Key: PIG-480
URL: https://issues.apache.org/jira/browse/PIG-480
Project: Pig
Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Ying He
Attachments: PIG_480.patch, PIG_480.patch

For jobs with two or more MR jobs, use identity mapper wherever possible in second and subsequent MR jobs. Identity mapper is about 50% faster than the Pig empty map job because it doesn't parse the data.
[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792573#action_12792573 ] Alan Gates commented on PIG-1149: - Changes look fine. Pradeep, will this apply as is to the load-store redesign branch or will we need a separate patch for that? Allow instantiation of SampleLoaders with parametrized LoadFuncs Key: PIG-1149 URL: https://issues.apache.org/jira/browse/PIG-1149 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: pig_1149.patch Currently, it is not possible to instantiate a SampleLoader with something like PigStorage(':'). We should allow passing parameters to the loaders being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
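To make the PIG-1149 request concrete: passing parameters to a loader means splitting a spec such as `PigStorage(':')` into a function name and its constructor arguments. The sketch below shows one way that split could look in plain Java. It is not Pig's own parsing code; the class `FuncSpecSketch` and its fields are hypothetical, and it assumes arguments are single-quoted strings with no escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;

public class FuncSpecSketch {
    public final String className;
    public final List<String> args;

    private FuncSpecSketch(String className, List<String> args) {
        this.className = className;
        this.args = args;
    }

    /** Splits "Name('a', 'b')" into the name and its single-quoted arguments. */
    public static FuncSpecSketch parse(String spec) {
        int open = spec.indexOf('(');
        if (open < 0) {
            // No parentheses: a bare function name with no arguments.
            return new FuncSpecSketch(spec.trim(), new ArrayList<>());
        }
        String name = spec.substring(0, open).trim();
        List<String> args = new ArrayList<>();
        StringBuilder current = null; // non-null only while inside a quoted argument
        for (int i = open + 1; i < spec.length(); i++) {
            char c = spec.charAt(i);
            if (c == '\'') {
                if (current == null) {
                    current = new StringBuilder();   // opening quote
                } else {
                    args.add(current.toString());    // closing quote
                    current = null;
                }
            } else if (current != null) {
                current.append(c); // commas inside quotes stay part of the argument
            }
        }
        return new FuncSpecSketch(name, args);
    }

    public static void main(String[] argv) {
        FuncSpecSketch spec = parse("PigStorage(':')");
        System.out.println(spec.className + " " + spec.args);
    }
}
```

Scanning for quotes rather than splitting on commas matters here because a delimiter argument such as `','` would otherwise be cut in two.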
[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792578#action_12792578 ]

Pradeep Kamath commented on PIG-1149:

I tried applying it on the branch and it failed:
{noformat}
:/tmp/load-store-redesign]patch -p0 /homes/pradeepk/dev/pig-apache/pig/trunk/pig_1149.patch
patching file src/org/apache/pig/impl/builtin/SampleLoader.java
Hunk #1 succeeded at 31 with fuzz 2 (offset -4 lines).
Hunk #2 FAILED at 46.
1 out of 2 hunks FAILED -- saving rejects to file src/org/apache/pig/impl/builtin/SampleLoader.java.rej
patching file test/org/apache/pig/test/TestPoissonSampleLoader.java
[prade...@chargesize:/tmp/load-store-redesign]
{noformat}
Since Thejas worked on PIG-1062, he might be in a better position to check whether this patch needs changes.

Allow instantiation of SampleLoaders with parametrized LoadFuncs
Key: PIG-1149
URL: https://issues.apache.org/jira/browse/PIG-1149
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
Fix For: 0.7.0
Attachments: pig_1149.patch
[jira] Commented: (PIG-1158) pig command line -M option doesn't support table union correctly (comma seperated paths)
[ https://issues.apache.org/jira/browse/PIG-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792589#action_12792589 ]

Richard Ding commented on PIG-1158:

Without the -M option, Pig converts paths to their absolute locations before passing them to the loaders/storers. With the -M option, Pig passes the paths as is to the loaders/storers. This distinction seems to be obsolete. The fix will be to convert paths to their absolute locations in both cases.

pig command line -M option doesn't support table union correctly (comma seperated paths)
Key: PIG-1158
URL: https://issues.apache.org/jira/browse/PIG-1158
Project: Pig
Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang
Assignee: Richard Ding
Fix For: 0.7.0

For example, with
load (1.txt,2.txt) USING org.apache.hadoop.zebra.pig.TableLoader()
I see this error on standard out:
[main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not exist.
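The error above shows the whole string `1.txt,2.txt` being resolved as a single path. A minimal sketch of the alternative, resolving each comma-separated entry on its own, is shown below. It is not the PIG-1158 fix itself: the class `PathListSketch`, the method `toAbsolute`, and the `hdfs://namenode/...` working directory are hypothetical, and the code assumes simple file names that are valid URI references (no spaces or colons) and a working directory that ends in `/`.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

public class PathListSketch {
    /**
     * Resolves every entry of a comma-separated path list against workingDir.
     * Entries that are already absolute keep their own path.
     */
    public static String toAbsolute(String commaSeparated, URI workingDir) {
        List<String> resolved = new ArrayList<>();
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // skip empty entries such as a trailing comma
            }
            resolved.add(workingDir.resolve(trimmed).toString());
        }
        return String.join(",", resolved);
    }

    public static void main(String[] args) {
        // Hypothetical working directory; note the trailing slash.
        URI home = URI.create("hdfs://namenode/user/hadoopqa/");
        System.out.println(toAbsolute("1.txt,2.txt", home));
        // prints hdfs://namenode/user/hadoopqa/1.txt,hdfs://namenode/user/hadoopqa/2.txt
    }
}
```

Because each entry is made absolute separately, a loader that accepts a union of tables receives two fully qualified locations instead of one path that does not exist.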
[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792593#action_12792593 ] Olga Natkovich commented on PIG-1157: - +1. Patch looks good. Will commit once the tests pass. Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log
[jira] Created: (PIG-1162) Pig 0.6.0 - UDF doc
Pig 0.6.0 - UDF doc --- Key: PIG-1162 URL: https://issues.apache.org/jira/browse/PIG-1162 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Fix For: 0.6.0 Pig 0.6.0 - UDF doc Small corrections. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1162) Pig 0.6.0 - UDF doc
[ https://issues.apache.org/jira/browse/PIG-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel updated PIG-1162: - Attachment: pig-6-udf.patch Patch file for UDF doc. Pig 0.6.0 - UDF doc --- Key: PIG-1162 URL: https://issues.apache.org/jira/browse/PIG-1162 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Fix For: 0.6.0 Attachments: pig-6-udf.patch Pig 0.6.0 - UDF doc Small corrections. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1162) Pig 0.6.0 - UDF doc
[ https://issues.apache.org/jira/browse/PIG-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel updated PIG-1162: - Status: Patch Available (was: Open) (1) apply this patch to Pig TRUNK (2) apply this patch to Pig branch-0.6 (3) Note: No new test code required; changes to documentation only. Pig 0.6.0 - UDF doc --- Key: PIG-1162 URL: https://issues.apache.org/jira/browse/PIG-1162 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Fix For: 0.6.0 Attachments: pig-6-udf.patch Pig 0.6.0 - UDF doc Small corrections. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1102) Collect number of spills per job
[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792612#action_12792612 ] Olga Natkovich commented on PIG-1102: - I will be reviewing this patch. Collect number of spills per job Key: PIG-1102 URL: https://issues.apache.org/jira/browse/PIG-1102 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Sriranjan Manjunath Fix For: 0.7.0 Attachments: PIG_1102.patch
[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792627#action_12792627 ] Thejas M Nair commented on PIG-1149: I notice that this patch is using org.mortbay.log instead of org.apache.commons.logging. That is not used anywhere else in Pig code. Should we replace that with org.apache.commons.logging? A small change is required to get the patch working with the load-store branch. It no longer requires the load func to implement the SampleLoader interface, and that interface has been removed. I can submit the modified patch. {code} +loader = (SamplableLoader)PigContext.instantiateFuncFromSpec(funcSpec); {code} changes to {code} +loader = (LoadFunc)PigContext.instantiateFuncFromSpec(funcSpec); {code} Allow instantiation of SampleLoaders with parametrized LoadFuncs Key: PIG-1149 URL: https://issues.apache.org/jira/browse/PIG-1149 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: pig_1149.patch Currently, it is not possible to instantiate a SampleLoader with something like PigStorage(':'). We should allow passing parameters to the loaders being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792633#action_12792633 ] Hadoop QA commented on PIG-1157: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428448/PIG-1157.patch against trunk revision 892125. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/141/console This message is automatically generated. Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.6.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. 
{code} A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); A1 = FOREACH A GENERATE a; B = GROUP A1 BY a; C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); D = JOIN C BY x, B BY group USING replicated; E = JOIN A BY a, D by x USING replicated; dump E; {code} 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. unable to create new native thread Details at logfile: pig_1260990666148.log Looking at the log file: Pig Stack Trace --- ERROR 2998: Unhandled internal error. 
unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output. Viraj -- This message is automatically generated by JIRA. -
[jira] Updated: (PIG-1158) pig command line -M option doesn't support table union correctly (comma seperated paths)
[ https://issues.apache.org/jira/browse/PIG-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1158: -- Attachment: PIG-1158.patch pig command line -M option doesn't support table union correctly (comma seperated paths) Key: PIG-1158 URL: https://issues.apache.org/jira/browse/PIG-1158 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1158.patch For example, with load (1.txt,2.txt) USING org.apache.hadoop.zebra.pig.TableLoader() I see this error on stdout: [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not exist. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
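The ERROR 2100 message in PIG-1158 shows the whole string 1.txt,2.txt being resolved as a single file name. As a rough illustration only, and not a description of what PIG-1158.patch does, the sketch below splits a comma-separated location and qualifies each path separately. The class name, the qualifyEach helper, and the hdfs://namenode base directory are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaPathSketch {
    // Hypothetical helper (not a Pig API): resolve every comma-separated
    // path against baseDir on its own, instead of treating the whole
    // string as one file name.
    static List<String> qualifyEach(String baseDir, String locations) {
        List<String> qualified = new ArrayList<String>();
        for (String part : locations.split(",")) {
            String path = part.trim();
            if (path.isEmpty()) {
                continue; // ignore empty segments such as a trailing comma
            }
            boolean alreadyAbsolute = path.contains("://") || path.startsWith("/");
            qualified.add(alreadyAbsolute ? path : baseDir + "/" + path);
        }
        return qualified;
    }

    public static void main(String[] args) {
        // The base directory here is made up for the example.
        for (String p : qualifyEach("hdfs://namenode/user/hadoopqa", "1.txt,2.txt")) {
            System.out.println(p);
        }
    }
}
```

With this shape each path can be checked for existence individually, so a missing file is reported by its own name.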
[jira] Updated: (PIG-1163) Pig/Zebra 0.6.0 release - Doc Updates
[ https://issues.apache.org/jira/browse/PIG-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel updated PIG-1163: - Attachment: zebra-6-update-1.patch First update patch for Zebra 0.6.0 release Pig/Zebra 0.6.0 release - Doc Updates - Key: PIG-1163 URL: https://issues.apache.org/jira/browse/PIG-1163 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Blocker Fix For: 0.6.0 Attachments: zebra-6-update-1.patch Pig/Zebra 0.6.0 release - Doc Updates Updates for the Zebra 0.6.0 docs. (1) First patch - please apply the first patch now (zebra-6-update-1.patch) (2) Second patch - depending on feedback, we may have a second patch to apply Jan 4 or Jan 5 Thanks/C -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1163) Pig/Zebra 0.6.0 release - Doc Updates
[ https://issues.apache.org/jira/browse/PIG-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Corinne Chandel updated PIG-1163: - Status: Patch Available (was: Open) (1) Apply this patch to Pig TRUNK (2) Apply this patch to Pig branch-0.6 (3) Note: No new test code required; changes to documentation only. Pig/Zebra 0.6.0 release - Doc Updates - Key: PIG-1163 URL: https://issues.apache.org/jira/browse/PIG-1163 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Blocker Fix For: 0.6.0 Attachments: zebra-6-update-1.patch Pig/Zebra 0.6.0 release - Doc Updates Updates for the Zebra 0.6.0 docs. (1) First patch - please apply the first patch now (zebra-6-update-1.patch) (2) Second patch - depending on feedback, we may have a second patch to apply Jan 4 or Jan 5 Thanks/C -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1157) Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM
[ https://issues.apache.org/jira/browse/PIG-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1157: Resolution: Fixed Fix Version/s: (was: 0.6.0) 0.7.0 Status: Resolved (was: Patch Available) patch committed, thanks Richard Sucessive replicated joins do not generate Map Reduce plan and fails due to OOM --- Key: PIG-1157 URL: https://issues.apache.org/jira/browse/PIG-1157 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Viraj Bhat Assignee: Richard Ding Fix For: 0.7.0 Attachments: oomreplicatedjoin.pig, PIG-1157.patch, PIG-1157.patch, replicatedjoinexplain.log Hi all, I have a script which does 2 replicated joins in succession. Please note that the inputs do not exist on the HDFS. {code} A = LOAD '/tmp/abc' USING PigStorage('\u0001') AS (a:long, b, c); A1 = FOREACH A GENERATE a; B = GROUP A1 BY a; C = LOAD '/tmp/xyz' USING PigStorage('\u0001') AS (x:long, y); D = JOIN C BY x, B BY group USING replicated; E = JOIN A BY a, D by x USING replicated; dump E; {code} 2009-12-16 19:12:00,253 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 4 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-only splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 1 map-reduce splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - Merged 2 out of total 2 splittees. 2009-12-16 19:12:00,254 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 2 2009-12-16 19:12:00,713 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. 
unable to create new native thread Details at logfile: pig_1260990666148.log Looking at the log file: Pig Stack Trace --- ERROR 2998: Unhandled internal error. unable to create new native thread java.lang.OutOfMemoryError: unable to create new native thread at java.lang.Thread.start0(Native Method) at java.lang.Thread.start(Thread.java:597) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:131) at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265) at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:773) at org.apache.pig.PigServer.store(PigServer.java:522) at org.apache.pig.PigServer.openIterator(PigServer.java:458) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:397) If we want to look at the explain output, we find that there is no Map Reduce plan that is generated. Why is the M/R plan not generated? Attaching the script and explain output. Viraj -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1102) Collect number of spills per job
[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792680#action_12792680 ] Sriranjan Manjunath commented on PIG-1102: -- I ran the test again on my local machine, and it passes. The test failed because of too many open file descriptors. Is this a Hudson-related issue? Collect number of spills per job Key: PIG-1102 URL: https://issues.apache.org/jira/browse/PIG-1102 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Sriranjan Manjunath Fix For: 0.7.0 Attachments: PIG_1102.patch Memory shortage is one of the main performance issues in Pig. Knowing when we spill to disk is useful for understanding query performance and for seeing how certain changes in Pig affect it. Other interesting stats to collect would be average CPU usage and max mem usage, but I am not sure if this information is easily retrievable. Using Hadoop counters for this would make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1141) Make streaming work with the new load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792688#action_12792688 ] Alan Gates commented on PIG-1141: - In DefaultInputHandler.close, why was the code that flushes and closes stdin removed? Same question for DefaultOutputHandler and stdout. It seems like we still need to flush and close these streams properly. Similar to the above, close was removed from FileOutputHandler (but not FileInputHandler). Both PigToStream and StreamToPig interfaces should have some javadoc comments for the interface explaining what they do and why. In StorageUtil.parseFieldDel, you call Integer.valueOf(String) for both \u and \x. For \x you should instead use Integer.valueOf(String, 16). Make streaming work with the new load-store interfaces --- Key: PIG-1141 URL: https://issues.apache.org/jira/browse/PIG-1141 Project: Pig Issue Type: Sub-task Reporter: Richard Ding Assignee: Richard Ding Attachments: PIG-1141.patch, PIG-1141.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
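The review point about \x delimiters comes down to the radix passed to Integer.valueOf. The sketch below is a hypothetical stand-alone helper, not the StorageUtil.parseFieldDel code from the patch; it only shows how the one-argument and two-argument forms differ on the same digits.

```java
public class HexDelimiterSketch {
    // Hypothetical helper (not Pig's StorageUtil.parseFieldDel): turn a
    // "\xNN" delimiter spec into the byte it names, reading NN as hex.
    static byte parseHexDelimiter(String spec) {
        if (!spec.startsWith("\\x")) {
            throw new IllegalArgumentException("not a \\x escape: " + spec);
        }
        // Radix 16 is what makes "2C" mean 44, the code for ','.
        return Integer.valueOf(spec.substring(2), 16).byteValue();
    }

    public static void main(String[] args) {
        System.out.println((char) parseHexDelimiter("\\x2C"));

        // Same digits, different radix: 10 in decimal versus 16 in hex.
        System.out.println(Integer.valueOf("10") + " vs " + Integer.valueOf("10", 16));

        // Digits above 9 cannot be parsed in radix 10 at all.
        try {
            Integer.valueOf("2C");
        } catch (NumberFormatException e) {
            System.out.println("radix-10 parse rejected 2C");
        }
    }
}
```

The silent case is the more dangerous one: a spec such as \x10 parses without error in radix 10 but yields a different delimiter byte.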
[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792689#action_12792689 ] Thejas M Nair commented on PIG-1149: The first test case failure is known; I will be fixing that with a patch in PIG-1094. The special string gets added to the last row only. But that looks unnecessary. I will be removing that with a new patch in PIG-1062. You can submit your patch for the LSR branch by checking for 5 columns in your test case. I will change your new test case as well when I submit the new PIG-1062 patch (to check for 4 columns). Allow instantiation of SampleLoaders with parametrized LoadFuncs Key: PIG-1149 URL: https://issues.apache.org/jira/browse/PIG-1149 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: pig_1149.patch Currently, it is not possible to instantiate a SampleLoader with something like PigStorage(':'). We should allow passing parameters to the loaders being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1156) Add aliases to ExecJobs and PhysicalOperators
[ https://issues.apache.org/jira/browse/PIG-1156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-1156: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed. Thanks Dmitriy. Add aliases to ExecJobs and PhysicalOperators - Key: PIG-1156 URL: https://issues.apache.org/jira/browse/PIG-1156 Project: Pig Issue Type: Improvement Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Fix For: 0.7.0 Attachments: pig_batchAliases.patch Currently, the way to use multi-query from Java is as follows: 1. pigServer.setBatchOn(); 2. register your queries with pigServer 3. List<ExecJob> jobs = pigServer.executeBatch(); 4. for (ExecJob job : jobs) { Iterator<Tuple> results = job.getResults(); } This will cause all stores to get evaluated in a single batch. However, there is no way to identify which of the ExecJobs corresponds to which store. We should add aliases by which the stored relations are known to ExecJob in order to allow the user to identify what the jobs correspond to. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
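To make the PIG-1156 request concrete, here is a small self-contained sketch of what a caller could do once each job carries the alias of the relation it stored. It deliberately avoids the real Pig classes: AliasedJob and indexByAlias are hypothetical stand-ins, and the accessor that the committed patch actually exposes on ExecJob is not shown in this thread.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchAliasSketch {
    // Hypothetical stand-in for an ExecJob that knows which stored
    // relation it belongs to; results are plain strings for brevity.
    static class AliasedJob {
        final String alias;
        final List<String> results;

        AliasedJob(String alias, List<String> results) {
            this.alias = alias;
            this.results = results;
        }
    }

    // Index the jobs returned by one batch run by their alias, so a caller
    // can fetch the results of a particular store directly.
    static Map<String, AliasedJob> indexByAlias(List<AliasedJob> jobs) {
        Map<String, AliasedJob> byAlias = new HashMap<String, AliasedJob>();
        for (AliasedJob job : jobs) {
            byAlias.put(job.alias, job);
        }
        return byAlias;
    }

    public static void main(String[] args) {
        // Made-up batch output: two stores, for relations B and C.
        List<AliasedJob> jobs = Arrays.asList(
            new AliasedJob("B", Arrays.asList("(1,2)")),
            new AliasedJob("C", Arrays.asList("(3,4)")));
        System.out.println(indexByAlias(jobs).get("C").results);
    }
}
```

Without an alias on the job, the only option is to rely on the order of the returned list, which is the ambiguity the issue describes.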
[jira] Commented: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792698#action_12792698 ] Thejas M Nair commented on PIG-1149: I spoke too soon about the special string being unnecessary. GetMemNumRows uses it. I will add some comments to document that in PoissonSampleLoader. In my previous comment, "special string gets added to the last row only" should read "special string gets added to the last *sample* row only". Allow instantiation of SampleLoaders with parametrized LoadFuncs Key: PIG-1149 URL: https://issues.apache.org/jira/browse/PIG-1149 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: pig_1149.patch Currently, it is not possible to instantiate a SampleLoader with something like PigStorage(':'). We should allow passing parameters to the loaders being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1159) merge join right side table does not support comma seperated paths
[ https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1159: -- Attachment: PIG-1159.patch With this patch, Pig runtime no longer passes an InputStream to IndexableLoader through the bindTo method. An IndexableLoader is responsible for creating its own InputStream for reading data. This actually isn't a new requirement: currently all existing IndexableLoaders create their own InputStreams. And, in the future, with the load-store redesign, Pig runtime will no longer create InputStreams for the loaders. merge join right side table does not support comma seperated paths -- Key: PIG-1159 URL: https://issues.apache.org/jira/browse/PIG-1159 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1159.patch For example this is my script:(join_jira1.pig) register /grid/0/dev/hadoopqa/jars/zebra.jar; --a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); --a2 = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); --sort1 = order a1 by a parallel 6; --sort2 = order a2 by a parallel 5; --store sort1 into 'asort1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); --store sort2 into 'asort2' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); --store sort1 into 'asort3' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); --store sort2 into 'asort4' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); joinl = LOAD 'asort1,asort2' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); joinr = LOAD 'asort3,asort4' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); joina = join joinl by a, joinr by a using merge ; dump joina; == here is the log: Backend error message - java.lang.IllegalArgumentException: Pathname 
/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 from hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 is not a valid DFS filename. at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147) at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534) at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) Pig Stack Trace --- ERROR 6015: During execution, encountered a Hadoop error. 
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina at org.apache.pig.PigServer.openIterator(PigServer.java:482) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:386) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During
[jira] Updated: (PIG-1159) merge join right side table does not support comma seperated paths
[ https://issues.apache.org/jira/browse/PIG-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1159: -- Status: Patch Available (was: Open) merge join right side table does not support comma seperated paths -- Key: PIG-1159 URL: https://issues.apache.org/jira/browse/PIG-1159 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1159.patch For example this is my script:(join_jira1.pig) register /grid/0/dev/hadoopqa/jars/zebra.jar; --a1 = load '1.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); --a2 = load '2.txt' as (a:int, b:float,c:long,d:double,e:chararray,f:bytearray,r1(f1:chararray,f2:chararray),m1:map[]); --sort1 = order a1 by a parallel 6; --sort2 = order a2 by a parallel 5; --store sort1 into 'asort1' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); --store sort2 into 'asort2' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); --store sort1 into 'asort3' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); --store sort2 into 'asort4' using org.apache.hadoop.zebra.pig.TableStorer('[a,b,c,d]'); joinl = LOAD 'asort1,asort2' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); joinr = LOAD 'asort3,asort4' USING org.apache.hadoop.zebra.pig.TableLoader('a,b,c,d', 'sorted'); joina = join joinl by a, joinr by a using merge ; dump joina; == here is the log: Backend error message - java.lang.IllegalArgumentException: Pathname /user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 from hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort3,hdfs:/gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/asort4 is not a valid DFS filename. 
at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:648) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:131) at org.apache.pig.backend.hadoop.datastorage.HDataStorage.asElement(HDataStorage.java:147) at org.apache.pig.impl.io.FileLocalizer.fullPath(FileLocalizer.java:534) at org.apache.pig.impl.io.FileLocalizer.open(FileLocalizer.java:338) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.seekInRightStream(POMergeJoin.java:398) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNext(POMergeJoin.java:184) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.map(PigMapOnly.java:65) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307) at org.apache.hadoop.mapred.Child.main(Child.java:159) Pig Stack Trace --- ERROR 6015: During execution, encountered a Hadoop error. 
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias joina at org.apache.pig.PigServer.openIterator(PigServer.java:482) at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) at org.apache.pig.Main.main(Main.java:386) Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 6015: During execution, encountered a Hadoop error. at .apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:158) at .apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:453) at .apache.hadoop.fs.FileSystem.exists(FileSystem.java:648)at .apache.pig.backend.hadoop.datastorage.HDataStorage.isContainer(HDataStorage.java:203)
[jira] Commented: (PIG-1141) Make streaming work with the new load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792720#action_12792720 ] Richard Ding commented on PIG-1141: --- bq. In DefaultInputHandler.close, why was the code that flushes and closes stdin removed? Same question for DefaultOutputHandler and stdout. It seems like we still need to flush and close these streams properly. Because there is no 'stdin' or 'stdout' to flush and close :) bq. Similar to the above, close was removed from FileOutputHandler (but not FileInputHandler). I want to do the same for FileInputHandler, but findbugs doesn't allow it :( bq. Both PigToStream and StreamToPig interfaces should have some javadoc comments for the interface explaining what they do and why. I'll add javadoc for the interfaces. bq. In StorageUtil.parseFieldDel, you call Integer.valueOf(String) for both \u and \x. For \x you should instead use Integer.valueOf(String, 16). This is copied (refactored) from the current PigStorage code; do we want to change it? Make streaming work with the new load-store interfaces --- Key: PIG-1141 URL: https://issues.apache.org/jira/browse/PIG-1141 Project: Pig Issue Type: Sub-task Reporter: Richard Ding Assignee: Richard Ding Attachments: PIG-1141.patch, PIG-1141.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1163) Pig/Zebra 0.6.0 release - Doc Updates
[ https://issues.apache.org/jira/browse/PIG-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1163: Resolution: Fixed Status: Resolved (was: Patch Available) Patch committed to both trunk and the 0.6.0 branch. Thanks, Corinne. Pig/Zebra 0.6.0 release - Doc Updates - Key: PIG-1163 URL: https://issues.apache.org/jira/browse/PIG-1163 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Blocker Fix For: 0.6.0 Attachments: zebra-6-update-1.patch Pig/Zebra 0.6.0 release - Doc Updates Updates for the Zebra 0.6.0 docs. (1) First patch - please apply the first patch now (zebra-6-update-1.patch) (2) Second patch - depending on feedback, we may have a second patch to apply Jan 4 or Jan 5 Thanks/C -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1164) [zebra]smoke test
[zebra]smoke test - Key: PIG-1164 URL: https://issues.apache.org/jira/browse/PIG-1164 Project: Pig Issue Type: Test Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.7.0 Change the zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke dir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1162) Pig 0.6.0 - UDF doc
[ https://issues.apache.org/jira/browse/PIG-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792738#action_12792738 ] Hadoop QA commented on PIG-1162: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428464/pig-6-udf.patch against trunk revision 892125. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/142/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/142/console This message is automatically generated. Pig 0.6.0 - UDF doc --- Key: PIG-1162 URL: https://issues.apache.org/jira/browse/PIG-1162 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Fix For: 0.6.0 Attachments: pig-6-udf.patch Pig 0.6.0 - UDF doc Small corrections. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-1146: --- Assignee: Daniel Dai Inconsistent column pruning in LOUnion -- Key: PIG-1146 URL: https://issues.apache.org/jira/browse/PIG-1146 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1146-1.patch
This happens when we do a union of two relations in which one column comes from a loader, the matching column on the other side comes from a constant, and this column gets pruned. We prune the column from the loader but do not prune the constant, which leaves the union in an inconsistent state. Here is a script:
{code}
a = load '1.txt' as (a0, a1:chararray, a2);
b = load '2.txt' as (b0, b2);
c = foreach b generate b0, 'hello', b2;
d = union a, c;
e = foreach d generate $0, $2;
dump e;
{code}
1.txt:
{code}
ulysses thompson	64	1.90
katie carson	25	3.65
{code}
2.txt:
{code}
luke king	0.73
holly davidson	2.43
{code}
expected output:
(ulysses thompson,1.90)
(katie carson,3.65)
(luke king,0.73)
(holly davidson,2.43)
real output:
(ulysses thompson,)
(katie carson,)
(luke king,0.73)
(holly davidson,2.43)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
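The symptom reported in PIG-1146 can be reproduced with a toy model. The sketch below is not Pig's optimizer code; `project` and `prune` are hypothetical helpers, and the only assumption is that a positional reference past the end of a row yields a null. It drops column $1 from the loader branch only, then from both branches.

```python
# Toy model of the PIG-1146 symptom. This is NOT Pig's optimizer code;
# project() and prune() are hypothetical helpers used for illustration.

def project(row, indexes):
    """Return the requested positional columns, using None for missing ones."""
    return tuple(row[i] if i < len(row) else None for i in indexes)

def prune(rows, keep):
    """Keep only the listed column positions in every row."""
    return [tuple(row[i] for i in keep) for row in rows]

a = [("ulysses thompson", "64", "1.90"), ("katie carson", "25", "3.65")]
b = [("luke king", "0.73"), ("holly davidson", "2.43")]
c = [(b0, "hello", b2) for (b0, b2) in b]  # foreach b generate b0, 'hello', b2

# Inconsistent pruning: $1 is dropped from the loader branch only, while the
# projection above the union still asks for $0 and $2.
buggy = [project(r, [0, 2]) for r in prune(a, [0, 2]) + c]

# Consistent pruning: drop $1 from both branches and remap $2 to $1.
fixed = [project(r, [0, 1]) for r in prune(a, [0, 2]) + prune(c, [0, 2])]

print(buggy)
print(fixed)
```

In this model the loader rows lose their last value in `buggy`, matching the "real output" in the report, while `fixed` matches the expected output.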
[jira] Updated: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1146: Attachment: PIG-1146-1.patch Inconsistent column pruning in LOUnion -- Key: PIG-1146 URL: https://issues.apache.org/jira/browse/PIG-1146 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1146-1.patch
This happens when we do a union of two relations in which one column comes from a loader, the matching column on the other side comes from a constant, and this column gets pruned. We prune the column from the loader but do not prune the constant, which leaves the union in an inconsistent state. Here is a script:
{code}
a = load '1.txt' as (a0, a1:chararray, a2);
b = load '2.txt' as (b0, b2);
c = foreach b generate b0, 'hello', b2;
d = union a, c;
e = foreach d generate $0, $2;
dump e;
{code}
1.txt:
{code}
ulysses thompson	64	1.90
katie carson	25	3.65
{code}
2.txt:
{code}
luke king	0.73
holly davidson	2.43
{code}
expected output:
(ulysses thompson,1.90)
(katie carson,3.65)
(luke king,0.73)
(holly davidson,2.43)
real output:
(ulysses thompson,)
(katie carson,)
(luke king,0.73)
(holly davidson,2.43)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Huang updated PIG-1164: Attachment: smoke.patch Patch for the zebra smoke test. No unit test is needed for this patch. It only changes build.xml to add a smoke target and adds an environment setup file. [zebra]smoke test - Key: PIG-1164 URL: https://issues.apache.org/jira/browse/PIG-1164 Project: Pig Issue Type: Test Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.7.0 Attachments: smoke.patch Change the zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke dir. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1146: Status: Patch Available (was: Open) Inconsistent column pruning in LOUnion -- Key: PIG-1146 URL: https://issues.apache.org/jira/browse/PIG-1146 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1146-1.patch
This happens when we do a union of two relations in which one column comes from a loader, the matching column on the other side comes from a constant, and this column gets pruned. We prune the column from the loader but do not prune the constant, which leaves the union in an inconsistent state. Here is a script:
{code}
a = load '1.txt' as (a0, a1:chararray, a2);
b = load '2.txt' as (b0, b2);
c = foreach b generate b0, 'hello', b2;
d = union a, c;
e = foreach d generate $0, $2;
dump e;
{code}
1.txt:
{code}
ulysses thompson	64	1.90
katie carson	25	3.65
{code}
2.txt:
{code}
luke king	0.73
holly davidson	2.43
{code}
expected output:
(ulysses thompson,1.90)
(katie carson,3.65)
(luke king,0.73)
(holly davidson,2.43)
real output:
(ulysses thompson,)
(katie carson,)
(luke king,0.73)
(holly davidson,2.43)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1153) [zebra] spliting columns at different levels in a complex record column into different column groups throws exception
[ https://issues.apache.org/jira/browse/PIG-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou reassigned PIG-1153: - Assignee: Yan Zhou [zebra] spliting columns at different levels in a complex record column into different column groups throws exception - Key: PIG-1153 URL: https://issues.apache.org/jira/browse/PIG-1153 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Xuefu Zhang Assignee: Yan Zhou Attachments: PIG-1153.patch
The following code sample:
String strSch = "r1:record(f1:int, f2:int), r2:record(f5:int, r3:record(f3:float, f4))";
String strStorage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]";
Partition p = new Partition(schema.toString(), strStorage, null);
gives the following exception:
org.apache.hadoop.zebra.parser.ParseException: Different Split Types Set on the same field: r2.f5
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
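To see why the exception in PIG-1153 looks unexpected, it helps to check that no field is assigned to more than one column group. The sketch below is not Zebra's parser; `parse_column_groups` and `duplicated_columns` are hypothetical helpers that assume only the `[a, b]; [c]` bracket syntax shown in the report.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration; Zebra's own storage-spec parser
# and its validity rules may differ.

def parse_column_groups(storage):
    """Split a spec such as "[a, b]; [c]" into a list of column-name lists."""
    groups = re.findall(r"\[([^\]]*)\]", storage)
    return [[name.strip() for name in g.split(",") if name.strip()] for g in groups]

def duplicated_columns(groups):
    """Return the column names that appear in more than one place."""
    counts = Counter(name for group in groups for name in group)
    return sorted(name for name, n in counts.items() if n > 1)

storage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]"
groups = parse_column_groups(storage)
print(groups)
print(duplicated_columns(groups))
```

Under this reading each leaf field, including r2.f5, lands in exactly one column group, so the spec itself does not assign conflicting splits to the same field.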
[jira] Updated: (PIG-1153) [zebra] spliting columns at different levels in a complex record column into different column groups throws exception
[ https://issues.apache.org/jira/browse/PIG-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1153: -- Attachment: PIG-1153.patch [zebra] spliting columns at different levels in a complex record column into different column groups throws exception - Key: PIG-1153 URL: https://issues.apache.org/jira/browse/PIG-1153 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Xuefu Zhang Assignee: Yan Zhou Attachments: PIG-1153.patch
The following code sample:
String strSch = "r1:record(f1:int, f2:int), r2:record(f5:int, r3:record(f3:float, f4))";
String strStorage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]";
Partition p = new Partition(schema.toString(), strStorage, null);
gives the following exception:
org.apache.hadoop.zebra.parser.ParseException: Different Split Types Set on the same field: r2.f5
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1153) [zebra] spliting columns at different levels in a complex record column into different column groups throws exception
[ https://issues.apache.org/jira/browse/PIG-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1153: -- Status: Patch Available (was: Open) [zebra] spliting columns at different levels in a complex record column into different column groups throws exception - Key: PIG-1153 URL: https://issues.apache.org/jira/browse/PIG-1153 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Xuefu Zhang Assignee: Yan Zhou Attachments: PIG-1153.patch
The following code sample:
String strSch = "r1:record(f1:int, f2:int), r2:record(f5:int, r3:record(f3:float, f4))";
String strStorage = "[r1.f1, r2.r3.f3, r2.f5]; [r1.f2, r2.r3.f4]";
Partition p = new Partition(schema.toString(), strStorage, null);
gives the following exception:
org.apache.hadoop.zebra.parser.ParseException: Different Split Types Set on the same field: r2.f5
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1165) Signature of loader does not set correctly for order by
Signature of loader does not set correctly for order by --- Key: PIG-1165 URL: https://issues.apache.org/jira/browse/PIG-1165 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0
In Pig, we need to set a signature for each LoadFunc. Currently, we use the alias of the LOAD statement in the Pig script as the signature of the LoadFunc. One use case is that, in LoadFunc, we use the signature to retrieve the pruned columns of each specific loader. However, for an order by statement, we do not set the signature for the loader correctly, so we do not prune the loader correctly. For example, the following script produces the wrong result:
{code}
a = load '1.txt' as (a0, a1);
b = order a by a1;
c = order b by a1;
d = foreach c generate a1;
dump d;
{code}
1.txt:
{code}
1	a
2	b
3	c
6	d
5	e
{code}
expected result:
a
b
c
d
e
current result:
1
2
3
5
6
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
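The role of the signature described in PIG-1165 can be illustrated with a toy lookup. The sketch below is not Pig's LoadFunc API; `ToyLoader` and the `pruned_columns` registry are hypothetical, and the only assumption is that a loader finds its pruned column list by signature and loads every column when the lookup misses. It does not model sorting, so it shows only which column ends up in the output, not the row order.

```python
# Toy model of a signature-keyed lookup. This is NOT Pig's LoadFunc API;
# ToyLoader and pruned_columns are hypothetical names for illustration.

# Hypothetical registry: loader signature -> column positions to keep.
pruned_columns = {"a": [1]}  # the script only needs a1 from alias "a"

class ToyLoader:
    def __init__(self, rows, signature=None):
        self.rows = rows
        self.signature = signature

    def load(self):
        keep = pruned_columns.get(self.signature)
        if keep is None:
            # No entry for this signature: nothing is pruned.
            return list(self.rows)
        return [tuple(row[i] for i in keep) for row in self.rows]

rows = [("1", "a"), ("2", "b"), ("3", "c"), ("6", "d"), ("5", "e")]

def first_column(loader):
    """What a downstream operator sees if it expects the pruned layout."""
    return [row[0] for row in loader.load()]

with_signature = first_column(ToyLoader(rows, signature="a"))
without_signature = first_column(ToyLoader(rows))

print(with_signature)
print(without_signature)
```

With the signature set, the downstream operator reads a1; without it, the same position holds a0, which is the kind of column mix-up the report describes.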
[jira] Updated: (PIG-1162) Pig 0.6.0 - UDF doc
[ https://issues.apache.org/jira/browse/PIG-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1162: Resolution: Fixed Status: Resolved (was: Patch Available) patch committed to both trunk and 0.6 branch. Thanks, Corinne! Pig 0.6.0 - UDF doc --- Key: PIG-1162 URL: https://issues.apache.org/jira/browse/PIG-1162 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Fix For: 0.6.0 Attachments: pig-6-udf.patch Pig 0.6.0 - UDF doc Small corrections. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1130) In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split.
[ https://issues.apache.org/jira/browse/PIG-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792770#action_12792770 ] Jeff Zhang commented on PIG-1130: - Alan, I think one method is to check the type of the FileSystem: if it is LocalFileSystem in MapReduce mode, then we should throw an Exception. In pig local ( hadoop local mode ) mode the counting of number of tuples and bytes is incorrect if data is more than one local split. - Key: PIG-1130 URL: https://issues.apache.org/jira/browse/PIG-1130 Project: Pig Issue Type: Bug Reporter: Ankit Modi Priority: Minor If the output generates more than one part file, the current code only gives stats for the first part file, i.e. part-0. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
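The counting problem in PIG-1130 comes down to aggregating over every part file rather than only the first. The sketch below is not Pig's stats code; `output_stats` is a hypothetical helper that assumes output files are named `part-*` and that one line is one record.

```python
import os
import tempfile

# Hypothetical helper for illustration; Pig's own stats collection differs.

def output_stats(output_dir):
    """Sum records (lines) and bytes over every part-* file in output_dir."""
    records = 0
    size = 0
    for name in sorted(os.listdir(output_dir)):
        if not name.startswith("part-"):
            continue
        path = os.path.join(output_dir, name)
        size += os.path.getsize(path)
        with open(path, "rb") as f:
            records += sum(1 for _ in f)
    return records, size

def demo():
    """Write two small part files and return their combined stats."""
    with tempfile.TemporaryDirectory() as d:
        for i, data in enumerate([b"a\nb\n", b"c\n"]):
            with open(os.path.join(d, "part-%05d" % i), "wb") as f:
                f.write(data)
        return output_stats(d)

print(demo())
```

Reading only the first part file in this demo would report 2 records and 4 bytes instead of the totals across both files.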
[jira] Updated: (PIG-1149) Allow instantiation of SampleLoaders with parametrized LoadFuncs
[ https://issues.apache.org/jira/browse/PIG-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-1149: --- Attachment: pig_1149_lsr-branch.patch Attaching patch for lsr branch. I also retabbed the involved files to replace tabs with spaces, and got rid of some unused imports. Note the FIXME in the test case, as discussed. Allow instantiation of SampleLoaders with parametrized LoadFuncs Key: PIG-1149 URL: https://issues.apache.org/jira/browse/PIG-1149 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: pig_1149.patch, pig_1149_lsr-branch.patch Currently, it is not possible to instantiate a SampleLoader with something like PigStorage(':'). We should allow passing parameters to the loaders being sampled. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1158) pig command line -M option doesn't support table union correctly (comma seperated paths)
[ https://issues.apache.org/jira/browse/PIG-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792786#action_12792786 ] Hadoop QA commented on PIG-1158: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428482/PIG-1158.patch against trunk revision 892408. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/143/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/143/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/143/console This message is automatically generated. pig command line -M option doesn't support table union correctly (comma seperated paths) Key: PIG-1158 URL: https://issues.apache.org/jira/browse/PIG-1158 Project: Pig Issue Type: Bug Affects Versions: 0.6.0 Reporter: Jing Huang Assignee: Richard Ding Fix For: 0.7.0 Attachments: PIG-1158.patch For example, with load (1.txt,2.txt) USING org.apache.hadoop.zebra.pig.TableLoader() I see this error on standard output: [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2100: hdfs://gsbl90380.blue.ygrid.yahoo.com/user/hadoopqa/1.txt,2.txt does not exist. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
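The error in PIG-1158 is what one would see if the whole comma-separated location were checked as a single path. The sketch below is not the code behind Pig's -M option; `split_locations` and `missing_paths` are hypothetical helpers, and the `exists` callback stands in for a real file system check.

```python
# Hypothetical helpers for illustration; they do not reflect how Pig's
# -M option or the Zebra TableLoader resolve locations.

def split_locations(location):
    """Split a comma-separated location string into individual paths."""
    return [p.strip() for p in location.split(",") if p.strip()]

def missing_paths(location, exists):
    """Return the individual paths for which exists(path) is false."""
    return [p for p in split_locations(location) if not exists(p)]

existing = {"1.txt", "2.txt"}  # stand-in for files present on the file system
location = "1.txt,2.txt"

# Checking the whole string as one path fails, as in the reported error.
whole_string_exists = location in existing

# Checking each path after splitting finds nothing missing.
missing = missing_paths(location, lambda p: p in existing)

print(whole_string_exists, missing)
```

A plain split on commas is only a starting point: Hadoop glob patterns such as `{a,b}` also contain commas, so a real fix would need to treat those separately.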
[jira] Commented: (PIG-1163) Pig/Zebra 0.6.0 release - Doc Updates
[ https://issues.apache.org/jira/browse/PIG-1163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12792787#action_12792787 ] Hadoop QA commented on PIG-1163: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428494/zebra-6-update-1.patch against trunk revision 892416. +1 @author. The patch does not contain any @author tags. +0 tests included. The patch appears to be a documentation patch that doesn't require tests. -1 patch. The patch command could not apply the patch. Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/144/console This message is automatically generated. Pig/Zebra 0.6.0 release - Doc Updates - Key: PIG-1163 URL: https://issues.apache.org/jira/browse/PIG-1163 Project: Pig Issue Type: Task Components: documentation Affects Versions: 0.6.0 Reporter: Corinne Chandel Assignee: Corinne Chandel Priority: Blocker Fix For: 0.6.0 Attachments: zebra-6-update-1.patch Pig/Zebra 0.6.0 release - Doc Updates Updates for the Zebra 0.6.0 docs.
(1) First patch - please apply the first patch now (zebra-6-update-1.patch)
(2) Second patch - depending on feedback, we may have a second patch to apply Jan 4 or Jan 5
Thanks/C
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.