[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12793971#action_12793971 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- OK, I will upload the 0.7.0 implementation today. It will still not have an implementation for fieldsToRead, just an empty method. I'll have a look at it after Xmas.
Pig reading hive columnar rc tables --- Key: PIG-1117 URL: https://issues.apache.org/jira/browse/PIG-1117 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Gerrit Jansen van Vuuren Fix For: 0.7.0 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch
I've coded a LoadFunc implementation that can read from Hive Columnar RC tables. This is needed for a project I'm working on, because all our data is stored using the Hive thrift-serialized Columnar RC format. I looked in the piggybank but did not find any implementation that could do this. We've been running it on our cluster for the last week and have worked out most of the bugs. There are still some improvements to be done, such as setting the number of mappers based on date partitioning. It has been optimized to read only the columns that are needed, and with this optimization it can churn through a data set almost 8 times faster, because not all column data is read. I would like to contribute the class to the piggybank; can you guide me on what I need to do? I've used Hive-specific classes to implement this; is it possible to add these dependencies to the piggybank's Ivy build so they are downloaded automatically?
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
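The speed-up described above comes from column projection: in a columnar layout, a reader can skip the bytes of any column the script does not use. Below is a minimal in-memory sketch of that idea, not the loader's actual code. The class `ColumnProjectionSketch`, its column-major list layout, and the `wanted` index array are hypothetical stand-ins for an RCFile row group and for the projection the loader receives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory stand-in for one columnar row group: values are
// stored column by column, so whole columns can be skipped.
final class ColumnProjectionSketch {

    // columns.get(c).get(r) holds the value of column c in row r.
    // wanted lists the column indexes the script actually uses.
    static List<List<String>> readRows(List<List<String>> columns, int[] wanted) {
        int numRows = columns.isEmpty() ? 0 : columns.get(0).size();
        List<List<String>> rows = new ArrayList<>();
        for (int r = 0; r < numRows; r++) {
            List<String> row = new ArrayList<>();
            for (int c : wanted) {
                // Only requested columns are touched; the others are never read.
                row.add(columns.get(c).get(r));
            }
            rows.add(row);
        }
        return rows;
    }
}
```

For example, requesting columns {0, 2} from a three-column group returns only the first and third value of each row, and the middle column is never read. In a real columnar file the saving is I/O and deserialization work, not just list copying.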
[jira] Updated: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerrit Jansen van Vuuren updated PIG-1117: -- Affects Version/s: (was: 0.6.0) Status: Open (was: Patch Available) Pig reading hive columnar rc tables --- Key: PIG-1117 URL: https://issues.apache.org/jira/browse/PIG-1117 Project: Pig Issue Type: New Feature Reporter: Gerrit Jansen van Vuuren Fix For: 0.7.0 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch
[jira] Updated: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerrit Jansen van Vuuren updated PIG-1117: -- Attachment: PIG-117-v.0.7.0.patch
Changes:
- Slicing is done per block rather than per file.
- Hive dependencies are downloaded automatically from the Apache website. This is done only once.
- Added an empty implementation of fieldsToRead (will implement this soon).
- Refactored out duplicated code.
- Byte values are now cast to Integer.
- Boolean values are now converted to 1 if true, else 0.
Test: ant hive-test
Jar: ant hive-jar
Dependencies: hive_exec.jar must either be on the classpath of all task nodes or be registered in the Pig script, e.g.:
REGISTER hive_exec.jar
REGISTER piggybank.jar
Pig reading hive columnar rc tables --- Key: PIG-1117 URL: https://issues.apache.org/jira/browse/PIG-1117 Project: Pig Issue Type: New Feature Reporter: Gerrit Jansen van Vuuren Fix For: 0.7.0 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
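Two items in the change list above are small enough to sketch. The following is a minimal illustration under stated assumptions, not the code in PIG-117-v.0.7.0.patch: `LoaderChangesSketch`, `toPigValue` and `blockSlices` are hypothetical names. `toPigValue` applies the Byte-to-Integer and Boolean-to-1/0 conversions, and `blockSlices` shows what slicing per block instead of per file means for one file. As we understand RCFile, a real reader would also need to skip forward to the next sync marker at the start of each slice; this sketch ignores that.

```java
import java.util.ArrayList;
import java.util.List;

final class LoaderChangesSketch {

    // Convert a value read from Hive into what Pig receives:
    // Byte -> Integer, Boolean -> 1 or 0, anything else unchanged.
    static Object toPigValue(Object hiveValue) {
        if (hiveValue instanceof Byte) {
            return ((Byte) hiveValue).intValue();    // autoboxed to Integer
        }
        if (hiveValue instanceof Boolean) {
            return ((Boolean) hiveValue) ? 1 : 0;    // autoboxed to Integer
        }
        return hiveValue;
    }

    // Per-block slicing: cut one file into {offset, length} ranges of at most
    // blockSize bytes, instead of producing a single slice for the whole file.
    static List<long[]> blockSlices(long fileLength, long blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        List<long[]> slices = new ArrayList<>();
        for (long start = 0; start < fileLength; start += blockSize) {
            slices.add(new long[] {start, Math.min(blockSize, fileLength - start)});
        }
        return slices;
    }
}
```

For a 250-byte file and a 100-byte block size, `blockSlices` yields the ranges {0, 100}, {100, 100} and {200, 50}, so three map tasks can work on one file in parallel.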
[jira] Updated: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gerrit Jansen van Vuuren updated PIG-1117: -- Tags: PIG-117-v.0.7.0.patch (was: PIG-117-v.0.6.0.patch) Affects Version/s: 0.7.0 Status: Patch Available (was: Open) Pig reading hive columnar rc tables --- Key: PIG-1117 URL: https://issues.apache.org/jira/browse/PIG-1117 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Gerrit Jansen van Vuuren Fix For: 0.7.0 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
[jira] Commented: (PIG-761) ERROR 2086 on simple JOIN
[ https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794005#action_12794005 ] Ankur commented on PIG-761: --- Here is a very simple script to reproduce the issue:
{code}
data1 = LOAD 'data1' as (a:int, b:int, c:chararray);
proj1 = LIMIT data1 5;
data2 = LOAD 'data2' as (x:int, y:chararray, z:chararray);
proj2 = FOREACH data2 GENERATE x, y;
cogrouped = COGROUP proj1 BY a, proj2 BY x INNER PARALLEL 2;
joined = FOREACH cogrouped GENERATE FLATTEN(proj1), FLATTEN(proj2);
store joined into 'results';
{code}
The problem seems to be the LIMIT operator on one of the relations participating in the join. It appears to cause a mismatch between the expected and the found local rearrange operators.
ERROR 2086 on simple JOIN - Key: PIG-761 URL: https://issues.apache.org/jira/browse/PIG-761 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Environment: mapreduce mode Reporter: Vadim Zaliva
ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
I get this while doing a pretty straightforward join in one of my Pig scripts. I am able to 'dump' both relations involved in this join, but when I try to join them I get this error. Here is a full log: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
    at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:319)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
    at org.apache.pig.PigServer.execute(PigServer.java:691)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
    ... 5 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
    ... 8 more
ERROR 1002: Unable to store alias 398
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 398
    at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
    at org.apache.pig.Main.main(Main.java:319)
Caused by: java.lang.NullPointerException at
[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794077#action_12794077 ] Hadoop QA commented on PIG-1117: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428803/PIG-117-v.0.7.0.patch against trunk revision 893373.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 5 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 410 release audit warnings (more than the trunk's current 408 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/153/console
This message is automatically generated.
[jira] Assigned: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang reassigned PIG-1166: --- Assignee: Jeff Zhang
A bit change of the interface of Tuple DataBag ( make the set and append method return this) -- Key: PIG-1166 URL: https://issues.apache.org/jira/browse/PIG-1166 Project: Pig Issue Type: Improvement Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Minor
When people write unit tests for UDFs, they always need to build a tuple or bag. If we change the interfaces of Tuple and DataBag so that the set and append methods return this, the amount of code can be reduced. For example, people currently have to write the following code to build a Tuple:
{code}
Tuple tuple=TupleFactory.getInstance().newTuple(3);
tuple.set(0,item_0);
tuple.set(1,item_1);
tuple.set(2,item_2);
{code}
If we change the interface so that set and append return this, we can rewrite the above code like this:
{code}
Tuple tuple=TupleFactory.getInstance().newTuple(3);
tuple.set(0,item_0).set(1,item_1).set(2,item_2);
{code}
This interface change won't cause backward-compatibility problems, and I don't think there's a performance problem either.
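The proposal is the usual "return this" fluent style. Below is a minimal self-contained sketch of it; `FluentTuple` is a hypothetical stand-in written for illustration, not `org.apache.pig.data.Tuple`, and it leaves out the `ExecException` that Pig's `set` can throw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a Tuple whose mutators return this.
final class FluentTuple {
    private final List<Object> fields;

    FluentTuple(int size) {
        // Pre-size with null fields so set(index, ...) works immediately.
        fields = new ArrayList<>(Arrays.asList(new Object[size]));
    }

    // Returning this lets callers chain set(...) calls.
    FluentTuple set(int index, Object value) {
        fields.set(index, value);
        return this;
    }

    // Same idea for append, which grows the tuple by one field.
    FluentTuple append(Object value) {
        fields.add(value);
        return this;
    }

    Object get(int index) {
        return fields.get(index);
    }

    int size() {
        return fields.size();
    }
}
```

With it, `new FluentTuple(3).set(0, "a").set(1, "b").set(2, "c")` builds a three-field tuple in one expression. One design point worth weighing: assuming `set` is currently declared `void`, as the first snippet's usage suggests, changing its return type also changes the signature that every existing `Tuple` implementation must declare.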
[jira] Updated: (PIG-1148) Move splitable logic from pig latin to InputFormat
[ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1148: Status: Patch Available (was: Open) Move splitable logic from pig latin to InputFormat -- Key: PIG-1148 URL: https://issues.apache.org/jira/browse/PIG-1148 Project: Pig Issue Type: Sub-task Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: PIG-1148.patch
[jira] Updated: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1166: Status: Patch Available (was: Open) A bit change of the interface of Tuple DataBag ( make the set and append method return this) -- Key: PIG-1166 URL: https://issues.apache.org/jira/browse/PIG-1166 Project: Pig Issue Type: Improvement Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1166.patch
[jira] Updated: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1166: Attachment: Pig_1166.patch
[jira] Commented: (PIG-1148) Move splitable logic from pig latin to InputFormat
[ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794085#action_12794085 ] Hadoop QA commented on PIG-1148: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428829/PIG-1148.patch against trunk revision 893373.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 27 new or modified tests.
-1 patch. The patch command could not apply the patch.
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/154/console
[jira] Updated: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1094: --- Attachment: PIG-1094_5.patch This patch (PIG-1094_5.patch) fixes the order-by, skew-join and merge-join test failures.
TestPoissonSampleLoader.java - testNumSamples(): unlike the earlier version of the sampler, if there are very few rows (3 in this case), only one sample is selected.
WeightedRangePartitioner.java - If the sample file was empty, there was a check, using FileLocalizer.getSize(), to ensure that the input was also empty. Removed that check, because the input location need not be a file.
PoissonSampleLoader.java - Additional comments, fixed indentation.
GetMemNumRows.java - Handles the case where the second-to-last column is null (while looking for the specially marked last tuple).
Output of test-patch:
[exec] +1 overall.
[exec]
[exec] +1 @author. The patch does not contain any @author tags.
[exec]
[exec] +1 tests included. The patch appears to include 6 new or modified tests.
[exec]
[exec] +1 javadoc. The javadoc tool did not generate any warning messages.
[exec]
[exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
[exec]
[exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
[exec]
[exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.
Fix unit tests corresponding to source changes so far - Key: PIG-1094 URL: https://issues.apache.org/jira/browse/PIG-1094 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, PIG-1094_4.patch, PIG-1094_5.patch
The check-ins so far on the load-store-redesign branch have not addressed unit test failures caused by the interface changes. This jira is to track the task of making the common-case unit tests work with the new interfaces. Some aspects of the new proposal, such as using the LoadCaster interface for casting and making local mode work, have not been completed yet. Tests that fail for those reasons will not be fixed in this jira; they will be addressed in the jiras corresponding to those tasks.
[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1136: - Status: Patch Available (was: Open) [zebra] Map Split of Storage info do not allow for leading underscore char '_' -- Key: PIG-1136 URL: https://issues.apache.org/jira/browse/PIG-1136 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Priority: Minor Attachments: pig-1136-xuefu.patch
Some users need support for map keys of that form. Pig column names cannot start with an underscore, but apparently no such restriction is placed on map keys.
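The issue comes down to two different naming rules: one for column names and a looser one for map keys. Below is a minimal sketch, assuming simple identifier-style names; the `NameRules` class and both regular expressions are hypothetical and are not taken from Zebra's storage-info parser.

```java
import java.util.regex.Pattern;

final class NameRules {
    // Assumed rule for column names: must start with a letter (no leading '_').
    private static final Pattern COLUMN_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9_]*");
    // Assumed relaxed rule for map keys: a leading underscore is also allowed.
    private static final Pattern MAP_KEY = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    static boolean isValidColumnName(String name) {
        return COLUMN_NAME.matcher(name).matches();
    }

    static boolean isValidMapKey(String key) {
        return MAP_KEY.matcher(key).matches();
    }
}
```

Under these assumed rules, `_k1` is rejected as a column name but accepted as a map key, which is the behaviour the issue asks the map-split syntax to allow.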
[jira] Commented: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794143#action_12794143 ] Yan Zhou commented on PIG-1136: --- Patch reviewed +1
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794158#action_12794158 ] Pradeep Kamath commented on PIG-1090: - Dmitriy, the LoadMetadata method implemented in PIG-1090-4.patch sets the partition filter; it does not implement filter pushdown in general. As per the redesign proposal, only partition filter conditions are pushed down through LoadMetadata. As you rightly pointed out, pushing down filters in general will be done through the LoadPushDown interface, which currently only has a pushProjection method. At a later point, when Pig is able to push down filters, a pushFilter method can be introduced. It is not present yet because we don't know what its argument will eventually look like. The optimization in the patch attached to this jira only extracts conditions on partition columns, which is needed in order to call the LoadMetadata.setPartitionFilter() method, and that is why it was added in this patch.
Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090.patch, PIG-1190-5.patch
There have been some changes to the load/store interfaces (as recorded in the Changes section, Nov 2 2009 subsection, of http://wiki.apache.org/pig/LoadStoreRedesignProposal); this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira.
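The "extract conditions on partition columns" step can be pictured as splitting an AND-ed filter into the part that mentions only partition keys (a candidate for setPartitionFilter) and the part Pig still evaluates itself. Below is a minimal sketch under stated assumptions: `FilterCondition` and `PartitionFilterSplit` are hypothetical classes, only conjunctions of single-column comparisons are handled (OR and multi-column expressions are out of scope), and, as we understand the redesign, the real setPartitionFilter receives an expression tree rather than a list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of one comparison in a filter, e.g. dt == '20091222'.
final class FilterCondition {
    final String column;
    final String op;
    final String value;

    FilterCondition(String column, String op, String value) {
        this.column = column;
        this.op = op;
        this.value = value;
    }
}

// Splits AND-ed conditions into a partition-only part and a remainder.
final class PartitionFilterSplit {
    final List<FilterCondition> partitionPart = new ArrayList<>();
    final List<FilterCondition> remainingPart = new ArrayList<>();

    static PartitionFilterSplit of(List<FilterCondition> conjuncts, Set<String> partitionKeys) {
        PartitionFilterSplit split = new PartitionFilterSplit();
        for (FilterCondition c : conjuncts) {
            if (partitionKeys.contains(c.column)) {
                split.partitionPart.add(c);   // could be handed to the loader
            } else {
                split.remainingPart.add(c);   // Pig still applies this filter
            }
        }
        return split;
    }
}
```

For a filter such as `dt == '20091222' AND user == 'bob'` on a table partitioned by `dt`, the `dt` condition lands in `partitionPart` and the `user` condition stays in `remainingPart`.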
[jira] Updated: (PIG-761) ERROR 2086 on simple JOIN
[ https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-761: --- Attachment: PIG-761-1.patch The problem lies in the interaction between LIMIT and one of the optimizations. More specifically, the POPackageAnnotator optimization searches for the matching POLocalRearrange in the map plan and, if it is not found there, in the predecessor's reduce plan. However, a LIMIT introduces an extra map-reduce job between the original map-reduce job and its predecessor, so POPackageAnnotator cannot find the POLocalRearrange. To fix this, we mark the map-reduce job introduced by the LIMIT, and when POPackageAnnotator sees such a limit job, it searches for the POLocalRearrange in the limit job's parent.
ERROR 2086 on simple JOIN - Key: PIG-761 URL: https://issues.apache.org/jira/browse/PIG-761 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Environment: mapreduce mode Reporter: Vadim Zaliva Fix For: 0.6.0 Attachments: PIG-761-1.patch
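The search order described in the fix can be sketched as follows. This is a heavily simplified, hypothetical model, not Pig's `POPackageAnnotator` or `MapReduceOper`: each `MRJob` has one predecessor and a pair of flags saying whether its map or reduce plan contains the POLocalRearrange, and `isLimitJob` plays the role of the marker the patch adds.

```java
// Hypothetical, simplified model of a map-reduce job in the compiled plan.
final class MRJob {
    final boolean mapPlanHasLocalRearrange;
    final boolean reducePlanHasLocalRearrange;
    final boolean isLimitJob;      // marker for jobs introduced by LIMIT
    final MRJob predecessor;

    MRJob(boolean mapPlanHasLocalRearrange, boolean reducePlanHasLocalRearrange,
          boolean isLimitJob, MRJob predecessor) {
        this.mapPlanHasLocalRearrange = mapPlanHasLocalRearrange;
        this.reducePlanHasLocalRearrange = reducePlanHasLocalRearrange;
        this.isLimitJob = isLimitJob;
        this.predecessor = predecessor;
    }
}

final class LocalRearrangeSearch {
    // Returns the job holding the POLocalRearrange for job's package, or null
    // when none is found (the situation reported as ERROR 2086).
    static MRJob find(MRJob job) {
        if (job.mapPlanHasLocalRearrange) {
            return job;                    // found in this job's map plan
        }
        MRJob pred = job.predecessor;
        if (pred != null && pred.isLimitJob) {
            pred = pred.predecessor;       // the fix: look in the limit job's parent
        }
        if (pred != null && pred.reducePlanHasLocalRearrange) {
            return pred;                   // found in the predecessor's reduce plan
        }
        return null;
    }
}
```

With a marked LIMIT job between the join job and its parent, `find` returns the parent; if the LIMIT job is not marked, the search stops at it and returns null. A real cogroup has one POLocalRearrange per input, which this single-chain model does not capture.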
[jira] Updated: (PIG-761) ERROR 2086 on simple JOIN
[ https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-761: --- Fix Version/s: 0.6.0 Assignee: Daniel Dai Status: Patch Available (was: Open) ERROR 2086 on simple JOIN - Key: PIG-761 URL: https://issues.apache.org/jira/browse/PIG-761 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Environment: mapreduce mode Reporter: Vadim Zaliva Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-761-1.patch
ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators. org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
I get this error doing a pretty straightforward join in one of my Pig scripts. I am able to 'dump' both relations involved in this join, but when I try to join them I get this error. Here is the full log:
ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
	at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
	at org.apache.pig.Main.main(Main.java:319)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
	at org.apache.pig.PigServer.execute(PigServer.java:691)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
	... 5 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
	... 8 more
ERROR 1002: Unable to store alias 398
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 398
	at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
	at org.apache.pig.Main.main(Main.java:319)
Caused by: java.lang.NullPointerException
	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.LogToPhyTranslationVisitor.visit(LogToPhyTranslationVisitor.java:669)
	at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:330)
	at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:41)
	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
	at
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794179#action_12794179 ] Pradeep Kamath commented on PIG-1090: - +1 for PIG-1090-5.patch; patch committed to the load-store-redesign branch. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces - this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1136: - Attachment: (was: pig-1136-xuefu.patch) [zebra] Map Split of Storage info do not allow for leading underscore char '_' -- Key: PIG-1136 URL: https://issues.apache.org/jira/browse/PIG-1136 Project: Pig Issue Type: Bug Affects Versions: 0.7.0 Reporter: Yan Zhou Priority: Minor Attachments: pig-1136-xuefu-new.patch Some users need to support map keys of this type. Pig column names cannot begin with an underscore, but apparently no such restriction is placed on map keys. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xuefu Zhang updated PIG-1136: - Attachment: pig-1136-xuefu-new.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1170) [zebra] end to end test and stress test
[zebra] end to end test and stress test --- Key: PIG-1170 URL: https://issues.apache.org/jira/browse/PIG-1170 Project: Pig Issue Type: Test Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.7.0 Add test cases for the zebra end-to-end test and stress test, plus a stress test verification tool. No unit test is needed for this jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1146: Status: Patch Available (was: Open) Inconsistent column pruning in LOUnion -- Key: PIG-1146 URL: https://issues.apache.org/jira/browse/PIG-1146 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1146-1.patch, PIG-1146-2.patch
This happens when we union two relations where one column comes from a loader, the matching column in the other relation comes from a constant, and that column gets pruned. We prune the column that comes from the loader but do not prune the constant, which leaves the union in an inconsistent state. Here is a script:
{code}
a = load '1.txt' as (a0, a1:chararray, a2);
b = load '2.txt' as (b0, b2);
c = foreach b generate b0, 'hello', b2;
d = union a, c;
e = foreach d generate $0, $2;
dump e;
{code}
1.txt:
{code}
ulysses thompson	64	1.90
katie carson	25	3.65
{code}
2.txt:
{code}
luke king	0.73
holly davidson	2.43
{code}
expected output:
(ulysses thompson,1.90)
(katie carson,3.65)
(luke king,0.73)
(holly davidson,2.43)
real output:
(ulysses thompson,)
(katie carson,)
(luke king,0.73)
(holly davidson,2.43)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
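The mismatch described in PIG-1146 can be modelled outside Pig with a small Java sketch. It assumes, as the description says, that the loader branch has been pruned down to a0 and a2 while the constant branch still carries all three fields, and that the final foreach still reads position 2. `UnionPruningSketch` and its methods are hypothetical and are not Pig's LOUnion or pruning code; they only show why the loader rows come out with an empty second field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of the PIG-1146 symptom; not Pig's LOUnion or column-pruning code.
public class UnionPruningSketch {

    // Positional projection ($n). A position past the end of a row yields null,
    // which is printed as an empty field, like the missing values in the reported output.
    static Object[] project(Object[] row, int... positions) {
        Object[] out = new Object[positions.length];
        for (int i = 0; i < positions.length; i++) {
            out[i] = positions[i] < row.length ? row[positions[i]] : null;
        }
        return out;
    }

    static List<String> run() {
        // Loader branch after pruning $1 (assumed): only a0 and a2 remain.
        List<Object[]> a = Arrays.asList(
                new Object[] {"ulysses thompson", "1.90"},
                new Object[] {"katie carson", "3.65"});
        // Constant branch left unpruned: b0, 'hello', b2.
        List<Object[]> c = Arrays.asList(
                new Object[] {"luke king", "hello", "0.73"},
                new Object[] {"holly davidson", "hello", "2.43"});

        // d = union a, c: rows are concatenated; nothing re-aligns their widths.
        List<Object[]> d = new ArrayList<>(a);
        d.addAll(c);

        // e = foreach d generate $0, $2
        List<String> lines = new ArrayList<>();
        for (Object[] row : d) {
            Object[] e = project(row, 0, 2);
            lines.add("(" + e[0] + "," + (e[1] == null ? "" : e[1]) + ")");
        }
        return lines;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Running it prints the same four lines as the "real output" in the report, which is consistent with the explanation that only one side of the union was pruned.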
[jira] Commented: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794204#action_12794204 ] Hadoop QA commented on PIG-1166:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428831/Pig_1166.patch against trunk revision 893373.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 420 release audit warnings (more than the trunk's current 413 warnings).
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/155/console
This message is automatically generated.
A bit change of the interface of Tuple DataBag ( make the set and append method return this) -- Key: PIG-1166 URL: https://issues.apache.org/jira/browse/PIG-1166 Project: Pig Issue Type: Improvement Reporter: Jeff Zhang Assignee: Jeff Zhang Priority: Minor Attachments: Pig_1166.patch
When people write unit tests for UDFs, they always need to build a tuple or bag. If we change the Tuple and DataBag interfaces so that the set and append methods return this, we can reduce code size. For example, people currently have to write the following code to build a Tuple:
{code}
Tuple tuple = TupleFactory.getInstance().newTuple(3);
tuple.set(0, item_0);
tuple.set(1, item_1);
tuple.set(2, item_2);
{code}
If set and append return this, the code above can be rewritten like this:
{code}
Tuple tuple = TupleFactory.getInstance().newTuple(3);
tuple.set(0, item_0).set(1, item_1).set(2, item_2);
{code}
This interface change won't cause backward-compatibility problems, and I don't think it causes performance problems either.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
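To make the PIG-1166 proposal concrete without depending on Pig's classes, here is a minimal Java sketch of the "return this" style. `FluentTupleSketch` is a hypothetical class, not `org.apache.pig.data.Tuple`; it only shows how returning the receiver from `set` and `append` enables the chained call in the second snippet above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tuple with chainable setters. It is NOT org.apache.pig.data.Tuple;
// it only models the proposed "set/append return this" style.
public class FluentTupleSketch {
    private final List<Object> fields = new ArrayList<>();

    // Like newTuple(size): pre-populate with nulls so set(index, ...) is valid.
    public FluentTupleSketch(int size) {
        for (int i = 0; i < size; i++) {
            fields.add(null);
        }
    }

    // Returning this (instead of void) is what makes chaining possible.
    public FluentTupleSketch set(int index, Object value) {
        fields.set(index, value);
        return this;
    }

    // Appends a new field at the end and returns this for further chaining.
    public FluentTupleSketch append(Object value) {
        fields.add(value);
        return this;
    }

    public Object get(int index) {
        return fields.get(index);
    }

    public int size() {
        return fields.size();
    }

    public static void main(String[] args) {
        FluentTupleSketch t = new FluentTupleSketch(3)
                .set(0, "item_0").set(1, "item_1").set(2, "item_2")
                .append("item_3");
        System.out.println(t.size() + " " + t.get(3)); // prints "4 item_3"
    }
}
```

The design choice is the usual fluent-builder trade-off: callers that ignore the return value are unaffected at the call site, while test code can build a whole tuple in one expression.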
[jira] Updated: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1146: Status: Open (was: Patch Available) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794212#action_12794212 ] Thejas M Nair commented on PIG-1090: I have reviewed the changes related to partition filter extraction.
* The case where the load statement has a user-defined schema with different column names for the partition columns needs to be handled.
* src/org/apache/pig/LoadMetadata.java - I think we should document in the comments that the load function does not have to implement setPartitionFilter even if it implements other parts of the LoadMetadata interface, and that it can communicate this to Pig by returning null from getPartitionKeys.
* src/org/apache/pig/Expression.java - in BinaryExpression.toString(), we need to add parentheses around the arguments if they are binary expressions, so that the string represents the correct operator precedence as specified in the filter condition. For example, (a = 1 or b = 1) and c = 1 currently gets converted to a = 1 or b = 1 and c = 1.
* src/org/apache/pig/Expression.java - in Const.toString(), it would be better to use single quotes instead of double quotes around string constants, as string literals in SQL (standard) and Pig Latin are single-quoted.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
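The two `toString()` suggestions in this review can be illustrated with a small expression tree. This is a minimal sketch under stated assumptions: `FilterExprSketch`, `Column`, `Const` and `Binary` are hypothetical classes, not the actual `org.apache.pig.Expression` API. `Binary` wraps any operand that is itself a binary expression in parentheses, and `Const` single-quotes string values.

```java
// Hypothetical, simplified filter-expression classes: NOT the actual org.apache.pig.Expression API.
public class FilterExprSketch {

    abstract static class Expr {}

    static final class Column extends Expr {
        private final String name;
        Column(String name) { this.name = name; }
        @Override public String toString() { return name; }
    }

    static final class Const extends Expr {
        private final Object value;
        Const(Object value) { this.value = value; }
        // String constants are single-quoted, as string literals are in SQL and Pig Latin.
        // Escaping of embedded quotes is left out of this sketch.
        @Override public String toString() {
            return value instanceof String ? "'" + value + "'" : String.valueOf(value);
        }
    }

    static final class Binary extends Expr {
        private final Expr lhs;
        private final String op;
        private final Expr rhs;
        Binary(Expr lhs, String op, Expr rhs) { this.lhs = lhs; this.op = op; this.rhs = rhs; }

        // Parenthesize an operand that is itself a binary expression, so the
        // string keeps the grouping of the expression tree.
        private static String operand(Expr e) {
            return e instanceof Binary ? "(" + e + ")" : e.toString();
        }

        @Override public String toString() {
            return operand(lhs) + " " + op + " " + operand(rhs);
        }
    }

    // Builds the review's example: (a = 1 or b = 1) and c = 1
    static String example() {
        Expr filter = new Binary(
                new Binary(new Binary(new Column("a"), "=", new Const(1)), "or",
                           new Binary(new Column("b"), "=", new Const(1))),
                "and",
                new Binary(new Column("c"), "=", new Const(1)));
        return filter.toString();
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints ((a = 1) or (b = 1)) and (c = 1)
    }
}
```

Applied literally, the rule also wraps each comparison, so the output is more verbose than strictly necessary, but it can never flatten `(a = 1 or b = 1) and c = 1` into `a = 1 or b = 1 and c = 1`. A precedence-aware printer could drop the redundant parentheses.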
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794213#action_12794213 ] Thejas M Nair commented on PIG-1090: My previous comment is regarding PIG-1090-4.patch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Huang updated PIG-1164: Attachment: smoke.patch Patch for the zebra smoke test. [zebra]smoke test - Key: PIG-1164 URL: https://issues.apache.org/jira/browse/PIG-1164 Project: Pig Issue Type: Test Affects Versions: 0.6.0 Reporter: Jing Huang Fix For: 0.7.0 Attachments: smoke.patch Change the zebra build.xml file to add a smoke target, and add env.sh and a run script under the zebra/src/test/smoke directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1102) Collect number of spills per job
[ https://issues.apache.org/jira/browse/PIG-1102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794217#action_12794217 ] Sriranjan Manjunath commented on PIG-1102: -- (3) refers to the case where we try to guess the number of records that fit into memory and start spilling the other records. InternalCachedBag.java addresses this case:
+if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
+    /* Increment the spill count */
+    incSpillCount(PigCounters.PROACTIVE_SPILL_COUNT);
+}
 }
cacheLimit holds the number of records that can be held in memory, whereas mContents is the tuple that holds all the records. Here, I do not increment the counter for every record; instead, I count every n'th record, n being the cacheLimit. However, this does not increment the counter by the buffer size. Incrementing it by the buffer size would give us a value approximately equal to the number of spilled records.
Collect number of spills per job Key: PIG-1102 URL: https://issues.apache.org/jira/browse/PIG-1102 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Assignee: Sriranjan Manjunath Fix For: 0.7.0 Attachments: PIG_1102.patch, PIG_1102.patch.1 Memory shortage is one of the main performance issues in Pig. Knowing when we spill to disk is useful for understanding query performance and also for seeing how certain changes in Pig affect that. Other interesting stats to collect would be average CPU usage and max memory usage, but I am not sure if this information is easily retrievable. Using Hadoop counters for this would make sense. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
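The difference between the two counting options discussed for PIG-1102 (adding 1 every `cacheLimit` records versus adding `cacheLimit`) can be sketched in plain Java. This is a hypothetical model, not Pig's `InternalCachedBag`: it assumes, as the comment describes, that `mContents` grows by one per added record, and the two `long` fields merely stand in for Hadoop counters. It models only the counter arithmetic, not the spilling itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the counting scheme described in PIG-1102; not Pig's InternalCachedBag.
// The two long fields stand in for Hadoop counters.
public class ProactiveSpillCounterSketch {
    private final int cacheLimit;                      // records that fit in memory; 0 disables counting
    private final List<Object> mContents = new ArrayList<>();
    private long spillCount = 0;                       // +1 every cacheLimit records (the patch's approach)
    private long approxSpilledRecords = 0;             // +cacheLimit instead of +1 (the alternative discussed)

    public ProactiveSpillCounterSketch(int cacheLimit) {
        this.cacheLimit = cacheLimit;
    }

    public void add(Object record) {
        mContents.add(record);
        // The cacheLimit != 0 guard also prevents a modulo-by-zero ArithmeticException.
        if (cacheLimit != 0 && mContents.size() % cacheLimit == 0) {
            spillCount += 1;
            approxSpilledRecords += cacheLimit;
        }
    }

    public long getSpillCount() { return spillCount; }

    public long getApproxSpilledRecords() { return approxSpilledRecords; }

    public static void main(String[] args) {
        ProactiveSpillCounterSketch bag = new ProactiveSpillCounterSketch(3);
        for (int i = 0; i < 10; i++) {
            bag.add(i);
        }
        System.out.println(bag.getSpillCount() + " " + bag.getApproxSpilledRecords()); // prints "3 9"
    }
}
```

With `cacheLimit = 3` and ten records, the counter fires at sizes 3, 6 and 9: the per-event counter reads 3, while the buffer-size variant reads 9, which is only an approximation of the records actually written out.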
[jira] Updated: (PIG-1170) [zebra] end to end test and stress test
[ https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jing Huang updated PIG-1170: Attachment: e2eStress.patch Zebra e2e and stress test patch. No unit test is needed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1164) [zebra]smoke test
[ https://issues.apache.org/jira/browse/PIG-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1164: -- Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1170) [zebra] end to end test and stress test
[ https://issues.apache.org/jira/browse/PIG-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1170: -- Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1146) Inconsistent column pruning in LOUnion
[ https://issues.apache.org/jira/browse/PIG-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794232#action_12794232 ] Pradeep Kamath commented on PIG-1146: - +1 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1148) Move splitable logic from pig latin to InputFormat
[ https://issues.apache.org/jira/browse/PIG-1148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1148: Resolution: Fixed Release Note: split by 'file' is no longer allowed as part of the load statement to process each input file in one map. To achieve this, users will have to use an InputFormat in the loader that returns one split for the whole file. Hadoop Flags: [Incompatible change, Reviewed] Status: Resolved (was: Patch Available) +1, thanks for the contribution, Jeff - I have committed this patch on your behalf. Move splitable logic from pig latin to InputFormat -- Key: PIG-1148 URL: https://issues.apache.org/jira/browse/PIG-1148 Project: Pig Issue Type: Sub-task Reporter: Jeff Zhang Assignee: Jeff Zhang Attachments: PIG-1148.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794253#action_12794253 ] Daniel Dai commented on PIG-1090: - Regarding PIG-1090-4.patch: in LOLoad.getSchema, we should remove the lines that set up pig.loader.signature. In the new design, UDF writers should keep track of the signature inside the LoadFunc rather than in the Configuration. The other parts related to the signature and push projection look good to me. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1094) Fix unit tests corresponding to source changes so far
[ https://issues.apache.org/jira/browse/PIG-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair updated PIG-1094: --- Attachment: PIG-1094_6.patch This patch replaces PIG-1094_5.patch. As per Pradeep's suggestion, it keeps the check in WeightedRangePartitioner.java to ensure that the input is empty if the sample file is empty. It is also merged with the latest changes in the LSR branch. Fix unit tests corresponding to source changes so far - Key: PIG-1094 URL: https://issues.apache.org/jira/browse/PIG-1094 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1094.patch, PIG-1094_2.patch, PIG-1094_3.patch, PIG-1094_4.patch, PIG-1094_5.patch, PIG-1094_6.patch The check-ins so far on the load-store-redesign branch have not addressed unit test failures due to interface changes. This jira is to track the task of making the common-case unit tests work with the new interfaces. Some aspects of the new proposal, such as using the LoadCaster interface for casting and making local mode work, have not been completed yet. Tests that are failing for those reasons will not be fixed in this jira; they will be addressed in the jiras corresponding to those tasks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-1169) Problems with some top N queries
[ https://issues.apache.org/jira/browse/PIG-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding reassigned PIG-1169: - Assignee: Richard Ding Problems with some top N queries Key: PIG-1169 URL: https://issues.apache.org/jira/browse/PIG-1169 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Richard Ding Assignee: Richard Ding
Recently, users reported a couple of problems related to Top N queries.
* From Chuang Liu: We tried to get the top N results after a group by and sort, and got different results depending on whether or not we stored the full sorted results. Here is a skeleton of our Pig script:
{code}
raw_data = Load 'input_files' AS (f1, f2, ..., fn);
grouped = group raw_data by (f1, f2);
data = foreach grouped generate FLATTEN(group), SUM(raw_data.fk) as value;
ordered = order data by value DESC parallel 10;
topn = limit ordered 10;
store ordered into 'outputdir/full';
store topn into 'outputdir/topn';
{code}
With the statement 'store ordered ...', the top N results are incorrect, but without it, the results are correct. Has anyone seen this before? I know a similar bug was fixed in the multi-query release. We are on Pig 0.4 and Hadoop 0.20.1.
* From Corry Haines: I am not sure if this is a bug or something more subtle, but here is the problem I am having. When I LOAD a dataset, change it with an ORDER, LIMIT it, and then CROSS it with itself, the results are not correct. I expect to see the cross of the limited, ordered dataset, but instead I see the cross of the limited dataset. Effectively, it's like the LIMIT is being excluded. Pig Version: 0.5.0, Hadoop Version: 0.20.1. I would greatly appreciate some help, as this is somewhat frustrating. Example code (and output) follows:
{code}
A = load 'foo' as (f1:int, f2:int, f3:int);
B = load 'foo' as (f1:int, f2:int, f3:int);
a = ORDER A BY f1 DESC;
b = ORDER B BY f1 DESC;
aa = LIMIT a 1;
bb = LIMIT b 1;
C = CROSS aa, bb;
DUMP C;
{code}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
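When checking reports like these, it can help to compute the expected top-N result independently on a small input and compare it with the job's output. A minimal sketch, assuming the per-group sums fit in a Java list; `TopNOracle` is a hypothetical helper unrelated to Pig's own LIMIT implementation. It keeps the n largest values in a bounded min-heap and returns them in descending order, which is what `order ... DESC` followed by `limit n` should produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper (not part of Pig): computes what "order data by value DESC; limit n"
// should return, so a job's topn output can be checked against it on a small input.
public class TopNOracle {

    static List<Long> topN(List<Long> values, int n) {
        // Min-heap: the smallest of the current top-n candidates sits at the head.
        PriorityQueue<Long> heap = new PriorityQueue<>();
        for (Long v : values) {
            heap.offer(v);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest so only the n largest remain
            }
        }
        List<Long> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder()); // DESC, matching the ORDER BY
        return result;
    }

    public static void main(String[] args) {
        System.out.println(topN(Arrays.asList(5L, 1L, 9L, 3L, 7L), 3)); // prints [9, 7, 5]
    }
}
```

A bounded heap needs O(n) memory and O(m log n) time for m input values, so it stays cheap even when the full sorted output would not fit comfortably in memory.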
[jira] Commented: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794281#action_12794281 ] Hadoop QA commented on PIG-1136:
+1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428866/pig-1136-xuefu-new.patch against trunk revision 893373.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/156/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/156/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/156/console
This message is automatically generated.
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1136) [zebra] Map Split of Storage info do not allow for leading underscore char '_'
[ https://issues.apache.org/jira/browse/PIG-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-1136: -- Resolution: Fixed Fix Version/s: 0.7.0 Status: Resolved (was: Patch Available) Patch committed to Apache trunk. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1166) A bit change of the interface of Tuple DataBag ( make the set and append method return this)
[ https://issues.apache.org/jira/browse/PIG-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794301#action_12794301 ] Jeff Zhang commented on PIG-1166: - I have run into this release audit problem several times. Could anyone tell me what the release audit checks, so that I can be more careful next time? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-761) ERROR 2086 on simple JOIN
[ https://issues.apache.org/jira/browse/PIG-761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12794324#action_12794324 ] Hadoop QA commented on PIG-761: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12428861/PIG-761-1.patch against trunk revision 893660. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/157/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/157/console This message is automatically generated. ERROR 2086 on simple JOIN - Key: PIG-761 URL: https://issues.apache.org/jira/browse/PIG-761 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Environment: mapreduce mode Reporter: Vadim Zaliva Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-761-1.patch ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109 doing pretty straightforward join in one of my pig scripts. I am able to 'dump' both relationship involved in this join. when I try to join them I am getting this error. 
Here is the full log:

ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias 109
	at org.apache.pig.PigServer.registerQuery(PigServer.java:296)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:529)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:280)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:99)
	at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
	at org.apache.pig.Main.main(Main.java:319)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2043: Unexpected error during execution.
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:274)
	at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:700)
	at org.apache.pig.PigServer.execute(PigServer.java:691)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:292)
	... 5 more
Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2086: Unexpected problem during optimization. Could not find all LocalRearrange operators.
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.handlePackage(POPackageAnnotator.java:116)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.plans.POPackageAnnotator.visitMROp(POPackageAnnotator.java:88)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:194)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper.visit(MapReduceOper.java:43)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
	at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:198)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:80)
	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:261)
	... 8 more
ERROR 1002: Unable to store alias 398