[jira] Created: (PIG-1221) Filter equality does not work for tuples
Filter equality does not work for tuples

Key: PIG-1221
URL: https://issues.apache.org/jira/browse/PIG-1221
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.5.0
Environment: Windows and Linux. Java 1.6 hadoop 0.20.1
Reporter: Neil Blue

From the documentation I understand that it should be possible to filter a relation based on the equality of tuples:
http://wiki.apache.org/pig/PigTypesFunctionalSpec
http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html#deref

However, with this data file (indext.txt):

(1,one) (1,ONE)
(2,two) (22, twentytwo)
(3,three) (3,three)

I run this pig script:

A = LOAD 'indext.txt' AS (t1:(a:int, b:chararray), t2:(a:int, b:chararray));
B = FILTER A BY t1==t2;
DUMP B;

Expecting the output:

((3,three),(3,three))

However there is an error:

2010-02-03 09:05:20,523 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2067: EqualToExpr does not know how to handle type: tuple

Pig Stack Trace
---
ERROR 2067: EqualToExpr does not know how to handle type: tuple

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias B
    at org.apache.pig.PigServer.openIterator(PigServer.java:475)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias B
    at org.apache.pig.PigServer.store(PigServer.java:530)
    at org.apache.pig.PigServer.openIterator(PigServer.java:458)
    ... 6 more
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2067: EqualToExpr does not know how to handle type: tuple
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr.getNext(EqualToExpr.java:108)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFilter.getNext(POFilter.java:148)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
    at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)

Thanks
Neil

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
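For readers following the report, the behaviour the reporter expects from FILTER A BY t1==t2 can be sketched outside Pig. This is only an illustration in Python, not Pig's EqualToExpr: the function names tuples_equal and filter_equal are made up for the example, and it assumes tuple equality means same number of fields and pairwise equal fields (null handling is left out).

```python
def tuples_equal(left, right):
    """Field-by-field comparison: same arity and every pair of fields equal."""
    if len(left) != len(right):
        return False
    return all(a == b for a, b in zip(left, right))


def filter_equal(rows):
    """Keep only the rows whose two tuple-valued columns are equal."""
    return [(t1, t2) for t1, t2 in rows if tuples_equal(t1, t2)]


# The three rows of indext.txt from the report, as (t1, t2) pairs.
rows = [
    ((1, "one"), (1, "ONE")),
    ((2, "two"), (22, "twentytwo")),
    ((3, "three"), (3, "three")),
]

print(filter_equal(rows))
```

Only the third row survives, which matches the ((3,three),(3,three)) output the reporter expected.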
[jira] Updated: (PIG-259) allow store to overwrite existing directroy
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated PIG-259:
---
Attachment: Pig_259.patch

I chose the keyword overwrite to indicate that the user wants to overwrite the file. The implementation details are as follows:
1. Add a variable isOverWrite to LOStore.
2. In the InputOutputFileValidator, delete the destination file first if the overwrite keyword is used.

allow store to overwrite existing directroy
---
Key: PIG-259
URL: https://issues.apache.org/jira/browse/PIG-259
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Jeff Zhang
Fix For: 0.8.0
Attachments: Pig_259.patch

we have users who are asking for a flag to overwrite existing directory

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829557#action_12829557 ] Hadoop QA commented on PIG-1217: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434746/fix_top_udf.diff against trunk revision 906326. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/198/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/198/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/198/console This message is automatically generated. [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829575#action_12829575 ] Hadoop QA commented on PIG-1219: -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434747/PIG-1219-2.patch against trunk revision 906326. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/191/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/191/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/191/console This message is automatically generated. Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open quantile file. openDFSFile internally will check the existence of the quantile file, which adds burden to hdfs namenode. We shall remove this extra check. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829589#action_12829589 ] Dmitriy V. Ryaboy commented on PIG-1217: The test failures appear to be unrelated to this change. Please review. [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829608#action_12829608 ] Hadoop QA commented on PIG-1046: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434775/pig-1046_3.patch against trunk revision 906326. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/199/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/199/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/199/console This message is automatically generated. join algorithm specification is within double quotes Key: PIG-1046 URL: https://issues.apache.org/jira/browse/PIG-1046 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, pig-1046_3.patch This fails - j = join l1 by $0, l2 by $0 using 'skewed'; This works - j = join l1 by $0, l2 by $0 using skewed; String constants are single-quoted in pig-latin. If the algorithm specification is supposed to be a string, specifying it within single quotes should be supported. 
Alternatively, we should use identifiers here: since the algorithm names are predefined in Pig, users will not be specifying arbitrary values that might not be valid identifiers. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1221) Filter equality does not work for tuples
[ https://issues.apache.org/jira/browse/PIG-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829618#action_12829618 ]

Ashutosh Chauhan commented on PIG-1221:
---
Looking at the code, it seems we don't support equality on maps either, although the specification says we should.

Filter equality does not work for tuples
Key: PIG-1221
URL: https://issues.apache.org/jira/browse/PIG-1221
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.5.0
Environment: Windows and Linux. Java 1.6 hadoop 0.20.1
Reporter: Neil Blue

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-1217: --- Status: Open (was: Patch Available) Huh. I wonder what Hudson tested -- I accidentally attached an old version of the unit test, which doesn't even compile with the new Top. But Hudson passed contrib tests, and managed to fail on core tests. [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-1217: --- Attachment: fix_top_udf.diff [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff, fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-1217: --- Status: Patch Available (was: Open) [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff, fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1219: Status: Open (was: Patch Available) Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open quantile file. openDFSFile internally will check the existence of the quantile file, which adds burden to hdfs namenode. We shall remove this extra check. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1219:
---
Attachment: PIG-1219-3.patch

The test failure is caused by the way we test, not by the core code. We now require the quantile file to be created before we run JobControlCompiler. In our test case, we invoke the methods of JobControlCompiler directly without actually running the job, so the quantile file does not exist when we get into JobControlCompiler. The test case is changed to force creation of the quantile file.

Extra listStatus call to the namenode in WeightedRangePartitioner
---
Key: PIG-1219
URL: https://issues.apache.org/jira/browse/PIG-1219
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch

We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open quantile file. openDFSFile internally will check the existence of the quantile file, which adds burden to hdfs namenode. We shall remove this extra check.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
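The cost of the extra existence check described in this issue can be illustrated with a toy model. The sketch below is not Pig's FileLocalizer or the HDFS client API: CountingFS is a hypothetical in-memory stand-in whose only purpose is to count metadata requests, and both function names are invented for the example.

```python
class CountingFS:
    """Hypothetical in-memory filesystem that counts every request it serves."""

    def __init__(self, files):
        self.files = dict(files)
        self.calls = 0

    def exists(self, path):
        self.calls += 1
        return path in self.files

    def open(self, path):
        self.calls += 1
        if path not in self.files:
            raise FileNotFoundError(path)
        return self.files[path]


def open_with_check(fs, path):
    """Check-then-open: two requests for a file that is known to exist."""
    if not fs.exists(path):
        raise FileNotFoundError(path)
    return fs.open(path)


def open_directly(fs, path):
    """Open only: one request; a missing file still surfaces as an error."""
    return fs.open(path)


checked = CountingFS({"quantiles": b"data"})
open_with_check(checked, "quantiles")

direct = CountingFS({"quantiles": b"data"})
open_directly(direct, "quantiles")

print(checked.calls, direct.calls)
```

Dropping the check halves the requests in this model, at the price that the caller must be sure the file exists beforehand (or handle the error), which is the requirement the updated test case has to satisfy.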
[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829688#action_12829688 ] Alan Gates commented on PIG-1217: - In general, looks good. A comment on Top.Initial. If you do something like B = group A ... C = foreach B generate myudf(A); and myudf is algebraic, you are guaranteed to only get one record at a time in the Initial function because Pig doesn't do any collecting of the keys. That is, even if ten records in a row have the same key Pig won't detect that and collate them into the bag before calling Initial. We take advantage of that in a number of the built in functions (eg COUNT) to make the processing of Initial easier. You may want to do the same here. As far as getting it into 0.6 release, I think Olga was trying to roll the package today or tomorrow, so we may be out of time. [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff, fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
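The Initial / Intermediate / Final split that this comment relies on can be sketched as follows. This is a Python illustration only, not the piggybank Top code or Pig's Algebraic interface: the function names, the n and key parameters, and the sample records are all assumptions made for the example. It takes the comment's guarantee at face value, so initial receives exactly one record and simply wraps it.

```python
import heapq


def initial(record):
    """Called with a single record, so there is nothing to sort or trim yet."""
    return [record]


def intermediate(partials, n, key):
    """Merge partial results and keep only the n largest records."""
    merged = [record for partial in partials for record in partial]
    return heapq.nlargest(n, merged, key=key)


def final(partials, n, key):
    """Same merge as intermediate, applied to the combined partial results."""
    return intermediate(partials, n, key)


records = [("a", 3), ("b", 9), ("c", 1), ("d", 7)]
by_score = lambda record: record[1]

# One initial() call per record, two combiner-style merges, then the final merge.
wrapped = [initial(record) for record in records]
part1 = intermediate(wrapped[:2], 2, by_score)
part2 = intermediate(wrapped[2:], 2, by_score)
top2 = final([part1, part2], 2, by_score)

print(top2)
```

Because initial only wraps its input, all of the top-N work happens in the merge steps, which is the simplification the comment suggests.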
[jira] Commented: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829690#action_12829690 ]

Olga Natkovich commented on PIG-1219:
---
I asked Pradeep to also review the code, just to have another set of eyes, since this change is so late in the game and is not straightforward.

Extra listStatus call to the namenode in WeightedRangePartitioner
---
Key: PIG-1219
URL: https://issues.apache.org/jira/browse/PIG-1219
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch

We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open quantile file. openDFSFile internally will check the existence of the quantile file, which adds burden to hdfs namenode. We shall remove this extra check.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1209) Port POJoinPackage to proactively spill
[ https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829691#action_12829691 ] Olga Natkovich commented on PIG-1209: - +1. Changes look good Port POJoinPackage to proactively spill --- Key: PIG-1209 URL: https://issues.apache.org/jira/browse/PIG-1209 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1209.patch POPackage proactively spills the bag whereas POJoinPackage still uses the SpillableMemoryManager. We should port this to use InternalCacheBag which proactively spills. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829694#action_12829694 ] Dmitriy V. Ryaboy commented on PIG-1217: I see, thanks for the tip. How does this work with tuple reuse -- can I just return the input tuple, or do I need to copy the contents to a new tuple in Top.Initial() ? No worries about 0.6, I'd rather it finally go out than try to get something like this in at the last moment. [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff, fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-259) allow store to overwrite existing directroy
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829697#action_12829697 ] Alan Gates commented on PIG-259: A few comments and questions on this: 1) We should make this work against the load/store branch instead of trunk. We're hoping to merge load/store into trunk in a week or two, so it makes more sense to put it there. This will also have implications for load/store. One, it will need to communicate to the new validate function that it's ok if the file (or whatever is being overwritten) exists. Two, load implementations will need to handle removing the file (or whatever) if necessary. For example, PigStorage will need to handle removing the file so MR doesn't complain. 2) Should we have overwrite be a keyword (as originally proposed and in the patch) or should it be string, like hints in join? I don't have a strong opinion one way or another but I think it's worth considering which we want. 3) Is the semantic of overwrite that it saves whether the file is there or not, or that it's an error if the file is not there to write? Write whether there or not makes more sense to me, but I wanted to make sure we all agree on it. 4) What happens when a user requests overwrite and the job fails before it runs? In the current implementation the file will be removed up front, so any planning errors will still result in the file being removed. Also, the file will be removed up front, even if the job remains in Hadoop's queue for a long time waiting to run. At the very least, I think Pig should delay removing the file until it is ready to launch the job so that type checking errors or whatever don't result in the file being removed when the job is not run. 
allow store to overwrite existing directroy --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: Pig_259.patch we have users who are asking for a flag to overwrite existing directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
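Point 4 of the comment above (delay the removal until the job is about to launch) can be sketched like this. The code is a Python illustration against the local filesystem, not Pig's InputOutputFileValidator or LOStore: StoreSpec, validate, remove_existing and run_script are hypothetical names, and the "write whether the file is there or not" semantic from point 3 is assumed.

```python
import os
import shutil
import tempfile


class StoreSpec:
    """Hypothetical stand-in for a STORE statement: a path plus an overwrite flag."""

    def __init__(self, path, overwrite=False):
        self.path = path
        self.overwrite = overwrite


def validate(store):
    """Planning-time check. It reports a conflict but never deletes anything."""
    if os.path.exists(store.path) and not store.overwrite:
        raise FileExistsError("Output location already exists: " + store.path)


def remove_existing(path):
    """Remove a file or directory if present; do nothing if it is absent."""
    if os.path.isdir(path):
        shutil.rmtree(path)
    elif os.path.exists(path):
        os.remove(path)


def run_script(store, plan, job):
    validate(store)
    plan()  # type checking and other planning steps; may raise
    if store.overwrite:
        remove_existing(store.path)  # deferred until just before launch
    job(store.path)


def failing_plan():
    raise ValueError("type checking failed")


def write_job(path):
    os.mkdir(path)


# Existing output directory containing one old file.
out = os.path.join(tempfile.mkdtemp(), "result")
os.mkdir(out)
open(os.path.join(out, "old"), "w").close()

try:
    run_script(StoreSpec(out, overwrite=True), failing_plan, write_job)
except ValueError:
    pass
survived_planning_error = os.path.exists(os.path.join(out, "old"))

run_script(StoreSpec(out, overwrite=True), lambda: None, write_job)
print(survived_planning_error, os.listdir(out))
```

With this ordering a planning error leaves the old output untouched, and the old data is removed only once the job is actually about to run. It does not address the case where the job then sits in Hadoop's queue for a long time.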
[jira] Commented: (PIG-259) allow store to overwrite existing directroy
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829728#action_12829728 ] Hadoop QA commented on PIG-259: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434801/Pig_259.patch against trunk revision 906326. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/192/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/192/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/192/console This message is automatically generated. allow store to overwrite existing directroy --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: Pig_259.patch we have users who are asking for a flag to overwrite existing directory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829731#action_12829731 ]

Olga Natkovich commented on PIG-1046:
---
(1) I think the error message for an invalid cogroup modifier should be made a little clearer. Something like: Only COLLECTED or REGULAR are valid GROUP modifiers.
(2) There seems to be some code duplication to support double quotes. It would be better to just issue a deprecation warning and keep the rest of the code in one place.
(3) Similar comments apply to the join part.

join algorithm specification is within double quotes
Key: PIG-1046
URL: https://issues.apache.org/jira/browse/PIG-1046
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
Fix For: 0.7.0
Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, pig-1046_3.patch

This fails - j = join l1 by $0, l2 by $0 using 'skewed'; This works - j = join l1 by $0, l2 by $0 using skewed; String constants are single-quoted in pig-latin. If the algorithm specification is supposed to be a string, specifying it within single quotes should be supported. Alternatively, we should be using identifiers here, since these are pre-defined in pig users will not be specifying arbitrary values that might not be valid identifier.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
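Review point (2) above, a single code path with only a deprecation warning for the double-quoted form, could look roughly like the following. This is a Python sketch, not the Pig grammar or the attached patch: parse_modifier is a hypothetical helper, accepting a bare identifier is an assumption taken from the alternative proposed in the issue text, and the error wording follows the message suggested in point (1).

```python
import warnings


def parse_modifier(token, valid, kind):
    """Normalize a modifier written as 'name', "name" or name, then validate it once."""
    if len(token) >= 2 and token[0] == token[-1] == '"':
        # Deprecated spelling: warn, then continue on the shared path.
        warnings.warn(
            "Double-quoted " + kind + " modifiers are deprecated; use single quotes.",
            DeprecationWarning,
        )
        name = token[1:-1]
    elif len(token) >= 2 and token[0] == token[-1] == "'":
        name = token[1:-1]
    else:
        name = token
    name = name.lower()
    if name not in valid:
        choices = " or ".join(sorted(v.upper() for v in valid))
        raise ValueError("Only " + choices + " are valid " + kind + " modifiers.")
    return name


GROUP_MODIFIERS = {"collected", "regular"}

print(parse_modifier("'collected'", GROUP_MODIFIERS, "GROUP"))
```

The same helper would serve the join part by passing the set of join algorithm names instead, which covers point (3) without a second copy of the logic.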
[jira] Commented: (PIG-1131) Pig simple join does not work when it contains empty lines
[ https://issues.apache.org/jira/browse/PIG-1131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829736#action_12829736 ]

Ashutosh Chauhan commented on PIG-1131:
---
Can't reproduce this on trunk. PIG-1194 touched the same piece of code and was recently checked in. That one might have fixed this one too. Viraj, can you please confirm whether you can reproduce it or some variant of it?

Pig simple join does not work when it contains empty lines
--
Key: PIG-1131
URL: https://issues.apache.org/jira/browse/PIG-1131
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Viraj Bhat
Assignee: Ashutosh Chauhan
Fix For: 0.7.0
Attachments: junk1.txt, junk2.txt, simplejoinscript.pig

I have a simple script, which does a JOIN.

{code}
input1 = load '/user/viraj/junk1.txt' using PigStorage(' ');
describe input1;
input2 = load '/user/viraj/junk2.txt' using PigStorage('\u0001');
describe input2;
joineddata = JOIN input1 by $0, input2 by $0;
describe joineddata;
store joineddata into 'result';
{code}

The input data contains empty lines. The join fails in the Map phase with the following error in POLocalRearrange.java:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:143)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.constructLROutput(POLocalRearrange.java:464)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:360)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POUnion.getNext(POUnion.java:162)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:253)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:244)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:94)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
    at org.apache.hadoop.mapred.Child.main(Child.java:159)

I am surprised that the test cases did not detect this error. Could we add this data, which contains empty lines, to the test cases?

Viraj

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
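The failure mode in this report, a positional field access on a tuple that is shorter than expected, can be modelled in a few lines. The sketch below is a Python illustration and does not reproduce PigStorage or POLocalRearrange: parse_line, field_or_null and inner_join are hypothetical helpers, and it assumes that an empty line yields a one-field tuple holding a null and that null join keys never match.

```python
def parse_line(line, delimiter):
    """Split a line into fields; empty fields become None (null)."""
    return tuple(field if field != "" else None for field in line.split(delimiter))


def field_or_null(tup, index):
    """Positional access that yields null instead of raising on short tuples."""
    return tup[index] if index < len(tup) else None


def inner_join(left, right, key_index=0):
    """Hash join on one positional key; rows with a null key are skipped."""
    by_key = {}
    for row in right:
        key = field_or_null(row, key_index)
        if key is not None:
            by_key.setdefault(key, []).append(row)
    joined = []
    for row in left:
        key = field_or_null(row, key_index)
        if key is None:
            continue
        for match in by_key.get(key, []):
            joined.append(row + match)
    return joined


# Both inputs contain an empty line, as in the report.
input1 = [parse_line(line, " ") for line in ["1 a", "", "2 b"]]
input2 = [parse_line(line, "\u0001") for line in ["1\u0001x", "", "3\u0001y"]]

print(inner_join(input1, input2))
```

A test input shaped like this one (ordinary rows mixed with empty lines) is the kind of data the reporter asks to have added to the test cases.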
[jira] Created: (PIG-1222) cast ends up with NULL value
cast ends up with NULL value
Key: PIG-1222
URL: https://issues.apache.org/jira/browse/PIG-1222
Project: Pig
Issue Type: Bug
Reporter: Ying He

I want to generate data with bags, so I did the following. Take a simple text file b.txt:

100 apple
200 orange
300 pear
400 apple

then run this query:

a = load 'b.txt' as (id, f);
b = group a by id;
store b into 'g' using BinStorage();

then run another query to load the data generated by the previous step:

a = load 'g/part*' using BinStorage() as (id, d:bag{t:(v, s)});
b = foreach a generate (double)id, flatten(d);
dump b;

I then got the following result:

(,100,apple)
(,100,apple)
(,200,orange)
(,200,apple)
(,300,strawberry)
(,300,pear)
(,400,pear)

The value for id is gone. If there is no cast, the result is correct.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829759#action_12829759 ]

Alan Gates commented on PIG-1178:
-
Comments that came out of a review of the twiki doc the Pig team did:

1) In OperatorPlan, the use of roots and leaves in the graph was considered confusing. Some people view roots as sources and some as sinks. It was recommended that we switch roots to sources and leaves to sinks to avoid confusion.

2) The new OperatorPlan does not include mergeSharedPlan, which was used by multi-query functionality in the old plan. After further investigation I found that merge is currently only used by multi-query for physical plans. While ideally we would like to use this infrastructure for physical plans too, I feel it is reasonable to put off adding merge until at least the initial prototyping phase is done. After briefly looking at it I see no reason why it should not work, though we may need a more precise way to decide when two nodes are the same and should be merged.

3) A point was raised that perhaps the optimizer should reset the annotations on the nodes after a transform and all the attached listeners have been run. With further thought, I don't think so, as there may be annotations we want to last across transforms. For example, a rule that could match an infinite number of times may want to sign a node to note it's already been there so that it does not fire on the node again. The easiest way to do this signing would be with the annotations. However, I can see that there would be a desire to clear certain annotations so that each pass of the optimizer has a fresh state. To accomplish this I was wondering if we should allow developers to add visitors that would be run after all the listeners run.
So PlanOptimizer would change to have a new method:
{code}
public void addStatusResettingVisitor(Visitor v) {
    resetters.add(v);
}
{code}
and in the optimize loop
{code}
for (OperatorPlan m : matches) {
    if (transformer.check(m)) {
        sawMatch = true;
        transformer.transform(m);
        for (PlanTransformListener l : listeners) {
            l.transformed(plan, transformer.reportChanges());
        }
    }
}
{code}
would change to be:
{code}
for (OperatorPlan m : matches) {
    if (transformer.check(m)) {
        sawMatch = true;
        transformer.transform(m);
        for (PlanTransformListener l : listeners) {
            l.transformed(plan, transformer.reportChanges());
        }
        // Run the status-resetting visitors after all listeners have run.
        for (Visitor v : resetters) {
            v.visit();
        }
    }
}
{code}
Thoughts?

4) It is not clear how column pruning will work in the new optimizer. Will it be represented by a rule? If so, how, since the new optimizer does not allow matching on any operator, only on specific operators? Would it be better instead to have it use the Transformers but not the PlanOptimizer infrastructure, since it isn't clear that we would want the column pruning rule to be triggered more than once? To answer these I think we should prototype the column pruning soon. It was one of the hardest parts of the existing infrastructure. We want to make sure it can be done well in this new approach before committing to the approach.

5) The comment was made that while the examples in the document appear to show that the proposal will work for nested plans (that is, inner plans in foreach), they do not show that it will work for operators not yet nestable in foreach (e.g. group, foreach). Since a stated goal of Pig Latin is to someday allow arbitrary nesting, we should validate that the proposal will support nesting these additional operators in foreach.
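The loop proposed in item 3 can be sketched as a self-contained Java class to make the ordering explicit: transform first, then every listener, then every status-resetting visitor. This is a minimal sketch, not the Pig implementation. The interfaces `Plan`, `Transformer`, `TransformListener` and `ResettingVisitor` are stand-ins invented here for OperatorPlan, the transformer, PlanTransformListener and Visitor, and `applyOnce` is a hypothetical name for one pass of the optimize loop.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the proposed PlanOptimizer change.
class OptimizerLoopSketch {
    interface Plan {}

    interface Transformer {
        boolean check(Plan match);
        void transform(Plan match);
        Plan reportChanges();
    }

    interface TransformListener {
        void transformed(Plan whole, Plan changes);
    }

    interface ResettingVisitor {
        void visit();
    }

    private final List<TransformListener> listeners = new ArrayList<>();
    private final List<ResettingVisitor> resetters = new ArrayList<>();

    void addListener(TransformListener l) {
        listeners.add(l);
    }

    void addStatusResettingVisitor(ResettingVisitor v) {
        resetters.add(v);
    }

    // One pass over the matches; returns true if any match was transformed.
    boolean applyOnce(Plan plan, List<Plan> matches, Transformer transformer) {
        boolean sawMatch = false;
        for (Plan m : matches) {
            if (transformer.check(m)) {
                sawMatch = true;
                transformer.transform(m);
                for (TransformListener l : listeners) {
                    l.transformed(plan, transformer.reportChanges());
                }
                // Resetters run only after all listeners have seen the change.
                for (ResettingVisitor v : resetters) {
                    v.visit();
                }
            }
        }
        return sawMatch;
    }

    public static void main(String[] args) {
        OptimizerLoopSketch opt = new OptimizerLoopSketch();
        opt.addStatusResettingVisitor(() -> System.out.println("annotations reset"));
        System.out.println("resetters registered: " + opt.resetters.size());
    }
}
```

Because the resetters are separate objects, a rule that wants its "already visited" signature to survive across transforms simply does not register a visitor that clears it.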
LogicalPlan and Optimizer are too complex and hard to work with --- Key: PIG-1178 URL: https://issues.apache.org/jira/browse/PIG-1178 Project: Pig Issue Type: Improvement Reporter: Alan Gates Assignee: Ying He Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, pig_1178.patch, PIG_1178.patch The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier
[jira] Updated: (PIG-1209) Port POJoinPackage to proactively spill
[ https://issues.apache.org/jira/browse/PIG-1209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1209: -- Resolution: Fixed Status: Resolved (was: Patch Available) Patch checked-in. Port POJoinPackage to proactively spill --- Key: PIG-1209 URL: https://issues.apache.org/jira/browse/PIG-1209 Project: Pig Issue Type: Bug Reporter: Sriranjan Manjunath Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1209.patch POPackage proactively spills the bag whereas POJoinPackage still uses the SpillableMemoryManager. We should port this to use InternalCacheBag which proactively spills. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1217) [piggybank] evaluation.util.Top is broken
[ https://issues.apache.org/jira/browse/PIG-1217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829811#action_12829811 ] Hadoop QA commented on PIG-1217: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434839/fix_top_udf.diff against trunk revision 906326. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/200/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/200/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/200/console This message is automatically generated. [piggybank] evaluation.util.Top is broken - Key: PIG-1217 URL: https://issues.apache.org/jira/browse/PIG-1217 Project: Pig Issue Type: Bug Affects Versions: 0.3.0, 0.3.1, 0.4.0, site, 0.5.0, 0.6.0, 0.7.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Priority: Minor Fix For: 0.7.0 Attachments: fix_top_udf.diff, fix_top_udf.diff The Top udf has been broken for a while, due to an incorrect implementation of getArgToFuncMapping. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1046: -- Status: Open (was: Patch Available) join algorithm specification is within double quotes Key: PIG-1046 URL: https://issues.apache.org/jira/browse/PIG-1046 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, pig-1046_3.patch This fails - j = join l1 by $0, l2 by $0 using 'skewed'; This works - j = join l1 by $0, l2 by $0 using skewed; String constants are single-quoted in pig-latin. If the algorithm specification is supposed to be a string, specifying it within single quotes should be supported. Alternatively, we should be using identifiers here, since these are pre-defined in pig users will not be specifying arbitrary values that might not be valid identifier. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
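One way to read the request is that the join algorithm name should be accepted whether it is written as a quoted string or as a bare identifier. The sketch below shows that normalization in plain Java. It is not taken from the Pig grammar or from any of the attached patches: `JoinAlgorithmSpec`, `normalize` and the `KNOWN` set are hypothetical, and the list of names is only illustrative.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical normalizer for the token that follows "using" in a join.
class JoinAlgorithmSpec {
    // Illustrative names only; the real set is defined by Pig's parser.
    static final Set<String> KNOWN =
            new HashSet<>(Arrays.asList("replicated", "skewed", "merge"));

    static String normalize(String token) {
        String t = token.trim();
        if (t.length() >= 2) {
            char first = t.charAt(0);
            char last = t.charAt(t.length() - 1);
            // Strip one pair of matching single or double quotes.
            if ((first == '\'' || first == '"') && first == last) {
                t = t.substring(1, t.length() - 1);
            }
        }
        t = t.toLowerCase(Locale.ROOT);
        if (!KNOWN.contains(t)) {
            throw new IllegalArgumentException("Unknown join algorithm: " + token);
        }
        return t;
    }

    public static void main(String[] args) {
        System.out.println(normalize("'skewed'"));
        System.out.println(normalize("skewed"));
    }
}
```

Validating against a fixed set reflects the point made in the issue that these values are predefined, so users never need to supply arbitrary strings.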
[jira] Updated: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1046: -- Attachment: pig-1046_4.patch Updated patch incorporating Olga's comments. join algorithm specification is within double quotes Key: PIG-1046 URL: https://issues.apache.org/jira/browse/PIG-1046 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, pig-1046_3.patch, pig-1046_4.patch This fails - j = join l1 by $0, l2 by $0 using 'skewed'; This works - j = join l1 by $0, l2 by $0 using skewed; String constants are single-quoted in pig-latin. If the algorithm specification is supposed to be a string, specifying it within single quotes should be supported. Alternatively, we should be using identifiers here, since these are pre-defined in pig users will not be specifying arbitrary values that might not be valid identifier. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-1046: -- Status: Patch Available (was: Open) join algorithm specification is within double quotes Key: PIG-1046 URL: https://issues.apache.org/jira/browse/PIG-1046 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, pig-1046_3.patch, pig-1046_4.patch This fails - j = join l1 by $0, l2 by $0 using 'skewed'; This works - j = join l1 by $0, l2 by $0 using skewed; String constants are single-quoted in pig-latin. If the algorithm specification is supposed to be a string, specifying it within single quotes should be supported. Alternatively, we should be using identifiers here, since these are pre-defined in pig users will not be specifying arbitrary values that might not be valid identifier. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1223) [zebra] Add cli to help admin zebra
[zebra] Add cli to help admin zebra --- Key: PIG-1223 URL: https://issues.apache.org/jira/browse/PIG-1223 Project: Pig Issue Type: Wish Reporter: He Yongqiang -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1219) Extra listStatus call to the namenode in WeightedRangePartitioner
[ https://issues.apache.org/jira/browse/PIG-1219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829865#action_12829865 ] Hadoop QA commented on PIG-1219: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434850/PIG-1219-3.patch against trunk revision 906326. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/193/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/193/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/193/console This message is automatically generated. Extra listStatus call to the namenode in WeightedRangePartitioner - Key: PIG-1219 URL: https://issues.apache.org/jira/browse/PIG-1219 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-1219-1.patch, PIG-1219-2.patch, PIG-1219-3.patch We call FileLocalizer.openDFSFile in WeightedRangePartitioner to open quantile file. openDFSFile internally will check the existence of the quantile file, which adds burden to hdfs namenode. We shall remove this extra check. -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
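The cost described in PIG-1219 can be illustrated with a toy file system that counts metadata calls. Checking for existence and then opening costs two calls, while opening directly costs one and still reports a missing file. This is a sketch under stated assumptions: `CountingFileSystem` is an in-memory stand-in invented here, not the Hadoop FileSystem API or FileLocalizer, and treating each operation as exactly one namenode call is a simplification.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory file system that counts metadata lookups.
class CountingFileSystem {
    private final Map<String, byte[]> files = new HashMap<>();
    private int metadataCalls = 0;

    void put(String path, byte[] data) {
        files.put(path, data);
    }

    int metadataCalls() {
        return metadataCalls;
    }

    boolean exists(String path) {
        metadataCalls++;
        return files.containsKey(path);
    }

    // Opening already detects a missing file, so no prior check is needed.
    byte[] open(String path) throws FileNotFoundException {
        metadataCalls++;
        byte[] data = files.get(path);
        if (data == null) {
            throw new FileNotFoundException(path);
        }
        return data;
    }

    // The pattern the issue wants to remove: check first, then open.
    byte[] openWithCheck(String path) throws FileNotFoundException {
        if (!exists(path)) {
            throw new FileNotFoundException(path);
        }
        return open(path);
    }

    public static void main(String[] args) throws FileNotFoundException {
        CountingFileSystem fs = new CountingFileSystem();
        fs.put("quantiles", new byte[] {1, 2, 3});
        fs.openWithCheck("quantiles");
        System.out.println("calls with check: " + fs.metadataCalls());
    }
}
```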
[jira] Updated: (PIG-834) incorrect plan when algebraic functions are nested
[ https://issues.apache.org/jira/browse/PIG-834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-834:
-
Attachment: pig-834.patch

In this patch, I look for a pattern of POUserFunc followed by another POUserFunc in the inner plan of ForEach, and if that is found I flag the combiner optimizer not to fire. This disables the combiner for this particular query (test case included). Is this fix sufficient for this bug?

incorrect plan when algebraic functions are nested
--
Key: PIG-834
URL: https://issues.apache.org/jira/browse/PIG-834
Project: Pig
Issue Type: Bug
Components: impl
Reporter: Thejas M Nair
Assignee: Ashutosh Chauhan
Fix For: 0.7.0
Attachments: pig-834.patch

a = load 'students.txt' as (c1,c2,c3,c4);
c = group a by c2;
f = foreach c generate COUNT(org.apache.pig.builtin.Distinct($1.$2));

Notice that the Distinct udf is missing in the Combiner and Reduce stages. As a result distinct does not function, and incorrect results are produced. Distinct should have been evaluated in all 3 stages, and the output of Distinct should be given to COUNT in the Reduce stage.
{code} # Map Reduce Plan #-- MapReduce node 1-122 Map Plan Local Rearrange[tuple]{bytearray}(false) - 1-139 | | | Project[bytearray][1] - 1-140 | |---New For Each(false,false)[bag] - 1-127 | | | POUserFunc(org.apache.pig.builtin.COUNT$Initial)[tuple] - 1-125 | | | |---POUserFunc(org.apache.pig.builtin.Distinct)[bag] - 1-126 | | | |---Project[bag][2] - 1-123 | | | |---Project[bag][1] - 1-124 | | | Project[bytearray][0] - 1-133 | |---Pre Combiner Local Rearrange[tuple]{Unknown} - 1-141 | |---Load(hdfs://wilbur11.labs.corp.sp1.yahoo.com/user/tejas/students.txt:org.apache.pig.builtin.PigStorage) - 1-111 Combine Plan Local Rearrange[tuple]{bytearray}(false) - 1-143 | | | Project[bytearray][1] - 1-144 | |---New For Each(false,false)[bag] - 1-132 | | | POUserFunc(org.apache.pig.builtin.COUNT$Intermediate)[tuple] - 1-130 | | | |---Project[bag][0] - 1-135 | | | Project[bytearray][1] - 1-134 | |---POCombinerPackage[tuple]{bytearray} - 1-137 Reduce Plan Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-121 | |---New For Each(false)[bag] - 1-120 | | | POUserFunc(org.apache.pig.builtin.COUNT$Final)[long] - 1-119 | | | |---Project[bag][0] - 1-136 | |---POCombinerPackage[tuple]{bytearray} - 1-145 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
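The check described in the patch comment, a user function whose direct input is another user function, can be sketched as a small tree walk. This is an illustration only and not the code in pig-834.patch: `Op` is a hypothetical stand-in for a physical operator in the inner plan, and `hasNestedUserFunc` is an invented name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of an inner plan; not Pig's PhysicalOperator classes.
class NestedUdfCheck {
    static final class Op {
        final String name;
        final boolean userFunc;
        final List<Op> inputs;

        Op(String name, boolean userFunc, Op... inputs) {
            this.name = name;
            this.userFunc = userFunc;
            this.inputs = Arrays.asList(inputs);
        }
    }

    // True if any user function in the tree is fed directly by another one.
    static boolean hasNestedUserFunc(Op op) {
        for (Op input : op.inputs) {
            if (op.userFunc && input.userFunc) {
                return true;
            }
            if (hasNestedUserFunc(input)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Op nested = new Op("COUNT", true,
                new Op("Distinct", true, new Op("Project", false)));
        System.out.println("disable combiner: " + hasNestedUserFunc(nested));
    }
}
```

In this model the query from the issue, COUNT over Distinct, is flagged, while a plain COUNT over a projection is not, so the combiner would only be disabled for the nested case.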
What's Zebra?
What's the architecture of Zebra? Does it depend on Pig and HDFS? Please help.

I hope it is as follows: DFS is better for unstructured data, but DTS (not Bigtable) is better for structured data. A data warehouse is structured, so I think a table is better than a file. DTS is as follows:

1. Break a logical big table into many small physical tables
2. Blocks do not need to be the same size
3. Blocks do not need to be in order
4. Only store structured data
5. Support block indexes
6. Support deleting and updating
7. The interfaces are SQL, but only for a block
8. Splitting a table horizontally and vertically is supported at the same time
9. ...
Re: What's Zebra?
You can refer to http://wiki.apache.org/pig/zebra

2010/2/5 jian yi eyj...@gmail.com

What's the architecture of Zebra? Does it depend on Pig and HDFS? Please help.

I hope it is as follows: DFS is better for unstructured data, but DTS (not Bigtable) is better for structured data. A data warehouse is structured, so I think a table is better than a file. DTS is as follows:

1. Break a logical big table into many small physical tables
2. Blocks do not need to be the same size
3. Blocks do not need to be in order
4. Only store structured data
5. Support block indexes
6. Support deleting and updating
7. The interfaces are SQL, but only for a block
8. Splitting a table horizontally and vertically is supported at the same time
9. ...

--
Best Regards

Jeff Zhang
[jira] Commented: (PIG-1046) join algorithm specification is within double quotes
[ https://issues.apache.org/jira/browse/PIG-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12829962#action_12829962 ] Hadoop QA commented on PIG-1046: +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12434884/pig-1046_4.patch against trunk revision 906657. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/201/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/201/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/201/console This message is automatically generated. join algorithm specification is within double quotes Key: PIG-1046 URL: https://issues.apache.org/jira/browse/PIG-1046 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Ashutosh Chauhan Fix For: 0.7.0 Attachments: pig-1046.patch, pig-1046_1.patch, pig-1046_2.patch, pig-1046_3.patch, pig-1046_4.patch This fails - j = join l1 by $0, l2 by $0 using 'skewed'; This works - j = join l1 by $0, l2 by $0 using skewed; String constants are single-quoted in pig-latin. If the algorithm specification is supposed to be a string, specifying it within single quotes should be supported. 
Alternatively, we should be using identifiers here, since these are pre-defined in pig users will not be specifying arbitrary values that might not be valid identifier. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-259) allow store to overwrite existing directory
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-259: --- Status: Open (was: Patch Available) allow store to overwrite existing directory --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: Pig_259.patch, Pig_259_2.patch We have users who are asking for a flag to overwrite an existing directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-259) allow store to overwrite existing directory
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-259: --- Attachment: Pig_259_2.patch allow store to overwrite existing directory --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: Pig_259.patch, Pig_259_2.patch We have users who are asking for a flag to overwrite an existing directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-259) allow store to overwrite existing directory
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829981#action_12829981 ]

Jeff Zhang commented on PIG-259:
---
Response to Alan regarding his comments:

1. I put the logic for deleting the output file in JobControlCompiler, so it is easy for me to delay the deletion until the dependent job is done.

2. I prefer using keywords rather than a string, because with a string the following statement:
{code}
store a into 'output' 'overwrite';
{code}
has two consecutive strings, which looks a little weird in my opinion.

3. I think the semantics of overwrite are the same as in a file system. In a file system, when we overwrite a file using the Java API, it won't complain even if the file does not exist.

allow store to overwrite existing directory
---
Key: PIG-259
URL: https://issues.apache.org/jira/browse/PIG-259
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Olga Natkovich
Assignee: Jeff Zhang
Fix For: 0.8.0
Attachments: Pig_259.patch, Pig_259_2.patch

We have users who are asking for a flag to overwrite an existing directory.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
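The overwrite semantics argued for in point 3 can be sketched with an in-memory model: storing with the overwrite flag replaces existing output and does not complain when nothing exists yet, while storing without it fails if the location is taken. This is a minimal sketch, not the patch. `OverwriteStoreSketch`, `store` and `read` are hypothetical names, and a map stands in for the file system.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory model of store with an overwrite flag.
class OverwriteStoreSketch {
    private final Map<String, String> outputs = new HashMap<>();

    void store(String location, String data, boolean overwrite) {
        if (!overwrite && outputs.containsKey(location)) {
            throw new IllegalStateException("Output location already exists: " + location);
        }
        // Removing a missing location is a no-op, so overwrite never complains.
        outputs.remove(location);
        outputs.put(location, data);
    }

    String read(String location) {
        return outputs.get(location);
    }

    public static void main(String[] args) {
        OverwriteStoreSketch fs = new OverwriteStoreSketch();
        fs.store("output", "first", true);
        fs.store("output", "second", true);
        System.out.println(fs.read("output"));
    }
}
```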
[jira] Commented: (PIG-259) allow store to overwrite existing directory
[ https://issues.apache.org/jira/browse/PIG-259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12829982#action_12829982 ] Jeff Zhang commented on PIG-259: Alan, should I create a new sub-task under PIG-966, or is there any way to move this task under PIG-966? allow store to overwrite existing directory --- Key: PIG-259 URL: https://issues.apache.org/jira/browse/PIG-259 Project: Pig Issue Type: Bug Affects Versions: 0.8.0 Reporter: Olga Natkovich Assignee: Jeff Zhang Fix For: 0.8.0 Attachments: Pig_259.patch, Pig_259_2.patch We have users who are asking for a flag to overwrite an existing directory. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.