[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742239#action_12742239 ] Alan Gates commented on PIG-911:

Dmitry,

First, this is great. We've had requests to read Sequence files. Being able to write them also would be great. A few thoughts:

1) This should not extend UTF8StorageConverter. This loader will be returning actual data types, not bytes that need to be interpreted. I would think instead that it should implement the bytesToX() methods itself and just throw an exception saying it didn't expect to do any conversion.

2) The getSampledTuple looks fine if skip is handling getting the stream to the point that reading the next tuple is viable.

3) In the bindTo call, where you obtain the key and value by reflection, should there be a try/catch block there in case the cast to Writable fails? In the same way, in describe schema you're asking how to suppress warnings from the cast in reader.getKeyClass(). But don't you want to check that what you got really is a Writable, since there is no guarantee?

> [Piggybank] SequenceFileLoader
> ---
>
> Key: PIG-911
> URL: https://issues.apache.org/jira/browse/PIG-911
> Project: Pig
> Issue Type: New Feature
> Reporter: Dmitriy V. Ryaboy
> Attachments: pig_sequencefile.patch
>
> The proposed piggybank contribution adds a SequenceFileLoader to the piggybank.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
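The check suggested in point 3 can be sketched in plain Java. This is an illustrative sketch, not the loader's code: the Writable interface is stubbed locally so the example is self-contained (the real one is org.apache.hadoop.io.Writable), and newWritable is a hypothetical helper showing an explicit isAssignableFrom check in place of a suppressed unchecked cast.

```java
final class WritableCheck {
    // Local stub standing in for org.apache.hadoop.io.Writable.
    interface Writable { }

    static class IntWritableStub implements Writable { }
    static class NotWritable { }

    // Hypothetical helper: instantiate a key/value class obtained by
    // reflection, failing with a clear error if it is not a Writable
    // rather than with a ClassCastException deep inside bindTo().
    static Writable newWritable(Class<?> c) {
        if (!Writable.class.isAssignableFrom(c)) {
            throw new IllegalArgumentException(
                c.getName() + " does not implement Writable");
        }
        try {
            return (Writable) c.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                "cannot instantiate " + c.getName(), e);
        }
    }
}
```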
[jira] Updated: (PIG-907) Provide multiple version of HashFNV (Piggybank)
[ https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-907:

Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch committed.

> Provide multiple version of HashFNV (Piggybank)
> ---
>
> Key: PIG-907
> URL: https://issues.apache.org/jira/browse/PIG-907
> Project: Pig
> Issue Type: Improvement
> Components: impl
> Affects Versions: 0.3.0
> Reporter: Daniel Dai
> Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-907-1.patch, PIG-907-2.patch
>
> HashFNV takes 1 or 2 parameters. Until PIG-902 is solved, it is better to create 2 versions of HashFNV so that Pig can pick the right version and do the type cast; otherwise, users have to do an explicit cast.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
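For reference, the 32-bit FNV-1 hash underlying HashFNV looks roughly like this, with one- and two-argument overloads mirroring the two UDF versions the issue proposes (the second argument bounds the result). This is an illustrative sketch, not the piggybank implementation; it hashes UTF-16 code units, whereas the real UDF works on the string's bytes.

```java
final class Fnv {
    // Standard 32-bit FNV offset basis and prime.
    private static final int FNV_32_INIT = 0x811c9dc5;
    private static final int FNV_32_PRIME = 0x01000193;

    // One-argument version: hash the string, returned as an unsigned
    // 32-bit value in a long.
    static long hash(String s) {
        int h = FNV_32_INIT;
        for (int i = 0; i < s.length(); i++) {
            h *= FNV_32_PRIME;   // FNV-1: multiply first...
            h ^= s.charAt(i);    // ...then XOR in the next unit.
        }
        return h & 0xffffffffL;
    }

    // Two-argument version: hash bounded to [0, maxValue).
    static long hash(String s, long maxValue) {
        return hash(s) % maxValue;
    }
}
```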
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-893: --- Resolution: Fixed Release Note: PIG-893: Added casts from chararray to int, long, float, and double. Status: Resolved (was: Patch Available) Patch checked in. Thanks Jeff for your work on this. > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > Attachments: Pig_893.Patch > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
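The cast behavior this issue asks for (return null and log a warning on failure, rather than raise an error) can be sketched in plain Java. CharArrayCasts and its methods are hypothetical names for illustration, not the Pig implementation, and the sketch omits Pig's warning logging:

```java
final class CharArrayCasts {
    // chararray -> int: null on malformed input or overflow
    // (Integer.valueOf throws NumberFormatException in both cases).
    static Integer castToInteger(String s) {
        if (s == null) return null;
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;  // Pig would also log a warning here.
        }
    }

    // chararray -> double: same null-on-failure contract.
    static Double castToDouble(String s) {
        if (s == null) return null;
        try {
            return Double.valueOf(s.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```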
[jira] Commented: (PIG-845) PERFORMANCE: Merge Join
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742231#action_12742231 ] Ashutosh Chauhan commented on PIG-845:

Hi Pradeep,

Thanks for the review. Please find my comments inline.

1) In LogicalPlanTester.java, why is the following change required? Typically when PigContext is constructed in Map-reduce mode, the properties should correspond to the cluster configuration. So the above initialization seems odd because the Properties object is an empty object in the constructor call above.
>> This is required because in local mode merge join gets rewritten as a regular join. So, if we had exec type as local, the plan which I get in MRCompiler corresponds to the regular join plan, against which we can't test the merge join plan. The Properties object has no bearing here, because LogicalPlanTester is used only for testing logical plans. Further, I think all our tests should have exec type as MapReduce because we want to test the correctness in MapReduce mode.

2) In PigMapBase.java: public static final String END_OF_INP_IN_MAP = "pig.stream.in.map"; can change to public static final String END_OF_INP_IN_MAP = "pig.blocking.operator.in.map"; and this should be put as a public static member of JobControlCompiler. In JobControlCompiler.java, jobConf.set("pig.stream.in.map", "true"); should change to use the above public static String.
>> Will update this in the new patch.

3) Remove the following comment in QueryParser.jjt (line 302): * Join parser. Currently can only handle skewed joins.
>> Will be removed in the next patch.

4) In QueryParser.jjt the joinPlans passed to the LOJoin constructor is not a LinkedMultiMap, but in LogToPhyTranslationVisitor the join plans are put in a LinkedMultiMap. If order is important, shouldn't QueryParser.jjt also change?
>> Good catch. Order is indeed important. Will fix this in the next patch.

5) Some comments in LogToPhyTranslationVisitor about the different lists and maps would help.
>> Those lists and maps were there earlier also; I didn't introduce anything new, I just moved them around :) But I agree that section needs to be documented better. It also took me a while to get my head around it. Will include a comment about the purpose of each in the next patch.

6) In validateMergeJoin() the code only considers direct successors and predecessors of LOJoin. It should check the entire plan and ensure that the predecessors of LOJoin, all the way to the LOLoad, are only LOForEach and LOFilter. Strictly we should not allow LOForEach, since it could change the sort order or position of the join keys and hence invalidate the index, but we need it so that the Foreach introduced by the TypeCastInserter (when there is a schema for either of the inputs) remains. You should note in the documentation that only order- and join-key-position-preserving Foreachs and Filters are allowed as predecessors to merge join, and check the same in validateMergeJoin(). It is better to use a whitelist of allowed operators than a blacklist of disallowed ones, since the blacklist would need to be updated any time a new operator comes along. The exception source here is not really a bug but a user input error, since merge join really does not support other ops. Again for the successors: all successors from merge join down to the map leaf should be checked to ensure stream is absent (really there should be no restriction on stream being present after the join; if there is an issue currently with this, it is fine to not allow stream, but eventually it would be good to not have any restriction on what follows the merge join). You can just use a visitor to check the presence of stream in the plan; this should be done after the complete LogToPhyTranslation is done, in visit(), so that the whole plan can be looked at.
>> Agreed. I fixed the bug for streaming. Now there is no restriction on what follows merge join. For predecessors, I included a new function which walks all the way up to make sure the operators preceding merge join are only those in the whitelist of LOLoad, LOForEach, and LOFilter.

7) Is MRStreamHandler.java now replaced by /org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java ?
>> Yes.

8) Some of the MRCompilerExceptions do not follow the Error handling spec - errcode, errMsg, Src.
>> Will update them.

9) Should assert() statements in MRCompiler be replaced with exceptions, since assertions are disabled by default in Java?
>> Will update them.

10) In MRCompiler.java I wonder if you should change rightMapPlan.disconnect(rightLoader, loadSucc); rightMapPlan.remove(loadSucc); to rightMapPlan.trimBelow(rightLoader); We really want to remove all operators in rightMapPlan other than the loader.
>> Didn't know about this function. This indeed is the one which is needed here.

11) We should note in d
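The predecessor whitelist check discussed in point 6 can be sketched as a walk over the plan. The operator classes below are toy stand-ins for Pig's logical-operator hierarchy (Load, Filter, Foreach, Distinct are illustrative substitutes for LOLoad, LOFilter, LOForEach, etc.), not the actual validateMergeJoin() code:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

final class MergeJoinValidator {
    // Toy operator stand-ins; each operator knows its predecessors.
    static class Op { final List<Op> preds = new ArrayList<>(); }
    static class Load extends Op { }
    static class Filter extends Op { }
    static class Foreach extends Op { }
    static class Distinct extends Op { }
    static class Join extends Op { }

    // Whitelist of operators allowed between the loads and a merge join;
    // a whitelist stays correct when new operators are added later.
    static final Set<Class<?>> ALLOWED =
        Set.of(Load.class, Filter.class, Foreach.class);

    // Walk all predecessors of the join up to the loads and reject the
    // plan if any operator is not whitelisted.
    static boolean validUpstream(Join join) {
        Deque<Op> stack = new ArrayDeque<>(join.preds);
        while (!stack.isEmpty()) {
            Op op = stack.pop();
            if (!ALLOWED.contains(op.getClass())) return false;
            stack.addAll(op.preds);
        }
        return true;
    }
}
```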
Build failed in Hudson: Pig-Patch-minerva.apache.org #159
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/

-- [...truncated 102729 lines...]

[Remainder of the console log omitted: repeated dfs.DataNode block-transfer and NameSystem.addStoredBlock messages plus mapReduceLayer job-setup output from the MiniDFS/MiniMR test cluster.]
[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742203#action_12742203 ] Hadoop QA commented on PIG-890:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416267/sampler.patch against trunk revision 803312.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/console

This message is automatically generated.

> Create a sampler interface and improve the skewed join sampler
> ---
>
> Key: PIG-890
> URL: https://issues.apache.org/jira/browse/PIG-890
> Project: Pig
> Issue Type: Improvement
> Reporter: Sriranjan Manjunath
> Attachments: sampler.patch
>
> We need a different sampler for order by and skewed join. We thus need a better sampling interface. The design of the same is described here: http://wiki.apache.org/pig/PigSampler

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742201#action_12742201 ] Jay Tang commented on PIG-833: -- Zebra has a dependency on TFile that is available in Hadoop 20; that's why the compilation instruction is more complicated. A new wiki at http://wiki.apache.org/pig/zebra will provide more information on Zebra. > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, > TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742170#action_12742170 ] Dmitriy V. Ryaboy commented on PIG-833:

Alan, this means Pig contrib/ is no longer compatible with Hadoop 18, which probably means that you need to either roll this back or roll 660 in (and add the hadoop20.jar file to lib/). Otherwise the build is broken.

> Storage access layer
>
> Key: PIG-833
> URL: https://issues.apache.org/jira/browse/PIG-833
> Project: Pig
> Issue Type: New Feature
> Reporter: Jay Tang
> Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz
>
> A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on.

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742168#action_12742168 ] Hadoop QA commented on PIG-913:

-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416263/PIG-913.patch against trunk revision 803312.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/console

This message is automatically generated.

> Error in Pig script when grouping on chararray column
> ---
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.4.0
> Reporter: Viraj Bhat
> Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
> I have a very simple script which fails at parsetime due to the schema I
> specified in the loader.
> {code} > data = LOAD '/user/viraj/studenttab10k' AS (s:chararray); > dataSmall = limit data 100; > bb = GROUP dataSmall by $0; > dump bb; > {code} > = > 2009-08-06 18:47:56,297 [main] INFO org.apache.pig.Main - Logging error > messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log > 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: > /homes/viraj/pig-svn/trunk/pig_1249609676296.log > 2009-08-06 18:47:56,459 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to hadoop file system at: hdfs://localhost:9000 > 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop > file system at: hdfs://localhost:9000 > 2009-08-06 18:47:56,694 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to map-reduce job tracker at: localhost:9001 > 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to > map-reduce job tracker at: localhost:9001 > 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1002: Unable to store alias bb > 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb > Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log > = > = > Pig Stack Trace > --- > ERROR 1002: Unable to store alias bb > org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias bb > at org.apache.pig.PigServer.openIterator(PigServer.java:481) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: > 
Unable to store alias bb > at org.apache.pig.PigServer.store(PigServer.java:536) > at org.apache.pig.PigServer.openIterator(PigServer.java:464) > ... 6 more > Caused by: java.lang.NullPointerException > at > org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359) > at > org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64) > at > org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335) > at or
Build failed in Hudson: Pig-Patch-minerva.apache.org #158
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/changes

Changes: [gates] PIG-833: Added Zebra, new columnar storage mechanism for HDFS.

-- [...truncated 103108 lines...]

[Remainder of the console log omitted: repeated dfs.DataNode block-transfer and NameSystem.addStoredBlock messages from the MiniDFS/MiniMR test cluster, plus one "Error in deleting blocks" IOException from a DataNode during block cleanup.]
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742150#action_12742150 ] Santhosh Srinivasan commented on PIG-913: - +1 for the fix. As Dmitriy indicates, we need new unit test cases after Hudson verifies the patch. > Error in Pig script when grouping on chararray column > - > > Key: PIG-913 > URL: https://issues.apache.org/jira/browse/PIG-913 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Viraj Bhat >Priority: Critical > Fix For: 0.4.0 > > Attachments: PIG-913.patch > > > I have a very simple script which fails at parsetime due to the schema I > specified in the loader. > {code} > data = LOAD '/user/viraj/studenttab10k' AS (s:chararray); > dataSmall = limit data 100; > bb = GROUP dataSmall by $0; > dump bb; > {code} > = > 2009-08-06 18:47:56,297 [main] INFO org.apache.pig.Main - Logging error > messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log > 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: > /homes/viraj/pig-svn/trunk/pig_1249609676296.log > 2009-08-06 18:47:56,459 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to hadoop file system at: hdfs://localhost:9000 > 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop > file system at: hdfs://localhost:9000 > 2009-08-06 18:47:56,694 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to map-reduce job tracker at: localhost:9001 > 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to > map-reduce job tracker at: localhost:9001 > 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1002: Unable to store alias bb > 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb > Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log > = > = > Pig Stack Trace > --- > ERROR 1002: Unable to store alias bb > 
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias bb > at org.apache.pig.PigServer.openIterator(PigServer.java:481) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: > Unable to store alias bb > at org.apache.pig.PigServer.store(PigServer.java:536) > at org.apache.pig.PigServer.openIterator(PigServer.java:464) > ... 6 more > Caused by: java.lang.NullPointerException > at > org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359) > at > org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64) > at > org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335) > at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67) > at > org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187) > at org.apache.pig.PigServer.compileLp(PigServer.java:854) > at org.apache.pig.PigServer.compileLp(PigServer.java:791) > at org.apache.pig.PigServer.store(PigServer.java:509) > ... 7 more > = -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742144#action_12742144 ] Alan Gates commented on PIG-893: I'm reviewing this patch. > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > Attachments: Pig_893.Patch > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-907) Provide multiple version of HashFNV (Piggybank)
[ https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742137#action_12742137 ] Olga Natkovich commented on PIG-907: +1 > Provide multiple version of HashFNV (Piggybank) > --- > > Key: PIG-907 > URL: https://issues.apache.org/jira/browse/PIG-907 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.3.0 >Reporter: Daniel Dai >Priority: Minor > Fix For: 0.4.0 > > Attachments: PIG-907-1.patch, PIG-907-2.patch > > > HashFNV takes 1 or 2 parameters. It is better to create 2 versions of HashFNV > while PIG-902 is not solved, so Pig can pick the right version and do > the type cast. Otherwise, users have to do an explicit cast. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
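For background on what HashFNV computes, here is a minimal plain-Java sketch of a 32-bit FNV-1a hash with the one- and two-parameter arities discussed above. It is a generic illustration, written under the assumption that the two-argument form folds the hash into [0, max); it is not the piggybank HashFNV source, and piggybank may use a different FNV variant:

```java
// Illustrative 32-bit FNV-1a hash with the one- and two-argument forms the
// issue describes: hash(value) and hash(value, max). The two-argument
// fold into [0, max) is an assumption for this sketch, not piggybank code.
class FnvSketch {
    private static final int FNV_OFFSET_BASIS = 0x811c9dc5;
    private static final int FNV_PRIME = 0x01000193; // 16777619

    // One-argument form: full 32-bit FNV-1a hash of the UTF-8 bytes,
    // returned as a non-negative long.
    static long hash(String value) {
        int h = FNV_OFFSET_BASIS;
        for (byte b : value.getBytes(java.nio.charset.StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);  // FNV-1a: xor the byte first...
            h *= FNV_PRIME;   // ...then multiply by the FNV prime
        }
        return h & 0xffffffffL; // mask to an unsigned 32-bit value
    }

    // Two-argument form: fold the hash into the range [0, max).
    static long hash(String value, long max) {
        return hash(value) % max;
    }
}
```

With two explicit arities like this, Pig can resolve the call and insert the implicit cast itself, which is the point of the patch while PIG-902 is unresolved.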
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Attachment: sampler.patch Made some constants static to clear the findbugs warnings. This patch does not warrant a new test case since it only affects the performance of the skewed join sampler, and the SkewedJoin test case already handles the correctness of the join. > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Status: Patch Available (was: Open) > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742136#action_12742136 ] Sriranjan Manjunath commented on PIG-890: - Let me know if you think that this requires a test case and I will be happy to include it. > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Attachment: (was: sampler.patch) > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742124#action_12742124 ] Daniel Dai commented on PIG-913: Thanks, Dmitriy, I will add a unit test. I am submitting the patch first to see if it breaks any existing unit tests. > Error in Pig script when grouping on chararray column > - > > Key: PIG-913 > URL: https://issues.apache.org/jira/browse/PIG-913 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Viraj Bhat >Priority: Critical > Fix For: 0.4.0 > > Attachments: PIG-913.patch > > > I have a very simple script which fails at parsetime due to the schema I > specified in the loader. > {code} > data = LOAD '/user/viraj/studenttab10k' AS (s:chararray); > dataSmall = limit data 100; > bb = GROUP dataSmall by $0; > dump bb; > {code} > = > 2009-08-06 18:47:56,297 [main] INFO org.apache.pig.Main - Logging error > messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log > 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: > /homes/viraj/pig-svn/trunk/pig_1249609676296.log > 2009-08-06 18:47:56,459 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to hadoop file system at: hdfs://localhost:9000 > 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop > file system at: hdfs://localhost:9000 > 2009-08-06 18:47:56,694 [main] INFO > org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting > to map-reduce job tracker at: localhost:9001 > 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to > map-reduce job tracker at: localhost:9001 > 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR > 1002: Unable to store alias bb > 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb > Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log > = > = > Pig Stack Trace > --- > ERROR 1002: Unable to store alias bb >
org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to > open iterator for alias bb > at org.apache.pig.PigServer.openIterator(PigServer.java:481) > at > org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531) > at > org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165) > at > org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141) > at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89) > at org.apache.pig.Main.main(Main.java:397) > Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: > Unable to store alias bb > at org.apache.pig.PigServer.store(PigServer.java:536) > at org.apache.pig.PigServer.openIterator(PigServer.java:464) > ... 6 more > Caused by: java.lang.NullPointerException > at > org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359) > at > org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64) > at > org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335) > at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46) > at > org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68) > at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) > at > org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67) > at > org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187) > at org.apache.pig.PigServer.compileLp(PigServer.java:854) > at org.apache.pig.PigServer.compileLp(PigServer.java:791) > at org.apache.pig.PigServer.store(PigServer.java:509) > ... 7 more > = -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Status: Open (was: Patch Available) > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742120#action_12742120 ] Dmitriy V. Ryaboy commented on PIG-913: --- Daniel -- throw in a test to check for optimizer regressions in the future? > Error in Pig script when grouping on chararray column > - > > Key: PIG-913 > URL: https://issues.apache.org/jira/browse/PIG-913 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Viraj Bhat >Priority: Critical > Fix For: 0.4.0 > > Attachments: PIG-913.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742118#action_12742118 ] Hadoop QA commented on PIG-890: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416250/sampler.patch against trunk revision 801865. +1 @author. The patch does not contain any @author tags. -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 6 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/console This message is automatically generated. > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Build failed in Hudson: Pig-Patch-minerva.apache.org #157
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/ -- [...truncated 103063 lines...] [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block blk_-6509224781215538639_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to blk_-6509224781215538639_1011 size 6 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 1 for block blk_-6509224781215538639_1011 terminating [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:38934 is added to blk_-6509224781215538639_1011 size 6 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block blk_-6509224781215538639_1011 of size 6 from /127.0.0.1 [exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 2 for block blk_-6509224781215538639_1011 terminating [exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to blk_-6509224781215538639_1011 size 6 [exec] [junit] 09/08/11 23:36:15 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:40772 [exec] [junit] 09/08/11 23:36:15 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:42304 [exec] [junit] 09/08/11 23:36:15 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/08/11 23:36:15 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: Unexpected error trying to delete block blk_-7801099502017534561_1004. BlockInfo not found in volumeMap. 
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block blk_-7252209396593481868_1006 file dfs/data/data7/current/blk_-7252209396593481868 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block blk_-1800239565210147527_1005 file dfs/data/data8/current/blk_-1800239565210147527 [exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks. [exec] [junit] at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146) [exec] [junit] at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793) [exec] [junit] at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663) [exec] [junit] at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888) [exec] [junit] at java.lang.Thread.run(Thread.java:619) [exec] [junit] [exec] [junit] 09/08/11 23:36:16 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/08/11 23:36:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908112335_0002/job.jar. 
blk_5812011963372313027_1012 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block blk_5812011963372313027_1012 src: /127.0.0.1:56518 dest: /127.0.0.1:37446 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block blk_5812011963372313027_1012 src: /127.0.0.1:53963 dest: /127.0.0.1:40940 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block blk_5812011963372313027_1012 src: /127.0.0.1:36671 dest: /127.0.0.1:56715 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 0 for block blk_5812011963372313027_1012 terminating [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 1 for block blk_5812011963372313027_1012 terminating [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to blk_5812011963372313027_1012 size 1480752 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1 [exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to blk_5812011963372313027_1012 size 1480752 [exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 2 for block blk_5812011963372313027_1012 terminating [exec] [junit] 09/08/11 23:36:16 IN
[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742117#action_12742117 ] Daniel Dai commented on PIG-913: The problem is caused by OpLimitOptimizer, which should use a correct way to rewire operators. > Error in Pig script when grouping on chararray column > - > > Key: PIG-913 > URL: https://issues.apache.org/jira/browse/PIG-913 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Viraj Bhat >Priority: Critical > Fix For: 0.4.0 > > Attachments: PIG-913.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-913: --- Status: Patch Available (was: Open) > Error in Pig script when grouping on chararray column > - > > Key: PIG-913 > URL: https://issues.apache.org/jira/browse/PIG-913 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Viraj Bhat >Priority: Critical > Fix For: 0.4.0 > > Attachments: PIG-913.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-913: --- Attachment: PIG-913.patch > Error in Pig script when grouping on chararray column > - > > Key: PIG-913 > URL: https://issues.apache.org/jira/browse/PIG-913 > Project: Pig > Issue Type: Bug >Affects Versions: 0.4.0 >Reporter: Viraj Bhat >Priority: Critical > Fix For: 0.4.0 > > Attachments: PIG-913.patch -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742100#action_12742100 ] Alan Gates commented on PIG-833: Patch checked in. All the unit tests passed. > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, > TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742093#action_12742093 ] Alan Gates commented on PIG-833: My bad. I missed the line in the instructions where it said to apply the PIG-660 patch. I applied that and am trying again. > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, > TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742083#action_12742083 ] Dmitriy V. Ryaboy commented on PIG-833: --- Alan -- if it's not finding .dfs , it's probably not linking hadoop20.jar Try my patch in 660 :-) > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, > TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-833: --- Attachment: TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt Okay, now that I've first built Pig's test, I run the tests and I get: {code} [delete] Deleting directory /Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs [mkdir] Created dir: /Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs [junit] Running org.apache.hadoop.zebra.io.TestCheckin [junit] Tests run: 125, Failures: 0, Errors: 0, Time elapsed: 16.894 sec [junit] Running org.apache.hadoop.zebra.mapred.TestCheckin [junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 158.741 sec [junit] Running org.apache.hadoop.zebra.pig.TestCheckin1 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.13 sec [junit] Test org.apache.hadoop.zebra.pig.TestCheckin1 FAILED [junit] Running org.apache.hadoop.zebra.pig.TestCheckin2 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.131 sec [junit] Test org.apache.hadoop.zebra.pig.TestCheckin2 FAILED [junit] Running org.apache.hadoop.zebra.pig.TestCheckin3 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.133 sec [junit] Test org.apache.hadoop.zebra.pig.TestCheckin3 FAILED [junit] Running org.apache.hadoop.zebra.pig.TestCheckin4 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec [junit] Test org.apache.hadoop.zebra.pig.TestCheckin4 FAILED [junit] Running org.apache.hadoop.zebra.pig.TestCheckin5 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec [junit] Test org.apache.hadoop.zebra.pig.TestCheckin5 FAILED [junit] Running org.apache.hadoop.zebra.types.TestCheckin [junit] Tests run: 45, Failures: 0, Errors: 0, Time elapsed: 0.253 sec {code} I've attached the output from one of the tests. 
> Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, > TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Attachment: sampler.patch The attached file has the redesigned sampler interface. Skewed join now uses a trivial implementation of the poisson sampling mechanism. > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
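The redesigned sampler interface itself lives on the PigSampler wiki page, so the following is only a minimal sketch of the general idea (a per-tuple probabilistic sampler); the class and method names here are hypothetical and not taken from sampler.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of a per-tuple sampler; the actual interface is
 *  described at http://wiki.apache.org/pig/PigSampler. */
public class SimpleSampler {
    private final double probability; // chance of keeping each tuple
    private final Random random;

    public SimpleSampler(double probability, long seed) {
        this.probability = probability;
        this.random = new Random(seed);
    }

    /** Returns true if the current tuple should be included in the sample. */
    public boolean accept() {
        return random.nextDouble() < probability;
    }

    /** Convenience: sample a list of items, keeping each with the given probability. */
    public <T> List<T> sample(List<T> items) {
        List<T> kept = new ArrayList<>();
        for (T item : items) {
            if (accept()) {
                kept.add(item);
            }
        }
        return kept;
    }
}
```

A real skewed-join sampler would feed the kept tuples into a histogram of key frequencies; this sketch only shows the sampling decision itself.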
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Status: Patch Available (was: Open) > Create a sampler interface and improve the skewed join sampler > -- > > Key: PIG-890 > URL: https://issues.apache.org/jira/browse/PIG-890 > Project: Pig > Issue Type: Improvement >Reporter: Sriranjan Manjunath > Attachments: sampler.patch > > > We need a different sampler for order by and skewed join. We thus need a > better sampling interface. The design of the same is described here: > http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742069#action_12742069 ] Raghu Angadi commented on PIG-833: -- Alan, in order to run unit tests you need to build pig test-core. As mentioned in the instructions above please run {{'ant -Dtestcase=none test-core'}} under top level directory before running 'ant test' under contrib/zebra. > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-833: --- Attachment: test.out When I run ant test in contrib/zebra, I get failures. I've attached the output of the command. > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch.bz2 > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch.bz2 Updated patch. Only change is that ant prints a descriptive error to user if hadoop20.jar does not exist in top level lib directory. It lists basic steps to get this built until PIG-660 is committed. > Storage access layer > > > Key: PIG-833 > URL: https://issues.apache.org/jira/browse/PIG-833 > Project: Pig > Issue Type: New Feature >Reporter: Jay Tang > Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, > PIG-833-zebra.patch.bz2, zebra-javadoc.tgz > > > A layer is needed to provide a high level data access abstraction and a > tabular view of data in Hadoop, and could free Pig users from implementing > their own data storage/retrieval code. This layer should also include a > columnar storage format in order to provide fast data projection, > CPU/space-efficient data serialization, and a schema language to manage > physical storage metadata. Eventually it could also support predicate > pushdown for further performance improvement. Initially, this layer could be > a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning
Change the pig hbase interface to get more than one row at a time when scanning --- Key: PIG-916 URL: https://issues.apache.org/jira/browse/PIG-916 Project: Pig Issue Type: Improvement Reporter: Alex Newman Priority: Trivial It should be significantly faster to get numerous rows at the same time rather than one row at a time for large table extraction processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
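The speedup here comes from amortizing the per-row round-trip cost. As a rough illustration only (this is not the HBase scanner API; the names below are hypothetical), fetching rows in batches looks like:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Illustrative sketch of batched fetching: pull up to batchSize rows per
 *  call instead of one at a time, so large table extractions pay the call
 *  overhead once per batch. Not the actual HBase client API. */
public class BatchedScanner<T> {
    private final Iterator<T> source; // stands in for the remote scanner
    private final int batchSize;

    public BatchedScanner(Iterator<T> source, int batchSize) {
        this.source = source;
        this.batchSize = batchSize;
    }

    /** Returns the next batch of rows; empty list when the scan is exhausted. */
    public List<T> nextBatch() {
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && source.hasNext()) {
            batch.add(source.next());
        }
        return batch;
    }
}
```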
[jira] Commented: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning
[ https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742008#action_12742008 ] Alex Newman commented on PIG-916: - Feel free to assign this to me. > Change the pig hbase interface to get more than one row at a time when > scanning > --- > > Key: PIG-916 > URL: https://issues.apache.org/jira/browse/PIG-916 > Project: Pig > Issue Type: Improvement >Reporter: Alex Newman >Priority: Trivial > > It should be significantly faster to get numerous rows at the same time > rather than one row at a time for large table extraction processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-915) Pig HBase
Pig HBase - Key: PIG-915 URL: https://issues.apache.org/jira/browse/PIG-915 Project: Pig Issue Type: Improvement Reporter: Alex Newman Priority: Minor Currently there is no way to get the row names when doing a query from HBase; we should probably remedy this, as important data may be stored there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings
[ https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741997#action_12741997 ] Alex Newman commented on PIG-914: - Someone should assign this to me. > Change the PIG hbase interface to use bytes along with strings > -- > > Key: PIG-914 > URL: https://issues.apache.org/jira/browse/PIG-914 > Project: Pig > Issue Type: Improvement >Reporter: Alex Newman >Priority: Minor > > Currently start rows, table names, and column names are all strings; since HBase > supports bytes, we might want to change the Pig interface to support bytes > along with strings. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-915) Pig HBase
[ https://issues.apache.org/jira/browse/PIG-915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741998#action_12741998 ] Alex Newman commented on PIG-915: - Feel free to assign this to me. > Pig HBase > - > > Key: PIG-915 > URL: https://issues.apache.org/jira/browse/PIG-915 > Project: Pig > Issue Type: Improvement >Reporter: Alex Newman >Priority: Minor > > Currently there is no way to get the row names when doing a query from HBase; > we should probably remedy this, as important data may be stored there. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-914) Change the PIG hbase interface to use bytes along with strings
Change the PIG hbase interface to use bytes along with strings -- Key: PIG-914 URL: https://issues.apache.org/jira/browse/PIG-914 Project: Pig Issue Type: Improvement Reporter: Alex Newman Priority: Minor Currently start rows, table names, and column names are all strings; since HBase supports bytes, we might want to change the Pig interface to support bytes along with strings. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741986#action_12741986 ] Alex Newman commented on PIG-759: - Someone can feel free to assign this to me. I will also fix up the syntax. Should we switch everything from Strings to bytes? Is that even possible to pass with Pig? > HBaseStorage scheme for Load/Slice function > --- > > Key: PIG-759 > URL: https://issues.apache.org/jira/browse/PIG-759 > Project: Pig > Issue Type: Bug >Reporter: Gunther Hagleitner > Attachments: patch.p1 > > > We would like to change the HBaseStorage function to use a scheme when > loading a table in pig. The scheme we are thinking of is: "hbase". So in > order to load an hbase table in a pig script the statement should read: > {noformat} > table = load 'hbase://' using HBaseStorage(); > {noformat} > If the scheme is omitted pig would assume the tablename to be an hdfs path > and the storage function would use the last component of the path as a table > name and output a warning. > For details on why see jira issue: PIG-758 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
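The fallback behavior described in the issue (explicit "hbase://" scheme, otherwise treat the string as an HDFS path and use its last component as the table name, with a warning) can be sketched as follows; the helper class and method names are illustrative, not taken from patch.p1.

```java
/** Sketch of the table-name resolution described in PIG-759. The names
 *  here are hypothetical stand-ins for whatever the actual patch uses. */
public class TableNameResolver {
    static final String SCHEME = "hbase://";

    /** Returns the resolved table name; warn[0] is set to true when the
     *  scheme was omitted and the HDFS-path fallback was used. */
    public static String resolve(String location, boolean[] warn) {
        if (location.startsWith(SCHEME)) {
            warn[0] = false;
            return location.substring(SCHEME.length());
        }
        // No scheme: assume an HDFS path and take the last path component,
        // which per the issue description should also emit a warning.
        warn[0] = true;
        int slash = location.lastIndexOf('/');
        return slash >= 0 ? location.substring(slash + 1) : location;
    }
}
```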
[jira] Updated: (PIG-759) HBaseStorage scheme for Load/Slice function
[ https://issues.apache.org/jira/browse/PIG-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Newman updated PIG-759: Attachment: patch.p1 This allows you to select start rows and end rows to filter the table. > HBaseStorage scheme for Load/Slice function > --- > > Key: PIG-759 > URL: https://issues.apache.org/jira/browse/PIG-759 > Project: Pig > Issue Type: Bug >Reporter: Gunther Hagleitner > Attachments: patch.p1 > > > We would like to change the HBaseStorage function to use a scheme when > loading a table in pig. The scheme we are thinking of is: "hbase". So in > order to load an hbase table in a pig script the statement should read: > {noformat} > table = load 'hbase://' using HBaseStorage(); > {noformat} > If the scheme is omitted pig would assume the tablename to be an hdfs path > and the storage function would use the last component of the path as a table > name and output a warning. > For details on why see jira issue: PIG-758 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741963#action_12741963 ] Hadoop QA commented on PIG-893: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416201/Pig_893.Patch against trunk revision 801865. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 9 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/console This message is automatically generated. > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > Attachments: Pig_893.Patch > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
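The semantics the issue asks for (cast returns null and logs a warning when conversion fails, e.g. on overflow) can be sketched for the integer case as below; this is a hedged illustration of the described behavior, not the code in Pig_893.Patch.

```java
/** Sketch of the PIG-893 cast semantics: convert a chararray to an Integer,
 *  returning null rather than throwing when parsing fails or the value
 *  overflows the int range. Not the actual patch code. */
public class CharArrayCasts {
    public static Integer castToInteger(String chararray) {
        if (chararray == null) {
            return null;
        }
        try {
            return Integer.valueOf(chararray.trim());
        } catch (NumberFormatException e) {
            // Malformed input and int overflow both land here; per the issue
            // description, real code would also log a warning at this point.
            return null;
        }
    }
}
```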
Hudson build is back to normal: Pig-Patch-minerva.apache.org #156
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Attachment: Pig_893.Patch Updated the patch. 1. Add license header. (for audit warning) 2. Change new Long(long) to Long.valueOf(long) for findbug warning > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > Attachments: Pig_893.Patch > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Status: Patch Available (was: Open) > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > Attachments: Pig_893.Patch > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Attachment: (was: Pig_893.Patch) > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Status: Open (was: Patch Available) > support cast of chararray to other simple types > --- > > Key: PIG-893 > URL: https://issues.apache.org/jira/browse/PIG-893 > Project: Pig > Issue Type: New Feature >Affects Versions: 0.4.0 >Reporter: Thejas M Nair >Assignee: Jeff Zhang > Fix For: 0.4.0 > > > Pig should support casting of chararray to > integer,long,float,double,bytearray. If the conversion fails for reasons such > as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Hudson build is back to normal: Pig-trunk #519
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/519/
[jira] Commented: (PIG-845) PERFORMANCE: Merge Join
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741733#action_12741733 ] Ashutosh Chauhan commented on PIG-845: -- Hi Dmitriy, Thanks for the review. Please find my comments inline. 1. EndOfAllInput flags - could you add comments here about what the point of this flag is? You explain what EndOfAllInputSetter does (which is actually rather self-explanatory) but not what the meaning of the flag is and how it's used. There is a bit of an explanation in PigMapBase, but it really belongs here. >> The EndOfAllInput flag basically indicates that on the close() call of a >> map/reduce task, the pipeline should be run once more. Until now it was used only by >> POStream, but now POMergeJoin also makes use of it. 2. Could you explain the relationship between EndOfAllInput and (deleted) POStream? >> POStream is still there; I guess you are referring to MRStreamHandler, which >> is deleted. It's a renaming of the class. Now that POMergeJoin also makes use of >> it, it's better to give it a generic name like EndOfAllInput instead of >> MRStreamHandler. 3. Comments in MRCompiler alternate between referring to the left MROp as LeftMROper and curMROper. Choose one. >> Ya, will update the comments. 4. I am curious about the decision to throw compiler exceptions if MergeJoin requirements re number of inputs, etc, aren't satisfied. It seems like a better user experience would be to log a warning and fall back to a regular join. >> Ya, a good suggestion. It would be straightforward to do it while parsing >> (e.g. when there are more than two inputs), though it's not straightforward >> to do at logical-to-physical-plan and physical-to-MRJobs translation time. 5. Style notes for visitMergeJoin: It's a 200-line method. Any way you can break it up into smaller components? As is, it's hard to follow. >> I can break it up, but that will bloat the MRCompiler class size. 
A better >> idea is to have an MRCompilerHelper or some such class where all the low-level >> helper functions live, so that MRCompiler itself is small and thus easier to >> read. The if statements should be broken up into multiple lines to agree with the style guides. Variable naming: you've got topPrj, prj, pkg, lr, ce, nig... one at a time they are fine, but together in a 200-line method they are unreadable. Please consider more descriptive names. >> Will use more descriptive names in the next patch. 6. Kind of a global comment, since it applies to more than just MergeJoin: It seems to me like we need a Builder for operators to clean up some of the new, set, set, set stuff. Having the setters return this and a Plan's add() method return the plan would let us replace this: POProject topPrj = new POProject(new OperatorKey(scope,nig.getNextNodeId(scope))); topPrj.setColumn(1); topPrj.setResultType(DataType.TUPLE); topPrj.setOverloaded(true); rightMROpr.reducePlan.add(topPrj); rightMROpr.reducePlan.connect(pkg, topPrj); with this: POProject topPrj = new POProject(new OperatorKey(scope,nig.getNextNodeId(scope))) .setColumn(1).setResultType(DataType.TUPLE) .setOverloaded(true); rightMROpr.reducePlan.add(topPrj).connect(pkg, topPrj) >> I agree. At many places there are too many parameters to set. Setters should >> be smart and should return the object instead of being void, and then this >> idea of chaining will help to cut down the number of lines. 7. Is the change to List> keyTypes in POFRJoin related to MergeJoin or just rolled in? >> POFRJoin can do without this change, but to avoid code duplication, I updated POFRJoin to use List> keyTypes. 8. MergeJoin: break getNext() into components. >> I don't want to do that because it already has lots of class members which >> are getting updated at various places. Making those variables live in >> multiple functions will make the logic even harder to follow. Also, I am >> not sure if the Java compiler can always inline the private methods. 
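The chained-setter pattern proposed in point 6 of the review above can be sketched with a small stand-in class; FluentProject below is hypothetical and is not the real POProject.

```java
/** Minimal sketch of the chained-setter idea: each setter returns this, so
 *  construction reads as a single expression. FluentProject is a hypothetical
 *  stand-in for an operator class like POProject. */
public class FluentProject {
    private int column;
    private String resultType;
    private boolean overloaded;

    public FluentProject setColumn(int column) {
        this.column = column;
        return this; // returning this enables chaining
    }

    public FluentProject setResultType(String resultType) {
        this.resultType = resultType;
        return this;
    }

    public FluentProject setOverloaded(boolean overloaded) {
        this.overloaded = overloaded;
        return this;
    }

    public int getColumn() { return column; }
    public boolean isOverloaded() { return overloaded; }
}
```

Usage then collapses the new/set/set/set sequence into one expression: `FluentProject p = new FluentProject().setColumn(1).setResultType("TUPLE").setOverloaded(true);`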
I don't see you supporting left outer joins. Plans for that? At least document the planned approach. >> Ya, outer joins are currently not supported. It's documented in the >> specification. Will include a comment in the code also. Error codes being declared deep inside classes, and documented on the wiki, is a poor practice, imo. They should be pulled out into PigErrors (as lightweight final objects that have an error code, a name, and a description...) I thought Santhosh made progress on this already, no? >> Not sure if I understand you completely. I am using ExecException, >> FrontendException, etc. Aren't these the lightweight final objects you are >> referring to? Could you explain the problem with splits and streams? Why can't this work for them? >> Streaming after the join will be supported. There was a bug which I fixed, >> and the fix will be part of the next patch. Streaming before the join will not be >> supported because in the endOfAllInput case, str