[jira] Commented: (PIG-845) PERFORMANCE: Merge Join
[ https://issues.apache.org/jira/browse/PIG-845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741733#action_12741733 ]

Ashutosh Chauhan commented on PIG-845:

Hi Dmitriy,

Thanks for the review. Please find my comments inline.

> 1. EndOfAllInput flags - could you add comments here about what the point of this flag is? You explain what EndOfAllInputSetter does (which is actually rather self-explanatory) but not what the meaning of the flag is and how it's used. There is a bit of an explanation in PigMapBase, but it really belongs here.

The EndOfAllInput flag indicates that, on the close() call of a map/reduce task, the pipeline should be run once more. Until now it was used only by POStream, but POMergeJoin now makes use of it as well.

> 2. Could you explain the relationship between EndOfAllInput and (deleted) POStream?

POStream is still there; I guess you are referring to MRStreamHandler, which is deleted. It is a renaming of the class. Now that POMergeJoin also makes use of it, it is better to give it a generic name like EndOfAllInput instead of MRStreamHandler.

> 3. Comments in MRCompiler alternate between referring to the left MROp as LeftMROper and curMROper. Choose one.

Yes, I will update the comments.

> 4. I am curious about the decision to throw compiler exceptions if MergeJoin requirements re number of inputs, etc, aren't satisfied. It seems like a better user experience would be to log a warning and fall back to a regular join.

Yes, a good suggestion. It would be straightforward to do while parsing (e.g. when there are more than two inputs). It is not straightforward, though, at logical to physical plan and physical to MRJobs translation time.

> 5. Style notes for visitMergeJoin: It's a 200-line method. Any way you can break it up into smaller components? As is, it's hard to follow.

I can break it up, but that will bloat the MRCompiler class. A better idea is to have an MRCompilerHelper or some such class where all the low-level helper functions live, so that MRCompiler itself is small and thus easier to read.

> The if statements should be broken up into multiple lines to agree with the style guides. Variable naming: you've got topPrj, prj, pkg, lr, ce, nig.. one at a time they are fine, but together in a 200-line method they are unreadable. Please consider more descriptive names.

I will use more descriptive names in the next patch.

> 6. Kind of a global comment, since it applies to more than just MergeJoin: It seems to me like we need a Builder for operators to clean up some of the new, set, set, set stuff. Having the setters return this and a Plan's add() method return the plan, would let us replace this:
>
>     POProject topPrj = new POProject(new OperatorKey(scope,nig.getNextNodeId(scope)));
>     topPrj.setColumn(1);
>     topPrj.setResultType(DataType.TUPLE);
>     topPrj.setOverloaded(true);
>     rightMROpr.reducePlan.add(topPrj);
>     rightMROpr.reducePlan.connect(pkg, topPrj);
>
> with this:
>
>     POProject topPrj = new POProject(new OperatorKey(scope,nig.getNextNodeId(scope)))
>         .setColumn(1).setResultType(DataType.TUPLE)
>         .setOverloaded(true);
>     rightMROpr.reducePlan.add(topPrj).connect(pkg, topPrj)

I agree. In many places there are too many parameters to set. Setters should return the object instead of being void; chaining them would then cut down the number of lines.

> 7. Is the change to ListListByte keyTypes in POFRJoin related to MergeJoin or just rolled in?

POFRJoin can do without this change, but to avoid code duplication I updated POFRJoin to use ListListByte keyTypes.

> 8. MergeJoin: break getNext() into components.

I don't want to do that, because it already has lots of class members that get updated at various places. Having those variables updated from multiple functions will make the logic even harder to follow. Also, I am not sure whether the Java compiler can always inline the private methods.

> I don't see you supporting Left outer joins. Plans for that? At least document the planned approach.

Yes, outer joins are currently not supported. This is documented in the specification; I will include a comment in the code as well.

> Error codes being declared deep inside classes, and documented on the wiki, is a poor practice, imo. They should be pulled out into PigErrors (as lightweight final objects that have an error code, a name, and a description..) I thought Santhosh made progress on this already, no?

I am not sure I understand you completely. I am using ExecException, FrontEndException etc. Aren't these the lightweight final objects you are referring to?

> Could you explain the problem with splits and streams? Why can't this work for them?

Streaming after the join will be supported. There was a bug, which I fixed; the fix will be part of the next patch. Streaming before the join will not be supported because in endOfAllInput case, streaming may potentially produce multiple tuples
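The chaining idea from item 6 of the review can be illustrated with a small, self-contained sketch. `SketchOperator` and `SketchPlan` are hypothetical stand-ins invented for this example. They are not Pig's POProject or plan classes, and the real signatures in Pig may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an operator such as POProject; not Pig's real class.
final class SketchOperator {
    private final String key;
    private int column;
    private String resultType = "UNKNOWN";
    private boolean overloaded;

    SketchOperator(String key) { this.key = key; }

    // Each setter returns this so that calls can be chained.
    SketchOperator setColumn(int column) { this.column = column; return this; }
    SketchOperator setResultType(String resultType) { this.resultType = resultType; return this; }
    SketchOperator setOverloaded(boolean overloaded) { this.overloaded = overloaded; return this; }

    String getKey() { return key; }
    int getColumn() { return column; }
    String getResultType() { return resultType; }
    boolean isOverloaded() { return overloaded; }
}

// Hypothetical stand-in for a plan: add() and connect() return the plan itself.
final class SketchPlan {
    private final List<SketchOperator> operators = new ArrayList<>();
    private final List<String> edges = new ArrayList<>();

    SketchPlan add(SketchOperator op) { operators.add(op); return this; }

    SketchPlan connect(SketchOperator from, SketchOperator to) {
        if (!operators.contains(from) || !operators.contains(to)) {
            throw new IllegalArgumentException("both operators must be added before connecting");
        }
        edges.add(from.getKey() + "->" + to.getKey());
        return this;
    }

    int size() { return operators.size(); }
    List<String> edges() { return edges; }
}

final class FluentSetterDemo {
    public static void main(String[] args) {
        SketchOperator pkg = new SketchOperator("pkg");
        // Construction and configuration collapse into one expression.
        SketchOperator topPrj = new SketchOperator("topPrj")
                .setColumn(1).setResultType("TUPLE").setOverloaded(true);
        SketchPlan plan = new SketchPlan().add(pkg).add(topPrj).connect(pkg, topPrj);
        System.out.println(plan.edges());
    }
}
```

One trade-off of this style is that a failure inside a long chain is harder to locate from a stack trace line number, so short chains per statement tend to be easier to debug.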
Hudson build is back to normal: Pig-trunk #519
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/519/
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Status: Open (was: Patch Available) support cast of chararray to other simple types --- Key: PIG-893 URL: https://issues.apache.org/jira/browse/PIG-893 Project: Pig Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Jeff Zhang Fix For: 0.4.0 Pig should support casting of chararray to integer,long,float,double,bytearray. If the conversion fails for reasons such as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Attachment: (was: Pig_893.Patch) support cast of chararray to other simple types --- Key: PIG-893 URL: https://issues.apache.org/jira/browse/PIG-893 Project: Pig Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Jeff Zhang Fix For: 0.4.0 Pig should support casting of chararray to integer,long,float,double,bytearray. If the conversion fails for reasons such as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-893: --- Status: Patch Available (was: Open) support cast of chararray to other simple types --- Key: PIG-893 URL: https://issues.apache.org/jira/browse/PIG-893 Project: Pig Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Jeff Zhang Fix For: 0.4.0 Attachments: Pig_893.Patch Pig should support casting of chararray to integer,long,float,double,bytearray. If the conversion fails for reasons such as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang updated PIG-893:
    Attachment: Pig_893.Patch

Updated the patch.
1. Add license header. (for audit warning)
2. Change new Long(long) to Long.valueOf(long) for findbug warning

support cast of chararray to other simple types
    Key: PIG-893
    URL: https://issues.apache.org/jira/browse/PIG-893
    Project: Pig
    Issue Type: New Feature
    Affects Versions: 0.4.0
    Reporter: Thejas M Nair
    Assignee: Jeff Zhang
    Fix For: 0.4.0
    Attachments: Pig_893.Patch

Pig should support casting of chararray to integer, long, float, double, bytearray. If the conversion fails for reasons such as overflow, cast should return null and log a warning.
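The behaviour requested in the issue (return null and log a warning when a conversion fails) can be sketched as follows. This is only an illustration under stated assumptions: `CharArrayCastSketch` and its methods are hypothetical names, java.util.logging stands in for whatever logging Pig uses, and none of this is taken from the attached patch.

```java
import java.util.logging.Logger;

// Hypothetical helper, not the code in Pig_893.Patch.
final class CharArrayCastSketch {
    private static final Logger LOG = Logger.getLogger(CharArrayCastSketch.class.getName());

    // Returns the parsed value, or null (with a warning) if the text is not a valid int.
    // Integer.valueOf throws NumberFormatException on overflow as well as on bad syntax.
    static Integer toInteger(String chararray) {
        if (chararray == null) {
            return null;
        }
        try {
            return Integer.valueOf(chararray.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast chararray '" + chararray + "' to integer; returning null");
            return null;
        }
    }

    // Same contract for long. Long.valueOf is used rather than new Long(...),
    // in line with the Findbugs note in the message above.
    static Long toLong(String chararray) {
        if (chararray == null) {
            return null;
        }
        try {
            return Long.valueOf(chararray.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast chararray '" + chararray + "' to long; returning null");
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(toInteger("42"));
        System.out.println(toInteger("99999999999")); // too large for an int
        System.out.println(toLong("99999999999"));
    }
}
```

Returning null instead of throwing lets one malformed record pass through a job without failing it, at the cost of making bad input visible only in the logs.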
Hudson build is back to normal: Pig-Patch-minerva.apache.org #156
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/156/
[jira] Created: (PIG-915) Pig HBase
Pig HBase
    Key: PIG-915
    URL: https://issues.apache.org/jira/browse/PIG-915
    Project: Pig
    Issue Type: Improvement
    Reporter: Alex Newman
    Priority: Minor

Currently there is no way to get the row names when querying HBase. We should probably remedy this, as important data may be stored there.
[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings
[ https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741997#action_12741997 ]

Alex Newman commented on PIG-914:

Someone should assign this to me.

Change the PIG hbase interface to use bytes along with strings
    Key: PIG-914
    URL: https://issues.apache.org/jira/browse/PIG-914
    Project: Pig
    Issue Type: Improvement
    Reporter: Alex Newman
    Priority: Minor

Currently start rows, table names, and column names are all strings. Since HBase supports bytes, we might want to change the Pig interface to support bytes along with strings.
[jira] Created: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning
Change the pig hbase interface to get more than one row at a time when scanning --- Key: PIG-916 URL: https://issues.apache.org/jira/browse/PIG-916 Project: Pig Issue Type: Improvement Reporter: Alex Newman Priority: Trivial It should be significantly faster to get numerous rows at the same time rather than one row at a time for large table extraction processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
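The idea of fetching several rows per call can be sketched generically. In the example below a plain `Iterator` stands in for a remote scanner, and `BatchedScanSketch.nextBatch` is a hypothetical helper. It does not use the HBase client API, and how a batch size would actually be exposed in Pig's HBase interface is left open.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: the Iterator stands in for a scanner whose every
// next() call may cost a round trip to a server.
final class BatchedScanSketch {

    // Pulls up to batchSize rows in one call; an empty list means the scan is exhausted.
    static <T> List<T> nextBatch(Iterator<T> scanner, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<T> batch = new ArrayList<>(batchSize);
        while (batch.size() < batchSize && scanner.hasNext()) {
            batch.add(scanner.next());
        }
        return batch;
    }

    public static void main(String[] args) {
        Iterator<String> scanner = Arrays.asList("row1", "row2", "row3", "row4", "row5").iterator();
        List<String> batch;
        // The caller loops over batches instead of over single rows.
        while (!(batch = nextBatch(scanner, 2)).isEmpty()) {
            System.out.println(batch);
        }
    }
}
```

With a real remote scanner the gain would come from amortising the per-call overhead over the whole batch; the batch size then trades memory per call against the number of calls.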
[jira] Commented: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning
[ https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742008#action_12742008 ] Alex Newman commented on PIG-916: - Feel free to assign this to me. Change the pig hbase interface to get more than one row at a time when scanning --- Key: PIG-916 URL: https://issues.apache.org/jira/browse/PIG-916 Project: Pig Issue Type: Improvement Reporter: Alex Newman Priority: Trivial It should be significantly faster to get numerous rows at the same time rather than one row at a time for large table extraction processes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch.bz2 Updated patch. Only change is that ant prints a descriptive error to user if hadoop20.jar does not exist in top level lib directory. It lists basic steps to get this built until PIG-660 is committed. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-833: - Attachment: PIG-833-zebra.patch.bz2 Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742069#action_12742069 ] Raghu Angadi commented on PIG-833: -- Alan, in order to run unit tests you need to build pig test-core. As mentioned in the instructions above please run {{'ant -Dtestcase=none test-core'}} under top level directory before running 'ant test' under contrib/zebra. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sriranjan Manjunath updated PIG-890:
    Attachment: sampler.patch

The attached file has the redesigned sampler interface. Skewed join now uses a trivial implementation of the Poisson sampling mechanism.

Create a sampler interface and improve the skewed join sampler
    Key: PIG-890
    URL: https://issues.apache.org/jira/browse/PIG-890
    Project: Pig
    Issue Type: Improvement
    Reporter: Sriranjan Manjunath
    Attachments: sampler.patch

We need different samplers for order by and skewed join, and therefore a better sampling interface. The design is described here: http://wiki.apache.org/pig/PigSampler
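As a rough illustration of a pluggable sampler, the sketch below defines a small interface and one implementation that keeps each row independently with a fixed probability. That is the textbook notion of Poisson sampling with equal inclusion probabilities, also called Bernoulli sampling. All names here (`RowSampler`, `IndependentSampler`, `SamplingSketch`) are hypothetical, and this is not the interface or the algorithm in sampler.patch, which may use the term differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical sampler interface: decides row by row whether to keep a row.
interface RowSampler<T> {
    boolean accept(T row);
}

// Keeps each row independently with the same probability.
final class IndependentSampler<T> implements RowSampler<T> {
    private final double probability;
    private final Random random;

    IndependentSampler(double probability, long seed) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be in [0, 1]");
        }
        this.probability = probability;
        this.random = new Random(seed);
    }

    @Override
    public boolean accept(T row) {
        // nextDouble() is in [0, 1), so 1.0 keeps every row and 0.0 keeps none.
        return random.nextDouble() < probability;
    }
}

final class SamplingSketch {
    // Runs any sampler over the input, so order by and skewed join could plug in different ones.
    static <T> List<T> sample(Iterable<T> rows, RowSampler<T> sampler) {
        List<T> kept = new ArrayList<>();
        for (T row : rows) {
            if (sampler.accept(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        System.out.println(sample(rows, new IndependentSampler<Integer>(0.3, 42L)));
    }
}
```

Keeping the sampling decision behind an interface is what allows different operators to use different strategies without changing the code that reads the rows.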
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Status: Patch Available (was: Open) Create a sampler interface and improve the skewed join sampler -- Key: PIG-890 URL: https://issues.apache.org/jira/browse/PIG-890 Project: Pig Issue Type: Improvement Reporter: Sriranjan Manjunath Attachments: sampler.patch We need a different sampler for order by and skewed join. We thus need a better sampling interface. The design of the same is described here: http://wiki.apache.org/pig/PigSampler -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-833:
    Attachment: TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt

Okay, now that I've first built Pig's test, I run the tests and I get:

{code}
[delete] Deleting directory /Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs
[mkdir] Created dir: /Users/gates/src/pig/apache/top/zebra/trunk/build/contrib/zebra/test/logs
[junit] Running org.apache.hadoop.zebra.io.TestCheckin
[junit] Tests run: 125, Failures: 0, Errors: 0, Time elapsed: 16.894 sec
[junit] Running org.apache.hadoop.zebra.mapred.TestCheckin
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 158.741 sec
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin1
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.13 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin1 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin2
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.131 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin2 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin3
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.133 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin3 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin4
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin4 FAILED
[junit] Running org.apache.hadoop.zebra.pig.TestCheckin5
[junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.128 sec
[junit] Test org.apache.hadoop.zebra.pig.TestCheckin5 FAILED
[junit] Running org.apache.hadoop.zebra.types.TestCheckin
[junit] Tests run: 45, Failures: 0, Errors: 0, Time elapsed: 0.253 sec
{code}

I've attached the output from one of the tests.
Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742083#action_12742083 ]

Dmitriy V. Ryaboy commented on PIG-833:

Alan -- if it's not finding .dfs, it's probably not linking hadoop20.jar. Try my patch in 660 :-)

Storage access layer
    Key: PIG-833
    URL: https://issues.apache.org/jira/browse/PIG-833
    Project: Pig
    Issue Type: New Feature
    Reporter: Jay Tang
    Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz

A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742093#action_12742093 ] Alan Gates commented on PIG-833: My bad. I missed the line in the instructions where it said to apply the PIG-660 patch. I applied that and am trying again. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742100#action_12742100 ] Alan Gates commented on PIG-833: Patch checked in. All the unit tests passed. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-913) Error in Pig script when grouping on chararray column
[ https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-913:
    Status: Patch Available (was: Open)

Error in Pig script when grouping on chararray column
    Key: PIG-913
    URL: https://issues.apache.org/jira/browse/PIG-913
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.4.0
    Reporter: Viraj Bhat
    Priority: Critical
    Fix For: 0.4.0
    Attachments: PIG-913.patch

I have a very simple script which fails at parsetime due to the schema I specified in the loader.

{code}
data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
dataSmall = limit data 100;
bb = GROUP dataSmall by $0;
dump bb;
{code}

=
2009-08-06 18:47:56,297 [main] INFO org.apache.pig.Main - Logging error messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
09/08/06 18:47:56 INFO pig.Main: Logging error messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
2009-08-06 18:47:56,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:9000
2009-08-06 18:47:56,694 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:9001
2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias bb
09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
=

=
Pig Stack Trace
---
ERROR 1002: Unable to store alias bb

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias bb
    at org.apache.pig.PigServer.openIterator(PigServer.java:481)
    at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
    at org.apache.pig.Main.main(Main.java:397)
Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias bb
    at org.apache.pig.PigServer.store(PigServer.java:536)
    at org.apache.pig.PigServer.openIterator(PigServer.java:464)
    ... 6 more
Caused by: java.lang.NullPointerException
    at org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
    at org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
    at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
    at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
    at org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
    at org.apache.pig.PigServer.compileLp(PigServer.java:854)
    at org.apache.pig.PigServer.compileLp(PigServer.java:791)
    at org.apache.pig.PigServer.store(PigServer.java:509)
    ... 7 more
=
Build failed in Hudson: Pig-Patch-minerva.apache.org #157
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/

--
[...truncated 103063 lines...]
[exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block blk_-6509224781215538639_1011 of size 6 from /127.0.0.1
[exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to blk_-6509224781215538639_1011 size 6
[exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 1 for block blk_-6509224781215538639_1011 terminating
[exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:38934 is added to blk_-6509224781215538639_1011 size 6
[exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: Received block blk_-6509224781215538639_1011 of size 6 from /127.0.0.1
[exec] [junit] 09/08/11 23:36:15 INFO dfs.DataNode: PacketResponder 2 for block blk_-6509224781215538639_1011 terminating
[exec] [junit] 09/08/11 23:36:15 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to blk_-6509224781215538639_1011 size 6
[exec] [junit] 09/08/11 23:36:15 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:40772
[exec] [junit] 09/08/11 23:36:15 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:42304
[exec] [junit] 09/08/11 23:36:15 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1
[exec] [junit] 09/08/11 23:36:15 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1
[exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: Unexpected error trying to delete block blk_-7801099502017534561_1004. BlockInfo not found in volumeMap.
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block blk_-7252209396593481868_1006 file dfs/data/data7/current/blk_-7252209396593481868
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Deleting block blk_-1800239565210147527_1005 file dfs/data/data8/current/blk_-1800239565210147527
[exec] [junit] 09/08/11 23:36:16 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks.
[exec] [junit]     at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
[exec] [junit]     at java.lang.Thread.run(Thread.java:619)
[exec] [junit]
[exec] [junit] 09/08/11 23:36:16 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
[exec] [junit] 09/08/11 23:36:16 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908112335_0002/job.jar. blk_5812011963372313027_1012
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block blk_5812011963372313027_1012 src: /127.0.0.1:56518 dest: /127.0.0.1:37446
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block blk_5812011963372313027_1012 src: /127.0.0.1:53963 dest: /127.0.0.1:40940
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Receiving block blk_5812011963372313027_1012 src: /127.0.0.1:36671 dest: /127.0.0.1:56715
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 0 for block blk_5812011963372313027_1012 terminating
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 1 for block blk_5812011963372313027_1012 terminating
[exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:56715 is added to blk_5812011963372313027_1012 size 1480752
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: Received block blk_5812011963372313027_1012 of size 1480752 from /127.0.0.1
[exec] [junit] 09/08/11 23:36:16 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40940 is added to blk_5812011963372313027_1012 size 1480752
[exec] [junit] 09/08/11 23:36:16 INFO dfs.DataNode: PacketResponder 2 for block blk_5812011963372313027_1012 terminating
[exec] [junit] 09/08/11 23:36:16
[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742118#action_12742118 ]

Hadoop QA commented on PIG-890:
---
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416250/sampler.patch against trunk revision 801865.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
-1 findbugs. The patch appears to introduce 6 new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/157/console

This message is automatically generated.

Create a sampler interface and improve the skewed join sampler
--
Key: PIG-890
URL: https://issues.apache.org/jira/browse/PIG-890
Project: Pig
Issue Type: Improvement
Reporter: Sriranjan Manjunath
Attachments: sampler.patch

We need a different sampler for order by and skewed join. We thus need a better sampling interface. The design of the same is described here: http://wiki.apache.org/pig/PigSampler

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Status: Open (was: Patch Available)
[jira] Updated: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sriranjan Manjunath updated PIG-890: Attachment: (was: sampler.patch)
[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742136#action_12742136 ] Sriranjan Manjunath commented on PIG-890: - Let me know if you think that this requires a test case and I will be happy to include it.
[jira] Commented: (PIG-907) Provide multiple version of HashFNV (Piggybank)
[ https://issues.apache.org/jira/browse/PIG-907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742137#action_12742137 ] Olga Natkovich commented on PIG-907: +1 Provide multiple version of HashFNV (Piggybank) --- Key: PIG-907 URL: https://issues.apache.org/jira/browse/PIG-907 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Priority: Minor Fix For: 0.4.0 Attachments: PIG-907-1.patch, PIG-907-2.patch HashFNV takes 1 or 2 parameters. Until PIG-902 is resolved, it is better to create 2 versions of HashFNV so that Pig can pick the right version and do the type cast. Otherwise, users have to do the explicit cast. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
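As a loose illustration of the "one function, two arities" idea in the PIG-907 description above, here is a minimal plain-Java sketch. It is not the Piggybank code: the class name HashFnvSketch and both method signatures are hypothetical, and the hash shown is the standard 32-bit FNV-1a, which may differ from the variant Piggybank implements. How Pig itself selects a UDF version and inserts casts is not modelled here.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the names and the choice of FNV-1a are assumptions,
// not taken from the PIG-907 patches.
final class HashFnvSketch {
    private static final int FNV_OFFSET_BASIS = 0x811C9DC5; // standard 32-bit offset basis
    private static final int FNV_PRIME = 0x01000193;        // standard 32-bit FNV prime

    private HashFnvSketch() {}

    // One-parameter version: unsigned 32-bit FNV-1a hash of the UTF-8 bytes.
    static long hash(String input) {
        int h = FNV_OFFSET_BASIS;
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);   // fold in the next byte
            h *= FNV_PRIME;    // multiply, wrapping modulo 2^32
        }
        return h & 0xFFFFFFFFL; // present the result as an unsigned value
    }

    // Two-parameter version: the same hash reduced modulo a positive bucket count.
    static long hash(String input, int mod) {
        if (mod <= 0) {
            throw new IllegalArgumentException("mod must be positive");
        }
        return hash(input) % mod;
    }
}
```

With two separate entry points, a caller writes either HashFnvSketch.hash(s) or HashFnvSketch.hash(s, n), and the argument list alone decides which version runs.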
[jira] Commented: (PIG-893) support cast of chararray to other simple types
[ https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742144#action_12742144 ] Alan Gates commented on PIG-893: I'm reviewing this patch. support cast of chararray to other simple types --- Key: PIG-893 URL: https://issues.apache.org/jira/browse/PIG-893 Project: Pig Issue Type: New Feature Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Jeff Zhang Fix For: 0.4.0 Attachments: Pig_893.Patch Pig should support casting of chararray to integer,long,float,double,bytearray. If the conversion fails for reasons such as overflow, cast should return null and log a warning. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
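The behaviour requested in PIG-893 above (return null and log a warning when a chararray cannot be converted, for example on overflow) can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class ChararrayCastSketch and its methods are hypothetical and are not the code in Pig_893.Patch, and only the integer and long cases are shown.

```java
import java.util.logging.Logger;

// Hypothetical helper illustrating "return null and log a warning on failure";
// not Pig's actual cast implementation.
final class ChararrayCastSketch {
    private static final Logger LOG = Logger.getLogger("ChararrayCastSketch");

    private ChararrayCastSketch() {}

    // Converts a string to Integer; malformed or out-of-range input yields null.
    static Integer toInteger(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Integer.valueOf(s.trim());
        } catch (NumberFormatException e) {
            // Integer.valueOf also throws for values outside the int range.
            LOG.warning("Unable to cast '" + s + "' to int; returning null");
            return null;
        }
    }

    // Converts a string to Long with the same null-on-failure contract.
    static Long toLong(String s) {
        if (s == null) {
            return null;
        }
        try {
            return Long.valueOf(s.trim());
        } catch (NumberFormatException e) {
            LOG.warning("Unable to cast '" + s + "' to long; returning null");
            return null;
        }
    }
}
```

For instance, ChararrayCastSketch.toInteger("2147483648") returns null because the value exceeds the int range, while ChararrayCastSketch.toLong("2147483648") succeeds.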
Build failed in Hudson: Pig-Patch-minerva.apache.org #158
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/158/changes Changes: [gates] PIG-833: Added Zebra, new columnar storage mechanism for HDFS. -- [...truncated 103108 lines...] [exec] [junit] 09/08/12 01:19:32 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/08/12 01:19:32 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/08/12 01:19:32 WARN dfs.DataNode: Unexpected error trying to delete block blk_-1535404250649000663_1004. BlockInfo not found in volumeMap. [exec] [junit] 09/08/12 01:19:32 INFO dfs.DataNode: Deleting block blk_4954179736192186775_1006 file dfs/data/data8/current/blk_4954179736192186775 [exec] [junit] 09/08/12 01:19:32 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks. [exec] [junit] at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146) [exec] [junit] at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793) [exec] [junit] at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663) [exec] [junit] at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888) [exec] [junit] at java.lang.Thread.run(Thread.java:619) [exec] [junit] [exec] [junit] 09/08/12 01:19:33 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/08/12 01:19:33 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. 
blk_2669403222345271811_1012 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block blk_2669403222345271811_1012 src: /127.0.0.1:58050 dest: /127.0.0.1:40049 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block blk_2669403222345271811_1012 src: /127.0.0.1:38276 dest: /127.0.0.1:54901 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block blk_2669403222345271811_1012 src: /127.0.0.1:48397 dest: /127.0.0.1:34055 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 0 for block blk_2669403222345271811_1012 terminating [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:34055 is added to blk_2669403222345271811_1012 size 1476187 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:54901 is added to blk_2669403222345271811_1012 size 1476187 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 1 for block blk_2669403222345271811_1012 terminating [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block blk_2669403222345271811_1012 of size 1476187 from /127.0.0.1 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:40049 is added to blk_2669403222345271811_1012 size 1476187 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 2 for block blk_2669403222345271811_1012 terminating [exec] [junit] 09/08/12 01:19:33 INFO fs.FSNamesystem: Increasing replication for file /tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. 
New replication is 2 [exec] [junit] 09/08/12 01:19:33 INFO fs.FSNamesystem: Reducing replication for file /tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.jar. New replication is 2 [exec] [junit] 09/08/12 01:19:33 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908120118_0002/job.split. blk_-777871427035102840_1013 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block blk_-777871427035102840_1013 src: /127.0.0.1:48398 dest: /127.0.0.1:34055 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block blk_-777871427035102840_1013 src: /127.0.0.1:58054 dest: /127.0.0.1:40049 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Receiving block blk_-777871427035102840_1013 src: /127.0.0.1:38280 dest: /127.0.0.1:54901 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: Received block blk_-777871427035102840_1013 of size 1837 from /127.0.0.1 [exec] [junit] 09/08/12 01:19:33 INFO dfs.DataNode: PacketResponder 0 for block blk_-777871427035102840_1013 terminating
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742170#action_12742170 ] Dmitriy V. Ryaboy commented on PIG-833: --- Alan, this means Pig contrib/ is no longer compatible with Hadoop 18. Which probably means that you need to either roll this back or roll 660 in (and add the hadoop20.jar file to lib/). Otherwise the build is broken. Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742201#action_12742201 ] Jay Tang commented on PIG-833: -- Zebra has a dependency on TFile that is available in Hadoop 20; that's why the compilation instruction is more complicated. A new wiki at http://wiki.apache.org/pig/zebra will provide more information on Zebra.
[jira] Commented: (PIG-890) Create a sampler interface and improve the skewed join sampler
[ https://issues.apache.org/jira/browse/PIG-890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12742203#action_12742203 ]

Hadoop QA commented on PIG-890:
---
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416267/sampler.patch against trunk revision 803312.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/159/console

This message is automatically generated.

Create a sampler interface and improve the skewed join sampler
--
Key: PIG-890
URL: https://issues.apache.org/jira/browse/PIG-890
Project: Pig
Issue Type: Improvement
Reporter: Sriranjan Manjunath
Attachments: sampler.patch

We need a different sampler for order by and skewed join. We thus need a better sampling interface. The design of the same is described here: http://wiki.apache.org/pig/PigSampler

-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.