Pig 0.4.0 release
Pig Developers,

We have made several significant performance and other improvements over the last couple of months:
(1) Added an optimizer with several rules
(2) Introduced skew and merge joins
(3) Cleaned up COUNT and AVG semantics

I think it is time for another release to make this functionality available to users. I propose that Pig 0.4.0 be released against Hadoop 18, since most users are still on this version. Once Hadoop 20.1 is released, we will roll Pig 0.5.0 based on Hadoop 20. Please vote on the proposal by Thursday.

Olga
[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-923:
----------------------------------
Status: Patch Available (was: Open)

Allow setting logfile location in pig.properties
------------------------------------------------
Key: PIG-923
URL: https://issues.apache.org/jira/browse/PIG-923
Project: Pig
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Dmitriy V. Ryaboy
Fix For: 0.4.0
Attachments: pig_923.patch

Local log file location can be specified through the -l flag, but it cannot be set in pig.properties. This JIRA proposes a change to Main.java that allows it to read the pig.logfile property from the configuration.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
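The precedence the issue describes (a -l flag value wins, with the pig.logfile property as the fallback) can be sketched as below. This is an illustrative sketch only: the `resolveLogFile` helper and class are hypothetical names, not the actual code in Pig's Main.java.

```java
import java.util.Properties;

public class LogFileResolver {
    // Pick the local log file location: a value passed via the -l flag
    // takes precedence; otherwise fall back to the pig.logfile property
    // loaded from pig.properties; return null if neither is set.
    static String resolveLogFile(String cliLogFile, Properties props) {
        if (cliLogFile != null && !cliLogFile.isEmpty()) {
            return cliLogFile;
        }
        return props.getProperty("pig.logfile");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("pig.logfile", "/var/log/pig/pig.log");
        // Property is used when no -l value was given.
        System.out.println(resolveLogFile(null, props));
        // An explicit -l value overrides the property.
        System.out.println(resolveLogFile("/tmp/run.log", props));
    }
}
```

With a fallback like this, existing -l users see no behavior change; the property only takes effect when the flag is absent.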
Re: Pig 0.4.0 release
Olga,

Do non-committers get a vote?

Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent even if it's in contrib/. Would love to see dynamic (or at least static) shims incorporated into the 0.4 release (see PIG-660, PIG-924).

There are a couple of bugs still outstanding that I think would need to get fixed before a release:
https://issues.apache.org/jira/browse/PIG-859
https://issues.apache.org/jira/browse/PIG-925

I think all of these can be solved within a week; assuming we are talking about a release after these go into trunk, +1.

-D
RE: Pig 0.4.0 release
Hi Dmitriy,

Non-committers get a non-binding vote.

Zebra needs Hadoop 20.1 because it relies on TFile functionality that is not available in Hadoop 20. In general, the recommendation from the Hadoop team is to wait until Hadoop 20.1 is released.

As for the remaining issues, while it would be nice to resolve them, I don't see them as blockers for Pig 0.4.0. My plan is to release what's currently in the trunk and have follow-up patch releases if needed.

Olga
RE: Pig 0.4.0 release
I have a question: will we be able to fix piggybank sources, given that Zebra needs 0.20 and the rest of Pig requires 0.18? If the answer is yes, then +1 for the release.

I agree with the plan of making 0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1.

Thanks,
Santhosh
RE: Pig 0.4.0 release
Hi Santhosh,

What do you mean by fixing piggybank?

Olga
RE: Pig 0.4.0 release
Till we release 0.5.0, will Zebra's requirement on 0.20 prevent any bugs/issues with Piggybank?

Santhosh
Build failed in Hudson: Pig-Patch-minerva.apache.org #166
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/changes

Changes:
[olga] PIG-892: Make COUNT and AVG deal with nulls accordingly with SQL standart(olgan)

--
[...truncated 111335 lines...]
[exec] [junit] 09/08/17 20:53:33 WARN dfs.DataNode: Unexpected error trying to delete block blk_-6104859714580735196_1004. BlockInfo not found in volumeMap.
[exec] [junit] 09/08/17 20:53:33 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks.
[exec] [junit]     at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
[exec] [junit]     at java.lang.Thread.run(Thread.java:619)
[exec] [junit] 09/08/17 20:53:34 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
[exec] [junit] 09/08/17 20:53:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[jira] Commented: (PIG-923) Allow setting logfile location in pig.properties
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744194#action_12744194 ]

Hadoop QA commented on PIG-923:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416593/pig_923.patch
against trunk revision 804406.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/console

This message is automatically generated.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744216#action_12744216 ]

Todd Lipcon commented on PIG-924:
---------------------------------

Oops, apparently it is Monday and my brain is scrambled. Above should read "pretty important that a single build of *Pig* will work...", of course.

Make Pig work with multiple versions of Hadoop
----------------------------------------------
Key: PIG-924
URL: https://issues.apache.org/jira/browse/PIG-924
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Attachments: pig_924.patch

The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig. Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process precludes us from doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
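The dynamic-shim idea can be sketched roughly as follows: inspect the running Hadoop version and route version-sensitive details through a small interface. Everything here (the `HadoopShim` interface, the class names, the version check) is a hypothetical illustration, not the contents of pig_924.patch.

```java
// Sketch of a dynamic shim: one interface, one implementation per Hadoop
// version family, chosen at runtime from a version string (in real code
// this could come from something like VersionInfo.getVersion()).
// All names here are illustrative, not taken from the actual patch.
interface HadoopShim {
    // One version-sensitive detail as an example: the DFS classes
    // moved from org.apache.hadoop.dfs to org.apache.hadoop.hdfs.
    String dfsPackage();
}

class Hadoop18Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.dfs"; }
}

class Hadoop20Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.hdfs"; }
}

public class ShimLoader {
    // Select a shim for a version string such as "0.18.3" or "0.20.1".
    static HadoopShim forVersion(String version) {
        if (version.startsWith("0.18")) {
            return new Hadoop18Shim();
        }
        return new Hadoop20Shim();
    }

    public static void main(String[] args) {
        System.out.println(ShimLoader.forVersion("0.18.3").dfsPackage());
        System.out.println(ShimLoader.forVersion("0.20.1").dfsPackage());
    }
}
```

With this pattern, Pig code would call through the shim instead of referencing version-specific classes directly, so a single pig.jar could run against any supported Hadoop version.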
RE: Pig 0.4.0 release
Rephrasing my question: till we release 0.5.0, will Zebra's requirement on hadoop-0.20 prevent fixing of any bugs/issues with Piggybank?

Santhosh
[jira] Commented: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744238#action_12744238 ]

Thejas M Nair commented on PIG-824:
-----------------------------------

JFlex.jar (required to build this patch) can be downloaded from http://www.jflex.de/download.html .

SQL interface for Pig
---------------------
Key: PIG-824
URL: https://issues.apache.org/jira/browse/PIG-824
Project: Pig
Issue Type: New Feature
Reporter: Olga Natkovich
Attachments: PIG-824.1.patch, PIG-824.binfiles.tar.gz, SQL_IN_PIG.html

In the last 18 months PigLatin has gained significant popularity within the open source community. Many users like its data flow model, its rich type system, and its ability to work with any data available on HDFS or outside. We have also heard from many users that having Pig speak SQL would bring many more users. Having a single system that exports multiple interfaces is a big advantage, as it guarantees consistent semantics and custom code reuse, and reduces the amount of maintenance. This is especially relevant for projects where using both interfaces for different parts of the system makes sense. For instance, in a data warehousing system, you would have an ETL component that brings data into the warehouse and a component that analyzes the data and produces reports. PigLatin is uniquely suited for ETL processing, while SQL might be a better fit for report generation.

To start, it would make sense to implement a subset of the SQL92 standard and to be as standard-compliant as possible. This would include all the standard constructs: select, from, where, group-by + having, order by, limit, and join (inner + outer). Several extensions, such as support for Pig's UDFs and possibly streaming, multiquery, and support for Pig's complex types, would be helpful.

This work is dependent on the metadata support outlined in https://issues.apache.org/jira/browse/PIG-823

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Proposal to create a branch for contrib project Zebra
Thanks to the PIG team, the first version of contrib project Zebra (PIG-833) is committed to PIG trunk. In short, Zebra is a table storage layer built for use in PIG and other Hadoop applications.

While we are stabilizing the current version V1 in the trunk, we plan to add more new features to it. We would like to create an svn branch for the new features. We will be responsible for managing Zebra in PIG trunk and in the new branch, and will merge the branch when it is ready. We expect the changes to affect only the 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for Zebra. As more patches are contributed by other Zebra developers, more committers may be added through the normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from the PIG team.

Thanks,
Raghu.
RE: Proposal to create a branch for contrib project Zebra
+1
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744273#action_12744273 ]

Daniel Dai commented on PIG-924:
--------------------------------

I am reviewing the patch.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: Proposal to create a branch for contrib project Zebra
My vote is -1.
RE: Proposal to create a branch for contrib project Zebra
Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.

Santhosh
RE: Proposal to create a branch for contrib project Zebra
Raghu is a PMC member and as such already has committer rights to all subprojects, so we are not breaking any new ground here. The reasoning is the same as for the branches we created for the multiquery work in Pig.

Olga
Re: Proposal to create a branch for contrib project Zebra
+1 On 8/18/09 7:11 AM, Olga Natkovich ol...@yahoo-inc.com wrote: +1 -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Monday, August 17, 2009 4:06 PM To: pig-dev@hadoop.apache.org Subject: Proposal to create a branch for contrib project Zebra -- Yiping Han F-3140 (408)349-4403 y...@yahoo-inc.com
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744305#action_12744305 ] Todd Lipcon commented on PIG-924: A couple of notes on the patch:
- you've turned javac.deprecation from on to off - seems unwise; perhaps you should do this only for the one javac task where you want that behavior
- src.shims.dir.com in build.xml has a REMOVE mark on it - is this still needed? It looks like it is, but it is perhaps better named .common instead of .com
- you've moved junit.hadoop.conf into basedir instead of ${user.home} - this seems reasonable but is orthogonal to this patch; it should be a separate JIRA
- why are we now excluding the HBase storage test?
- some spurious whitespace changes (e.g. TypeCheckingVisitor.java)
- in MRCompiler, a factor of 0.9 seems to have disappeared; the commented-out line should be removed
- some tab characters seem to have been introduced
- in MiniCluster, there is also some commented-out code which should be cleaned up
Make Pig work with multiple versions of Hadoop -- Key: PIG-924 URL: https://issues.apache.org/jira/browse/PIG-924 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Attachments: pig_924.patch The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig. Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process precludes doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Proposal to create a branch for contrib project Zebra
IANAC, but my (non-binding) vote is also -1. I think all the improvements and feature additions to Zebra should be available through the Pig trunk. The codebase is not big enough to justify creating a branch. If the reason is Pig's dependence on a checked-in Hadoop jar, the shims proposal by Dmitriy should be taken up ASAP, so that those who want to use Zebra can use the Pig trunk with Hadoop 0.20. - milind On 8/17/09 5:14 PM, Yiping Han y...@yahoo-inc.com wrote: +1 On 8/18/09 7:11 AM, Olga Natkovich ol...@yahoo-inc.com wrote: +1 -- Milind Bhandarkar Y!IM: GridSolutions Tel: 408-349-2136 (mili...@yahoo-inc.com)
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744307#action_12744307 ] Dmitriy V. Ryaboy commented on PIG-924: --- Thanks for looking, Todd -- most of those changes, like the factor of 0.9, the deprecation setting, and the exclusion of the HBase test, are consistent with the 0.20 patch posted to PIG-660. Moving junit.hadoop.conf is critical -- there are comments about this in 660 -- without it, resetting hadoop.version doesn't actually work, as some of the information from a previous build sticks around. I'll fix the whitespace; this wasn't a final patch, more of a proof of concept. The point being: this approach could work, but currently can't, because Hadoop is bundled into the jar. I am looking for comments from the core developer team regarding the possibility of un-bundling.
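The shim idea discussed in PIG-660/PIG-924 can be sketched as follows. This is a hypothetical illustration, not the actual patch contents: the interface and class names are invented, and the only version-specific fact used is the well-known move of the DFS code from `org.apache.hadoop.dfs` (0.18) to `org.apache.hadoop.hdfs` (0.20). The point is that calls which differ across Hadoop versions sit behind one small interface, and the matching implementation is chosen from the detected version at runtime.

```java
// Hypothetical sketch of a dynamic shim layer (names are illustrative):
// version-specific Hadoop details live behind one interface, and the
// implementation is selected from the runtime Hadoop version string.
interface HadoopShim {
    // e.g. the DFS package moved between Hadoop 0.18 and 0.20
    String dfsPackage();
}

class Hadoop18Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.dfs"; }
}

class Hadoop20Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.hdfs"; }
}

public class ShimSelector {
    // Pick a shim based on the Hadoop version reported at runtime.
    public static HadoopShim forVersion(String version) {
        return version.startsWith("0.18") ? new Hadoop18Shim()
                                          : new Hadoop20Shim();
    }

    public static void main(String[] args) {
        System.out.println(forVersion("0.18.3").dfsPackage());
        System.out.println(forVersion("0.20.1").dfsPackage());
    }
}
```

With this shape, only the shim classes need to compile against a specific Hadoop version, which is exactly why bundling Hadoop inside pig.jar defeats the purpose: the selection has to happen at runtime against whatever jar the user supplies.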
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744310#action_12744310 ] Todd Lipcon commented on PIG-924: - Gotcha, thanks for explaining. Aside from the nits, patch looks good to me.
Build failed in Hudson: Pig-Patch-minerva.apache.org #167
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/ -- [...truncated 111282 lines...] [exec] [junit] 09/08/18 01:01:56 INFO dfs.DataNode: PacketResponder 2 for block blk_3027939285115887556_1011 terminating [exec] [junit] 09/08/18 01:01:56 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:57132 is added to blk_3027939285115887556_1011 size 6 [exec] [junit] 09/08/18 01:01:56 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:57553 [exec] [junit] 09/08/18 01:01:56 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:52163 [exec] [junit] 09/08/18 01:01:56 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/08/18 01:01:56 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Deleting block blk_-4258982856765574979_1005 file dfs/data/data5/current/blk_-4258982856765574979 [exec] [junit] 09/08/18 01:01:57 WARN dfs.DataNode: Unexpected error trying to delete block blk_5421843601365247738_1004. BlockInfo not found in volumeMap. [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Deleting block blk_8703349292237962083_1006 file dfs/data/data6/current/blk_8703349292237962083 [exec] [junit] 09/08/18 01:01:57 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks. 
[exec] [junit] at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146) [exec] [junit] at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793) [exec] [junit] at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663) [exec] [junit] at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888) [exec] [junit] at java.lang.Thread.run(Thread.java:619) [exec] [junit] [exec] [junit] 09/08/18 01:01:57 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/08/18 01:01:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.jar. blk_-9109308561601697298_1012 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Receiving block blk_-9109308561601697298_1012 src: /127.0.0.1:46186 dest: /127.0.0.1:48123 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Receiving block blk_-9109308561601697298_1012 src: /127.0.0.1:57220 dest: /127.0.0.1:52254 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Receiving block blk_-9109308561601697298_1012 src: /127.0.0.1:41307 dest: /127.0.0.1:57132 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Received block blk_-9109308561601697298_1012 of size 1498535 from /127.0.0.1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: PacketResponder 0 for block blk_-9109308561601697298_1012 terminating [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Received block blk_-9109308561601697298_1012 of size 1498535 from /127.0.0.1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:57132 is added to blk_-9109308561601697298_1012 size 1498535 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: PacketResponder 1 for block blk_-9109308561601697298_1012 terminating [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: 
BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:52254 is added to blk_-9109308561601697298_1012 size 1498535 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Received block blk_-9109308561601697298_1012 of size 1498535 from /127.0.0.1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: PacketResponder 2 for block blk_-9109308561601697298_1012 terminating [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48123 is added to blk_-9109308561601697298_1012 size 1498535 [exec] [junit] 09/08/18 01:01:57 INFO fs.FSNamesystem: Increasing replication for file /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.jar. New replication is 2 [exec] [junit] 09/08/18 01:01:57 INFO fs.FSNamesystem: Reducing replication for file /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.jar. New replication is 2 [exec] [junit] 09/08/18 01:01:58 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.split. blk_7949841277658716829_1013 [exec] [junit] 09/08/18 01:01:58 INFO
[jira] Commented: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744316#action_12744316 ] Hadoop QA commented on PIG-925:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416812/PIG-925-1.patch against trunk revision 804406.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 251 javac compiler warnings (more than the trunk's current 250 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/console
This message is automatically generated.
Fix join in local mode -- Key: PIG-925 URL: https://issues.apache.org/jira/browse/PIG-925 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.4.0 Attachments: PIG-925-1.patch Join is broken after the LOJoin patch (Optimizer_Phase5.patch of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]).
Even the simplest join script is not working under local mode, e.g.:
a = load '1.txt';
b = load '2.txt';
c = join a by $0, b by $0;
dump c;
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Proposal to create a branch for contrib project Zebra
On Aug 17, 2009, at 4:38 PM, Santhosh Srinivasan wrote: Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own. There has been sufficient precedent for 'contrib committers' in Hadoop (e.g. Chukwa vis-a-vis the former 'Hadoop Core' sub-project), and it is normal within the Apache world to have committers with specific 'roles', e.g. specific contrib modules, QA, release/build, etc. (http://hadoop.apache.org/common/credits.html - in fact, Giridharan Kesavan is an unlisted 'release' committer for Apache Hadoop). I believe it is a desired, nay stated, goal for Zebra to eventually graduate to a Hadoop sub-project, based on which it was voted in as a contrib module by the Apache Pig community. Given these, I don't see any cause for concern here. Arun
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744323#action_12744323 ] Jeff Hammerbacher commented on PIG-833: --- Hey Raghu, you mention that a design document is forthcoming. It would be great to have a PDF design document, like Matei's for the fair scheduler, in addition to the Javadoc and wiki page. Any progress on that front? I'm quite interested in learning more about Zebra's use and implementation. On a larger note, it would be great if Pig moved to the Hadoop model for new features, where a design document and test plan are required to commit. See https://issues.apache.org/jira/browse/HADOOP-5587. It's tough to digest the bulk dumps of Owl, Zebra, and Giraffe, though we certainly appreciate the work Yahoo has done on these projects! Thanks, Jeff Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high-level data access abstraction and a tabular view of data in Hadoop; it could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU- and space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a Hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744326#action_12744326 ] Jeff Hammerbacher commented on PIG-823: --- Hey, Great to see the Owl source! I've filed a ticket over on the Hive project (https://issues.apache.org/jira/browse/HIVE-762) to see if we can find some common ground between Pig's and Hive's metadata needs; it would be great to have a single metadata service for all of Hadoop's structured data manipulation tools. If you're interested, please chime in there (or open a ticket here? Whatever seems sane to you). Thanks, Jeff Hadoop Metadata Service --- Key: PIG-823 URL: https://issues.apache.org/jira/browse/PIG-823 Project: Pig Issue Type: New Feature Reporter: Olga Natkovich Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, owl_otherdeps.tgz This JIRA is created to track development of a metadata system for Hadoop. The goal of the system is to allow users and applications to register data stored on HDFS, search for the data available on HDFS, and associate metadata such as schema, statistics, etc. with a particular data unit or a data set stored on HDFS. The initial goal is to provide a fairly generic, low-level abstraction that any user or application on HDFS can use to store and retrieve metadata. Over time, higher-level abstractions closely tied to particular applications or tools can be developed. Eventually, it would make sense for the metadata service to become a subproject within Hadoop. For now, the proposal is to make it a contrib to Pig since Pig SQL is likely to be the first user of the system. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Proposal to create a branch for contrib project Zebra
That leaves us with contrib committers. Can you point to earlier email threads that cover the topic of giving committer access to contrib projects? Specifically, what does it mean to award someone committer privileges to a contrib project, what are the access privileges that come with such rights, what are the dos/don'ts, etc. Chukwa was a contrib module prior to its current avatar as a full-fledged sub-project. Its 'contrib committers' Ari Rabkin and Eric Yang became its first committers: http://markmail.org/message/75qvvcigi3qumifp Unfortunately the email threads for voting contrib committers are private to the Hadoop PMC; you'll just have to take my word for it. *smile* I did dig up some other examples for you: http://www.gossamer-threads.com/lists/lucene/java-dev/81122 http://www.nabble.com/ANNOUNCE:-Welcome--as-Contrib-Committer-td21506295.html Contrib committers have privileges to commit only to their 'module': pig/trunk/contrib/zebra in this case. Thirdly, are there instances of contrib committers creating branches? Branches are a development tool... I don't see the problem with creating/using them. Arun
[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Status: Open (was: Patch Available) [Piggybank] SequenceFileLoader --- Key: PIG-911 URL: https://issues.apache.org/jira/browse/PIG-911 Project: Pig Issue Type: New Feature Reporter: Dmitriy V. Ryaboy Attachments: pig_911.2.patch, pig_sequencefile.patch The proposed piggybank contribution adds a SequenceFileLoader to the piggybank. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Attachment: pig_911.2.patch Addressed Alan's comments.
[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744343#action_12744343 ] Dmitriy V. Ryaboy commented on PIG-911: --- Concerning making this a StoreFunc as well -- the StoreFunc interface is not very friendly to this. All you get in the bind call is the output stream; for LoadFunc, you also get the name of the file (or, presumably, whatever the user passed in under the guise of a file name). This means that for the LoadFunc, I was able to use the passed-in filename to get back to a Path and a FileSystem. I can't do the same for StoreFunc, where the filename is not available -- only the output stream is. That means I can't create the appropriate SequenceFile.Writer. Is there a way around this limitation that does not involve requiring special constructor parameters? Is it possible to change the StoreFunc API to provide this information, or to make it available through some side channel (MapRedUtils or similar)?
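The asymmetry Dmitriy describes can be sketched with simplified stand-ins for the load/store bind hooks. These interfaces are illustrative shapes only, not the real org.apache.pig classes, and the HDFS path is a made-up example: the loader's bind call receives the file name, from which a Path and FileSystem can be reconstructed, while the storer's bind call receives nothing but a raw stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Simplified stand-ins for the old Pig load/store hooks (illustrative
// shapes only -- not the real org.apache.pig interfaces).
interface LoaderSketch {
    // the loader is told which file it is bound to
    void bindTo(String fileName, InputStream in);
}

interface StorerSketch {
    // the storer sees only a raw stream: no file name to recover a Path from
    void bindTo(OutputStream out);
}

public class BindAsymmetry {
    // What a loader can recover about its data source.
    static String loaderView() {
        final String[] seen = new String[1];
        LoaderSketch loader = (fileName, in) -> seen[0] = fileName;
        loader.bindTo("hdfs://nn/data/part-00000",
                      new ByteArrayInputStream(new byte[0]));
        return seen[0];  // the name is known, so a Path could be built
    }

    // What a storer can recover: nothing beyond the stream itself,
    // hence no way to open a SequenceFile.Writer for the destination.
    static String storerView() {
        final String[] seen = new String[1];
        StorerSketch storer = out -> seen[0] = null;
        storer.bindTo(new ByteArrayOutputStream());
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("loader sees: " + loaderView());
        System.out.println("storer sees: " + storerView());
    }
}
```

The sketch makes the design question concrete: any fix has to either widen the storer's bind signature to carry the destination name or expose it through a side channel, since the stream alone cannot be turned back into a writable Path.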
Re: Proposal to create a branch for contrib project Zebra
Hi Santhosh, There are two separate things: (a) voting a contributor in as a committer, and (b) committing to a contrib project. (b): My experience with Hadoop is that contrib is, by definition, very loosely coupled with core. By convention, we as committers to core (hdfs, mapred, etc.) did not have to monitor changes to contrib as thoroughly as we would monitor core changes. It is the responsibility of the contrib developers to make sure they are not breaking builds, etc. Contrib changes get reviewed by people interested in the project. (a): Voting takes place when a contributor is being blessed as a committer. It involves some legal stuff as well. Although a committer has permission to commit to any part of a project, it is expected that they don't misuse it. E.g. if I have a patch for core Map/Reduce, I would certainly wait for a regular MR contributor to review it and possibly commit it. It does not matter how many patches I might have contributed to, say, HDFS. The reason for (a) is simple: scalability. We cannot monitor everything. If you or another PIG developer volunteers to commit Zebra patches, we are more than happy to let you do it. Please let us know. And at any stage, if you feel we may be violating normal conventions (like breaking builds or committing some PIG changes), please raise the issue. We have not seen serious problems in this regard with any other project; I think we should get the benefit of the doubt. I have not addressed the reason for a new branch here; I will make the case for it in another mail. Raghu. Santhosh Srinivasan wrote: Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.
Santhosh -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Monday, August 17, 2009 4:06 PM To: pig-dev@hadoop.apache.org Subject: Proposal to create a branch for contrib project Zebra
[jira] Updated: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Status: Patch Available (was: Open)
[jira] Updated: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Attachment: PIG-925-2.patch Address the javac warning Fix join in local mode -- Key: PIG-925 URL: https://issues.apache.org/jira/browse/PIG-925 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.4.0 Attachments: PIG-925-1.patch, PIG-925-2.patch Join is broken after LOJoin patch (Optimizer_Phase5.patch of [PIG-697|https://issues.apache.org/jira/browse/PIG-697). Even the simplest join script is not working under local mode: eg: a = load '1.txt'; b = load '2.txt'; c = join a by $0, b by $0; dump c; Caused by: java.lang.NullPointerException at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117) at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146) at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109) at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
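For readers trying to reproduce the PIG-925 failure, a minimal local-mode session might look like the following. The file names come from the bug report, but their contents and the tab-delimited layout are illustrative assumptions, not part of the report:

```pig
-- Run with: pig -x local join_repro.pig
-- 1.txt and 2.txt are assumed to be small tab-delimited text files,
-- e.g. each line "key<TAB>value"; any input triggers the failure.
a = load '1.txt';
b = load '2.txt';
c = join a by $0, b by $0;
-- On an affected build (trunk after the LOJoin patch, before the fix),
-- dump fails with the NullPointerException in POPackage.getNext shown above.
dump c;
```

Note that the stack trace mixes local and hadoop execution-engine classes, which is why the bug only surfaces in local mode.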
[jira] Updated: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-925:
---------------------------
    Status: Open  (was: Patch Available)

> Fix join in local mode
> ----------------------
>
>                 Key: PIG-925
>                 URL: https://issues.apache.org/jira/browse/PIG-925
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.4.0
>         Attachments: PIG-925-1.patch, PIG-925-2.patch

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
Hudson build is back to normal: Pig-Patch-minerva.apache.org #168
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/
[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744373#action_12744373 ]

Hadoop QA commented on PIG-911:
-------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12416830/pig_911.2.patch
  against trunk revision 804406.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/console

This message is automatically generated.

> [Piggybank] SequenceFileLoader
> ------------------------------
>
>                 Key: PIG-911
>                 URL: https://issues.apache.org/jira/browse/PIG-911
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>         Attachments: pig_911.2.patch, pig_sequencefile.patch
>
> The proposed piggybank contribution adds a SequenceFileLoader to the piggybank.

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
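For context, a piggybank loader like this one would typically be used from a Pig script roughly as follows. This is a sketch, not taken from the patch: the jar path is illustrative, and the class name assumes the standard piggybank package layout; check the committed patch for the exact class and any constructor arguments:

```pig
-- Register the piggybank jar containing the loader (path is illustrative).
register /path/to/piggybank.jar;

-- Load a Hadoop SequenceFile; the loader is expected to turn each
-- key/value record into a two-field tuple.
a = load 'data.seq' using org.apache.pig.piggybank.storage.SequenceFileLoader();
dump a;
```

Since SequenceFiles are a common intermediate format for Hadoop jobs, a loader like this lets Pig scripts consume MapReduce output directly instead of requiring a conversion step to text.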
Re: Proposal to create a branch for contrib project Zebra
Raghu Angadi wrote:
> Hi Santosh,
>
> There are two separate things:
>   (a) voting a contributor as a committer
>   (b) committing to a contrib project.
> [...]
> Reason for (a) is simple scalability. We can not monitor everything.

I meant to say "Reason for (b)" (why contrib commits are treated a bit differently). Our motivation is not to bypass any oversight; it is just so that we don't burden PIG committers too much. We are happy if a PIG committer volunteers to oversee and commit.

Raghu.

> If you or another PIG developer volunteers to commit zebra patches, we are more than happy to let you do it. Please let us know. Or at any stage, if you feel we may be violating normal conventions (like breaking builds or committing some PIG changes), please raise the issue. We have not seen serious problems in this regard with any other project; I think we should get the benefit of the doubt.
>
> I have not addressed the reason for a new branch here; I will pitch for it in another mail.
>
> Raghu.
>
> Santhosh Srinivasan wrote:
>> Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.
>>
>> Santhosh
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
>> Sent: Monday, August 17, 2009 4:06 PM
>> To: pig-dev@hadoop.apache.org
>> Subject: Proposal to create a branch for contrib project Zebra
>>
>> Thanks to the PIG team, the first version of the contrib project Zebra (PIG-833) is committed to the PIG trunk. In short, Zebra is a table storage layer built for use in PIG and other Hadoop applications. While we are stabilizing the current version V1 in the trunk, we plan to add more new features to it. We would like to create an svn branch for the new features. We will be responsible for managing Zebra in the PIG trunk and in the new branch. We will merge the branch when it is ready. We expect the changes to affect only the 'contrib/zebra' directory.
>>
>> As a regular contributor to Hadoop, I will be the initial committer for Zebra. As more patches are contributed by other Zebra developers, more committers may be added through the normal Hadoop/Apache procedure. I would like to create a branch called 'zebra-v2' with approval from the PIG team.
>>
>> Thanks,
>> Raghu.