Pig 0.4.0 release
Pig Developers,

We have made several significant performance and other improvements over the last couple of months:
(1) Added an optimizer with several rules
(2) Introduced skew and merge joins
(3) Cleaned up COUNT and AVG semantics

I think it is time for another release to make this functionality available to users. I propose that Pig 0.4.0 be released against Hadoop 18, since most users are still on this version. Once Hadoop 20.1 is released, we will roll Pig 0.5.0 based on Hadoop 20. Please vote on the proposal by Thursday.

Olga
[jira] Updated: (PIG-923) Allow setting logfile location in pig.properties
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dmitriy V. Ryaboy updated PIG-923:
----------------------------------
Status: Patch Available (was: Open)

Allow setting logfile location in pig.properties
------------------------------------------------
Key: PIG-923
URL: https://issues.apache.org/jira/browse/PIG-923
Project: Pig
Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Dmitriy V. Ryaboy
Fix For: 0.4.0
Attachments: pig_923.patch

Local log file location can be specified through the -l flag, but it cannot be set in pig.properties. This JIRA proposes a change to Main.java that allows it to read the pig.logfile property from the configuration.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
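The precedence the issue describes (a -l flag value wins, with the pig.logfile property as the fallback) can be sketched as below. This is an illustrative sketch only: the `resolveLogFile` helper and class are hypothetical names, not the actual code in Pig's Main.java.

```java
import java.util.Properties;

public class LogFileResolver {
    // Pick the local log file location: a value passed via the -l flag
    // takes precedence; otherwise fall back to the pig.logfile property
    // loaded from pig.properties; return null if neither is set.
    static String resolveLogFile(String cliLogFile, Properties props) {
        if (cliLogFile != null && !cliLogFile.isEmpty()) {
            return cliLogFile;
        }
        return props.getProperty("pig.logfile");
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("pig.logfile", "/var/log/pig/pig.log");
        // Property is used when no -l value was given.
        System.out.println(resolveLogFile(null, props));
        // An explicit -l value overrides the property.
        System.out.println(resolveLogFile("/tmp/run.log", props));
    }
}
```

With a fallback like this, existing -l users see no behavior change; the property only takes effect when the flag is absent.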
Re: Pig 0.4.0 release
Olga,

Do non-committers get a vote?

Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent even if it's in contrib/. Would love to see dynamic (or at least static) shims incorporated into the 0.4 release (see PIG-660, PIG-924).

There are a couple of bugs still outstanding that I think would need to get fixed before a release:
https://issues.apache.org/jira/browse/PIG-859
https://issues.apache.org/jira/browse/PIG-925

I think all of these can be solved within a week; assuming we are talking about a release after these go into trunk, +1.

-D
RE: Pig 0.4.0 release
Hi Dmitriy,

Non-committers get a non-binding vote.

Zebra needs Hadoop 20.1 because it relies on TFile functionality that is not available in Hadoop 20. In general, the recommendation from the Hadoop team is to wait until Hadoop 20.1 is released.

As for the remaining issues, while it would be nice to resolve them, I don't see them as blockers for Pig 0.4.0. My plan is to release what's currently in the trunk and have follow-up patch releases if needed.

Olga
RE: Pig 0.4.0 release
I have a question: will we be able to fix piggybank sources, given that Zebra needs 0.20 and the rest of Pig requires 0.18? If the answer is yes, then +1 for the release.

I agree with the plan of making 0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1.

Thanks,
Santhosh
RE: Pig 0.4.0 release
Hi Santhosh,

What do you mean by fixing piggybank?

Olga
RE: Pig 0.4.0 release
Till we release 0.5.0, will Zebra's requirement on 0.20 prevent any bugs/issues with Piggybank?

Santhosh
Build failed in Hudson: Pig-Patch-minerva.apache.org #166
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/changes

Changes:
[olga] PIG-892: Make COUNT and AVG deal with nulls accordingly with SQL standart(olgan)

--
[...truncated 111335 lines...]
[exec] [junit] 09/08/17 20:53:33 WARN dfs.DataNode: Unexpected error trying to delete block blk_-6104859714580735196_1004. BlockInfo not found in volumeMap.
[exec] [junit] 09/08/17 20:53:33 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks.
[exec] [junit]     at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663)
[exec] [junit]     at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888)
[exec] [junit]     at java.lang.Thread.run(Thread.java:619)
[exec] [junit] 09/08/17 20:53:34 INFO mapReduceLayer.JobControlCompiler: Setting up single store job
[exec] [junit] 09/08/17 20:53:34 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
[jira] Commented: (PIG-923) Allow setting logfile location in pig.properties
[ https://issues.apache.org/jira/browse/PIG-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744194#action_12744194 ]

Hadoop QA commented on PIG-923:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12416593/pig_923.patch
against trunk revision 804406.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/166/console

This message is automatically generated.
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744216#action_12744216 ]

Todd Lipcon commented on PIG-924:
---------------------------------

Oops, apparently it is Monday and my brain is scrambled. Above should read "pretty important that a single build of *Pig* will work...", of course.

Make Pig work with multiple versions of Hadoop
----------------------------------------------
Key: PIG-924
URL: https://issues.apache.org/jira/browse/PIG-924
Project: Pig
Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
Attachments: pig_924.patch

The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig. Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process precludes us from doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
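The dynamic-shim idea can be sketched roughly as follows: inspect the running Hadoop version and route version-sensitive details through a small interface. Everything here (the `HadoopShim` interface, the class names, the version check) is a hypothetical illustration, not the contents of pig_924.patch.

```java
// Sketch of a dynamic shim: one interface, one implementation per Hadoop
// version family, chosen at runtime from a version string (in real code
// this could come from something like VersionInfo.getVersion()).
// All names here are illustrative, not taken from the actual patch.
interface HadoopShim {
    // One version-sensitive detail as an example: the DFS classes
    // moved from org.apache.hadoop.dfs to org.apache.hadoop.hdfs.
    String dfsPackage();
}

class Hadoop18Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.dfs"; }
}

class Hadoop20Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.hdfs"; }
}

public class ShimLoader {
    // Select a shim for a version string such as "0.18.3" or "0.20.1".
    static HadoopShim forVersion(String version) {
        if (version.startsWith("0.18")) {
            return new Hadoop18Shim();
        }
        return new Hadoop20Shim();
    }

    public static void main(String[] args) {
        System.out.println(ShimLoader.forVersion("0.18.3").dfsPackage());
        System.out.println(ShimLoader.forVersion("0.20.1").dfsPackage());
    }
}
```

With this pattern, Pig code would call through the shim instead of referencing version-specific classes directly, so a single pig.jar could run against any supported Hadoop version.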
RE: Pig 0.4.0 release
Rephrasing my question: till we release 0.5.0, will Zebra's requirement on hadoop-0.20 prevent fixing of any bugs/issues with Piggybank?

Santhosh
[jira] Commented: (PIG-824) SQL interface for Pig
[ https://issues.apache.org/jira/browse/PIG-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744238#action_12744238 ]

Thejas M Nair commented on PIG-824:
-----------------------------------

JFlex.jar (required to build this patch) can be downloaded from http://www.jflex.de/download.html .

SQL interface for Pig
---------------------
Key: PIG-824
URL: https://issues.apache.org/jira/browse/PIG-824
Project: Pig
Issue Type: New Feature
Reporter: Olga Natkovich
Attachments: PIG-824.1.patch, PIG-824.binfiles.tar.gz, SQL_IN_PIG.html

In the last 18 months PigLatin has gained significant popularity within the open source community. Many users like its data flow model, its rich type system, and its ability to work with any data available on HDFS or outside. We have also heard from many users that having Pig speak SQL would bring many more users. Having a single system that exports multiple interfaces is a big advantage, as it guarantees consistent semantics and custom code reuse, and reduces the amount of maintenance. This is especially relevant for projects where using both interfaces for different parts of the system makes sense. For instance, in a data warehousing system, you would have an ETL component that brings data into the warehouse and a component that analyzes the data and produces reports. PigLatin is uniquely suited for ETL processing, while SQL might be a better fit for report generation.

To start, it would make sense to implement a subset of the SQL92 standard and to be as standard-compliant as possible. This would include all the standard constructs: select, from, where, group-by + having, order by, limit, and join (inner + outer). Several extensions, such as support for Pig's UDFs and possibly streaming, multiquery, and support for Pig's complex types, would be helpful.

This work is dependent on the metadata support outlined in https://issues.apache.org/jira/browse/PIG-823

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
Proposal to create a branch for contrib project Zebra
Thanks to the PIG team, the first version of contrib project Zebra (PIG-833) is committed to PIG trunk. In short, Zebra is a table storage layer built for use in PIG and other Hadoop applications.

While we are stabilizing the current version V1 in the trunk, we plan to add more new features to it. We would like to create an svn branch for the new features. We will be responsible for managing Zebra in PIG trunk and in the new branch, and will merge the branch when it is ready. We expect the changes to affect only the 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for Zebra. As more patches are contributed by other Zebra developers, more committers may be added through the normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from the PIG team.

Thanks,
Raghu.
RE: Proposal to create a branch for contrib project Zebra
+1
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744273#action_12744273 ]

Daniel Dai commented on PIG-924:
--------------------------------

I am reviewing the patch.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
RE: Proposal to create a branch for contrib project Zebra
My vote is -1.
RE: Proposal to create a branch for contrib project Zebra
Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.

Santhosh
RE: Proposal to create a branch for contrib project Zebra
Raghu is a PMC member and as such already has committer rights to all subprojects, so we are not breaking any new ground here. The reasoning is the same as for the branches we created for the multiquery work in Pig.

Olga
Re: Proposal to create a branch for contrib project Zebra
+1 On 8/18/09 7:11 AM, Olga Natkovich ol...@yahoo-inc.com wrote: +1 -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Monday, August 17, 2009 4:06 PM To: pig-dev@hadoop.apache.org Subject: Proposal to create a branch for contrib project Zebra -- Yiping Han F-3140 (408)349-4403 y...@yahoo-inc.com
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744305#action_12744305 ] Todd Lipcon commented on PIG-924: A couple of notes on the patch:
- you've turned javac.deprecation from on to off - seems unwise; perhaps you should do this only for the one javac task where you want that behavior
- src.shims.dir.com in build.xml has a REMOVE mark on it - is this still needed? It looks like it is, but it is perhaps better named .common instead of .com
- you've moved junit.hadoop.conf into basedir instead of ${user.home} - this seems reasonable but is orthogonal to this patch; it should be a separate JIRA
- why are we now excluding the HBase storage test?
- some spurious whitespace changes (e.g. TypeCheckingVisitor.java)
- in MRCompiler, a factor of 0.9 seems to have disappeared; the commented-out line should be removed
- some tab characters seem to have been introduced
- in MiniCluster, there is also some commented-out code which should be cleaned up
Make Pig work with multiple versions of Hadoop -- Key: PIG-924 URL: https://issues.apache.org/jira/browse/PIG-924 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Attachments: pig_924.patch The current Pig build scripts package Hadoop and other dependencies into the pig.jar file. This means that if users upgrade Hadoop, they also need to upgrade Pig. Pig has relatively few dependencies on Hadoop interfaces that changed between 18, 19, and 20. It is possible to write a dynamic shim that allows Pig to use the correct calls for any of the above versions of Hadoop. Unfortunately, the build process precludes doing this at runtime, and forces an unnecessary Pig rebuild even if dynamic shims are created. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Proposal to create a branch for contrib project Zebra
IANAC, but my (non-binding) vote is also -1. I think all the improvements and feature additions to Zebra should be available through the Pig trunk. The codebase is not big enough to justify creating a branch. If the reason is Pig's dependence on a checked-in Hadoop jar, the shims proposal by Dmitriy should be taken up ASAP, so that those who want to use Zebra can use the Pig trunk with Hadoop 0.20. - milind On 8/17/09 5:14 PM, Yiping Han y...@yahoo-inc.com wrote: +1 On 8/18/09 7:11 AM, Olga Natkovich ol...@yahoo-inc.com wrote: +1 -- Milind Bhandarkar Y!IM: GridSolutions Tel: 408-349-2136 (mili...@yahoo-inc.com)
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744307#action_12744307 ] Dmitriy V. Ryaboy commented on PIG-924: --- Thanks for looking, Todd -- most of those changes, like the factor of 0.9, the deprecation setting, and the exclusion of the HBase test, are consistent with the 0.20 patch posted to PIG-660. Moving junit.hadoop.conf is critical -- there are comments about this in 660 -- without it, resetting hadoop.version doesn't actually work, as some of the information from a previous build sticks around. I'll fix the whitespace; this wasn't a final patch, more of a proof of concept. The point being: this approach could work, but currently can't, because Hadoop is bundled into the jar. I am looking for comments from the core developer team regarding the possibility of un-bundling.
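The shim idea discussed in PIG-660/PIG-924 can be sketched as follows. This is a hypothetical illustration, not the actual patch contents: the interface and class names are invented, and the only version-specific fact used is the well-known move of the DFS code from `org.apache.hadoop.dfs` (0.18) to `org.apache.hadoop.hdfs` (0.20). The point is that calls which differ across Hadoop versions sit behind one small interface, and the matching implementation is chosen from the detected version at runtime.

```java
// Hypothetical sketch of a dynamic shim layer (names are illustrative):
// version-specific Hadoop details live behind one interface, and the
// implementation is selected from the runtime Hadoop version string.
interface HadoopShim {
    // e.g. the DFS package moved between Hadoop 0.18 and 0.20
    String dfsPackage();
}

class Hadoop18Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.dfs"; }
}

class Hadoop20Shim implements HadoopShim {
    public String dfsPackage() { return "org.apache.hadoop.hdfs"; }
}

public class ShimSelector {
    // Pick a shim based on the Hadoop version reported at runtime.
    public static HadoopShim forVersion(String version) {
        return version.startsWith("0.18") ? new Hadoop18Shim()
                                          : new Hadoop20Shim();
    }

    public static void main(String[] args) {
        System.out.println(forVersion("0.18.3").dfsPackage());
        System.out.println(forVersion("0.20.1").dfsPackage());
    }
}
```

With this shape, only the shim classes need to compile against a specific Hadoop version, which is exactly why bundling Hadoop inside pig.jar defeats the purpose: the selection has to happen at runtime against whatever jar the user supplies.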
[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop
[ https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744310#action_12744310 ] Todd Lipcon commented on PIG-924: - Gotcha, thanks for explaining. Aside from the nits, patch looks good to me.
Build failed in Hudson: Pig-Patch-minerva.apache.org #167
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/ -- [...truncated 111282 lines...] [exec] [junit] 09/08/18 01:01:56 INFO dfs.DataNode: PacketResponder 2 for block blk_3027939285115887556_1011 terminating [exec] [junit] 09/08/18 01:01:56 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:57132 is added to blk_3027939285115887556_1011 size 6 [exec] [junit] 09/08/18 01:01:56 INFO executionengine.HExecutionEngine: Connecting to hadoop file system at: hdfs://localhost:57553 [exec] [junit] 09/08/18 01:01:56 INFO executionengine.HExecutionEngine: Connecting to map-reduce job tracker at: localhost:52163 [exec] [junit] 09/08/18 01:01:56 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size before optimization: 1 [exec] [junit] 09/08/18 01:01:56 INFO mapReduceLayer.MultiQueryOptimizer: MR plan size after optimization: 1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Deleting block blk_-4258982856765574979_1005 file dfs/data/data5/current/blk_-4258982856765574979 [exec] [junit] 09/08/18 01:01:57 WARN dfs.DataNode: Unexpected error trying to delete block blk_5421843601365247738_1004. BlockInfo not found in volumeMap. [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Deleting block blk_8703349292237962083_1006 file dfs/data/data6/current/blk_8703349292237962083 [exec] [junit] 09/08/18 01:01:57 WARN dfs.DataNode: java.io.IOException: Error in deleting blocks. 
[exec] [junit] at org.apache.hadoop.dfs.FSDataset.invalidate(FSDataset.java:1146) [exec] [junit] at org.apache.hadoop.dfs.DataNode.processCommand(DataNode.java:793) [exec] [junit] at org.apache.hadoop.dfs.DataNode.offerService(DataNode.java:663) [exec] [junit] at org.apache.hadoop.dfs.DataNode.run(DataNode.java:2888) [exec] [junit] at java.lang.Thread.run(Thread.java:619) [exec] [junit] [exec] [junit] 09/08/18 01:01:57 INFO mapReduceLayer.JobControlCompiler: Setting up single store job [exec] [junit] 09/08/18 01:01:57 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.jar. blk_-9109308561601697298_1012 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Receiving block blk_-9109308561601697298_1012 src: /127.0.0.1:46186 dest: /127.0.0.1:48123 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Receiving block blk_-9109308561601697298_1012 src: /127.0.0.1:57220 dest: /127.0.0.1:52254 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Receiving block blk_-9109308561601697298_1012 src: /127.0.0.1:41307 dest: /127.0.0.1:57132 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Received block blk_-9109308561601697298_1012 of size 1498535 from /127.0.0.1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: PacketResponder 0 for block blk_-9109308561601697298_1012 terminating [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Received block blk_-9109308561601697298_1012 of size 1498535 from /127.0.0.1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:57132 is added to blk_-9109308561601697298_1012 size 1498535 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: PacketResponder 1 for block blk_-9109308561601697298_1012 terminating [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: 
BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:52254 is added to blk_-9109308561601697298_1012 size 1498535 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: Received block blk_-9109308561601697298_1012 of size 1498535 from /127.0.0.1 [exec] [junit] 09/08/18 01:01:57 INFO dfs.DataNode: PacketResponder 2 for block blk_-9109308561601697298_1012 terminating [exec] [junit] 09/08/18 01:01:57 INFO dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:48123 is added to blk_-9109308561601697298_1012 size 1498535 [exec] [junit] 09/08/18 01:01:57 INFO fs.FSNamesystem: Increasing replication for file /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.jar. New replication is 2 [exec] [junit] 09/08/18 01:01:57 INFO fs.FSNamesystem: Reducing replication for file /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.jar. New replication is 2 [exec] [junit] 09/08/18 01:01:58 INFO dfs.StateChange: BLOCK* NameSystem.allocateBlock: /tmp/hadoop-hudson/mapred/system/job_200908180101_0002/job.split. blk_7949841277658716829_1013 [exec] [junit] 09/08/18 01:01:58 INFO
[jira] Commented: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744316#action_12744316 ] Hadoop QA commented on PIG-925:
-1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12416812/PIG-925-1.patch against trunk revision 804406.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 251 javac compiler warnings (more than the trunk's current 250 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/167/console
This message is automatically generated.
Fix join in local mode -- Key: PIG-925 URL: https://issues.apache.org/jira/browse/PIG-925 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.4.0 Attachments: PIG-925-1.patch Join is broken after the LOJoin patch (Optimizer_Phase5.patch of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]).
Even the simplest join script is not working under local mode, e.g.:
a = load '1.txt';
b = load '2.txt';
c = join a by $0, b by $0;
dump c;
Caused by: java.lang.NullPointerException
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146)
at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109)
at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Proposal to create a branch for contrib project Zebra
On Aug 17, 2009, at 4:38 PM, Santhosh Srinivasan wrote: Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own. There has been sufficient precedent for 'contrib committers' in Hadoop (e.g. Chukwa vis-a-vis the former 'Hadoop Core' sub-project), and it is normal within the Apache world to have committers with specific 'roles', e.g. specific contrib modules, QA, release/build, etc. (http://hadoop.apache.org/common/credits.html - in fact, Giridharan Kesavan is an unlisted 'release' committer for Apache Hadoop). I believe it is a desired, nay stated, goal for Zebra to eventually graduate to a Hadoop sub-project, based on which it was voted in as a contrib module by the Apache Pig community. Given these, I don't see any cause for concern here. Arun
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744323#action_12744323 ] Jeff Hammerbacher commented on PIG-833: --- Hey Raghu, you mention that a design document is forthcoming. It would be great to have a PDF design document, like Matei's for the fair scheduler, in addition to the Javadoc and wiki page. Any progress on that front? I'm quite interested in learning more about Zebra's use and implementation. On a larger note, it would be great if Pig moved to the Hadoop model for new features, where a design document and test plan are required to commit. See https://issues.apache.org/jira/browse/HADOOP-5587. It's tough to digest the bulk dumps of Owl, Zebra, and Giraffe, though we certainly appreciate the work Yahoo has done on these projects! Thanks, Jeff Storage access layer Key: PIG-833 URL: https://issues.apache.org/jira/browse/PIG-833 Project: Pig Issue Type: New Feature Reporter: Jay Tang Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, PIG-833-zebra.patch.bz2, PIG-833-zebra.patch.bz2, TEST-org.apache.hadoop.zebra.pig.TestCheckin1.txt, test.out, zebra-javadoc.tgz A layer is needed to provide a high-level data access abstraction and a tabular view of data in Hadoop; it could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU- and space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a Hadoop subproject later on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-823) Hadoop Metadata Service
[ https://issues.apache.org/jira/browse/PIG-823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744326#action_12744326 ] Jeff Hammerbacher commented on PIG-823: --- Hey, Great to see the Owl source! I've filed a ticket over on the Hive project (https://issues.apache.org/jira/browse/HIVE-762) to see if we can find some common ground between Pig's and Hive's metadata needs; it would be great to have a single metadata service for all of Hadoop's structured data manipulation tools. If you're interested, please chime in there (or open a ticket here? Whatever seems sane to you). Thanks, Jeff Hadoop Metadata Service --- Key: PIG-823 URL: https://issues.apache.org/jira/browse/PIG-823 Project: Pig Issue Type: New Feature Reporter: Olga Natkovich Attachments: owl.filelist, owl.patch.gz, owl_libdeps.tgz, owl_otherdeps.tgz This JIRA is created to track development of a metadata system for Hadoop. The goal of the system is to allow users and applications to register data stored on HDFS, search for the data available on HDFS, and associate metadata such as schema, statistics, etc. with a particular data unit or a data set stored on HDFS. The initial goal is to provide a fairly generic, low-level abstraction that any user or application on HDFS can use to store and retrieve metadata. Over time, higher-level abstractions closely tied to particular applications or tools can be developed. Eventually, it would make sense for the metadata service to become a subproject within Hadoop. For now, the proposal is to make it a contrib to Pig since Pig SQL is likely to be the first user of the system. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Proposal to create a branch for contrib project Zebra
That leaves us with contrib committers. Can you point to earlier email threads that cover the topic of giving committer access to contrib projects? Specifically, what does it mean to award someone committer privileges to a contrib project, what are the access privileges that come with such rights, what are the dos/don'ts, etc. Chukwa was a contrib module prior to its current avatar as a full-fledged sub-project. Its 'contrib committers' Ari Rabkin and Eric Yang became its first committers: http://markmail.org/message/75qvvcigi3qumifp Unfortunately the email threads for voting contrib committers are private to the Hadoop PMC; you'll just have to take my word for it. *smile* I did dig up some other examples for you: http://www.gossamer-threads.com/lists/lucene/java-dev/81122 http://www.nabble.com/ANNOUNCE:-Welcome--as-Contrib-Committer-td21506295.html Contrib committers have privileges to commit only to their 'module': pig/trunk/contrib/zebra in this case. Thirdly, are there instances of contrib committers creating branches? Branches are a development tool... I don't see the problem with creating/using them. Arun
[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Status: Open (was: Patch Available) [Piggybank] SequenceFileLoader --- Key: PIG-911 URL: https://issues.apache.org/jira/browse/PIG-911 Project: Pig Issue Type: New Feature Reporter: Dmitriy V. Ryaboy Attachments: pig_911.2.patch, pig_sequencefile.patch The proposed piggybank contribution adds a SequenceFileLoader to the piggybank. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitriy V. Ryaboy updated PIG-911: -- Attachment: pig_911.2.patch Addressed Alan's comments.
[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12744343#action_12744343 ] Dmitriy V. Ryaboy commented on PIG-911: --- Concerning making this a StoreFunc as well -- the StoreFunc interface is not very friendly to this. All you get in the bind call is the output stream; for LoadFunc, you also get the name of the file (or, presumably, whatever the user passed in under the guise of a file name). This means that for the LoadFunc, I was able to use the passed-in filename to get back to a Path and a FileSystem. I can't do the same for StoreFunc, where the filename is not available -- only the output stream is. That means I can't create the appropriate SequenceFile.Writer. Is there a way around this limitation that does not involve requiring special constructor parameters? Is it possible to change the StoreFunc API to provide this information, or to make it available through some side channel (MapRedUtils or similar)?
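The asymmetry Dmitriy describes can be sketched with simplified stand-ins for the load/store bind hooks. These interfaces are illustrative shapes only, not the real org.apache.pig classes, and the HDFS path is a made-up example: the loader's bind call receives the file name, from which a Path and FileSystem can be reconstructed, while the storer's bind call receives nothing but a raw stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

// Simplified stand-ins for the old Pig load/store hooks (illustrative
// shapes only -- not the real org.apache.pig interfaces).
interface LoaderSketch {
    // the loader is told which file it is bound to
    void bindTo(String fileName, InputStream in);
}

interface StorerSketch {
    // the storer sees only a raw stream: no file name to recover a Path from
    void bindTo(OutputStream out);
}

public class BindAsymmetry {
    // What a loader can recover about its data source.
    static String loaderView() {
        final String[] seen = new String[1];
        LoaderSketch loader = (fileName, in) -> seen[0] = fileName;
        loader.bindTo("hdfs://nn/data/part-00000",
                      new ByteArrayInputStream(new byte[0]));
        return seen[0];  // the name is known, so a Path could be built
    }

    // What a storer can recover: nothing beyond the stream itself,
    // hence no way to open a SequenceFile.Writer for the destination.
    static String storerView() {
        final String[] seen = new String[1];
        StorerSketch storer = out -> seen[0] = null;
        storer.bindTo(new ByteArrayOutputStream());
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("loader sees: " + loaderView());
        System.out.println("storer sees: " + storerView());
    }
}
```

The sketch makes the design question concrete: any fix has to either widen the storer's bind signature to carry the destination name or expose it through a side channel, since the stream alone cannot be turned back into a writable Path.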
Re: Proposal to create a branch for contrib project Zebra
Hi Santhosh, There are two separate things: (a) voting a contributor in as a committer, and (b) committing to a contrib project. (b): My experience with Hadoop is that contrib is, by definition, very loosely coupled with core. By convention, we as committers to core (hdfs, mapred, etc.) did not have to monitor changes to contrib as thoroughly as we would monitor core changes. It is the responsibility of the contrib developers to make sure they are not breaking builds, etc. Contrib changes get reviewed by people interested in the project. (a): Voting takes place when a contributor is being blessed as a committer. It involves some legal stuff as well. Although a committer has permission to commit to any part of a project, it is expected that they don't misuse it. E.g. if I have a patch for core Map/Reduce, I would certainly wait for a regular MR contributor to review it and possibly commit it. It does not matter how many patches I might have contributed to, say, HDFS. The reason for (a) is simple: scalability. We cannot monitor everything. If you or another PIG developer volunteers to commit Zebra patches, we are more than happy to let you do it. Please let us know. And at any stage, if you feel we may be violating normal conventions (like breaking builds or committing some PIG changes), please raise the issue. We have not seen serious problems in this regard with any other project; I think we should get the benefit of the doubt. I have not addressed the reason for a new branch here; I will make the case for it in another mail. Raghu. Santhosh Srinivasan wrote: Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.
Santhosh -Original Message- From: Raghu Angadi [mailto:rang...@yahoo-inc.com] Sent: Monday, August 17, 2009 4:06 PM To: pig-dev@hadoop.apache.org Subject: Proposal to create a branch for contrib project Zebra
[jira] Updated: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Status: Patch Available (was: Open)
[jira] Updated: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-925: --- Attachment: PIG-925-2.patch Address the javac warning Fix join in local mode -- Key: PIG-925 URL: https://issues.apache.org/jira/browse/PIG-925 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.4.0 Attachments: PIG-925-1.patch, PIG-925-2.patch Join is broken after LOJoin patch (Optimizer_Phase5.patch of [PIG-697|https://issues.apache.org/jira/browse/PIG-697). Even the simplest join script is not working under local mode: eg: a = load '1.txt'; b = load '2.txt'; c = join a by $0, b by $0; dump c; Caused by: java.lang.NullPointerException at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POPackage.getNext(POPackage.java:206) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:191) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.local.executionengine.physicalLayer.counters.POCounter.getNext(POCounter.java:71) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:117) at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:146) at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:109) at org.apache.pig.backend.local.executionengine.LocalExecutionEngine.execute(LocalExecutionEngine.java:165) -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
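For readers trying to reproduce the PIG-925 failure, a minimal local-mode session might look like the following. The file names come from the bug report, but their contents and the tab-delimited layout are illustrative assumptions, not part of the report:

```pig
-- Run with: pig -x local join_repro.pig
-- 1.txt and 2.txt are assumed to be small tab-delimited text files,
-- e.g. each line "key<TAB>value"; any input triggers the failure.
a = load '1.txt';
b = load '2.txt';
c = join a by $0, b by $0;
-- On an affected build (trunk after the LOJoin patch, before the fix),
-- dump fails with the NullPointerException in POPackage.getNext shown above.
dump c;
```

Note that the stack trace mixes local and hadoop execution-engine classes, which is why the bug only surfaces in local mode.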
[jira] Updated: (PIG-925) Fix join in local mode
[ https://issues.apache.org/jira/browse/PIG-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-925:
---------------------------
    Status: Open  (was: Patch Available)

> Fix join in local mode
> ----------------------
>
>                 Key: PIG-925
>                 URL: https://issues.apache.org/jira/browse/PIG-925
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.3.0
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: 0.4.0
>         Attachments: PIG-925-1.patch, PIG-925-2.patch

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
Hudson build is back to normal: Pig-Patch-minerva.apache.org #168
See http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/
[jira] Commented: (PIG-911) [Piggybank] SequenceFileLoader
[ https://issues.apache.org/jira/browse/PIG-911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12744373#action_12744373 ]

Hadoop QA commented on PIG-911:
-------------------------------

+1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12416830/pig_911.2.patch
  against trunk revision 804406.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 3 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    +1 release audit. The applied patch does not increase the total number of release audit warnings.
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/testReport/
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/168/console

This message is automatically generated.

> [Piggybank] SequenceFileLoader
> ------------------------------
>
>                 Key: PIG-911
>                 URL: https://issues.apache.org/jira/browse/PIG-911
>             Project: Pig
>          Issue Type: New Feature
>            Reporter: Dmitriy V. Ryaboy
>         Attachments: pig_911.2.patch, pig_sequencefile.patch
>
> The proposed piggybank contribution adds a SequenceFileLoader to the piggybank.

-- 
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
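For context, a piggybank loader like this one would typically be used from a Pig script roughly as follows. This is a sketch, not taken from the patch: the jar path is illustrative, and the class name assumes the standard piggybank package layout; check the committed patch for the exact class and any constructor arguments:

```pig
-- Register the piggybank jar containing the loader (path is illustrative).
register /path/to/piggybank.jar;

-- Load a Hadoop SequenceFile; the loader is expected to turn each
-- key/value record into a two-field tuple.
a = load 'data.seq' using org.apache.pig.piggybank.storage.SequenceFileLoader();
dump a;
```

Since SequenceFiles are a common intermediate format for Hadoop jobs, a loader like this lets Pig scripts consume MapReduce output directly instead of requiring a conversion step to text.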
Re: Proposal to create a branch for contrib project Zebra
Raghu Angadi wrote:
> Hi Santosh,
>
> There are two separate things:
>   (a) voting a contributor as a committer
>   (b) committing to a contrib project.
> [...]
> Reason for (a) is simple scalability. We can not monitor everything.

I meant to say "Reason for (b)" (why contrib commits are treated a bit differently). Our motivation is not to bypass any oversight; it is just so that we don't burden PIG committers too much. We are happy if a PIG committer volunteers to oversee and commit.

Raghu.

> If you or another PIG developer volunteers to commit zebra patches, we are more than happy to let you do it. Please let us know. Or at any stage, if you feel we may be violating normal conventions (like breaking builds or committing some PIG changes), please raise the issue. We have not seen serious problems in this regard with any other project; I think we should get the benefit of the doubt.
>
> I have not addressed the reason for a new branch here; I will pitch for it in another mail.
>
> Raghu.
>
> Santhosh Srinivasan wrote:
>> Is there any precedent for such proposals? I am not comfortable with extending committer access to contrib teams. I would suggest that Zebra be made a sub-project of Hadoop and have a life of its own.
>>
>> Santhosh
>>
>> -----Original Message-----
>> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
>> Sent: Monday, August 17, 2009 4:06 PM
>> To: pig-dev@hadoop.apache.org
>> Subject: Proposal to create a branch for contrib project Zebra
>>
>> Thanks to the PIG team, the first version of the contrib project Zebra (PIG-833) is committed to the PIG trunk. In short, Zebra is a table storage layer built for use in PIG and other Hadoop applications. While we are stabilizing the current version V1 in the trunk, we plan to add more new features to it. We would like to create an svn branch for the new features. We will be responsible for managing Zebra in the PIG trunk and in the new branch. We will merge the branch when it is ready. We expect the changes to affect only the 'contrib/zebra' directory.
>>
>> As a regular contributor to Hadoop, I will be the initial committer for Zebra. As more patches are contributed by other Zebra developers, more committers may be added through the normal Hadoop/Apache procedure. I would like to create a branch called 'zebra-v2' with approval from the PIG team.
>>
>> Thanks,
>> Raghu.