Re: Pig 0.4.0 release

2009-08-18 Thread Dmitriy Ryaboy
I am about to submit a cleaned-up patch for 924.
It works fine as a static patch (in fact I can attach it to 660 as
well) -- compiling with -Dhadoop.version=XX works as proposed for the
static shims. It does the necessary prep for the code to be able to
switch based on what's in its classpath, but it does not require
unbundling to work statically.

The hadoop20 jar attached to the zebra ticket is built in a different
way than 18 and 19; it does not report its version (18 and 19 do).
Right now I get around it by hard-coding a special case (Unknown =
20), but that's obviously suboptimal. Could someone rebuild
hadoop20.jar the way Pig wants it, and with the proper version
identification?  If that happens, 924/660 can go in together with
hadoop20.jar and users will at least be able to build against a static
version of hadoop without requiring a patch.
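
A minimal sketch of the "Unknown = 20" special case described above. It is not taken from the PIG-924 patch. The class name, method name, and accepted version format are hypothetical. It also assumes the version string would come from Hadoop's own version reporting, which is not wired in here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of the PIG-924 patch. */
final class HadoopVersionSketch {
    // Matches strings such as "0.18.3", "0.20" or "0.19.1-dev" and captures the minor number.
    private static final Pattern VERSION = Pattern.compile("0\\.(\\d+)(\\..*|-.*)?");

    /** Maps a reported Hadoop version string to its minor version (18, 19, 20). */
    static int minorVersion(String reported) {
        String v = (reported == null) ? "Unknown" : reported.trim();
        if (v.equalsIgnoreCase("Unknown")) {
            // Special case from the message: a jar that does not report
            // its version is assumed to be the hadoop20 build.
            return 20;
        }
        Matcher m = VERSION.matcher(v);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized Hadoop version: " + reported);
        }
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        for (String v : new String[] {"0.18.3", "0.19.1", "Unknown"}) {
            System.out.println(v + " -> " + minorVersion(v));
        }
    }
}
```

A jar rebuilt with proper version identification would take the normal parsing path, and the "Unknown" branch could then be dropped.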

-Dmitriy

On Tue, Aug 18, 2009 at 9:56 AM, Alan Gates <ga...@yahoo-inc.com> wrote:
 Non-committers certainly get a vote, it just isn't binding.

 I agree on PIG-925 as a blocker.  I don't see PIG-859 as a blocker since
 there is a simple workaround.

 If we want to release 0.4.0 within a week or so, dynamic shims won't be an
 option because we won't be able to solve the bundled hadoop lib problem in
 that amount of time.  I agree that we are not making life easy enough for
 users who want to build with hadoop 0.20.  Based on comments on the JIRA,
 I'm not sure the patch for the static shims is ready.  What if instead we
 checked in a version of hadoop20.jar that will work for users who want to
 build with 0.20?  This way users can still build this if they want and our
 release isn't blocked on the patch.

 Alan.


 On Aug 17, 2009, at 12:03 PM, Dmitriy Ryaboy wrote:

 Olga,

 Do non-committers get a vote?

 Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent
 even if it's in contrib/

 Would love to see dynamic (or at least static) shims incorporated into
 the 0.4 release (see PIG-660, PIG-924)

 There are a couple of bugs still outstanding that I think would need
 to get fixed before a release:

 https://issues.apache.org/jira/browse/PIG-859
 https://issues.apache.org/jira/browse/PIG-925

 I think all of these can be solved within a week; assuming we are
 talking about a release after these go into trunk, +1.

 -D


 On Mon, Aug 17, 2009 at 11:46 AM, Olga Natkovich <ol...@yahoo-inc.com>
 wrote:

 Pig Developers,

 We have made several significant performance and other improvements over
 the last couple of months:

 (1) Added an optimizer with several rules
 (2) Introduced skew and merge joins
 (3) Cleaned COUNT and AVG semantics

 I think it is time for another release to make this functionality
 available to users.

 I propose that Pig 0.4.0 is released against Hadoop 18 since most users
 are still using this version. Once Hadoop 20.1 is released, we will roll
 Pig 0.5.0 based on Hadoop 20.

 Please, vote on the proposal by Thursday.

 Olga






Re: Pig 0.4.0 release

2009-08-18 Thread Alan Gates


On Aug 18, 2009, at 10:05 AM, Dmitriy Ryaboy wrote:


I am about to submit a cleaned up patch for 924.
It works fine as a static patch (in fact I can attach it to 660 as
well) -- compiling with -Dhadoop.version=XX works as proposed for the
static shims. It does the necessary prep for the code to be able to
switch based on what's in its classpath, but it does not require
unbundling to work statically.


Ok, we'll take a look.



The hadoop20 jar attached to the zebra ticket is built in a different
way than 18 and 19; it does not report its version (18 and 19 do).
Right now I get around it by hard-coding a special case (Unknown =
20), but that's obviously suboptimal. Could someone rebuild
hadoop20.jar the way Pig wants it, and with the proper version
identification?  If that happens, 924/660 can go in together with
hadoop20.jar and users will at least be able to build against a static
version of hadoop without requiring a patch.


The hadoop 0.20 jar submitted with Zebra is not a standard jar.  It  
has extra tfile functionality that was not in 0.20, but will be in  
0.20.1.  It isn't something we should publish.  If we put a  
hadoop20.jar into pig's lib, it should be from 0.20 (or when  
available, 0.20.1).


Alan.





Re: Pig 0.4.0 release

2009-08-17 Thread Dmitriy Ryaboy
Olga,

Do non-committers get a vote?

Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent
even if it's in contrib/

Would love to see dynamic (or at least static) shims incorporated into
the 0.4 release (see PIG-660, PIG-924)

There are a couple of bugs still outstanding that I think would need
to get fixed before a release:

https://issues.apache.org/jira/browse/PIG-859
https://issues.apache.org/jira/browse/PIG-925

I think all of these can be solved within a week; assuming we are
talking about a release after these go into trunk, +1.

-D






RE: Pig 0.4.0 release

2009-08-17 Thread Olga Natkovich
Hi Dmitriy,

Non-committers get a non-binding vote.

Zebra needs Hadoop 20.1 because it relies on TFile functionality that is
not available in Hadoop 20. In general, the recommendation from the Hadoop team
is to wait until Hadoop 20.1 is released.

For the remainder of the issues, while I see that it would be nice to resolve 
them, I don't see them as blockers for Pig 0.4.0.

My plan was to release what's currently in the trunk and have follow-up patch
releases if needed.

Olga





RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
I have a question:

Will we be able to fix piggybank sources given that Zebra needs 0.20 and the 
rest of Pig requires 0.18? 

If the answer is yes, then +1 for the release. I agree with the plan of making
0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1.

Thanks,
Santhosh





RE: Pig 0.4.0 release

2009-08-17 Thread Olga Natkovich
Hi Santhosh,

What do you mean by fixing piggybank?

Olga





RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
Till we release 0.5.0, will zebra's requirement on 0.20 prevent any bugs/issues 
with Piggybank?

Santhosh





RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
Rephrasing my question:

Till we release 0.5.0, will zebra's requirement on hadoop-0.20 prevent fixing 
of any bugs/issues with Piggybank? 

Santhosh
