Re: [VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Nigel Daley

Is anyone else getting javac errors running "ant test"?

compile-sources:
[javac] Compiling 484 source files to /Users/ndaley/hadoop/verify/pig-0.4.0/build/classes
[javac] /Users/ndaley/hadoop/verify/pig-0.4.0/src/org/apache/pig/ComparisonFunc.java:22: package org.apache.hadoop.io does not exist
[javac] import org.apache.hadoop.io.WritableComparable;
[javac]^
...

Nige

On Sep 17, 2009, at 12:09 PM, Olga Natkovich wrote:


Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release:
http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga


-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,



I created a candidate build for Pig 0.4.0 release. The highlights of
this release are

-  Performance improvements especially in the area of JOIN
support where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of the sorted data sets.

-  Support for Outer join.

-  Works with Hadoop 18

I ran the release audit and rat report looked fine. The relevant part is
attached below.



Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.



Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.



Should we release this? Vote closes on Thursday, 9/17.



Olga





[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_removals.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_additions.html
[java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_all.html
[java]  !? /home/

[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756970#action_12756970
 ] 

Olga Natkovich commented on PIG-964:


patch committed to branch-0.5

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skewedjoinnull.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.
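
The two behaviours asked for in the description above (a null tuple must not inflate the size estimate, and too few reducers should produce a warning rather than a failure) might be sketched roughly as follows. This is only an illustration and not the code in skewedjoinnull.patch: the class SkewedJoinSketch, the methods estimateSize and chooseReducers, and the REFERENCE_BYTES constant are all hypothetical names, and the size formula is a deliberately crude stand-in.

```java
import java.util.List;

/** Hypothetical illustration only; none of these names come from Pig. */
final class SkewedJoinSketch {
    // Assumed per-field overhead in bytes; an arbitrary value, not a Pig constant.
    static final long REFERENCE_BYTES = 8;

    /** Crude size estimate for a tuple; a null tuple contributes nothing. */
    static long estimateSize(List<Object> tuple) {
        if (tuple == null) {
            return 0; // guard: null must not be counted as a large tuple
        }
        long size = 0;
        for (Object field : tuple) {
            size += REFERENCE_BYTES;
            if (field != null) {
                size += field.toString().length(); // character count as a rough proxy
            }
        }
        return size;
    }

    /** Caps the reducer count at the configured value and warns instead of failing. */
    static int chooseReducers(int needed, int configured) {
        if (needed > configured) {
            System.err.println("WARN: skewed join estimated " + needed
                    + " reducers but only " + configured + " are configured; continuing");
            return configured;
        }
        return needed;
    }
}
```

With these assumptions, chooseReducers(100, 10) logs a warning and returns 10, so the job would continue with the reducers the user asked for.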

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756958#action_12756958
 ] 

Olga Natkovich commented on PIG-964:


+1 on the code

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skewedjoinnull.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756891#action_12756891
 ] 

Hadoop QA commented on PIG-964:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419938/skewedjoinnull.patch
  against trunk revision 816339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/37/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/37/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/37/console

This message is automatically generated.

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skewedjoinnull.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Attachment: (was: skjoin2b.patch)

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skewedjoinnull.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Attachment: skewedjoinnull.patch

Passed the end-to-end tests and added a new unit test to check for nulls in the 
dataset.

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skewedjoinnull.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Status: Patch Available  (was: Open)

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skewedjoinnull.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Status: Open  (was: Patch Available)

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skjoin2b.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Status: Open  (was: Patch Available)

This patch failed in release audit

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
>
> PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
> {{LineRecordReader}}.
> This can help in the following areas:
> - Improving the performance of reading Tuples (lines) in {{PigStorage}}
> - Any future improvements to line reading in Hadoop's 
> {{LineRecordReader}} are automatically carried over to Pig
> Issues that are handled by this patch:
> - BZip uses internal buffers and positioning for determining the number of 
> bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off.
> - The current implementation of {{LocalSeekableInputStream}} does not implement 
> the {{available}} method. This method has to be implemented.
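
For context on the last point: java.io.InputStream.available() returns 0 unless a subclass overrides it, so a caller that relies on it sees no readable bytes. A minimal sketch of such an override for a seekable, file-backed stream is shown below. The class SeekableFileStream is a hypothetical stand-in built on java.io.RandomAccessFile; it is not Pig's LocalSeekableInputStream and makes no claim about how the patch implements the method.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Hypothetical seekable stream over a local file; not Pig's actual class. */
final class SeekableFileStream extends InputStream {
    private final RandomAccessFile file;

    SeekableFileStream(RandomAccessFile file) {
        this.file = file;
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return file.read(buf, off, len);
    }

    /** Moves the read position to an absolute offset in the file. */
    void seek(long pos) throws IOException {
        file.seek(pos);
    }

    /** Bytes left between the current position and end of file, clamped to int range. */
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        if (remaining <= 0) {
            return 0;
        }
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Deriving the value from the file length and the current file pointer keeps available() consistent after a seek, which matters for a stream whose position can jump.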




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: (was: pig_rlr.patch)

> Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
> ---
>
> Key: PIG-960
> URL: https://issues.apache.org/jira/browse/PIG-960
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Ankit Modi
>
> PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
> {{LineRecordReader}}.
> This can help in the following areas:
> - Improving the performance of reading Tuples (lines) in {{PigStorage}}
> - Any future improvements to line reading in Hadoop's 
> {{LineRecordReader}} are automatically carried over to Pig
> Issues that are handled by this patch:
> - BZip uses internal buffers and positioning for determining the number of 
> bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off.
> - The current implementation of {{LocalSeekableInputStream}} does not implement 
> the {{available}} method. This method has to be implemented.




Re: [VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Alan Gates
Now the code won't build because there's no hadoop jar in the lib directory.


Alan.

On Sep 17, 2009, at 12:09 PM, Olga Natkovich wrote:


Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release:
http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga


-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,



I created a candidate build for Pig 0.4.0 release. The highlights of
this release are

-  Performance improvements especially in the area of JOIN
support where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of the sorted data sets.

-  Support for Outer join.

-  Works with Hadoop 18

I ran the release audit and rat report looked fine. The relevant part is
attached below.



Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.



Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.



Should we release this? Vote closes on Thursday, 9/17.



Olga






[VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Olga Natkovich
Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release:
http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga


-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

 

I created a candidate build for Pig 0.4.0 release. The highlights of
this release are

 

-  Performance improvements especially in the area of JOIN
support where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of the sorted data sets.

-  Support for Outer join.

-  Works with Hadoop 18

 

I ran the release audit and rat report looked fine. The relevant part is
attached below.

 

Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

 

Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.

 

Should we release this? Vote closes on Thursday, 9/17.

 

Olga

 

 

 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_help.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_statistics.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff

[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756700#action_12756700
 ] 

Olga Natkovich commented on PIG-964:


The patch needs unit tests

> Handling null  in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skjoin2b.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus  skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-09-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756673#action_12756673
 ] 

Thejas M Nair commented on PIG-965:
---

The Hive LIKE clause implementation is here: 
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java?revision=802066&view=markup

I ran simple tests with a small Java program to see the impact of these 
optimizations. Optimization 1 reduces runtime to roughly 1/2, and optimization 2 
reduces runtime to roughly 1/4.

{code}
int matches = 0;
int tot = 0;
String prefix = "123";
Pattern p = Pattern.compile("123.*");
while ((str = in.readLine()) != null) {

    // without proposed optimizations
    // test setups 1 and 2 took 9 secs and 126 secs respectively
    // if (str.matches("123.*"))
    //     matches++;

    // with optimization 1
    // test setups 1 and 2 took 4 secs and 57 secs respectively
    // if (p.matcher(str).matches())
    //     matches++;

    // with optimization 2
    // test setups 1 and 2 took 2.5 secs and 25 secs respectively
    // int len = prefix.length();
    // boolean matched = true;
    // for (int i = 0; i [...]
{code}

> PERFORMANCE: optimize common case in matches (PORegex)
> --
>
> Key: PIG-965
> URL: https://issues.apache.org/jira/browse/PIG-965
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Reporter: Thejas M Nair
>
> Some frequently seen use cases of the 'matches' comparison operator have the 
> following properties:
> 1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
> 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
> e.g. 'abc%', '%abc', '%abc%'.
> To optimize for these common cases, PORegex.java can be changed to:
> 1. Compile the pattern (rhs of matches) and re-use it if the pattern string has 
> not changed.
> 2. Use string comparisons for simple common regexes (in 2 above).
> The implementation of the Hive LIKE clause uses similar optimizations.




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2009-09-17 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756672#action_12756672
 ] 

patrick o'leary commented on PIG-366:
-

I'm guessing the 2008-11-12 12:25 AM patch isn't up to date?
The tar doesn't contain the src

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Attachments: org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.




[jira] Created: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-09-17 Thread Thejas M Nair (JIRA)
PERFORMANCE: optimize common case in matches (PORegex)
--

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair


Some frequently seen use cases of the 'matches' comparison operator have the 
following properties:
1. The rhs is a constant string, e.g. "c1 matches 'abc%'".
2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 
'abc%', '%abc', '%abc%'.

To optimize for these common cases, PORegex.java can be changed to:
1. Compile the pattern (rhs of matches) and re-use it if the pattern string has 
not changed.
2. Use string comparisons for simple common regexes (in 2 above).

The implementation of the Hive LIKE clause uses similar optimizations.
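
The two proposed changes might look roughly like the sketch below. It is an illustration under stated assumptions, not the PORegex code: CachedMatcher, literalPrefixOf and hasLineTerminator are hypothetical names, and only the prefix case (a run of letters or digits followed by ".*") gets the string-comparison fast path. Because "." does not match line terminators under Java's default regex flags, the fast path also checks the rest of the input for them so that it returns the same answer as the regex would.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the two optimizations; not the actual PORegex code. */
final class CachedMatcher {
    private String lastRegex;     // pattern string seen on the previous call
    private Pattern compiled;     // compiled form of lastRegex
    private String literalPrefix; // non-null only when lastRegex is "<letters/digits>.*"

    /** Same result as input.matches(regex), but recompiles only when regex changes. */
    boolean matches(String input, String regex) {
        if (!regex.equals(lastRegex)) {
            lastRegex = regex;
            compiled = Pattern.compile(regex);
            literalPrefix = literalPrefixOf(regex);
        }
        if (literalPrefix != null) {
            // Optimization 2: plain string comparison for the prefix case.
            return input.startsWith(literalPrefix)
                    && !hasLineTerminator(input, literalPrefix.length());
        }
        // Optimization 1: reuse the cached compiled pattern.
        return compiled.matcher(input).matches();
    }

    /** Returns the prefix if regex is letters/digits followed by ".*", else null. */
    static String literalPrefixOf(String regex) {
        if (!regex.endsWith(".*")) {
            return null;
        }
        String head = regex.substring(0, regex.length() - 2);
        for (int i = 0; i < head.length(); i++) {
            if (!Character.isLetterOrDigit(head.charAt(i))) {
                return null; // may contain regex metacharacters; use the regex path
            }
        }
        return head;
    }

    /** True if s contains a character that "." refuses to match by default. */
    private static boolean hasLineTerminator(String s, int from) {
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\n' || c == '\r' || c == 0x85 || c == 0x2028 || c == 0x2029) {
                return true;
            }
        }
        return false;
    }
}
```

Keeping one CachedMatcher per operator instance means a constant right-hand side is compiled once, and any pattern the prefix check does not recognise simply falls back to the cached regex.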






[jira] Commented: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756654#action_12756654
 ] 

Alan Gates commented on PIG-951:


I'll be reviewing this patch.

> Reset parallelism to 1 for indexing job in MergeJoin
> 
>
> Key: PIG-951
> URL: https://issues.apache.org/jira/browse/PIG-951
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Ashutosh Chauhan
>Assignee: Ashutosh Chauhan
> Attachments: pig-951.patch
>
>
> After sampling one tuple from every block, one reducer is used to sort the 
> index entries in the reduce phase to produce the sorted index used in the actual 
> join job. Thus, the parallelism of the index job should be explicitly set to 1. 
> Currently, it is not.
> Currently, this is a non-issue, since we don't allow any blocking operators 
> in the pipeline before merge-join. However, later when we do allow blocking 
> operators, the parallelism of the indexing job will be that of the preceding 
> blocking operator. Even then, the job will complete successfully because all 
> tuples will go to only one reducer, since we are grouping on only one key, 
> "all". However, it will waste cluster resources by starting all the extra 
> reducers, which get no data and thus do nothing.




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2009-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756636#action_12756636
 ] 

Alan Gates commented on PIG-366:


No one has picked up PigPen recently or kept it up to date.  I know it worked 
with Pig 0.2.0, but it has not been updated since then.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Attachments: org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2009-09-17 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756619#action_12756619
 ] 

patrick o'leary commented on PIG-366:
-

What version of Hadoop is PigPen designed to use?
I am getting the following error:
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol 
org.apache.hadoop.mapred.JobSubmissionProtocol version mismatch. (client = 11, 
server = 10)

I am currently using pigpen_0.0.4.jar and Hadoop 0.18.3.

The wiki should list version numbers and be updated to point to the new 
tarball.

> PigPen - Eclipse plugin for a graphical PigLatin editor
> ---
>
> Key: PIG-366
> URL: https://issues.apache.org/jira/browse/PIG-366
> Project: Pig
>  Issue Type: New Feature
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Attachments: org.apache.pig.pigpen_0.0.1.jar, 
> org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
> pigpen.patch, pigPen.patch, PigPen.tgz
>
>
> This is an Eclipse plugin that provides a GUI that can help users create 
> PigLatin scripts and see the example generator outputs on the fly and submit 
> the jobs to hadoop clusters.




[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756454#action_12756454
 ] 

Hadoop QA commented on PIG-964:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419855/skjoin2b.patch
  against trunk revision 816012.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/36/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/36/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/36/console

This message is automatically generated.

> Handling null in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skjoin2b.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Description: For null tuples, the tuple size is calculated incorrectly and 
thus skewed join ends up expecting a large number of reducers. Further, skewed 
join should not bail out after the second job if the number of reducers 
specified by the user is low. It should print a warning message and continue 
execution.  (was: The tuple size is calculated incorrectly and thus the skewed 
join ends up expecting a large number of reducers. Further, skewed join should 
not bail out after the second job if the number of reducers specified by the 
user is low. It should print a warning message and continue execution.)
Summary: Handling null in skewed join  (was: Handling null keys in 
skewed join)

> Handling null in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
> Attachments: skjoin2b.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Assignee: Sriranjan Manjunath
  Status: Patch Available  (was: Open)

> Handling null in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
>Assignee: Sriranjan Manjunath
> Attachments: skjoin2b.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Attachment: skjoin2b.patch

Attached patch solves both the issues.

> Handling null in skewed join
> -
>
> Key: PIG-964
> URL: https://issues.apache.org/jira/browse/PIG-964
> Project: Pig
>  Issue Type: Bug
>Reporter: Sriranjan Manjunath
> Attachments: skjoin2b.patch
>
>
> For null tuples, the tuple size is calculated incorrectly and thus skewed 
> join ends up expecting a large number of reducers. Further, skewed join 
> should not bail out after the second job if the number of reducers specified 
> by the user is low. It should print a warning message and continue execution.




[jira] Created: (PIG-964) Handling null keys in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)
Handling null keys in skewed join
-

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath


The tuple size is calculated incorrectly and thus the skewed join ends up 
expecting a large number of reducers. Further, skewed join should not bail out 
after the second job if the number of reducers specified by the user is low. It 
should print a warning message and continue execution.
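
A rough sketch of the warn-and-continue behaviour asked for above. 
SkewedJoinReducerCheck, resolveReducers and both parameters are hypothetical 
names chosen for illustration. They are not part of Pig, and an actual fix 
may look quite different.

```java
import java.util.logging.Logger;

/** Hypothetical helper; illustrates warning instead of failing the job. */
final class SkewedJoinReducerCheck {
    private static final Logger LOG =
            Logger.getLogger(SkewedJoinReducerCheck.class.getName());

    /**
     * Returns the reducer count to run with. A user-specified count lower
     * than the estimate produces a warning rather than an exception.
     */
    static int resolveReducers(int userReducers, int estimatedReducers) {
        if (userReducers < estimatedReducers) {
            LOG.warning("Skewed join estimated " + estimatedReducers
                    + " reducers but only " + userReducers
                    + " were specified; continuing with " + userReducers + ".");
        }
        return userReducers;
    }
}
```

The user's setting wins in both branches, so the only observable difference 
is the log message.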
