[jira] Created: (PIG-964) Handling null keys in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)
Handling null keys in skewed join
-

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath


The tuple size is calculated incorrectly and thus the skewed join ends up 
expecting a large number of reducers. Further, skewed join should not bail out 
after the second job if the number of reducers specified by the user is low. It 
should print a warning message and continue execution.
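
The second request above (warn and continue instead of bailing out when the user-specified reducer count is low) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `ReducerCheck`, the method `effectiveReducers`, and both parameters are hypothetical names, not code from the Pig source tree or from the attached patch.

```java
import java.util.logging.Logger;

/** Hypothetical illustration; not code from the Pig source tree. */
final class ReducerCheck {
    private static final Logger LOG = Logger.getLogger(ReducerCheck.class.getName());

    /**
     * Returns the reducer count the join should run with.
     * If the user configured fewer reducers than the sampler estimated,
     * a warning is logged and the user's value is kept, rather than
     * aborting the job.
     */
    static int effectiveReducers(int estimated, int configured) {
        if (configured <= 0) {
            throw new IllegalArgumentException("configured reducers must be positive");
        }
        if (configured < estimated) {
            LOG.warning("Skewed join estimated " + estimated + " reducers but only "
                    + configured + " are configured; continuing with " + configured);
        }
        return configured;
    }

    public static void main(String[] args) {
        // Logs a warning and continues with the configured value.
        System.out.println(effectiveReducers(40, 10));
    }
}
```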

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Attachment: skjoin2b.patch

Attached patch solves both the issues.

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
 Attachments: skjoin2b.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Description: For null tuples, the tuple size is calculated incorrectly and 
thus  skewed join ends up expecting a large number of reducers. Further, skewed 
join should not bail out after the second job if the number of reducers 
specified by the user is low. It should print a warning message and continue 
execution.  (was: The tuple size is calculated incorrectly and thus the skewed 
join ends up expecting a large number of reducers. Further, skewed join should 
not bail out after the second job if the number of reducers specified by the 
user is low. It should print a warning message and continue execution.)
Summary: Handling null  in skewed join  (was: Handling null keys in 
skewed join)

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
 Attachments: skjoin2b.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Assignee: Sriranjan Manjunath
  Status: Patch Available  (was: Open)

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skjoin2b.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756454#action_12756454
 ] 

Hadoop QA commented on PIG-964:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419855/skjoin2b.patch
  against trunk revision 816012.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/36/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/36/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/36/console

This message is automatically generated.

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skjoin2b.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2009-09-17 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756619#action_12756619
 ] 

patrick o'leary commented on PIG-366:
-

What version of Hadoop is PigPen designed to use?
I am getting the following error:
Caused by: org.apache.hadoop.ipc.RPC$VersionMismatch: Protocol 
org.apache.hadoop.mapred.JobSubmissionProtocol version mismatch. (client = 11, 
server = 10)

I am currently using PigPen (pigpen_0.0.4.jar) and Hadoop 0.18.3.

The wiki should list version numbers and be updated to point to the new 
tarball.

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Attachments: org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI that can help users create 
 PigLatin scripts and see the example generator outputs on the fly and submit 
 the jobs to hadoop clusters.




[jira] Commented: (PIG-366) PigPen - Eclipse plugin for a graphical PigLatin editor

2009-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756636#action_12756636
 ] 

Alan Gates commented on PIG-366:


At this point no one has picked up PigPen recently and kept it up to date.  I 
know it worked with Pig 0.2.0, but it has not been updated since then.

 PigPen - Eclipse plugin for a graphical PigLatin editor
 ---

 Key: PIG-366
 URL: https://issues.apache.org/jira/browse/PIG-366
 Project: Pig
  Issue Type: New Feature
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Attachments: org.apache.pig.pigpen_0.0.1.jar, 
 org.apache.pig.pigpen_0.0.1.tgz, org.apache.pig.pigpen_0.0.4.jar, 
 pigpen.patch, pigPen.patch, PigPen.tgz


 This is an Eclipse plugin that provides a GUI that can help users create 
 PigLatin scripts and see the example generator outputs on the fly and submit 
 the jobs to hadoop clusters.




[jira] Commented: (PIG-951) Reset parallelism to 1 for indexing job in MergeJoin

2009-09-17 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756654#action_12756654
 ] 

Alan Gates commented on PIG-951:


I'll be reviewing this patch.

 Reset parallelism to 1 for indexing job in MergeJoin
 

 Key: PIG-951
 URL: https://issues.apache.org/jira/browse/PIG-951
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: pig-951.patch


 After sampling one tuple from every block, one reducer is used to sort the 
 index entries in the reduce phase to produce the sorted index used in the actual 
 join job. Thus, the parallelism of the index job should be explicitly set to 1. 
 Currently, it is not.
 This is a non-issue today, since we don't allow any blocking operators 
 in the pipeline before merge-join. However, once we do allow blocking 
 operators, the parallelism of the indexing job will be that of the preceding 
 blocking operator. Even then, the job will complete successfully because all 
 tuples will go to only one reducer, since we are grouping on only one key, 
 'all'. However, it will waste cluster resources by starting all the extra 
 reducers, which get no data and thus do nothing.
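
To make the wasted-reducer argument concrete, the sketch below applies the hash-partitioning rule used by Hadoop's default HashPartitioner, `(hashCode & Integer.MAX_VALUE) % numReducers`, to a single constant key. The class and method names are illustrative only, and the string "all" merely stands in for the single group key; the key object Pig actually uses may differ.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch; class and method names are hypothetical. */
final class SingleKeyPartitionDemo {
    /** Same arithmetic as Hadoop's default hash partitioner. */
    static int partitionFor(Object key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** Counts how many distinct reducers receive at least one record. */
    static int reducersUsed(Object key, int records, int numReducers) {
        Set<Integer> used = new HashSet<>();
        for (int i = 0; i < records; i++) {
            used.add(partitionFor(key, numReducers));
        }
        return used.size();
    }

    public static void main(String[] args) {
        int numReducers = 20;
        int used = reducersUsed("all", 1000, numReducers);
        // prints "1 of 20 reducers receive data"
        System.out.println(used + " of " + numReducers + " reducers receive data");
    }
}
```

With every record carrying the same key, the remaining reducers are started but stay idle, which is why setting the parallelism to 1 loses nothing.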




[jira] Created: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-09-17 Thread Thejas M Nair (JIRA)
PERFORMANCE: optimize common case in matches (PORegex)
--

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair


Some frequently seen use cases of the 'matches' comparison operator have the 
following properties -
1. The rhs is a constant string, e.g. c1 matches 'abc%' 
2. Regexes that look for a matching prefix, suffix, etc. are very common, e.g. 
'abc%', '%abc', '%abc%' 

To optimize for these common cases, PORegex.java can be changed to -
1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
string has not changed. 
2. Use string comparisons for simple common regexes (as in 2 above).

The implementation of the Hive LIKE clause uses similar optimizations.
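
A minimal sketch of the two proposed optimizations follows. It is not the actual PORegex code: the class name `CachedRegexMatcher` and its fields are hypothetical, patterns are written with Java regex wildcards (`.*`) since that is what `java.util.regex` expects, and the fast paths assume the input contains no line terminators (by default `.` does not match them, so `startsWith` and a regex could otherwise disagree). Anything that is not a plain letters-and-digits literal with a leading and/or trailing `.*` falls back to the compiled regex.

```java
import java.util.regex.Pattern;

/** Illustrative sketch only; not the actual PORegex implementation. */
final class CachedRegexMatcher {
    private static final int REGEX = 0, PREFIX = 1, SUFFIX = 2, CONTAINS = 3;

    private String lastPattern; // pattern string seen on the previous call
    private Pattern compiled;   // compiled regex, used only in the general case
    private String literal;     // literal text used by the fast paths
    private int kind = REGEX;

    /** Same result as input.matches(pattern) for single-line input. */
    boolean matches(String input, String pattern) {
        if (!pattern.equals(lastPattern)) {
            prepare(pattern); // optimization 1: redo the work only when the pattern changes
        }
        switch (kind) {       // optimization 2: plain string comparisons
            case PREFIX:   return input.startsWith(literal);
            case SUFFIX:   return input.endsWith(literal);
            case CONTAINS: return input.contains(literal);
            default:       return compiled.matcher(input).matches();
        }
    }

    private void prepare(String pattern) {
        lastPattern = pattern;
        boolean lead = pattern.startsWith(".*");
        boolean trail = pattern.endsWith(".*") && pattern.length() >= (lead ? 4 : 2);
        String core = pattern.substring(lead ? 2 : 0,
                trail ? pattern.length() - 2 : pattern.length());
        if ((lead || trail) && isPlainLiteral(core)) {
            literal = core;
            kind = (lead && trail) ? CONTAINS : (lead ? SUFFIX : PREFIX);
        } else {
            kind = REGEX;
            compiled = Pattern.compile(pattern);
        }
    }

    /** Conservative check: only letters and digits count as a literal. */
    private static boolean isPlainLiteral(String s) {
        if (s.isEmpty()) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            if (!Character.isLetterOrDigit(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        CachedRegexMatcher m = new CachedRegexMatcher();
        String[] patterns = {"123.*", ".*123", ".*123.*", "1.3"};
        String[] inputs = {"12345", "45123", "4123 5", "123", "321"};
        for (String p : patterns) {
            for (String s : inputs) {
                if (m.matches(s, p) != s.matches(p)) {
                    throw new AssertionError(p + " / " + s);
                }
            }
        }
        System.out.println("fast paths agree with String.matches");
    }
}
```

The literal check is deliberately strict: misclassifying a pattern that contains regex metacharacters would change results, whereas falling back to the compiled regex only costs speed.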






[jira] Commented: (PIG-965) PERFORMANCE: optimize common case in matches (PORegex)

2009-09-17 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756673#action_12756673
 ] 

Thejas M Nair commented on PIG-965:
---

Hive like clause implementation is here - 
http://svn.apache.org/viewvc/hadoop/hive/trunk/ql/src/java/org/apache/hadoop/hive/ql/udf/UDFLike.java?revision=802066&view=markup

I ran simple tests with a simple java program to see the impact of these 
optimizations. Optimization 1 reduces runtime to 1/2, optimization 2 reduces 
runtime to 1/4 . 

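{code}
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Pattern;

// Class name is arbitrary. Uncomment exactly one of the three variants below.
public class MatchesTest {
    public static void main(String[] args) throws IOException {
        int matches = 0;
        int tot = 0;
        String prefix = "123";
        Pattern p = Pattern.compile("123.*");
        // The original snippet did not show how 'in' was opened; stdin is assumed here.
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String str;
        while ((str = in.readLine()) != null) {

            // without proposed optimizations
            // test setups 1 and 2 took 9 secs, 126 secs respectively
            //if (str.matches("123.*"))
            //    matches++;

            // with optimization 1 (pattern compiled once, outside the loop)
            // test setups 1, 2 took 4, 57 secs respectively
            //if (p.matcher(str).matches())
            //    matches++;

            // with optimization 2 (plain character comparison against the prefix)
            // test setups 1, 2 took 2.5, 25 secs respectively
            //int len = prefix.length();
            //boolean matched = str.length() >= len; // guard against short lines
            //for (int i = 0; matched && i < len; i++) {
            //    if (prefix.charAt(i) != str.charAt(i)) {
            //        matched = false;
            //    }
            //}
            //if (matched)
            //    matches++;

            tot++;
        }
        in.close();
        System.out.println("matches " + matches + " tot " + tot);
    }
}
{code}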

 PERFORMANCE: optimize common case in matches (PORegex)
 --

 Key: PIG-965
 URL: https://issues.apache.org/jira/browse/PIG-965
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Thejas M Nair

 Some frequently seen use cases of the 'matches' comparison operator have the 
 following properties -
 1. The rhs is a constant string, e.g. c1 matches 'abc%' 
 2. Regexes that look for a matching prefix, suffix, etc. are very common, 
 e.g. 'abc%', '%abc', '%abc%' 
 To optimize for these common cases, PORegex.java can be changed to -
 1. Compile the pattern (rhs of matches) once and re-use it if the pattern 
 string has not changed. 
 2. Use string comparisons for simple common regexes (as in 2 above).
 The implementation of the Hive LIKE clause uses similar optimizations.




[VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Olga Natkovich
Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release:
http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga


-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

 

I created a candidate build for Pig 0.4.0 release. The highlights of
this release are

 

-  Performance improvements especially in the area of JOIN
support where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of the sorted data sets.

-  Support for Outer join.

-  Works with Hadoop 18

 

I ran the release audit, and the RAT report looked fine. The relevant part is
attached below.

 

Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

 

Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.

 

Should we release this? Vote closes on Thursday, 9/17.

 

Olga

 

 

 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_removals.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_additions.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_all.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_changes.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_removals.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_help.html
 [java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_statistics.html
 [java]  !?

Re: [VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Alan Gates
Now the code won't build because there's no hadoop jar in the lib directory.


Alan.


[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-17 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Status: Open  (was: Patch Available)

This patch failed in release audit

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi

 PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
 {{LineRecordReader}}.
 This can help in the following areas
 - Improving the performance of reading Tuples (lines) in {{PigStorage}}
 - Any future improvements to line reading in Hadoop's 
 {{LineRecordReader}} are automatically carried over to Pig
 Issues that are handled by this patch
 - BZip uses internal buffers and positioning for determining the number of 
 bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off
 - The current implementation of {{LocalSeekableInputStream}} does not implement 
 the {{available}} method. This method has to be implemented.
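
As a rough illustration of the last point, the sketch below shows one way a seekable, file-backed stream can implement `available()`. It is an assumption-laden stand-in: `SeekableFileStream` and its `demo` helper are hypothetical names, and this is not Pig's `LocalSeekableInputStream` nor the code in the patch. Only standard `java.io` calls are used.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

/** Hypothetical stand-in; not Pig's LocalSeekableInputStream. */
final class SeekableFileStream extends InputStream {
    private final RandomAccessFile file;

    SeekableFileStream(RandomAccessFile file) {
        this.file = file;
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return file.read(b, off, len);
    }

    /** Bytes left between the current position and the end of the file. */
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        if (remaining <= 0) {
            return 0;
        }
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }

    void seek(long pos) throws IOException {
        file.seek(pos);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    /** Returns available() at the start, after reading 4 bytes, and at the end of a 10-byte file. */
    static int[] demo() {
        try {
            File tmp = File.createTempFile("available-demo", ".bin");
            tmp.deleteOnExit();
            RandomAccessFile raf = new RandomAccessFile(tmp, "rw");
            raf.write(new byte[10]);
            raf.seek(0);
            SeekableFileStream in = new SeekableFileStream(raf);
            int atStart = in.available();
            in.read(new byte[4], 0, 4);
            int afterRead = in.available();
            in.seek(10);
            int atEnd = in.available();
            in.close();
            return new int[] {atStart, afterRead, atEnd};
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = demo();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```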




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Status: Open  (was: Patch Available)

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skjoin2b.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Attachment: (was: skjoin2b.patch)

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skewedjoinnull.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Attachment: skewedjoinnull.patch

Cleared end-to-end tests and added a new unit test to check for nulls in the 
dataset.

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skewedjoinnull.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Updated: (PIG-964) Handling null in skewed join

2009-09-17 Thread Sriranjan Manjunath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sriranjan Manjunath updated PIG-964:


Status: Patch Available  (was: Open)

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skewedjoinnull.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756891#action_12756891
 ] 

Hadoop QA commented on PIG-964:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419938/skewedjoinnull.patch
  against trunk revision 816339.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/37/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/37/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/37/console

This message is automatically generated.

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skewedjoinnull.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756958#action_12756958
 ] 

Olga Natkovich commented on PIG-964:


+1 on the code

 Handling null  in skewed join
 -

 Key: PIG-964
 URL: https://issues.apache.org/jira/browse/PIG-964
 Project: Pig
  Issue Type: Bug
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Attachments: skewedjoinnull.patch


 For null tuples, the tuple size is calculated incorrectly and thus  skewed 
 join ends up expecting a large number of reducers. Further, skewed join 
 should not bail out after the second job if the number of reducers specified 
 by the user is low. It should print a warning message and continue execution.




[jira] Commented: (PIG-964) Handling null in skewed join

2009-09-17 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756970#action_12756970
 ] 

Olga Natkovich commented on PIG-964:


patch committed to branch-0.5





Re: [VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Nigel Daley

Is anyone else getting javac errors running ant test?

compile-sources:
[javac] Compiling 484 source files to /Users/ndaley/hadoop/verify/pig-0.4.0/build/classes
[javac] /Users/ndaley/hadoop/verify/pig-0.4.0/src/org/apache/pig/ComparisonFunc.java:22: package org.apache.hadoop.io does not exist
[javac] import org.apache.hadoop.io.WritableComparable;
[javac]^
...

Nige

On Sep 17, 2009, at 12:09 PM, Olga Natkovich wrote:


Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release:
http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga


-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

I created a candidate build for Pig 0.4.0 release. The highlights of
this release are

-  Performance improvements especially in the area of JOIN
support where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of the sorted data sets.

-  Support for Outer join.

-  Works with Hadoop 18

I ran the release audit and rat report looked fine. The relevant part is
attached below.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.

Should we release this? Vote closes on Thursday, 9/17.

Olga

[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_removals.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_additions.html
[java]  !?
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_all.html
[java]  !?