[jira] Commented: (PIG-894) order-by fails when input is empty

2009-09-14 Thread Ankur (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754883#action_12754883
 ] 

Ankur commented on PIG-894:
---

Does "empty input" refer to relation l ('students.txt') or to f (filter l by 1 
== 2)? I am seeing a similar issue where the sampler produces an empty file 
when the number of records in the relation being sorted is too low (  4 ). 

 order-by fails when input is empty
 --

 Key: PIG-894
 URL: https://issues.apache.org/jira/browse/PIG-894
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair

 grunt> l = load 'students.txt';
 grunt> f = filter l by 1 == 2;
 grunt> o = order f by $0;
 grunt> dump o;
 This results in 3 MR jobs. The 2nd (sampling) MR job creates an empty sample 
 file, and the 3rd MR job (order-by) fails with the following error in the map 
 task:
 java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104)
   at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
   at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:348)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
   at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
 Caused by: java.lang.RuntimeException: Empty samples file
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89)
   ... 5 more
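
The trace shows the partitioner refusing to configure itself from an empty sample file. As a minimal sketch of one possible guard (written in Python for brevity; it is not Pig's Java WeightedRangePartitioner, and the function names are hypothetical), a range partitioner could fall back to a single partition when the sampler produced nothing:

```python
import bisect


def build_boundaries(samples, num_partitions):
    """Pick num_partitions - 1 split points from the sampled keys.

    Returns an empty list when there are no samples, instead of raising,
    so that every key later falls into partition 0.
    """
    if not samples or num_partitions <= 1:
        return []
    ordered = sorted(samples)
    step = len(ordered) / num_partitions
    # One boundary per partition edge, taken at evenly spaced sample positions.
    return [ordered[int(step * i)] for i in range(1, num_partitions)]


def get_partition(key, boundaries):
    """Return the index of the partition that key belongs to."""
    return bisect.bisect_right(boundaries, key)
```

With no boundaries, `get_partition` sends every key to partition 0, which is enough for an empty or tiny input; whether Pig should behave this way is a design question for the actual fix.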

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-958) Splitting output data on key field

2009-09-14 Thread Ankur (JIRA)
Splitting output data on key field
--

 Key: PIG-958
 URL: https://issues.apache.org/jira/browse/PIG-958
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Ankur


Pig users often face the need to split the output records into a bunch of files 
and directories depending on the type of record. Pig's SPLIT operator is useful 
when record types are few and known in advance. In cases where type is not 
directly known but is derived dynamically from values of a key field in the 
output tuple, a custom store function is a better solution.
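
As a rough illustration of the idea (a Python sketch, not a Pig StoreFunc; the function name, the `output` directory, and the `part-0` file name are assumptions made for the example), records can be routed to a per-key location derived from one field of each tuple:

```python
from collections import defaultdict


def route_records(records, key_index, out_dir="output"):
    """Group tab-joined records under out_dir/<key value>/part-0.

    The key value is used as a directory name unchanged, so this sketch
    assumes keys contain no path separators.
    """
    routed = defaultdict(list)
    for record in records:
        key = str(record[key_index])
        path = f"{out_dir}/{key}/part-0"
        routed[path].append("\t".join(str(field) for field in record))
    return dict(routed)
```

The set of output locations is discovered from the data itself, which is the case SPLIT cannot cover because its branches must be written out in advance.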




[jira] Commented: (PIG-793) Improving memory efficiency of Tuple implementation

2009-09-14 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755019#action_12755019
 ] 

Alan Gates commented on PIG-793:


Sri is looking into the array vs arraylist changes as well.

 Improving memory efficiency of Tuple implementation
 ---

 Key: PIG-793
 URL: https://issues.apache.org/jira/browse/PIG-793
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Alan Gates

 Currently, our tuple is a real pig and uses a lot of extra memory. 
 There are several places where we can improve memory efficiency:
 (1) Laying out memory for the fields directly rather than using Java objects, 
 since each object for a numeric field takes 16 bytes.
 (2) For the cases where we know the schema, using Java arrays rather than 
 ArrayList.
 There might be more.
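
To make point (1) concrete, here is a small Python sketch of laying fields out in one contiguous buffer when the schema is known. The schema (int, long, double) and the helper names are assumptions for illustration only; the real change would be in Pig's Java Tuple implementation.

```python
import struct

# Hypothetical fixed schema: a 4-byte int, an 8-byte long, an 8-byte double.
# The ">" prefix selects big-endian layout with no padding between fields.
LAYOUT = struct.Struct(">iqd")


def pack_tuple(fields):
    """Store the fields of one tuple in a single contiguous byte buffer."""
    return LAYOUT.pack(*fields)


def unpack_tuple(buffer):
    """Recover the fields from the packed buffer."""
    return LAYOUT.unpack(buffer)
```

Here the three fields occupy 20 bytes of payload, whereas at the 16 bytes per numeric object quoted above, three separate objects would take 48 bytes before counting the list that holds them.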




[jira] Commented: (PIG-891) Fixing dfs statement for Pig

2009-09-14 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755081#action_12755081
 ] 

Daniel Dai commented on PIG-891:


Not quite sure about it now. But I will figure out and let you know. Thanks.

 Fixing dfs statement for Pig
 

 Key: PIG-891
 URL: https://issues.apache.org/jira/browse/PIG-891
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Jeff Zhang
Priority: Minor
 Fix For: 0.4.0

 Attachments: Pig_891.patch


 Several hadoop dfs commands are unsupported or restricted in current Pig. We 
 need to fix that. The problems include:
 1. Several commands are not supported: lsr, dus, count, rmr, expunge, put, 
 moveFromLocal, get, getmerge, text, moveToLocal, mkdir, touchz, test, stat, 
 tail, chmod, chown, chgrp. A reference for these commands can be found at 
 http://hadoop.apache.org/common/docs/current/hdfs_shell.html
 2. None of the existing dfs commands support globbing.
 3. Pig should provide a programmatic way to perform dfs commands. Several of 
 them exist in PigServer, but not all of them.
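
For item 2, globbing amounts to expanding a pattern into the set of matching paths before the command runs. A minimal Python sketch of that step follows; the function name and the example paths are hypothetical, and a real implementation would list paths from the file system rather than take them as an argument.

```python
import fnmatch


def expand_glob(pattern, paths):
    """Return the paths that match a shell-style pattern, in sorted order.

    Note that fnmatch lets "*" match "/" as well, which is looser than
    typical file system globbing.
    """
    return sorted(p for p in paths if fnmatch.fnmatchcase(p, pattern))
```

A command such as a recursive remove could then be applied to each expanded path in turn.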




[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk

2009-09-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-957:
---

Status: Open  (was: Patch Available)

 Tutorial is broken with 0.4 branch and trunk
 

 Key: PIG-957
 URL: https://issues.apache.org/jira/browse/PIG-957
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Olga Natkovich
Assignee: Pradeep Kamath
 Fix For: 0.4.0

 Attachments: PIG-957-2.patch, PIG-957.patch


 As I was testing the Pig Tutorial in preparation for the release, I found 
 that we broke the second script both in local mode and in MR mode. The issue 
 has to do with schema and naming fields.  
 Here is what I see:
  
 java -cp pig.jar org.apache.pig.Main -x local script2-local.pig
 2009-09-11 12:52:46,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Invalid alias: hour00::group::ngram in 
 {group::ngram: chararray,group::hour: chararray,hour00::count: long,ngram: 
 chararray,hour: chararray,hour12::count: long}
 09/09/11 12:52:46 ERROR grunt.Grunt: ERROR 1000: Error during parsing. 
 Invalid alias: hour00::group::ngram in {group::ngram: chararray,group::hour: 
 chararray,hour00::count: long,ngram: chararray,hour: chararray,hour12::count: 
 long}




[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk

2009-09-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-957:
---

Attachment: PIG-957-2.patch

There were two unit test failures in the last patch:
1) TestPigServer failed because join's describe now prefixes the outer relation 
alias for each field. I corrected the test case to update the expected result.
2) TestSkewedJoin had a timeout, but it ran fine on my local box.

Resubmitting with just the change in 1) above.
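
The behavior described in 1) can be pictured with a short Python sketch. This is an illustration only, not Pig's schema code: the helper name is hypothetical, and the relation and field names are assumptions borrowed from the error message in the issue.

```python
def join_schema(inputs):
    """Build the field names of a join result.

    inputs is a list of (relation_alias, field_names) pairs; every output
    field is prefixed with the alias of the relation it came from.
    """
    return [f"{alias}::{field}" for alias, fields in inputs for field in fields]
```

With inputs named `hour00` and `hour12` that each carry a `group::ngram` field, the result contains `hour00::group::ngram`, which is the alias the tutorial script failed to resolve.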




[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk

2009-09-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-957:
---

Status: Patch Available  (was: Open)




[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

2009-09-14 Thread Yan Zhou (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755094#action_12755094
 ] 

Yan Zhou commented on PIG-949:
--

The problem is caused by not adding the ColumnMappingEntry objects from the 
key-split specs in the storage info to an explicitly specified MAP item in the 
storage info, which leaves out the CGs that the key-split specs need. 
Everything falls apart thereafter. I will create a patch for the R1 patch 
release soon.

 Zebra Bug: splitting map into multiple column group using storage hint causes 
 unexpected behaviour
 --

 Key: PIG-949
 URL: https://issues.apache.org/jira/browse/PIG-949
 Project: Pig
  Issue Type: Bug
 Environment: linux
Reporter: Alok Singh

 Hi,
 The storage hint specification plays an important part in whether the output 
 table is readable or not.
 Say we have the map 'map'.
 One can split the map into a column group using [map#{k1}, map#{k2}...]; 
 the remaining map fields will automatically be added to the default group.
 If the user tries to create a new column group for the remaining fields as 
 follows:
 [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group,
 the table writer will create the table.
 However, if one then tries to load the created table via Pig or via MapReduce 
 using TableInputFormat, the reader has a problem reading the map.
 We get the following stack trace:
 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
 java.io.IOException: getValue() failed: null
 at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
 at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
 at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
 at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Alok




[jira] Commented: (PIG-957) Tutorial is broken with 0.4 branch and trunk

2009-09-14 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755101#action_12755101
 ] 

Olga Natkovich commented on PIG-957:


Pradeep, please, commit. The change is trivial enough not to wait for another 
automated test run.




[jira] Updated: (PIG-955) Skewed join generates incorrect results

2009-09-14 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-955:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed to trunk and branch-0.4. Thanks, Ying.

 Skewed join generates  incorrect results 
 -

 Key: PIG-955
 URL: https://issues.apache.org/jira/browse/PIG-955
 Project: Pig
  Issue Type: Improvement
Reporter: Ying He
 Attachments: PIG-955.patch, PIG-955.patch2


 SkewedPartitioner doesn't partition the skewed keys in partition table (first 
 table) correctly. This can cause data loss.
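
For readers unfamiliar with the technique, here is a minimal Python sketch of what a skew-aware partitioner is expected to do. It is not Pig's SkewedPartitioner: the class shape, the `skewed_keys` mapping, and the round-robin policy are assumptions made for illustration.

```python
import zlib
from collections import defaultdict


class SkewedPartitioner:
    def __init__(self, num_reducers, skewed_keys):
        # skewed_keys maps a key to the (first, count) range of reducers
        # reserved for it; all other keys are hashed as usual.
        self.num_reducers = num_reducers
        self.skewed_keys = skewed_keys
        self._next = defaultdict(int)

    def partition(self, key):
        """Return the reducer index for one record of the partitioned table."""
        if key in self.skewed_keys:
            first, count = self.skewed_keys[key]
            # Spread records of a skewed key round-robin over its range.
            offset = self._next[key] % count
            self._next[key] += 1
            return (first + offset) % self.num_reducers
        return zlib.crc32(str(key).encode()) % self.num_reducers
```

In a design like this, the other side of the join has to send each skewed key to every reducer in that key's range. If the two sides disagree about the range, some matching rows never meet, which is one way data could be lost.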

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-957) Tutorial is broken with 0.4 branch and trunk

2009-09-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755140#action_12755140
 ] 

Hadoop QA commented on PIG-957:
---

+1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12419544/PIG-957-2.patch
  against trunk revision 814075.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/27/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/27/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/27/console

This message is automatically generated.




[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk

2009-09-14 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-957:
---

  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch committed to both trunk and branch-0.4




[jira] Updated: (PIG-922) Logical optimizer: push up project

2009-09-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-922:
---

Attachment: PIG-922-p3_1.patch

Attached the phase 3 patch. I am still working on adding more unit tests.

 Logical optimizer: push up project
 --

 Key: PIG-922
 URL: https://issues.apache.org/jira/browse/PIG-922
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
 PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, 
 PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch


 This is a continuation of 
 [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
 another rule to the logical optimizer: push up project, i.e., prune columns 
 as early as possible.
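
The rule can be sketched in a few lines of Python. This is a simplified illustration, not the optimizer in the attached patches; the function names and the positional-column representation are assumptions.

```python
def required_columns(projected, filter_columns=()):
    """Columns the loader must keep: those projected plus those a filter reads."""
    return sorted(set(projected) | set(filter_columns))


def prune(rows, keep):
    """Drop every column whose index is not in keep."""
    return [tuple(row[i] for i in keep) for row in rows]
```

Applying `prune` right after the load means later operators shuffle and sort narrower tuples, which is the point of pushing the projection up.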




RE: [VOTE] Release Pig 0.4.0 (candidate 0)

2009-09-14 Thread Pradeep Kamath
+1 for release.

-----Original Message-----
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

I created a candidate build for the Pig 0.4.0 release. The highlights of
this release are:

-  Performance improvements, especially in the area of JOIN
support, where we introduced two new join types: skew join to deal with
data skew and sort merge join to take advantage of sorted data sets.

-  Support for Outer join.

-  Works with Hadoop 18

I ran the release audit and the rat report looked fine. The relevant part is
attached below.

Keys used to sign the release are available at
http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

Please download the release and try it out:
http://people.apache.org/~olga/pig-0.4.0-candidate-0.

Should we release this? Vote closes on Thursday, 9/17.

Olga

 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_additions.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_all.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_changes.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_removals.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_help.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_statistics.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_topleftframe.html
 [java]  !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/methods_index_additions.html
 [java]  

[jira] Updated: (PIG-592) schema inferred incorrectly

2009-09-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-592:
---

Fix Version/s: 0.5.0
Affects Version/s: (was: 0.2.0)
   0.4.0
   Status: Patch Available  (was: Open)

 schema inferred incorrectly
 ---

 Key: PIG-592
 URL: https://issues.apache.org/jira/browse/PIG-592
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Christopher Olston
 Fix For: 0.5.0

 Attachments: PIG-592-1.patch


 A simple pig script, that never introduces any schema information:
 A = load 'foo';
 B = foreach (group A by $8) generate group, COUNT($1);
 C = load 'bar';   // ('bar' has two columns)
 D = join B by $0, C by $0;
 E = foreach D generate $0, $1, $3;
 Fails, complaining that $3 does not exist:
 java.io.IOException: Out of bound access. Trying to access non-existent 
 column: 3. Schema {B::group: bytearray,long,bytearray} has 3 column(s).
 Apparently Pig gets confused, and thinks it knows the schema for C (a single 
 bytearray column).
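
One way to picture the expected behavior is the following Python sketch, in which an unknown schema is represented as `None` and is never replaced by a guess. This illustrates the principle only; the function names are hypothetical and this is not the logic in the attached patch.

```python
def join_output_schema(left, right):
    """Concatenate two input schemas; None means the schema is unknown."""
    if left is None or right is None:
        # Do not guess the width of an input with no declared schema.
        return None
    return list(left) + list(right)


def check_column(schema, index):
    """Reject a positional reference only when the schema is actually known."""
    if schema is None:
        return True
    return 0 <= index < len(schema)
```

Under this rule the reference to $3 in the script would be accepted at compile time, because the join output has no known width when 'bar' is loaded without a schema.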




[jira] Updated: (PIG-592) schema inferred incorrectly

2009-09-14 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-592:
---

Attachment: PIG-592-1.patch




[jira] Assigned: (PIG-858) Order By followed by replicated join fails while compiling MR-plan from physical plan

2009-09-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned PIG-858:


Assignee: Ashutosh Chauhan

 Order By followed by replicated join fails while compiling MR-plan from 
 physical plan
 ---

 Key: PIG-858
 URL: https://issues.apache.org/jira/browse/PIG-858
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
 Attachments: pig-858.patch


 Consider the query:
 {code}
 A = load 'a';
 B = order A by $0;
 C = join A by $0, B by $0;
 explain C;
 {code}
 works. But if replicated join is used instead
 {code}
 A = load 'a';
 B = order A by $0;
 C = join A by $0, B by $0 using replicated;
 explain C;
 {code}
 this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
 compiling operator POFRJoin
 relevant stacktrace:
 {code}
 Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
 at org.apache.pig.PigServer.explain(PigServer.java:574)
 ... 8 more
 Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
 at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
 at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
 ... 9 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
 ... 16 more
 {code}




[jira] Updated: (PIG-858) Order By followed by replicated join fails while compiling MR-plan from physical plan

2009-09-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-858:
-

Attachment: pig-858.patch

Patch as discussed in the previous comment. Also included are test cases where 
a blocking operator (order-by, distinct) occurs before FRJoin.

 Order By followed by replicated join fails while compiling MR-plan from 
 physical plan
 ---

 Key: PIG-858
 URL: https://issues.apache.org/jira/browse/PIG-858
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
 Attachments: pig-858.patch


 Consider the query:
 {code}
 A = load 'a';
 B = order A by $0;
 C = join A by $0, B by $0;
 explain C;
 {code}
 works. But if replicated join is used instead
 {code}
 A = load 'a';
 B = order A by $0;
 C = join A by $0, B by $0 using replicated;
 explain C;
 {code}
 this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error 
 compiling operator POFRJoin
 relevant stacktrace:
 {code}
 Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
     at org.apache.pig.PigServer.explain(PigServer.java:574)
     ... 8 more
 Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
     at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
     ... 9 more
 Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
     ... 16 more
 {code}




[jira] Assigned: (PIG-959) Merge Join fails when there is a blocking operator before it in query.

2009-09-14 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan reassigned PIG-959:


Assignee: Ashutosh Chauhan

 Merge Join fails when there is a blocking operator before it in query.
 --

 Key: PIG-959
 URL: https://issues.apache.org/jira/browse/PIG-959
 Project: Pig
  Issue Type: Bug
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan

 If an order-by, distinct, or any other blocking operator in a query is 
 followed by a Merge Join, Pig fails to compile the query.




[jira] Commented: (PIG-959) Merge Join fails when there is a blocking operator before it in query.

2009-09-14 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755270#action_12755270
 ] 

Ashutosh Chauhan commented on PIG-959:
--

This issue is blocked on PIG-858.

 Merge Join fails when there is a blocking operator before it in query.
 --

 Key: PIG-959
 URL: https://issues.apache.org/jira/browse/PIG-959
 Project: Pig
  Issue Type: Bug
Reporter: Ashutosh Chauhan

 If an order-by, distinct, or any other blocking operator in a query is 
 followed by a Merge Join, Pig fails to compile the query.




[jira] Created: (PIG-959) Merge Join fails when there is a blocking operator before it in query.

2009-09-14 Thread Ashutosh Chauhan (JIRA)
Merge Join fails when there is a blocking operator before it in query.
--

 Key: PIG-959
 URL: https://issues.apache.org/jira/browse/PIG-959
 Project: Pig
  Issue Type: Bug
Reporter: Ashutosh Chauhan


If an order-by, distinct, or any other blocking operator in a query is 
followed by a Merge Join, Pig fails to compile the query.




[jira] Created: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-14 Thread Ankit Modi (JIRA)
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi


PigStorage's reading of Tuples (lines) can be optimized by using Hadoop's 
{{LineRecordReader}}.

This can help in the following areas:
- Improving the performance of reading Tuples (lines) in {{PigStorage}}
- Any future improvements to line reading in Hadoop's {{LineRecordReader}} 
are automatically carried over to Pig

Issues that are handled by this patch:
- BZip uses internal buffers and positioning to determine the number of 
bytes read. Hence the buffering done by {{LineRecordReader}} has to be turned off
- The current implementation of {{LocalSeekableInputStream}} does not implement 
the {{available}} method. This method has to be implemented.
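
The last item above asks for an {{available}} method but does not show one. As a rough sketch only, assuming the stream is backed by a java.io.RandomAccessFile (an assumption, the issue does not say so), the method could report the bytes left before end of file. RandomAccessInputStream below is a hypothetical stand-in; it is neither Pig's {{LocalSeekableInputStream}} nor the attached patch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

// Hypothetical stand-in for a local seekable stream (assumed to wrap a RandomAccessFile).
final class RandomAccessInputStream extends InputStream {
    private final RandomAccessFile file;

    RandomAccessInputStream(RandomAccessFile file) {
        this.file = file;
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return file.read(buf, off, len);
    }

    // Bytes between the current file pointer and end of file, clamped to the int range.
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        if (remaining <= 0) {
            return 0;
        }
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Without an override, java.io.InputStream.available() always returns 0, so a caller that relies on it to detect remaining data would see an empty stream.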




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-14 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Patch Info:   (was: [Patch Available])

Performance improvement numbers obtained by running PigMix:

||Script||svn Trunk||LineRecordReader Patch||
||L1|186|147|
||L2|73|33|
||L3|195|165|
||L4|116|76|
||L5|93|59|
||L6|102|63|
||L7|91|69|
||L8|84|44|
||L9|189|148|
||L10|285|268|
||L11|108|51|
||L12|112|73|
||Sum|1634|1196|
||% Improvement| |26.81|
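
The Sum and % Improvement rows can be re-derived from the per-script rows above. The following is a small sketch of that arithmetic; the class and method names are made up for illustration, and the improvement is taken to mean the reduction of the patched total relative to the trunk total.

```java
import java.util.Locale;

// Hypothetical helper that re-derives the summary rows of the PigMix table.
final class PigMixSummary {
    // Values copied from the table, in script order L1..L12.
    static final int[] TRUNK = {186, 73, 195, 116, 93, 102, 91, 84, 189, 285, 108, 112};
    static final int[] PATCHED = {147, 33, 165, 76, 59, 63, 69, 44, 148, 268, 51, 73};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Reduction of the patched total, as a percentage of the trunk total.
    static double percentImprovement(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int trunk = sum(TRUNK);
        int patched = sum(PATCHED);
        // Prints: trunk=1634 patched=1196 improvement=26.81%
        System.out.println(String.format(Locale.ROOT,
                "trunk=%d patched=%d improvement=%.2f%%",
                trunk, patched, percentImprovement(trunk, patched)));
    }
}
```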



 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi

 PigStorage's reading of Tuples (lines) can be optimized by using Hadoop's 
 {{LineRecordReader}}.
 This can help in the following areas:
 - Improving the performance of reading Tuples (lines) in {{PigStorage}}
 - Any future improvements to line reading in Hadoop's 
 {{LineRecordReader}} are automatically carried over to Pig
 Issues that are handled by this patch:
 - BZip uses internal buffers and positioning to determine the number of 
 bytes read. Hence the buffering done by {{LineRecordReader}} has to be turned off
 - The current implementation of {{LocalSeekableInputStream}} does not implement 
 the {{available}} method. This method has to be implemented.
