[jira] Commented: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12754883#action_12754883 ]

Ankur commented on PIG-894:
---
Does "empty input" refer to relation l ('students.txt') or to f (filter l by 1 == 2)? I am seeing a similar issue where the sampler produces an empty file when the number of records in the relation being sorted is too low (4).

order-by fails when input is empty
--
Key: PIG-894
URL: https://issues.apache.org/jira/browse/PIG-894
Project: Pig
Issue Type: Bug
Reporter: Thejas M Nair

grunt> l = load 'students.txt';
grunt> f = filter l by 1 == 2;
grunt> o = order f by $0;
grunt> dump o;

This results in 3 MR jobs. The 2nd (sampling) MR job creates an empty sample file, and the 3rd MR job (order-by) fails with the following error in the Map job:

java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:348)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Caused by: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89)
    ... 5 more

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
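To make the failure mode concrete, here is a minimal sketch of a range partitioner that tolerates an empty sample set by sending every key to partition 0 instead of throwing. It is not Pig's WeightedRangePartitioner and not necessarily the fix applied for this issue; the class and method names (SampleRangePartitioner, fromSamples, getPartition) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: a range partitioner that accepts an empty sample set. */
final class SampleRangePartitioner {
    // Sorted cut points; a key goes to the first partition i with key <= cutPoints[i].
    private final List<String> cutPoints;

    private SampleRangePartitioner(List<String> cutPoints) {
        this.cutPoints = cutPoints;
    }

    static SampleRangePartitioner fromSamples(List<String> samples, int numPartitions) {
        List<String> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        List<String> cuts = new ArrayList<>();
        // With no samples (for example, an empty input) no cut points are created,
        // so every key falls through to partition 0 rather than raising an error.
        if (!sorted.isEmpty()) {
            for (int i = 1; i < numPartitions; i++) {
                int idx = (int) ((long) i * sorted.size() / numPartitions);
                cuts.add(sorted.get(Math.min(idx, sorted.size() - 1)));
            }
        }
        return new SampleRangePartitioner(cuts);
    }

    int getPartition(String key) {
        for (int i = 0; i < cutPoints.size(); i++) {
            if (key.compareTo(cutPoints.get(i)) <= 0) {
                return i;
            }
        }
        // Larger than every cut point (or no cut points at all).
        return cutPoints.size();
    }
}
```

With very few samples, as in Ankur's case of 4 records, several cut points may coincide; the sketch still returns a valid partition number, although some partitions stay empty.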
[jira] Created: (PIG-958) Splitting output data on key field
Splitting output data on key field -- Key: PIG-958 URL: https://issues.apache.org/jira/browse/PIG-958 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Ankur Pig users often face the need to split the output records into a bunch of files and directories depending on the type of record. Pig's SPLIT operator is useful when record types are few and known in advance. In cases where type is not directly known but is derived dynamically from values of a key field in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
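As an illustration of the idea, the following sketch groups output records under a path derived from the value of a key field. It is a plain in-memory stand-in, not a Pig store function; KeyFieldSplitter, split, keyIndex and baseDir are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: group output records by the value of a key field. */
final class KeyFieldSplitter {
    /**
     * Maps each record to the path "<baseDir>/<key value>", where the key is
     * the field at position keyIndex, and collects the records per path.
     */
    static Map<String, List<String[]>> split(List<String[]> records, int keyIndex, String baseDir) {
        Map<String, List<String[]>> byPath = new TreeMap<>();
        for (String[] record : records) {
            String path = baseDir + "/" + record[keyIndex];
            byPath.computeIfAbsent(path, k -> new ArrayList<>()).add(record);
        }
        return byPath;
    }
}
```

Unlike SPLIT, nothing here needs to know the set of key values in advance: a new output path appears the first time a new key value is seen.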
[jira] Commented: (PIG-793) Improving memory efficiency of Tuple implementation
[ https://issues.apache.org/jira/browse/PIG-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755019#action_12755019 ]

Alan Gates commented on PIG-793:

Sri is looking into the array vs ArrayList changes as well.

Improving memory efficiency of Tuple implementation
---
Key: PIG-793
URL: https://issues.apache.org/jira/browse/PIG-793
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Alan Gates

Currently, our tuple is a real pig and uses a lot of extra memory. There are several places where we can improve memory efficiency:
(1) Laying out memory for the fields directly rather than using Java objects, since each object for a numeric field takes 16 bytes.
(2) Using Java arrays rather than ArrayList in the cases where we know the schema.
There might be more.
[jira] Commented: (PIG-891) Fixing dfs statement for Pig
[ https://issues.apache.org/jira/browse/PIG-891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755081#action_12755081 ]

Daniel Dai commented on PIG-891:

Not quite sure about it now. But I will figure it out and let you know. Thanks.

Fixing dfs statement for Pig
Key: PIG-891
URL: https://issues.apache.org/jira/browse/PIG-891
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Daniel Dai
Assignee: Jeff Zhang
Priority: Minor
Fix For: 0.4.0
Attachments: Pig_891.patch

Several hadoop dfs commands are either not supported or restricted in current Pig. We need to fix that. These include:
1. Several commands are not supported: lsr, dus, count, rmr, expunge, put, moveFromLocal, get, getmerge, text, moveToLocal, mkdir, touchz, test, stat, tail, chmod, chown, chgrp. A reference for these commands can be found at http://hadoop.apache.org/common/docs/current/hdfs_shell.html
2. None of the existing dfs commands support globbing.
3. Pig should provide a programmatic way to perform dfs commands. Several of them exist in PigServer, but not all of them.
[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk
[ https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-957: --- Status: Open (was: Patch Available) Tutorial is broken with 0.4 branch and trunk Key: PIG-957 URL: https://issues.apache.org/jira/browse/PIG-957 Project: Pig Issue Type: Bug Affects Versions: 0.3.0 Reporter: Olga Natkovich Assignee: Pradeep Kamath Fix For: 0.4.0 Attachments: PIG-957-2.patch, PIG-957.patch As I was testing the Pig Tutorial in preparation for the release, I found that we broke the second script both in local mode and in MR mode. The issue has to do with schema and naming fields. Here is what I see: java -cp pig.jar org.apache.pig.Main -x local script2-local.pig 2009-09-11 12:52:46,961 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. Invalid alias: hour00::group::ngram in {group::ngram: chararray,group::hour: chararray,hour00::count: long,ngram: chararray,hour: chararray,hour12::count: long} 09/09/11 12:52:46 ERROR grunt.Grunt: ERROR 1000: Error during parsing. Invalid alias: hour00::group::ngram in {group::ngram: chararray,group::hour: chararray,hour00::count: long,ngram: chararray,hour: chararray,hour12::count: long} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk
[ https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-957: --- Attachment: PIG-957-2.patch There were two unit test failures in the last patch 1) TestPigServer had a failure which was because join's describe now prefixes the outer relation alias for each field - corrected the test case to update the expected result. 2) TestSkewedJoin had a timeout - this ran fine on my local box. Resubmitting with just the change in 1) above.
[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk
[ https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-957: --- Status: Patch Available (was: Open)
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755094#action_12755094 ]

Yan Zhou commented on PIG-949:
--
The problem is caused by not adding ColumnMappingEntrys from the key-split specs in storage info to an explicitly specified MAP item in storage info, thus causing missing CGs as needed by the key-split specs. Everything falls apart thereafter. Will create a patch for R1 patch release soon.

Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
--
Key: PIG-949
URL: https://issues.apache.org/jira/browse/PIG-949
Project: Pig
Issue Type: Bug
Environment: linux
Reporter: Alok Singh

Hi,
The storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; the remaining map field will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows: [map#{k1}, map#{k2}, ..][map], i.e. creates a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, the reader has a problem reading the map. We get the following stack trace:

09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
java.io.IOException: getValue() failed: null
    at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
    at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
    at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.Child.main(Child.java:170)

Alok
[jira] Commented: (PIG-957) Tutorial is broken with 0.4 branch and trunk
[ https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755101#action_12755101 ] Olga Natkovich commented on PIG-957: Pradeep, please, commit. The change is trivial enough not to wait for another automated test run.
[jira] Updated: (PIG-955) Skewed join generates incorrect results
[ https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-955: --- Resolution: Fixed Status: Resolved (was: Patch Available) patch committed to trunk and branch-04. Thanks, Ying Skewed join generates incorrect results - Key: PIG-955 URL: https://issues.apache.org/jira/browse/PIG-955 Project: Pig Issue Type: Improvement Reporter: Ying He Attachments: PIG-955.patch, PIG-955.patch2 SkewedPartitioner doesn't partition the skewed keys in partition table (first table) correctly. This can cause data loss. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-957) Tutorial is broken with 0.4 branch and trunk
[ https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12755140#action_12755140 ] Hadoop QA commented on PIG-957: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12419544/PIG-957-2.patch against trunk revision 814075. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 6 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/27/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/27/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/27/console This message is automatically generated.
[jira] Updated: (PIG-957) Tutorial is broken with 0.4 branch and trunk
[ https://issues.apache.org/jira/browse/PIG-957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-957: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to both trunk and branch-0.4
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---
Attachment: PIG-922-p3_1.patch

Attaching the phase 3 patch. I am still working on adding more unit tests.

Logical optimizer: push up project
--
Key: PIG-922
URL: https://issues.apache.org/jira/browse/PIG-922
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch

This is a continuation of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add another rule to the logical optimizer: push up project, i.e., prune columns as early as possible.
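The idea of pruning columns as early as possible can be sketched as follows: collect the column positions that downstream operators actually reference, then project each row down to those columns right after loading. This is only an illustration of the concept, not the optimizer rule in the attached patches; ColumnPruner, requiredColumns and project are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch of column pruning: keep only the columns that are used later. */
final class ColumnPruner {
    /** Returns the sorted, de-duplicated column positions referenced by any operator. */
    static SortedSet<Integer> requiredColumns(List<int[]> columnsUsedByOperators) {
        SortedSet<Integer> required = new TreeSet<>();
        for (int[] columns : columnsUsedByOperators) {
            for (int column : columns) {
                required.add(column);
            }
        }
        return required;
    }

    /** Projects a row down to the required columns, in ascending column order. */
    static List<Object> project(List<Object> row, SortedSet<Integer> required) {
        List<Object> pruned = new ArrayList<>(required.size());
        for (int column : required) {
            pruned.add(row.get(column));
        }
        return pruned;
    }
}
```

A real rule also has to rewrite the column references of the downstream operators, because positions shift once columns are dropped; the sketch leaves that step out.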
RE: [VOTE] Release Pig 0.4.0 (candidate 0)
+1 for release.

-----Original Message-----
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

I created a candidate build for Pig 0.4.0 release. The highlights of this release are
- Performance improvements especially in the area of JOIN support where we introduced two new join types: skew join to deal with data skew and sort merge join to take advantage of the sorted data sets.
- Support for Outer join.
- Works with Hadoop 18

I ran the release audit and rat report looked fine. The relevant part is attached below.

Keys used to sign the release are available at http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

Please download the release and try it out: http://people.apache.org/~olga/pig-0.4.0-candidate-0.

Should we release this? Vote closes on Thursday, 9/17.

Olga

[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_all.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_removals.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_additions.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_all.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/fields_index_removals.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_help.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_statistics.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/jdiff_topleftframe.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/methods_index_additions.html
[java]
[jira] Updated: (PIG-592) schema inferred incorrectly
[ https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-592:
---
Fix Version/s: 0.5.0
Affects Version/s: (was: 0.2.0) 0.4.0
Status: Patch Available (was: Open)

schema inferred incorrectly
---
Key: PIG-592
URL: https://issues.apache.org/jira/browse/PIG-592
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Christopher Olston
Fix For: 0.5.0
Attachments: PIG-592-1.patch

A simple pig script, that never introduces any schema information:

A = load 'foo';
B = foreach (group A by $8) generate group, COUNT($1);
C = load 'bar'; // ('bar' has two columns)
D = join B by $0, C by $0;
E = foreach D generate $0, $1, $3;

Fails, complaining that $3 does not exist:

java.io.IOException: Out of bound access. Trying to access non-existent column: 3. Schema {B::group: bytearray,long,bytearray} has 3 column(s).

Apparently Pig gets confused, and thinks it knows the schema for C (a single bytearray column).
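One way to picture the expected behaviour: when either join input has no known schema, the joined schema should stay unknown instead of being guessed, so that a positional reference such as $3 is not rejected at parse time. The sketch below assumes null stands for "schema unknown"; it does not describe what PIG-592-1.patch actually does, and JoinSchemaSketch and joinSchema are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: schema of a join when one input schema may be unknown. */
final class JoinSchemaSketch {
    /**
     * A null list stands for "schema unknown". The join schema is the
     * concatenation of both inputs, or unknown if either input is unknown.
     */
    static List<String> joinSchema(List<String> left, List<String> right) {
        if (left == null || right == null) {
            return null; // do not invent columns for the unknown side
        }
        List<String> joined = new ArrayList<>(left);
        joined.addAll(right);
        return joined;
    }
}
```

In the script above, B has two known columns while C has none declared, so under this assumption the schema of D would be unknown rather than a three-column schema.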
[jira] Updated: (PIG-592) schema inferred incorrectly
[ https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-592: --- Attachment: PIG-592-1.patch
[jira] Assigned: (PIG-858) Order By followed by replicated join fails while compiling MR-plan from physical plan
[ https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan reassigned PIG-858:

Assignee: Ashutosh Chauhan

Order By followed by replicated join fails while compiling MR-plan from physical plan
---
Key: PIG-858
URL: https://issues.apache.org/jira/browse/PIG-858
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Attachments: pig-858.patch

Consider the query:
{code}
A = load 'a';
B = order A by $0;
C = join A by $0, B by $0;
explain C;
{code}
works. But if replicated join is used instead
{code}
A = load 'a';
B = order A by $0;
C = join A by $0, B by $0 using replicated;
explain C;
{code}
this fails with ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2034: Error compiling operator POFRJoin

relevant stacktrace:
{code}
Caused by: java.lang.RuntimeException: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:306)
    at org.apache.pig.PigServer.explain(PigServer.java:574)
    ... 8 more
Caused by: org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompilerException: ERROR 2034: Error compiling operator POFRJoin
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:942)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POFRJoin.visit(POFRJoin.java:173)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:342)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:327)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.compile(MRCompiler.java:233)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.compile(MapReduceLauncher.java:301)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.explain(MapReduceLauncher.java:278)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.explain(HExecutionEngine.java:303)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler.visitFRJoin(MRCompiler.java:901)
    ... 16 more
{code}
[jira] Updated: (PIG-858) Order By followed by replicated join fails while compiling MR-plan from physical plan
[ https://issues.apache.org/jira/browse/PIG-858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan updated PIG-858: - Attachment: pig-858.patch Patch as discussed in previous comment. Also included are test cases, where blocking operator (order-by, distinct) occurs before FRjoin.
[jira] Assigned: (PIG-959) Merge Join fails when there is a blocking operator before it in query.
[ https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ashutosh Chauhan reassigned PIG-959: Assignee: Ashutosh Chauhan Merge Join fails when there is a blocking operator before it in query. -- Key: PIG-959 URL: https://issues.apache.org/jira/browse/PIG-959 Project: Pig Issue Type: Bug Reporter: Ashutosh Chauhan Assignee: Ashutosh Chauhan If there is an order-by, distinct or any other blocking operator in query followed by Merge Join, pig fails to compile it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-959) Merge Join fails when there is a blocking operator before it in query.
[ https://issues.apache.org/jira/browse/PIG-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755270#action_12755270 ]
Ashutosh Chauhan commented on PIG-959:
This issue is blocked on PIG-858.
Merge Join fails when there is a blocking operator before it in query.
Key: PIG-959 URL: https://issues.apache.org/jira/browse/PIG-959 Project: Pig Issue Type: Bug Reporter: Ashutosh Chauhan
If an order-by, distinct, or any other blocking operator in the query is followed by a Merge Join, Pig fails to compile it.
[jira] Created: (PIG-959) Merge Join fails when there is a blocking operator before it in query.
Merge Join fails when there is a blocking operator before it in query.
Key: PIG-959 URL: https://issues.apache.org/jira/browse/PIG-959 Project: Pig Issue Type: Bug Reporter: Ashutosh Chauhan
If an order-by, distinct, or any other blocking operator in the query is followed by a Merge Join, Pig fails to compile it.
[jira] Created: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
Key: PIG-960 URL: https://issues.apache.org/jira/browse/PIG-960 Project: Pig Issue Type: Improvement Components: impl Reporter: Ankit Modi
PigStorage's reading of Tuples (lines) can be optimized using Hadoop's {{LineRecordReader}}. This can help in the following areas:
- Improving the performance of reading Tuples (lines) in {{PigStorage}}
- Any future improvements to line reading in Hadoop's {{LineRecordReader}} are automatically carried over to Pig
Issues that are handled by this patch:
- BZip uses internal buffers and positioning to determine the number of bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off
- The current implementation of {{LocalSeekableInputStream}} does not implement the {{available}} method. This method has to be implemented.
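The last point above can be illustrated with a minimal sketch. {{java.io.InputStream.available()}} returns 0 unless a subclass overrides it, so a caller that consults it on a stream without an override sees no readable bytes. The class below is hypothetical and is not Pig's {{LocalSeekableInputStream}}; it only assumes a seekable stream backed by a {{java.io.RandomAccessFile}} and shows one way {{available}} could report the bytes left before end of file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;

// Hypothetical stand-in for a local seekable stream; NOT Pig's actual class.
public class SeekableFileStream extends InputStream {
    private final RandomAccessFile file;

    public SeekableFileStream(RandomAccessFile file) {
        this.file = file;
    }

    // Move the read position to an absolute byte offset.
    public void seek(long position) throws IOException {
        file.seek(position);
    }

    @Override
    public int read() throws IOException {
        return file.read();
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        return file.read(buf, off, len);
    }

    // Bytes remaining between the current position and end of file,
    // clamped to the int range and never negative (a seek past EOF yields 0).
    @Override
    public int available() throws IOException {
        long remaining = file.length() - file.getFilePointer();
        return (int) Math.max(0L, Math.min(remaining, Integer.MAX_VALUE));
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

Whether the actual patch computes the value this way is not stated in the issue; the sketch only shows the contract of the method.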
[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
[ https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ankit Modi updated PIG-960:
Patch Info: (was: [Patch Available])
Performance improvement numbers obtained by running PigMix:
||Script||svn Trunk||LineRecordReader Patch||
||L1|186|147|
||L2|73|33|
||L3|195|165|
||L4|116|76|
||L5|93|59|
||L6|102|63|
||L7|91|69|
||L8|84|44|
||L9|189|148|
||L10|285|268|
||L11|108|51|
||L12|112|73|
||Sum|1634|1196|
||% Improvement| ||26.81|
Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage
Key: PIG-960 URL: https://issues.apache.org/jira/browse/PIG-960 Project: Pig Issue Type: Improvement Components: impl Reporter: Ankit Modi
PigStorage's reading of Tuples (lines) can be optimized using Hadoop's {{LineRecordReader}}. This can help in the following areas:
- Improving the performance of reading Tuples (lines) in {{PigStorage}}
- Any future improvements to line reading in Hadoop's {{LineRecordReader}} are automatically carried over to Pig
Issues that are handled by this patch:
- BZip uses internal buffers and positioning to determine the number of bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off
- The current implementation of {{LocalSeekableInputStream}} does not implement the {{available}} method. This method has to be implemented.
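The Sum and % Improvement figures in the PigMix table can be re-derived from the per-script numbers. The sketch below copies the twelve values for L1 through L12 from the table and computes the improvement as (trunk - patch) / trunk. The class and method names are illustrative only, and the unit of the measurements is not stated in the message, so none is assumed here.

```java
import java.util.Locale;

// Illustrative helper; names are hypothetical and not part of Pig or PigMix.
public class PigMixImprovement {
    // Per-script numbers for L1..L12, copied from the table in the message.
    static final int[] TRUNK = {186, 73, 195, 116, 93, 102, 91, 84, 189, 285, 108, 112};
    static final int[] PATCH = {147, 33, 165, 76, 59, 63, 69, 44, 148, 268, 51, 73};

    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Relative reduction from 'before' to 'after', expressed as a percentage.
    static double percentImprovement(int before, int after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        int trunk = sum(TRUNK);
        int patch = sum(PATCH);
        // Prints: trunk=1634 patch=1196 improvement=26.81%
        System.out.println(String.format(Locale.ROOT,
                "trunk=%d patch=%d improvement=%.2f%%",
                trunk, patch, percentImprovement(trunk, patch)));
    }
}
```

The computed totals match the Sum row (1634 and 1196), and 438 / 1634 rounds to the reported 26.81%.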