[jira] Updated: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated PIG-958: -- Status: Open (was: Patch Available) Splitting output data on key field -- Key: PIG-958 URL: https://issues.apache.org/jira/browse/PIG-958 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Ankur Attachments: 958.v1.patch, 958.v2.patch Pig users often face the need to split the output records into a bunch of files and directories depending on the type of record. Pig's SPLIT operator is useful when record types are few and known in advance. In cases where type is not directly known but is derived dynamically from values of a key field in the output tuple, a custom store function is a better solution. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur updated PIG-958: -- Status: Patch Available (was: Open) Splitting output data on key field -- Key: PIG-958 URL: https://issues.apache.org/jira/browse/PIG-958 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Ankur Attachments: 958.v2.patch
[jira] Commented: (PIG-958) Splitting output data on key field
[ https://issues.apache.org/jira/browse/PIG-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758214#action_12758214 ] Hadoop QA commented on PIG-958: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420264/958.v2.patch against trunk revision 817319. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. -1 release audit. The applied patch generated 280 release audit warnings (more than the trunk's current 278 warnings). -1 core tests. The patch failed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/testReport/ Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/43/console This message is automatically generated. Splitting output data on key field -- Key: PIG-958 URL: https://issues.apache.org/jira/browse/PIG-958 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Ankur Attachments: 958.v2.patch
[jira] Commented: (PIG-738) Regexp passed from pigscript fails in UDF
[ https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758272#action_12758272 ]

Olga Natkovich commented on PIG-738:

+1, please, commit

Regexp passed from pigscript fails in UDF

Key: PIG-738
URL: https://issues.apache.org/jira/browse/PIG-738
Project: Pig
Issue Type: Bug
Components: grunt
Affects Versions: 0.3.0
Reporter: Viraj Bhat
Assignee: Pradeep Kamath
Fix For: 0.6.0
Attachments: myregexp.jar, PIG-738.patch, RegexGroupCount.java, regexp.pig, regexpinput.txt

Consider a pig script which parses and counts regular expressions from a text file. The regular expression supplied in the Pig script needs to escape the . (dot) character.

{code}
register myregexp.jar;
-- pattern not picked up
define minelogs ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports');
A = load '/user/viraj/regexpinput.txt' using PigStorage() as (source : chararray);
B = foreach A generate minelogs(source) as sportslogs;
dump B;
{code}

Snippet of UDF RegexGroupCount.java

{code}
public class RegexGroupCount extends EvalFunc<Integer> {
    private final Pattern pattern_;

    public RegexGroupCount(String patternStr) {
        System.out.println("My pattern supplied is " + patternStr);
        System.out.println("Equality test " + patternStr.equals("www\\.yahoo\\.com/sports"));
        pattern_ = Pattern.compile(patternStr, Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    }

    public Integer exec(Tuple input) throws IOException { }
}
{code}

Running the above script on the following dataset:

dshfdskfwww.yahoo.com/sportsjoadfjdslpdshfdskfwww.yahoo.com/sportsjoadfjdsl
kas;dka;sd
jsjsjwww.yahoo.com/sports
jsdLSJDcom/sports
wwwJyahooMcom/sports

Results in the following:

My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
Userfunc: (Name: UserFunc viraj-Sat Mar 28 02:06:31 PDT 2009-14 function: ci_pig_udfs.RegexGroupCount('www\\.yahoo\\.com/sports') Operator Key: viraj-Sat Mar 28 02:06:31 PDT 2009-14)
Userfunc fs: int
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
My pattern supplied is www\\.yahoo\\.com/sports
Equality test false
2009-03-28 02:06:43,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-03-28 02:06:43,923 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(0)
(0)
(0)
(0)
(0)

In essence there seems to be no way of passing this type of constructor argument through the Pig script. The only workaround seems to be hard coding the values in the UDF!!
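The failure reported above is consistent with the constructor receiving the pattern with both backslashes intact, while the Java literal it is compared against holds only one backslash per dot. The following is a minimal standalone sketch of that reading, not the UDF itself or the committed patch; the class name `EscapeDemo` and its members are hypothetical, and the "received" value is an assumption inferred from the printed log lines.

```java
import java.util.regex.Pattern;

final class EscapeDemo {
    // Value of the Java literal used in the equality test: one backslash before each dot.
    static final String INTENDED = "www\\.yahoo\\.com/sports";

    // Assumed value seen by the constructor (inferred from the log): two backslashes before each dot.
    static final String RECEIVED = "www\\\\.yahoo\\\\.com/sports";

    // Compiles the regex with the same flags as the UDF snippet and searches the line.
    static boolean found(String regex, String line) {
        return Pattern.compile(regex, Pattern.DOTALL | Pattern.CASE_INSENSITIVE)
                      .matcher(line)
                      .find();
    }

    public static void main(String[] args) {
        String line = "jsjsjwww.yahoo.com/sports";
        // The two strings differ in length, so equals() is false.
        System.out.println("equal:    " + INTENDED.equals(RECEIVED));
        // One backslash escapes the dot, so the literal host name is found.
        System.out.println("intended: " + found(INTENDED, line));
        // Two backslashes demand a literal backslash in the input, which the data never contains.
        System.out.println("received: " + found(RECEIVED, line));
    }
}
```

Under this assumption every record yields a count of zero, which matches the five `(0)` rows in the report.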
[jira] Resolved: (PIG-971) ARITY is not documented
[ https://issues.apache.org/jira/browse/PIG-971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-971.

Resolution: Won't Fix

ARITY has been deprecated in favor of SIZE.

ARITY is not documented

Key: PIG-971
URL: https://issues.apache.org/jira/browse/PIG-971
Project: Pig
Issue Type: Bug
Components: documentation
Affects Versions: 0.3.0, 0.3.1
Reporter: Bogdan Dorohonceanu

ARITY is not documented in the Pig Latin Manual: http://hadoop.apache.org/pig/docs/r0.3.0/piglatin.html

It is mentioned only once, in the FAQ:

Q: How do I prevent failure if some records don't have the needed number of columns?

You can filter away those records by including the following in your Pig program:

A = LOAD 'foo' USING PigStorage('\t');
B = FILTER A BY ARITY(*) >= 5;

This code would drop all records that have fewer than five (5) columns.
Re: [VOTE] Release Pig 0.4.0 (candidate 2)
private is the pmc list. Releases need pmc votes, hence we send to private.

Alan.

On Sep 21, 2009, at 7:46 PM, Milind A Bhandarkar wrote:

Unrelated to the message content: why is there a priv...@hadoop.apache.org on the cc here? Is this even a valid alias? An open source project needs to conduct its discussions in public, so an email address (even) named private makes me very nervous about the development process.

- Milind

On Sep 21, 2009, at 18:56, Olga Natkovich ol...@yahoo-inc.com wrote:

Hi,

The new version is available in http://people.apache.org/~olga/pig-0.4.0-candidate-2/.

I see one failure in a unit test in piggybank (contrib.) but it is not related to the functions themselves but seems to be an issue with MiniCluster and I don't feel we need to chase this down. I made sure that the same test runs ok with Hadoop 20.

Please, vote by end of day on Thursday, 9/24.

Olga

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Thursday, September 17, 2009 12:09 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 1)

Hi,

I have fixed the issue causing the failure that Alan reported.

Please test the new release: http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Vote closes on Tuesday, 9/22.

Olga

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com]
Sent: Monday, September 14, 2009 2:06 PM
To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org
Subject: [VOTE] Release Pig 0.4.0 (candidate 0)

Hi,

I created a candidate build for Pig 0.4.0 release. The highlights of this release are

- Performance improvements especially in the area of JOIN support where we introduced two new join types: skew join to deal with data skew and sort merge join to take advantage of the sorted data sets.
- Support for Outer join.
- Works with Hadoop 18

I ran the release audit and rat report looked fine. The relevant part is attached below.

Keys used to sign the release are available at http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup.

Please download the release and try it out: http://people.apache.org/~olga/pig-0.4.0-candidate-0.

Should we release this? Vote closes on Thursday, 9/17.

Olga

[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANGES.txt
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.xml
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_reference.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-list
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingSinces.txt
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_comments_for_pig_0.3.1_to_pig_0.5.0-dev.xml
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_additions.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_all.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/alldiffs_index_removals.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/changes-summary.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_additions.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_all.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_changes.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/classes_index_removals.html
[java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/constructors_index_additions.html
[java] !?
[jira] Updated: (PIG-738) Regexp passed from pigscript fails in UDF
[ https://issues.apache.org/jira/browse/PIG-738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-738: --- Resolution: Fixed Hadoop Flags: [Reviewed] Status: Resolved (was: Patch Available) Patch committed to trunk Regexp passed from pigscript fails in UDF --- Key: PIG-738 URL: https://issues.apache.org/jira/browse/PIG-738 Project: Pig Issue Type: Bug Components: grunt Affects Versions: 0.3.0 Reporter: Viraj Bhat Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: myregexp.jar, PIG-738.patch, RegexGroupCount.java, regexp.pig, regexpinput.txt
Re: [VOTE] Release Pig 0.4.0 (candidate 2)
Oh, sorry about that, then. Thanks for the info. - milind On 9/22/09 10:29 AM, Alan Gates ga...@yahoo-inc.com wrote: private is the pmc list. Releases need pmc votes, hence we send to private. Alan.
Re: [VOTE] Release Pig 0.4.0 (candidate 2)
Olga, which test failed? If it's one of the ones I contributed, I'll fix it. -D On Mon, Sep 21, 2009 at 8:54 PM, Olga Natkovich ol...@yahoo-inc.com wrote: Hi, The new version is available in http://people.apache.org/~olga/pig-0.4.0-candidate-2/. I see one failure in a unit test in piggybank (contrib.) but it is not related to the functions themselves but seems to be an issue with MiniCluster and I don't feel we need to chase this down. I made sure that the same test runs ok with Hadoop 20. Please, vote by end of day on Thursday, 9/24. Olga
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yan Zhou updated PIG-949:

Attachment: Pig_949.patch

Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour

Key: PIG-949
URL: https://issues.apache.org/jira/browse/PIG-949
Project: Pig
Issue Type: Bug
Affects Versions: 0.4.0
Environment: linux
Reporter: Alok Singh
Assignee: Yan Zhou
Attachments: Pig_949.patch, Pig_949.patch

Hi,

The storage hint specification plays an important part in whether the output table is readable or not. Say we have a map named 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; the remaining map fields will then automatically be added to the default group.

If the user tries to create a separate column group for the remaining fields, as in [map#{k1}, map#{k2}, ..][map], the table writer will create the table. However, if one then tries to load the created table via Pig or via MapReduce using TableInputFormat, the reader has a problem reading the map.

We get the following stack trace:

09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED
java.io.IOException: getValue() failed: null
at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775)
at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717)
at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)

Alok
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758387#action_12758387 ] Yan Zhou commented on PIG-949: -- Test case added. Thanks, Yan Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Attachments: Pig_949.patch, Pig_949.patch
[jira] Commented: (PIG-968) findContainingJar fails when there's a + in the path
[ https://issues.apache.org/jira/browse/PIG-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12758394#action_12758394 ]

Todd Lipcon commented on PIG-968:

bq. You need to add a unit test that checks that this works when there is a + in the path.

This is very difficult to test - we'd need to add several more ant rules to build a jar into the test directory with a + in it and get that on the classpath for tests. findContainingJar itself is not currently tested.

bq. Also, a more general question: I'm guessing that '+' isn't the only mishandled character. Are there others that should be checked?

It's well-known that URLDecoder actually decodes x-www-form-urlencoded rather than true URL encoding. The spec for that encoding is here: http://www.w3.org/MarkUp/html-spec/html-spec_8.html#SEC8.2.1

As far as I've been able to find, '+' is the only difference, except for perhaps newlines which can't occur in pathnames afaik. More info here: http://en.wikipedia.org/wiki/Percent-encoding#The_application.2Fx-www-form-urlencoded_type

findContainingJar fails when there's a + in the path

Key: PIG-968
URL: https://issues.apache.org/jira/browse/PIG-968
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0, 0.5.0
Reporter: Todd Lipcon
Attachments: pig-968.txt

This is the same bug as in MAPREDUCE-714. Please see discussion there.
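The URLDecoder behaviour described in the comment can be reproduced in isolation. The sketch below is not the attached pig-968.txt patch; it only illustrates one common workaround, which is to protect '+' as %2B before decoding. The class name `JarPathDecodeDemo`, its helper methods, and the sample path are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class JarPathDecodeDemo {
    // Plain URLDecoder call: form decoding, so '+' turns into a space.
    static String naiveDecode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this is not expected to happen.
            throw new IllegalStateException(e);
        }
    }

    // Workaround sketch: rewrite '+' as its percent escape first, then decode.
    static String decodePath(String encoded) {
        return naiveDecode(encoded.replace("+", "%2B"));
    }

    public static void main(String[] args) {
        String encoded = "/opt/pig+hadoop/my%20jars/pig.jar"; // hypothetical path
        System.out.println(naiveDecode(encoded)); // the '+' is lost here
        System.out.println(decodePath(encoded));  // the '+' survives here
    }
}
```

Percent escapes such as %20 are still decoded in both variants; only the treatment of '+' differs.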
[jira] Reopened: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple
[ https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath reopened PIG-513:

Pig is supposed to provide nulls for columns not present in the data. For example, if a file has 3 columns (name, age, gpa), then the following statement should still work, with the column 'extra' getting nulls:

{noformat}
a = load 'input' as (name, age, gpa, extra);
{noformat}

This is broken with the current code.

PERFORMANCE: optimize some of the code in DefaultTuple

Key: PIG-513
URL: https://issues.apache.org/jira/browse/PIG-513
Project: Pig
Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
Fix For: 0.6.0
Attachments: PIG-513.patch, pig-513_2.patch

The following areas in DefaultTuple.java can be changed:

The member methods get(), set(), getType() and isNull() all call checkBounds(), which is a redundant call since all 4 of these functions throw ExecException. Instead of doing a bounds check, we can catch the IndexOutOfBoundsException in a try-catch and throw it as an ExecException.

The write() method has the following unused object (d in the code below):

{code}
for (int i = 0; i < sz; i++) {
    try {
        Object d = get(i);
    } catch (ExecException ee) {
        throw new RuntimeException(ee);
    }
    DataReaderWriter.writeDatum(out, mFields.get(i));
}
{code}

{noformat}
The get(i) call in the try should be replaced by the writeDatum call directly since d is never used and there is an unnecessary call to get()
{noformat}
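The first proposed change, dropping the explicit bounds check and translating the list's own exception, could look roughly like this. It is a self-contained sketch and not the code in DefaultTuple.java: `TupleSketch` is a hypothetical class, and the nested `ExecException` is a stand-in for Pig's exception type so that the example compiles without Pig on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class TupleSketch {
    // Stand-in for Pig's ExecException (assumption: only message and cause are needed here).
    static class ExecException extends Exception {
        ExecException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    private final List<Object> mFields = new ArrayList<>();

    void append(Object value) {
        mFields.add(value);
    }

    // No separate checkBounds(): the list performs the range check itself,
    // and its unchecked exception is rethrown as the checked ExecException.
    Object get(int fieldNum) throws ExecException {
        try {
            return mFields.get(fieldNum);
        } catch (IndexOutOfBoundsException e) {
            throw new ExecException("Request for field number " + fieldNum
                    + " exceeds tuple size of " + mFields.size(), e);
        }
    }

    public static void main(String[] args) {
        TupleSketch t = new TupleSketch();
        t.append("alice");
        try {
            System.out.println(t.get(0)); // in range
            System.out.println(t.get(3)); // out of range, throws
        } catch (ExecException e) {
            System.out.println("ExecException: " + e.getMessage());
        }
    }
}
```

The in-range path does a single range check inside the list instead of two, which is the saving the issue is after.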
[jira] Updated: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple
[ https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-513: --- Status: Patch Available (was: Reopened) PERFORMANCE: optimize some of the code in DefaultTuple -- Key: PIG-513 URL: https://issues.apache.org/jira/browse/PIG-513 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: PIG-513-3.patch, PIG-513.patch, pig-513_2.patch The following areas in DefaultTuple.java can be changed: The member methods get(), set(), getType() and isNull() all call checkBounds(), which is a redundant call since all these 4 functions throw ExecException. Instead of doing a bounds check, we can catch the IndexOutOfBounds exception in a try-catch and throw it as an ExecException. The write() method has the following unused object (d in the code below): {code} for (int i = 0; i < sz; i++) { try { Object d = get(i); } catch (ExecException ee) { throw new RuntimeException(ee); } DataReaderWriter.writeDatum(out, mFields.get(i)); } {code} {noformat} The get(i) call in the try should be replaced by the writeDatum call directly since d is never used and there is an unnecessary call to get() {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple
[ https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-513: --- Attachment: PIG-513-3.patch Attached patch to fix the issue. The fix involves changing POProject to catch IndexOutOfBoundsException and set up nulls for nonexistent fields. Similarly, POUserFunc has also been changed to catch IndexOutOfBoundsException so that a more meaningful message can be provided to the user. A unit test has been added to the patch. PERFORMANCE: optimize some of the code in DefaultTuple -- Key: PIG-513 URL: https://issues.apache.org/jira/browse/PIG-513 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: PIG-513-3.patch, PIG-513.patch, pig-513_2.patch The following areas in DefaultTuple.java can be changed: The member methods get(), set(), getType() and isNull() all call checkBounds(), which is a redundant call since all these 4 functions throw ExecException. Instead of doing a bounds check, we can catch the IndexOutOfBounds exception in a try-catch and throw it as an ExecException. The write() method has the following unused object (d in the code below): {code} for (int i = 0; i < sz; i++) { try { Object d = get(i); } catch (ExecException ee) { throw new RuntimeException(ee); } DataReaderWriter.writeDatum(out, mFields.get(i)); } {code} {noformat} The get(i) call in the try should be replaced by the writeDatum call directly since d is never used and there is an unnecessary call to get() {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
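The null-filling behaviour that the patch comment describes for POProject can be illustrated with a minimal sketch. The class and method names below are hypothetical and a plain List stands in for a Pig tuple; the actual PIG-513-3.patch may be structured quite differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not Pig's POProject.
final class NullPaddingProjection {
    // Return the requested column, or null when the row has no such column.
    static Object projectOrNull(List<Object> row, int column) {
        try {
            return row.get(column);
        } catch (IndexOutOfBoundsException e) {
            return null; // column declared in the schema but absent from the data
        }
    }

    // Project a row onto a schema with declaredColumns columns.
    static List<Object> project(List<Object> row, int declaredColumns) {
        List<Object> out = new ArrayList<>(declaredColumns);
        for (int i = 0; i < declaredColumns; i++) {
            out.add(projectOrNull(row, i));
        }
        return out;
    }

    public static void main(String[] args) {
        // Data has 3 columns; the load statement declares (name, age, gpa, extra).
        List<Object> row = Arrays.<Object>asList("alice", 20, 3.5);
        System.out.println(project(row, 4)); // prints [alice, 20, 3.5, null]
    }
}
```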
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-949: - Attachment: Pig_949.patch change the unit test case to TestNonDefaultWholeMapSplit Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi. The storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; however, the remaining map field will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, then the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-949: - Status: Open (was: Patch Available) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi. The storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; however, the remaining map field will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, then the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-949: - Fix Version/s: 0.5.0 0.4.0 Status: Patch Available (was: Open) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.4.0, 0.5.0 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi. The storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; however, the remaining map field will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, then the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [VOTE] Release Pig 0.4.0 (candidate 2)
+1. ran 'ant test-core'. contrib/zebra: 'ant test' passed after following directions as suggested : got a patch from PIG-660, and hadoop20.jar from PIG-833. For clarity we might attach patch suitable for PIG-660 for 0.4. Raghu. Olga Natkovich wrote: Hi, The new version is available in http://people.apache.org/~olga/pig-0.4.0-candidate-2/. I see one failure in a unit test in piggybank (contrib.) but it is not related to the functions themselves but seems to be an issue with MiniCluster and I don't feel we need to chase this down. I made sure that the same test runs ok with Hadoop 20. Please, vote by end of day on Thursday, 9/24. Olga -Original Message- From: Olga Natkovich [mailto:ol...@yahoo-inc.com] Sent: Thursday, September 17, 2009 12:09 PM To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org Subject: [VOTE] Release Pig 0.4.0 (candidate 1) Hi, I have fixed the issue causing the failure that Alan reported. Please test the new release: http://people.apache.org/~olga/pig-0.4.0-candidate-1/. Vote closes on Tuesday, 9/22. Olga -Original Message- From: Olga Natkovich [mailto:ol...@yahoo-inc.com] Sent: Monday, September 14, 2009 2:06 PM To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org Subject: [VOTE] Release Pig 0.4.0 (candidate 0) Hi, I created a candidate build for Pig 0.4.0 release. The highlights of this release are - Performance improvements especially in the area of JOIN support where we introduced two new join types: skew join to deal with data skew and sort merge join to take advantage of the sorted data sets. - Support for Outer join. - Works with Hadoop 18 I ran the release audit and rat report looked fine. The relevant part is attached below. Keys used to sign the release are available at http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup. Please download the release and try it out: http://people.apache.org/~olga/pig-0.4.0-candidate-0. Should we release this? Vote closes on Thursday, 9/17. Olga [java] !? 
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/CHANGES.txt [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/contrib/zebra/CHANG ES.txt [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/broken-links.x ml [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/cookbook.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/index.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/linkmap.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_refer ence.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/piglatin_users .html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/setup.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/tutorial.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/udf.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/api/package-li st [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes. html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/missingS inces.txt [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/user_com ments_for_pig_0.3.1_to_pig_0.5.0-dev.xml [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ alldiffs_index_additions.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ alldiffs_index_all.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ alldiffs_index_changes.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ alldiffs_index_removals.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ changes-summary.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ classes_index_additions.html [java] !? 
/home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ classes_index_all.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ classes_index_changes.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ classes_index_removals.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ constructors_index_additions.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ constructors_index_all.html [java] !? /home/olgan/src/pig-apache/trunk/build/pig-0.5.0-dev/docs/jdiff/changes/ constructors_index_changes.html [java]
[jira] Resolved: (PIG-822) Flatten semantics are unknown
[ https://issues.apache.org/jira/browse/PIG-822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-822. Resolution: Fixed ben updated the docs and they have been committed. Will be coming out as part of Pig 0.4.0 release Flatten semantics are unknown - Key: PIG-822 URL: https://issues.apache.org/jira/browse/PIG-822 Project: Pig Issue Type: Bug Components: documentation Reporter: George Mavromatis Assignee: Benjamin Reed Priority: Critical There is no formal specification of the flatten keyword in http://hadoop.apache.org/pig/docs/r0.2.0/piglatin.html There are only some examples. I have found flatten to be very fragile and unpredictable with the data types it reads and creates. Please document: Flatten to be explained formally in its own dedicated section: What are the valid input types, the output types it creates, what transformation it does from input to output and how the resulting data are named. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-972) Make describe work with nested foreach
Make describe work with nested foreach -- Key: PIG-972 URL: https://issues.apache.org/jira/browse/PIG-972 Project: Pig Issue Type: Improvement Reporter: Olga Natkovich Currently Parser can't deal with that. This is because describe is part of Grunt parser while the rest of nested foreach is handled by the QueryParser -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-973) type resolution inconsistency
type resolution inconsistency - Key: PIG-973 URL: https://issues.apache.org/jira/browse/PIG-973 Project: Pig Issue Type: Bug Reporter: Olga Natkovich This script works: A = load 'test' using PigStorage(':') as (name: chararray, age: int, gpa: float); B = group A by age; C = foreach B { D = filter A by gpa 2.5; E = order A by name; F = A.age; describe F; G = distinct F; generate group, COUNT(D), MAX (E.name), MIN(G.$0);} dump C; This one produces an error: A = load 'test' using PigStorage(':') as (name: chararray, age: int, gpa: float); B = group A by age; C = foreach B { D = filter A by gpa 2.5; E = order A by name; F = A.age; G = distinct F; generate group, COUNT(D), MAX (E.name), MIN(G);} dump C; Notice the difference in how MIN is passed the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple
[ https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758510#action_12758510 ] Hadoop QA commented on PIG-513: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420307/PIG-513-3.patch against trunk revision 817739. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/44/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/44/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/44/console This message is automatically generated. PERFORMANCE: optimize some of the code in DefaultTuple -- Key: PIG-513 URL: https://issues.apache.org/jira/browse/PIG-513 Project: Pig Issue Type: Bug Affects Versions: 0.2.0 Reporter: Pradeep Kamath Assignee: Pradeep Kamath Fix For: 0.6.0 Attachments: PIG-513-3.patch, PIG-513.patch, pig-513_2.patch The following areas in DefaultTuple.java can be changed: The member methods get(), set(), getType() and isNull() all call checkBounds(), which is a redundant call since all these 4 functions throw ExecException. Instead of doing a bounds check, we can catch the IndexOutOfBounds exception in a try-catch and throw it as an ExecException. The write() method has the following unused object (d in the code below): {code} for (int i = 0; i < sz; i++) { try { Object d = get(i); } catch (ExecException ee) { throw new RuntimeException(ee); } DataReaderWriter.writeDatum(out, mFields.get(i)); } {code} {noformat} The get(i) call in the try should be replaced by the writeDatum call directly since d is never used and there is an unnecessary call to get() {noformat} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-966) Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces
[ https://issues.apache.org/jira/browse/PIG-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758518#action_12758518 ] Alan Gates commented on PIG-966: In thinking about it more, it becomes obvious that we have to separate out determining the partition keys for an input from getting the schema, as Dmitry and Ashutosh suggested above. The reason is that Pig cannot ask the loader for a schema until it has completely defined what will be loaded (because the schema will depend on what is being loaded). And to completely define what is being loaded it needs to determine the partition keys and possibly specify a filter condition for them. So we need to add a getPartitionKeys and setPartitionFilter to the LoadMetadata interface. Proposed rework for LoadFunc, StoreFunc, and Slice/r interfaces --- Key: PIG-966 URL: https://issues.apache.org/jira/browse/PIG-966 Project: Pig Issue Type: Improvement Components: impl Reporter: Alan Gates Assignee: Alan Gates I propose that we rework the LoadFunc, StoreFunc, and Slice/r interfaces significantly. See http://wiki.apache.org/pig/LoadStoreRedesignProposal for full details -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
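The ordering argument in the comment above (partition keys first, then an optional filter, and only then the schema) can be sketched as below. Everything here is hypothetical: the interface name, the method signatures, and the toy loader are assumptions made for illustration, since the comment names only getPartitionKeys and setPartitionFilter and does not give their signatures.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for the proposed LoadMetadata interface.
interface LoadMetadataSketch {
    String[] getPartitionKeys(String location);
    void setPartitionFilter(String key, String value);
    List<String> getSchema(String location);
}

// Toy loader whose schema depends on which partition is selected.
final class VersionedLoader implements LoadMetadataSketch {
    private String version = null;

    public String[] getPartitionKeys(String location) {
        return new String[] {"version"};
    }

    public void setPartitionFilter(String key, String value) {
        if (!"version".equals(key)) {
            throw new IllegalArgumentException("unknown partition key: " + key);
        }
        version = value;
    }

    public List<String> getSchema(String location) {
        // Assumed data layout: version 2 files carry an extra gpa column.
        if ("2".equals(version)) {
            return Arrays.asList("name", "age", "gpa");
        }
        return Arrays.asList("name", "age");
    }
}

final class PartitionPlanning {
    // Ask for keys, apply the filter, and only then request the schema.
    static List<String> resolveSchema(LoadMetadataSketch loader, String location, String wanted) {
        String[] keys = loader.getPartitionKeys(location);
        if (keys.length > 0 && wanted != null) {
            loader.setPartitionFilter(keys[0], wanted);
        }
        return loader.getSchema(location);
    }
}
```

With this toy loader, asking for the schema before the filter is set would report two columns even when the three-column partition is the one being loaded, which is the dependency the comment points out.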
[jira] Commented: (PIG-949) Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour
[ https://issues.apache.org/jira/browse/PIG-949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12758535#action_12758535 ] Hadoop QA commented on PIG-949: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12420313/Pig_949.patch against trunk revision 817739. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/9/console This message is automatically generated. Zebra Bug: splitting map into multiple column group using storage hint causes unexpected behaviour -- Key: PIG-949 URL: https://issues.apache.org/jira/browse/PIG-949 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Environment: linux Reporter: Alok Singh Assignee: Yan Zhou Fix For: 0.4.0, 0.5.0 Attachments: Pig_949.patch, Pig_949.patch, Pig_949.patch Hi. The storage hint specification plays an important part in whether the output table is readable or not. Say we have the map 'map'. One can split the map into a column group using [map#{k1}, map#{k2}...]; however, the remaining map field will automatically be added to the default group. If the user tries to create a new column group for the remaining fields as follows, [map#{k1}, map#{k2}, ..][map], i.e. create a separate column group, the table writer will create the table. However, if one tries to load the created table via Pig or via MapReduce using TableInputFormat, then the reader has a problem reading the map. We get the following stack trace: 09/09/09 00:09:45 INFO mapred.JobClient: Task Id : attempt_200908191538_33939_m_21_2, Status : FAILED java.io.IOException: getValue() failed: null at org.apache.hadoop.zebra.io.BasicTable$Reader$BTScanner.getValue(BasicTable.java:775) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:717) at org.apache.hadoop.zebra.mapred.TableRecordReader.next(TableInputFormat.java:651) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:191) at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:175) at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48) at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:356) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.Child.main(Child.java:170) Alok -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [VOTE] Release Pig 0.4.0 (candidate 2)
I removed ~/pigtest/conf/hadoop-site.xml and build piggybank again, all pass. For some reason MiniCluster do not regenerate hadoop-site.xml and reuse the old one, which happens to be wrong Olga Natkovich wrote: Hi, The new version is available in http://people.apache.org/~olga/pig-0.4.0-candidate-2/. I see one failure in a unit test in piggybank (contrib.) but it is not related to the functions themselves but seems to be an issue with MiniCluster and I don't feel we need to chase this down. I made sure that the same test runs ok with Hadoop 20. Please, vote by end of day on Thursday, 9/24. Olga -Original Message- From: Olga Natkovich [mailto:ol...@yahoo-inc.com] Sent: Thursday, September 17, 2009 12:09 PM To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org Subject: [VOTE] Release Pig 0.4.0 (candidate 1) Hi, I have fixed the issue causing the failure that Alan reported. Please test the new release: http://people.apache.org/~olga/pig-0.4.0-candidate-1/. Vote closes on Tuesday, 9/22. Olga -Original Message- From: Olga Natkovich [mailto:ol...@yahoo-inc.com] Sent: Monday, September 14, 2009 2:06 PM To: pig-dev@hadoop.apache.org; priv...@hadoop.apache.org Subject: [VOTE] Release Pig 0.4.0 (candidate 0) Hi, I created a candidate build for Pig 0.4.0 release. The highlights of this release are - Performance improvements especially in the area of JOIN support where we introduced two new join types: skew join to deal with data skew and sort merge join to take advantage of the sorted data sets. - Support for Outer join. - Works with Hadoop 18 I ran the release audit and rat report looked fine. The relevant part is attached below. Keys used to sign the release are available at http://svn.apache.org/viewvc/hadoop/pig/trunk/KEYS?view=markup. Please download the release and try it out: http://people.apache.org/~olga/pig-0.4.0-candidate-0. Should we release this? Vote closes on Thursday, 9/17. Olga [java] !? 