Re: [VOTE] Release Pig 0.14.0 (candidate 0)
+1

Verified keys.
Checked LICENSE, README, RELEASE_NOTES, CHANGES files, and the rat report.
Built the source.
Tried running queries in both local mode and on a cluster.

Two minor issues that don't need to block this RC:
1. I think we should update README to indicate the choice of execution engine.
2. pig --help does not show "tez" as a valid option for the "-x" argument.

I will create a jira to track these issues.

On Wed, Nov 12, 2014 at 8:46 PM, Daniel Dai da...@hortonworks.com wrote:

Hi,

I have created a candidate build for Pig 0.14.0.

Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out: http://people.apache.org/~daijy/pig-0.14.0-candidate-0/

Release notes and the rat report are available at the same location.

Should we release this? Vote closes on next Monday EOD, Nov 17th 2014.

Thanks,
Daniel

--
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14
+1

On Thu, Sep 18, 2014 at 5:50 PM, Mona Chitnis mona.chit...@yahoo.in wrote:

+1 (non-binding)

Mona Chitnis
Yahoo!

On Thursday, September 18, 2014 8:48 AM, Ashutosh Chauhan hashut...@apache.org wrote:

+1

On Wed, Sep 17, 2014 at 7:02 PM, Daniel Dai da...@hortonworks.com wrote:

+1

On Wed, Sep 17, 2014 at 11:12 AM, Prashant Kommireddi prash1...@gmail.com wrote:

+1

On Wed, Sep 17, 2014 at 8:44 AM, Cheolsoo Park piaozhe...@gmail.com wrote:

+1

On Wed, Sep 17, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com wrote:

+1

On Wed, Sep 17, 2014 at 7:04 AM, Julien Le Dem jul...@ledem.net wrote:

+1

Julien

-----Original Message-----
From: Rohini Palaniswamy [mailto:rohini.adi...@gmail.com]
Sent: Wednesday, September 17, 2014 12:38 PM
To: dev@pig.apache.org
Subject: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14

Hi,

Hadoop has matured far beyond Hadoop 0.20 and has had two major releases since then, and there has been no development on branch-0.20 (http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3 years now. It is high time we drop support for Hadoop 0.20 and only support the Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance effort and also enable us to write more efficient code and cut down on reflection.

Vote closes on Tuesday, Sep 23 2014.

Thanks,
Rohini
Re: Review Request 24789: New logical optimizer rule: ConstantCalculator
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24789/#review51603
---

Ship it!

Ship It!

- Thejas Nair

On Aug. 26, 2014, 10:35 p.m., Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24789/
---

(Updated Aug. 26, 2014, 10:35 p.m.)

Review request for pig.

Repository: pig

Description
---
See PIG-4128

Diffs
---
trunk/src/org/apache/pig/EvalFunc.java 1618727 trunk/src/org/apache/pig/Main.java 1618727 trunk/src/org/apache/pig/builtin/ABS.java 1618727 trunk/src/org/apache/pig/builtin/ARITY.java 1618727 trunk/src/org/apache/pig/builtin/AddDuration.java 1618727 trunk/src/org/apache/pig/builtin/Assert.java 1618727 trunk/src/org/apache/pig/builtin/BagSize.java 1618727 trunk/src/org/apache/pig/builtin/BagToString.java 1618727 trunk/src/org/apache/pig/builtin/BagToTuple.java 1618727 trunk/src/org/apache/pig/builtin/Base.java 1618727 trunk/src/org/apache/pig/builtin/BigDecimalAbs.java 1618727 trunk/src/org/apache/pig/builtin/BigIntegerAbs.java 1618727 trunk/src/org/apache/pig/builtin/CONCAT.java 1618727 trunk/src/org/apache/pig/builtin/ConstantSize.java 1618727 trunk/src/org/apache/pig/builtin/CubeDimensions.java 1618727 trunk/src/org/apache/pig/builtin/CurrentTime.java 1618727 trunk/src/org/apache/pig/builtin/DIFF.java 1618727 trunk/src/org/apache/pig/builtin/DaysBetween.java 1618727 trunk/src/org/apache/pig/builtin/DoubleRound.java 1618727 trunk/src/org/apache/pig/builtin/DoubleRoundTo.java 1618727 trunk/src/org/apache/pig/builtin/ENDSWITH.java 1618727 trunk/src/org/apache/pig/builtin/EqualsIgnoreCase.java 1618727 trunk/src/org/apache/pig/builtin/FloatAbs.java 1618727 trunk/src/org/apache/pig/builtin/FloatRound.java 1618727 trunk/src/org/apache/pig/builtin/FloatRoundTo.java 1618727 trunk/src/org/apache/pig/builtin/GetDay.java 1618727 trunk/src/org/apache/pig/builtin/GetHour.java 1618727 trunk/src/org/apache/pig/builtin/GetMilliSecond.java 1618727
trunk/src/org/apache/pig/builtin/GetMinute.java 1618727 trunk/src/org/apache/pig/builtin/GetMonth.java 1618727 trunk/src/org/apache/pig/builtin/GetSecond.java 1618727 trunk/src/org/apache/pig/builtin/GetWeek.java 1618727 trunk/src/org/apache/pig/builtin/GetWeekYear.java 1618727 trunk/src/org/apache/pig/builtin/GetYear.java 1618727 trunk/src/org/apache/pig/builtin/HoursBetween.java 1618727 trunk/src/org/apache/pig/builtin/INDEXOF.java 1618727 trunk/src/org/apache/pig/builtin/INVERSEMAP.java 1618727 trunk/src/org/apache/pig/builtin/IntAbs.java 1618727 trunk/src/org/apache/pig/builtin/IsEmpty.java 1618727 trunk/src/org/apache/pig/builtin/KEYSET.java 1618727 trunk/src/org/apache/pig/builtin/LAST_INDEX_OF.java 1618727 trunk/src/org/apache/pig/builtin/LCFIRST.java 1618727 trunk/src/org/apache/pig/builtin/LOWER.java 1618727 trunk/src/org/apache/pig/builtin/LTRIM.java 1618727 trunk/src/org/apache/pig/builtin/LongAbs.java 1618727 trunk/src/org/apache/pig/builtin/MapSize.java 1618727 trunk/src/org/apache/pig/builtin/MilliSecondsBetween.java 1618727 trunk/src/org/apache/pig/builtin/MinutesBetween.java 1618727 trunk/src/org/apache/pig/builtin/MonthsBetween.java 1618727 trunk/src/org/apache/pig/builtin/PluckTuple.java 1618727 trunk/src/org/apache/pig/builtin/REGEX_EXTRACT.java 1618727 trunk/src/org/apache/pig/builtin/REGEX_EXTRACT_ALL.java 1618727 trunk/src/org/apache/pig/builtin/REPLACE.java 1618727 trunk/src/org/apache/pig/builtin/ROUND.java 1618727 trunk/src/org/apache/pig/builtin/ROUND_TO.java 1618727 trunk/src/org/apache/pig/builtin/RTRIM.java 1618727 trunk/src/org/apache/pig/builtin/RollupDimensions.java 1618727 trunk/src/org/apache/pig/builtin/SIZE.java 1618727 trunk/src/org/apache/pig/builtin/SPRINTF.java 1618727 trunk/src/org/apache/pig/builtin/STARTSWITH.java 1618727 trunk/src/org/apache/pig/builtin/STRSPLIT.java 1618727 trunk/src/org/apache/pig/builtin/SUBSTRING.java 1618727 trunk/src/org/apache/pig/builtin/SUBTRACT.java 1618727 
trunk/src/org/apache/pig/builtin/SecondsBetween.java 1618727 trunk/src/org/apache/pig/builtin/StringConcat.java 1618727 trunk/src/org/apache/pig/builtin/StringSize.java 1618727 trunk/src/org/apache/pig/builtin/SubtractDuration.java 1618727 trunk/src/org/apache/pig/builtin/TOBAG.java 1618727 trunk/src/org/apache/pig/builtin/TOKENIZE.java 1618727 trunk/src/org/apache/pig/builtin/TOMAP.java 1618727 trunk/src/org/apache/pig/builtin/TOTUPLE.java 1618727 trunk/src/org/apache/pig/builtin/TRIM.java 1618727
Re: Review Request 24789: New logical optimizer rule: ConstantCalculator
---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24789/#review51430
---

trunk/src/org/apache/pig/builtin/CurrentTime.java
https://reviews.apache.org/r/24789/#comment89748

If the optimization is disabled, don't we want to fall back to the old behavior of using pig.job.submitted?

trunk/src/org/apache/pig/newplan/logical/rules/ConstantCalculator.java
https://reviews.apache.org/r/24789/#comment89769

There is no processedOperators.add call happening. Is this variable needed?

trunk/src/org/apache/pig/newplan/logical/rules/ConstantCalculator.java
https://reviews.apache.org/r/24789/#comment89755

Does it make sense to do this setPlan in the moveTree call itself?

- Thejas Nair

On Aug. 19, 2014, 5:41 p.m., Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/24789/
---

(Updated Aug. 19, 2014, 5:41 p.m.)

Review request for pig.

Repository: pig

Description
---
See PIG-4128

Diffs
---
trunk/src/org/apache/pig/EvalFunc.java 1618727 trunk/src/org/apache/pig/Main.java 1618727 trunk/src/org/apache/pig/builtin/ABS.java 1618727 trunk/src/org/apache/pig/builtin/ARITY.java 1618727 trunk/src/org/apache/pig/builtin/AddDuration.java 1618727 trunk/src/org/apache/pig/builtin/Assert.java 1618727 trunk/src/org/apache/pig/builtin/BagSize.java 1618727 trunk/src/org/apache/pig/builtin/BagToString.java 1618727 trunk/src/org/apache/pig/builtin/BagToTuple.java 1618727 trunk/src/org/apache/pig/builtin/Base.java 1618727 trunk/src/org/apache/pig/builtin/BigDecimalAbs.java 1618727 trunk/src/org/apache/pig/builtin/BigIntegerAbs.java 1618727 trunk/src/org/apache/pig/builtin/CONCAT.java 1618727 trunk/src/org/apache/pig/builtin/ConstantSize.java 1618727 trunk/src/org/apache/pig/builtin/CubeDimensions.java 1618727 trunk/src/org/apache/pig/builtin/CurrentTime.java 1618727 trunk/src/org/apache/pig/builtin/DIFF.java 1618727 trunk/src/org/apache/pig/builtin/DaysBetween.java 1618727 trunk/src/org/apache/pig/builtin/DoubleRound.java
1618727 trunk/src/org/apache/pig/builtin/DoubleRoundTo.java 1618727 trunk/src/org/apache/pig/builtin/ENDSWITH.java 1618727 trunk/src/org/apache/pig/builtin/EqualsIgnoreCase.java 1618727 trunk/src/org/apache/pig/builtin/FloatAbs.java 1618727 trunk/src/org/apache/pig/builtin/FloatRound.java 1618727 trunk/src/org/apache/pig/builtin/FloatRoundTo.java 1618727 trunk/src/org/apache/pig/builtin/GetDay.java 1618727 trunk/src/org/apache/pig/builtin/GetHour.java 1618727 trunk/src/org/apache/pig/builtin/GetMilliSecond.java 1618727 trunk/src/org/apache/pig/builtin/GetMinute.java 1618727 trunk/src/org/apache/pig/builtin/GetMonth.java 1618727 trunk/src/org/apache/pig/builtin/GetSecond.java 1618727 trunk/src/org/apache/pig/builtin/GetWeek.java 1618727 trunk/src/org/apache/pig/builtin/GetWeekYear.java 1618727 trunk/src/org/apache/pig/builtin/GetYear.java 1618727 trunk/src/org/apache/pig/builtin/HoursBetween.java 1618727 trunk/src/org/apache/pig/builtin/INDEXOF.java 1618727 trunk/src/org/apache/pig/builtin/INVERSEMAP.java 1618727 trunk/src/org/apache/pig/builtin/IntAbs.java 1618727 trunk/src/org/apache/pig/builtin/IsEmpty.java 1618727 trunk/src/org/apache/pig/builtin/KEYSET.java 1618727 trunk/src/org/apache/pig/builtin/LAST_INDEX_OF.java 1618727 trunk/src/org/apache/pig/builtin/LCFIRST.java 1618727 trunk/src/org/apache/pig/builtin/LOWER.java 1618727 trunk/src/org/apache/pig/builtin/LTRIM.java 1618727 trunk/src/org/apache/pig/builtin/LongAbs.java 1618727 trunk/src/org/apache/pig/builtin/MapSize.java 1618727 trunk/src/org/apache/pig/builtin/MilliSecondsBetween.java 1618727 trunk/src/org/apache/pig/builtin/MinutesBetween.java 1618727 trunk/src/org/apache/pig/builtin/MonthsBetween.java 1618727 trunk/src/org/apache/pig/builtin/PluckTuple.java 1618727 trunk/src/org/apache/pig/builtin/REGEX_EXTRACT.java 1618727 trunk/src/org/apache/pig/builtin/REGEX_EXTRACT_ALL.java 1618727 trunk/src/org/apache/pig/builtin/REPLACE.java 1618727 trunk/src/org/apache/pig/builtin/ROUND.java 1618727 
trunk/src/org/apache/pig/builtin/ROUND_TO.java 1618727 trunk/src/org/apache/pig/builtin/RTRIM.java 1618727 trunk/src/org/apache/pig/builtin/RollupDimensions.java 1618727 trunk/src/org/apache/pig/builtin/SIZE.java 1618727 trunk/src/org/apache/pig/builtin/SPRINTF.java 1618727 trunk/src/org/apache/pig/builtin/STARTSWITH.java 1618727 trunk/src/org/apache/pig/builtin/STRSPLIT.java 1618727 trunk/src/org/apache/pig/builtin/SUBSTRING.java 1618727 trunk/src/org/apache
Re: [ANNOUNCE] Apache Pig 0.12.1 released
Thanks Prashant!

On Tue, Apr 15, 2014 at 10:58 AM, Cheolsoo Park piaozhe...@gmail.com wrote:

Thank you Prashant for your hard work!

On Mon, Apr 14, 2014 at 5:37 PM, Daniel Dai da...@hortonworks.com wrote:

Thanks Prashant!

On Mon, Apr 14, 2014 at 5:30 PM, Prashant Kommireddi prkommire...@apache.org wrote:

The Pig team is happy to announce the Pig 0.12.1 release.

Apache Pig provides a high-level data-flow language and execution framework for parallel computation on Hadoop clusters. More details about Pig can be found at http://pig.apache.org/.

This is a maintenance release of Pig 0.12 and contains several bug fixes and improvements. The details of the release can be found at http://pig.apache.org/releases.html.

You can download the release here: http://www.apache.org/dyn/closer.cgi/pig

The released maven artifacts have been made available on repository.apache.org.

We would like to thank all contributors that made this release possible.

Thanks,
Prashant Kommireddi
Re: [VOTE] Release Pig 0.12.1 (Candidate 0)
Here is my late +1.

Checked the md5 and asc keys.
Checked release notes, CHANGES.txt.
Built from source tar, tried some local queries.
Checked output of version command (pig -version).

The output of the version command in the binary is accurate. However, in the case of the source tar, when I build using just ant (without -Dversion..), the version shows up as Apache Pig version 0.12.2-SNAPSHOT. I don't think this issue warrants a new RC.

I think we should update the release instructions to change the version in build.xml to the appropriate release version before tagging svn (and create the tar using this tagged version), and then increment the version number in the next commit. If people agree, I can update the instructions in the wiki.

On Fri, Apr 11, 2014 at 1:54 AM, Prashant Kommireddi prash1...@gmail.com wrote:

The release vote passes with 4 +1s (4 binding votes), and no -1s.

+1s (binding)
Dmitriy Ryaboy
Daniel Dai
Cheolsoo Park
Alan Gates

+1s (non-binding)
None

-1s
None

Thanks Daniel for pointing out the missing pig-0.12.1.tar.gz.asc file. I have added it to the RC.

I will proceed with the release process.

Thanks,
Prashant

On Thu, Apr 10, 2014 at 10:54 AM, Alan Gates ga...@hortonworks.com wrote:

+1

Reviewed LICENSE, NOTICE, RELEASE_NOTES, and README. Built, built piggybank and ran tests, ran a local smoke test.

Alan.

On Apr 7, 2014, at 1:22 PM, Prashant Kommireddi prkommire...@apache.org wrote:

I have created a candidate build for Pig 0.12.1. This is a maintenance release to Pig 0.12.0 with a few critical bug fixes.

Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out: http://people.apache.org/~prkommireddi/pig-0.12.1-candidate-0/

Release notes and the rat report are available from the same location.

List of issues fixed in this release: http://svn.apache.org/viewvc/pig/branches/branch-0.12/CHANGES.txt?view=markup

Should we release this? Vote closes EOD this Thursday, April 10th.

-Prashant
Re: [VOTE] Release Pig 0.12.1 (Candidate 0)
I have updated the wiki for this. I have a post-release section where the version number gets updated to the next version.

https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?pageId=26120105&selectedPageVersions=27&selectedPageVersions=26

On Fri, Apr 11, 2014 at 11:57 AM, Daniel Dai da...@hortonworks.com wrote:

This is in our release document and we have followed it in prior releases. It does not seem to be an Apache convention. I don't know the motivation for this, but if it is confusing enough, it might be better to change it in the next release.

On Fri, Apr 11, 2014 at 11:17 AM, Prashant Kommireddi prash1...@gmail.com wrote:

Thanks Thejas. This actually came up during the 0.12.0 RC as well, and this is Daniel's reply. I do agree with you on having pig.version in build.xml reflect the current build rather than the next one. But I'm not aware of what the Apache convention is, or what other projects are doing. You guys know better :)

Hi, Mark,

Thanks for reviewing. -SNAPSHOT is intentional according to https://cwiki.apache.org/confluence/display/PIG/HowToRelease. When a user builds the release, the version will be {next version}-SNAPSHOT.

Thanks,
Daniel

On Fri, Apr 11, 2014 at 10:45 AM, Thejas Nair the...@hortonworks.com wrote:

Here is my late +1.

Checked the md5 and asc keys.
Checked release notes, CHANGES.txt.
Built from source tar, tried some local queries.
Checked output of version command (pig -version).

The output of the version command in the binary is accurate. However, in the case of the source tar, when I build using just ant (without -Dversion..), the version shows up as Apache Pig version 0.12.2-SNAPSHOT. I don't think this issue warrants a new RC.

I think we should update the release instructions to change the version in build.xml to the appropriate release version before tagging svn (and create the tar using this tagged version), and then increment the version number in the next commit. If people agree, I can update the instructions in the wiki.

On Fri, Apr 11, 2014 at 1:54 AM, Prashant Kommireddi prash1...@gmail.com wrote:

The release vote passes with 4 +1s (4 binding votes), and no -1s.

+1s (binding)
Dmitriy Ryaboy
Daniel Dai
Cheolsoo Park
Alan Gates

+1s (non-binding)
None

-1s
None

Thanks Daniel for pointing out the missing pig-0.12.1.tar.gz.asc file. I have added it to the RC.

I will proceed with the release process.

Thanks,
Prashant

On Thu, Apr 10, 2014 at 10:54 AM, Alan Gates ga...@hortonworks.com wrote:

+1

Reviewed LICENSE, NOTICE, RELEASE_NOTES, and README. Built, built piggybank and ran tests, ran a local smoke test.

Alan.

On Apr 7, 2014, at 1:22 PM, Prashant Kommireddi prkommire...@apache.org wrote:

I have created a candidate build for Pig 0.12.1. This is a maintenance release to Pig 0.12.0 with a few critical bug fixes.

Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup.

Please download, test, and try it out: http://people.apache.org/~prkommireddi/pig-0.12.1-candidate-0/

Release notes and the rat report are available from the same location.

List of issues fixed in this release: http://svn.apache.org/viewvc/pig/branches/branch-0.12/CHANGES.txt?view=markup

Should we release this? Vote closes EOD this Thursday, April 10th.

-Prashant
Re: [ANNOUNCE] Congratulations to our new PMC members Rohini Palaniswamy and Cheolsoo Park
Congrats Rohini and Cheolsoo!

On Thu, Sep 12, 2013 at 11:24 AM, Bill Graham billgra...@gmail.com wrote:

Congrats guys! Well deserved indeed.

On Wed, Sep 11, 2013 at 10:58 PM, Jarek Jarcec Cecho jar...@apache.org wrote:

Congratulations Rohini and Cheolsoo, awesome work!

Jarcec

On Wed, Sep 11, 2013 at 04:24:21PM -0700, Julien Le Dem wrote:

Please welcome Rohini Palaniswamy and Cheolsoo Park as our latest Pig PMC members.

Congrats Rohini and Cheolsoo !

--
*Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
Re: Welcome new Pig Committer - Koji Noguchi
Congrats Koji! Very well deserved!

On Wed, Sep 11, 2013 at 9:49 AM, Daniel Dai da...@hortonworks.com wrote:

Congratulations! It is well deserved.

On Wed, Sep 11, 2013 at 6:33 AM, Miguel Angel Martin junquera mianmarjun.mailingl...@gmail.com wrote:

Congratulations K

Miguel Angel Martín Junquera
Analyst Engineer.
miguelangel.mar...@brainsins.com

2013/9/11 kun yan yankunhad...@gmail.com

Congratulations Koji!

2013/9/11 Koji Noguchi knogu...@yahoo-inc.com

Thanks everyone!

Koji

On Sep 11, 2013, at 2:18 AM, Bill Graham wrote:

Congrats Koji!

On Tue, Sep 10, 2013 at 10:29 PM, Cheolsoo Park piaozhe...@gmail.com wrote:

Congratulations Koji!

On Wed, Sep 11, 2013 at 7:32 AM, Prashant Kommireddi prash1...@gmail.com wrote:

Congrats Koji!

On Tue, Sep 10, 2013 at 10:01 AM, Xuefu Zhang xzh...@cloudera.com wrote:

Congratulations, Koji. Looking forward to more of your contributions.

--Xuefu

On Tue, Sep 10, 2013 at 8:58 AM, Olga Natkovich onatkov...@yahoo.com wrote:

It is my pleasure to announce that Koji Noguchi became the newest addition to the Pig Committers! Koji has been actively contributing to Pig for over a year now and has been a part of the larger Hadoop community (including as a Hadoop Committer) for many years now. Please join me in congratulating Koji!

Olga

--
*Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*

--
In the Hadoop world, I am just a novice, explore the entire Hadoop ecosystem, I hope one day I can contribute their own code

YanBit
yankunhad...@gmail.com
Re: [VOTE] Release Pig 0.10.1 (candidate 3)
+1

Verified md5 checksums of the src and binary tar.gz. Built the src tar.gz and ran queries against a hadoop 1.1 cluster, ran fs and sh commands.

-Thejas

On 1/3/13 12:11 PM, Rohini Palaniswamy wrote:

+1. Downloaded the tar binary, checked signature, ran unit tests, piggybank unit tests, checked docs/release notes, ran a simple script locally and against a cluster.

On Mon, Dec 31, 2012 at 8:41 AM, Alan Gates ga...@hortonworks.com wrote:

+1, yet again :). Checked the key signature and checksum on the source package. Built and ran commit unit tests on src, ran a test job in local mode. Downloaded the tar binary and ran a job in local and cluster mode.

Alan.

On Dec 28, 2012, at 11:50 PM, Daniel Dai wrote:

Hi,

I have created a candidate build for Pig 0.10.1. This is a maintenance release of Pig 0.10.

Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup

Please download, test, and try it out: http://people.apache.org/~daijy/pig-0.10.1-candidate-3/

Should we release this? Vote closes on EOD next Friday, Jan 4th.

Thanks,
Daniel
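The md5 check the voters mention in this thread could be scripted roughly as follows. This is only a sketch under stated assumptions, not the procedure anyone on the list actually used: the helper names md5_of_file and matches_published_md5 are made up for illustration, and the published .md5 file is assumed to start with the hex digest as its first whitespace-separated token (other layouts exist and would need different parsing). It covers the checksum only; checking the .asc signature against the KEYS file is a separate step that is not shown here.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large tarballs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(tarball, md5_file):
    """Compare a tarball's MD5 with the digest stored in a published .md5 file.

    Assumption: the first whitespace-separated token of the .md5 file is the
    hex digest (for example "<digest>  <filename>").
    """
    expected = Path(md5_file).read_text().split()[0].lower()
    return md5_of_file(tarball) == expected
```

A voter might call matches_published_md5 once for the source tarball and once for the binary tarball, each with its matching .md5 file downloaded from the candidate directory. A matching checksum only shows that the download was not corrupted; it says nothing about who produced the file, which is why the signature check is listed separately in the votes above.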
Re: Review Request: Review for PIG-1314 - add datetime type in pig
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/SizeUtil.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/TypeAwareTuple.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/io/NullableDateTimeWritable.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/schema/Schema.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/schema/SchemaUtil.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/CastUtils.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/NumValCarrier.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/StorageUtil.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/TypeCheckingExpVisitor.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/AugmentBaseDataVisitor.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/scripting/jruby/RubySchema.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/storefunc/PigPerformanceLoader.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/data/TestSchemaTuple.java 
1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestAdd.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestBuiltin.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestConversions.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestDivide.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEqualTo.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGTOrEqual.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGreaterThan.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLTOrEqual.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLessThan.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestMod.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestMultiply.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNotEqualTo.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNull.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestOrderBy.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOBinCond.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOCast.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPackage.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigTupleRawComparator.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestResourceSchema.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSchema.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestStore.java 1373741 
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSubtract.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTextDataParser.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTypeCheckingValidatorNewLP.java 1373741 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/utils/GenRandomData.java 1373741 Diff: https://reviews.apache.org/r/5414/diff/ Testing --- Thanks, Thejas Nair
Re: Review Request: Review for PIG-1314 - add datetime type in pig
://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/io/NullableDateTimeWritable.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/schema/Schema.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/schema/SchemaUtil.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/CastUtils.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/NumValCarrier.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/StorageUtil.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/TypeCheckingExpVisitor.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/AugmentBaseDataVisitor.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/scripting/jruby/RubySchema.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/storefunc/PigPerformanceLoader.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/data/TestSchemaTuple.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestAdd.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestBuiltin.java 1371785 
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestConversions.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestDivide.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEqualTo.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGTOrEqual.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestGreaterThan.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLTOrEqual.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLessThan.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestMod.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestMultiply.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNotEqualTo.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNull.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestOrderBy.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOBinCond.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOCast.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigTupleRawComparator.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestResourceSchema.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSchema.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestStore.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSubtract.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTextDataParser.java 1371785 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTypeCheckingValidatorNewLP.java 1371785 
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/utils/GenRandomData.java 1371785 Diff: https://reviews.apache.org/r/5414/diff/ Testing --- Thanks, Thejas Nair
Re: Breaking down big unit tests
We certainly need to look at ways to reduce the runtime of the 'unit' tests. Some of them should be migrated to the e2e tests. But what you want for easy re-testing seems to be a way to run a specific test case within a Test*.java file. I wonder if JUnit lets you do that.

-Thejas

On 7/19/12 2:11 PM, Jie Li wrote:

Hi all,

Apparently some unit test classes are so fat that retesting them is a pain. While reducing the full testing time is a long-term goal, shall we just break down those big units into smaller pieces? Here are the running times of the top 20 big units:

3,432.68 org.apache.pig.test.TestEvalPipeline2
2,944.075 org.apache.pig.test.TestSkewedJoin
1,819.059 org.apache.pig.test.TestMergeJoin
1,797.877 org.apache.pig.test.TestFRJoin
1,476.097 org.apache.pig.test.TestEvalPipeline
1,261.661 org.apache.pig.test.TestFRJoin2
1,164.076 org.apache.pig.test.TestAccumulator
801.747 org.apache.pig.test.TestBZip
799.689 org.apache.pig.test.TestJoin
792.808 org.apache.pig.test.TestPigRunner
750.614 org.apache.pig.test.TestStreaming
743.728 org.apache.pig.test.TestNativeMapReduce
739.31 org.apache.pig.test.TestLimitVariable
674.208 org.apache.pig.test.TestScriptLanguageJavaScript
664.857 org.apache.pig.test.TestJoinSmoke
653.671 org.apache.pig.test.TestCounters
621.06 org.apache.pig.test.TestBestFitCast
541.43 org.apache.pig.test.TestAlgebraicEval
539.939 org.apache.pig.test.TestGrunt

While the full tests take about 10 hours to finish, these top 20 classes account for almost half the time. The idea is to cut each of them into 10-minute pieces. Any comments?

Jie
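As a rough illustration of the 10-minute split proposed above, the sketch below estimates how many pieces a test class would need. It assumes the listed figures are seconds (the thread does not state the unit), and TestSplitEstimator is a hypothetical helper, not something in Pig's build.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of Pig's build or test tooling.
class TestSplitEstimator {
    // Number of pieces needed so that each piece fits within budgetSeconds.
    static int piecesNeeded(double runtimeSeconds, double budgetSeconds) {
        return (int) Math.ceil(runtimeSeconds / budgetSeconds);
    }

    public static void main(String[] args) {
        // Three of the runtimes quoted in the thread (assumed to be seconds).
        Map<String, Double> runtimes = new LinkedHashMap<>();
        runtimes.put("TestEvalPipeline2", 3432.68);
        runtimes.put("TestSkewedJoin", 2944.075);
        runtimes.put("TestGrunt", 539.939);

        double budget = 600.0; // 10 minutes
        for (Map.Entry<String, Double> e : runtimes.entrySet()) {
            System.out.println(e.getKey() + " -> "
                    + piecesNeeded(e.getValue(), budget) + " piece(s)");
        }
    }
}
```

Under that assumption the largest class would need about six pieces, while the classes near the bottom of the list already fit in one.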
Re: Review Request: Review for PIG-1314 - add datetime type in pig
in the toDate udf.

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/BinInterSedes.java
https://reviews.apache.org/r/5414/#comment19278
Can we just compare the longs? That way we can avoid the object creation. Creating objects reduces the performance advantage of using a raw comparator.

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataType.java
https://reviews.apache.org/r/5414/#comment19279
As we allow long to be cast to float/double, I think it will be more consistent to allow that for datetime as well.

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataType.java
https://reviews.apache.org/r/5414/#comment19280
We need to deal with the timezone in the date string.

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/SizeUtil.java
https://reviews.apache.org/r/5414/#comment19281
How did you arrive at this number?

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/AugmentBaseDataVisitor.java
https://reviews.apache.org/r/5414/#comment19282
The check for DATETIME should not be added here.

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/AugmentBaseDataVisitor.java
https://reviews.apache.org/r/5414/#comment19283
The check for DATETIME should not be added here.

- Thejas Nair

On July 10, 2012, 5:41 p.m., Thejas Nair wrote:

---
This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5414/
---

(Updated July 10, 2012, 5:41 p.m.)

Review request for pig.

Description
---
Review for PIG-1314

This addresses bug PIG-1314.
https://issues.apache.org/jira/browse/PIG-1314 Diffs - http://svn.apache.org/repos/asf/pig/trunk/conf/pig.properties 1359212 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/DBStorage.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/HiveColumnarLoader.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/contrib/piggybank/java/src/main/java/org/apache/pig/piggybank/storage/SequenceFileLoader.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/LoadCaster.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigWarning.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/StoreCaster.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/DateTimeWritable.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/HDataType.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigDateTimeRawComparator.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/PhysicalOperator.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ConstantExpression.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/EqualToExpr.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/ExpressionOperator.java 1359212 
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GTOrEqualToExpr.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/GreaterThanExpr.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LTOrEqualToExpr.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/LessThanExpr.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/NotEqualToExpr.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POBinCond.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java 1359212
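On the BinInterSedes review comment above about comparing the longs directly: a minimal sketch of that idea follows. It assumes each datetime is serialized as an 8-byte big-endian millisecond value, which the thread does not confirm, and RawLongCompare is a hypothetical class, not Pig's PigDateTimeRawComparator.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; the byte layout is an assumption, not Pig's actual format.
class RawLongCompare {
    // Decode 8 big-endian bytes starting at offset into a primitive long.
    static long readLong(byte[] buf, int offset) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (buf[offset + i] & 0xFF);
        }
        return v;
    }

    // Compare two serialized timestamps using primitives only, so no
    // datetime (or other) objects are allocated per comparison.
    static int compare(byte[] b1, int s1, byte[] b2, int s2) {
        return Long.compare(readLong(b1, s1), readLong(b2, s2));
    }

    public static void main(String[] args) {
        // ByteBuffer writes big-endian by default, matching readLong.
        byte[] earlier = ByteBuffer.allocate(8).putLong(1341942060000L).array();
        byte[] later = ByteBuffer.allocate(8).putLong(1341942061000L).array();
        System.out.println(compare(earlier, 0, later, 0) < 0);
    }
}
```

The point of the design is that the sort path touches only the raw bytes; deserializing into objects is deferred until a value is actually needed.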
Re: Review Request: Review for PIG-1314 - add datetime type in pig
/AddDuration.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/BinStorage.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/CurrentTime.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/DaysBetween.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/DiffDate.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetDay.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetHour.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetMinute.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetMonth.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetSecond.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/GetYear.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/HoursBetween.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/MinutesBetween.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/MonthsBetween.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/SecondsBetween.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/SubtractDuration.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/TextLoader.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToDate.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToString.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/ToUnixTime.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/Utf8StorageConverter.java 1359212 
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/YearsBetween.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/BinInterSedes.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataReaderWriter.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DataType.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/DefaultTuple.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/SchemaTupleClassGenerator.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/SizeUtil.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/data/TypeAwareTuple.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/io/NullableDateTimeWritable.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/schema/SchemaUtil.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/CastUtils.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/NumValCarrier.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/StorageUtil.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/TypeCheckingExpVisitor.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AliasMasker.g 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstPrinter.g 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/AstValidator.g 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/LogicalPlanGenerator.g 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryLexer.g 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParser.g 1359212 
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/pen/AugmentBaseDataVisitor.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/scripting/jruby/JrubyScriptEngine.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/scripting/jruby/RubySchema.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/test/e2e/pig/udfs/java/org/apache/pig/test/udf/storefunc/PigPerformanceLoader.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestConversions.java 1359212 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOCast.java 1359212 Diff: https://reviews.apache.org/r/5414/diff/ Testing --- Thanks, Thejas Nair
Re: Are there any explanations of the implementation of illustrate?
The earlier implementation of illustrate used the Pig local-mode execution engine (which corresponds to the time when the paper was published). As part of the illustrate rework in PIG-1712, Yan replaced the default Map and Reduce context objects with an IllustratorContext. Look for IllustratorContext and LocalMapReduceSimulator in https://issues.apache.org/jira/secure/attachment/12459267/illustrator_2.patch

The context objects write their output to, and read their input from, memory. We can consider using this for Pig local mode as well, by replacing the in-memory list with something that can spill to disk.

-Thejas

On 7/3/12 6:34 PM, Jonathan Coveney wrote:

Jie, that's perfect, thanks. This doc, specifically: http://i.stanford.edu/~olston/publications/sigmod09.pdf is exactly the detailed explanation I was looking for.

2012/7/3 Jie Li ji...@cs.duke.edu

Some documentation here: http://wiki.apache.org/pig/PigIllustrate

I agree that more tests are needed for illustrate, otherwise it can easily be broken without notice.

Jie

On Tue, Jul 3, 2012 at 12:45 PM, Jon Coveney jcove...@gmail.com wrote:

I was curious, at a level slightly higher than digging through the code, how illustrate is so fast, and how it deals with joins effectively. Are there any resources on this (or does anyone at Hortonworks want to write a tech-oriented blog post? :)
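To make the "context objects write their output and read input from memory" idea concrete, here is a small sketch. InMemoryContext is a hypothetical stand-in written for this note; it is not Pig's IllustratorContext and does not use Hadoop's context API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for illustration only; not Pig's IllustratorContext.
class InMemoryContext {
    // Map output is held in memory, grouped by key and kept in key order.
    private final Map<String, List<Integer>> buffer = new TreeMap<>();

    // Called by the "map" side instead of writing to disk or the network.
    void write(String key, int value) {
        buffer.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // The "reduce" side reads its grouped input straight from memory.
    Map<String, List<Integer>> reduceInput() {
        return buffer;
    }

    public static void main(String[] args) {
        InMemoryContext ctx = new InMemoryContext();
        ctx.write("b", 2);
        ctx.write("a", 1);
        ctx.write("a", 3);
        // A trivial "reduce" that sums the values for each key.
        for (Map.Entry<String, List<Integer>> e : ctx.reduceInput().entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            System.out.println(e.getKey() + "\t" + sum);
        }
    }
}
```

In this sketch the TreeMap is the part that would have to be swapped for a spillable structure if the approach were reused for local mode on larger inputs.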
Re: Possible bug in replicated join?
That certainly looks like a bug. The replicated join should not flatten the tuple. I didn't actually know that Pig supported doing joins on tuples (I guess it does not allow that on maps and bags).

-Thejas

On 6/21/12 11:29 AM, Jonathan Coveney wrote:

Am posting before making a ticket just to make sure I'm not doing something stupid or missing something obvious.

$ cat data
1
2
3
4
5

a = load 'data' as (x:int);
b = foreach a generate TOTUPLE(x);
c = load 'data' as (x:int);
d = foreach c generate TOTUPLE(x);
e = join b by $0, d by $0;
dump e;

((1),(1))
((2),(2))
((3),(3))
((4),(4))
((5),(5))

ok but

f = join b by $0, d by $0 using 'replicated';
dump f;

(1,1)
(2,2)
(3,3)
(4,4)
(5,5)
Re: [jira] [Resolved] (PIG-2650) Convenience mock Loader and Storer to simplify unit testing of Pig scripts
In my opinion, we should only commit changes to released branches that are either critical bug fixes or very useful minor changes that are not likely to affect the stability of the branch. This change would fall into the second category.

Thanks,
Thejas

On 4/26/12 2:32 PM, Bill Graham wrote:

What's fair game to commit to the 0.10 branch? Just bug fixes, or are new small features that didn't make it into 0.10 ok?

On Thu, Apr 26, 2012 at 2:15 PM, Daniel Dai da...@hortonworks.com wrote:

I am fine with it. Please also include the following tiny patch to fix the hadoop 23 build after the patch.

--- pig/trunk/ivy.xml (original)
+++ pig/trunk/ivy.xml Thu Apr 26 21:11:36 2012
@@ -178,7 +178,7 @@
 <dependency org="net.java.dev.javacc" name="javacc" rev="${javacc.version}"
   conf="compile->master"/>
 <dependency org="junit" name="junit" rev="${junit.version}"
-  conf="test->default"/>
+  conf="compile->master"/>
 <dependency org="com.google.code.p.arat" name="rat-lib" rev="${rats-lib.version}"
   conf="releaseaudit->default"/>
 <dependency org="org.codehaus.jackson" name="jackson-mapper-asl" rev="${jackson.version}"

Daniel

On Thu, Apr 26, 2012 at 2:07 PM, Julien Le Dem jul...@twitter.com wrote:

I'm planning to commit this in the 0.10 branch as well. The patch has only new files, so it will apply cleanly. Any objection?

Julien

On Apr 26, 2012, at 1:30 PM, Julien Le Dem (JIRA) wrote:

[ https://issues.apache.org/jira/browse/PIG-2650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem resolved PIG-2650.

Resolution: Fixed
Fix Version/s: 0.11

Convenience mock Loader and Storer to simplify unit testing of Pig scripts
--
Key: PIG-2650
URL: https://issues.apache.org/jira/browse/PIG-2650
Project: Pig
Issue Type: New Feature
Reporter: Julien Le Dem
Assignee: Julien Le Dem
Fix For: 0.11
Attachments: PIG-2650-a.patch, PIG-2650-b.patch, PIG-2650-c.patch, PIG-2650.patch

A test would look as follows:
{code}
PigServer pigServer = new PigServer(ExecType.LOCAL);
TupleFactory tf = TupleFactory.getInstance();
Data data = Storage.resetData(pigServer.getPigContext());
data.set("foo", Arrays.asList(
    tf.newTuple("a"),
    tf.newTuple("b"),
    tf.newTuple("c")
    ));
pigServer.registerQuery("A = LOAD 'foo' USING mock.Storage();");
// some complex script to test
pigServer.registerQuery("STORE A INTO 'bar' USING mock.Storage();");
Iterator<Tuple> out = data.get("bar").iterator();
assertEquals("a", out.next().get(0));
assertEquals("b", out.next().get(0));
assertEquals("c", out.next().get(0));
{code}

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: HCatalog scans all partition even after mentioning date filter
cc'ing dev@pig as this is a Pig issue.

Aniket,

What you saw is not related to PIG-2339. In your example query, the logical plan will look like this -

      Load (A)
         |
       Split
         |
     ---------
     |       |
Filter(B1) Filter(B2) ...

Because of the split operator introduced between the filter conditions and the load, the filter does not get pushed into the load function. A simple way to fix this in Pig would be to not share the load across the filter operators. Another option is to push the condition (B1 or B2 or B3) into the Load operator and retain the rest of the current plan (the split and the filters following the split). You can of course achieve the same effect by having a separate load statement as input for each of the filters.

I agree that we should make it possible to ask Pig to throw a warning/error if the query is going to result in a full table scan on a partitioned table.

Thanks,
Thejas

On 4/24/12 7:56 PM, Aniket Mokashi wrote:

Sorry Thejas, I didn't look into the jira properly earlier. EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did not hit that issue earlier (and I patched datanucleus). filter-union was a workaround I was using to avoid some of the thrift timeout problems earlier. The Thrift APIs time out on the client side in 20 sec by default (I found the config to change this later), and hence I used

a = load 'table'; b1 = filter by cond1; b2 = filter by cond2; .. b = union b1, b2..;

expecting these filters to be pushed separately to the loader. But that doesn't work in Pig. (I can open a jira, but I haven't done enough investigation at the code level.) Thoughts?

Thanks,
Aniket

On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair the...@hortonworks.com wrote:

The issue was not specific to filter-union - https://issues.apache.org/jira/browse/PIG-2339. The fix was to apply PushUpFilter before PartitionFilterOptimizer. As this is not an hcat issue, it should not matter if you have an older hcat version.

fyi, this bug was not there in pig 0.8.x. Was it pig 0.9.0 or 0.9.1 that you used?

Thanks,
Thejas

On 4/24/12 5:21 PM, Aniket Mokashi wrote:

Hi Thejas,

Can you point me to the jira that fixes the filter-union problem (in pig)? I haven't tried hcat-0.4 yet, good to know about that issue. I will keep a watcher.

Thanks,
Aniket

On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair the...@hortonworks.com wrote:

Hi Aniket,

Are you using pig 0.9 or 0.9.1? If yes, can you try with pig 0.9.2? Wondering if you are also hitting the issue that Thomas mentioned.

Thanks,
Thejas

On 4/23/12 7:39 PM, Aniket Mokashi wrote:

Something similar I have noticed is -

A = load ...
B1 = filter A by cond1;
B2 = filter A by cond2;
B3 = filter A by cond3;
B = union B1, B2, B3;

does not push projection. Is that expected? Ideally, we should have a strict mode under hcatalog that, when turned on, will avoid executing pig queries on the full (partitioned) table.

Thanks,
Aniket

On Mon, Apr 23, 2012 at 7:32 PM, Rajesh Balamohan rajesh.balamo...@gmail.com wrote:

Hi Alan,

Thanks for the quick response. I am using HCatalog 0.4. With a simple PIG script it works great. HCatalog beautifully scans only the relevant information. However, a full scan happens only when we have a couple of additional joins and when we change the INNER JOIN order (we also use using skewed). Though we have looked into the debug logs, we saw the number of records scanned from the JobTracker's counters itself. Without pruning, the m/r job was pretty much scanning the entire set of rows. I am not sure if there is a corner case wherein the skewed join is trying to override the filtering.

~Rajesh.B

On Tue, Apr
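The reply above proposes pushing the condition (B1 or B2 or B3) into the Load operator while keeping the split and the per-branch filters. The sketch below models that idea with plain Java predicates. Every name in it is hypothetical; it is not how Pig's optimizer rules are written, only a picture of why the result is unchanged while fewer rows are read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of the proposal; none of these names are Pig classes.
class SplitFilterPushdown {
    // Build the predicate (B1 or B2 or ...) that would be handed to the loader.
    static <T> Predicate<T> pushedCondition(List<Predicate<T>> branchFilters) {
        Predicate<T> combined = row -> false;
        for (Predicate<T> f : branchFilters) {
            combined = combined.or(f);
        }
        return combined;
    }

    // "Load": rows that fail the pushed condition are never read.
    static <T> List<T> load(List<T> table, Predicate<T> pushed) {
        List<T> out = new ArrayList<>();
        for (T row : table) {
            if (pushed.test(row)) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Rows stand in for partition values, e.g. day-of-month.
        List<Integer> table = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);
        List<Predicate<Integer>> filters = Arrays.asList(d -> d == 3, d -> d == 7);

        List<Integer> loaded = load(table, pushedCondition(filters));
        System.out.println("rows read by load: " + loaded);

        // The split and the original per-branch filters still run afterwards.
        for (Predicate<Integer> f : filters) {
            System.out.println("branch output: " + load(loaded, f));
        }
    }
}
```

Because each branch filter implies the combined condition, re-applying the branch filters after the load gives the same branch outputs as scanning the full table would.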
Re: HCatalog scans all partition even after mentioning date filter
Yes, please create one.

Thanks,
Thejas

On 4/25/12 1:47 PM, Aniket Mokashi wrote:

Hi Dmitriy and Thejas,

Should I open a jira for the same?

Thanks,
Aniket

On Wed, Apr 25, 2012 at 1:45 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

Yeah I think we just need to get projection pushdown to work through Split operators.

D

On Wed, Apr 25, 2012 at 12:52 PM, Thejas Nair the...@hortonworks.com wrote:

cc'ing dev@pig as this is a Pig issue.

Aniket,

What you saw is not related to PIG-2339. In your example query, the logical plan will look like this -

      Load (A)
         |
       Split
         |
     ---------
     |       |
Filter(B1) Filter(B2) ...

Because of the split operator introduced between the filter conditions and the load, the filter does not get pushed into the load function. A simple way to fix this in Pig would be to not share the load across the filter operators. Another option is to push the condition (B1 or B2 or B3) into the Load operator and retain the rest of the current plan (the split and the filters following the split). You can of course achieve the same effect by having a separate load statement as input for each of the filters.

I agree that we should make it possible to ask Pig to throw a warning/error if the query is going to result in a full table scan on a partitioned table.

Thanks,
Thejas

On 4/24/12 7:56 PM, Aniket Mokashi wrote:

Sorry Thejas, I didn't look into the jira properly earlier. EMR pig-0.9.1 already has that patch for PIG-2339 and hence I did not hit that issue earlier (and I patched datanucleus). filter-union was a workaround I was using to avoid some of the thrift timeout problems earlier. The Thrift APIs time out on the client side in 20 sec by default (I found the config to change this later), and hence I used

a = load 'table'; b1 = filter by cond1; b2 = filter by cond2; .. b = union b1, b2..;

expecting these filters to be pushed separately to the loader. But that doesn't work in Pig. (I can open a jira, but I haven't done enough investigation at the code level.) Thoughts?

Thanks,
Aniket

On Tue, Apr 24, 2012 at 7:00 PM, Thejas Nair the...@hortonworks.com wrote:

The issue was not specific to filter-union - https://issues.apache.org/jira/browse/PIG-2339. The fix was to apply PushUpFilter before PartitionFilterOptimizer. As this is not an hcat issue, it should not matter if you have an older hcat version.

fyi, this bug was not there in pig 0.8.x. Was it pig 0.9.0 or 0.9.1 that you used?

Thanks,
Thejas

On 4/24/12 5:21 PM, Aniket Mokashi wrote:

Hi Thejas,

Can you point me to the jira that fixes the filter-union problem (in pig)? I haven't tried hcat-0.4 yet, good to know about that issue. I will keep a watcher.

Thanks,
Aniket

On Tue, Apr 24, 2012 at 4:51 PM, Thejas Nair the...@hortonworks.com wrote:

Hi Aniket,

Are you using pig 0.9 or 0.9.1? If yes, can you try with pig 0.9.2? Wondering if you are also hitting the issue that Thomas mentioned.

Thanks,
Thejas

On 4/23/12 7:39 PM, Aniket Mokashi wrote:

Something similar I have noticed is -

A = load ...
B1 = filter A by cond1;
B2 = filter A by cond2;
B3 = filter A by cond3;
B = union B1, B2, B3;

does not push projection. Is that expected? Ideally, we should have a strict mode under hcatalog that, when turned on, will avoid executing pig queries on the full (partitioned) table.

Thanks,
Aniket

On Mon, Apr 23, 2012 at 7:32 PM, Rajesh Balamohan rajesh.balamo...@gmail.com
Re: [VOTE] Release Pig 0.10.0 (candidate 0)
+1 . Checked checksum and signatures of all 3 packages. Ran simple queries in MR and local modes using tar package on unsecure cluster, and rpm package on secure cluster. Thanks, Thejas On 4/20/12 12:39 AM, Daniel Dai wrote: Hi, I have created a candidate build for Pig 0.10.0. Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup. Please download, test, and try it out: http://people.apache.org/~daijy/pig-0.10.0-candidate-0/ Should we release this? Vote closes on next Tuesday, Apr 24th. Daniel
piggybank on github (was Re: Apache Pig hackday @ Twitter (SF))
On 4/18/12 3:24 PM, Russell Jurney wrote:

I'm in. I'm going to work on getting piggybank on github, including for Jython and JRuby UDFs.

I think the major work involved there is to figure out how to lower the barrier to contributing and how to have independent release cycles for the UDFs, while also having a way to ensure quality. I.e., figuring out the policies for it is the harder part. Just copying the UDFs to github will not help. CPAN seems to have figured that out. We need to see if we can adopt a policy like that.

Thanks,
Thejas
Re: Apache Pig hackday @ Twitter (SF)
Count me in

-Thejas

On 4/18/12 2:18 PM, Dmitriy Ryaboy wrote:

Hi folks,

The Analytics Infra team at Twitter will be hosting a Pig hackday on May 11. On the agenda:

- get newcomers set up with the apache ticket process
- review and commit a bunch of stuff that's not been getting love
- hack on exciting new features
- fix boring old problems
- the Dmitriy critiques everyone's APIs hour
- the Jonathan and Julien make fun of Dmitriy for being a hater hour
- whatever else y'all want to do.

Conveniently, the Twitter office is across the street from Yerba Buena Gardens in the middle of SF downtown, a 15 minute walk from Cal Train and Bart. After hacking, we can do bowling or something. Or drinking.

Unfortunately we have very limited space, so let me know early if you would like to come and hack! Looking forward to hacking some Pig.

-Dmitriy
Fw: GSoC 2012 mentor signup
fyi, for those who expressed interest in mentoring students for GSoC, see the email below for instructions to register. Here is the Apache mentoring guide - http://community.apache.org/guide-to-being-a-mentor.html

- Forwarded Message -
From: Ulrich Stärk u...@apache.org
To: p...@apache.org; code-awa...@apache.org
Sent: Tuesday, March 20, 2012 1:28 AM
Subject: GSoC 2012 mentor signup

[PMCs, please see the PMC section below!]

Potential GSoC 2012 mentors,

It is time now to sign up to be a mentor for your GSoC 2012 project(s) if you haven't already done so. To do so, follow these 4 steps:

1. Sign up at Google Melange [1] and note your link_id.
2. Add your link_id to [2] if it is not already in there. If you were using a different email address for registration with Google Melange, make sure that your alternate email address is listed at [3]. You can manage email aliases through [4].
3. Request to be a mentor for the ASF within Google Melange.
4. Send an email to code-awa...@apache.org, cc'ing the PMC(s) for which you want to mentor projects, stating that you want to be a mentor for PMC(s) x, y and z, and asking for silent acknowledgement from the PMC.

IMPORTANT: We won't process mentor requests in Melange if you have not copied your PMC in your mentor request to code-awa...@apache.org.

--

PMCs,

Potential mentors will be asking you to ACK their mentor requests. If you feel that the person asking to be a mentor is not fit to mentor projects for your PMC for whatever reasons, it is your duty to NACK their request by replying to their email and copying code-awa...@apache.org. If you don't have any objections, either stay silent or, better yet, ACK their request. Also, please forward this email to would-be mentors not on your PMC.

For the GSoC admins,
Uli

[1] http://www.google-melange.com/gsoc/homepage/google/gsoc2012
[2] https://svn.apache.org/repos/private/committers/GsocLinkId.txt
[3] https://id.apache.org/info/MailAlias.txt
[4] https://id.apache.org
Re: Where do we want to put non-java source files?
Sounds good to me. My thoughts on the costs of this change - - svn will still retain the history of the moved files. So that is not a problem. - build.xml would need some minor changes - some extra steps will be required to apply the patches generated against the old directory structure. Thanks, Thejas On 3/15/12 5:54 PM, Bill Graham wrote: +1 for src/main/ruby and src/main/java. On Thu, Mar 15, 2012 at 5:22 PM, Jonathan Coveney jcove...@gmail.com wrote: So with the jruby addition (which I'm putting a cherry on top of as we speak!), there's going to be some source files in ruby. Given that we don't currently have (afaik) any code in languages other than java, there isn't a clear place to put this. The files are such that they can be packaged in pig.jar and referenced via that (hooray for jruby), but we need a home for them. The ideal would be src/main/ruby/, and move all the java to src/main/java/, but this seems like a pretty traumatic change at this point to accommodate one file...even if we add some python and more ruby files, it doesn't seem worth killing old patches. We could also do src-ruby in the base dir and just go from there? Thoughts? Jon
Re: How Logical Plan Generator works?
See the initial sections of http://infolab.stanford.edu/~olston/publications/vldb09.pdf for an overview of the logical plan. LogicalPlanGenerator.g is the place where the logical plan is created from the parse tree. You would need to look at antlr basics to understand that. (Almost?) all pig relational operations correspond to a subclass of LogicalRelationalOperator in the org.apache.pig.newplan.logical.relational package. Expressions within a relation are subclasses of LogicalExpressionOperator. This document talks about the motivations behind the logical plan redesign, about some special operations like LOInnerLoad, and about the special handling for the foreach operator: http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite -Thejas On 1/29/12 8:41 PM, Prasanth J wrote: Hello Everyone I am a newbie to pig. I was going through https://cwiki.apache.org/PIG/guide-for-new-contributors.html, specifically the grammar files to start off with. I could not understand how LogicalPlanGenerator.g works by looking into the grammar file. Also there isn't much documentation available which explains how logical plans are generated for different pig operators. Is there any reference from which I can learn more about the internals (especially the logical plan generation part)? Thanks Prasanth
Re: [VOTE] Release Pig 0.9.2 (candidate 1)
+1 Checked the md5 checksums, keys of all 3 packages. Ran some simple queries using the rpm package on a secure and unsecure cluster. Checked the -version command. -Thejas On 1/18/12 11:21 AM, Daniel Dai wrote: For your information, I took a shortcut last night to refresh the candidate 1 to include 2 hadoop 23 fixes (PIG-2347-4, PIG-2480). If you downloaded the candidate yesterday, you may need to redownload the candidate. Thanks, Daniel On Tue, Jan 17, 2012 at 5:16 PM, Daniel Dai da...@hortonworks.com wrote: Hi, I have created a candidate build for Pig 0.9.2. This is the second maintenance release of Pig 0.9. The rat report showed no issues in Java files outside of build directory. Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup. Please download, test, and try it out: http://people.apache.org/%7Edaijy/pig-0.9.2-candidate-0 http://people.apache.org/~daijy/pig-0.9.2-candidate-1/ Should we release this? Vote closes on this Friday EOD, Jan 20th. Thanks, Daniel
problems with @hortonworks.com email and apache mailing lists?
This is the 2nd apache user group that has reported that emails to my @hortonworks.com address are bouncing. Is anybody else seeing this ? Any way to fix it ? I tried searching for a solution for this, but didn't find any. -Thejas Original Message Subject: warning from u...@pig.apache.org Date: 9 Jan 2012 16:30:57 - From: user-h...@pig.apache.org To: the...@hortonworks.com Hi! This is the ezmlm program. I'm managing the u...@pig.apache.org mailing list. I'm working for my owner, who can be reached at user-ow...@pig.apache.org. Messages to you from the user mailing list seem to have been bouncing. I've attached a copy of the first bounce message I received. If this message bounces too, I will send you a probe. If the probe bounces, I will remove your address from the user mailing list, without further notice. I've kept a list of which messages from the user mailing list have bounced from your address. Copies of these messages may be in the archive. To retrieve a set of messages 123-145 (a maximum of 100 per request), send a short message to: user-get.123_...@pig.apache.org To receive a subject and author list for the last 100 or so messages, send a short message to: user-in...@pig.apache.org Here are the message numbers: 7950 --- Enclosed is a copy of the bounce message I received. Return-Path: Received: (qmail 9035 invoked for bounce); 29 Dec 2011 00:09:21 - Date: 29 Dec 2011 00:09:21 - From: mailer-dae...@apache.org To: user-return-79...@pig.apache.org Subject: failure notice Hi. This is the qmail-send program at apache.org. I'm afraid I wasn't able to deliver your message to the following addresses. This is a permanent error; I've given up. Sorry it didn't work out. the...@hortonworks.com: 74.125.53.26 failed after I sent the message. Remote host said: 550 5.7.1 Unauthenticated email is not accepted from this domain. d6si14020775pbk.191
Re: Next Pig release proposal
changes. Nov release will be 1.0.0, Feb release will be 1.1.0. There will be 1.0.1, 1.1.1 etc for bug fixes. I personally prefer scheme 2, increasing major version too frequently might be confusing to users. How do other folks feel? Daniel On Sat, Oct 22, 2011 at 2:31 AM, Gianmarco De Francisci Morales g...@apache.org wrote: Hi, just my 2 cents. I think the issue here is not 1.0 vs 0.10, but what's the versioning scheme we want to use for Pig. Up to now it has been just an increasing number after a '0.' prefix, changed when the community felt it was time. I think this works well for a small project, but it is somewhat fuzzy. I like the idea of having major.minor.patch versions like many other projects. It's a very clear and almost standard way of versioning a piece of software. It has clear rules on when to change each of the numbers, and lets the user get an idea of backward compatibility at a glance. So, to conclude, I am in favor of going 1.0 (or 1.0.0) as long as we decide a clear versioning policy (whichever it is). So that the 1.0 milestone would mark the beginning of our new policy. Cheers, -- Gianmarco On Fri, Oct 21, 2011 at 23:10, milind.bhandar...@emc.com wrote: If one were to rewrite input and output formats to use the webhdfs:// APIs, this would not be an issue, right ? - milind On 10/21/11 1:50 PM, Santhosh Srinivasan s...@yahoo-inc.com wrote: If I was not clear in my earlier email, I apologize for the lack of clarity. I am no longer in favour of waiting for Hadoop API stability across Hadoop versions. It's a pipe dream. When we had PigInputFormat and PigOutputFormat, your reasoning would be spot on. I am concerned about the following. Our tight integration with Hadoop due to the use of Input and Output format might lead to a break in backward compatibility. I am not sure if the comparison with that of Java is valid. Probably a majority of the users don't use JNI. 
It's very hard to use Pig without writing custom load and store functions. The default load and store don't suffice for a majority of use cases that I have observed. I am trying to get all factors that might influence this decision. From the few emails that have been exchanged since yesterday, we have the following factors: 1. Hadoop 0.20.205 (support for Append) 2. Hadoop 0.22 3. Hadoop 0.23 4. Maturity of the new parser 5. Stability of the new logical plan 6. Other components in the eco system. - Avro (1.5.4, 1.4.1, ...) - Cassandra (1.0.0, 0.8.7, ...) - Chukwa (0.4.0, 0.3.0, ...) - Hama (0.3.0, 0.2.0, ...) - Hbase (0.90.4, 0.90.3, 0.90.2, 0.90.1, ...) - Hive (Releases - 0.7.1, 0.7.0, 0.6.0, ...) - Zookeeper (3.3.3, 3.3.2, 3.2.2, 3.1.2, ...) Santhosh -Original Message- From: Thejas Nair [mailto:the...@hortonworks.com] Sent: Friday, October 21, 2011 11:22 AM To: dev@pig.apache.org Subject: Re: Next Pig release proposal Santosh, I thought you meant API stability for hadoop across major versions, but I guess you are referring to stability within 0.23 versions. But the argument applies to that as well: if 0.23.1 is not compatible with 0.23.0, we need to call the release for 0.23.1 'pig 1.x for 0.23.1 api'. We just need to communicate to the users that the InputFormat/OutputFormat api's (and anything else we expose from hadoop) depend on the hadoop version they are using. I think it is just like different JNI libraries that you would write for different OS. But the java version remains the same across OSs. -Thejas On 10/21/11 10:59 AM, Santhosh Srinivasan wrote: Thejas, I guess you did not read my email completely. You are referring to the premise without examining the conclusion. I am repasting my entire email to avoid confusion (I hate truncated references). If you could respond again, it will bring us onto the same page. email Ref: http://tinyurl.com/4ng8upa (last discussion on 1.0) How far have we progressed from our last discussion in March. 
There was no consensus on the 1.0 release. Opinions ranged from having more releases to bake in the maturity of the new parser and logical plan changes to compatibility with Hadoop API (was compared to Social Security - a very hot topic these days). My concerns were around Hadoop API stability. I have heard that the APIs will not be stable for at least 1 year. This is taking me away from the Hadoop API stability factor (They passed healthcare in that duration. Really!) Do we want compatibility with 0.23 as a gating factor - not sure if this is anywhere close to getting done in the near future. Will we support append (0.20.205)? Btw, Hbase has been doing 0.90.1, 0.90.2, etc. So we can take a look at this option too. Santhosh -Original Message- From: Olga Natkovich [mailto:ol...@yahoo-inc.com] Sent: Thursday
Re: Next Pig release proposal
On 10/24/11 12:43 PM, Dmitriy Ryaboy wrote: We are finding a fair number of issues trying to move from Pig 0.8.1 to 0.9, and I don't think those issues are fixed in 10, either.. not sure that this stabilization process has happened yet. D What kind of issues are these ? Are they related to major changes in 0.8 (logical plan) or 0.9 (antlr parser, or semantic cleanup (in terms of backward compat) ) ? -Thejas
Re: Next Pig release proposal
Dmitriy, I think what you are saying is something similar to alpha/beta releases. (maybe beta1, beta2 .. is better). So the first release could be 1.0.0_beta1. This scheme will be easier for users to understand. But I am not sure what the criteria for promoting a release from betaX to general release should be. Thanks, Thejas On 10/24/11 5:38 PM, Dmitriy Ryaboy wrote: To be a little more concrete about what I am saying here -- I don't think we should put a 1.0 label on any *.0 release. 0.8.1 is pretty solid; 0.9.0 has some holes, 0.9.1 is better. If we put 1.0 on what is currently being thought of as 0.10, it will have some stability / usability issues (things tend to show up after we make a release and people in the wild start trying it), and those issues will make a poor impression on those who expect 1.0 to be shiny and polished after so much time. I'm in favor of waiting a couple of dot releases, promoting a stabilized release into 1.0, and going from there. So, pictorially: -- trunk --- 0.11-dev --0.12-dev--| 1.2-dev! \ \ \ \ 0.11.0 | 1.1.0! \ \--- 0.10.0 --- 0.10.1 --- 0.10.2 | 1.0.0 !! On Mon, Oct 24, 2011 at 12:43 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: I am good with Scheme 2. We are finding a fair number of issues trying to move from Pig 0.8.1 to 0.9, and I don't think those issues are fixed in 10, either.. not sure that this stabilization process has happened yet. D On Mon, Oct 24, 2011 at 11:59 AM, Daniel Dai da...@hortonworks.com wrote: Yes, we need a versioning scheme. There are two versioning schemes I can think of: Scheme 1: major.patch. major will be the feature rich release every 3 months; patch will be the bug fix release when necessary. Nov release will be 1.0, Feb release will be 2.0. There will be 1.1, 2.1 etc for bug fixes. Scheme 2: major.minor.patch. Most of our 3 month releases will be counted as minor releases unless there are major user facing/disruptive changes. Nov release will be 1.0.0, Feb release will be 1.1.0. 
There will be 1.0.1, 1.1.1 etc for bug fixes. I personally prefer scheme 2, increasing major version too frequently might be confusing to users. How do other folks feel? Daniel On Sat, Oct 22, 2011 at 2:31 AM, Gianmarco De Francisci Morales g...@apache.org wrote: Hi, just my 2 cents. I think the issue here is not 1.0 vs 0.10, but what's the versioning scheme we want to use for Pig. Up to now it has been just an increasing number after a '0.' prefix, changed when the community felt it was time. I think this works well for a small project, but it is somewhat fuzzy. I like the idea of having major.minor.patch versions like many other projects. It's a very clear and almost standard way of versioning a piece of software. It has clear rules on when to change each of the numbers, and lets the user get an idea of backward compatibility at a glance. So, to conclude, I am in favor of going 1.0 (or 1.0.0) as long as we decide a clear versioning policy (whichever it is). So that the 1.0 milestone would mark the beginning of our new policy. Cheers, -- Gianmarco On Fri, Oct 21, 2011 at 23:10, milind.bhandar...@emc.com wrote: If one were to rewrite input and output formats to use the webhdfs:// APIs, this would not be an issue, right ? - milind On 10/21/11 1:50 PM, Santhosh Srinivasan s...@yahoo-inc.com wrote: If I was not clear in my earlier email, I apologize for the lack of clarity. I am no longer in favour of waiting for Hadoop API stability across Hadoop versions. It's a pipe dream. When we had PigInputFormat and PigOutputFormat, your reasoning would be spot on. I am concerned about the following. Our tight integration with Hadoop due to the use of Input and Output format might lead to a break in backward compatibility. I am not sure if the comparison with that of Java is valid. Probably a majority of the users don't use JNI. It's very hard to use Pig without writing custom load and store functions. 
The default load and store don't suffice for a majority of use cases that I have observed. I am trying to get all factors that might influence this decision. From the few emails that have been exchanged since yesterday, we have the following factors: 1. Hadoop 0.20.205 (support for Append) 2. Hadoop 0.22 3. Hadoop 0.23 4. Maturity of the new parser 5. Stability of the new logical plan 6. Other components in the eco system. - Avro (1.5.4, 1.4.1, ...) - Cassandra (1.0.0, 0.8.7, ...) - Chukwa (0.4.0, 0.3.0, ...) - Hama (0.3.0, 0.2.0, ...) - Hbase (0.90.4, 0.90.3, 0.90.2, 0.90.1, ...) - Hive (Releases - 0.7.1, 0.7.0, 0.6.0, ...) - Zookeeper (3.3.3, 3.3.2, 3.2.2, 3.1.2, ...) Santhosh -Original Message- From: Thejas Nair [mailto:the...@hortonworks.com] Sent: Friday, October 21, 2011 11:22 AM To: dev@pig.apache.org Subject: Re: Next
LogicalExpressionSimplifier rules
Sending this email to get wider attention. I propose disabling the LogicalExpressionSimplifier optimizer rule, because the expected performance gains do not justify the complexity of that rule and the number of bugs that seem to come from it - https://issues.apache.org/jira/browse/PIG-2316?focusedCommentId=13124489&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13124489 In general, I think any new code that is significantly complex (i.e. hard to maintain, and a likely source of bugs) should be added to pig only if there are enough gains to justify it. -Thejas
Re: Review Request: Using COR function in Piggybank results in ERROR 2018: Internal error. Unable to introduce the combiner for optimization
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1929/#review1974 --- trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java https://reviews.apache.org/r/1929/#comment4462 I think a comment will be useful - // The algebraic udf can have more than one input. Add the udf only once trunk/src/org/apache/pig/builtin/COR.java https://reviews.apache.org/r/1929/#comment4463 The size of the tuple would need to be size*(size-1). Details - the inner loop is executed - (n-1) + (n-2) + .. (n - (n-1)) = n(n-1)/2 . Each time the inner loop is executed two columns are being added. So 2 * n(n-1)/2 = n(n-1) trunk/src/org/apache/pig/builtin/COR.java https://reviews.apache.org/r/1929/#comment4464 I don't understand why the values are being added to a tuple as columns. That does not look right. - Thejas On 2011-09-16 18:11:08, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1929/ --- (Updated 2011-09-16 18:11:08) Review request for pig and Thejas Nair. Summary --- See PIG-2286 This addresses bug PIG-2286. https://issues.apache.org/jira/browse/PIG-2286 Diffs - trunk/src/org/apache/pig/builtin/COR.java 1171325 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 1171325 trunk/test/e2e/pig/tests/nightly.conf 1171325 Diff: https://reviews.apache.org/r/1929/diff Testing --- Unit-test: all pass Piggybank-test: TestDBStorage fail for other reason, unrelated to patch Test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. 
The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Thanks, Daniel
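The tuple-size arithmetic in the review comment on COR.java (an inner loop that runs n(n-1)/2 times and adds two columns each time, giving n(n-1) columns) can be checked with a small sketch. The loop shape below is an assumption reconstructed from the comment's description, not the actual COR.java code, and `columns_added` is a hypothetical helper name.

```python
def columns_added(n):
    """Count the columns appended when every unordered pair of n inputs adds two columns.

    Assumed loop shape (from the review comment, not from COR.java itself):
    the outer loop picks input i, the inner loop pairs it with every later input j.
    """
    count = 0
    for i in range(n):
        for j in range(i + 1, n):
            count += 2  # two columns per (i, j) pair
    return count


# The inner body runs (n-1) + (n-2) + ... + 1 = n(n-1)/2 times,
# so the total should equal n * (n - 1) for any n.
for n in range(1, 10):
    assert columns_added(n) == n * (n - 1)
```

Under these assumptions, three inputs would need a tuple of 6 columns and four inputs 12, which matches the size*(size-1) figure in the comment.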
Re: Review Request: PIG-2228: support partial aggregation in map task
On 2011-09-13 09:15:46, Dmitriy Ryaboy wrote: trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java, line 296 https://reviews.apache.org/r/1817/diff/1/?file=40193#file40193line296 Not sure about the value of this comment :) cleaning that - Thejas --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1817/#review1868 --- On 2011-09-15 17:27:08, Thejas Nair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1817/ --- (Updated 2011-09-15 17:27:08) Review request for pig, Daniel Dai and Dmitriy Ryaboy. Summary --- See PIG-2228 This addresses bug PIG-2228. https://issues.apache.org/jira/browse/PIG-2228 Diffs - trunk/conf/pig.properties 1170885 trunk/src/org/apache/pig/Algebraic.java 1170885 trunk/src/org/apache/pig/Main.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java PRE-CREATION trunk/src/org/apache/pig/data/DefaultTuple.java 1170885 trunk/src/org/apache/pig/data/InternalCachedBag.java 1170885 trunk/src/org/apache/pig/data/InternalDistinctBag.java 1170885 trunk/src/org/apache/pig/data/InternalSortedBag.java 1170885 
trunk/src/org/apache/pig/data/SelfSpillBag.java PRE-CREATION trunk/src/org/apache/pig/data/SizeUtil.java PRE-CREATION trunk/src/org/apache/pig/data/SortedSpillBag.java 1170885 trunk/src/org/apache/pig/tools/pigstats/ScriptState.java 1170885 trunk/test/e2e/pig/tests/nightly.conf 1170885 trunk/test/org/apache/pig/test/TestDataBag.java 1170885 trunk/test/org/apache/pig/test/TestPOPartialAgg.java PRE-CREATION trunk/test/org/apache/pig/test/TestPOPartialAggPlan.java PRE-CREATION trunk/test/org/apache/pig/test/Util.java 1170885 trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1170885 Diff: https://reviews.apache.org/r/1817/diff Testing --- test-patch [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 21 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 461 release audit warnings (more than the trunk's current 455 warnings). release audit failures are because of jdiff changes All unit tests pass, new e2e tests added . Thanks, Thejas
Re: Review Request: PIG-2228: support partial aggregation in map task
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1817/#review1916 --- trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java https://reviews.apache.org/r/1817/#comment4397 removed the extra ; in the patch checked in. - Thejas On 2011-09-15 17:27:08, Thejas Nair wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1817/ --- (Updated 2011-09-15 17:27:08) Review request for pig, Daniel Dai and Dmitriy Ryaboy. Summary --- See PIG-2228 This addresses bug PIG-2228. https://issues.apache.org/jira/browse/PIG-2228 Diffs - trunk/conf/pig.properties 1170885 trunk/src/org/apache/pig/Algebraic.java 1170885 trunk/src/org/apache/pig/Main.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/CombinerOptimizer.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PhyPlanSetter.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/plans/EndOfAllInputSetter.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PhyPlanVisitor.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/plans/PlanPrinter.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POLocalRearrange.java 1170885 trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POPartialAgg.java PRE-CREATION trunk/src/org/apache/pig/data/DefaultTuple.java 1170885 trunk/src/org/apache/pig/data/InternalCachedBag.java 1170885 trunk/src/org/apache/pig/data/InternalDistinctBag.java 1170885 trunk/src/org/apache/pig/data/InternalSortedBag.java 1170885 trunk/src/org/apache/pig/data/SelfSpillBag.java PRE-CREATION trunk/src/org/apache/pig/data/SizeUtil.java 
PRE-CREATION trunk/src/org/apache/pig/data/SortedSpillBag.java 1170885 trunk/src/org/apache/pig/tools/pigstats/ScriptState.java 1170885 trunk/test/e2e/pig/tests/nightly.conf 1170885 trunk/test/org/apache/pig/test/TestDataBag.java 1170885 trunk/test/org/apache/pig/test/TestPOPartialAgg.java PRE-CREATION trunk/test/org/apache/pig/test/TestPOPartialAggPlan.java PRE-CREATION trunk/test/org/apache/pig/test/Util.java 1170885 trunk/test/org/apache/pig/test/utils/GenPhyOp.java 1170885 Diff: https://reviews.apache.org/r/1817/diff Testing --- test-patch [exec] -1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 21 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] -1 release audit. The applied patch generated 461 release audit warnings (more than the trunk's current 455 warnings). release audit failures are because of jdiff changes All unit tests pass, new e2e tests added . Thanks, Thejas
going to request yourkit license for committers
FYI - YourKit is a very useful Java profiling tool, and they give licenses for free for use by open source projects. I am planning to request a license for use by pig committers. But they need a reference from the web pages of the project to their website - http://www.yourkit.com/purchase/index.jsp . I believe a link from a credits page should be sufficient. As the project would need to thank them, I am sharing my plan before contacting them. Thanks, Thejas
Re: Review Request: Limit produce wrong number of records after foreach flatten
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1627/#review1621 --- Ship it! +1 - Thejas On 2011-08-23 17:08:10, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1627/ --- (Updated 2011-08-23 17:08:10) Review request for pig and Thejas Nair. Summary --- See PIG-2231 This addresses bug PIG-2231. https://issues.apache.org/jira/browse/PIG-2231 Diffs - trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1160494 trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1160494 Diff: https://reviews.apache.org/r/1627/diff Testing --- test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test: all pass Thanks, Daniel
Re: Review Request: NullPointerException while Accessing Empty Bag in FOREACH { FILTER }
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1600/#review1571 --- Ship it! +1 - Thejas On 2011-08-19 20:36:09, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/1600/ --- (Updated 2011-08-19 20:36:09) Review request for pig and Thejas Nair. Summary --- See PIG-2185 This addresses bug PIG-2185. https://issues.apache.org/jira/browse/PIG-2185 Diffs - trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 1159742 trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1159742 Diff: https://reviews.apache.org/r/1600/diff Testing --- test-patch pass: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit tests pass. Thanks, Daniel
Re: Failing tests after parser change?
Dmitriy, You don't realize how lucky you are! ;) I have been trying hard to reproduce this problem, so that I can check if the patch in PIG-2055 actually fixes the issue. I ran build+ (small)test in a loop for 2000+ times, and this hasn't happened yet. If this is happening (almost) consistently, can you try the patch in PIG-2055 and see if that helps ? Thanks, Thejas On 8/11/11 9:44 AM, Alan Gates wrote: This looks like the intermittent Antlr bug we're seeing (https://issues.apache.org/jira/browse/PIG-2055). We're testing other versions of Antlr to try to fix this, but until we find one that addresses the issue the only solution is to do ant clean, and then rebuild and see if it goes away. We have also noticed it happens more often when built on Mac than on Linux, if you happen to have a Linux box you could build on. Alan. On Aug 10, 2011, at 11:24 PM, Dmitriy Ryaboy wrote: HBaseStorage is failing, and it's not something we did to HBaseStorage... Looks like the parser. Any takers? Testcase: testStoreToHBase_2_with_projection took 0.34 sec Caused an ERROR Error during parsing.line 1, column 84 mismatched input '(' expecting SEMI_COLON org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing.line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1597) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1540) at org.apache.pig.PigServer.registerQuery(PigServer.java:540) at org.apache.pig.PigServer.registerQuery(PigServer.java:553) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:771) at org.apache.pig.test.TestHBaseStorage.scanTable1(TestHBaseStorage.java:767) at org.apache.pig.test.TestHBaseStorage.testStoreToHBase_2_with_projection(TestHBaseStorage.java:706) Caused by: Failed to parse:line 1, column 84 mismatched input '(' expecting SEMI_COLON at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:222) at 
org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:164) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1589)
Please welcome pig's newest committer - Gianmarco De Francisci Morales
Dear pig community, Please welcome Gianmarco as the newest committer to the apache pig project! He has been contributing to pig for more than a year. His contributions include the use of a binary comparator in secondary sort, support for default output in the split operator, use of scalar expressions in limit/sample, and several other bug fixes. He has also been helping users out in the mailing lists. Congratulations Gianmarco! - Thejas
Re: [VOTE] Release Pig 0.9.0 (candidate 1)
+1 Ran queries in local mode on mac, test-commit, and verified md5 checksum. -Thejas On 7/22/11 4:24 PM, Alan Gates wrote: +1. Ran the test-commit, tutorial, and quick sanity test against a real cluster on Linux, ran a quick sanity test in local mode on Mac. Checked signature key and md5. Alan. On Jul 22, 2011, at 2:12 PM, Olga Natkovich wrote: I have created the second candidate build for Pig 0.9.0 release. This release introduces control structures, changes query parser, and performs semantic cleanup. The rat report showed no issues in Java files outside of build directory. Keys used to sign the release are available at http://svn.apache.org/viewvc/pig/trunk/KEYS?view=markup. Please try it out: http://people.apache.org/~olga/pig-0.9.0-candidate-1/ Should we release this? Vote closes on Wednesday, July 27. Olga
Re: Review Request: Project UDF output inside a non-foreach statement fail on 0.8
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/767/#review1105 --- Ship it! +1 - thejas On 2011-05-19 22:26:01, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/767/ --- (Updated 2011-05-19 22:26:01) Review request for pig and thejas. Summary --- See PIG-2077 This addresses bug PIG-2077. https://issues.apache.org/jira/browse/PIG-2077 Diffs - branches/branch-0.8/src/org/apache/pig/newplan/logical/LogicalExpPlanMigrationVistor.java 1104455 branches/branch-0.8/test/org/apache/pig/test/TestEvalPipeline2.java 1104455 Diff: https://reviews.apache.org/r/767/diff Testing --- Test patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test: all pass End to end test: all pass Thanks, Daniel
Re: Cubing in Pig
+1 to what Gianmarco said about the place to do it. See sample_clause in LogicalPlanGenerator.g.

I tried the expanded query (2 dimensions) with 0.8. It results in only 2 MR jobs: the 1st MR job does all the computation, and the 2nd MR job just concatenates the outputs into one file. See http://pastebin.com/aarBELC2. I got an exception in 0.9 for the same query, and I have created a jira (PIG-2164) to address that.

The CubeDimensions udf would be a nice way to get around a combiner issue, but the combiner issue (if any) should actually get fixed.

In the example, you are putting all records into the same file. That would lead to a problem, because it will not be possible to distinguish between the records for (group by (a,b)) that have a value of b as null and the records for (group by (a,null)). If all inputs go into the same file, it would need a marker column to indicate the input each record belongs to.

I think in most cases people would read the results of different group-by combinations separately, so it makes sense to have different output files (e.g., 8 files if there are 3 dimensions). I.e., a split on the marker column might have to be introduced.

Thanks,
Thejas

On 7/13/11 6:05 PM, Dmitriy Ryaboy wrote:

Arnab has a really interesting presentation at the post-hadoop-summit Pig meeting about how Cubing could work in Map-Reduce, and suggested a straightforward path to integrating into Pig. Arnab, do you have the presentation posted somewhere?

In any case, I started mucking around a little with this, trying to hack in the naive solution.
So far, one interesting result, followed by a question.

I manually cubed by writing a bunch of group-bys, like so (using pig 8):

    ab = foreach (group rel by (a, b)) generate flatten(group) as (a, b), COUNT_STAR(rel) as cnt;
    a_only = foreach (group rel by (a, null)) generate flatten(group) as (a, b), COUNT_STAR(rel) as cnt;
    b_only = foreach (group rel by (null, b)) generate flatten(group) as (a, b), COUNT_STAR(rel) as cnt;
    total = foreach (group rel by (null, null)) generate flatten(group) as (a, b), COUNT_STAR(rel) as cnt;
    cube = union ab, a_only, b_only, total;
    store cube

Except, for extra fun, I did this with 3 dimensions and therefore 8 groupings. This generated 4 MR jobs, the first of which moved all the data across the wire despite the fact that COUNT_STAR is algebraic. On my test dataset, the work took 18 minutes.

I then wrote a UDF that, given a tuple, creates all the cube dimensions of the tuple, so CubeDimensions(a, b) returns { (a, b), (a, null), (null, b), (null, null) }, and this works on any number of dimensions. The naive cube then simply becomes this:

    cubed = foreach rel generate flatten(CubeDimensions(a, b));
    cube = foreach (group cubed by $0) generate flatten(group) as (a, b), COUNT_STAR(cubed);

On the same dataset, this generated only 1 MR job, and ran in 3 minutes because we were able to take advantage of the combiners! Assuming algebraic aggregations, this is actually pretty good given how little work it involves.

I looked at adding a new operator that would be (for now) syntactic sugar around this pattern. Basically, CUBE rel by (a, b, c) would insert the operators equivalent to the code above. I can muddle my way through the grammar. What's the appropriate place to put the translation logic? Logical to physical compiler? Optimizer? The LogicalPlanBuilder?

D
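
For readers who want to see the expansion concretely, here is a minimal Python sketch of the idea behind the CubeDimensions UDF described above. It is an illustration only, not the UDF's code: `cube_dimensions` and `naive_cube` are hypothetical names, `None` stands in for Pig's null, and the in-memory counter stands in for a single group-by with an algebraic COUNT_STAR.

```python
from collections import Counter
from itertools import product


def cube_dimensions(row):
    """Return every combination of the row's values with some replaced by None.

    For (a, b) this yields (a, b), (a, None), (None, b), (None, None),
    i.e. 2**n tuples for n dimensions.
    """
    choices = [(value, None) for value in row]
    return list(product(*choices))


def naive_cube(rows):
    """Count rows per cube cell in one pass (one group-by over all cells)."""
    counts = Counter()
    for row in rows:
        for cell in cube_dimensions(row):
            counts[cell] += 1
    return counts


rel = [("x", 1), ("x", 2), ("y", 1)]
cube = naive_cube(rel)
```

Because every row is expanded before grouping, the whole cube is a single aggregation, which is what lets partial counts be combined early. Note that a row containing a genuine null would produce colliding cells in this sketch.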
Re: Cubing in Pig
On 7/14/11 3:03 PM, Dmitriy Ryaboy wrote:

In the dw world, using a single table and using null as an all marker is the standard thing to do

But I imagine that in the dw world, the cube results would get stored in such a way that you can efficiently retrieve the results of specific group-bys (partitions?). That would be similar to storing the results of different group-by operations in different output files. On the other hand, it's possible that the results of most cube operations are small enough that you could do the rest of the processing in an Excel spreadsheet (so partitioning does not matter).

In my udf I actually allow an optional string to be passed to the constructor to denote all if null is a valid value... I'll post the udf shortly, it's a prerequisite to LOCube.

If the results of all group-bys are stored together, I think some such feature to indicate whether a value is actually a null or a '*' (the 'all' marker symbol used in Arnab's presentation) will be essential.

I suspect the case of splitting out the agg levels is actually more rare, and can easily be accomplished with a SPLIT operator. The other nice thing about the udf is how much code it saves, esp for larger numbers of dimensions.

The udf code saving is important if the script is being written manually. But if pig is doing automatic translation (and assuming multiple output files is what makes sense), translating into multiple group-by statements might be more efficient, as it can avoid the filtering that would need to be done for split. But I agree that implementing this feature using the udf is going to be easier. Any changes to make it more efficient can be done later.

Perhaps my sample script generated 4 jobs because I had 3 dimensions?

I doubt it is because of the number of dimensions. I think there might have been something else in the query that prevented the group-bys from being combined together. Do you still have the original script? Can you send the script (and maybe the explain output)?
Thanks, Thejas
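
The null-versus-all ambiguity raised in this reply can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: `ALL`, `cube_cells` and `split_by_level` are hypothetical names, `None` stands in for a genuine null, and '*' is the 'all' marker mentioned above. The sketch still breaks if '*' is itself a valid data value, which is why the marker needs to be configurable.

```python
ALL = "*"  # hypothetical 'all' marker, distinct from a genuine null (None)


def cube_cells(row, all_marker=ALL):
    """Expand a row into its cube cells, using all_marker for rolled-up dimensions."""
    cells = [()]
    for value in row:
        cells = [cell + (v,) for cell in cells for v in (value, all_marker)]
    return cells


def split_by_level(cells, all_marker=ALL):
    """Partition cells by which dimensions are rolled up (one bucket per group-by)."""
    buckets = {}
    for cell in cells:
        level = tuple(v == all_marker for v in cell)
        buckets.setdefault(level, []).append(cell)
    return buckets
```

With a dedicated marker, a row whose second dimension really is null still yields four distinct cells, and splitting on the marker pattern gives one bucket per group-by combination (8 buckets for 3 dimensions).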
Re: Pig testing proposal
On 7/14/11 2:39 PM, Alan Gates wrote: I have posted a proposal for changes in Pig's testing that I would like to make. https://cwiki.apache.org/confluence/display/PIG/PigTestProposal Please take a look and provide feedback. Alan. +1 for the proposal. -Thejas
Re: Pig testing proposal
I think having SQL as a way to generate benchmarks has some value, and we should be open to having that option in the e2e harness as well. But I don't see that as a blocker. In some cases, I would expect that writing an alternative pig-latin query to generate the benchmark might not be easy, and there is also the danger that the alternative script has the same bug, which would result in buggy benchmark data. -Thejas On 7/14/11 3:51 PM, Thejas Nair wrote: On 7/14/11 2:39 PM, Alan Gates wrote: I have posted a proposal for changes in Pig's testing that I would like to make. https://cwiki.apache.org/confluence/display/PIG/PigTestProposal Please take a look and provide feedback. Alan. +1 for the proposal. -Thejas
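
As a rough illustration of using SQL as an independent source of expected results, the Python sketch below compares a stand-in implementation against a count computed by a SQL engine. Everything here is an assumption for illustration: the thread does not name a database, SQLite is used only because it ships with Python, and `count_by_key_under_test` is a hypothetical placeholder for the query being tested.

```python
import sqlite3
from collections import Counter


def count_by_key_under_test(rows):
    """Stand-in for the system under test: count rows per key."""
    return Counter(key for key, _ in rows)


def count_by_key_sql(rows):
    """Independent expected result computed by a SQL engine (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE rel (k TEXT, v INTEGER)")
        conn.executemany("INSERT INTO rel VALUES (?, ?)", rows)
        cursor = conn.execute("SELECT k, COUNT(*) FROM rel GROUP BY k")
        return dict(cursor.fetchall())
    finally:
        conn.close()


rows = [("a", 1), ("a", 2), ("b", 3)]
expected = count_by_key_sql(rows)
actual = dict(count_by_key_under_test(rows))
```

The point of the second engine is independence: a bug shared by two scripts written in the same language and run by the same engine would go unnoticed, whereas a separately implemented engine is unlikely to fail in the same way.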
Re: Review Request: POProject.getNext(DataBag) does not handle null
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/763/#review687 --- Ship it! +1 - thejas On 2011-05-19 17:46:48, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/763/ --- (Updated 2011-05-19 17:46:48) Review request for pig and thejas. Summary --- See PIG-2078 This addresses bug PIG-2078. https://issues.apache.org/jira/browse/PIG-2078 Diffs - trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 1100118 trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1100118 Diff: https://reviews.apache.org/r/763/diff Testing --- Test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test: all pass End-to-end test: all pass Thanks, Daniel
Re: Review Request: complex type casting should return null on casting failure
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/673/#review606 --- Ship it! +1 - thejas On 2011-04-28 20:56:30, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/673/ --- (Updated 2011-04-28 20:56:30) Review request for pig and thejas. Summary --- See PIG-1989 This addresses bug PIG-1989. https://issues.apache.org/jira/browse/PIG-1989 Diffs - http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POCast.java 1097304 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPOCast.java 1097304 Diff: https://reviews.apache.org/r/673/diff Testing --- Test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test: all pass Thanks, Daniel
Re: Review Request: incorrect schema shown when project-star is used with other projections
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/624/#review499
---

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/LineageFindRelVisitor.java
https://reviews.apache.org/r/624/#comment1029

If there are multiple group-by columns, the group column will be a tuple. This will associate the load function only with the tuple and not with the uids of the columns within the tuple. We need to associate the load function with the inner uids as well, like it's done in mapMatchLoadFuncToUid.

- thejas

On 2011-04-19 21:20:10, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/624/
---

(Updated 2011-04-19 21:20:10)

Review request for pig and thejas.

Summary
---
See PIG-1910

This addresses bug PIG-1910.
https://issues.apache.org/jira/browse/PIG-1910

Diffs
-
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/expression/DereferenceExpression.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/expression/ExpToPhyTranslationVisitor.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/expression/ProjectExpression.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LOCogroup.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/ColumnAliasConversionVisitor.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/LineageFindRelVisitor.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/visitor/UDFFinder.java PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/QueryParserDriver.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPigServer.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestPlanGeneration.java PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestTypeCheckingValidatorNewLP.java 1095145
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/Util.java 1095145

Diff: https://reviews.apache.org/r/624/diff

Testing
---
Test-patch:
    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 12 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

Unit test: all pass

Thanks,
Daniel
Re: Review Request: Secondary sort fail when dereferencing two fields inside foreach
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/621/#review500 --- Ship it! +1 - thejas On 2011-04-19 00:37:31, Daniel Dai wrote: --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/621/ --- (Updated 2011-04-19 00:37:31) Review request for pig and thejas. Summary --- See PIG-1978 This addresses bug PIG-1978. https://issues.apache.org/jira/browse/PIG-1978 Diffs - http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/SecondaryKeyOptimizer.java 1091982 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSecondarySort.java 1091982 Diff: https://reviews.apache.org/r/621/diff Testing --- Test-patch: [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test: all pass Thanks, Daniel
Re: Review Request: New logical plan: Should not push up filter in front of Bincond
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/544/#review382
---

Ship it!

+1

- thejas

On 2011-04-04 18:10:55, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/544/
---

(Updated 2011-04-04 18:10:55)

Review request for pig and thejas.

Summary
---
The following script produces a wrong result:

    data = LOAD 'data.txt' using PigStorage() as (referrer:chararray, canonical_url:chararray, ip:chararray);
    best_url = FOREACH data GENERATE ((canonical_url != '' and canonical_url is not null) ? canonical_url : referrer) AS url, ip;
    filtered = FILTER best_url BY url == 'badsite.com';
    dump filtered;

data.txt:

    badsite.com 127.0.0.1
    goodsite.com/1?foo=true goodsite.com 127.0.0.1

Expected:

    (badsite.com,127.0.0.1)

We get nothing.

This addresses bug PIG-1935.
https://issues.apache.org/jira/browse/PIG-1935

Diffs
-
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/expression/BinCondExpression.java 1085215
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestNewPlanFilterAboveForeach.java 1085215

Diff: https://reviews.apache.org/r/544/diff

Testing
---
test-patch:
    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

Unit test: all pass
End-to-end test: all pass

Thanks,
Daniel
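
To see why a filter on the computed url column cannot simply be moved in front of the bincond, here is a small Python model of the two evaluation orders. It is a sketch under assumptions, not a description of Pig's optimizer: the first data row is assumed to have an empty canonical_url (the column separators were lost in the message above), and `wrongly_pushed_plan` assumes the pushed-up filter ends up testing only the canonical_url input. All function names are hypothetical.

```python
def best_url(row):
    """Mirror of the bincond: prefer canonical_url unless it is empty or null."""
    referrer, canonical_url, ip = row
    url = canonical_url if canonical_url not in ("", None) else referrer
    return (url, ip)


def correct_plan(data, wanted):
    """FOREACH first, then FILTER on the computed url column."""
    return [r for r in map(best_url, data) if r[0] == wanted]


def wrongly_pushed_plan(data, wanted):
    """FILTER pushed above the FOREACH, tested against only one bincond input."""
    kept = [row for row in data if row[1] == wanted]
    return [best_url(row) for row in kept]


data = [
    ("badsite.com", "", "127.0.0.1"),
    ("goodsite.com/1?foo=true", "goodsite.com", "127.0.0.1"),
]
```

In this model `correct_plan(data, "badsite.com")` keeps the first row, while `wrongly_pushed_plan(data, "badsite.com")` returns nothing, because the url value depends on two input columns and a condition on either one alone is not equivalent.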
Re: Review Request: Dereference a bag within a tuple does not work
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/524/#review375
---

Ship it!

+1

- thejas

On 2011-03-24 12:22:48, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/524/
---

(Updated 2011-03-24 12:22:48)

Review request for pig and thejas.

Summary
---
The following script does not work (in both the new and the old logical plan):

    a = load '1.txt' as (t : tuple(i: int, b1: bag { b_tuple : tuple ( b_str: chararray) }));
    b = foreach a generate t.b1;
    dump b;

1.txt:

    (1,{(one),(two)})

Error from old logical plan:

    java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to org.apache.pig.data.DataBag
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:482)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:480)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:197)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:339)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

Error from new logical plan:

    java.lang.NullPointerException
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.consumeInputBag(POProject.java:246)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:200)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:339)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:291)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:237)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:232)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:53)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)

If we change "b = foreach a generate t.b1;" to "b = foreach a generate t.i;", it works fine; only referring to a bag does not work.

This addresses bug PIG-1866.
https://issues.apache.org/jira/browse/PIG-1866 Diffs - http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1084415 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/expressionOperators/POProject.java 1084415 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogToPhyTranslationVisitor.java 1084415 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1084415 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/data/GoldenFiles/MRC15.gld 1084415 Diff: https://reviews.apache.org/r/524/diff Testing --- test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs
Re: Review Request: New logical plan fails when I have complex data types from udf
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/526/#review354
---

Ship it!

+1

- thejas

On 2011-03-25 11:51:15, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/526/
---

(Updated 2011-03-25 11:51:15)

Review request for pig and thejas.

Summary
---
The new logical plan fails when I have complex data types returned from my eval function. Below is my script:

    register myudf.jar;
    B1 = load 'myinput' as (id:chararray,ts:int,url:chararray);
    B2 = group B1 by id;
    B = foreach B2 {
        Tuples = order B1 by ts;
        generate Tuples;
    };
    C1 = foreach B generate TransformToMyDataType(Tuples,-1,0,1) as seq: { t: ( previous, current, next ) };
    C2 = foreach C1 generate FLATTEN(seq);
    C3 = foreach C2 generate current.id as id;
    dump C3;

On C3 it fails with the message below:

    Couldn't find matching uid -1 for project (Name: Project Type: bytearray Uid: 45 Input: 0 Column: 1)

Below is the describe output for C1:

    C1: {seq: {t: (previous: (id: chararray,ts: int,url: chararray),current: (id: chararray,ts: int,url: chararray),next: (id: chararray,ts: int,url: chararray))}}

The script works if I turn off the new logical plan or use Pig 0.7.

This addresses bug PIG-1868.
https://issues.apache.org/jira/browse/PIG-1868

Diffs
-
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/relational/LogicalSchema.java 1081999
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestSchema.java 1081999

Diff: https://reviews.apache.org/r/526/diff

Testing
---

Thanks,
Daniel
Re: Review Request: Switch to new parser generator technology
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/459/#review282
---

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/MultiMap.java
https://reviews.apache.org/r/459/#comment528

In several places in the code, an assumption is made that what it returns is a list (including casts to list), so I changed the return type to list. To prevent findbugs warnings, any casts of the return value to list have now been removed from other classes.

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/FunctionType.java
https://reviews.apache.org/r/459/#comment524

This is likely to give findbugs warnings for unused variables. (Change can be part of a separate incremental patch.)

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/parser/ParserException.java
https://reviews.apache.org/r/459/#comment521

Typo Failed to parse: . (Change can be part of a separate incremental patch.)

http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestLogToPhyCompiler.java
https://reviews.apache.org/r/459/#comment527

It would be good to have these tests migrated to the new logical plan.

- thejas

On 2011-03-02 17:16:11, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/459/
---

(Updated 2011-03-02 17:16:11)

Review request for pig, Daniel Dai, thejas, and Xuefu Zhang.

Summary
---
There are many bugs in Pig related to the parser, particularly to bad error messages. After a review of JavaCC, we feel these will be difficult to address using that tool. Also, the .jjt files used by JavaCC are hard to understand and maintain. ANTLR is being reviewed as the most likely choice to move to, but other parsers will be reviewed as well. This JIRA will act as an umbrella issue for other parser issues.

This addresses bug PIG-1618.
https://issues.apache.org/jira/browse/PIG-1618 Diffs - http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/Main.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/PigServer.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/SortInfoSetter.java PRE-CREATION http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/StandAloneParser.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/HExecutionEngine.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MRCompiler.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/MapReduceLauncher.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/LogToPhyTranslationVisitor.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/POMergeJoin.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/LOCogroup.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/LOJoin.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/ProjectFixerUpper.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/optimizer/PushDownForeachFlatten.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/optimizer/PushUpFilter.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/logicalLayer/schema/Schema.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/plan/OperatorPlan.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/impl/util/MultiMap.java 1076316 
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/BaseOperatorPlan.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/OperatorPlan.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/OperatorSubPlan.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/LogicalExpPlanMigrationVistor.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/LogicalPlanMigrationVistor.java 1076316 http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/Util.java 1076316
Re: Review Request: New logical plan: FilterLogicExpressionSimplifier fail to deal with UDF
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/356/#review223
---

Ship it!

- thejas

On 2011-02-14 17:00:02, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/356/
---

(Updated 2011-02-14 17:00:02)

Review request for pig and thejas.

Summary
---
The following script fails:

    a = load '1.txt' as (a0, a1);
    b = filter a by (a0 is not null or a1 is not null) and IsEmpty(a0);
    explain b;

Error message:

    Caused by: java.lang.ClassCastException: org.apache.pig.newplan.logical.expression.UserFuncExpression cannot be cast to org.apache.pig.newplan.logical.expression.BinaryExpression
        at org.apache.pig.newplan.logical.rules.LogicalExpressionSimplifier$LogicalExpressionSimplifierTransformer.handleBinary(LogicalExpressionSimplifier.java:561)
        at org.apache.pig.newplan.logical.rules.LogicalExpressionSimplifier$LogicalExpressionSimplifierTransformer.handleAnd(LogicalExpressionSimplifier.java:429)
        at org.apache.pig.newplan.logical.rules.LogicalExpressionSimplifier$LogicalExpressionSimplifierTransformer.inferRelationship(LogicalExpressionSimplifier.java:397)
        at org.apache.pig.newplan.logical.rules.LogicalExpressionSimplifier$LogicalExpressionSimplifierTransformer.handleDNFOr(LogicalExpressionSimplifier.java:281)
        at org.apache.pig.newplan.logical.rules.LogicalExpressionSimplifier$LogicalExpressionSimplifierTransformer.checkDNFLeaves(LogicalExpressionSimplifier.java:192)
        at org.apache.pig.newplan.logical.rules.LogicalExpressionSimplifier$LogicalExpressionSimplifierTransformer.transform(LogicalExpressionSimplifier.java:108)
        at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:110)

This addresses bug PIG-1820.
https://issues.apache.org/jira/browse/PIG-1820 Diffs - http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/newplan/logical/rules/LogicalExpressionSimplifier.java 1062989 http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestFilterSimplification.java 1062989 Diff: https://reviews.apache.org/r/356/diff Testing --- Test-patch: [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 3 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. Unit test: all pass End-to-end test: all pass Thanks, Daniel
Re: Review Request: Disable converting bytes loading from BinStorage
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/134/#review55
---

http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/BinStorageWithCaster.java
https://reviews.apache.org/r/134/#comment37

I think BinStorageWithCaster should implement the LoadCaster interface.

- thejas

On 2010-12-01 13:43:29, Daniel Dai wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/134/
---

(Updated 2010-12-01 13:43:29)

Review request for pig.

Summary
---
Change the behavior of converting bytes loaded from BinStorage.

1. Converting bytes loaded from BinStorage() will now result in an error.
2. If the user clearly understands that the data was loaded from PigStorage (or another LoadFunc using Utf8StorageConverter), he/she should use BinStorageWithCaster. That way, converting bytes to another type will still work.

This addresses bug PIG-1745.
https://issues.apache.org/jira/browse/PIG-1745

Diffs
-
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/BinStorage.java 1040653
http://svn.apache.org/repos/asf/pig/trunk/src/org/apache/pig/builtin/BinStorageWithCaster.java PRE-CREATION
http://svn.apache.org/repos/asf/pig/trunk/test/org/apache/pig/test/TestEvalPipeline2.java 1040653

Diff: https://reviews.apache.org/r/134/diff

Testing
---
test-patch:
    [exec] +1 overall.
    [exec]
    [exec] +1 @author. The patch does not contain any @author tags.
    [exec]
    [exec] +1 tests included. The patch appears to include 3 new or modified tests.
    [exec]
    [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
    [exec]
    [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    [exec]
    [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
    [exec]
    [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

unit-test: all pass
end-to-end test: all pass

Thanks,
Daniel