[jira] [Commented] (PIG-4764) Make Pig work with Hive 2.0
[ https://issues.apache.org/jira/browse/PIG-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17018246#comment-17018246 ] Olga Natkovich commented on PIG-4764: - I agree with Koji. As long as this does not break Pig with Hive 1 and as long as somebody commits to test the release with Hive 2, that should work. > Make Pig work with Hive 2.0 > --- > > Key: PIG-4764 > URL: https://issues.apache.org/jira/browse/PIG-4764 > Project: Pig > Issue Type: Improvement > Components: impl >Reporter: Jianyong Dai >Assignee: Jianyong Dai >Priority: Major > Fix For: 0.18.0 > > Attachments: PIG-4764-0.patch, PIG-4764-1.patch, PIG-4764-2.patch, > PIG-4764-3.patch, PIG-4764-4.patch > > > There are a lot of changes especially around ORC in Hive 2.0. We need to make > Pig work with it. -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: New Apache Pig Committer: Nandor Kollar
Congratulations, Nandor!!! Olga On Thursday, September 6, 2018, 12:59:38 PM PDT, Koji Noguchi wrote: On behalf of the Apache Pig PMC, it is my pleasure to announce that Nandor Kollar has accepted the invitation to become an Apache Pig committer. We appreciate all the work Nandor has done and look forward to seeing continued involvement. Please join me in congratulating Nandor! Thanks, Koji
[jira] [Commented] (PIG-5336) Drop old documents from the site
[ https://issues.apache.org/jira/browse/PIG-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16509857#comment-16509857 ] Olga Natkovich commented on PIG-5336: - +1 > Drop old documents from the site > > > Key: PIG-5336 > URL: https://issues.apache.org/jira/browse/PIG-5336 > Project: Pig > Issue Type: Improvement > Components: site >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5336-redirect.patch > > > When working on PIG-5334, saw bunch of old documents still being uploaded on > svn > {noformat} > knoguchi@truelisten-lm site> ls publish/docs/ | sort -V > r0.7.0/ > r0.8.1/ > r0.9.1/ > r0.9.2/ > r0.10.0/ > r0.10.1/ > r0.11.0/ > r0.11.1/ > r0.12.0/ > r0.12.1/ > r0.13.0/ > r0.14.0/ > r0.15.0/ > r0.16.0/ > r0.17.0/ > {noformat} > Sometimes I see our users referencing old documents due to this. > We should retire most of them and leave the recent ones. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PIG-5336) Drop old documents from the site
[ https://issues.apache.org/jira/browse/PIG-5336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434557#comment-16434557 ] Olga Natkovich commented on PIG-5336: - If I remember correctly, in the past we had a policy of keeping 3 most recent releases available. > Drop old documents from the site > > > Key: PIG-5336 > URL: https://issues.apache.org/jira/browse/PIG-5336 > Project: Pig > Issue Type: Improvement > Components: site >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > > When working on PIG-5334, saw bunch of old documents still being uploaded on > svn > {noformat} > knoguchi@truelisten-lm site> ls publish/docs/ | sort -V > r0.7.0/ > r0.8.1/ > r0.9.1/ > r0.9.2/ > r0.10.0/ > r0.10.1/ > r0.11.0/ > r0.11.1/ > r0.12.0/ > r0.12.1/ > r0.13.0/ > r0.14.0/ > r0.15.0/ > r0.16.0/ > r0.17.0/ > {noformat} > Sometimes I see our users referencing old documents due to this. > We should retire most of them and leave the recent ones. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
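The retention idea in the comment above (keep the 3 most recent releases) can be sketched as a short script. This is only an illustration: the directory names are copied from the listing in the issue, while `version_key`, `split_docs` and the `keep` parameter are hypothetical names, and "most recent" is assumed to mean "highest version number", which the thread does not spell out.

```python
import re

# Directory names copied from the `ls publish/docs/` listing in the issue.
DOC_DIRS = [
    "r0.7.0/", "r0.8.1/", "r0.9.1/", "r0.9.2/", "r0.10.0/",
    "r0.10.1/", "r0.11.0/", "r0.11.1/", "r0.12.0/", "r0.12.1/",
    "r0.13.0/", "r0.14.0/", "r0.15.0/", "r0.16.0/", "r0.17.0/",
]


def version_key(dirname):
    """Turn 'r0.10.1/' into (0, 10, 1) so that 0.10 sorts after 0.9."""
    return tuple(int(part) for part in re.findall(r"\d+", dirname))


def split_docs(dirs, keep=3):
    """Return (kept, retired); `keep` is an assumed policy knob."""
    ordered = sorted(dirs, key=version_key)
    cut = max(len(ordered) - keep, 0)
    return ordered[cut:], ordered[:cut]


kept, retired = split_docs(DOC_DIRS)
print("keep:  ", kept)
print("retire:", retired)
```

Sorting on a tuple of integers plays the same role as `sort -V` in the listing: a plain string sort would place `r0.10.0/` before `r0.7.0/`.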
[jira] [Commented] (PIG-5334) Update our site to follow a foundation request
[ https://issues.apache.org/jira/browse/PIG-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16434440#comment-16434440 ] Olga Natkovich commented on PIG-5334: - +1. Koji thanks for taking care of this! > Update our site to follow a foundation request > -- > > Key: PIG-5334 > URL: https://issues.apache.org/jira/browse/PIG-5334 > Project: Pig > Issue Type: Improvement > Components: site >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: Screen Shot 2018-04-09 at 4.11.08 PM.png, Screen > Shot-02-top-left.png, Screen Shot-03-bottom-left.png.png, Screen > Shot-04-left-column.png, Screen Shot-05-left-column-annoted.png, > pig-5334-v01.patch, pig-5334-v02-top-left.patch, > pig-5334-v03-bottom-left.patch, pig-5334-v04-left-column.patch, > pig-5334-v05-left-column.patch > > > Today, there was a request from the foundation to add an Apache event logo to > our Apache Pig site. > Details at [http://apache.org/events/README.txt] > Basically asking us to add > [https://www.apache.org/events/current-event-234x60.png] > or > [https://www.apache.org/events/current-event-125x125.png] > to our site. > Besides from this, email mentioned about general apache site suggestion > outlined at [https://www.apache.org/foundation/marks/pmcs#navigation] > * "License" should link to: [http://www.apache.org/licenses/] > * "Sponsorship" or "Donate" should link to: > [http://www.apache.org/foundation/sponsorship.html] > * "Thanks" should link to: [http://www.apache.org/foundation/thanks.html] > * "Security" should link to either to a project-specific page detailing how > users may securely report potential vulnerabilities, or to the main > [http://www.apache.org/security/] page -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PIG-5334) Update our site to follow a foundation request
[ https://issues.apache.org/jira/browse/PIG-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432921#comment-16432921 ] Olga Natkovich commented on PIG-5334: - So are we required to put this advertisement or is it part of good "will"? Perhaps we put the one for the ApacheCon as replacement for Hadoop and then revert? I don't think it is worth our time to make changes to the Forrest setup. > Update our site to follow a foundation request > -- > > Key: PIG-5334 > URL: https://issues.apache.org/jira/browse/PIG-5334 > Project: Pig > Issue Type: Improvement > Components: site >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: Screen Shot 2018-04-09 at 4.11.08 PM.png, Screen > Shot-02-top-left.png, Screen Shot-03-bottom-left.png.png, pig-5334-v01.patch, > pig-5334-v02-top-left.patch, pig-5334-v03-bottom-left.patch > > > Today, there was a request from the foundation to add an Apache event logo to > our Apache Pig site. > Details at [http://apache.org/events/README.txt] > Basically asking us to add > [https://www.apache.org/events/current-event-234x60.png] > or > [https://www.apache.org/events/current-event-125x125.png] > to our site. > Besides from this, email mentioned about general apache site suggestion > outlined at [https://www.apache.org/foundation/marks/pmcs#navigation] > * "License" should link to: [http://www.apache.org/licenses/] > * "Sponsorship" or "Donate" should link to: > [http://www.apache.org/foundation/sponsorship.html] > * "Thanks" should link to: [http://www.apache.org/foundation/thanks.html] > * "Security" should link to either to a project-specific page detailing how > users may securely report potential vulnerabilities, or to the main > [http://www.apache.org/security/] page -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PIG-5334) Update our site to follow a foundation request
[ https://issues.apache.org/jira/browse/PIG-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432809#comment-16432809 ] Olga Natkovich commented on PIG-5334: - Agree with Rohini. Did not realize this was not a temp change. > Update our site to follow a foundation request > -- > > Key: PIG-5334 > URL: https://issues.apache.org/jira/browse/PIG-5334 > Project: Pig > Issue Type: Improvement > Components: site >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: Screen Shot 2018-04-09 at 4.11.08 PM.png, Screen > Shot-02-top-left.png, Screen Shot-03-bottom-left.png.png, pig-5334-v01.patch, > pig-5334-v02-top-left.patch, pig-5334-v03-bottom-left.patch > > > Today, there was a request from the foundation to add an Apache event logo to > our Apache Pig site. > Details at [http://apache.org/events/README.txt] > Basically asking us to add > [https://www.apache.org/events/current-event-234x60.png] > or > [https://www.apache.org/events/current-event-125x125.png] > to our site. > Besides from this, email mentioned about general apache site suggestion > outlined at [https://www.apache.org/foundation/marks/pmcs#navigation] > * "License" should link to: [http://www.apache.org/licenses/] > * "Sponsorship" or "Donate" should link to: > [http://www.apache.org/foundation/sponsorship.html] > * "Thanks" should link to: [http://www.apache.org/foundation/thanks.html] > * "Security" should link to either to a project-specific page detailing how > users may securely report potential vulnerabilities, or to the main > [http://www.apache.org/security/] page -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PIG-5334) Update our site to follow a foundation request
[ https://issues.apache.org/jira/browse/PIG-5334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432687#comment-16432687 ] Olga Natkovich commented on PIG-5334: - [~knoguchi] thanks for working on this. Your changes to incorporate license, etc. look good to me. For the picture, top left or top right would work. Since it is a temp change, I don't have a strong opinion. (Don't think it would be visible if we put it at the bottom.) Seems like we should put the security link in just to satisfy the requirement but again no strong opinion. > Update our site to follow a foundation request > -- > > Key: PIG-5334 > URL: https://issues.apache.org/jira/browse/PIG-5334 > Project: Pig > Issue Type: Improvement > Components: site >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Minor > Attachments: Screen Shot 2018-04-09 at 4.11.08 PM.png, Screen > Shot-02-top-left.png, Screen Shot-03-bottom-left.png.png, pig-5334-v01.patch, > pig-5334-v02-top-left.patch, pig-5334-v03-bottom-left.patch > > > Today, there was a request from the foundation to add an Apache event logo to > our Apache Pig site. > Details at [http://apache.org/events/README.txt] > Basically asking us to add > [https://www.apache.org/events/current-event-234x60.png] > or > [https://www.apache.org/events/current-event-125x125.png] > to our site. > Besides from this, email mentioned about general apache site suggestion > outlined at [https://www.apache.org/foundation/marks/pmcs#navigation] > * "License" should link to: [http://www.apache.org/licenses/] > * "Sponsorship" or "Donate" should link to: > [http://www.apache.org/foundation/sponsorship.html] > * "Thanks" should link to: [http://www.apache.org/foundation/thanks.html] > * "Security" should link to either to a project-specific page detailing how > users may securely report potential vulnerabilities, or to the main > [http://www.apache.org/security/] page -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (PIG-5307) NPE in TezOperDependencyParallelismEstimator
[ https://issues.apache.org/jira/browse/PIG-5307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16188921#comment-16188921 ] Olga Natkovich commented on PIG-5307: - +1 > NPE in TezOperDependencyParallelismEstimator > > > Key: PIG-5307 > URL: https://issues.apache.org/jira/browse/PIG-5307 > Project: Pig > Issue Type: Bug >Reporter: Rohini Palaniswamy >Assignee: Rohini Palaniswamy >Priority: Minor > Fix For: 0.18.0 > > Attachments: PIG-5307-1.patch > > > In case of the constant being null, NPE is thrown. This was encountered by a > user who was generating the field name based on a condition which expanded to > NULL when condition was not met. For eg: > {code} > x = FILTER x BY (chararray) NULL == 'fieldvalue'; > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (PIG-5229) TestPigTest.testSpecificOrderOutput and testSpecificOrderOutputForAlias failing
[ https://issues.apache.org/jira/browse/PIG-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15989190#comment-15989190 ] Olga Natkovich commented on PIG-5229: - +1 > TestPigTest.testSpecificOrderOutput and testSpecificOrderOutputForAlias > failing > --- > > Key: PIG-5229 > URL: https://issues.apache.org/jira/browse/PIG-5229 > Project: Pig > Issue Type: Test >Reporter: Koji Noguchi >Assignee: Koji Noguchi >Priority: Trivial > Attachments: pig-5229-v01.patch, pig-5229-v02.patch > > > Error message > {noformat} > junit.framework.AssertionFailedError: expected:<([twitter,7) > (yahoo,25) > (facebook,15])> but was:<([yahoo,25) > (facebook,15) > (twitter,7])> > at org.apache.pig.pigunit.PigTest.assertEquals(PigTest.java:438) > at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:385) > at org.apache.pig.pigunit.PigTest.assertOutput(PigTest.java:375) > at > org.apache.pig.test.pigunit.TestPigTest.testSpecificOrderOutput(TestPigTest.java:572) > {noformat} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
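The assertion failure quoted above can be read as an ordering problem rather than a content problem. The sketch below is an illustration only: the two tuple lists are transcribed from the error message, the variable names are made up for this example, and it does not claim to reflect what the attached patches change.

```python
from collections import Counter

# Rows transcribed from the "expected:<...> but was:<...>" message.
expected = [("twitter", 7), ("yahoo", 25), ("facebook", 15)]
actual = [("yahoo", 25), ("facebook", 15), ("twitter", 7)]

# An order-sensitive comparison, like the failing assertion.
same_order = expected == actual

# An order-insensitive comparison: same rows, same multiplicities.
same_rows = Counter(expected) == Counter(actual)

print("same order:", same_order)
print("same rows: ", same_rows)
```

Both lists hold the same three rows, so only the row order differs between the expected and the actual output.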
[jira] [Commented] (PIG-2315) Make as clause work in generate
[ https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14062246#comment-14062246 ] Olga Natkovich commented on PIG-2315: - Are there plans to get this one into Pig 14? Make as clause work in generate --- Key: PIG-2315 URL: https://issues.apache.org/jira/browse/PIG-2315 Project: Pig Issue Type: Bug Reporter: Olga Natkovich Assignee: Gianmarco De Francisci Morales Fix For: 0.14.0 Attachments: PIG-2315-1.patch, PIG-2315-1.patch Currently, the following syntax is supported and ignored causing confusing with users: A1 = foreach A1 generate a as a:chararray ; After this statement a just retains its previous type -- This message was sent by Atlassian JIRA (v6.2#6252)
Re: [ANNOUNCE] Welcome new Pig Committer - Lorand Bendig
Congrats, Lorand! On Tuesday, June 24, 2014 9:04 AM, Mona Chitnis m...@apache.org wrote: Congrats Lorand! Mona Chitnis Yahoo! On Tuesday, June 24, 2014 7:14 AM, Aniket Mokashi aniket...@gmail.com wrote: Congrats On Tue, Jun 24, 2014 at 2:03 AM, Lorand Bendig lben...@gmail.com wrote: Thank you for all of you! --Lorand On 06/23/2014 11:41 PM, Mark Wagner wrote: Congrats and welcome, Lorand! On Sun, Jun 22, 2014 at 6:39 PM, Koji Noguchi knogu...@yahoo-inc.com.invalid wrote: Congrats!!! On 6/22/14, 9:08 PM, Rohini Palaniswamy rohini.adi...@gmail.com wrote: Congratulations Lorand !!! On Sun, Jun 22, 2014 at 2:47 PM, Xuefu Zhang xzh...@cloudera.com wrote: Many congrats, Lorand! --Xuefu On Sun, Jun 22, 2014 at 12:54 PM, Daniel Dai da...@hortonworks.com wrote: Congratulations! On Sun, Jun 22, 2014 at 7:00 AM, Jarek Jarcec Cecho jar...@apache.org wrote: Congratulations Lorand, well deserved! Jarcec On Sat, Jun 21, 2014 at 10:30:01PM -0700, Cheolsoo Park wrote: It is my pleasure to announce that Lorand Bendig became the newest addition to the Pig Committers! Lorand has been actively contributing to Pig for a year now. Please join me in congratulating Lorand! -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You. -- ...:::Aniket:::... Quetzalco@tl
Re: Plan to merge tez branch into trunk and branch 0.13
Is anybody interested in driving Pig 13 to release? It would be good to get a volunteer to be the release manager for 13 before deciding to branch. On Wednesday, May 14, 2014 9:07 AM, Cheolsoo Park piaozhe...@gmail.com wrote: I'm also +1 on releasing 0.13 and then merging Tez branch. On Tue, May 13, 2014 at 11:52 PM, Prashant Kommireddi prash1...@gmail.com wrote: I'm a +1 on branching 0.13 and merging Tez branch with trunk. That would give us ample time to make 0.14 stable with tez merged. On Tue, May 13, 2014 at 9:05 PM, Daniel Dai da...@hortonworks.com wrote: We will make sure all existing unit tests / e2e tests pass in MR mode before merge, but it is possible we might hit some issues which are not captured by existing tests after merge. I can hardly tell how much time will it take to say the codebase is stable enough at this moment, but it is better to merge to trunk early, so more Pig developers can try the merged codebase and have more time to capture issues before we release Pig with tez (most probably Pig 0.14.0). Thanks, Daniel On Tue, May 13, 2014 at 3:48 PM, Prashant Kommireddi prash1...@gmail.com wrote: Hi Daniel, How long do you think might it take for the merge to stabilize? Thanks, Prashant On Tue, May 13, 2014 at 11:47 AM, Daniel Dai da...@hortonworks.com wrote: Hi, Pig devs, After several months development, Tez branch is becoming stable and we plan to merge tez branch to trunk in the next few weeks. Several weeks ago, we have a discussion about branching 0.13, and if we still have interest to do a release before merging tez, we shall do it now. Thoughts? Thanks, Daniel
Re: Pig 0.13.0 release
Just going by the list that Aniket provided, I don't really see enough for a full release. Two mentioned JIRAs are doc updates and one is a bug fix that was ported into Pig 12. On Wednesday, February 5, 2014 3:13 PM, Aniket Mokashi aniket...@gmail.com wrote: Hi All, A good number of improvements and bug fixes have gone into trunk recently. I'd like to know if we can roll out a Pig 0.13 release around mid-March? I am aware that we are planning to merge tez branch into trunk soon. However, making a release before tez branch is merged will be good. Any objections? Following are few jiras we need to wrap up before 0.13 release- PIG-3591 PIG-3740 PIG-3745 PIG-3347 PIG-3731 Any other? Thanks, Aniket
Re: Welcome to the newest Pig Committer - Mark Wagner
Congrats, Mark! On Monday, February 3, 2014 1:41 PM, Mona Chitnis mona.chit...@yahoo.in wrote: Congrats Mark! -- Mona Chitnis Yahoo! On Saturday, February 1, 2014 12:01 AM, Mark Wagner wagner.mar...@gmail.com wrote: Thanks everyone! I'm very excited to join all of you and I look forward to making some good contributions in the future! -Mark On Fri, Jan 31, 2014 at 8:51 PM, Koji Noguchi knogu...@yahoo-inc.com wrote: Congrats!!! On Jan 31, 2014, at 8:42 PM, Cheolsoo Park piaozhe...@gmail.com wrote: Congrats Mark! Look forward to many more contributions! On Fri, Jan 31, 2014 at 5:36 PM, Jarek Jarcec Cecho jar...@apache.orgwrote: Congratulations Mark, good job! Jarcec On Fri, Jan 31, 2014 at 05:20:26PM -0800, Julien Le Dem wrote: It is my pleasure to announce that Mark Wagner became the newest addition to the Pig Committers! Mark has been actively contributing to Pig and in particular to the Pig-on-Tez effort. Please, join me in congratulating Mark!
Re: Welcome to the new Pig PMC member Aniket Mokashi
Congrats, Aniket! On Tuesday, January 14, 2014 8:32 PM, Tongjie Chen tongjie.c...@gmail.com wrote: Congrats Aniket! On Tue, Jan 14, 2014 at 8:12 PM, Cheolsoo Park piaozhe...@gmail.com wrote: Congrats Aniket! On Tue, Jan 14, 2014 at 7:01 PM, Jarek Jarcec Cecho jar...@apache.org wrote: Congratulations Aniket, good work! Jarcec On Tue, Jan 14, 2014 at 06:52:10PM -0800, JULIEN LE DEM wrote: It's my pleasure to announce that Aniket Mokashi became the newest addition to the Pig PMC. Aniket has been actively contributing to Pig for years. Please join me in congratulating Aniket! Julien
Re: How do we determine 'stable' pig version?
If by stable we mean something we released, I don't see this label to be needed/useful at all. On Wednesday, October 23, 2013 8:01 AM, Koji Noguchi knogu...@yahoo-inc.com wrote: Thanks Alan, Daniel. Taking back my request on 'stable' criteria. Koji On Oct 22, 2013, at 7:18 PM, Alan Gates ga...@hortonworks.com wrote: I don't think we should change our use of stable. Our usage is in line with the Hadoop usage of the term in their releases. To the best of our knowledge as Apache developers it is stable. It passes all of the tests we have. We have no criteria for deciding stability beyond this. Alan. On Oct 22, 2013, at 4:00 PM, Daniel Dai wrote: Yes, we can revisit. The question is how to determine the stability? 0.11.1 is released for a while and should be considered stable, but actually it contains problem raised just recently. After we release 0.12.1, how soon should we declare it a stable release? Thanks, Daniel On Tue, Oct 22, 2013 at 2:25 PM, Koji Noguchi knogu...@yahoo-inc.comwrote: Thanks Daniel, Olga! Keeping 3 versions would be nice. As for 'stable', can we revisit the definition? If it's *always* pointing to the latest release, I don't see the need for having this link(dir). Is it adding any value? Koji On Oct 22, 2013, at 1:43 PM, Daniel Dai da...@hortonworks.com wrote: That's totally make sense. Let's keep both download/documentation for 3 versions. Thanks, Daniel On Tue, Oct 22, 2013 at 10:20 AM, Olga Natkovich onatkov...@yahoo.com wrote: Couple of suggestions: (1) I think we are trying to go for a more frequent release model and in that case it would make sense to keep perhaps 3 releases. Based on our experience at Yahoo, Pig 10 is the really stable release. We recently found a couple of critical bugs in 11 for which we posted patches. Also the community knows that we delayed a couple of key bugs in 12 till 12.1 (2) Our documentation needs to be consistent with the number of releases we advertise as supported. Our docs currently go all the way to Pig 9. 
Olga On Tuesday, October 22, 2013 10:13 AM, Daniel Dai da...@hortonworks.com wrote: Hi, Koji, Here is the criteria I use: (i) How do we determine how many releases to show on the front download page? We usually keep two most recent releases on the front page according to https://cwiki.apache.org/confluence/display/PIG/HowToRelease. (ii) How do we determine which release is considered 'stable' ? Here stable means passing all tests, peer reviewed. It does not mean production stable. Actually there is no way for us to know production stable after user download it, use it and gives feedback. That's why we will continue fixing bugs after major release. and make minor releases. Thanks, Daniel On Tue, Oct 22, 2013 at 9:45 AM, Koji Noguchi knogu...@yahoo-inc.com wrote: When I went to the pig release download page (through http://www.apache.org/dyn/closer.cgi/pig), I only saw 0.11.1 and 0.12 available. I later learned that there is an 'archive' link( http://archive.apache.org/dist/pig/) that list other versions (0.8 to 0.10). Two questions. (i) How do we determine how many releases to show on the front download page? (ii) How do we determine which release is considered 'stable' ? I still consider the stable version to be 0.10.1 so I was surprised not to see that available on the front download page and even more surprised to see release 0.12 flagged as 'stable'. Koji
Re: How do we determine 'stable' pig version?
Couple of suggestions: (1) I think we are trying to go for a more frequent release model and in that case it would make sense to keep perhaps 3 releases. Based on our experience at Yahoo, Pig 10 is the really stable release. We recently found a couple of critical bugs in 11 for which we posted patches. Also the community knows that we delayed a couple of key bugs in 12 till 12.1 (2) Our documentation needs to be consistent with the number of releases we advertise as supported. Our docs currently go all the way to Pig 9. Olga On Tuesday, October 22, 2013 10:13 AM, Daniel Dai da...@hortonworks.com wrote: Hi, Koji, Here is the criteria I use: (i) How do we determine how many releases to show on the front download page? We usually keep two most recent releases on the front page according to https://cwiki.apache.org/confluence/display/PIG/HowToRelease. (ii) How do we determine which release is considered 'stable' ? Here stable means passing all tests, peer reviewed. It does not mean production stable. Actually there is no way for us to know production stable after user download it, use it and gives feedback. That's why we will continue fixing bugs after major release. and make minor releases. Thanks, Daniel On Tue, Oct 22, 2013 at 9:45 AM, Koji Noguchi knogu...@yahoo-inc.comwrote: When I went to the pig release download page (through http://www.apache.org/dyn/closer.cgi/pig), I only saw 0.11.1 and 0.12 available. I later learned that there is an 'archive' link( http://archive.apache.org/dist/pig/) that list other versions (0.8 to 0.10). Two questions. (i) How do we determine how many releases to show on the front download page? (ii) How do we determine which release is considered 'stable' ? I still consider the stable version to be 0.10.1 so I was surprised not to see that available on the front download page and even more surprised to see release 0.12 flagged as 'stable'. 
Koji
[jira] [Commented] (PIG-3480) TFile-based tmpfile compression crashes in some cases
[ https://issues.apache.org/jira/browse/PIG-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13781959#comment-13781959 ] Olga Natkovich commented on PIG-3480: - Agree with Rohini. Changing the default just because we found a bug does not seem like a sound approach. TFile-based tmpfile compression crashes in some cases - Key: PIG-3480 URL: https://issues.apache.org/jira/browse/PIG-3480 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Fix For: 0.12.0 Attachments: PIG-3480.patch When pig tmpfile compression is on, some jobs fail inside core hadoop internals. Suspect TFile is the problem, because an experiment in replacing TFile with SequenceFile succeeded. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (PIG-3480) TFile-based tmpfile compression crashes in some cases
[ https://issues.apache.org/jira/browse/PIG-3480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13776745#comment-13776745 ] Olga Natkovich commented on PIG-3480: - Could this be related to Hadoop version? TFile-based tmpfile compression crashes in some cases - Key: PIG-3480 URL: https://issues.apache.org/jira/browse/PIG-3480 Project: Pig Issue Type: Bug Reporter: Dmitriy V. Ryaboy Fix For: 0.12 Attachments: PIG-3480.patch When pig tmpfile compression is on, some jobs fail inside core hadoop internals. Suspect TFile is the problem, because an experiment in replacing TFile with SequenceFile succeeded. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Welcome new Pig Committer - Koji Noguchi
It is my pleasure to announce that Koji Noguchi has become the newest addition to the Pig Committers! Koji has been actively contributing to Pig for over a year and has been part of the larger Hadoop community (including as a Hadoop committer) for many years. Please join me in congratulating Koji! Olga
[jira] [Commented] (PIG-3293) Casting fails after Union from two data sourcesloaders
[ https://issues.apache.org/jira/browse/PIG-3293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13755073#comment-13755073 ] Olga Natkovich commented on PIG-3293: - Would it help to document that typecasting needs to happen before any Union operation? Casting fails after Union from two data sourcesloaders --- Key: PIG-3293 URL: https://issues.apache.org/jira/browse/PIG-3293 Project: Pig Issue Type: Bug Reporter: Koji Noguchi Priority: Minor Attachments: pig-3293-test-only-v01.patch Script similar to {noformat} A = load 'data1' using MyLoader() as (a:bytearray); B = load 'data2' as (a:bytearray); C = union onschema A,B; D = foreach C generate (chararray)a; Store D into './out'; {noformat} fails with java.lang.Exception: org.apache.pig.backend.executionengine.ExecException: ERROR 1075: Received a bytearray from the UDF. Cannot determine how to convert the bytearray to string. Both MyLoader and PigStorage use the default Utf8StorageConverter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3279) Support nested RANK
[ https://issues.apache.org/jira/browse/PIG-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13750279#comment-13750279 ] Olga Natkovich commented on PIG-3279: - Hi Johnny, Are you still planning to finish this work? If so, what is your timeline? Support nested RANK --- Key: PIG-3279 URL: https://issues.apache.org/jira/browse/PIG-3279 Project: Pig Issue Type: Improvement Reporter: Gianmarco De Francisci Morales Assignee: Johnny Zhang Attachments: PIG-3279-1.patch.txt, PIG-3279-2.patch.txt, PIG-3279-3.patch.txt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-3419) Pluggable Execution Engine
[ https://issues.apache.org/jira/browse/PIG-3419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13749106#comment-13749106 ] Olga Natkovich commented on PIG-3419: - I think the reason we wanted it on the Tez branch is that it might evolve with the Tez implementation and so we would merge the updated code back when Tez is ready. Since there are no plans for any additional backend, is there a need to apply this to trunk sooner rather than later? Pluggable Execution Engine --- Key: PIG-3419 URL: https://issues.apache.org/jira/browse/PIG-3419 Project: Pig Issue Type: New Feature Affects Versions: 0.12 Reporter: Achal Soni Assignee: Achal Soni Priority: Minor Attachments: execengine.patch, mapreduce_execengine.patch, stats_scriptstate.patch, test_failures.txt, test_suite.patch, updated-8-22-2013-exec-engine.patch In an effort to adapt Pig to work using Apache Tez (https://issues.apache.org/jira/browse/TEZ), I made some changes to allow for a cleaner ExecutionEngine abstraction than existed before. The changes are not that major as Pig was already relatively abstracted out between the frontend and backend. The changes in the attached commit are essentially the barebones changes -- I tried to not change the structure of Pig's different components too much. I think it will be interesting to see in the future how we can refactor more areas of Pig to really honor this abstraction between the frontend and backend. Some of the changes were to reinstate an ExecutionEngine interface to tie together the frontend and backend, to make Pig delegate to the EE when necessary, and to create an MRExecutionEngine that implements this interface. Other work included changing ExecType to cycle through the ExecutionEngines on the classpath and select the appropriate one (this is done using Java ServiceLoader, the same way MapReduce chooses the framework to use between local and distributed mode). 
Also I tried to make ScriptState, JobStats, and PigStats as abstract as possible in their current state. I think in the future some work will need to be done here to perhaps re-evaluate the usage of ScriptState and the responsibilities of the different statistics classes. I haven't touched the PPNL, but I think more abstraction is needed here, perhaps in a separate patch.
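A minimal sketch of the ServiceLoader-based selection described in PIG-3419 above, offered only as an illustration and not taken from the attached patches. The Engine interface, its accepts/name methods, and the NamedEngine class are hypothetical stand-ins invented here; only java.util.ServiceLoader is a real JDK API. For fromClasspath to find anything, providers would have to be registered in a META-INF/services file, which this sketch does not include.

```java
import java.util.ServiceLoader;

final class EngineSelection {
    /** Hypothetical stand-in for the ExecutionEngine abstraction; not Pig's real interface. */
    interface Engine {
        boolean accepts(String execType);
        String name();
    }

    /** Trivial illustrative engine that accepts exactly one exec type name (case-insensitive). */
    static final class NamedEngine implements Engine {
        private final String name;

        NamedEngine(String name) {
            this.name = name;
        }

        @Override
        public boolean accepts(String execType) {
            return name.equalsIgnoreCase(execType);
        }

        @Override
        public String name() {
            return name;
        }
    }

    /** Returns the first candidate that accepts the requested exec type. */
    static Engine select(Iterable<? extends Engine> candidates, String execType) {
        for (Engine engine : candidates) {
            if (engine.accepts(execType)) {
                return engine;
            }
        }
        throw new IllegalArgumentException("No engine found for exec type: " + execType);
    }

    /** Discovers engines registered under META-INF/services and picks one. */
    static Engine fromClasspath(String execType) {
        // ServiceLoader is Iterable, so it can be handed to select() directly.
        return select(ServiceLoader.load(Engine.class), execType);
    }
}
```

Keeping select() separate from the classpath lookup means the matching rule can be exercised with a plain list of engines, without any service registration.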
[jira] [Assigned] (PIG-3351) Datetime objects cannot be stored using BinStorage or JsonLoader/JsonStorage
[ https://issues.apache.org/jira/browse/PIG-3351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-3351: --- Assignee: pat chan Datetime objects cannot be stored using BinStorage or JsonLoader/JsonStorage - Key: PIG-3351 URL: https://issues.apache.org/jira/browse/PIG-3351 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.11.1 Reporter: pat chan Assignee: pat chan Priority: Minor Fix For: 0.10.1, 0.11.1 Attachments: PIG-3351.patch There's a bug in BinStorage that prevents datetime objects from being loaded. JsonLoader and JsonStorage do not support datetime objects.
Re: pig 0.11 candidate 2 feedback: Several problems
I agree that supporting as much as we can is a good goal. The issue is who is going to be testing against all these versions? We found the issues under discussion because of a customer report, not because we consistently test against all versions. Perhaps when we decide which versions to support for next release we also need to agree who is going to be testing and maintaining compatibility with a particular version. For instance since Hadoop 23 compatibility is important for us at Yahoo we have been maintaining compatibility with this version for 0.9, 0.10 and will do the same for 0.11 and going forward. I think we would need others to step in and claim the versions of their interest. Olga From: Kai Londenberg kai.londenb...@googlemail.com To: dev@pig.apache.org Sent: Wednesday, February 20, 2013 1:51 AM Subject: Re: pig 0.11 candidate 2 feedback: Several problems Hi, I strongly agree with Jonathan here. If there are good reasons why you can't support an older version of Hadoop any more, that's one thing. But having to change 2 lines of code doesn't really qualify as such in my point of view ;) At least for me, pig support for 0.20.2 is essential - without it, I can't use it. If it doesn't support it, I'll have to branch pig and hack it myself, or stop using it. I guess, there are a lot of people still running 0.20.2 Clusters. If you really have lots of data stored on HDFS and a continuously busy cluster, an upgrade is nothing you do just because. 2013/2/20 Jonathan Coveney jcove...@gmail.com: I agree that we shouldn't have to support old versions forever. That said, I also don't think we should be too blase about supporting older versions where it is not odious to do so. We have a lot of competition in the language space and the broader the versions we can support, the better (assuming it isn't too odious to do so). 
In this case, I don't think it should be too hard to change ObjectSerializer so that the commons-codec code used is compatible with both versions...we could just in-line some of the Base64 code, and comment accordingly. That said, we also should be clear about what versions we support, but 6-12 months seems short. The upgrade cycles on Hadoop are really, really long. 2013/2/20 Prashant Kommireddi prash1...@gmail.com Agreed, that makes sense. Probably supporting older hadoop version for a 1 or 2 pig releases before moving to a newer/stable version? Having said that, should we use 0.11 period to communicate the same to the community and start moving on 0.12 onwards? I know we are way past 6-12 months (1-2 release) time frame with 0.20.2, but we also need to make sure users are aware and plan accordingly. I'd also be interested to hear how other projects (Hive, Oozie) are handling this. -Prashant On Tue, Feb 19, 2013 at 3:22 PM, Olga Natkovich onatkov...@yahoo.com wrote: It seems that for each Pig release we need to agree and clearly state which Hadoop versions it will support. I guess the main question is how we decide on this. Perhaps we should say that Pig no longer supports older Hadoop versions once the newer one is out for at least 6-12 month to make sure it is stable. I don't think we can support old versions indefinitely. It is in everybody's interest to keep moving forward. Olga From: Prashant Kommireddi prash1...@gmail.com To: dev@pig.apache.org Sent: Tuesday, February 19, 2013 10:57 AM Subject: Re: pig 0.11 candidate 2 feedback: Several problems What do you guys feel about the JIRA to do with 0.20.2 compatibility (PIG-3194)? I am interested in discussing the strategy around backward compatibility as this is something that would haunt us each time we move to the next hadoop version. For eg, we might be in a similar situation while moving to Hadoop 2.0, when some of the stuff might break for 1.0. 
I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users might be caught unaware. Of course, I must admit there is selfish interest here and it's probably easier for us to have a workaround on Pig rather than upgrade hadoop in all our production DCs. -Prashant On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney russell.jur...@gmail.com wrote: I think someone should step up and fix the easy ones, if possible. On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham billgra...@gmail.com wrote: Thanks Kai for reporting these. What do people think about the severity of these issues w.r.t. Pig 11? I see a few possible options: 1. We include some or all of these patches in a new Pig 11 rc. We'd want to make sure that they don't destabilize the current branch. This approach makes sense if we think Pig 11 wouldn't be a good release without one or more of these included. 2. We continue with the Pig 11 release without these, but then include
Re: pig 0.11 candidate 2 feedback: Several problems
It seems that for each Pig release we need to agree and clearly state which Hadoop versions it will support. I guess the main question is how we decide on this. Perhaps we should say that Pig no longer supports older Hadoop versions once the newer one is out for at least 6-12 month to make sure it is stable. I don't think we can support old versions indefinitely. It is in everybody's interest to keep moving forward. Olga From: Prashant Kommireddi prash1...@gmail.com To: dev@pig.apache.org Sent: Tuesday, February 19, 2013 10:57 AM Subject: Re: pig 0.11 candidate 2 feedback: Several problems What do you guys feel about the JIRA to do with 0.20.2 compatibility (PIG-3194)? I am interested in discussing the strategy around backward compatibility as this is something that would haunt us each time we move to the next hadoop version. For eg, we might be in a similar situation while moving to Hadoop 2.0, when some of the stuff might break for 1.0. I feel it would be good to get this JIRA fix in for 0.11, as 0.20.2 users might be caught unaware. Of course, I must admit there is selfish interest here and it's probably easier for us to have a workaround on Pig rather than upgrade hadoop in all our production DCs. -Prashant On Tue, Feb 19, 2013 at 9:54 AM, Russell Jurney russell.jur...@gmail.comwrote: I think someone should step up and fix the easy ones, if possible. On Tue, Feb 19, 2013 at 9:51 AM, Bill Graham billgra...@gmail.com wrote: Thanks Kai for reporting these. What do people think about the severity of these issues w.r.t. Pig 11? I see a few possible options: 1. We include some or all of these patches in a new Pig 11 rc. We'd want to make sure that they don't destabilize the current branch. This approach makes sense if we think Pig 11 wouldn't be a good release without one or more of these included. 2. We continue with the Pig 11 release without these, but then include one or more in a 0.11.1 release. 3. 
We continue with the Pig 11 release without these, but then include them in a 0.12 release. Jon has a patch for the MAP issue (PIG-3144, https://issues.apache.org/jira/browse/PIG-3144) ready, which seems like the most pressing of the three to me. thanks, Bill On Mon, Feb 18, 2013 at 2:27 AM, Kai Londenberg kai.londenb...@googlemail.com wrote: Hi, I just subscribed to the dev mailing list in order to give you some feedback on pig 0.11 candidate 2. The following three issues are currently present in 0.11 candidate 2: https://issues.apache.org/jira/browse/PIG-3144 - 'Erroneous map entry alias resolution leading to Duplicate schema alias errors' https://issues.apache.org/jira/browse/PIG-3194 - Changes to ObjectSerializer.java break compatibility with Hadoop 0.20.2 https://issues.apache.org/jira/browse/PIG-3195 - Race Condition in PhysicalOperator leads to ExecException Error while trying to get next result in POStream The last two of these are easily solvable (see the tickets for details on that). The first one is a bit trickier I think, but at least there is a workaround for it (pass Map fields through a UDF) In my personal opinion, each of these problems is pretty severe, but opinions about the importance of the MAP Datatype and STREAM Operator, as well as Hadoop 0.20.2 compatibility might differ. so far .. Kai Londenberg -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.* -- Russell Jurney twitter.com/rjurney russell.jur...@gmail.com datasyndrome.com
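PIG-3194, listed above, and Jonathan Coveney's suggestion in this thread both concern making ObjectSerializer independent of the commons-codec version found on the cluster by in-lining some Base64 code. As a rough sketch of what such an in-lined codec might look like, standard Base64 with padding can be written against the JDK alone. The class name InlineBase64 is hypothetical, and this is not the fix attached to PIG-3194.

```java
final class InlineBase64 {
    // Standard Base64 alphabet; index in this array is the 6-bit value.
    private static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".toCharArray();

    /** Encodes bytes as padded Base64 text. */
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(((data.length + 2) / 3) * 4);
        for (int i = 0; i < data.length; i += 3) {
            // Pack up to three bytes into a 24-bit chunk, zero-filling missing bytes.
            int b0 = data[i] & 0xFF;
            int b1 = (i + 1 < data.length) ? data[i + 1] & 0xFF : 0;
            int b2 = (i + 2 < data.length) ? data[i + 2] & 0xFF : 0;
            int chunk = (b0 << 16) | (b1 << 8) | b2;
            out.append(ALPHABET[(chunk >> 18) & 0x3F]);
            out.append(ALPHABET[(chunk >> 12) & 0x3F]);
            out.append(i + 1 < data.length ? ALPHABET[(chunk >> 6) & 0x3F] : '=');
            out.append(i + 2 < data.length ? ALPHABET[chunk & 0x3F] : '=');
        }
        return out.toString();
    }

    /** Decodes padded Base64 text; '=' is treated as zero bits wherever it appears. */
    static byte[] decode(String text) {
        int len = text.length();
        if (len % 4 != 0) {
            throw new IllegalArgumentException("Base64 length must be a multiple of 4");
        }
        int pad = 0;
        if (len > 0 && text.charAt(len - 1) == '=') pad++;
        if (len > 1 && text.charAt(len - 2) == '=') pad++;
        byte[] out = new byte[len / 4 * 3 - pad];
        int o = 0;
        for (int i = 0; i < len; i += 4) {
            // Rebuild the 24-bit chunk from four 6-bit values.
            int chunk = 0;
            for (int j = 0; j < 4; j++) {
                char c = text.charAt(i + j);
                chunk = (chunk << 6) | (c == '=' ? 0 : valueOf(c));
            }
            // Bytes that correspond to padding fall past out.length and are dropped.
            if (o < out.length) out[o++] = (byte) (chunk >> 16);
            if (o < out.length) out[o++] = (byte) (chunk >> 8);
            if (o < out.length) out[o++] = (byte) chunk;
        }
        return out;
    }

    private static int valueOf(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        throw new IllegalArgumentException("Not a Base64 character: " + c);
    }
}
```

The trade-off is the usual one for in-lined code: no dependency on a particular library version, at the cost of owning and commenting a small codec.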
[jira] [Commented] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546114#comment-13546114 ] Olga Natkovich commented on PIG-2353: - I believe we agreed that the document changes are included and reviewed as part of the patch. Since this was not done this way, we need to get a separate patch for docs. RANK function like in SQL - Key: PIG-2353 URL: https://issues.apache.org/jira/browse/PIG-2353 Project: Pig Issue Type: New Feature Reporter: Gianmarco De Francisci Morales Assignee: Allan Avendaño Labels: gsoc2012, mentor Fix For: 0.11 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch Implement a function that, given a (sorted) bag, adds to each tuple a unique, increasing identifier without gaps, like what RANK does for SQL. This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012 Functionality implemented so far is available at https://reviews.apache.org/r/5523/diff/#index_header
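The behaviour requested in PIG-2353 above (a unique, increasing, gap-free identifier per tuple of an already sorted bag) can be illustrated with a small sketch. This is not the GSoC implementation; RankSketch and its rank method are hypothetical names, a plain List stands in for a Pig bag, and ties are not treated specially here.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class RankSketch {
    /**
     * Pairs each element of an already sorted list with a 1-based identifier.
     * Identifiers are unique, increasing, and have no gaps.
     */
    static <T> List<Map.Entry<Long, T>> rank(List<T> sortedBag) {
        List<Map.Entry<Long, T>> ranked = new ArrayList<Map.Entry<Long, T>>(sortedBag.size());
        long next = 1;
        for (T tuple : sortedBag) {
            ranked.add(new AbstractMap.SimpleImmutableEntry<Long, T>(next, tuple));
            next++;
        }
        return ranked;
    }
}
```

On a single machine this is a simple counter; the harder part in a distributed setting is assigning consecutive identifiers across partitions, which this sketch does not attempt.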
[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535085#comment-13535085 ] Olga Natkovich commented on PIG-2764: - I think having support for BigInteger would be very helpful. We have asks within Yahoo for it. Add a biginteger and bigdecimal type to pig --- Key: PIG-2764 URL: https://issues.apache.org/jira/browse/PIG-2764 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch I think it would be useful for applications where precision is more important than speed to have the option of using Java's BigDecimal and BigInteger types natively.
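To make the precision-versus-speed point in PIG-2764 concrete, here is a short sketch using only java.math from the JDK. It does not show Pig's type system; PrecisionDemo and its method names are hypothetical and exist only to contrast primitive arithmetic with the arbitrary-precision types.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class PrecisionDemo {
    /** Binary floating point cannot represent 0.1, 0.2, or 0.3 exactly. */
    static boolean doubleSumIsExact() {
        return 0.1 + 0.2 == 0.3;
    }

    /** BigDecimal built from decimal strings keeps the exact decimal values. */
    static boolean decimalSumIsExact() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.2"));
        return sum.compareTo(new BigDecimal("0.3")) == 0;
    }

    /** A long silently wraps around when it overflows. */
    static long overflowingIncrement() {
        return Long.MAX_VALUE + 1;
    }

    /** BigInteger grows as needed instead of wrapping. */
    static BigInteger exactIncrement() {
        return BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);
    }
}
```

The exact results come at the cost of object allocation and slower arithmetic, which is the trade-off the issue describes.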
[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535274#comment-13535274 ] Olga Natkovich commented on PIG-2764: - I agree with using standard type. Add a biginteger and bigdecimal type to pig --- Key: PIG-2764 URL: https://issues.apache.org/jira/browse/PIG-2764 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch I think it would be useful for applications where precision is more important than speed to have the option of using Java's BigDecimal and BigInteger types natively.
Re: Our release process
Hi Jonathan, I thought I answered your email last week but I just noticed that the answer did not come through. We tell users that it is coming in the next release. Now that Pig is quite mature and stable, we don't see much of this. Having more frequent releases definitely helps in this respect. Olga From: Jonathan Coveney jcove...@gmail.com To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Sent: Thursday, December 13, 2012 1:14 PM Subject: Re: Our release process Olga, A related but separate question: what do y'all do when there is a feature that is finished, but for an upcoming release? i.e. a feature in trunk, but not in 0.11 (which, let us assume, is stable). Jon 2012/12/13 Olga Natkovich onatkov...@yahoo.com Hi Julien, I think for us at Yahoo to be able to run our releases directly from the branch we would need the guarantees that I proposed in my initial email and something that we agreed to last year. The only changes that go in are - Failures without reasonable workarounds - Silent failures. My main concern with the proposal is that I do not believe that our current testing infra is robust/inclusive enough to catch errors. That's why I am hesitant to widen the scope. I am fine with whatever outcome the majority of people agrees on. I am just saying that Yahoo will likely need a private branch if our rules are too relaxed. Olga - Original Message - From: Julien Le Dem jul...@twitter.com To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Cc: Sent: Wednesday, December 12, 2012 4:54 PM Subject: Re: Our release process Agreed. The priority of a change is subjective as well. My definition for inclusion on the release branch: - Only bug fixes. - Only if they have fairly understood repercussions (up to the committers who +/-1 as usual). - If we thought it would not break things but still does (CI or externally reported failure) we revert it. What do you want to add/change? 
Please reformulate those rules the way you like and let's see how we can converge. (Also, let's keep it short for clarity) Julien On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich onatkov...@yahoo.com wrote: Hi Julien, I understand what you are trying to do and I can see that being able to make more fixes post release has value for some use cases. My concern is that things that do not destabilize the branch is fairly subjective and also not always easy to ascertain beyond trivial changes. The only way I know to keep a code stable is to limit the updates. Also we need to clearly state what the constrains are for a post release commits so that every user can decide whether it works for them. Olga From: Julien Le Dem jul...@twitter.com To: dev@pig.apache.org dev@pig.apache.org Sent: Wednesday, December 12, 2012 10:26 AM Subject: Re: Our release process I think we all agree here, let's not jump to conclusions. Everything in this branch I am talking about is in Apache Pig. Everything we do in Pig is contributed. We have a branch for 0.11 where we keep merging the official 0.11 branch plus a few patches (and it will stay small) that are only in Apache TRUNK. The goal here is to help keeping the release branch stable by not adding patches that are only useful to us. Having this branch allows us to fix anything quickly and redeploy to production. It is also what allows us to use the pig 0.11 branch in production before it is even released. This definitely benefits the community and helps making 0.11 stable. This is a very reasonable way to keep using a recent version of Pig in production. Olga: My goal is to decrease the scope of what is going in the release branch and to make sure we add only bug fixes that are not making it unstable. I also think having a short definition of this helps which is why I have been chiming in. Let us know how you want to decrease the scope. I'm just trying to simplify here. 
Julien On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi prash1...@gmail.com wrote: Share the same concern as Russell here. Not great for the project for everyone to go private branch approach. On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney russell.jur...@gmail.com wrote: Wait. Ack. Do we want everyone to do this? This sounds like fragmentation. :( Russell Jurney twitter.com/rjurney On Dec 10, 2012, at 3:24 PM, Olga Natkovich onatkov...@yahoo.com wrote: If everybody is using a private branch then (1) We are not serving a significant part of our community (2) There is no motivation to contribute those patches to branches (only to trunk). Yahoo has been trying hard to work of the Apache branches but if we increase the scope of what is going
[jira] [Commented] (PIG-2764) Add a biginteger and bigdecimal type to pig
[ https://issues.apache.org/jira/browse/PIG-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13532746#comment-13532746 ] Olga Natkovich commented on PIG-2764: - Is anybody working on this or planning to in the near future? Add a biginteger and bigdecimal type to pig --- Key: PIG-2764 URL: https://issues.apache.org/jira/browse/PIG-2764 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Jonathan Coveney Attachments: fixedpoint.patch, PIG-2764-0.patch, PIG-2764-1.patch I think it would be useful for applications where precision is more important than speed to have the option of using Java's BigDecimal and BigInteger types natively.
Re: Our release process
Hi Julien, I think for us at Yahoo to be able to run our releases directly from the branch we would need the guarantees that I proposed in my initial email and something that we agreed to last year. The only changes that go in are - Failures without reasonable workarounds - Silent failures. My main concern with the proposal is that I do not believe that our current testing infra is robust/inclusive enough to catch errors. That's why I am hesitant to widen the scope. I am fine with whatever outcome the majority of people agrees on. I am just saying that Yahoo will likely need a private branch if our rules are too relaxed. Olga - Original Message - From: Julien Le Dem jul...@twitter.com To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Cc: Sent: Wednesday, December 12, 2012 4:54 PM Subject: Re: Our release process Agreed. The priority of a change is subjective as well. My definition for inclusion on the release branch: - Only bug fixes. - Only if they have fairly understood repercussions (up to the committers who +/-1 as usual). - If we thought it would not break things but still does (CI or externally reported failure) we revert it. What do you want to add/change? Please reformulate those rules the way you like and let's see how we can converge. (Also, let's keep it short for clarity) Julien On Wed, Dec 12, 2012 at 11:08 AM, Olga Natkovich onatkov...@yahoo.com wrote: Hi Julien, I understand what you are trying to do and I can see that being able to make more fixes post release has value for some use cases. My concern is that "things that do not destabilize the branch" is fairly subjective and also not always easy to ascertain beyond trivial changes. The only way I know to keep code stable is to limit the updates. Also we need to clearly state what the constraints are for post-release commits so that every user can decide whether it works for them. 
Olga From: Julien Le Dem jul...@twitter.com To: dev@pig.apache.org dev@pig.apache.org Sent: Wednesday, December 12, 2012 10:26 AM Subject: Re: Our release process I think we all agree here, let's not jump to conclusions. Everything in this branch I am talking about is in Apache Pig. Everything we do in Pig is contributed. We have a branch for 0.11 where we keep merging the official 0.11 branch plus a few patches (and it will stay small) that are only in Apache TRUNK. The goal here is to help keeping the release branch stable by not adding patches that are only useful to us. Having this branch allows us to fix anything quickly and redeploy to production. It is also what allows us to use the pig 0.11 branch in production before it is even released. This definitely benefits the community and helps making 0.11 stable. This is a very reasonable way to keep using a recent version of Pig in production. Olga: My goal is to decrease the scope of what is going in the release branch and to make sure we add only bug fixes that are not making it unstable. I also think having a short definition of this helps which is why I have been chiming in. Let us know how you want to decrease the scope. I'm just trying to simplify here. Julien On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi prash1...@gmail.com wrote: Share the same concern as Russell here. Not great for the project for everyone to go private branch approach. On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney russell.jur...@gmail.com wrote: Wait. Ack. Do we want everyone to do this? This sounds like fragmentation. :( Russell Jurney twitter.com/rjurney On Dec 10, 2012, at 3:24 PM, Olga Natkovich onatkov...@yahoo.com wrote: If everybody is using a private branch then (1) We are not serving a significant part of our community (2) There is no motivation to contribute those patches to branches (only to trunk). 
Yahoo has been trying hard to work of the Apache branches but if we increase the scope of what is going into branches, we will go with private branch approach as well. Olga From: Julien Le Dem jul...@twitter.com To: Olga Natkovich onatkov...@yahoo.com Cc: dev@pig.apache.org dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com; billgra...@gmail.com billgra...@gmail.com Sent: Friday, December 7, 2012 3:54 PM Subject: Re: Our release process Here's my criteria for inclusion in a release branch: - no new feature. Only bug fixes. - The criteria is more about stability than priority. The person/group asking for it has a good reason for wanting it in the branch. If commiters think the patch is reasonable and won't make the branch unstable then we should check it in. If it breaks something anyway, we revert it. For what it's worth we (at Twitter) maintain an internal branch
Re: Our release process
Hi Julien, I understand what you are trying to do and I can see that being able to make more fixes post release has value for some use cases. My concern is that "things that do not destabilize the branch" is fairly subjective and also not always easy to ascertain beyond trivial changes. The only way I know to keep code stable is to limit the updates. Also we need to clearly state what the constraints are for post-release commits so that every user can decide whether it works for them. Olga From: Julien Le Dem jul...@twitter.com To: dev@pig.apache.org dev@pig.apache.org Sent: Wednesday, December 12, 2012 10:26 AM Subject: Re: Our release process I think we all agree here, let's not jump to conclusions. Everything in this branch I am talking about is in Apache Pig. Everything we do in Pig is contributed. We have a branch for 0.11 where we keep merging the official 0.11 branch plus a few patches (and it will stay small) that are only in Apache TRUNK. The goal here is to help keeping the release branch stable by not adding patches that are only useful to us. Having this branch allows us to fix anything quickly and redeploy to production. It is also what allows us to use the pig 0.11 branch in production before it is even released. This definitely benefits the community and helps making 0.11 stable. This is a very reasonable way to keep using a recent version of Pig in production. Olga: My goal is to decrease the scope of what is going in the release branch and to make sure we add only bug fixes that are not making it unstable. I also think having a short definition of this helps which is why I have been chiming in. Let us know how you want to decrease the scope. I'm just trying to simplify here. Julien On Tue, Dec 11, 2012 at 8:54 AM, Prashant Kommireddi prash1...@gmail.com wrote: Share the same concern as Russell here. Not great for the project for everyone to go private branch approach. 
On Tue, Dec 11, 2012 at 8:33 AM, Russell Jurney russell.jur...@gmail.com wrote: Wait. Ack. Do we want everyone to do this? This sounds like fragmentation. :( Russell Jurney twitter.com/rjurney On Dec 10, 2012, at 3:24 PM, Olga Natkovich onatkov...@yahoo.com wrote: If everybody is using a private branch then (1) We are not serving a significant part of our community (2) There is no motivation to contribute those patches to branches (only to trunk). Yahoo has been trying hard to work of the Apache branches but if we increase the scope of what is going into branches, we will go with private branch approach as well. Olga From: Julien Le Dem jul...@twitter.com To: Olga Natkovich onatkov...@yahoo.com Cc: dev@pig.apache.org dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com; billgra...@gmail.com billgra...@gmail.com Sent: Friday, December 7, 2012 3:54 PM Subject: Re: Our release process Here's my criteria for inclusion in a release branch: - no new feature. Only bug fixes. - The criteria is more about stability than priority. The person/group asking for it has a good reason for wanting it in the branch. If commiters think the patch is reasonable and won't make the branch unstable then we should check it in. If it breaks something anyway, we revert it. For what it's worth we (at Twitter) maintain an internal branch where we add patches we need and I would suggest anybody that wants to be able to make emergency fixes to their own deployment to do the same. We do keep that branch as close to apache as we can but it has a few patches that are in trunk only and do not satisfy the no new feature criteria. What does the PMC think ? Julien On Tue, Dec 4, 2012 at 12:46 PM, Olga Natkovich onatkov...@yahoo.com wrote: I am ok with tests running nightly and reverting patches that cause failures. We used to have that. Does anybody know what happened? Is anybody volunteering to make it work again? 
I would like to see specific criteria for what goes into the branch be published (rather than decided case-by-case). This way each team can decide whether the criteria are stringent enough or if they need to run a private branch. Olga -- *From:* Santhosh M S santhosh_mut...@yahoo.com *To:* Julien Le Dem jul...@twitter.com; dev@pig.apache.org dev@pig.apache.org *Cc:* billgra...@gmail.com billgra...@gmail.com *Sent:* Friday, November 30, 2012 11:46 PM *Subject:* Re: Our release process Hi Julien, You are making most of the points that I did on this thread (CI for e2e, not burdening clean e2e prior to every commit for a release branch). The only point on which there is no clear agreement is the definition of a bug that can be included in a previously released branch. I am fine with a case by case
Re: Our release process
I agree releasing often is ideal, but releasing major versions once a month would be a bit aggressive. +1 to Olga's initial definition of how Yahoo! determines what goes into a released branch. Basically: is something broken without a workaround, or is there potential silent data loss? Trying to get a more granular definition than that (i.e. P1, P2, severity, etc) will be painful. The reality in that case is that whoever is blocked by the bug will consider it a P1. Fixes need to be relatively low-risk though to keep stability, but this is also subjective. For this I'm in favor of relying on developer and reviewer judgement to make that call and I'm +1 to Alan's proposal of rolling back patches that break the e2e tests or anything else. I think our policy should avoid time-based consideration of how many quarters away we are from the next major release since that's also impossible to quantify. Plus, if the answer to the question of whether we're more than 1-2 quarters from the next release is yes, then we should be fixing that release problem. On Wed, Nov 28, 2012 at 10:22 AM, Julien Le Dem jul...@twitter.com wrote: I would really like to see us doing frequent releases (at least once per quarter if not once a month). I think the whole notion of priority or being a blocker is subjective. Releasing infrequently pressures us to push more changes than we would want to the release branch. We should focus on keeping TRUNK stable as well so that it is easier to release and users can do more frequent and smaller upgrades. There should be a small enough number of patches going in the release branch so that we can get agreement on whether we check them in or not. I like Alan's proposal of reverting quickly when there's a problem. Again, this becomes less of a problem if we release more often. Which leads me to my next question: what are the next steps for releasing pig 0.11 ? 
Julien On Tue, Nov 27, 2012 at 10:22 PM, Santhosh M S santhosh_mut...@yahoo.com wrote: Hi Olga, For a moment, I will move away from P1 and P2 which are related to priorities and use the Severity definitions. The standard bugzilla definitions for severity are: Blocker - Blocks development and/or testing work. Critical - Crashes, loss of data, severe memory leak. Major - Major loss of function. I am skipping the other levels (normal, minor and trivial) for this discussion. Coming back to priorities, the proposed definitions map P1 to Blocker and Critical. I am proposing mapping P2 to Major even when there are known workarounds. We are doing this since JIRA does not have severity by default (see: https://confluence.atlassian.com/pages/viewpage.action?pageId=192840 ) I am proposing that P2s be included in the released branch only when trunk or unreleased versions are known to be backward incompatible or if the release is more than a quarter (or two) away. Thanks, Santhosh From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com Sent: Tuesday, November 27, 2012 10:41 AM Subject: Re: Our release process Hi Santhosh, What is your definition of P2s? Olga - Original Message - From: Santhosh M S santhosh_mut...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Cc: Sent: Monday, November 26, 2012 11:49 PM Subject: Re: Our release process Hi Olga, I agree that we cannot guarantee backward compatibility upfront. With that knowledge, I am proposing a small modification to your proposal. 1. If the trunk or unreleased version is known to be backwards compatible then only P1 issues go into the released branch. 2. If the the trunk or unreleased version is known to be backwards incompatible or the release is a long ways off (two quarters?) then we should allow for dot releases on the branch, i.e., P1 and P2 issues. 
I am hoping that should provide an incentive for users to move to a higher release and at the same time allow developers to fix issues of significance without impacting stability. Thanks, Santhosh From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 26, 2012 9:38 AM Subject: Re: Our release process Hi Santhosh, I understand the compatibility issue. I am not sure we can guarantee it for all releases upfront, but I agree that we should make an effort. On the e2e tests, part of the proposal is to make only P1-type changes to the branch after the initial release, so they should be rare. Olga From: Santhosh M S santhosh_mut...@yahoo.com To: Olga Natkovich onatkov...@yahoo.com; dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 26, 2012 12:00 AM
Re: Our release process
Hi Santhosh, What is your definition of P2s? Olga - Original Message - From: Santhosh M S santhosh_mut...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Cc: Sent: Monday, November 26, 2012 11:49 PM Subject: Re: Our release process Hi Olga, I agree that we cannot guarantee backward compatibility upfront. With that knowledge, I am proposing a small modification to your proposal. 1. If the trunk or unreleased version is known to be backwards compatible then only P1 issues go into the released branch. 2. If the the trunk or unreleased version is known to be backwards incompatible or the release is a long ways off (two quarters?) then we should allow for dot releases on the branch, i.e., P1 and P2 issues. I am hoping that should provide an incentive for users to move to a higher release and at the same time allow developers to fix issues of significance without impacting stability. Thanks, Santhosh From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 26, 2012 9:38 AM Subject: Re: Our release process Hi Santhosh, I understand the compatibility issue though I am not sure we can guarantee it for all releases upfront but agree that we should make an effort. On the e2e tests, part of the proposal is only do make P1 type of changes to the branch after the initial release so they should be rare. Olga From: Santhosh M S santhosh_mut...@yahoo.com To: Olga Natkovich onatkov...@yahoo.com; dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 26, 2012 12:00 AM Subject: Re: Our release process It takes too long to run. If the e2e tests are run every night or a reasonable timeframe then it will reduce the barrier for submitting patches. The context for this: the reluctance of folks to move to a higher version when the higher version is not backward compatible. 
Santhosh From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com Sent: Sunday, November 25, 2012 5:56 PM Subject: Re: Our release process Hi Santhosh, Can you clarify why running e2e tests on every checking is a problem? Olga From: Santhosh M S santhosh_mut...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 19, 2012 3:48 PM Subject: Re: Our release process The push for an upgrade will work only if the higher release is backward compatible with the lower release. If not, folks will tend to use private branches. Having a stable branch on a large deployment is a good indicator of stability. However, please note that there have been instances where some releases were never adopted. I will be extremely careful in applying the rule of running e2e tests for every commit to a released branch. If we release every quarter (hopefully) and preserve backward compatibility then I am +1 to the proposal. If the backward compatibility is not preserved then I am -1 for having to run e2e for every commit to a released branch. Santhosh From: Jonathan Coveney jcove...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Sent: Tuesday, November 6, 2012 6:34 PM Subject: Re: Our release process I think it might be good to clarify (for me) a couple of cases: 1. we have branched a new release 2. an existing release The way I understand things, in the case of 1, we have a backlog of patches (not all of which are P1 bugs), and that's ok. If a new bad bug comes in (the subject of debate here), then it goes in anyway (and in some cases, would go into 0.9 etc). Olga is saying that for existing release (0.9, 0.10), we should only commit P1 bug fixes there. This makes sense to me, as we're fixing the official release in place. IMHO, this would encourage people to use newer release (as this is where the latest and greatest stuff is, including non-critical bug fixes). 
Olga's criteria is a pretty clear barrier for inclusion into these releases. With old releases, I think the key is really that they keep doing what they have always done. Most bugs are well understood by now, and the ones that aren't will no doubt be P1. I'm not decided (thus no formal +1 or whatnot), but Olga's point seems pretty reasonable to me, especially given that trunk has pretty liberal development. Once it gets tidied up, I can understand not wanting to jostle it. 2012/11/5 Alan Gates ga...@hortonworks.com Jonathan, for clarity, are you saying you agree that we should only put bug fixes in branches or we should only put high priority bug fixes in branches? I think we all agree on the former, but there appear to be different views on the latter. Alan. On Nov 5, 2012, at 4:53 PM, Jonathan Coveney wrote: This seems to make sense to me. People can always back-port features
Re: Our release process
Hi Santhosh, I understand the compatibility issue though I am not sure we can guarantee it for all releases upfront but agree that we should make an effort. On the e2e tests, part of the proposal is only do make P1 type of changes to the branch after the initial release so they should be rare. Olga From: Santhosh M S santhosh_mut...@yahoo.com To: Olga Natkovich onatkov...@yahoo.com; dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 26, 2012 12:00 AM Subject: Re: Our release process It takes too long to run. If the e2e tests are run every night or a reasonable timeframe then it will reduce the barrier for submitting patches. The context for this: the reluctance of folks to move to a higher version when the higher version is not backward compatible. Santhosh From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com Sent: Sunday, November 25, 2012 5:56 PM Subject: Re: Our release process Hi Santhosh, Can you clarify why running e2e tests on every checking is a problem? Olga From: Santhosh M S santhosh_mut...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 19, 2012 3:48 PM Subject: Re: Our release process The push for an upgrade will work only if the higher release is backward compatible with the lower release. If not, folks will tend to use private branches. Having a stable branch on a large deployment is a good indicator of stability. However, please note that there have been instances where some releases were never adopted. I will be extremely careful in applying the rule of running e2e tests for every commit to a released branch. If we release every quarter (hopefully) and preserve backward compatibility then I am +1 to the proposal. If the backward compatibility is not preserved then I am -1 for having to run e2e for every commit to a released branch. 
Santhosh From: Jonathan Coveney jcove...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Sent: Tuesday, November 6, 2012 6:34 PM Subject: Re: Our release process I think it might be good to clarify (for me) a couple of cases: 1. we have branched a new release 2. an existing release The way I understand things, in the case of 1, we have a backlog of patches (not all of which are P1 bugs), and that's ok. If a new bad bug comes in (the subject of debate here), then it goes in anyway (and in some cases, would go into 0.9 etc). Olga is saying that for existing release (0.9, 0.10), we should only commit P1 bug fixes there. This makes sense to me, as we're fixing the official release in place. IMHO, this would encourage people to use newer release (as this is where the latest and greatest stuff is, including non-critical bug fixes). Olga's criteria is a pretty clear barrier for inclusion into these releases. With old releases, I think the key is really that they keep doing what they have always done. Most bugs are well understood by now, and the ones that aren't will no doubt be P1. I'm not decided (thus no formal +1 or whatnot), but Olga's point seems pretty reasonable to me, especially given that trunk has pretty liberal development. Once it gets tidied up, I can understand not wanting to jostle it. 2012/11/5 Alan Gates ga...@hortonworks.com Jonathan, for clarity, are you saying you agree that we should only put bug fixes in branches or we should only put high priority bug fixes in branches? I think we all agree on the former, but there appear to be different views on the latter. Alan. On Nov 5, 2012, at 4:53 PM, Jonathan Coveney wrote: This seems to make sense to me. People can always back-port features, and this encourages them to use the newer ones. It also means we will be more rigorous about stability, which is good as it is a big plus for Pig. I think for older branches, stability trumps features in a big way. 
2012/11/5 Gianmarco De Francisci Morales g...@apache.org Hi, On Mon, Nov 5, 2012 at 10:48 AM, Olga Natkovich onatkov...@yahoo.com wrote: Hi Gianmarco, Thanks for your comments. Here is a little more information. At Yahoo, we consider the following issues to be P1: (1) Bugs that cause wrong results being produced silently (2) Bugs that cause failures with no easy workaround Thanks Olga, now I get what you mean. I don't have a strong opinion on this. On one hand I see why you don't want to put too many patches in the branches in order to keep things stable. On the other hand when we do a 0.10.x release with x0 the users would like to have as many bugs fixed as possible. Regarding tests. I would suggest we have different rules for trunk and branches: (1) For branches, I think we should run the full regression suite (including e2e) prior to commit. This way we can ensure branch stability
Re: Our release process
Hi Santhosh, Can you clarify why running e2e tests on every checking is a problem? Olga From: Santhosh M S santhosh_mut...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Monday, November 19, 2012 3:48 PM Subject: Re: Our release process The push for an upgrade will work only if the higher release is backward compatible with the lower release. If not, folks will tend to use private branches. Having a stable branch on a large deployment is a good indicator of stability. However, please note that there have been instances where some releases were never adopted. I will be extremely careful in applying the rule of running e2e tests for every commit to a released branch. If we release every quarter (hopefully) and preserve backward compatibility then I am +1 to the proposal. If the backward compatibility is not preserved then I am -1 for having to run e2e for every commit to a released branch. Santhosh From: Jonathan Coveney jcove...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Sent: Tuesday, November 6, 2012 6:34 PM Subject: Re: Our release process I think it might be good to clarify (for me) a couple of cases: 1. we have branched a new release 2. an existing release The way I understand things, in the case of 1, we have a backlog of patches (not all of which are P1 bugs), and that's ok. If a new bad bug comes in (the subject of debate here), then it goes in anyway (and in some cases, would go into 0.9 etc). Olga is saying that for existing release (0.9, 0.10), we should only commit P1 bug fixes there. This makes sense to me, as we're fixing the official release in place. IMHO, this would encourage people to use newer release (as this is where the latest and greatest stuff is, including non-critical bug fixes). Olga's criteria is a pretty clear barrier for inclusion into these releases. With old releases, I think the key is really that they keep doing what they have always done. 
Most bugs are well understood by now, and the ones that aren't will no doubt be P1. I'm not decided (thus no formal +1 or whatnot), but Olga's point seems pretty reasonable to me, especially given that trunk has pretty liberal development. Once it gets tidied up, I can understand not wanting to jostle it. 2012/11/5 Alan Gates ga...@hortonworks.com Jonathan, for clarity, are you saying you agree that we should only put bug fixes in branches or we should only put high priority bug fixes in branches? I think we all agree on the former, but there appear to be different views on the latter. Alan. On Nov 5, 2012, at 4:53 PM, Jonathan Coveney wrote: This seems to make sense to me. People can always back-port features, and this encourages them to use the newer ones. It also means we will be more rigorous about stability, which is good as it is a big plus for Pig. I think for older branches, stability trumps features in a big way. 2012/11/5 Gianmarco De Francisci Morales g...@apache.org Hi, On Mon, Nov 5, 2012 at 10:48 AM, Olga Natkovich onatkov...@yahoo.com wrote: Hi Gianmarco, Thanks for your comments. Here is a little more information. At Yahoo, we consider the following issues to be P1: (1) Bugs that cause wrong results being produced silently (2) Bugs that cause failures with no easy workaround Thanks Olga, now I get what you mean. I don't have a strong opinion on this. On one hand I see why you don't want to put too many patches in the branches in order to keep things stable. On the other hand when we do a 0.10.x release with x0 the users would like to have as many bugs fixed as possible. Regarding tests. I would suggest we have different rules for trunk and branches: (1) For branches, I think we should run the full regression suite (including e2e) prior to commit. This way we can ensure branch stability and, as number of patches should be small, will not be a burden (2) For trunk, we can go with test-commit only and fix things quickly when things break. 
I think this makes sense. +1 Olga Cheers, -- Gianmarco
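For readers skimming the thread, the P1 criteria and the trunk/branch test rule quoted above can be written down as a small sketch. This is only an illustration of the proposal as stated in these messages, not an agreed project policy or any tooling that exists in Pig; the `Bug` fields and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bug:
    # Hypothetical fields, named only for this illustration.
    silent_wrong_results: bool
    causes_failure: bool
    has_easy_workaround: bool


def is_p1(bug):
    """Apply the two P1 criteria quoted in the thread.

    (1) wrong results produced silently, or
    (2) failures with no easy workaround.
    """
    if bug.silent_wrong_results:
        return True
    return bug.causes_failure and not bug.has_easy_workaround


def required_tests(target):
    """Return the test rule proposed in the thread for a commit target."""
    if target == "branch":
        return "full regression (including e2e)"
    if target == "trunk":
        return "test-commit"
    raise ValueError("unknown target: %r" % (target,))


def may_commit_to_released_branch(bug):
    """Under the P1-only proposal, only P1 fixes go into a released branch."""
    return is_p1(bug)
```

Under this reading, a crash that has an easy workaround is not P1 and would wait for the next release, which is exactly the case the thread disagrees on.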
Re: Our release process
Hi Gianmarco, Thanks for your comments. Here is a little more information. At Yahoo, we consider the following issues to be P1: (1) Bugs that cause wrong results being produced silently (2) Bugs that cause failures with no easy workaround Regarding tests. I would suggest we have different rules for trunk and branches: (1) For branches, I think we should run the full regression suite (including e2e) prior to commit. This way we can ensure branch stability and, as number of patches should be small, will not be a burden (2) For trunk, we can go with test-commit only and fix things quickly when things break. Olga From: Gianmarco De Francisci Morales g...@apache.org To: dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Sent: Monday, November 5, 2012 10:37 AM Subject: Re: Our release process Hi, Sure we don't want to commit patches that destabilize the code base. However, unfortunately, there is no way to know whether a patch will destabilize the code or not. Even testing is only a heuristic. So how do we draw the line? We seem to agree that only bug fixing should go into branches. However it seems that we have two different views on the policy: Olga is proposing to have only P1 bugs fixed, while Alan is suggesting to be more lax on what goes into the branches. Regardless of the policy chosen, how do we define the priority of a bug? By how many users are affected? By whether it can corrupt data? Is there a formal definition we can agree on? Otherwise defining a policy becomes hard. The test-commit task does not run full regression because the full test suite takes too long to execute. And I agree that asking to run the full test suite before committing any change slows down the (already slow) review process. However, I would be fine with running the full test suite for bug fixes that need to go into branches, in order to guarantee absence of regressions. 
Cheers, -- Gianmarco On Sun, Nov 4, 2012 at 5:17 PM, Olga Natkovich onatkov...@yahoo.com wrote: I can see how this would work for research projects but for real production this will not work. And I actually meant much more stringent stability. I don't think we should commit patches to either trunk or branch that destabilize the tree. We used to run full regression before each commit - is this no longer the case? By stability I meant very few things go into the branch. I know that pig has pretty decent tests - better coverage than many other projects. However, we do not have any testing at scale and inevitably, users end up doing testing. So any time we deploy new major version, it takes us at least a month to get it stable and once it is stabilized we want to keep it this way. So for us at Yahoo, the only way to work directly from the branch is to go by our original plan. If that is not possible, we would go with the private git branch. Olga From: Alan Gates ga...@hortonworks.com To: dev@pig.apache.org Sent: Friday, November 2, 2012 8:19 PM Subject: Re: Our release process I am all for maintaining stability of branches, and the trunk, as everyone benefits from it. But I do not think this means we should limit bug fixing in the branches to only critical issues. As Pig gets more users we have more and more people on older branches who will want fixes for bugs without dealing with bigger version changes. So I am not in favor of limiting checkins to branches to P1 issues. What if we maintain stability on the branches by quickly reverting any patches that break the build, the unit tests, or the e2e tests? This allows us to move forward with bug fix versions, it allows those who depend on branch stability (which I suspect is everyone in the distribution business plus everyone rolling their own Pig), and it should promote developer responsibility (no one likes having their patches reverted). Alan. 
On Nov 2, 2012, at 3:58 PM, Olga Natkovich wrote: Hi guys, Mid last year, we agreed on a release process documented in this thread: http://www.mail-archive.com/dev@pig.apache.org/msg04172.html. Since then, we have not really followed either of its two rules: (1) Frequent releases (every 3 months) (2) Branch stability (only P1 issues on the branch). So I wanted to revisit our release procedure to make sure we have one that we can actually follow. For us at Yahoo, branch stability is very important since we release all the patches directly from the branch. If we can't rely on the fact that only critical fixes go in, we will need to resort to git branches, which will make the whole process very cumbersome because we would then need to hand-pick patches from the Apache branch and port them onto our private branch. I would imagine that others using Pig in production would have similar issues. Olga
Re: Our release process
I can see how this would work for research projects but for real production this will not work. And I actually meant much more stringent stability. I don't think we should commit patches to either trunk or branch that destabilize the tree. We used to run full regression before each commit - is this no longer the case? By stability I meant very few things go into the branch. I know that pig has pretty decent tests - better coverage than many other projects. However, we do not have any testing at scale and inevitably, users end up doing testing. So any time we deploy new major version, it takes us at least a month to get it stable and once it is stabilized we want to keep it this way. So for us at Yahoo, the only way to work directly from the branch is to go by our original plan. If that is not possible, we would go with the private git branch. Olga From: Alan Gates ga...@hortonworks.com To: dev@pig.apache.org Sent: Friday, November 2, 2012 8:19 PM Subject: Re: Our release process I am all for maintaining stability of branches, and the trunk, as everyone benefits from it. But I do not think this means we should limit bug fixing in the branches to only critical issues. As Pig gets more users we have more and more people on older branches who will want fixes for bugs without dealing with bigger version changes. So I am not in favor of limiting checkins to branches to P1 issues. What if we maintain stability on the branches by quickly reverting any patches that break the build, the unit tests, or the e2e tests? This allows us to move forward with bug fix versions, it allows those who depend on branch stability (which I suspect is everyone in the distribution business plus everyone rolling their own Pig), and it should promote developer responsibility (no one likes having their patches reverted). Alan. 
On Nov 2, 2012, at 3:58 PM, Olga Natkovich wrote: Hi guys, Mid last year, we agreed on a release process documented in this thread: http://www.mail-archive.com/dev@pig.apache.org/msg04172.html. Since then, we have not really followed either of its two rules: (1) Frequent releases (every 3 months) (2) Branch stability (only P1 issues on the branch). So I wanted to revisit our release procedure to make sure we have one that we can actually follow. For us at Yahoo, branch stability is very important since we release all the patches directly from the branch. If we can't rely on the fact that only critical fixes go in, we will need to resort to git branches, which will make the whole process very cumbersome because we would then need to hand-pick patches from the Apache branch and port them onto our private branch. I would imagine that others using Pig in production would have similar issues. Olga
Re: Pig 0.11
We are still at 43 open tickets. How would you guys like to proceed? Olga - Original Message - From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Cc: Sent: Tuesday, October 30, 2012 9:07 AM Subject: Re: Pig 0.11 We are down to 45 tickets. Thanks to everybody who helped with the cleanup. We only have a couple of unassigned ones in the area of documentation and testing. Now we need to go through the assigned ones and see what can be done for 0.11. Here is a list of people with many tickets. Please review what you are planning to complete in the next couple of weeks and unlink the rest. Daniel - 15+ John Gordon - 6 Jonathan Coveney - 6 Thanks, Olga From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 26, 2012 4:32 PM Subject: Re: Pig 0.11 74 issues are still open and more than half are unassigned. I think we should narrow the list down next week. I am planning to start unlinking the unassigned ones next week, so if you feel they need to be addressed, please find an owner. Olga - Original Message - From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Cc: Sent: Monday, October 22, 2012 10:14 AM Subject: Re: Pig 0.11 There are still 76 unresolved JIRAs, more than half unassigned. Let's clean this up by the end of this week. I propose we do the following: (1) Unlink all JIRAs for new features, since we already branched and should not be taking on new work. If people feel strongly that some new features still need to go in, please bring it up. (2) For bug fixes, if people feel strongly that some of the unassigned issues need to be addressed, please take ownership. If you are unable to solve them but still feel they are important, please bring them up. (3) Owners of unresolved issues, please check whether you will have time to solve them in the next 2 weeks. If not, let's move them to 0.12.
If you can't address them but feel they are important, please bring them up. Let's make sure that all JIRAs that require changes to the documentation have appropriate information in the release notes section so that we can quickly compile release documentation. Thanks for your help! Olga From: Alan Gates ga...@hortonworks.com To: dev@pig.apache.org Sent: Monday, October 15, 2012 11:55 AM Subject: Re: Pig 0.11 At this point no one has taken on release documentation for 0.11. Alan. On Oct 15, 2012, at 11:49 AM, Olga Natkovich wrote: Thanks! Are you talking about items 15 and 16 on the How To Release.Publish page? Also, who is doing release documentation these days? I can help with that as well. I would also be happy to roll the release if you guys need help with that. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Cc: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 12, 2012 5:59 PM Subject: Re: Pig 0.11 Thanks Olga and welcome back! I know there's some process for linking JIRAs to releases, but I'm not sure what that is. If you could explain and maybe cover a portion of that work, that'd be super helpful. And reviews, of course. On Oct 12, 2012, at 2:06 PM, Olga Natkovich onatkov...@yahoo.com wrote: Dmitriy, I would be happy to help with the release process. I want to get back into this now that I am back at work. Let me know what you would like me to do. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org Cc: billgra...@gmail.com Sent: Thursday, October 11, 2012 2:44 PM Subject: Re: Pig 0.11 OK, I will branch 0.11 tomorrow morning unless someone objects. From then on, committers should be careful to commit bug fixes to both the 0.11 branch and trunk; minor polish can go into the branch, but whole new features should not (we can discuss on the list if something is in the gray area).
D On Thu, Oct 11, 2012 at 2:16 PM, Gianmarco De Francisci Morales g...@apache.org wrote: I added it as a dependency as it has already its own Jira. I hope it is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney
Our release process
Hi guys, Mid last year, we agreed on a release process documented in this thread: http://www.mail-archive.com/dev@pig.apache.org/msg04172.html. Since then, we have not really followed either of its two rules: (1) Frequent releases (every 3 months) (2) Branch stability (only P1 issues on the branch). So I wanted to revisit our release procedure to make sure we have one that we can actually follow. For us at Yahoo, branch stability is very important since we release all the patches directly from the branch. If we can't rely on the fact that only critical fixes go in, we will need to resort to git branches, which will make the whole process very cumbersome because we would then need to hand-pick patches from the Apache branch and port them onto our private branch. I would imagine that others using Pig in production would have similar issues. Olga
[jira] [Updated] (PIG-2657) Print warning if using wrong jython version
[ https://issues.apache.org/jira/browse/PIG-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2657: Fix Version/s: (was: 0.10.1) (was: 0.11) 0.12 Moving to 0.12 based on Rohini's recommendation. Please, move back if you feel it needs to make it to 0.11 Print warning if using wrong jython version --- Key: PIG-2657 URL: https://issues.apache.org/jira/browse/PIG-2657 Project: Pig Issue Type: Bug Reporter: Fabian Alenius Fix For: 0.12 Attachments: PIG-2657.1.patch, PIG-2657.2.patch Hi, It would be good if Pig would print a warning (or refuse to run) if you are using an unsupported version of jython. I spent a couple of hours before figuring out that you had to use 2.5.0. I've seen posts indicating that others have run into this problem as well. Might write up a patch if others agree this is an issue. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
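The check requested in PIG-2657 could look roughly like the sketch below. It is not taken from the attached patches and does not use any Pig API; the function name is hypothetical, and the supported version (2.5.0) is simply the one the reporter says worked. The sketch treats any other version as unsupported and only builds a warning string, leaving the choice between warning and refusing to run to the caller.

```python
import sys

# Assumption: 2.5.0 is the supported Jython version, as reported in PIG-2657.
SUPPORTED_JYTHON = (2, 5, 0)


def jython_version_warning(version, supported=SUPPORTED_JYTHON):
    """Return a warning string if `version` differs from `supported`, else None.

    `version` is any sequence whose first three items are
    (major, minor, micro), such as sys.version_info.
    """
    found = tuple(version[:3])
    if found == tuple(supported):
        return None
    return "WARNING: Jython %s detected; only %s is known to work" % (
        ".".join(str(part) for part in found),
        ".".join(str(part) for part in supported),
    )


if __name__ == "__main__":
    # Inside a Jython interpreter, sys.version_info reports the Jython version.
    message = jython_version_warning(sys.version_info)
    if message is not None:
        sys.stderr.write(message + "\n")
```

Returning the message instead of printing it keeps the comparison easy to test and lets the caller decide how loudly to report it.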
[jira] [Updated] (PIG-2521) explicit reference to namenode path with streaming results in an error
[ https://issues.apache.org/jira/browse/PIG-2521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2521: Fix Version/s: (was: 0.11) 0.12 moving to 0.12 since it is labelled as minor and no activity explicit reference to namenode path with streaming results in an error -- Key: PIG-2521 URL: https://issues.apache.org/jira/browse/PIG-2521 Project: Pig Issue Type: Bug Affects Versions: 0.9.2 Reporter: Araceli Henley Priority: Minor Fix For: 0.12 I set this to minor because this test works with client side tables and with old style references. :: /grid/2/dev/pigqa/out/pigtest/hadoopqa/hadoopqa.1327441396/dotNext_baseline_15.pig :: THIS TEST FAILS. It uses an explicit reference to namenode1 (hdfs://namenode1.domain.com:8020) define CMD `perl PigStreamingDepend.pl` input(stdin) ship('/homes/araceli/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/PigStreamingDepend.pl', '/homes/araceli/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/PigStreamingModule.pm'); A = load 'hdfs://namdenode1.domain.com:8020/user/hadoopqa/pig/tests/data'; B = stream A through `perl PigStreaming.pl`; C = stream B through CMD as (name, age, gpa); D = foreach C generate name, age; store D into 'hdfs://namenode1.domain.com:8020/user/hadoopqa/pig/out1/user/hadoopqa/pig/out/hadoopqa.1327441396/dotNext_baseline_15.out'; fs -cp hdfs://namenode1.domain.com:8020/user/hadoopqa/pig/out1/user/hadoopqa/pig/out/hadoopqa.1327441396/dotNext_baseline_15.out /user/hadoopqa/pig/out/hadoopqa.1327441396/dotNext_baseline_15.out :: /grid/2/dev/pigqa/out/pigtest/hadoopqa/hadoopqa.1327441396/dotNext_baseline_1.pig :: This test PASSES. 
It uses an explicit reference to NN1 (hdfs://namenode1.domain.com:8020) for load and store: a = load 'hdfs://namenode1.domain.com:8020/user/hadoopqa/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa); store a into 'hdfs://namenode1.domain.com:8020/user/hadoopqa/pig/out1/user/hadoopqa/pig/out/hadoopqa.1327441396/dotNext_baseline_1.out' ; fs -cp hdfs://namenode1.domain.com:8020/user/hadoopqa/pig/out1/user/hadoopqa/pig/out/hadoopqa.1327441396/dotNext_baseline_1.out /user/hadoopqa/pig/out/hadoopqa.1327441396/dotNext_baseline_1.out THE REMAINING TESTS ARE IDENTICAL EXCEPT FOR THE FILE REFERENCE: explicit vs mount point :: /grid/2/dev/pigqa/out/pigtest/hadoopqa/hadoopqa.1327433551/dotNext_baseline_15.pig :: This test PASSES. It's the baseline for the test; it uses old-style references. define CMD `perl PigStreamingDepend.pl` input(stdin) ship('/homes/araceli/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/PigStreamingDepend.pl', '/homes/araceli/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/PigStreamingModule.pm'); A = load '/user/hadoopqa/pig/tests/data'; B = stream A through `perl PigStreaming.pl`; C = stream B through CMD as (name, age, gpa); D = foreach C generate name, age; store D into '/user/hadoopqa/pig/out/hadoopqa.1327433551/dotNext_baseline_15.out'; :: grid/2/dev/pigqa/out/pigtest/hadoopqa/hadoopqa.1327431567/dotNext_baseline_15.pig :: This test PASSES. It uses a mount point to namenode 1 (/data1 is a mount point for hdfs://namenode1.domain.com:8020/user/hadoopqa/pig/tests/data). 
define CMD `perl PigStreamingDepend.pl` input(stdin) ship('/homes/araceli/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/PigStreamingDepend.pl', '/homes/araceli/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/PigStreamingModule.pm'); A = load '/data1'; B = stream A through `perl PigStreaming.pl`; C = stream B through CMD as (name, age, gpa); D = foreach C generate name, age; store D into '/out1/user/hadoopqa/pig/out/hadoopqa.1327431567/dotNext_baseline_15.out'; fs -cp /out1/user/hadoopqa/pig/out/hadoopqa.1327431567/dotNext_baseline_15.out /user/hadoopqa/pig/out/hadoopqa.1327431567/dotNext_baseline_15.out -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2830) Macros should work in Grunt
[ https://issues.apache.org/jira/browse/PIG-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2830: Fix Version/s: (was: 0.11) 0.12 Moving to 0.12 since nobody is working on this Macros should work in Grunt --- Key: PIG-2830 URL: https://issues.apache.org/jira/browse/PIG-2830 Project: Pig Issue Type: Improvement Components: grunt, parser Affects Versions: 0.10.0, 0.11, 0.10.1 Reporter: Russell Jurney Priority: Minor Labels: fun, grunt, happy, macro, pants Fix For: 0.12 It would be very helpful in writing Pig scripts if Grunt could load and use Macros in an interactive session. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2687) Add relation/operator scoping to Pig
[ https://issues.apache.org/jira/browse/PIG-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2687: Fix Version/s: (was: 0.11) 0.12 Add relation/operator scoping to Pig Key: PIG-2687 URL: https://issues.apache.org/jira/browse/PIG-2687 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Priority: Minor Fix For: 0.12 The idea is to add a real notion of scope that can be used to manage namespace. This would mean the addition of blocks to pig, probably with some sort of syntax like this... {code} a = load thing as (x:int, y:int); b = foreach a generate x, y, x*y as z; { a = group b by z; b = foreach a generate COUNT(b); global b; } {code} which would replace the alias b with the nested b value in the scope. This could also be used in nested foreach blocks, and macros could just become blocks as well. I am 95% sure about how to implement this... I have a failed patch attempt, and need to study a bit more about how Pig uses its logical operators. Any thoughts? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2584) Command line arguments for Pig script
[ https://issues.apache.org/jira/browse/PIG-2584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2584: Fix Version/s: (was: 0.11) 0.12 Moving to 0.12 since nobody is working on this Command line arguments for Pig script - Key: PIG-2584 URL: https://issues.apache.org/jira/browse/PIG-2584 Project: Pig Issue Type: Improvement Components: impl Reporter: Daniel Dai Priority: Minor Fix For: 0.12 We did that for Jython embedded scripts. It is also useful in a Pig script itself: command line: pig a.pig student.txt output a.pig: a = load '$1' as (a0, a1); store a into '$2';
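To make the PIG-2584 proposal concrete, here is a rough sketch of positional substitution: `$1`, `$2`, ... in the script text are replaced by the arguments that follow the script name on the command line. The helper `substitute_positional` is hypothetical and is not part of Pig; Pig's existing mechanism uses named parameters (`-param name=value`), so this only illustrates the behaviour the ticket describes.

```python
import re


def substitute_positional(script, args):
    """Replace $1, $2, ... in `script` with the matching entry of `args`
    (1-based). Raises ValueError if a placeholder has no argument."""
    def replace(match):
        index = int(match.group(1))
        if index < 1 or index > len(args):
            raise ValueError("no argument supplied for $%d" % index)
        return args[index - 1]

    # \$(\d+) matches a dollar sign followed by one or more digits.
    return re.sub(r"\$(\d+)", replace, script)
```

With the ticket's example, `pig a.pig student.txt output` would correspond to calling `substitute_positional(text_of_a_pig, ["student.txt", "output"])` before the script is parsed.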
[jira] [Updated] (PIG-19) A=load causes parse error
[ https://issues.apache.org/jira/browse/PIG-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-19: -- Fix Version/s: (was: 0.11) 0.12 A=load causes parse error - Key: PIG-19 URL: https://issues.apache.org/jira/browse/PIG-19 Project: Pig Issue Type: Bug Components: grunt Reporter: Olga Natkovich Priority: Minor Fix For: 0.12 Parser expects spaces around =. This should be a minor change in src/org/apache/pig/tools/grunt/GruntParser.jj -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2522) deprecated hdfs pig commands do not work well with client side tables
[ https://issues.apache.org/jira/browse/PIG-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486959#comment-13486959 ] Olga Natkovich commented on PIG-2522: - Araceli, can you provide details: which commands do not work, and what errors were you seeing? deprecated hdfs pig commands do not work well with client side tables - Key: PIG-2522 URL: https://issues.apache.org/jira/browse/PIG-2522 Project: Pig Issue Type: Bug Reporter: Araceli Henley Priority: Trivial Fix For: 0.11 I'm mostly entering this Jira to make you aware that the deprecated pig APIs to access hdfs (typically through grunt) do not work consistently with federation. The hadoop references supported in grunt do work and can be used. It should at a minimum be noted in the documentation that the deprecated APIs do not work with client side tables.
[jira] [Assigned] (PIG-2522) deprecated hdfs pig commands do not work well with client side tables
[ https://issues.apache.org/jira/browse/PIG-2522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-2522: --- Assignee: Rohini Palaniswamy Rohini, could you check if this is going to be an issue with federation? I am mostly concerned about commands like cd, for which we do not have an equivalent in the fs command. Thanks! deprecated hdfs pig commands do not work well with client side tables - Key: PIG-2522 URL: https://issues.apache.org/jira/browse/PIG-2522 Project: Pig Issue Type: Bug Reporter: Araceli Henley Assignee: Rohini Palaniswamy Priority: Trivial Fix For: 0.11 I'm mostly entering this Jira to make you aware that the deprecated pig APIs to access hdfs (typically through grunt) do not work consistently with federation. The hadoop references supported in grunt do work and can be used. It should at a minimum be noted in the documentation that the deprecated APIs do not work with client side tables.
[jira] [Updated] (PIG-2834) MultiStorage requires unused constructor argument
[ https://issues.apache.org/jira/browse/PIG-2834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2834: Fix Version/s: (was: 0.11) 0.12 MultiStorage requires unused constructor argument - Key: PIG-2834 URL: https://issues.apache.org/jira/browse/PIG-2834 Project: Pig Issue Type: Improvement Components: data Affects Versions: 0.10.0, 0.11 Environment: Linux Reporter: Danny Antonetti Priority: Trivial Labels: newbie Fix For: 0.12 Attachments: MultiStorage.patch Each constructor in org.apache.pig.piggybank.storage.MultiStorage requires a constructor argument, 'parentPathStr', that has no meaningful usage.
Re: Pig 0.11
We are down to 45 tickets. Thanks to everybody who helped with the cleanup. We only have a couple of unassigned ones in the area of documentation and testing. Now we need to go through the assigned ones and see what can be done for 0.11. Here is a list of people with many tickets. Please review what you are planning to complete in the next couple of weeks and unlink the rest. Daniel - 15+ John Gordon - 6 Jonathan Coveney - 6 Thanks, Olga From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 26, 2012 4:32 PM Subject: Re: Pig 0.11 74 issues are still open and more than half are unassigned. I think we should narrow the list down next week. I am planning to start unlinking the unassigned ones next week, so if you feel they need to be addressed, please find an owner. Olga - Original Message - From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Cc: Sent: Monday, October 22, 2012 10:14 AM Subject: Re: Pig 0.11 There are still 76 unresolved JIRAs, more than half unassigned. Let's clean this up by the end of this week. I propose we do the following: (1) Unlink all JIRAs for new features, since we already branched and should not be taking on new work. If people feel strongly that some new features still need to go in, please bring it up. (2) For bug fixes, if people feel strongly that some of the unassigned issues need to be addressed, please take ownership. If you are unable to solve them but still feel they are important, please bring them up. (3) Owners of unresolved issues, please check whether you will have time to solve them in the next 2 weeks. If not, let's move them to 12. If you can't address them but feel they are important, please bring it up. Let's make sure that all JIRAs that require changes to the documentation have appropriate information in the release notes section so that we can quickly compile release documentation. Thanks for your help! 
Olga From: Alan Gates ga...@hortonworks.com To: dev@pig.apache.org Sent: Monday, October 15, 2012 11:55 AM Subject: Re: Pig 0.11 At this point no one has taken on release documentation for 0.11. Alan. On Oct 15, 2012, at 11:49 AM, Olga Natkovich wrote: Thanks! Are you talking about items 15 and 16 on the How To Release.Publish page? Also, who is doing release documentation these days? I can help with that as well. I would also be happy to roll the release if you guys need help with that. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Cc: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 12, 2012 5:59 PM Subject: Re: Pig 0.11 Thanks Olga and welcome back! I know there's some process for linking jiras to releases, but I'm not sure what that is. If you could explain and maybe cover a portion of that work, that'd be super helpful. And reviews, of course. On Oct 12, 2012, at 2:06 PM, Olga Natkovich onatkov...@yahoo.com wrote: Dmitry, I would be happy to help with the release process. Want to get back into this now that I am back at work. Let me know what you would like me to do. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org Cc: billgra...@gmail.com Sent: Thursday, October 11, 2012 2:44 PM Subject: Re: Pig 0.11 Ok I will branch 0.11 tomorrow morning unless someone objects. From then on, committers should be careful to commit bug fixes to both 0.11 branch and trunk; minor polish can go into the branch, but whole new features should not (we can discuss on the list if something is in the gray area). D On Thu, Oct 11, 2012 at 2:16 PM, Gianmarco De Francisci Morales g...@apache.org wrote: I added it as a dependency as it has already its own Jira. I hope it is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. 
There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this. -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Updated] (PIG-2461) Simplify schema syntax for cast
[ https://issues.apache.org/jira/browse/PIG-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2461: Fix Version/s: (was: 0.11) Simplify schema syntax for cast --- Key: PIG-2461 URL: https://issues.apache.org/jira/browse/PIG-2461 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.10.0 Reporter: Daniel Dai Fix For: 0.12 Cast into a bag/tuple syntax is confusing: {code} b = foreach a generate (bag{tuple(int,double)})bag0; {code} It's pretty hard to get it right for users. We should make key word bag/tuple optional. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2625) Allow use of JRuby for control flow
[ https://issues.apache.org/jira/browse/PIG-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2625: Fix Version/s: (was: 0.11) Allow use of JRuby for control flow --- Key: PIG-2625 URL: https://issues.apache.org/jira/browse/PIG-2625 Project: Pig Issue Type: New Feature Reporter: Jonathan Coveney Fix For: 0.12 Much like people can use jython for iterative computation, it'd be great to use JRuby for the same -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2625) Allow use of JRuby for control flow
[ https://issues.apache.org/jira/browse/PIG-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2625: Fix Version/s: 0.12 Moving to 12 since this is an improvement with no activity at this point Allow use of JRuby for control flow --- Key: PIG-2625 URL: https://issues.apache.org/jira/browse/PIG-2625 Project: Pig Issue Type: New Feature Reporter: Jonathan Coveney Fix For: 0.12 Much like people can use jython for iterative computation, it'd be great to use JRuby for the same -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2628) Allow in line scripting UDF definitions
[ https://issues.apache.org/jira/browse/PIG-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2628: Fix Version/s: (was: 0.11) 0.12 Moving to 0.12 since it is an improvement with no work done yet Allow in line scripting UDF definitions --- Key: PIG-2628 URL: https://issues.apache.org/jira/browse/PIG-2628 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Fix For: 0.12 For small udfs in scripting languages, it may be cumbersome to force users to make a script, put it on the classpath, ship it, etc. It would be great to support a syntax that allows people to declare UDFs in line (essentially, to define a snippet of code that will be interpreted as a scriptlet) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2624) Handle recursive inclusion of scripts in JRuby UDFs
[ https://issues.apache.org/jira/browse/PIG-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2624: Fix Version/s: (was: 0.10.1) (was: 0.11) 0.12 Moving to 0.12 since there is no work or person assigned to date Handle recursive inclusion of scripts in JRuby UDFs --- Key: PIG-2624 URL: https://issues.apache.org/jira/browse/PIG-2624 Project: Pig Issue Type: Improvement Affects Versions: 0.10.0, 0.11 Reporter: Jonathan Coveney Labels: JRuby Fix For: 0.12 Currently, if you have a script which requires another script, the dependency won't be properly handled.
[jira] [Updated] (PIG-2631) Pig should allow self joins
[ https://issues.apache.org/jira/browse/PIG-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2631: Fix Version/s: 0.12 Moving to 12 since no work has been done and the ticket is unassigned Pig should allow self joins --- Key: PIG-2631 URL: https://issues.apache.org/jira/browse/PIG-2631 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Fix For: 0.11, 0.12 This doesn't have to even be optimized, and can still involve a double scan of the data, but there is no reason the following should work: {code} a = load 'thing' as (x:int); b = join a by x, (foreach a generate *) by x; {code} but this does not: {code} a = load 'thing' as (x:int); b = join a by x, a by x; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2631) Pig should allow self joins
[ https://issues.apache.org/jira/browse/PIG-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2631: Fix Version/s: (was: 0.11) Pig should allow self joins --- Key: PIG-2631 URL: https://issues.apache.org/jira/browse/PIG-2631 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Fix For: 0.12 This doesn't have to even be optimized, and can still involve a double scan of the data, but there is no reason the following should work: {code} a = load 'thing' as (x:int); b = join a by x, (foreach a generate *) by x; {code} but this does not: {code} a = load 'thing' as (x:int); b = join a by x, a by x; {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Assigned] (PIG-2641) Create toJSON function for all complex types: tuples, bags and maps
[ https://issues.apache.org/jira/browse/PIG-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-2641: --- Assignee: Russell Jurney Hi Russell, Is this going to be done in the next couple of weeks? if not, should we move it to 12? Create toJSON function for all complex types: tuples, bags and maps --- Key: PIG-2641 URL: https://issues.apache.org/jira/browse/PIG-2641 Project: Pig Issue Type: New Feature Components: piggybank Affects Versions: 0.11, 0.10.1 Environment: Foggy. Damn foggy. Reporter: Russell Jurney Assignee: Russell Jurney Labels: chararray, fun, happy, input, json, output, pants, pig, piggybank, string, wonderdog Fix For: 0.11, 0.10.1 Original Estimate: 96h Remaining Estimate: 96h It is a travesty that there are no UDFs in Piggybanks that, given an arbitrary Pig datatype, return a JSON string of same. I intend to fix this problem. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2591) Unit tests should not write to /tmp but respect java.io.tmpdir
[ https://issues.apache.org/jira/browse/PIG-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2591: Fix Version/s: (was: 0.11) 0.12 Moving to 12 since no work has been done and the ticket is unassigned Unit tests should not write to /tmp but respect java.io.tmpdir -- Key: PIG-2591 URL: https://issues.apache.org/jira/browse/PIG-2591 Project: Pig Issue Type: Bug Components: tools Reporter: Thomas Weise Fix For: 0.12 Several tests use /tmp but should derive temporary file location from java.io.tmpdir to avoid side effects (java.io.tmpdir is already set to a test run specific location in build.xml) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-1919) order-by on bag gives error only at runtime
[ https://issues.apache.org/jira/browse/PIG-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486524#comment-13486524 ] Olga Natkovich commented on PIG-1919: - Jonathan, should this be assigned to you? Is this going to be finished for 0.11 or should be moved to 0.12? order-by on bag gives error only at runtime --- Key: PIG-1919 URL: https://issues.apache.org/jira/browse/PIG-1919 Project: Pig Issue Type: Bug Affects Versions: 0.8.0, 0.9.0 Reporter: Thejas M Nair Fix For: 0.11, 0.10.1 Attachments: PIG-1919-0.patch, PIG-1919-1.patch, PIG-1919-1.patch Order-by on a bag or tuple should give error at query compile time, instead of giving an error at runtime.
[jira] [Commented] (PIG-2423) document use case where co-group is better choice than join
[ https://issues.apache.org/jira/browse/PIG-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486527#comment-13486527 ] Olga Natkovich commented on PIG-2423: - Thejas, should this be assigned to you? Is this going to go into 0.11 or 0.12? document use case where co-group is better choice than join Key: PIG-2423 URL: https://issues.apache.org/jira/browse/PIG-2423 Project: Pig Issue Type: Improvement Components: documentation Reporter: Thejas M Nair Fix For: 0.11 Optimization rules 2 and 3 suggested in https://issues.apache.org/jira/secure/attachment/12506841/pig_tpch.ppt (PIG-2397) recommend the use of co-group instead of join in certain cases. These should be documented in pig performance page.
[jira] [Updated] (PIG-2595) BinCond only works inside parentheses
[ https://issues.apache.org/jira/browse/PIG-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2595: Fix Version/s: (was: 0.11) 0.12 Moving to 12 since no work has been done and the ticket is unassigned BinCond only works inside parentheses - Key: PIG-2595 URL: https://issues.apache.org/jira/browse/PIG-2595 Project: Pig Issue Type: Bug Reporter: Daniel Dai Fix For: 0.12 Not sure if we have a Jira for this before. This script does not work: {code} a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() as (name, age:int, gpa:double, instate:chararray); b = foreach a generate name, instate=='true'?gpa:gpa+1; dump b; {code} If we put bincond into parentheses, it works {code} a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() as (name, age:int, gpa:double, instate:chararray); b = foreach a generate name, (instate=='true'?gpa:gpa+1); dump b; {code} Exception: ERROR 1200: file 40.pig, line 2, column 36 mismatched input '==' expecting SEMI_COLON org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. 
file 40.pig, line 2, column 36 mismatched input '==' expecting SEMI_COLON at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598) at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1541) at org.apache.pig.PigServer.registerQuery(PigServer.java:541) at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945) at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190) at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166) at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84) at org.apache.pig.Main.run(Main.java:599) at org.apache.pig.Main.main(Main.java:153) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: Failed to parse: file 40.pig, line 2, column 36 mismatched input '==' expecting SEMI_COLON at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:226) at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168) at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1590) ... 14 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10
[ https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486632#comment-13486632 ] Olga Natkovich commented on PIG-2434: - Thejas, any plan to address this for 0.11? investigate 5% slowdown in TPC-H Q6 query in 0.10 - Key: PIG-2434 URL: https://issues.apache.org/jira/browse/PIG-2434 Project: Pig Issue Type: Bug Affects Versions: 0.10.0 Reporter: Thejas M Nair Fix For: 0.11 0.10 is slower than 0.9 by around 5% for TPC-H Q6 query as per observation in https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461 . This needs to be investigated.
[jira] [Assigned] (PIG-2812) Spill InternalCachedBag into only 1 file
[ https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-2812: --- Assignee: Haitao Yao Spill InternalCachedBag into only 1 file Key: PIG-2812 URL: https://issues.apache.org/jira/browse/PIG-2812 Project: Pig Issue Type: Bug Components: data Reporter: Haitao Yao Assignee: Haitao Yao Fix For: 0.11 Attachments: aa.jpg, spill.patch I encountered a reducer's OOM because of java.io.DeleteOnExitHook. And I found out that the InternalCachedBag creates a separate tmp file, and the tmp files are deleted on exit. So the file delete hook caused the OOM. Why not just hold the tmp file handle and spill only one tmp file? Too many tmp files may block the tasktracker start process, if the tmp files are not cleaned on time and the tasktracker restarts at this specific time.
[jira] [Commented] (PIG-2812) Spill InternalCachedBag into only 1 file
[ https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486634#comment-13486634 ] Olga Natkovich commented on PIG-2812: - Alan - are you planning to review this one? Do we need to include this in 0.11? Spill InternalCachedBag into only 1 file Key: PIG-2812 URL: https://issues.apache.org/jira/browse/PIG-2812 Project: Pig Issue Type: Bug Components: data Reporter: Haitao Yao Assignee: Haitao Yao Fix For: 0.11 Attachments: aa.jpg, spill.patch I encountered a reducer's OOM because of java.io.DeleteOnExitHook. And I found out that the InternalCachedBag creates a separate tmp file, and the tmp files are deleted on exit. So the file delete hook caused the OOM. Why not just hold the tmp file handle and spill only one tmp file? Too many tmp files may block the tasktracker start process, if the tmp files are not cleaned on time and the tasktracker restarts at this specific time.
[jira] [Updated] (PIG-2681) TestDriverPig.countStores() does not correctly count the number of stores for pig scripts using variables for the alias
[ https://issues.apache.org/jira/browse/PIG-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2681: Fix Version/s: (was: 0.10.1) (was: 0.9.3) (was: 0.11) 0.12 TestDriverPig.countStores() does not correctly count the number of stores for pig scripts using variables for the alias --- Key: PIG-2681 URL: https://issues.apache.org/jira/browse/PIG-2681 Project: Pig Issue Type: Test Components: e2e harness Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0 Reporter: Araceli Henley Fix For: 0.12 Attachments: PIG-2681.patch For pig macros where the out parameter is referenced in a store statement, TestDriverPig.countStores() does not correctly count the number of stores. For example, the store will not be counted in: define myMacro(in1,in2) returns A { A = load '$in1' using PigStorage('$delimeter') as (intnum1000: int,id: int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: float,doublenum: double); store $A into '$out'; } countStores() matches with: $count += $q[$i] =~ /store\s+[a-zA-Z][a-zA-Z0-9_]*\s+into/i; Since the alias starts with the special character $, the store is not counted and the test fails. Need to change this to: $count += $q[$i] =~ /store\s+(\$)?[a-zA-Z][a-zA-Z0-9_]*\s+into/i; I'll submit a patch shortly.
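The difference between the two patterns quoted in PIG-2681 can be checked with a small sketch. The e2e harness itself is Perl; Python is used here only for illustration, and `count_stores` and the sample lines are hypothetical stand-ins for `countStores()` and `@q`.

```python
import re

# Pattern currently used, as quoted in the ticket: alias must start with a letter.
OLD_STORE = re.compile(r"store\s+[a-zA-Z][a-zA-Z0-9_]*\s+into", re.IGNORECASE)
# Proposed pattern: allow an optional leading '$' before the alias.
NEW_STORE = re.compile(r"store\s+(\$)?[a-zA-Z][a-zA-Z0-9_]*\s+into", re.IGNORECASE)


def count_stores(lines, pattern):
    """Count the lines that contain a store statement, mirroring
    `$count += $q[$i] =~ /.../i` applied to each line."""
    return sum(1 for line in lines if pattern.search(line))


# Hypothetical script lines: one macro-style store and one ordinary store.
sample = [
    "A = load '$in1' using PigStorage('$delimeter');",
    "store $A into '$out';",
    "store B into 'plain.out';",
]
```

On this sample the old pattern sees only the ordinary store, while the proposed pattern also counts `store $A into '$out';`.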
[jira] [Commented] (PIG-2981) add e2e tests for DateTime data type
[ https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486636#comment-13486636 ] Olga Natkovich commented on PIG-2981: - Is anybody planning to add this or should it be moved to 0.12? add e2e tests for DateTime data type - Key: PIG-2981 URL: https://issues.apache.org/jira/browse/PIG-2981 Project: Pig Issue Type: Test Reporter: Thejas M Nair Fix For: 0.11 e2e tests for DateTime datatype need to be added.
[jira] [Resolved] (PIG-2974) StreamingLocal_11 e2e test hangs
[ https://issues.apache.org/jira/browse/PIG-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-2974. - Resolution: Duplicate StreamingLocal_11 e2e test hangs Key: PIG-2974 URL: https://issues.apache.org/jira/browse/PIG-2974 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Rohini Palaniswamy Fix For: 0.11
[jira] [Updated] (PIG-2630) Issue with setting b = a;
[ https://issues.apache.org/jira/browse/PIG-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2630: Fix Version/s: (was: 0.10.1) (was: 0.11) 0.12 Moving to 0.12 as no work has been done so far Issue with setting b = a; --- Key: PIG-2630 URL: https://issues.apache.org/jira/browse/PIG-2630 Project: Pig Issue Type: Bug Affects Versions: 0.10.0, 0.11 Reporter: Jonathan Coveney Fix For: 0.12
The following gives an error:
{code}
a = load 'thing' as (x:int);
b = a;
c = join a by x, b by x;
{code}
Error:
{code}
2012-04-03 14:02:47,434 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: Pig script failed to parse: line 14, column 4 pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
{code}
No issue with the following, however
{code}
a = load 'thing' as (x:int);
b = foreach a generate *;
c = join a by x, b by x;
{code}
oh and here is the log:
{code}
$ cat pig_1333487146863.log
Pig Stack Trace
---
ERROR 1200: Pig script failed to parse: line 3, column 4 pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
Failed to parse: Pig script failed to parse: line 3, column 4 pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
    at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
    at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1539)
    at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
    at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945)
    at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
    at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
    at org.apache.pig.Main.run(Main.java:535)
    at org.apache.pig.Main.main(Main.java:153)
Caused by: line 3, column 4 pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
    at org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
    at org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11441)
    at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1491)
    at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
    at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
    at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
    at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
    ... 10 more
{code}
[jira] [Updated] (PIG-3005) TestLargeFile#testOrderBy is failing
[ https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-3005: Affects Version/s: (was: 0.12) (was: 0.11) Fix Version/s: (was: 0.11) TestLargeFile#testOrderBy is failing Key: PIG-3005 URL: https://issues.apache.org/jira/browse/PIG-3005 Project: Pig Issue Type: Sub-task Environment: Mac OSX 10.6.8 Reporter: Jonathan Coveney Fix For: 0.12 When run locally, at least, this test is failing for me. Has anyone else noticed this failing?
Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy
Congrats, Rohini! - Original Message - From: Daniel Dai da...@hortonworks.com To: dev@pig.apache.org; u...@pig.apache.org Cc: Sent: Friday, October 26, 2012 4:37 PM Subject: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy Here is another Pig committer announcement today. Please welcome Rohini Palaniswamy to be a Pig committer! Thanks, Daniel
Re: Pig 0.11
There are 74 issues still open, and more than half are unassigned. I think we should narrow the list down next week. I am planning to start unlinking the unassigned ones next week, so if you feel they need to be addressed, please find an owner. Olga - Original Message - From: Olga Natkovich onatkov...@yahoo.com To: dev@pig.apache.org dev@pig.apache.org Cc: Sent: Monday, October 22, 2012 10:14 AM Subject: Re: Pig 0.11 There are still 76 unresolved JIRAs, more than half of them unassigned. Let's clean this up by the end of this week. I propose we do the following: (1) Unlink all JIRAs for new features; since we have already branched, we should not be taking on new work. If people feel strongly that some new features still need to go in, please bring it up. (2) For bug fixes, if people feel strongly that some of the unassigned issues need to be addressed, please take ownership. If you are unable to solve them but still feel they are important, please bring them up. (3) Owners of unresolved issues, please check whether you will have time to solve them in the next 2 weeks. If not, let's move them to 0.12. If you can't address them but feel they are important, please bring it up. Let's make sure that all JIRAs that require changes to the documentation have appropriate information in the release notes section so that we can quickly compile release documentation. Thanks for your help! Olga From: Alan Gates ga...@hortonworks.com To: dev@pig.apache.org Sent: Monday, October 15, 2012 11:55 AM Subject: Re: Pig 0.11 At this point no one has taken on release documentation for 0.11. Alan. On Oct 15, 2012, at 11:49 AM, Olga Natkovich wrote: Thanks! Are you talking about items 15 and 16 on the How To Release.Publish page? Also, who is doing release documentation these days? I can help with that as well. I would also be happy to roll the release if you guys need help with that. 
Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Cc: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 12, 2012 5:59 PM Subject: Re: Pig 0.11 Thanks Olga and welcome back! I know there's some process for linking jiras to releases, but I'm not sure what that is. If you could explain and maybe cover a portion of that work, that'd be super helpful. And reviews, of course. On Oct 12, 2012, at 2:06 PM, Olga Natkovich onatkov...@yahoo.com wrote: Dmitry, I would be happy to help with the release process. Want to get back into this now that I am back at work. Let me know what you would like me to do. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org Cc: billgra...@gmail.com Sent: Thursday, October 11, 2012 2:44 PM Subject: Re: Pig 0.11 Ok I will branch 0.11 tomorrow morning unless someone objects. From then on, committers should be careful to commit bug fixes to both 0.11 branch and trunk; minor polish can go into the branch, but whole new features should not (we can discuss on the list if something is in the gray area). D On Thu, Oct 11, 2012 at 2:16 PM, Gianmarco De Francisci Morales g...@apache.org wrote: I added it as a dependency as it has already its own Jira. I hope it is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. 
-- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this. -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Commented] (PIG-2328) Add builtin UDFs for building and using bloom filters
[ https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482451#comment-13482451 ] Olga Natkovich commented on PIG-2328: - These are in builtins, at least according to the patch, so they need to be in the docs. I will create a doc patch; I just was not sure whether the documentation was in a different place. Add builtin UDFs for building and using bloom filters - Key: PIG-2328 URL: https://issues.apache.org/jira/browse/PIG-2328 Project: Pig Issue Type: New Feature Components: internal-udfs Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.10.0, 0.11 Attachments: PIG-bloom-2.patch, PIG-bloom-3.patch, PIG-bloom.patch Bloom filters are a common way to select a limited set of records before moving data for a join or other heavyweight operation. Pig should add UDFs to support building and using bloom filters.
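To illustrate the idea in the PIG-2328 description above (build a filter on the small side, then use it to drop records from the large side before the join), here is a minimal sketch. It is not the Pig UDF implementation: the `BloomFilter` class, its sizing parameters, the SHA-256 based hashing, and the sample data are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter (hypothetical; not the code behind Pig's builtin UDFs)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


# Build the filter from the small relation's join keys.
small = {"k1": "a", "k7": "b"}
bloom = BloomFilter()
for key in small:
    bloom.add(key)

# Pre-filter the large relation, then do the real join on the survivors.
large = [(f"k{i}", i) for i in range(100)]
candidates = [rec for rec in large if bloom.might_contain(rec[0])]
joined = [(k, v, small[k]) for k, v in candidates if k in small]
print(joined)
```

A Bloom filter never produces false negatives, so no matching record is lost; false positives are possible, which is why the exact key check still happens in the join step.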
Re: Pig 0.11
There are still 76 unresolved JIRAs, more than half of them unassigned. Let's clean this up by the end of this week. I propose we do the following: (1) Unlink all JIRAs for new features; since we have already branched, we should not be taking on new work. If people feel strongly that some new features still need to go in, please bring it up. (2) For bug fixes, if people feel strongly that some of the unassigned issues need to be addressed, please take ownership. If you are unable to solve them but still feel they are important, please bring them up. (3) Owners of unresolved issues, please check whether you will have time to solve them in the next 2 weeks. If not, let's move them to 0.12. If you can't address them but feel they are important, please bring it up. Let's make sure that all JIRAs that require changes to the documentation have appropriate information in the release notes section so that we can quickly compile release documentation. Thanks for your help! Olga From: Alan Gates ga...@hortonworks.com To: dev@pig.apache.org Sent: Monday, October 15, 2012 11:55 AM Subject: Re: Pig 0.11 At this point no one has taken on release documentation for 0.11. Alan. On Oct 15, 2012, at 11:49 AM, Olga Natkovich wrote: Thanks! Are you talking about items 15 and 16 on the How To Release.Publish page? Also, who is doing release documentation these days? I can help with that as well. I would also be happy to roll the release if you guys need help with that. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Cc: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 12, 2012 5:59 PM Subject: Re: Pig 0.11 Thanks Olga and welcome back! I know there's some process for linking jiras to releases, but I'm not sure what that is. If you could explain and maybe cover a portion of that work, that'd be super helpful. And reviews, of course. 
On Oct 12, 2012, at 2:06 PM, Olga Natkovich onatkov...@yahoo.com wrote: Dmitry, I would be happy to help with the release process. Want to get back into this now that I am back at work. Let me know what you would like me to do. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org Cc: billgra...@gmail.com Sent: Thursday, October 11, 2012 2:44 PM Subject: Re: Pig 0.11 Ok I will branch 0.11 tomorrow morning unless someone objects. From then on, committers should be careful to commit bug fixes to both 0.11 branch and trunk; minor polish can go into the branch, but whole new features should not (we can discuss on the list if something is in the gray area). D On Thu, Oct 11, 2012 at 2:16 PM, Gianmarco De Francisci Morales g...@apache.org wrote: I added it as a dependency as it has already its own Jira. I hope it is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this. -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Commented] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481537#comment-13481537 ] Olga Natkovich commented on PIG-2353: - Can you please add a usage example to the release notes section, thanks! RANK function like in SQL - Key: PIG-2353 URL: https://issues.apache.org/jira/browse/PIG-2353 Project: Pig Issue Type: New Feature Reporter: Gianmarco De Francisci Morales Assignee: Allan Avendaño Labels: gsoc2012, mentor Fix For: 0.11 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch Implement a function that, given a (sorted) bag, adds to each tuple a unique, increasing identifier without gaps, like what RANK does for SQL. This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012 Functionality implemented so far is available at https://reviews.apache.org/r/5523/diff/#index_header
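The behaviour requested in the PIG-2353 description above (each tuple of a sorted bag gets a unique, increasing identifier without gaps) can be sketched in a few lines. This models only the semantics stated in the issue, not the operator's implementation; the name `rank_bag`, the choice to put the identifier in the first field, and the sample data are assumptions for illustration.

```python
def rank_bag(sorted_bag):
    """Prepend a 1-based, gap-free identifier to each tuple of an already sorted bag."""
    return [(position,) + tuple(row) for position, row in enumerate(sorted_bag, start=1)]


# Hypothetical input: (name, score) tuples, sorted by score in descending order first.
scores = sorted(
    [("carol", 71), ("alice", 93), ("bob", 85)],
    key=lambda row: row[1],
    reverse=True,
)
ranked = rank_bag(scores)
for row in ranked:
    print(row)
```

Because the identifier comes from the position in the sorted bag, it is unique per tuple even when two tuples share the same sort key.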
[jira] [Commented] (PIG-2710) Implement Naive CUBE operator
[ https://issues.apache.org/jira/browse/PIG-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481591#comment-13481591 ] Olga Natkovich commented on PIG-2710: - Could you please add release notes, including syntax and examples, for inclusion in the documentation? Thanks! Implement Naive CUBE operator - Key: PIG-2710 URL: https://issues.apache.org/jira/browse/PIG-2710 Project: Pig Issue Type: Sub-task Reporter: Dmitriy V. Ryaboy Assignee: Prasanth J Fix For: 0.11 Attachments: PIG-2710.1.patch The Naive CUBE operator is just syntactic sugar for the CubeDimensions UDFs followed by a flatten+group-by.
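The one-line description in PIG-2710 above (dimension expansion followed by flatten and group-by) can be made concrete with a small sketch. It only models the idea and is not Pig's CubeDimensions code: `cube_dimensions`, `naive_cube`, the use of `None` as the "all values" marker, and the sample records are assumptions for this example.

```python
from collections import defaultdict
from itertools import product


def cube_dimensions(dims):
    """Return all 2^n combinations of a dimension tuple, with None meaning 'all values'."""
    return [tuple(combo) for combo in product(*[(d, None) for d in dims])]


def naive_cube(records, num_dims):
    """Expand each record into every dimension combination, then group by combination."""
    groups = defaultdict(list)
    for rec in records:
        dims, measures = rec[:num_dims], rec[num_dims:]
        for combo in cube_dimensions(dims):  # the "flatten" step
            groups[combo].append(measures)   # the "group by" step
    return dict(groups)


# Hypothetical sales records: (product, year, amount).
sales = [("car", 2011, 10), ("car", 2012, 20), ("bike", 2012, 5)]
cubed = naive_cube(sales, num_dims=2)
totals = {key: sum(m[0] for m in rows) for key, rows in cubed.items()}
for key in sorted(totals, key=repr):
    print(key, totals[key])
```

Every input record is emitted 2^n times for n dimensions, which is why the approach is called naive: the shuffle volume grows quickly with the number of dimensions. The sketch also assumes the input dimension values are never None themselves.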
[jira] [Commented] (PIG-2600) Better Map support
[ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481596#comment-13481596 ] Olga Natkovich commented on PIG-2600: - Can you please add to the release notes the UDFs that were added, as well as their syntax and usage examples? This is for inclusion in the documentation, thanks! Better Map support -- Key: PIG-2600 URL: https://issues.apache.org/jira/browse/PIG-2600 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Prashant Kommireddi Fix For: 0.11 Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
- TOBAG should take a Map and output {(key, value)}
- TOMAP should take a Bag in that same form and make a map.
- KEYSET should return the set of keys.
- VALUESET should return the set of values.
- VALUELIST should return the List of values (no deduping).
- INVERSEMAP would return a Map of values => the set of keys that refer to that value
This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it. I'd love to hear other things that would be useful for people!
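The map utilities proposed in PIG-2600 above are easiest to compare side by side. The sketch below models the semantics as worded in the issue using plain Python dicts, lists, and sets; the function names mirror the proposed UDF names, but the bodies and the sample map are assumptions for illustration and say nothing about the committed signatures.

```python
def to_bag(m):
    """TOBAG: a map becomes a bag of (key, value) tuples."""
    return [(k, v) for k, v in m.items()]


def to_map(bag):
    """TOMAP: a bag of (key, value) tuples becomes a map."""
    return dict(bag)


def keyset(m):
    """KEYSET: the set of keys."""
    return set(m)


def valueset(m):
    """VALUESET: the set of distinct values."""
    return set(m.values())


def valuelist(m):
    """VALUELIST: all values, duplicates kept."""
    return list(m.values())


def inverse_map(m):
    """INVERSEMAP: each value maps to the set of keys that refer to it."""
    inverted = {}
    for k, v in m.items():
        inverted.setdefault(v, set()).add(k)
    return inverted


# Hypothetical sample map with a repeated value.
colors = {"apple": "red", "cherry": "red", "lime": "green"}
print(to_bag(colors))
print(sorted(valuelist(colors)), sorted(valueset(colors)))
print(inverse_map(colors))
```

The difference between VALUESET and VALUELIST shows up only when values repeat, and that is also the case where INVERSEMAP returns more than one key per value.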
[jira] [Commented] (PIG-2710) Implement Naive CUBE operator
[ https://issues.apache.org/jira/browse/PIG-2710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481603#comment-13481603 ] Olga Natkovich commented on PIG-2710: - Hi Prasanth, The release notes look great! Do they cover all the work we have done in this release for CUBE-related support? Implement Naive CUBE operator - Key: PIG-2710 URL: https://issues.apache.org/jira/browse/PIG-2710 Project: Pig Issue Type: Sub-task Reporter: Dmitriy V. Ryaboy Assignee: Prasanth J Fix For: 0.11 Attachments: PIG-2710.1.patch The Naive CUBE operator is just syntactic sugar for the CubeDimensions UDFs followed by a flatten+group-by.
Documentation planning for Pig 0.11 release
Hi, I have gone through the resolved JIRAs for 0.11, and here is what I believe needs to go into the documentation. Please let me know if I missed anything. Also, I have not looked at anything that has not yet been committed:
Bloom filter UDF: https://issues.apache.org/jira/browse/PIG-2328
Clear command in Grunt: https://issues.apache.org/jira/browse/PIG-2706 - this is already in the docs
RANK operator: https://issues.apache.org/jira/browse/PIG-2353 - this is already in the docs
UDF convenience classes: https://issues.apache.org/jira/browse/PIG-2547
More efficient tuple support: https://issues.apache.org/jira/browse/PIG-2359
Pluggable progress notification: https://issues.apache.org/jira/browse/PIG-2525
Merge join after ORDER BY: https://issues.apache.org/jira/browse/PIG-2673
Measure time spent in UDF: https://issues.apache.org/jira/browse/PIG-2855
Storage func improvements: https://issues.apache.org/jira/browse/PIG-1891
UDFs to flatten bags: https://issues.apache.org/jira/browse/PIG-2166
Make Tuple iterable: https://issues.apache.org/jira/browse/PIG-2724
New accumulate interface: https://issues.apache.org/jira/browse/PIG-2651
Ruby UDFs: https://issues.apache.org/jira/browse/PIG-2317 - looks like this is also in 0.10. Was documentation for this committed to 0.10?
Re-aliasing: https://issues.apache.org/jira/browse/PIG-438 - looks like this is also in 0.10. Was documentation for this committed to 0.10?
Groovy UDFs: https://issues.apache.org/jira/browse/PIG-2763 - docs already committed
Naive CUBE operator: https://issues.apache.org/jira/browse/PIG-2710 - docs at: http://goo.gl/SpUad
Better map support: https://issues.apache.org/jira/browse/PIG-2600 - this needs release notes to include in the docs.
Olga
[jira] [Commented] (PIG-2600) Better Map support
[ https://issues.apache.org/jira/browse/PIG-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481738#comment-13481738 ] Olga Natkovich commented on PIG-2600: - Yes, please put all the information into the release notes. This way it is much easier to create a documentation patch. Better Map support -- Key: PIG-2600 URL: https://issues.apache.org/jira/browse/PIG-2600 Project: Pig Issue Type: Improvement Reporter: Jonathan Coveney Assignee: Prashant Kommireddi Fix For: 0.11 Attachments: PIG-2600_2.patch, PIG-2600_3.patch, PIG-2600_4.patch, PIG-2600_5.patch, PIG-2600_6.patch, PIG-2600_7.patch, PIG-2600_8.patch, PIG-2600_9.patch, PIG-2600.patch
It would be nice if Pig played better with Maps. To that end, I'd like to add a lot of utility around Maps.
- TOBAG should take a Map and output {(key, value)}
- TOMAP should take a Bag in that same form and make a map.
- KEYSET should return the set of keys.
- VALUESET should return the set of values.
- VALUELIST should return the List of values (no deduping).
- INVERSEMAP would return a Map of values => the set of keys that refer to that value
This would all be pretty easy. A more substantial piece of work would be to make Pig support non-String keys (this is especially an issue since UDFs and whatnot probably assume that they are all Integers). Not sure if it is worth it. I'd love to hear other things that would be useful for people!
[jira] [Commented] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481817#comment-13481817 ] Olga Natkovich commented on PIG-2353: - Yes, I think that's fine - I did not realize it was covered in a separate JIRA, thanks! RANK function like in SQL - Key: PIG-2353 URL: https://issues.apache.org/jira/browse/PIG-2353 Project: Pig Issue Type: New Feature Reporter: Gianmarco De Francisci Morales Assignee: Allan Avendaño Labels: gsoc2012, mentor Fix For: 0.11 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch Implement a function that, given a (sorted) bag, adds to each tuple a unique, increasing identifier without gaps, like what RANK does for SQL. This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012 Functionality implemented so far is available at https://reviews.apache.org/r/5523/diff/#index_header
[jira] [Assigned] (PIG-2756) Documentation for 0.11
[ https://issues.apache.org/jira/browse/PIG-2756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-2756: --- Assignee: Olga Natkovich Documentation for 0.11 -- Key: PIG-2756 URL: https://issues.apache.org/jira/browse/PIG-2756 Project: Pig Issue Type: Bug Components: documentation Affects Versions: 0.11 Reporter: Bill Graham Assignee: Olga Natkovich Fix For: 0.11 Tracking areas where we need documentation on the pig.apache.org site (Javadocs are typically pretty good). We can open child tasks as needed. Please add to the list if you know of others. * Pluggable {{PigProgressNotificationListener}} isn't in the docs * Pluggable reducer estimators (see PIG-2574) * ILLUSTRATE seems to have dropped off the docs * {{HBaseStorage}} (see PIG-2341)
[jira] [Updated] (PIG-1314) Add DateTime Support to Pig
[ https://issues.apache.org/jira/browse/PIG-1314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-1314: Fix Version/s: 0.11 Add DateTime Support to Pig --- Key: PIG-1314 URL: https://issues.apache.org/jira/browse/PIG-1314 Project: Pig Issue Type: Bug Components: data Affects Versions: 0.7.0 Reporter: Russell Jurney Assignee: Zhijie Shen Labels: gsoc2012 Fix For: 0.11 Attachments: joda_vs_builtin.zip, PIG-1314-1.patch, PIG-1314-2.patch, PIG-1314-3.patch, PIG-1314-4.patch, PIG-1314-5.patch, PIG-1314-6.patch, PIG-1314-7.patch Original Estimate: 672h Remaining Estimate: 672h Hadoop/Pig are primarily used to parse log data, and most logs have a timestamp component. Therefore Pig should support dates as a primitive. Can someone familiar with adding types to pig comment on how hard this is? We're looking at doing this, rather than use UDFs. Is this a patch that would be accepted? This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
[jira] [Assigned] (PIG-2980) documentation for DateTime datatype
[ https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reassigned PIG-2980: --- Assignee: Zhijie Shen documentation for DateTime datatype --- Key: PIG-2980 URL: https://issues.apache.org/jira/browse/PIG-2980 Project: Pig Issue Type: Bug Components: documentation Reporter: Thejas M Nair Assignee: Zhijie Shen Fix For: 0.11 Documentation for new DateTime type needs to be added.
[jira] [Commented] (PIG-2980) documentation for DateTime datatype
[ https://issues.apache.org/jira/browse/PIG-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481938#comment-13481938 ] Olga Natkovich commented on PIG-2980: - Sounds good. Zhijie, please, re-assign to me once you provide the information, thanks! documentation for DateTime datatype --- Key: PIG-2980 URL: https://issues.apache.org/jira/browse/PIG-2980 Project: Pig Issue Type: Bug Components: documentation Reporter: Thejas M Nair Assignee: Zhijie Shen Fix For: 0.11 Documentation for new DateTime type needs to be added.
[jira] [Commented] (PIG-2328) Add builtin UDFs for building and using bloom filters
[ https://issues.apache.org/jira/browse/PIG-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481948#comment-13481948 ] Olga Natkovich commented on PIG-2328: - Looks like the change made it into 0.10, but what about documentation? I could not find it in builtins, but I just want to make sure it was not put in some other place. Add builtin UDFs for building and using bloom filters - Key: PIG-2328 URL: https://issues.apache.org/jira/browse/PIG-2328 Project: Pig Issue Type: New Feature Components: internal-udfs Reporter: Alan Gates Assignee: Alan Gates Fix For: 0.10.0, 0.11 Attachments: PIG-bloom-2.patch, PIG-bloom-3.patch, PIG-bloom.patch Bloom filters are a common way to select a limited set of records before moving data for a join or other heavyweight operation. Pig should add UDFs to support building and using bloom filters.
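The pre-join filtering idea described in PIG-2328 can be sketched in a few lines of Python. This is only an illustration of the general technique, not the UDFs from the attached patches: the class name `BloomFilter`, the helper `prefilter`, and the parameters `num_bits` and `num_hashes` are all hypothetical names chosen for this sketch.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter sketch (hypothetical, not Pig's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def prefilter(records, key_index, bloom):
    """Keep only records whose join key may appear on the other side."""
    return [r for r in records if bloom.might_contain(r[key_index])]


# Build the filter from the small side of the join, then trim the large side.
small_side_keys = ["alice", "bob"]
bloom = BloomFilter(num_bits=256, num_hashes=3)
for k in small_side_keys:
    bloom.add(k)

large_side = [("alice", 1), ("carol", 2), ("bob", 3)]
candidates = prefilter(large_side, 0, bloom)
```

The filter never drops a record whose key was added, but it may keep a few that do not match, so the real join still has to run on the surviving records.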
Re: Pig 0.11
Thanks! Are you talking about items 15 and 16 on the How To Release.Publish page? Also, who is doing release documentation these days? I can help with that as well. I would also be happy to roll the release if you guys need help with that. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org dev@pig.apache.org Cc: dev@pig.apache.org dev@pig.apache.org Sent: Friday, October 12, 2012 5:59 PM Subject: Re: Pig 0.11 Thanks Olga and welcome back! I know there's some process for linking jiras to releases, but I'm not sure what that is. If you could explain and maybe cover a portion of that work, that'd be super helpful. And reviews, of course. On Oct 12, 2012, at 2:06 PM, Olga Natkovich onatkov...@yahoo.com wrote: Dmitry, I would be happy to help with the release process. Want to get back into this now that I am back at work. Let me know what you would like me to do. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org Cc: billgra...@gmail.com Sent: Thursday, October 11, 2012 2:44 PM Subject: Re: Pig 0.11 Ok I will branch 0.11 tomorrow morning unless someone objects. From then on, committers should be careful to commit bug fixes to both 0.11 branch and trunk; minor polish can go into the branch, but whole new features should not (we can discuss on the list if something is in the gray area). D On Thu, Oct 11, 2012 at 2:16 PM, Gianmarco De Francisci Morales g...@apache.org wrote: I added it as a dependency as it has already its own Jira. I hope it is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. 
On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this. -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
Re: Pig 0.11
Dmitry, I would be happy to help with the release process. Want to get back into this now that I am back at work. Let me know what you would like me to do. Olga From: Dmitriy Ryaboy dvrya...@gmail.com To: dev@pig.apache.org Cc: billgra...@gmail.com Sent: Thursday, October 11, 2012 2:44 PM Subject: Re: Pig 0.11 Ok I will branch 0.11 tomorrow morning unless someone objects. From then on, committers should be careful to commit bug fixes to both 0.11 branch and trunk; minor polish can go into the branch, but whole new features should not (we can discuss on the list if something is in the gray area). D On Thu, Oct 11, 2012 at 2:16 PM, Gianmarco De Francisci Morales g...@apache.org wrote: I added it as a dependency as it has already its own Jira. I hope it is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this. -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Updated] (PIG-2442) Multiple Stores in pig streaming causes infinite waiting
[ https://issues.apache.org/jira/browse/PIG-2442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2442: Fix Version/s: 0.10 Multiple Stores in pig streaming causes infinite waiting Key: PIG-2442 URL: https://issues.apache.org/jira/browse/PIG-2442 Project: Pig Issue Type: Bug Affects Versions: 0.8.1, 0.9.0 Reporter: Anitha Raju Fix For: 0.10
Hi, If there are multiple stores in a pig streaming script, it goes into infinite waiting.
Script
{code}
DEFINE SCRIPT `./a.pl` SHIP ('/homes/anithar/a.pl');;
DEFINE SCRIPT1 `./b.pl` SHIP ('/homes/anithar/b.pl');;
A = LOAD 'test.txt' USING PigStorage() ;
B1 = STREAM A THROUGH SCRIPT ;
B1 = foreach B1 generate $0;
STORE B1 INTO 'B1' USING PigStorage();
B2 = STREAM B1 THROUGH SCRIPT1;
STORE B2 INTO 'B2' USING PigStorage();
{code}
a.pl
#! /usr/bin/perl -w
while (my $line = <STDIN>) {
    print uc($line);
}
b.pl
#! /usr/bin/perl -w
while (my $line = <STDIN>) {
    print $line;
}
Input (test.txt)
{code}
test
hi
hello
{code}
This infinite waiting happens randomly, causing the job to fail with "Task attempt failed to report status for 605 seconds. Killing!". The same happens with version 0.8 too. Regards, Anitha
[jira] [Updated] (PIG-2426) ProgressableReporter.progress(String msg) is an empty function
[ https://issues.apache.org/jira/browse/PIG-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2426: Fix Version/s: 0.10 ProgressableReporter.progress(String msg) is an empty function -- Key: PIG-2426 URL: https://issues.apache.org/jira/browse/PIG-2426 Project: Pig Issue Type: Bug Affects Versions: 0.8.1, 0.9.1 Reporter: Vivek Padmanabhan Assignee: Vivek Padmanabhan Priority: Minor Fix For: 0.10 Attachments: PIG-2426_1.patch In current implementation the reporter function ProgressableReporter.progress(String msg) is an empty function. If I have a long running UDF and I want update the status using a message, the preferred way is to use this api. The previous implementation of ProgressableReporter used org.apache.hadoop.mapred.Reporter api directly. But the currently used org.apache.hadoop.util.Progressable interface does not have api to set status as a given message. Hence I believe the empty method. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
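The gap described in PIG-2426 can be illustrated with a small Python sketch. It is not the code from PIG-2426_1.patch and does not use the Hadoop API: `ProgressOnly` stands in for an interface that can only signal liveness (like Progressable), `StatusReporter` stands in for one that can also carry a message (like Reporter), and `ProgressableReporterSketch` is a hypothetical wrapper that forwards the message only when the delegate supports it.

```python
class ProgressOnly:
    """Stand-in for an interface that can only signal liveness."""

    def __init__(self):
        self.ticks = 0

    def progress(self):
        self.ticks += 1


class StatusReporter(ProgressOnly):
    """Stand-in for an interface that can also carry a status message."""

    def __init__(self):
        super().__init__()
        self.status = None

    def set_status(self, msg):
        self.status = msg


class ProgressableReporterSketch:
    """Hypothetical wrapper: forward the message if the delegate accepts one."""

    def __init__(self, delegate):
        self.delegate = delegate

    def progress(self, msg=None):
        if msg is not None:
            set_status = getattr(self.delegate, "set_status", None)
            if callable(set_status):
                set_status(msg)
        # Always signal liveness, so a long-running UDF is not killed
        # even when the message itself cannot be delivered.
        self.delegate.progress()


rich = StatusReporter()
ProgressableReporterSketch(rich).progress("50% done")

plain = ProgressOnly()
ProgressableReporterSketch(plain).progress("message is dropped here")
```

With the liveness-only delegate the message is silently dropped, which mirrors why the method ended up empty; the difference in this sketch is that the heartbeat is still sent.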
[jira] [Commented] (PIG-2410) Piggybank does not compile in 23
[ https://issues.apache.org/jira/browse/PIG-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13169789#comment-13169789 ] Olga Natkovich commented on PIG-2410: - I believe that the HadoopJobHistoryLoader.java issue was traced to JobHistory interface changes. I also remember that there was a discussion of having 2 versions of the function. I believe that is confusing for users and we should make an effort to shim it the same way we did for Pig core code. Piggybank does not compile in 23 Key: PIG-2410 URL: https://issues.apache.org/jira/browse/PIG-2410 Project: Pig Issue Type: Bug Components: piggybank Affects Versions: 0.10, 0.9.2, 0.11 Reporter: Daniel Dai Assignee: Daniel Dai Labels: hadoop2.0 Fix For: 0.10, 0.9.2, 0.11 These do not compile: AllLoader.java HiveRCInputFormat.java HadoopJobHistoryLoader.java HiveColumnarLoader.java PathPartitionHelper.java IndexedStorage.java
[jira] [Resolved] (PIG-2390) Support for empty schema in AS () syntax is broken
[ https://issues.apache.org/jira/browse/PIG-2390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich resolved PIG-2390. - Resolution: Won't Fix Support for empty schema in AS () syntax is broken - Key: PIG-2390 URL: https://issues.apache.org/jira/browse/PIG-2390 Project: Pig Issue Type: Bug Affects Versions: 0.9.1 Reporter: Francis Liu running this command in pig 0.8 works: A = LOAD 'myfile.txt' USING PigStorage('\t') AS () but in 0.9, you get: ERROR 1200: line 1, column 49 mismatched input ')' expecting IDENTIFIER_L -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2374) streaming regression with dotNext
[ https://issues.apache.org/jira/browse/PIG-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13164769#comment-13164769 ] Olga Natkovich commented on PIG-2374: - I think Ashutosh is bringing up a really good point. We always seem to be fixing things in Pig because, understandably, it is easier for us. However, if Hadoop is breaking the contract, they should be fixing this, especially if we have to pay a performance penalty for it. streaming regression with dotNext - Key: PIG-2374 URL: https://issues.apache.org/jira/browse/PIG-2374 Project: Pig Issue Type: Bug Environment: hadoopApache Pig version 0.9.2.101150 (r1200499) compiled Nov 10 2011, 19:50:15 -bash-3.1$ hadoop version Hadoop 0.23.0.080202 Subversion http://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.23.0/hadoop-common-project/hadoop-common -r 1196973 Compiled by hadoopqa on Tue Nov 8 02:12:04 PST 2011 From source with checksum 4e42b2d96c899a98a8ab8c7cc23f27ae Reporter: Araceli Henley Assignee: Daniel Dai Labels: hadoop2.0 Fix For: 0.9.2 Attachments: PIG-2374-1.patch
Streaming seems to be broken in dotNext. There are several tests that are failing. The results from C below are clean. The results from D, which are streamed through CMD, contain control characters in some of the output.
define CMD `perl GroupBy.pl '\t' 0` ship('/homes/monster/pigtest/pigtest_next/pigharness/dist/pig_harness/libexec/PigTest/GroupBy.pl');
A = load '/user/user1/pig/tests/data/singlefile/studenttab10k';
B = group A by $0;
C = foreach B generate flatten(A);
D = stream C through CMD;
store C into '/user/user1/pig/out/user1.1321117428/ComputeSpec_7_C.out';
store D into '/user/user1/pig/out/user1.1321117428/ComputeSpec_7_D.out';
Other streaming tests that fail with control characters:
TEST FAILED ComputeSpec_7
TEST FAILED ComputeSpec_8
TEST FAILED ComputeSpec_10
TEST FAILED ComputeSpec_11
TEST FAILED ComputeSpec_12
TEST FAILED JobManagement_2
TEST FAILED JobManagement_3
TEST FAILED StreamingIO_4
TEST FAILED NonStreaming_1
TEST FAILED MultiQuery_21
...
[jira] [Updated] (PIG-2404) NullPointerException when I have multiple python udfs
[ https://issues.apache.org/jira/browse/PIG-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2404: Assignee: xuting zhao NullPointerException when I have multiple python udfs - Key: PIG-2404 URL: https://issues.apache.org/jira/browse/PIG-2404 Project: Pig Issue Type: Bug Affects Versions: 0.8.1, 0.9.1 Reporter: Vivek Padmanabhan Assignee: xuting zhao Fix For: 0.9.2
When I have multiple python udfs registered, the script fails at the compile phase while trying to get the udf output schema.
{code}
register 'a.py' using org.apache.pig.scripting.jython.JythonScriptEngine as a_func;
register 'b.py' using org.apache.pig.scripting.jython.JythonScriptEngine as b_func;
a = load 'i1' as (f1:chararray);
b = foreach a generate a_func.helloworld(), b_func.square(3);
dump b;
{code}
a.py
{code}
@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'
{code}
b.py
{code}
@outputSchemaFunction("squareSchema")
def square(num):
    return ((num)*(num))
{code}
Moreover, in the log we can see duplicate and incorrect registration of udfs, which I believe is the cause of the script failure.
INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: a_func.helloworld
INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: b_func.square
INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: b_func.helloworld
This issue is observed in 0.9, 0.8 and in trunk also.
[jira] [Updated] (PIG-2404) NullPointerException when I have multiple python udfs
[ https://issues.apache.org/jira/browse/PIG-2404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich updated PIG-2404: Fix Version/s: 0.9.2 NullPointerException when I have multiple python udfs - Key: PIG-2404 URL: https://issues.apache.org/jira/browse/PIG-2404 Project: Pig Issue Type: Bug Affects Versions: 0.8.1, 0.9.1 Reporter: Vivek Padmanabhan Assignee: xuting zhao Fix For: 0.9.2
When I have multiple python udfs registered, the script fails at the compile phase while trying to get the udf output schema.
{code}
register 'a.py' using org.apache.pig.scripting.jython.JythonScriptEngine as a_func;
register 'b.py' using org.apache.pig.scripting.jython.JythonScriptEngine as b_func;
a = load 'i1' as (f1:chararray);
b = foreach a generate a_func.helloworld(), b_func.square(3);
dump b;
{code}
a.py
{code}
@outputSchema("word:chararray")
def helloworld():
    return 'Hello, World'
{code}
b.py
{code}
@outputSchemaFunction("squareSchema")
def square(num):
    return ((num)*(num))
{code}
Moreover, in the log we can see duplicate and incorrect registration of udfs, which I believe is the cause of the script failure.
INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: a_func.helloworld
INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: b_func.square
INFO org.apache.pig.scripting.jython.JythonScriptEngine - Register scripting UDF: b_func.helloworld
This issue is observed in 0.9, 0.8 and in trunk also.
[jira] [Commented] (PIG-2391) Bzip_2 test is broken
[ https://issues.apache.org/jira/browse/PIG-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163897#comment-13163897 ] Olga Natkovich commented on PIG-2391: - Hi Xuting, Could you explain what is causing this regression? It is not obvious to me what the fix is doing. Also, what would happen if another store function like BinStorage is used? Thanks Bzip_2 test is broken - Key: PIG-2391 URL: https://issues.apache.org/jira/browse/PIG-2391 Project: Pig Issue Type: Bug Affects Versions: 0.10 Reporter: Olga Natkovich Assignee: xuting zhao Fix For: 0.10, 0.11 Attachments: PIG-2391.patch This test is currently commented out but if you uncomment it it fails with Pig 10 but runs successfully with Pig 9. Script: a = load '/homes/olgan/studenttab10k' using PigStorage() as (name, age, gpa); store a into 'intermediate.bz'; b = load 'intermediate.bz'; store b into 'final.bz'; A couple of observations: (1) Identical script (represented by Bzip_1 test) that has bz2 instead of bz extension in the script succeeds in Pig 10 (2) The problem occurs while reading intermediate.bz which has different size with Pig 9 and Pig 10 (3) Problem can be reproduced in local mode with small subset of data in the file (4) The following stack trace is observed: 2011-12-01 13:53:12,280 [Thread-22] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002 java.lang.RuntimeException: java.io.IOException: compressedStream EOF at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:237) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.init(PigRecordReader.java:109) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:119) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: java.io.IOException: compressedStream EOF at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:92) at org.apache.tools.bzip2r.CBZip2InputStream.compressedStreamEOF(CBZip2InputStream.java:96) at org.apache.tools.bzip2r.CBZip2InputStream.bsR(CBZip2InputStream.java:451) at org.apache.tools.bzip2r.CBZip2InputStream.initBlock(CBZip2InputStream.java:348) at org.apache.tools.bzip2r.CBZip2InputStream.init(CBZip2InputStream.java:220) at org.apache.pig.bzip2r.Bzip2TextInputFormat$BZip2LineRecordReader.init(Bzip2TextInputFormat.java:105) at org.apache.pig.bzip2r.Bzip2TextInputFormat.createRecordReader(Bzip2TextInputFormat.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:227) ... 5 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2391) Bzip_2 test is broken
[ https://issues.apache.org/jira/browse/PIG-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13163899#comment-13163899 ] Olga Natkovich commented on PIG-2391: - Looks like our comments crossed. So the issue is that Hadoop does not understand .bz extension and you need to fake it by saying it is actually bz2. Bzip_2 test is broken - Key: PIG-2391 URL: https://issues.apache.org/jira/browse/PIG-2391 Project: Pig Issue Type: Bug Affects Versions: 0.10 Reporter: Olga Natkovich Assignee: xuting zhao Fix For: 0.10, 0.11 Attachments: PIG-2391.patch This test is currently commented out but if you uncomment it it fails with Pig 10 but runs successfully with Pig 9. Script: a = load '/homes/olgan/studenttab10k' using PigStorage() as (name, age, gpa); store a into 'intermediate.bz'; b = load 'intermediate.bz'; store b into 'final.bz'; A couple of observations: (1) Identical script (represented by Bzip_1 test) that has bz2 instead of bz extension in the script succeeds in Pig 10 (2) The problem occurs while reading intermediate.bz which has different size with Pig 9 and Pig 10 (3) Problem can be reproduced in local mode with small subset of data in the file (4) The following stack trace is observed: 2011-12-01 13:53:12,280 [Thread-22] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0002 java.lang.RuntimeException: java.io.IOException: compressedStream EOF at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:237) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.init(PigRecordReader.java:109) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.createRecordReader(PigInputFormat.java:119) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:588) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177) Caused by: 
java.io.IOException: compressedStream EOF at org.apache.tools.bzip2r.CBZip2InputStream.cadvise(CBZip2InputStream.java:92) at org.apache.tools.bzip2r.CBZip2InputStream.compressedStreamEOF(CBZip2InputStream.java:96) at org.apache.tools.bzip2r.CBZip2InputStream.bsR(CBZip2InputStream.java:451) at org.apache.tools.bzip2r.CBZip2InputStream.initBlock(CBZip2InputStream.java:348) at org.apache.tools.bzip2r.CBZip2InputStream.init(CBZip2InputStream.java:220) at org.apache.pig.bzip2r.Bzip2TextInputFormat$BZip2LineRecordReader.init(Bzip2TextInputFormat.java:105) at org.apache.pig.bzip2r.Bzip2TextInputFormat.createRecordReader(Bzip2TextInputFormat.java:244) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.initNextRecordReader(PigRecordReader.java:227) ... 5 more -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
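The diagnosis in the comment above (a .bz extension is not recognized, so it has to be treated as bz2) can be sketched as an extension-alias lookup. This is an assumption-laden illustration, not the logic in PIG-2391.patch and not Hadoop's codec lookup: the tables `CODEC_BY_EXTENSION` and `EXTENSION_ALIASES` and the function `codec_for` are hypothetical names.

```python
import os

# Hypothetical table of extensions a codec lookup recognizes.
CODEC_BY_EXTENSION = {".bz2": "bzip2", ".gz": "gzip"}

# Hypothetical alias table: treat ".bz" as if it were ".bz2".
EXTENSION_ALIASES = {".bz": ".bz2"}


def codec_for(path):
    """Return the codec name for a path, or None if it is uncompressed."""
    ext = os.path.splitext(path)[1].lower()
    ext = EXTENSION_ALIASES.get(ext, ext)
    return CODEC_BY_EXTENSION.get(ext)


# Paths taken from the Bzip_2 test script.
store_codec = codec_for("intermediate.bz")
load_codec = codec_for("intermediate.bz")
```

The point of the sketch is that the store and the load resolve the same path to the same codec; a mismatch between the two sides is one way a reader could hit an unexpected end of stream.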