[jira] [Resolved] (PIG-3715) CLONE - Default split destination
[ https://issues.apache.org/jira/browse/PIG-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-3715. - Resolution: Fixed CLONE - Default split destination - Key: PIG-3715 URL: https://issues.apache.org/jira/browse/PIG-3715 Project: Pig Issue Type: New Feature Reporter: Hardik Assignee: Gianmarco De Francisci Morales Labels: gsoc2011 Fix For: 0.10.0 The SPLIT statement would benefit from a default destination, e.g.: {code} SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6), OTHER otherwise; -- OTHER has all tuples with f1>=7 && f2!=5 && f3==6 {code} This is a candidate project for Google Summer of Code 2011. More information about the program can be found at http://wiki.apache.org/pig/GSoc2011 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
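The semantics requested in this issue can be illustrated outside Pig with a minimal Python sketch. It assumes the predicates f1 < 7, f2 == 5 and f3 != 6 (the comparison operators are an assumption), and the function name split_with_default is hypothetical. As with Pig's SPLIT, a tuple may land in more than one branch; only tuples that match no branch go to the default destination.

```python
def split_with_default(tuples, branches, default_name="OTHER"):
    """Route each tuple to every branch whose predicate holds.

    Tuples that satisfy no predicate are collected under default_name,
    mirroring the proposed 'otherwise' destination. Illustrative only.
    """
    out = {name: [] for name, _ in branches}
    out[default_name] = []
    for t in tuples:
        matched = False
        for name, predicate in branches:
            if predicate(t):
                out[name].append(t)
                matched = True
        if not matched:
            out[default_name].append(t)
    return out


# Example relation A with fields (f1, f2, f3); the data is made up.
A = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (7, 1, 6)]
branches = [
    ("X", lambda t: t[0] < 7),
    ("Y", lambda t: t[1] == 5),
    ("Z", lambda t: t[2] < 6 or t[2] > 6),
]
result = split_with_default(A, branches)
```

Here (4, 5, 6) ends up in both X and Y, while (7, 1, 6) fails every condition and is the only tuple routed to OTHER.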
[jira] [Commented] (PIG-4392) RANK BY fails when default_parallel is greater than cardinality of field being ranked by
[ https://issues.apache.org/jira/browse/PIG-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304961#comment-14304961 ] Gianmarco De Francisci Morales commented on PIG-4392: - Yes, the order is guaranteed by RANK, which uses ORDER BY. Not sure about the reversed order either, otherwise LGTM +1. RANK BY fails when default_parallel is greater than cardinality of field being ranked by Key: PIG-4392 URL: https://issues.apache.org/jira/browse/PIG-4392 Project: Pig Issue Type: Bug Affects Versions: 0.11.1 Reporter: Anthony Hsu Assignee: Daniel Dai Fix For: 0.15.0 Attachments: PIG-4392-1.patch To reproduce: {code:title=input.txt} 1 2 3 4 5 6 7 8 9 {code} {code:title=rank.pig} set default_parallel 4; d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int); e = rank d by a; dump e; {code} If {{default_parallel}} is set to {{3}}, the script succeeds. So I'm guessing RANK BY has issues if the {{default_parallel}} exceeds the cardinality of the field being ranked by. I'm seeing this issue with Pig 0.11.1 (which has the PIG-2932 patch applied) and Hadoop 2.3.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [RESULT] [VOTE] Drop support for Hadoop 0.20 from Pig 0.14
We should add this in the release notes for 0.14 as well. Cheers, -- Gianmarco On 29 September 2014 19:25, Rohini Palaniswamy rohini.adi...@gmail.com wrote: My +1 as well. With 6 binding +1s, 8 non-binding +1s and no -1s this vote passes. Nothing special to address for this. PIG-3507 which went into Pig 0.14 used UserGroupInformation class without reflection and so Pig 0.14 is already incompatible with Hadoop 0.20. Regards, Rohini On Mon, Sep 22, 2014 at 5:56 PM, Thejas Nair the...@hortonworks.com wrote: +1 On Thu, Sep 18, 2014 at 5:50 PM, Mona Chitnis mona.chit...@yahoo.in wrote: +1 (non-binding) Mona Chitnis Yahoo! On Thursday, September 18, 2014 8:48 AM, Ashutosh Chauhan hashut...@apache.org wrote: +1 On Wed, Sep 17, 2014 at 7:02 PM, Daniel Dai da...@hortonworks.com wrote: +1 On Wed, Sep 17, 2014 at 11:12 AM, Prashant Kommireddi prash1...@gmail.com wrote: +1 On Wed, Sep 17, 2014 at 8:44 AM, Cheolsoo Park piaozhe...@gmail.com wrote: +1 On Wed, Sep 17, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com wrote: +1 On Wed, Sep 17, 2014 at 7:04 AM, Julien Le Dem jul...@ledem.net wrote: +1 Julien -Original Message- From: Rohini Palaniswamy [mailto:rohini.adi...@gmail.com] Sent: Wednesday, September 17, 2014 12:38 PM To: dev@pig.apache.org Subject: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14 Hi, Hadoop has matured far from Hadoop 0.20 and has had two major releases after that and there has been no development on branch-0.20 ( http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3 years now. It is high time we drop support for Hadoop 0.20 and only support Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance effort and also enable us to write more efficient code and cut down on reflections. Vote closes on Tuesday, Sep 23 2014. 
Thanks, Rohini -- CONFIDENTIALITY NOTICE NOTICE: This message is intended for the use of the individual or entity to which it is addressed and may contain information that is confidential, privileged and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, you are hereby notified that any printing, copying, dissemination, distribution, disclosure or forwarding of this communication is strictly prohibited. If you have received this communication in error, please contact the sender immediately and delete it from your system. Thank You.
Re: [VOTE] Drop support for JDK 6 from Pig 0.14
+1 -- Gianmarco On 17 September 2014 10:11, Lorand Bendig lben...@gmail.com wrote: +1 On 17/09/14 06:47, Rohini Palaniswamy wrote: Hi, Hadoop is dropping support for JDK6 from hadoop-2.7 this year as mentioned in the mail below. Pig should also move to JDK7 to be able to compile against future hadoop 2.x releases and start making releases with jars (binaries, maven repo) compiled in JDK 7. This would also open it up for developers to code with JDK7 specific APIs. Vote closes on Tuesday, Sep 23 2014. Thanks, Rohini -- Forwarded message -- From: Arun C Murthy a...@hortonworks.com Date: Tue, Aug 19, 2014 at 10:52 AM Subject: Dropping support for JDK6 in Apache Hadoop To: d...@hbase.apache.org d...@hbase.apache.org, d...@hive.apache.org, dev@pig.apache.org, d...@oozie.apache.org Cc: common-...@hadoop.apache.org common-...@hadoop.apache.org [Apologies for the wide distribution.] Dear HBase/Hive/Pig/Oozie communities, We, over at Hadoop, are considering dropping support for JDK6 this year. As you may be aware, we just released hadoop-2.5.0 and are now considering making the next release, i.e. hadoop-2.6.0, the *last* release of Apache Hadoop which supports JDK6. This means, from hadoop-2.7.0 onwards we will not support JDK6 anymore and we *may* start relying on JDK7-specific APIs. Now, the above is a proposal and we do not want to pull the trigger without talking to projects downstream - hence the request for your feedback. Please feel free to forward this to other communities you might deem to be at risk from this too. thanks, Arun
Re: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14
+1 -- Gianmarco On 17 September 2014 10:11, Lorand Bendig lben...@gmail.com wrote: +1 On 17/09/14 06:38, Rohini Palaniswamy wrote: Hi, Hadoop has matured far from Hadoop 0.20 and has had two major releases after that and there has been no development on branch-0.20 ( http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3 years now. It is high time we drop support for Hadoop 0.20 and only support Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance effort and also enable us to write more efficient code and cut down on reflections. Vote closes on Tuesday, Sep 23 2014. Thanks, Rohini
[jira] [Commented] (PIG-3900) SAMPLE and RANDOM should optionally stabilize their output from run-to-run, even across a large input set
[ https://issues.apache.org/jira/browse/PIG-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972525#comment-13972525 ] Gianmarco De Francisci Morales commented on PIG-3900: - +1 on the idea. SAMPLE and RANDOM should optionally stabilize their output from run-to-run, even across a large input set - Key: PIG-3900 URL: https://issues.apache.org/jira/browse/PIG-3900 Project: Pig Issue Type: Bug Reporter: Philip (flip) Kromer Priority: Minor Labels: features, random, sample, seed SAMPLE and RANDOM should be able to give output that is stable from run-to-run, yet random across a large input set. Although PIG-2965 allows the RANDOM function to be constructed with a seed, each mapper will generate the same sequence of values, which is unacceptable. It's typically undesirable to have the output of a large job be completely non-deterministic. Testing becomes difficult, and failed map tasks don't provide the same output from attempt to attempt, which complicates debugging. The most desirable implementation would provide a guarantee that a given seed and input data would produce an identical result in any environment. I believe this is difficult in a distributed environment, however. If each mapper added the index of its task ID to the provided seed, then the output would be stable for most practical purposes -- as long as the assignment of input splits to mappers doesn't change from job to job, the number produced for each row won't change from job to job. Doing it this way would be backwards compatible with the current Pig 0.12.0 implementation (PIG-2965) in the case of a single mapper (which is the only justifiable use of the current seed feature). Alternatively, one could use a hash of the input file path, the split offset, and the provided seed. Both approaches are not stable if the splitCombination logic is not stable. 
Suggested documentation for new functionality of RANDOM: {quote} This example constructs a function, providing a seed to control the series of numbers generated. Each of the three fields will have an independent series of random values, and the output will be stable from run to run. (Note that the result is only stable if the input splits remain stable). {code:sql} DEFINE rollRand RANDOM('12345'); DEFINE yawRand RANDOM('69'); DEFINE pitchRand RANDOM('42'); position = LOAD 'position.tsv'; orientation = FOREACH position GENERATE rollRand() AS roll:double, pitchRand() AS pitch:double, yawRand() AS yaw:double; {code} {quote} Suggested documentation for new functionality of SAMPLE: {quote} In this example, we provide a seed that stabilizes which rows are selected from run to run. (Note that the result is only stable if the input splits remain stable). {code:sql} a = LOAD 'a.txt'; b = SAMPLE a 0.1 SEED 42; {code} {quote} -- This message was sent by Atlassian JIRA (v6.2#6252)
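The two seeding schemes described in this issue (adding the task index to the user seed, or hashing the input path, split offset and seed) can be sketched in Python as follows. This is not Pig's implementation: the function names task_seed, split_seed and random_series are hypothetical, and SHA-256 is an assumed choice of hash used only because it is stable across processes.

```python
import hashlib
import random


def task_seed(user_seed: int, task_index: int) -> int:
    """Scheme 1: offset the user seed by the mapper's task index.

    With a single mapper (index 0) this reduces to the plain user seed.
    """
    return user_seed + task_index


def split_seed(user_seed: int, input_path: str, split_offset: int) -> int:
    """Scheme 2: derive the seed from the input path, split offset and user seed.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings and so is not reproducible.
    """
    key = f"{user_seed}|{input_path}|{split_offset}".encode("utf-8")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big")


def random_series(seed: int, n: int) -> list:
    """Generate n pseudo-random doubles in [0, 1) from a dedicated generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]
```

Under either scheme, two attempts of the same task regenerate the same series, while different tasks no longer repeat each other's values. Both remain stable only as long as the assignment of splits to tasks does not change, as the issue points out.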
Re: Pig 0.13.0 release
+1 on releasing a 0.13 and if somebody feels strongly about releasing a 0.12.1 that fixes PIG-3492 I am +1 on that too. -- Gianmarco On 13 February 2014 03:08, Dmitriy Ryaboy dvrya...@gmail.com wrote: So I think we agree that we should branch 0.13 at this point, right? (and possibly look at releasing an incremental bump to 12 or 10? I'm not sure what should be included there but I support the general idea). On Thu, Feb 6, 2014 at 2:22 PM, Koji Noguchi knogu...@yahoo-inc.com wrote: Releasing 0.13 and 0.10.1 is totally independent in my opinion I should have referenced my previous request on including 0.10.1 on the top release page. http://www.mail-archive.com/dev@pig.apache.org/msg20629.html By minor I meant 0.13 0.10.1 is a bug fix release. as in Major.Minor.BugFix I see. Then I should have said, but I'd like to request we make BugFix releases more often. Thanks for correcting my mistake. Koji On Feb 6, 2014, at 5:05 PM, Julien Le Dem jul...@ledem.net wrote: Releasing 0.13 and 0.10.1 is totally independent in my opinion. It just takes the time of a committer that needs the release to happen to do it. By minor I meant 0.13 0.10.1 is a bug fix release. as in Major.Minor.BugFix Our Major version is still 0 On Feb 6, 2014, at 1:43 PM, Koji Noguchi wrote: To add to the discussion, I think we should release more often, based on time elapsed rather than volume of change. I don't have preference on the frequency, but I'd like to request we make minor releases more often. At this moment, stable pig release (to me) is still 0.10.1. 0.11.1 and 0.12.0 both have regression bug PIG-3492 that caused multiple production pig scripts in our clusters to fail randomly. (unless user is disabling ColumnMapKeyPrune) If releasing 0.13 means 0.10.1 gets kicked out from the front release list, I'd like to see minor release on 0.11 or 0.12 first. Koji On Feb 6, 2014, at 4:25 PM, Cheolsoo Park piaozhe...@gmail.com wrote: +1 to 0.13 release. Why not if someone is volunteering? 
On Thu, Feb 6, 2014 at 4:06 PM, Julien Le Dem jul...@ledem.net wrote: To add to the discussion, I think we should release more often, based on time elapsed rather than volume of change. The more often we release, the easier it is to release. Also that makes it easier for contributors to use their own contributions in official releases. It is also probably a good idea to have a clean starting point before merging the Tez branch That said, I think those changes by themselves are enough to warrant a minor release. Julien On Feb 6, 2014, at 12:24 PM, Dmitriy Ryaboy wrote: Major updates since we release 12 that are currently in trunk: - lazy output (don't generate empty part files) - jar caching optimization - automatic local mode for small job (big wall-clock wins for long-tail jobs) - improved support for BigInteger, BigDecimal - hbase loader improvements - debug mode that leaves temp files around for examination (!) - fixes to a few nasty bugs (PIG-3641) - pluggable execution engine allowing work like Tez and Spork - .. and more I'd say this justifies a release. D On Wed, Feb 5, 2014 at 3:55 PM, Aniket Mokashi aniket...@gmail.com wrote: List I mentioned is pending tasks before we can make a release. A complete list of contributions can be seen at - http://svn.apache.org/viewvc/pig/trunk/CHANGES.txt?view=markup. Some of the things that make it a good candidate for a release- - PIG-3419 (has several backwards incompatible api changes) - PIG-2672 - PIG-3642 - PIG-3463 - PIG-3511 - PIG-3657 Thanks, Aniket On Wed, Feb 5, 2014 at 3:23 PM, Olga Natkovich onatkov...@yahoo.com wrote: Just going by the list that Aniket provided, I don't really see enough for a full release. Two mentioned JIRAs are doc updates and one is a bug fix that was ported into Pig 12. On Wednesday, February 5, 2014 3:13 PM, Aniket Mokashi aniket...@gmail.com wrote: Hi All, A good number of improvements and bug fixes have gone into trunk recently. 
I'd like to know if we can roll out a Pig 0.13 release around mid-March? I am aware that we are planning to merge tez branch into trunk soon. However, making a release before tez branch is merged will be good. Any objections? Following are few jiras we need to wrap up before 0.13 release- PIG-3591 PIG-3740 PIG-3745 PIG-3347 PIG-3731 Any other? Thanks, Aniket -- ...:::Aniket:::... Quetzalco@tl
Re: GSOC 2014
I had a partial implementation of Pig mavenization last year (that of course now is obsolete). I think it's not as complicated as it sounds, just time consuming. +1 on the idea, might even have some time to co-mentor. -- Gianmarco On 11 February 2014 20:49, Julien Le Dem jul...@ledem.net wrote: Some project ideas: - mavenize Pig (I know... but still, I think it should be done) - Compile physical operators to bytecode. I have a prototype, that could be made real by a student: https://github.com/julienledem/pig/compare/trunk...compile_physical_plan Julien On Feb 11, 2014, at 9:42 AM, Daniel Dai wrote: Any committer interested in mentoring? Any project ideas? We need to make project description ready by 2/24. Thanks, Daniel
[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)
[ https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860156#comment-13860156 ] Gianmarco De Francisci Morales commented on PIG-3642: - I am -0 on this idea. Skipping MR requires rewriting good part of the execution logic, and might introduce weird optimization bugs. More importantly, the added advantage brought by this feature is small. Usually, if you want to test your program on a small input, you copy it locally and run Pig in local mode. Direct HDFS access for small jobs (fetch) -- Key: PIG-3642 URL: https://issues.apache.org/jira/browse/PIG-3642 Project: Pig Issue Type: Improvement Reporter: Lorand Bendig Assignee: Lorand Bendig Fix For: 0.13.0 Attachments: PIG-3642.patch With this patch I'd like to add the possibility to directly read data from HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive already has this feature (fetch). This patch shares some similarities with the local mode of Pig 0.6. Here, fetching kicks off when the following holds for a script: * it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, (nested) FOREACH with expression operators, custom UDFs..etc * no scalar aliases * no SampleLoader * single leaf job * DUMP (no STORE) The feature is enabled by default and can be toggled with: * -N or -no_fetch * set opt.fetch true/false; There's no STORE support because I wanted to make it explicit that this optimization is for launching small/simple scripts during development, rather than querying and filtering large number of rows on the client machine. However, a threshold could be given on the input size (an estimation) to determine whether to prefer fetch over MR jobs, similar to what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's LoadMetadata#getStatistic ?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (PIG-3453) Implement a Storm backend to Pig
[ https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-3453: Status: Open (was: Patch Available) Canceling patch as it is not ready to be committed. Implement a Storm backend to Pig Key: PIG-3453 URL: https://issues.apache.org/jira/browse/PIG-3453 Project: Pig Issue Type: New Feature Affects Versions: 0.13.0 Reporter: Pradeep Gollakota Assignee: Jacob Perkins Labels: storm Fix For: 0.13.0 Attachments: storm-integration.patch There is a lot of interest around implementing a Storm backend to Pig for streaming processing. The proposal and initial discussions can be found at https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)
[ https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860228#comment-13860228 ] Gianmarco De Francisci Morales commented on PIG-3642: - I haven't reviewed the patch thoroughly so take my comments with the due care. I am just afraid that we will redo the same mistake we did with the local mode execution of Pig that you mention in the ticket. That mode of execution was removed because it was a burden to maintain, and in the end the two implementations (MR and local mode) were out of synch, resulting in the same script doing different things. I just want to avoid the same thing happening again. If [~cheolsoo] has reviewed the patch, I would like to hear his comments on this issue. Direct HDFS access for small jobs (fetch) -- Key: PIG-3642 URL: https://issues.apache.org/jira/browse/PIG-3642 Project: Pig Issue Type: Improvement Reporter: Lorand Bendig Assignee: Lorand Bendig Fix For: 0.13.0 Attachments: PIG-3642.patch With this patch I'd like to add the possibility to directly read data from HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive already has this feature (fetch). This patch shares some similarities with the local mode of Pig 0.6. Here, fetching kicks off when the following holds for a script: * it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, (nested) FOREACH with expression operators, custom UDFs..etc * no scalar aliases * no SampleLoader * single leaf job * DUMP (no STORE) The feature is enabled by default and can be toggled with: * -N or -no_fetch * set opt.fetch true/false; There's no STORE support because I wanted to make it explicit that this optimization is for launching small/simple scripts during development, rather than querying and filtering large number of rows on the client machine. 
However, a threshold could be given on the input size (an estimation) to determine whether to prefer fetch over MR jobs, similar to what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's LoadMetadata#getStatistic ?) -- This message was sent by Atlassian JIRA (v6.1.5#6160)
Re: Welcome our newest committer Prashant Kommireddi
Congrats! -- Gianmarco On 2 May 2013 22:26, Johnny Zhang xiao...@cloudera.com wrote: Congrats Prashant! On Thu, May 2, 2013 at 1:17 PM, Bill Graham billgra...@gmail.com wrote: Congrats Prashant! On Thu, May 2, 2013 at 1:11 PM, Daniel Dai da...@hortonworks.com wrote: Congratulation! On Thu, May 2, 2013 at 1:06 PM, Cheolsoo Park piaozhe...@gmail.com wrote: Congrats Prashant! On Thu, May 2, 2013 at 12:56 PM, Julien Le Dem jul...@ledem.net wrote: All, Please join me in welcoming Prashant Kommireddi as our newest Pig committer. He's been contributing to Pig for a while now. We look forward to him being a part of the project. Julien -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Commented] (PIG-3225) Stratified sampling
[ https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637830#comment-13637830 ] Gianmarco De Francisci Morales commented on PIG-3225: - Hi Saiph, I am happy to see interest in this project idea. This idea should be combined with the other sampling projects in Pig as shown in https://cwiki.apache.org/confluence/display/PIG/GSoc2013 to prepare a GSoC project proposal. In my view, reservoir and bootstrap sampling are the easiest, while stratified sampling might be more complicated. Stratified sampling --- Key: PIG-3225 URL: https://issues.apache.org/jira/browse/PIG-3225 Project: Pig Issue Type: New Feature Reporter: Gianmarco De Francisci Morales Labels: gsoc2013 Implement a stratified sampling option ( http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
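Reservoir sampling, mentioned above as one of the easier sampling projects, can be sketched in a few lines of Python. This is the textbook Algorithm R, shown only to illustrate the technique; it is not a proposed Pig operator, and the name reservoir_sample is hypothetical.

```python
import random


def reservoir_sample(stream, k, seed=0):
    """Return k items chosen uniformly at random from an iterable of unknown length.

    Algorithm R: keep the first k items, then replace a random slot
    with the i-th item (0-based) with probability k / (i + 1).
    """
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

The appeal for a system like Pig is that the input is read once and only k items are held in memory, so the sample size is fixed regardless of how large the relation is.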
[jira] [Commented] (PIG-3221) Bootstrap sampling
[ https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637831#comment-13637831 ] Gianmarco De Francisci Morales commented on PIG-3221: - Hi Vicky, Thanks for your interest in this project idea. Given that Pig is not a statistics-only tool, my current understanding is that we want the samples to be materialized because they can be used, e.g., to train an ensemble classifier. Of course, the case where we are only interested in statistics can be optimized. Maybe a UDF would do the trick in this latter case. Bootstrap sampling -- Key: PIG-3221 URL: https://issues.apache.org/jira/browse/PIG-3221 Project: Pig Issue Type: New Feature Reporter: Gianmarco De Francisci Morales Labels: gsoc2013 Implement a bootstrap sampling option ( http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
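To make the notion of materialized bootstrap samples concrete, here is a minimal Python sketch. It assumes the whole relation fits in a list, which would not hold at Pig scale, and the name bootstrap_samples is hypothetical.

```python
import random


def bootstrap_samples(rows, n_samples, seed=0):
    """Materialize n_samples bootstrap resamples of rows.

    Each resample has the same size as the input and is drawn
    uniformly with replacement, so rows may repeat or be absent.
    """
    rng = random.Random(seed)
    n = len(rows)
    return [
        [rows[rng.randrange(n)] for _ in range(n)]
        for _ in range(n_samples)
    ]
```

Each resample could then be handed to a separate learner, which is the ensemble use case mentioned above. When only a statistic is needed, the resamples do not have to be stored at all.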
[jira] [Created] (PIG-3279) Support nested RANK
Gianmarco De Francisci Morales created PIG-3279: --- Summary: Support nested RANK Key: PIG-3279 URL: https://issues.apache.org/jira/browse/PIG-3279 Project: Pig Issue Type: Improvement Reporter: Gianmarco De Francisci Morales -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: GSoC 2013
+1 to what Dmitriy says. Cheers, -- Gianmarco On Mon, Apr 8, 2013 at 8:57 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi, I think this is an interesting project but is not core to Pig itself -- it may be more interesting / viable as a standalone project on github that uses Pig to implement graph algorithms. At this point in its development, I feel that Pig needs to concentrate on doing the things it already does, and do them better (operator efficiency, storage efficiency, better MR plan generation, etc) rather than expand to specific verticals; we should allow our users to create their own solution suites that use Pig for specific purposes. A successful example of such a standalone project is PacketPig (https://github.com/packetloop/packetpig) , a PCAP network capture analysis tool. D On Tue, Apr 2, 2013 at 9:48 AM, burakkk burak.isi...@gmail.com wrote: I know that but giraph tries to use bsp. What I'm saying is nothing shared model except reducers. Besides I don't want to divide iteration. One phase is still responsible for whole iteration. Every different origin vertex will be processed in parallel. Thanks Best regards... On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales g...@gdfm.me wrote: FYI, Giraph has a Random Walk implementation. Pig does not support iteration natively, so any iterative algorithm is not a very good fit for it. Just my 2c. Cheers, -- Gianmarco On Tue, Apr 2, 2013 at 10:04 AM, burakkk burak.isi...@gmail.com wrote: So what do you suggest? Is it clear? On Mon, Apr 1, 2013 at 9:35 PM, burakkk burak.isi...@gmail.com wrote: I'm using only WTF graph representation to fit the memory. By the way I haven't seen any explanation from the pig 0.11 release page about WTF or graph models. I don't wanna use Cassovary. I believe it can be done with pig. I implement a graph representation using WTF paper to pig and then I'll use it to implement random walk algorithm. 
To do that maybe I need to improve some features such as joins(fuzzy join) etc or implement a new operator. I can implement it using either existing operators or new operators. That's up to us and it doesn't really matter. If there is already a implementation to random walker algorithm, please feel free to tell. Because I haven't found it. Are you proposing to create an open-source implementation of those algorithms? Yes, I'm proposing to implement a random walk algorithm, new data model which is representing graph. After that, people can use it coding the pig. Do you suggest they should be Pig scripts added to the Pig project, or do you want to create some new operators? Maybe, it can be UDF or new operator. I made a quick example. It may not be completely accurate, I've just tried to explain it. Think about you have a graph file just like that user_id follower 1 2 1 3 1 10 2 3 3 4 3 5 ... Vertex List is an array including sorted vertex ids node List is a matrix including vertex id and its starting position graph = load 'graph' using PigStorage() (vertex:int, follower:int) - --load the graph file vertex = COGROUP graph BY (vertex); list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) as vertexList; --load the whole vertexes from HDFS into the memory list = FOREACH graph GENERATE org.apache.pig.generateNode(list) as nodeList; --load the whole vertexes from HDFS into the memory randomWalk = FOREACH vertex GENERATE flatten(org.apache.pig.RandomWalk(list, endVertex)) as score; -- generate a score using the node list you can traverse the graph to the your finishing position store... Thanks Best Regards... On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: I'm somewhat familiar with WTF code (my day job is managing the analytics infrastructure team at Twitter). 
WTF is implemented using Pig 0.11 (in fact some of the Pig 11 features/improvements are directly due to this project...), and mostly has to do with clever algorithms implemented in Pig (an earlier version of WTF loaded the graph into main memory on large-mem machines -- that system is open sourced, too, under github.com/twitter/cassovary). Are you proposing to create an open-source implementation of those algorithms? Do you suggest they should be Pig scripts added to the Pig project, or do you want to create some new operators? I'm not totally sure where you are going here. GSoC proposals for Pig are usually made by students who want to work on issues labeled as GSoC candidates
Re: GSoC 2013
FYI, Giraph has a Random Walk implementation. Pig does not support iteration natively, so any iterative algorithm is not a very good fit for it. Just my 2c. Cheers, -- Gianmarco On Tue, Apr 2, 2013 at 10:04 AM, burakkk burak.isi...@gmail.com wrote: So what do you suggest? Is it clear? On Mon, Apr 1, 2013 at 9:35 PM, burakkk burak.isi...@gmail.com wrote: I'm using only WTF graph representation to fit the memory. By the way I haven't seen any explanation from the pig 0.11 release page about WTF or graph models. I don't wanna use Cassovary. I believe it can be done with pig. I implement a graph representation using WTF paper to pig and then I'll use it to implement random walk algorithm. To do that maybe I need to improve some features such as joins(fuzzy join) etc or implement a new operator. I can implement it using either existing operators or new operators. That's up to us and it doesn't really matter. If there is already a implementation to random walker algorithm, please feel free to tell. Because I haven't found it. Are you proposing to create an open-source implementation of those algorithms? Yes, I'm proposing to implement a random walk algorithm, new data model which is representing graph. After that, people can use it coding the pig. Do you suggest they should be Pig scripts added to the Pig project, or do you want to create some new operators? Maybe, it can be UDF or new operator. I made a quick example. It may not be completely accurate, I've just tried to explain it. Think about you have a graph file just like that user_id follower 1 2 1 3 1 10 2 3 3 4 3 5 ... 
Vertex List is an array including sorted vertex ids node List is a matrix including vertex id and its starting position graph = load 'graph' using PigStorage() (vertex:int, follower:int) - --load the graph file vertex = COGROUP graph BY (vertex); list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) as vertexList; --load the whole vertexes from HDFS into the memory list = FOREACH graph GENERATE org.apache.pig.generateNode(list) as nodeList; --load the whole vertexes from HDFS into the memory randomWalk = FOREACH vertex GENERATE flatten(org.apache.pig.RandomWalk(list, endVertex)) as score; -- generate a score using the node list you can traverse the graph to the your finishing position store... Thanks Best Regards... On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: I'm somewhat familiar with WTF code (my day job is managing the analytics infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact some of the Pig 11 features/improvements are directly due to this project...), and mostly has to do with clever algorithms implemented in Pig (an earlier version of WTF loaded the graph into main memory on large-mem machines -- that system is open sourced, too, under github.com/twitter/cassovary). Are you proposing to create an open-source implementation of those algorithms? Do you suggest they should be Pig scripts added to the Pig project, or do you want to create some new operators? I'm not totally sure where you are going here. GSoC proposals for Pig are usually made by students who want to work on issues labeled as GSoC candidates on the apache jira. The students spend some time to understand the problem stated in the jira, familiarize themselves with the existing codebase, and put a basic technical implementation plan and schedule into their proposal. 
Since in this case you are proposing something we haven't scoped or defined well for ourselves, we need you to be very clear and specific about what you are trying to do, and how you plan to go about it.

I think that graph processing in Pig (or other Hadoop-based systems) is a really interesting topic and there is a lot of work to be done, but we really need you to be far more detailed to be able to give you good guidance with regards to GSoC.

Best,
Dmitriy

On Sat, Mar 30, 2013 at 10:12 AM, burakkk burak.isi...@gmail.com wrote:

Sure. We can implement a graph model based on the article "WTF: The Who to Follow Service at Twitter". The article says that in this way the graph can be stored in one machine's memory, so every node will read the graph from HDFS and cache it in memory. Every node is responsible for processing its own bucket of edges. I mean the graph can be split. Every node can process its bucket using a random walk algorithm, for instance. Finally, the outputs can be reduced to get the final results.

I hope it's clear :)

Thanks
Best Regards...

On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

Hi Burakk,

The general idea of making graph processing easier is a good one. I'm not sure what exactly you are proposing to do, though. Could you be more
[jira] [Commented] (PIG-3225) Stratified sampling
[ https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614617#comment-13614617 ]

Gianmarco De Francisci Morales commented on PIG-3225:

Hi Dishara,
Happy to see your interest.
While we haven't discussed this in detail with the rest of the committers, my personal view on this project is that it should be combined with the one on bootstrap sampling (PIG-3221) to be worthy of GSoC.
Regarding the sampling, this part of the project requires designing and changing the parser to recognize a new part of the syntax for the SAMPLE operator (to specify the strata), and implementing the logical and physical operators connected to it.

Stratified sampling

Key: PIG-3225
URL: https://issues.apache.org/jira/browse/PIG-3225
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a stratified sampling option ( http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
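To make the idea of strata in the comment above more concrete, here is a minimal Python sketch of what a stratified sample could compute. It is an illustration only and not Pig code: the function name `stratified_sample`, the `stratum_of` callback and the per-stratum `fraction` are assumptions made for this example, and keeping at least one row per stratum is a design choice of the sketch, not something specified in PIG-3225.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_of, fraction, rng=None):
    """Sample roughly `fraction` of the rows from every stratum.

    Hypothetical helper for illustration: `stratum_of` maps a row to its
    stratum key, and every non-empty stratum contributes at least one row.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if rng is None:
        rng = random.Random()

    # Group the rows by stratum key.
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)

    # Draw without replacement inside each stratum.
    sample = []
    for members in strata.values():
        size = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, size))
    return sample


if __name__ == "__main__":
    data = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
    picked = stratified_sample(data, lambda row: row[0], 0.1)
    print(len(picked), "rows sampled")
```

Unlike a plain probabilistic sample, small strata cannot vanish from the result here, which is usually the point of stratifying.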
[jira] [Commented] (PIG-3221) Bootstrap sampling
[ https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614618#comment-13614618 ]

Gianmarco De Francisci Morales commented on PIG-3221:

Here is an example: http://hortonworks.com/blog/bootstrap-sampling-with-apache-pig

Bootstrap sampling

Key: PIG-3221
URL: https://issues.apache.org/jira/browse/PIG-3221
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a bootstrap sampling option ( http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator.
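For readers unfamiliar with the technique, the following Python sketch shows the core of bootstrap sampling: drawing as many rows as the input has, uniformly at random and with replacement. It is not taken from the linked blog post or from any Pig patch; the names `bootstrap_sample` and `bootstrap_means` are hypothetical and exist only for this illustration.

```python
import random
import statistics


def bootstrap_sample(rows, rng=None):
    """Return len(rows) items drawn from rows with replacement."""
    if rng is None:
        rng = random.Random()
    rows = list(rows)
    return [rng.choice(rows) for _ in range(len(rows))]


def bootstrap_means(values, replicates, rng=None):
    """Mean of each of `replicates` bootstrap resamples of `values`.

    The spread of the returned means approximates the uncertainty of the
    sample mean. `values` must be non-empty.
    """
    if rng is None:
        rng = random.Random()
    return [statistics.mean(bootstrap_sample(values, rng))
            for _ in range(replicates)]


if __name__ == "__main__":
    observations = [2.0, 3.5, 1.0, 4.0, 2.5]
    means = bootstrap_means(observations, 1000)
    print("std. dev. of bootstrap means:", statistics.stdev(means))
```

Because rows are drawn with replacement, a resample normally contains duplicates and omits some of the original rows, which is what distinguishes it from a plain shuffle.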
[jira] [Created] (PIG-3225) Stratified sampling
Gianmarco De Francisci Morales created PIG-3225:

Summary: Stratified sampling
Key: PIG-3225
URL: https://issues.apache.org/jira/browse/PIG-3225
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales

Implement a stratified sampling option ( http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.
[jira] [Created] (PIG-3224) Reservoir sampling
Gianmarco De Francisci Morales created PIG-3224:

Summary: Reservoir sampling
Key: PIG-3224
URL: https://issues.apache.org/jira/browse/PIG-3224
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales

Implement a reservoir sampling option, or make it the default ( http://en.wikipedia.org/wiki/Reservoir_sampling ) in Pig's SAMPLE operator.
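As background for this issue, here is a minimal Python sketch of classic reservoir sampling (often called Algorithm R), which picks k items uniformly at random from a stream of unknown length in a single pass. It only illustrates the technique named in the issue; the function name `reservoir_sample` is hypothetical and nothing here reflects how Pig's SAMPLE operator is or would be implemented.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return up to k items chosen uniformly at random from `stream`."""
    if k < 0:
        raise ValueError("k must be non-negative")
    if rng is None:
        rng = random.Random()

    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Keep the new item with probability k / (index + 1),
            # replacing a uniformly chosen slot.
            slot = rng.randint(0, index)  # inclusive on both ends
            if slot < k:
                reservoir[slot] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5))
```

The appeal over a probabilistic sample is that the output size is exact and memory use is bounded by k, regardless of the input size.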
[jira] [Updated] (PIG-3224) Reservoir sampling
[ https://issues.apache.org/jira/browse/PIG-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-3224:

Labels: gsoc2013 (was: )

Reservoir sampling

Key: PIG-3224
URL: https://issues.apache.org/jira/browse/PIG-3224
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a reservoir sampling option, or make it the default ( http://en.wikipedia.org/wiki/Reservoir_sampling ) in Pig's SAMPLE operator.
[jira] [Updated] (PIG-3225) Stratified sampling
[ https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-3225:

Labels: gsoc2013 (was: )

Stratified sampling

Key: PIG-3225
URL: https://issues.apache.org/jira/browse/PIG-3225
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a stratified sampling option ( http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.
[jira] [Updated] (PIG-3225) Stratified sampling
[ https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-3225:

Tags: (was: gsoc2013)

Stratified sampling

Key: PIG-3225
URL: https://issues.apache.org/jira/browse/PIG-3225
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a stratified sampling option ( http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.
[jira] [Updated] (PIG-3221) Bootstrap sampling
[ https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-3221:

Tags: (was: gsoc2013)

Bootstrap sampling

Key: PIG-3221
URL: https://issues.apache.org/jira/browse/PIG-3221
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a bootstrap sampling option ( http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator.
[jira] [Updated] (PIG-3221) Bootstrap sampling
[ https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-3221:

Labels: gsoc2013 (was: )

Bootstrap sampling

Key: PIG-3221
URL: https://issues.apache.org/jira/browse/PIG-3221
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Labels: gsoc2013

Implement a bootstrap sampling option ( http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator.
[jira] [Created] (PIG-3221) Bootstrap sampling
Gianmarco De Francisci Morales created PIG-3221:

Summary: Bootstrap sampling
Key: PIG-3221
URL: https://issues.apache.org/jira/browse/PIG-3221
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales

Implement a bootstrap sampling option ( http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator.
Re: [ANNOUNCE] Welcome Bill Graham to join Pig PMC
Congrats Bill! :)

--
Gianmarco

On Wed, Feb 20, 2013 at 10:00 AM, Jonathan Coveney jcove...@gmail.com wrote:

congrats :)

2013/2/20 Jarek Jarcec Cecho jar...@apache.org

Congratulations Bill, good job!

Jarcec

On Tue, Feb 19, 2013 at 01:48:18PM -0800, Daniel Dai wrote:

Please welcome Bill Graham as our latest Pig PMC member. Congrats Bill!
[jira] [Updated] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2353:

Release Note:
Pig includes a new RANK operator:

RANK relation ( BY column (ASC|DESC)? (DENSE)? )?

This operator prepends a consecutive integer to each tuple in the relation starting from 1. If the BY clause is present, RANK sorts the relation before ranking it, otherwise it uses the order in which it receives the relation (e.g. the order in which the relation is stored if RANK is performed right after a LOAD). The DENSE modifier produces a dense rank, which has no gaps in it regardless of ties.
RANK is now a reserved keyword, so this change is *not* backward compatible. Please review your scripts to avoid usage of RANK as a relation name.

was:
Pig includes a new RANK operator:

RANK relation ( BY column (ASC|DESC)? (DENSE)? )?

This operator prepends a consecutive integer to each tuple in the relation starting from 1. If the BY clause is present, RANK sorts the relation before ranking it, otherwise it uses the order in which it receives the relation (e.g. the order in which the relation is stored if RANK is performed right after a LOAD). The DENSE modifier produces a dense rank, which has no gaps in it regardless of ties.

RANK function like in SQL

Key: PIG-2353
URL: https://issues.apache.org/jira/browse/PIG-2353
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
Labels: gsoc2012, mentor
Fix For: 0.11
Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch

Implement a function that, given a (sorted) bag, adds to each tuple a unique, increasing identifier without gaps, like what RANK does for SQL.
This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
Functionality implemented so far is available at https://reviews.apache.org/r/5523/diff/#index_header
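The difference between a plain row number, a rank with gaps and a dense rank is easy to get wrong, so here is a small Python sketch of the semantics described in the release note. It follows the usual SQL-style meaning of RANK and DENSE_RANK and is an assumption-laden illustration rather than Pig's implementation: the function `rank` and its `by`, `descending` and `dense` parameters are hypothetical names chosen for this example.

```python
def rank(rows, by=None, descending=False, dense=False):
    """Prepend a rank to each row (a tuple), starting from 1.

    Without `by`, rows keep their incoming order and get 1, 2, 3, ...
    With `by` (a column index), rows are sorted on that column and tied
    rows share a rank; `dense=True` removes the gaps that ties leave.
    """
    if by is None:
        return [(number, *row) for number, row in enumerate(rows, start=1)]

    ordered = sorted(rows, key=lambda row: row[by], reverse=descending)
    ranked = []
    current = 0
    previous_key = None
    for position, row in enumerate(ordered, start=1):
        key = row[by]
        if position == 1 or key != previous_key:
            # A new key value starts a new rank: the next integer when
            # dense, otherwise the row's position in the sorted order.
            current = current + 1 if dense else position
            previous_key = key
        ranked.append((current, *row))
    return ranked


if __name__ == "__main__":
    scores = [("a", 10), ("b", 20), ("c", 10), ("d", 30)]
    for row in rank(scores, by=1):
        print(row)
```

With the sample data in the sketch, the two rows tied on 10 share rank 1; the next row gets rank 3 in the default mode and rank 2 when `dense=True`.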
[jira] [Commented] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546296#comment-13546296 ]

Gianmarco De Francisci Morales commented on PIG-2353:

Hi, sorry I guess I misunderstood.
I thought that PIG-2947 was sufficient as documentation and that we just wanted to clarify the release notes.
Should I open a separate Jira to include the release notes of the Jira inside RELEASE_NOTES.txt ?

RANK function like in SQL

Key: PIG-2353
URL: https://issues.apache.org/jira/browse/PIG-2353
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
Labels: gsoc2012, mentor
Fix For: 0.11
Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch

Implement a function that given a (sorted) bag adds to each tuple a unique, increasing identifier without gaps, like what RANK does for SQL.
This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
Functionality implemented so far, is available at https://reviews.apache.org/r/5523/diff/#index_header
[jira] [Commented] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall
[ https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543726#comment-13543726 ]

Gianmarco De Francisci Morales commented on PIG-2362:

Hi Cheolsoo,
Good catch, thanks. I wasn't familiar with PIG-2748.
I verified that {{ant mvn-jar}} passes.
Also ran the {{eclipse-files}}, {{src-release}}, {{tar-release}} targets and verified their output.
+1 to the last patch

Rework Ant build.xml to use macrodef instead of antcall

Key: PIG-2362
URL: https://issues.apache.org/jira/browse/PIG-2362
Project: Pig
Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.12
Attachments: PIG-2362.10.patch, PIG-2362.1.patch, PIG-2362.2.patch, PIG-2362.3.patch, PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, PIG-2362.7.patch, PIG-2362.8.patch, PIG-2362.9.patch, PIG-2362.9.patch.nowhitespace

Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/
We'd better use macrodef and let Ant build a clean dependency graph.
http://ant.apache.org/manual/Tasks/macrodef.html
Right now we do it like this:
{code}
<target name="buildAllJars">
  <antcall target="buildJar">
    <param name="build.dir" value="jar-A"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-B"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-C"/>
  </antcall>
</target>
<target name="buildJar">
  <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
</target>
{code}
But it would be better if we did it like this:
{code}
<target name="buildAllJars">
  <buildJar build.dir="jar-A"/>
  <buildJar build.dir="jar-B"/>
  <buildJar build.dir="jar-C"/>
</target>
<macrodef name="buildJar">
  <attribute name="build.dir"/>
  <!-- macrodef attributes are referenced as @{name}, not ${name} -->
  <sequential>
    <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
  </sequential>
</macrodef>
{code}
[jira] [Commented] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall
[ https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542216#comment-13542216 ]

Gianmarco De Francisci Morales commented on PIG-2362:

Hi Cheolsoo,
Would you mind rebasing it to trunk once again? It does not apply cleanly anymore.
I will review it as soon as it is rebased.

Rework Ant build.xml to use macrodef instead of antcall

Key: PIG-2362
URL: https://issues.apache.org/jira/browse/PIG-2362
Project: Pig
Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.12
Attachments: PIG-2362.1.patch, PIG-2362.2.patch, PIG-2362.3.patch, PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, PIG-2362.7.patch, PIG-2362.8.patch

Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/
We'd better use macrodef and let Ant build a clean dependency graph.
http://ant.apache.org/manual/Tasks/macrodef.html
Right now we do it like this:
{code}
<target name="buildAllJars">
  <antcall target="buildJar">
    <param name="build.dir" value="jar-A"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-B"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-C"/>
  </antcall>
</target>
<target name="buildJar">
  <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
</target>
{code}
But it would be better if we did it like this:
{code}
<target name="buildAllJars">
  <buildJar build.dir="jar-A"/>
  <buildJar build.dir="jar-B"/>
  <buildJar build.dir="jar-C"/>
</target>
<macrodef name="buildJar">
  <attribute name="build.dir"/>
  <!-- macrodef attributes are referenced as @{name}, not ${name} -->
  <sequential>
    <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
  </sequential>
</macrodef>
{code}
[jira] [Updated] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall
[ https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2362:

Status: Open (was: Patch Available)

Rework Ant build.xml to use macrodef instead of antcall

Key: PIG-2362
URL: https://issues.apache.org/jira/browse/PIG-2362
Project: Pig
Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.12
Attachments: PIG-2362.1.patch, PIG-2362.2.patch, PIG-2362.3.patch, PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, PIG-2362.7.patch, PIG-2362.8.patch

Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/
We'd better use macrodef and let Ant build a clean dependency graph.
http://ant.apache.org/manual/Tasks/macrodef.html
Right now we do it like this:
{code}
<target name="buildAllJars">
  <antcall target="buildJar">
    <param name="build.dir" value="jar-A"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-B"/>
  </antcall>
  <antcall target="buildJar">
    <param name="build.dir" value="jar-C"/>
  </antcall>
</target>
<target name="buildJar">
  <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
</target>
{code}
But it would be better if we did it like this:
{code}
<target name="buildAllJars">
  <buildJar build.dir="jar-A"/>
  <buildJar build.dir="jar-B"/>
  <buildJar build.dir="jar-C"/>
</target>
<macrodef name="buildJar">
  <attribute name="build.dir"/>
  <!-- macrodef attributes are referenced as @{name}, not ${name} -->
  <sequential>
    <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
  </sequential>
</macrodef>
{code}
[jira] [Commented] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535455#comment-13535455 ]

Gianmarco De Francisci Morales commented on PIG-2353:

Hi Jonathan,
Yes, RANK is now an operator and thus a reserved keyword. We can add it to the release notes.
The parser is definitely a bit rough and could use some reworking, especially in the error messages, so I am all in for it.
Not sure if it is a known issue. Can you use LOAD or FOREACH as column names?

RANK function like in SQL

Key: PIG-2353
URL: https://issues.apache.org/jira/browse/PIG-2353
Project: Pig
Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
Labels: gsoc2012, mentor
Fix For: 0.11
Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch

Implement a function that given a (sorted) bag adds to each tuple a unique, increasing identifier without gaps, like what RANK does for SQL.
This is a candidate project for Google summer of code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
Functionality implemented so far, is available at https://reviews.apache.org/r/5523/diff/#index_header
Re: Jenkins / Clover
Hi Daniel,

Thanks for adding me to the group. I will have a look at it ASAP.

Cheers,
--
Gianmarco

On Mon, Nov 12, 2012 at 10:22 AM, Daniel Dai da...@hortonworks.com wrote:

Hi, Gianmarco

I added you to hudson-jobadmin group.

Thanks,
Daniel

On Thu, Jul 19, 2012 at 12:33 AM, Gianmarco De Francisci Morales g...@apache.org wrote:

Fine, Alan, could you add me to the hudson-jobadmin group?

modify_appgroups.pl hudson-jobadmin --add=gdfm

On people.apache.org, according to the page. I have subscribed to infrastructure and builds.

Cheers,
--
Gianmarco

On Thu, Jul 19, 2012 at 12:17 AM, Alan Gates ga...@hortonworks.com wrote:

http://wiki.apache.org/general/Jenkins?action=show&redirect=Hudson describes how to get an account so you can administer the Jenkins builds.

Alan.

On Jul 18, 2012, at 12:27 PM, Gianmarco De Francisci Morales wrote:

What is the procedure to modify the nightly build? If everyone agrees (and somebody explains to me how) I volunteer to fix it.

Cheers,
--
Gianmarco

On Wed, Jul 18, 2012 at 8:25 AM, Jonathan Coveney jcove...@gmail.com wrote:

+1

A while ago I tried to get apache builds to deal with this, and nothing. Very annoying, but pending a fix, we should remove it from the nightly.

2012/7/17 Alan Gates ga...@hortonworks.com

I'm fine with removing it from the nightly build. I don't see any reason to run that every day, especially since it slows down the tests. Let's not remove it from ant, as it's useful to run occasionally.

Alan.

On Jul 17, 2012, at 3:17 PM, Gianmarco De Francisci Morales wrote:

Hi,

Clover constantly makes a number of our Jenkins builds fail (usually because of license issues; I think it is a misconfiguration). Do we actually use it? If we don't, I would propose to remove it from our build. What do you think?

Cheers,
--
Gianmarco
[jira] [Commented] (PIG-2989) Illustrate for Rank Operator
[ https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496344#comment-13496344 ]

Gianmarco De Francisci Morales commented on PIG-2989:

Has this been committed?
When I try to apply the patch to trunk I get an error:
{code}
patch -p0 < patch_1
patching file src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
Reversed (or previously applied) patch detected! Assume -R? [n] n
Apply anyway? [n] y
Hunk #1 FAILED at 326.
1 out of 1 hunk FAILED -- saving rejects to file src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java.rej
patching file src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
Reversed (or previously applied) patch detected! Assume -R? [n] n
Apply anyway? [n] y
Hunk #1 FAILED at 156.
1 out of 1 hunk FAILED -- saving rejects to file src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java.rej
{code}

Illustrate for Rank Operator

Key: PIG-2989
URL: https://issues.apache.org/jira/browse/PIG-2989
Project: Pig
Issue Type: Bug
Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Minor
Attachments: patch_1

Especially useful when a quick view of the final results of the Rank operator is required.
[jira] [Resolved] (PIG-2325) Make e2e test directory for data configurable in HDFS
[ https://issues.apache.org/jira/browse/PIG-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales resolved PIG-2325.

Resolution: Invalid
Fix Version/s: 0.12
Assignee: Gianmarco De Francisci Morales

Thanks!

Make e2e test directory for data configurable in HDFS

Key: PIG-2325
URL: https://issues.apache.org/jira/browse/PIG-2325
Project: Pig
Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.12

Right now the place for the data generated in e2e tests is hardcoded in test/e2e/pig/conf/default.conf as:
{code}
$cfg = {
    #HDFS
    'inpathbase' => '/user/pig/tests/data'
    , 'outpathbase' => '/user/pig/out'
{code}
It would be better to make it configurable (with an environment variable?) like the rest of the paths.
[jira] [Commented] (PIG-3006) Modernize a chunk of the tests
[ https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492477#comment-13492477 ]

Gianmarco De Francisci Morales commented on PIG-3006:

Great job guys!

Modernize a chunk of the tests

Key: PIG-3006
URL: https://issues.apache.org/jira/browse/PIG-3006
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
Attachments: PIG-3006-0.patch, PIG-3006-1.patch, PIG-3006-2.patch, PIG-3006-3.patch, PIG-3006-4.patch

A lot of the tests use antiquated patterns. My goal was to refactor them in a couple of ways:
- Get rid of the annotation specifying JUnit 4. All should use JUnit 4 (question: where is the JUnit 3 dependency even being pulled in?)
- Nothing should extend TestCase. Everything should be annotation driven.
- Properly use asserts. There was a lot of assertTrue(null==thing), so I replaced it with assertNull(thing), and so on.
- Get rid of MiniCluster use in a handful of cases.

I've run every test and they pass, EXCEPT TestLargeFile which is failing on trunk anyway.
Re: Our release process
Hi,

Sure, we don't want to commit patches that destabilize the code base. Unfortunately, there is no way to know in advance whether a patch will destabilize the code or not. Even testing is only a heuristic.

So how do we draw the line?
We seem to agree that only bug fixes should go into branches. However, it seems that we have two different views on the policy: Olga is proposing to have only P1 bugs fixed, while Alan is suggesting to be more lax about what goes into the branches.

Regardless of the policy chosen, how do we define the priority of a bug? By how many users are affected? By whether it can corrupt data? Is there a formal definition we can agree on? Otherwise defining a policy becomes hard.

The test-commit task does not run the full regression because the full test suite takes too long to execute. And I agree that asking to run the full test suite before committing any change slows down the (already slow) review process. However, I would be fine with running the full test suite for bug fixes that need to go into branches, in order to guarantee absence of regressions.

Cheers,
--
Gianmarco

On Sun, Nov 4, 2012 at 5:17 PM, Olga Natkovich onatkov...@yahoo.com wrote:

I can see how this would work for research projects but for real production this will not work. And I actually meant much more stringent stability. I don't think we should commit patches to either trunk or branch that destabilize the tree. We used to run full regression before each commit - is this no longer the case?

By stability I meant very few things go into the branch. I know that Pig has pretty decent tests - better coverage than many other projects. However, we do not have any testing at scale and inevitably, users end up doing testing. So any time we deploy a new major version, it takes us at least a month to get it stable, and once it is stabilized we want to keep it this way.

So for us at Yahoo, the only way to work directly from the branch is to go by our original plan. If that is not possible, we would go with the private git branch.

Olga

From: Alan Gates ga...@hortonworks.com
To: dev@pig.apache.org
Sent: Friday, November 2, 2012 8:19 PM
Subject: Re: Our release process

I am all for maintaining stability of branches, and the trunk, as everyone benefits from it. But I do not think this means we should limit bug fixing in the branches to only critical issues. As Pig gets more users we have more and more people on older branches who will want fixes for bugs without dealing with bigger version changes. So I am not in favor of limiting checkins to branches to P1 issues.

What if we maintain stability on the branches by quickly reverting any patches that break the build, the unit tests, or the e2e tests? This allows us to move forward with bug fix versions, it allows those who depend on branch stability (which I suspect is everyone in the distribution business plus everyone rolling their own Pig), and it should promote developer responsibility (no one likes having their patches reverted).

Alan.

On Nov 2, 2012, at 3:58 PM, Olga Natkovich wrote:

Hi guys,

Mid last year, we agreed on a release process documented in this thread: http://www.mail-archive.com/dev@pig.apache.org/msg04172.html. Since then, we have not really followed either of its two rules:

(1) Frequent releases (every 3 months)
(2) Branch stability (only P1 issues on the branch).

So I wanted to revisit our release procedure to make sure we have one that we can actually follow.

For us at Yahoo, branch stability is very important since we release all the patches directly from the branch. If we can't rely on the fact that only critical fixes go in, we will need to resort to git branches, which will make the whole process very cumbersome because we would then need to hand pick patches from the Apache branch and port them onto our private branch. I would imagine that others using Pig in production would have similar issues.

Olga
[jira] [Updated] (PIG-2315) Make as clause work in generate
[ https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2315:

Fix Version/s: 0.12 (was: 0.11)

Make as clause work in generate

Key: PIG-2315
URL: https://issues.apache.org/jira/browse/PIG-2315
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Gianmarco De Francisci Morales
Fix For: 0.12

Currently, the following syntax is supported and ignored, causing confusion for users:

A1 = foreach A1 generate a as a:chararray ;

After this statement, a just retains its previous type.
Re: Our release process
Hi,

On Mon, Nov 5, 2012 at 10:48 AM, Olga Natkovich onatkov...@yahoo.com wrote:

> Hi Gianmarco,
>
> Thanks for your comments. Here is a little more information.
>
> At Yahoo, we consider the following issues to be P1:
>
> (1) Bugs that cause wrong results being produced silently
> (2) Bugs that cause failures with no easy workaround

Thanks Olga, now I get what you mean. I don't have a strong opinion on this. On one hand I see why you don't want to put too many patches in the branches in order to keep things stable. On the other hand, when we do a 0.10.x release with x > 0, the users would like to have as many bugs fixed as possible.

> Regarding tests. I would suggest we have different rules for trunk and branches:
>
> (1) For branches, I think we should run the full regression suite (including e2e) prior to commit. This way we can ensure branch stability and, as the number of patches should be small, it will not be a burden
> (2) For trunk, we can go with test-commit only and fix things quickly when things break.

I think this makes sense. +1

> Olga

Cheers,
--
Gianmarco
Re: Our release process
Hi,

I agree with what Alan says and I like his proposal. However, to make it feasible, we need to make the Jenkins builds stable; otherwise a real problem introduced by a patch might be lost in the hundreds of failures due to clover licenses, minicluster issues, etc.

I am not too fond of having Jenkins post the build results to JIRA after a patch is committed, as it pollutes the JIRA itself; however, in this case it might be a good way to promote developer responsibility. Is it possible to activate this only for branches?

Cheers,
-- Gianmarco

On Fri, Nov 2, 2012 at 8:19 PM, Alan Gates ga...@hortonworks.com wrote:

I am all for maintaining stability of branches, and the trunk, as everyone benefits from it. But I do not think this means we should limit bug fixing in the branches to only critical issues. As Pig gets more users we have more and more people on older branches who will want fixes for bugs without dealing with bigger version changes. So I am not in favor of limiting checkins to branches to P1 issues.

What if we maintain stability on the branches by quickly reverting any patches that break the build, the unit tests, or the e2e tests? This allows us to move forward with bug fix versions, it serves those who depend on branch stability (which I suspect is everyone in the distribution business plus everyone rolling their own Pig), and it should promote developer responsibility (no one likes having their patches reverted).

Alan.

On Nov 2, 2012, at 3:58 PM, Olga Natkovich wrote:

Hi guys,

Mid last year, we agreed on a release process documented in this thread: http://www.mail-archive.com/dev@pig.apache.org/msg04172.html. Since then, we have not really followed either of its two rules:

(1) Frequent releases (every 3 months)
(2) Branch stability (only P1 issues on the branch)

So I wanted to revisit our release procedure to make sure we have one that we can actually follow.

For us at Yahoo, branch stability is very important since we release all the patches directly from the branch. If we can't rely on the fact that only critical fixes go in, we will need to resort to git branches, which will make the whole process very cumbersome because we would then need to hand-pick patches from the Apache branch and port them onto our private branch. I would imagine that others using Pig in production would have similar issues.

Olga
Re: [DISCUSS] Remove Penny from contrib
+1

-- Gianmarco

On Wed, Oct 31, 2012 at 10:01 PM, Russell Jurney russell.jur...@gmail.com wrote:

I'll be the +1 :)

Russell Jurney http://datasyndrome.com

On Nov 1, 2012, at 12:53 AM, Bill Graham billgra...@gmail.com wrote:

+1

On Wed, Oct 31, 2012 at 5:36 PM, Cheolsoo Park cheol...@cloudera.com wrote:

+1. I agree.

On Wed, Oct 31, 2012 at 2:54 PM, Alan Gates ga...@hortonworks.com wrote:

I propose we remove Penny from contrib. Currently it does not compile in trunk. Looking through the commit logs no significant work has been done on it since it was initially committed. There are 3 open JIRAs that reference it ( https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+%28summary+%7E+penny+OR+description+%7E+penny%29+AND+status+%3D+Open ). At this point I do not think anyone is using it or maintaining it. If someone is interested in moving this piece forward I would propose we move it to Apache Extras or something similar.

Alan.

--
*Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Commented] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query
[ https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488197#comment-13488197 ]

Gianmarco De Francisci Morales commented on PIG-2970:

If it works and does not interfere with other parts of the plan building, I think this is a good approach.

Nested foreach getting incorrect schema when having unrelated inner query

Key: PIG-2970
URL: https://issues.apache.org/jira/browse/PIG-2970
Project: Pig
Issue Type: Bug
Components: parser
Affects Versions: 0.10.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
Fix For: 0.11, 0.12
Attachments: pig-2970-trunk-v01.txt

While looking at PIG-2968, hit a weird error message.

{noformat}
$ cat -n test/foreach2.pig
     1  daily = load 'nyse' as (exchange, symbol);
     2  grpd = group daily by exchange;
     3  unique = foreach grpd {
     4      sym = daily.symbol;
     5      uniq_sym = distinct sym;
     6      --ignoring uniq_sym result
     7      generate group, daily;
     8  };
     9  describe unique;
    10  zzz = foreach unique generate group;
    11  explain zzz;
% pig -x local -t ColumnMapKeyPrune test/foreach2.pig
...
unique: {symbol: bytearray}
2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: file test/foreach2.pig, line 10, column 30 Invalid field projection. Projected field [group] does not exist in schema: symbol:bytearray.
...
{noformat}
[jira] [Updated] (PIG-19) A=load causes parse error
[ https://issues.apache.org/jira/browse/PIG-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-19:

Assignee: Gianmarco De Francisci Morales

A=load causes parse error

Key: PIG-19
URL: https://issues.apache.org/jira/browse/PIG-19
Project: Pig
Issue Type: Bug
Components: grunt
Reporter: Olga Natkovich
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.12

Parser expects spaces around =. This should be a minor change in src/org/apache/pig/tools/grunt/GruntParser.jj
[jira] [Commented] (PIG-3008) Fix whitespace in Pig code
[ https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486350#comment-13486350 ]

Gianmarco De Francisci Morales commented on PIG-3008:

1. I am assuming we will integrate this with Ant. The checkstyle Ant target wants a list of files to run on. So if you are able to identify the list of files you changed automatically, then yes. How do you define "changed"?
2. Yes, as long as there is a compatible formatter profile (i.e. it is not too complex). For our use case it should be fine. I know it can be done with Eclipse. For other IDEs I guess it can be done as well, but I don't know how.

We also need integration with Jenkins for automatic builds.

Fix whitespace in Pig code

Key: PIG-3008
URL: https://issues.apache.org/jira/browse/PIG-3008
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Fix For: 0.12
Attachments: checkstyle.xml

This JIRA exists mainly to get a conversation started. We've talked about it before, and it's a tricky issue. That said, some of the Pig code is super, super gnarly. We need some sort of path that will let it eventually be fixable. I posit: any file that hasn't been touched for over 6 months is eligible for a whitespace patch.
[jira] [Commented] (PIG-3008) Fix whitespace in Pig code
[ https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485689#comment-13485689 ]

Gianmarco De Francisci Morales commented on PIG-3008:

I agree to fix whitespace, and the policy you suggest seems reasonable. However, this conversation belongs to the more general topic of style. Are we willing to define and enforce a code style policy for Pig? Other projects are using checkstyle to do so, so that new patches comply. I am fine with being stricter on style, but only for things that can be automated within an IDE (whitespace is one of those).

Fix whitespace in Pig code

Key: PIG-3008
URL: https://issues.apache.org/jira/browse/PIG-3008
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Fix For: 0.12

This JIRA exists mainly to get a conversation started. We've talked about it before, and it's a tricky issue. That said, some of the Pig code is super, super gnarly. We need some sort of path that will let it eventually be fixable. I posit: any file that hasn't been touched for over 6 months is eligible for a whitespace patch.
[jira] [Commented] (PIG-3001) TestExecutableManager.testAddJobConfToEnv fails randomly
[ https://issues.apache.org/jira/browse/PIG-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485710#comment-13485710 ]

Gianmarco De Francisci Morales commented on PIG-3001:

Thanks Rohini, I had missed this bit. Then I also vote for removing the conversion.

TestExecutableManager.testAddJobConfToEnv fails randomly

Key: PIG-3001
URL: https://issues.apache.org/jira/browse/PIG-3001
Project: Pig
Issue Type: Sub-task
Reporter: Gianmarco De Francisci Morales
Assignee: Cheolsoo Park
Priority: Minor
Attachments: PIG-3001-2.patch, PIG-3001-3.patch, PIG-3001-4.patch, PIG-3001.patch

The test in the Summary fails intermittently. This is due to using a random number generator without seeding it. We should avoid stochastic tests. Furthermore, the test itself is ill-conceived. Here is the failure summary:

{code}
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: ⻨ꢏ切歯
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 狓偝
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 墣챟㌌̀썬鼹騷
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 훎滼
{code}

{code}
Error Message:
There should be no remaining pairs in the included map

Stacktrace:
junit.framework.AssertionFailedError: There should be no remaining pairs in the included map
    at org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)
{code}
[jira] [Updated] (PIG-3008) Fix whitespace in Pig code
[ https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-3008:

Attachment: checkstyle.xml

Agreed, let's start with this. Here is a first stab at a checkstyle configuration with FileTabCharacter, Indentation, and RegexpSingleLine for trailing whitespace. I threw in a few other things which I think are useful as well: checking that all files have the Apache header, checking imports, and checking the use of ==/!= with Strings.

Fix whitespace in Pig code

Key: PIG-3008
URL: https://issues.apache.org/jira/browse/PIG-3008
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Fix For: 0.12
Attachments: checkstyle.xml

This JIRA exists mainly to get a conversation started. We've talked about it before, and it's a tricky issue. That said, some of the Pig code is super, super gnarly. We need some sort of path that will let it eventually be fixable. I posit: any file that hasn't been touched for over 6 months is eligible for a whitespace patch.
[jira] [Commented] (PIG-3006) Modernize a chunk of the tests
[ https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485355#comment-13485355 ]

Gianmarco De Francisci Morales commented on PIG-3006:

I am fine with the changes, but it's a huge patch, so I only skimmed through it and might have missed something. +0, let's wait for another pair of eyeballs.

Modernize a chunk of the tests

Key: PIG-3006
URL: https://issues.apache.org/jira/browse/PIG-3006
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
Attachments: PIG-3006-0.patch, PIG-3006-1.patch

A lot of the tests use antiquated patterns. My goal was to refactor them in a couple of ways:
- Get rid of the annotation specifying JUnit 4. All should use JUnit 4 (question: where is the JUnit 3 dependency even being pulled in?)
- Nothing should extend TestCase. Everything should be annotation driven.
- Properly use asserts. There was a lot of assertTrue(null==thing), so I replaced it with assertNull(thing), and so on.
- Get rid of MiniCluster use in a handful of cases.

I've run every test and they pass, EXCEPT TestLargeFile which is failing on trunk anyway.
[jira] [Commented] (PIG-3001) TestExecutableManager.testAddJobConfToEnv fails randomly
[ https://issues.apache.org/jira/browse/PIG-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485357#comment-13485357 ]

Gianmarco De Francisci Morales commented on PIG-3001:

I would also remove it; however, backwards compatibility is an issue. I am fine with deprecating it and showing a warning to the user. We can remove it altogether in the next release (if we choose this way, let's open a JIRA to keep track of it).

TestExecutableManager.testAddJobConfToEnv fails randomly

Key: PIG-3001
URL: https://issues.apache.org/jira/browse/PIG-3001
Project: Pig
Issue Type: Sub-task
Reporter: Gianmarco De Francisci Morales
Assignee: Cheolsoo Park
Priority: Minor
Attachments: PIG-3001-2.patch, PIG-3001-3.patch, PIG-3001-4.patch, PIG-3001.patch

The test in the Summary fails intermittently. This is due to using a random number generator without seeding it. We should avoid stochastic tests. Furthermore, the test itself is ill-conceived. Here is the failure summary:

{code}
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: ⻨ꢏ切歯
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 狓偝
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 墣챟㌌̀썬鼹騷
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 훎滼
{code}

{code}
Error Message:
There should be no remaining pairs in the included map

Stacktrace:
junit.framework.AssertionFailedError: There should be no remaining pairs in the included map
    at org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)
{code}
Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy
Congratulations Rohini! Welcome onboard :)

-- Gianmarco

On Fri, Oct 26, 2012 at 7:32 PM, Prasanth J buckeye.prasa...@gmail.com wrote:

Congrats Rohini!

Thanks
-- Prasanth

On Oct 26, 2012, at 10:21 PM, Santhosh Srinivasan santhosh_mut...@yahoo.com wrote:

Congrats Rohini! Full speed ahead now :)

On Oct 26, 2012, at 4:37 PM, Daniel Dai da...@hortonworks.com wrote:

Here is another Pig committer announcement today. Please welcome Rohini Palaniswamy to be a Pig committer!

Thanks,
Daniel
Re: Review Request: Modernize a chunk of the tests
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7734/#review12786
---

test/org/apache/pig/test/TestConversions.java
https://reviews.apache.org/r/7734/#comment27353

    Do we really need Integer.valueOf() ?

test/org/apache/pig/test/TestInputOutputFileValidator.java
https://reviews.apache.org/r/7734/#comment27354

    We should use @Test(expected = Exception.class) instead

test/org/apache/pig/test/TestInputOutputFileValidator.java
https://reviews.apache.org/r/7734/#comment27355

    Same here, @Test(expected = Exception.class)
    Possibly a proper subclass of Exception

test/org/apache/pig/test/TestInputOutputMiniClusterFileValidator.java
https://reviews.apache.org/r/7734/#comment27356

    Same here, @Test(expected = Exception.class) or a proper subclass.

test/org/apache/pig/test/TestLargeFile.java
https://reviews.apache.org/r/7734/#comment27357

    Why the change in name?

test/org/apache/pig/test/TestNullConstant.java
https://reviews.apache.org/r/7734/#comment27358

    What is the accepted way of creating temporary datasets?
    Are we suggesting everybody to use mock.Storage() ?

- Gianmarco De Francisci Morales

On Oct. 25, 2012, 6:05 p.m., Jonathan Coveney wrote:

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7734/
---

(Updated Oct. 25, 2012, 6:05 p.m.)

Review request for pig and Julien Le Dem.

Description
---

A lot of the tests use antiquated patterns. My goal was to refactor them in a couple ways:
- get rid of the annotation specifying Junit 4. All should use JUnit 4 (question: where is the Junit 3 dependency even being pulled?)
- Nothing should extend TestCase. Everything should be annotation driven.
- Properly use asserts. There was a lot of assertTrue(null==thing), so I replaced it with assertNull(thing), and so on.
- Get rid of MiniCluster use in a handful of cases.

This addresses bug PIG-3006.
https://issues.apache.org/jira/browse/PIG-3006

Diffs
-

test/org/apache/pig/test/PigExecTestCase.java 32a502c
test/org/apache/pig/test/TestAlgebraicEval.java 0bbd83d
test/org/apache/pig/test/TestAlgebraicEvalLocal.java df4b76a
test/org/apache/pig/test/TestBagFormat.java 09298d4
test/org/apache/pig/test/TestBatchAliases.java 6e952c7
test/org/apache/pig/test/TestCompressedFiles.java d54ffaa
test/org/apache/pig/test/TestConversions.java 152ad5c
test/org/apache/pig/test/TestDeleteOnFail.java 7070285
test/org/apache/pig/test/TestFilterOpNumeric.java 730e808
test/org/apache/pig/test/TestFilterOpString.java b65965f
test/org/apache/pig/test/TestFilterSimplification.java ade97b6
test/org/apache/pig/test/TestForEachNestedPlanLocal.java a78568e
test/org/apache/pig/test/TestFuncSpec.java bc7144c
test/org/apache/pig/test/TestInfixArithmetic.java cdf6948
test/org/apache/pig/test/TestInputOutputFileValidator.java 67b2873
test/org/apache/pig/test/TestInputOutputMiniClusterFileValidator.java caa62cb
test/org/apache/pig/test/TestInstantiateFunc.java 31c37b1
test/org/apache/pig/test/TestJoin.java a4f3aff
test/org/apache/pig/test/TestKeyTypeDiscoveryVisitor.java 2bbeca1
test/org/apache/pig/test/TestLargeFile.java 79590ce
test/org/apache/pig/test/TestLocal.java 5680196
test/org/apache/pig/test/TestLocal2.java eea7b2f
test/org/apache/pig/test/TestMapReduce2.java 30574db
test/org/apache/pig/test/TestNewPlanColumnPrune.java bed006e
test/org/apache/pig/test/TestNewPlanListener.java 7701182
test/org/apache/pig/test/TestNewPlanOperatorPlan.java 1f8fe56
test/org/apache/pig/test/TestNewPlanPruneMapKeys.java d1cce22
test/org/apache/pig/test/TestNewPlanRule.java 4a7ff0a
test/org/apache/pig/test/TestNullConstant.java 3ae25d9
test/org/apache/pig/test/TestOrderBy2.java 4ee4f26
test/org/apache/pig/test/TestOrderBy3.java 2067d7a
test/org/apache/pig/test/TestPOBinCond.java 20bd734
test/org/apache/pig/test/TestPODistinct.java 60f9d73
test/org/apache/pig/test/TestPOGenerate.java e0fd796
test/org/apache/pig/test/TestPOMapLookUp.java 3ed0900
test/org/apache/pig/test/TestPONegative.java 220c409
test/org/apache/pig/test/TestPORegexp.java d6e15ac
test/org/apache/pig/test/TestPOSort.java 600ee0c
test/org/apache/pig/test/TestPOUserFunc.java 3a90d6c
test/org/apache/pig/test/TestParamSubPreproc.java 1a52691
test/org/apache/pig/test/TestParser.java 17dc42a
test/org/apache/pig/test/TestPi.java f0883d1
test/org/apache/pig/test/TestPigProgressReporting.java e4f76ec
test/org/apache/pig/test/TestPigScriptParser.java 2acb1a8
test/org/apache/pig/test/TestPigSplit.java af70e9d
test
[jira] [Commented] (PIG-3006) Modernize a chunk of the tests
[ https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484424#comment-13484424 ]

Gianmarco De Francisci Morales commented on PIG-3006:

Great job Jon! Our test suite definitely needed some cleanup.

Modernize a chunk of the tests

Key: PIG-3006
URL: https://issues.apache.org/jira/browse/PIG-3006
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
Attachments: PIG-3006-0.patch

A lot of the tests use antiquated patterns. My goal was to refactor them in a couple of ways:
- Get rid of the annotation specifying JUnit 4. All should use JUnit 4 (question: where is the JUnit 3 dependency even being pulled in?)
- Nothing should extend TestCase. Everything should be annotation driven.
- Properly use asserts. There was a lot of assertTrue(null==thing), so I replaced it with assertNull(thing), and so on.
- Get rid of MiniCluster use in a handful of cases.

I've run every test and they pass, EXCEPT TestLargeFile which is failing on trunk anyway.
[jira] [Commented] (PIG-3006) Modernize a chunk of the tests
[ https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484437#comment-13484437 ]

Gianmarco De Francisci Morales commented on PIG-3006:

I did a quick review and it looked good (I left a couple of comments), but I would feel better if one more committer reviewed it.

Modernize a chunk of the tests

Key: PIG-3006
URL: https://issues.apache.org/jira/browse/PIG-3006
Project: Pig
Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Fix For: 0.12
Attachments: PIG-3006-0.patch

A lot of the tests use antiquated patterns. My goal was to refactor them in a couple of ways:
- Get rid of the annotation specifying JUnit 4. All should use JUnit 4 (question: where is the JUnit 3 dependency even being pulled in?)
- Nothing should extend TestCase. Everything should be annotation driven.
- Properly use asserts. There was a lot of assertTrue(null==thing), so I replaced it with assertNull(thing), and so on.
- Get rid of MiniCluster use in a handful of cases.

I've run every test and they pass, EXCEPT TestLargeFile which is failing on trunk anyway.
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483338#comment-13483338 ]

Gianmarco De Francisci Morales commented on PIG-2999:

I see these tests fail:

org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv
junit.framework.AssertionFailedError: There should be no remaining pairs in the included map
    at org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)

org.apache.pig.test.TestDataModel.testMultiFieldTupleCompareTo
less than tuple with greater value expected:<-1> but was:<-2>

For the second one, the solution proposed by Koji is fine. I am not sure the first one is related (probably not).

These tests also fail, but I think they are not directly related to this patch:

org.apache.pig.test.TestJobSubmission.testReducerNumEstimation
org.apache.pig.test.TestMacroExpansion.testMacroAliasConversion
org.apache.pig.test.TestScriptLanguage.bindLocalVariableTest2
org.apache.pig.test.TestStreaming.testInputCacheSpecs

Once the testMultiFieldTupleCompareTo test case is fixed, I am OK with the patch.

Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

Key: PIG-2999
URL: https://issues.apache.org/jira/browse/PIG-2999
Project: Pig
Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
Attachments: pig-2999-v1.txt, pig-2999-v2.txt

I think I broke the build from PIG-2975. I see a couple of tests failing at BinInterSedesTupleRawComparator.

{noformat}
12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:478)
    at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
    at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
    at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
    at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
    at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
{noformat}
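The testMultiFieldTupleCompareTo failure quoted in the comment above ("expected -1 but was -2") is typical of a test that asserts on the exact value returned by a comparison, when only its sign is meaningful. The following is a minimal sketch of that idea, not the fix that was actually applied in the patch; the class CompareSign and the method compareFields are hypothetical names used only for illustration and are not part of Pig.

```java
// Hypothetical illustration; not Pig code.
class CompareSign {

    // Compares two int arrays field by field. Like many compareTo-style
    // methods, it may return any negative or positive value, not just -1 or 1.
    // (Subtraction is used for brevity and assumes small values, no overflow.)
    static int compareFields(int[] a, int[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                return a[i] - b[i];
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        int c = compareFields(new int[] {1, 3}, new int[] {1, 5});
        System.out.println(c);                 // prints -2
        System.out.println(Integer.signum(c)); // prints -1
    }
}
```

A test written against Integer.signum(c), or simply against c < 0, keeps passing when the comparator changes the magnitude it returns, as long as the ordering itself is unchanged.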
[jira] [Created] (PIG-3001) TestExecutableManager.testAddJobConfToEnv fails randomly
Gianmarco De Francisci Morales created PIG-3001:

Summary: TestExecutableManager.testAddJobConfToEnv fails randomly
Key: PIG-3001
URL: https://issues.apache.org/jira/browse/PIG-3001
Project: Pig
Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Priority: Minor

The test in the Summary fails intermittently. This is due to using a random number generator without seeding it. We should avoid stochastic tests. Furthermore, the test itself is ill-conceived. Here is the failure summary:

{code}
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: ⻨ꢏ切歯
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 狓偝
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 墣챟㌌̀썬鼹騷
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in pig.streaming.environment not found in Configuration: 훎滼
{code}

{code}
Error Message:
There should be no remaining pairs in the included map

Stacktrace:
junit.framework.AssertionFailedError: There should be no remaining pairs in the included map
    at org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)
{code}
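The description above attributes the intermittent failure to a random number generator that is never seeded. As a minimal sketch of the general remedy, assuming nothing about how the Pig test was eventually changed, the snippet below derives its random input from a fixed seed so that every run sees the same data. SeededTestData, randomProperties, randomWord and the seed value are all hypothetical names and values chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration; not Pig code.
class SeededTestData {

    // Any constant works; what matters is that it is fixed (or at least logged).
    static final long SEED = 42L;

    // Builds n random key/value pairs from a generator created with the given seed.
    // Keys and values are restricted to [a-z] so they are always printable.
    static Map<String, String> randomProperties(long seed, int n) {
        Random rnd = new Random(seed);
        Map<String, String> props = new HashMap<>();
        while (props.size() < n) {
            props.put(randomWord(rnd), randomWord(rnd));
        }
        return props;
    }

    // Returns a lowercase word of 1 to 8 characters.
    static String randomWord(Random rnd) {
        int len = 1 + rnd.nextInt(8);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) {
            sb.append((char) ('a' + rnd.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> first = randomProperties(SEED, 5);
        Map<String, String> second = randomProperties(SEED, 5);
        System.out.println(first.equals(second)); // prints true
    }
}
```

With a fixed seed, a failure can be reproduced by rerunning the test, and restricting the alphabet avoids the unprintable property names seen in the log above.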
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483352#comment-13483352 ]

Gianmarco De Francisci Morales commented on PIG-2999:

I checked TestExecutableManager and the patch has nothing to do with it. I opened PIG-3001 to address the issue.

Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

Key: PIG-2999
URL: https://issues.apache.org/jira/browse/PIG-2999
Project: Pig
Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
Attachments: pig-2999-v1.txt, pig-2999-v2.txt

I think I broke the build from PIG-2975. I see a couple of tests failing at BinInterSedesTupleRawComparator.

{noformat}
12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
java.nio.BufferUnderflowException
    at java.nio.Buffer.nextGetIndex(Buffer.java:478)
    at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
    at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
    at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
    at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
    at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
    at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
    at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
    at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
    at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
    at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
{noformat}
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13483388#comment-13483388 ] Gianmarco De Francisci Morales commented on PIG-2999: - Hi Cheolsoo, Thanks for the summary. I guess it is an intermittent failure: {code} Error Message Unable to find region for after 10 tries. Stacktrace org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for after 10 tries. at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:677) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:586) at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555) at org.apache.hadoop.hbase.client.HTable.init(HTable.java:171) at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:369) at org.apache.pig.test.TestJobSubmission.testReducerNumEstimation(TestJobSubmission.java:545) {code} Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi Assignee: Jonathan Coveney Attachments: pig-2999-v1.txt, pig-2999-v2.txt I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. 
{noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
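The trace above ends in `HeapByteBuffer.getLong` throwing `java.nio.BufferUnderflowException`. As a general illustration only (not a diagnosis of PIG-2999), the following sketch shows how a relative `getLong()` underflows when fewer than 8 bytes remain, and how a `remaining()` check avoids it. The class and method names are hypothetical and do not come from Pig.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

// Hypothetical helper, not part of Pig: shows when ByteBuffer.getLong() underflows.
final class UnderflowSketch {
    /** Reads a long only if at least 8 bytes remain; returns null otherwise. */
    static Long readLongIfPresent(ByteBuffer buf) {
        return buf.remaining() >= Long.BYTES ? Long.valueOf(buf.getLong()) : null;
    }

    /** Returns true if an unguarded getLong() would underflow on this buffer. */
    static boolean underflows(ByteBuffer buf) {
        try {
            buf.duplicate().getLong(); // duplicate() leaves the caller's position untouched
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        ByteBuffer fourBytes = ByteBuffer.wrap(new byte[] {0, 0, 0, 1});
        System.out.println("underflows: " + underflows(fourBytes));
        System.out.println("guarded read: " + readLongIfPresent(fourBytes));
    }
}
```

A comparator that reads a fixed-width value from a shorter field would hit the same exception, which is one reason to check the remaining length before each read.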
[jira] [Updated] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2999: Resolution: Fixed Fix Version/s: 0.11 Assignee: Cheolsoo Park (was: Jonathan Coveney) Status: Resolved (was: Patch Available) Verified that failing tests pass locally. +1 Committed to trunk and 0.11. Thanks for fixing this Cheolsoo, Koji. Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Sub-task Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi Assignee: Cheolsoo Park Fix For: 0.11 Attachments: pig-2999-v1.txt, pig-2999-v2.txt, pig-2999-v3.txt I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. {noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at 
org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: PROPOSAL: how to handle release documentation going forward
I guess this is the only way to ensure documented code. +1 We need to put this rule somewhere, maybe in the Wiki? Cheers, -- Gianmarco On Tue, Oct 23, 2012 at 12:37 AM, Santhosh M S santhosh_mut...@yahoo.com wrote: +1 From: Jonathan Coveney jcove...@gmail.com To: dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com Sent: Monday, October 22, 2012 5:09 PM Subject: Re: PROPOSAL: how to handle release documentation going forward As someone who chronically under-documents, I think that this is a good idea. +1 2012/10/22 Olga Natkovich onatkov...@yahoo.com Hi, Since we lost the dedicated document writer for Pig, would it make sense to require that going forward (0.12 and beyond) we require that documentation updates are included in the patch together with code changes and tests. I think that should work for most features/updates except perhaps big items that might require more than one JIRA to be completed before documentation changes make sense. Comments? Olga
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482489#comment-13482489 ] Gianmarco De Francisci Morales commented on PIG-2999: - The patch looks good. Running tests to make sure everything works. Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi Assignee: Jonathan Coveney Attachments: pig-2999-v1.txt, pig-2999-v2.txt I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. {noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at 
org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482538#comment-13482538 ] Gianmarco De Francisci Morales commented on PIG-2999: - Sure, I will take care of it. Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi Assignee: Jonathan Coveney Attachments: pig-2999-v1.txt, pig-2999-v2.txt I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. {noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) 
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482837#comment-13482837 ] Gianmarco De Francisci Morales commented on PIG-2999: - Not yet. I am running the full test suite to be sure we don't break other things, but it takes a while. Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi Assignee: Jonathan Coveney Attachments: pig-2999-v1.txt, pig-2999-v2.txt I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. {noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at 
org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482936#comment-13482936 ] Gianmarco De Francisci Morales commented on PIG-2999: - Makes sense. Comparable only guarantees < 0, == 0, and > 0. Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi Assignee: Jonathan Coveney Attachments: pig-2999-v1.txt, pig-2999-v2.txt I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. {noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at 
org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
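Java's `Comparable` and `Comparator` contracts specify only the sign of the result (negative, zero, or positive), not its magnitude. A minimal sketch, assuming the concern in the comment above is a check that depends on exact return values: compare results by sign rather than by value. The helper names below are hypothetical.

```java
import java.util.Comparator;

// Hypothetical helper, not part of Pig: compares comparator results by sign only.
final class SignContractSketch {
    /** Collapses any comparison result to -1, 0, or 1. */
    static int sign(int cmp) {
        return Integer.signum(cmp);
    }

    /** True when both comparators order the pair (a, b) the same way, ignoring magnitude. */
    static <T> boolean agree(Comparator<T> x, Comparator<T> y, T a, T b) {
        return sign(x.compare(a, b)) == sign(y.compare(a, b));
    }

    public static void main(String[] args) {
        Comparator<String> raw = (a, b) -> a.compareTo(b);                       // may return any magnitude
        Comparator<String> normalized = (a, b) -> Integer.signum(a.compareTo(b)); // always -1, 0, or 1
        System.out.println("agree: " + agree(raw, normalized, "a", "d"));
    }
}
```

A test written as `assertEquals(-1, cmp.compare(a, b))` can fail against a correct comparator; `assertTrue(cmp.compare(a, b) < 0)` checks only what the contract promises.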
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13481770#comment-13481770 ] Gianmarco De Francisci Morales commented on PIG-2975: - Guys, great job in moving this forward! I am sold on all the improvements in the patch. +1 TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main-x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (PIG-2993) Fix local mode on Hadoop-0.23
[ https://issues.apache.org/jira/browse/PIG-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-2993. - Resolution: Duplicate Fix Version/s: (was: 0.11) Thanks for the walkthrough. Indeed, Pig was picking the Hadoop installed on my machine. All the rest is as you described. Closing as duplicate. Fix local mode on Hadoop-0.23 - Key: PIG-2993 URL: https://issues.apache.org/jira/browse/PIG-2993 Project: Pig Issue Type: Sub-task Reporter: Gianmarco De Francisci Morales When compiling with -Dhadoopversion=23 and launching Pig in local mode (-x local) the shell just fills up with error notifications: {code} 2012-10-19 15:10:17,360 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil {code} Here the stack trace: {code} Pig Stack Trace --- ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl at org.apache.pig.tools.pigstats.PigStatsUtil.clinit(PigStatsUtil.java:54) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) at org.apache.pig.Main.run(Main.java:538) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at 
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ... 9 more Pig Stack Trace --- ERROR 2998: Unhandled internal error. Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) at org.apache.pig.Main.run(Main.java:538) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
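The resolution above attributes the `NoClassDefFoundError` to Pig picking up a different Hadoop installation than the one it was compiled against. One generic way to check which jar a class is loaded from is sketched below. It is a standalone diagnostic with hypothetical names, not something Pig ships; only the class name `org.apache.hadoop.mapreduce.task.JobContextImpl` is taken from the stack trace.

```java
import java.security.CodeSource;

// Hypothetical diagnostic, not part of Pig: reports where a class is loaded from.
final class ClassOriginSketch {
    /** Returns the location a class was loaded from, or a note if it is missing or has no code source. */
    static String origin(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class<?> c = Class.forName(className, false, ClassOriginSketch.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return src == null ? "(no code source, e.g. a JDK class)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND: " + e;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "org.apache.hadoop.mapreduce.task.JobContextImpl";
        System.out.println(name + " -> " + origin(name));
    }
}
```

Running it with the same classpath that Pig uses would show whether the class is absent or comes from an unexpected jar.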
[jira] [Updated] (PIG-2941) Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices
[ https://issues.apache.org/jira/browse/PIG-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2941: Resolution: Fixed Fix Version/s: (was: 0.10.0) 0.12 Status: Resolved (was: Patch Available) +1 Committed to trunk. Thanks John! Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices Key: PIG-2941 URL: https://issues.apache.org/jira/browse/PIG-2941 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.10.0 Reporter: John Gordon Assignee: John Gordon Fix For: 0.12 Attachments: 0001-IvySettings.xml-refactor-to-simplify-resolution.patch, PIG-2941.trunk.002.patch The Ivy resolvers in Pig are split into default, external, and internal -- and they are all actually distinct. There isn't a resolver that rolls over all three, and fallbacks aren't in place. Ideally, these resolver should chain right through with the default following a best practice fallback for novices. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing
[ https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482082#comment-13482082 ] Gianmarco De Francisci Morales commented on PIG-2999: - Most likely you are correct Jonathan. Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing - Key: PIG-2999 URL: https://issues.apache.org/jira/browse/PIG-2999 Project: Pig Issue Type: Bug Affects Versions: 0.11, 0.12 Reporter: Koji Noguchi I think I broke the build from PIG-2975. I see couple of tests failing at BinInterSedesTupleRawComparator. {noformat} 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022 java.nio.BufferUnderflowException at java.nio.Buffer.nextGetIndex(Buffer.java:478) at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732) at org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78) at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139) at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625) at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117) at org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92) at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175) at 
org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260) {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480247#comment-13480247 ] Gianmarco De Francisci Morales commented on PIG-2975: - Hi, We use ByteBuffer in the comparator for convenience. However, I don't think we should really compare the 6 minutes of the incorrect version with the 10 minutes of the correct version too much. IMHO correctness is more important than performance. The slowness is due to the fact that we need to unnest the ByteArray from the Tuple and that we are using a Tuple to store any kind of data. That said, BinInterSedes.BinInterSedesRawComparator is meant for performance, so if there is a way to make it faster it's more than welcome. My guess is that it won't be easy to recover the original speed. I would suggest to profile the code with some micro benchmark to see where the time is spent. TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main-x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
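The comment above suggests profiling the comparator with a micro benchmark. A rough sketch of such a harness follows, assuming two hypothetical ways of comparing serialized bytes: a plain loop over the arrays and a loop that goes through `ByteBuffer`. Neither is Pig's actual comparator, and `System.nanoTime` timing is only indicative; a dedicated harness such as JMH would give more reliable numbers.

```java
import java.nio.ByteBuffer;

// Hypothetical timing harness, not Pig code: compares two byte-comparison strategies.
final class CompareBenchSketch {
    /** Lexicographic comparison of unsigned bytes using direct array access. */
    static int compareLoop(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    /** Same ordering, but reading each byte through a ByteBuffer wrapper. */
    static int compareViaBuffer(byte[] a, byte[] b) {
        ByteBuffer ba = ByteBuffer.wrap(a), bb = ByteBuffer.wrap(b);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = ba.get(i) & 0xff, y = bb.get(i) & 0xff;
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    /** Runs the task repeatedly and returns elapsed nanoseconds. */
    static long timeNanos(Runnable task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) task.run();
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        byte[] a = new byte[64], b = new byte[64];
        b[63] = 1; // arrays differ only in the last byte, so every byte is visited
        int[] sink = new int[1]; // accumulate results so the JIT cannot discard the calls
        int iterations = 1_000_000;
        timeNanos(() -> sink[0] += compareLoop(a, b) + compareViaBuffer(a, b), iterations); // warm-up
        long loop = timeNanos(() -> sink[0] += compareLoop(a, b), iterations);
        long buffer = timeNanos(() -> sink[0] += compareViaBuffer(a, b), iterations);
        System.out.println("loop ns: " + loop + ", ByteBuffer ns: " + buffer + " (sink " + sink[0] + ")");
    }
}
```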
[jira] [Assigned] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales reassigned PIG-2975: --- Assignee: Gianmarco De Francisci Morales (was: Koji Noguchi) TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Gianmarco De Francisci Morales Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main-x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2975: Assignee: Koji Noguchi (was: Gianmarco De Francisci Morales) TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main-x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480309#comment-13480309 ] Gianmarco De Francisci Morales commented on PIG-2975: - Personally I don't care about byte order, it has no definite semantic already. However, by including the 4 bytes in the comparison I am afraid we are exposing ourselves to further bugs when the serialization format changes. TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main-x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480373#comment-13480373 ] Gianmarco De Francisci Morales commented on PIG-2975: - Yes, I was referring to the last alternative. If the serialization format changes (say we redefine the codes for TINYTUPLE) then we end up with a new order for tuples. As I said ByteArray sorting does not have a definite semantic, but I feel that it would be good to keep it stable across releases, if possible. TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main-x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
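The stability concern above can be illustrated with a toy encoding. This is a sketch under stated assumptions: a single leading type-code byte followed by the payload, with made-up code values. It is not the BinInterSedes format, and the names are hypothetical. It shows that comparing whole byte strings makes the order depend on the type codes, while comparing payloads only does not.

```java
// Hypothetical toy encoding, not BinInterSedes: one type-code byte, then the payload.
final class HeaderOrderSketch {
    /** Prepends a one-byte type code to the payload. */
    static byte[] encode(byte typeCode, byte[] payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = typeCode;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Unsigned lexicographic comparison of a and b, starting at offset 'from'. */
    static int compareFrom(byte[] a, byte[] b, int from) {
        int n = Math.min(a.length, b.length);
        for (int i = from; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x < y ? -1 : 1;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] five = {5}, three = {3};
        byte[] y = encode((byte) 2, three);
        byte[] xOld = encode((byte) 1, five); // "old" format: type code 1
        byte[] xNew = encode((byte) 9, five); // "new" format: same value, type code redefined to 9
        System.out.println("whole bytes, old codes: " + compareFrom(xOld, y, 0));
        System.out.println("whole bytes, new codes: " + compareFrom(xNew, y, 0));
        System.out.println("payload only, old codes: " + compareFrom(xOld, y, 1));
        System.out.println("payload only, new codes: " + compareFrom(xNew, y, 1));
    }
}
```

Skipping the header keeps the order stable across format changes, at the cost of having to know where the payload starts.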
[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result
[ https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480393#comment-13480393 ] Gianmarco De Francisci Morales commented on PIG-2975: - Indeed, my idea to keep the order stable is only a nice-to-have. There is certainly no strict requirement to keep it, so I am OK with forgoing it and comparing the whole byte sequence directly. Koji, that's a good question. I guess that it could happen if we lose the schema during the execution of a plan, e.g. because of a UDF. TestTypedMap.testOrderBy failing with incorrect result --- Key: PIG-2975 URL: https://issues.apache.org/jira/browse/PIG-2975 Project: Pig Issue Type: Sub-task Affects Versions: 0.11 Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Blocker Fix For: 0.11 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt Looked at {noformat} junit.framework.AssertionFailedError at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352) {noformat} This looks like a valid test case failing with incorrect result. {noformat} % cat test/orderby.txt [key#1,key9#23] [key#3,key3#2] [key#22] % cat test/orderby.pig a = load 'test/orderby.txt' as (m:[]); b = foreach a generate m#'key' as b0; dump b; c = order b by b0; dump c; % java ... org.apache.pig.Main -x local test/orderby.pig [dump b] (1) (3) (22) ... [dump c] (1) (1) (22) % where did the '(3)' go? {noformat}
[jira] [Created] (PIG-2993) Fix local mode on Hadoop-0.23
Gianmarco De Francisci Morales created PIG-2993: --- Summary: Fix local mode on Hadoop-0.23 Key: PIG-2993 URL: https://issues.apache.org/jira/browse/PIG-2993 Project: Pig Issue Type: Sub-task Reporter: Gianmarco De Francisci Morales When compiling with -Dhadoopversion=23 and launching Pig in local mode (-x local), the shell just fills up with error notifications: {code} 2012-10-19 15:10:17,360 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil {code} Here is the stack trace: {code} Pig Stack Trace --- ERROR 2998: Unhandled internal error. org/apache/hadoop/mapreduce/task/JobContextImpl java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl at org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:54) at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) at org.apache.pig.Main.run(Main.java:538) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.task.JobContextImpl at java.net.URLClassLoader$1.run(URLClassLoader.java:202) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:190) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301) at java.lang.ClassLoader.loadClass(ClassLoader.java:247) ... 9 more Pig Stack Trace --- ERROR 2998: Unhandled internal error.
Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.tools.pigstats.PigStatsUtil at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67) at org.apache.pig.Main.run(Main.java:538) at org.apache.pig.Main.main(Main.java:154) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at org.apache.hadoop.util.RunJar.main(RunJar.java:156) {code}
[jira] [Commented] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479213#comment-13479213 ] Gianmarco De Francisci Morales commented on PIG-2985: - This is a simple bug fix, should go to both 0.11 and trunk. TestRank1,2,3 fail with hadoop-2.0.x Key: PIG-2985 URL: https://issues.apache.org/jira/browse/PIG-2985 Project: Pig Issue Type: Sub-task Reporter: Cheolsoo Park Assignee: Rohini Palaniswamy Fix For: 0.11 Attachments: PIG-2985.patch To reproduce the error, please run: {code} ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1 {code} This fails with the following error: {code} Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370) at org.apache.pig.PigServer.launchPlan(PigServer.java:1264) Caused by: java.lang.NullPointerException at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359) {code} I see the failures with hadoop-2.0.x only.
[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2985: Resolution: Fixed Status: Resolved (was: Patch Available) +1, committed to both trunk and 0.11. Thanks Rohini! Interestingly, tests with hadoop-2.0 take 1/3 of the time compared to hadoop-1.0. TestRank1,2,3 fail with hadoop-2.0.x Key: PIG-2985 URL: https://issues.apache.org/jira/browse/PIG-2985 Project: Pig Issue Type: Sub-task Reporter: Cheolsoo Park Assignee: Rohini Palaniswamy Fix For: 0.11 Attachments: PIG-2985.patch To reproduce the error, please run: {code} ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1 {code} This fails with the following error: {code} Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370) at org.apache.pig.PigServer.launchPlan(PigServer.java:1264) Caused by: java.lang.NullPointerException at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359) {code} I see the failures with hadoop-2.0.x only.
Re: CHANGES.txt in branches
OK, I fixed a bunch of them. There was also some misspelling on Pig issue numbers, which made everything even more confusing :) Cheers, -- Gianmarco On Tue, Oct 16, 2012 at 10:59 PM, Bill Graham billgra...@gmail.com wrote: Also guilty as of about 15 minutes ago. I just moved my entry for PIG-2976 to the Pig 0.11 section on the trunk. Great catch. On Tue, Oct 16, 2012 at 9:46 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: Guilty.. I guess we should be putting them under 0.11 in trunk. On Tue, Oct 16, 2012 at 8:18 PM, Jonathan Coveney jcove...@gmail.com wrote: AFAIK (and I don't really know), I thought that if we put it in both, that it'd go in the pig 11 section in trunk, and if not, we don't. Is this correct? Good job noticing this. 2012/10/16 Gianmarco De Francisci Morales g...@apache.org Hi devs, I noticed there is a misalignment in CHANGES.txt between 0.11 and trunk. It seems some people are putting patches on top in both versions of the file, while other are putting changes that get into 0.11 in the 0.11 section of the trunk file. Let me show an example: This is 0.11 Pig Change Log Release 0.11.0 (unreleased) INCOMPATIBLE CHANGES PIG-1891 Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates) IMPROVEMENTS PIG-2947: Documentation for Rank operator (xalan via azaroth) PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy) PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates) PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy) PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney) PIG-2964: Add helper method getJobList() to PigStats.JobGraph. 
Extend visibility of couple methods on same class (prkommireddi via billgraham) And this is trunk: Pig Change Log Trunk (unreleased changes) INCOMPATIBLE CHANGES IMPROVEMENTS PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy) PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms) PIG-2793: Pig test: add utils to simplify testing on Windows (jgordon via gates) PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy) OPTIMIZATIONS BUG FIXES PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy) Release 0.11.0 (unreleased) INCOMPATIBLE CHANGES PIG-1891 Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates) IMPROVEMENTS PIG-2947: Documentation for Rank operator (xalan via azaroth) PIG-2910: Add function to read schema from outout of Schema.toString() (initialcontext via thejas) PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney) PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham) Notice how PIG-2943, PIG-2793, PIG-2908 are marked as appearing in trunk in trunk and in 0.11 in 0.11. PIG-2910 is in 0.11 in trunk but not in 0.11 (I guess it is a small mistake). So, what's the correct behavior? Do we mark a patch in CHANGES.txt at the earliest place it appears in the code (so that CHANGES.txt is consistent across releases)? Or do we treat the branches independently, and thus we put each patch always at the top? Personally, I put PIG-2947 in the 0.11 section in trunk, but I don't have a strong opinion on it (as long as we are consistent). Cheers, -- Gianmarco -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Resolved] (PIG-2947) Documentation for Rank operator
[ https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-2947. - Resolution: Fixed Documentation for Rank operator --- Key: PIG-2947 URL: https://issues.apache.org/jira/browse/PIG-2947 Project: Pig Issue Type: Improvement Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Fix For: 0.11 Attachments: patch_01, patch_02, patch_03 User documentation for recently released Rank operator, with some basic explanation of usage and examples
[jira] [Resolved] (PIG-2922) Documentation and examples for RANK
[ https://issues.apache.org/jira/browse/PIG-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-2922. - Resolution: Duplicate Documentation and examples for RANK --- Key: PIG-2922 URL: https://issues.apache.org/jira/browse/PIG-2922 Project: Pig Issue Type: Improvement Components: documentation Reporter: Gianmarco De Francisci Morales Assignee: Allan Avendaño Labels: documentation We need documentation and examples for the newly introduced RANK command.
[jira] [Commented] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x
[ https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477253#comment-13477253 ] Gianmarco De Francisci Morales commented on PIG-2985: - Did hadoop-2.0 change the way to access counters? TestRank1,2,3 fail with hadoop-2.0.x Key: PIG-2985 URL: https://issues.apache.org/jira/browse/PIG-2985 Project: Pig Issue Type: Sub-task Reporter: Cheolsoo Park Fix For: 0.11 To reproduce the error, please run: {code} ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1 {code} This fails with the following error: {code} Caused by: java.lang.RuntimeException: Error to read counters into Rank operation counterSize 0 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370) at org.apache.pig.PigServer.launchPlan(PigServer.java:1264) Caused by: java.lang.NullPointerException at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359) {code} I see the failures with hadoop-2.0.x only.
[jira] [Updated] (PIG-2947) Documentation for Rank operator
[ https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2947: Attachment: patch_03 Thanks Allan, the documentation looks good! Attaching a version with some minor changes. Will commit to both trunk and 0.11 branch. Documentation for Rank operator --- Key: PIG-2947 URL: https://issues.apache.org/jira/browse/PIG-2947 Project: Pig Issue Type: Improvement Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Fix For: 0.11 Attachments: patch_01, patch_02, patch_03 User documentation for recently released Rank operator, with some basic explanation of usage and examples
[jira] [Commented] (PIG-2947) Documentation for Rank operator
[ https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477544#comment-13477544 ] Gianmarco De Francisci Morales commented on PIG-2947: - +1 Committed to both trunk and 0.11 branch. Documentation for Rank operator --- Key: PIG-2947 URL: https://issues.apache.org/jira/browse/PIG-2947 Project: Pig Issue Type: Improvement Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Fix For: 0.11 Attachments: patch_01, patch_02, patch_03 User documentation for recently released Rank operator, with some basic explanation of usage and examples
CHANGES.txt in branches
Hi devs, I noticed there is a misalignment in CHANGES.txt between 0.11 and trunk. It seems some people are putting patches on top in both versions of the file, while others are putting changes that get into 0.11 in the 0.11 section of the trunk file. Let me show an example: This is 0.11 Pig Change Log Release 0.11.0 (unreleased) INCOMPATIBLE CHANGES PIG-1891 Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates) IMPROVEMENTS PIG-2947: Documentation for Rank operator (xalan via azaroth) PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy) PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates) PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy) PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney) PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham) And this is trunk: Pig Change Log Trunk (unreleased changes) INCOMPATIBLE CHANGES IMPROVEMENTS PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy) PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not set (cheolsoo via sms) PIG-2793: Pig test: add utils to simplify testing on Windows (jgordon via gates) PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy) OPTIMIZATIONS BUG FIXES PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24 (cheolsoo via dvryaboy) Release 0.11.0 (unreleased) INCOMPATIBLE CHANGES PIG-1891 Enable StoreFunc to make intelligent decision based on job success or failure (initialcontext via gates) IMPROVEMENTS PIG-2947: Documentation for Rank operator (xalan via azaroth) PIG-2910: Add function to read schema from outout of Schema.toString() (initialcontext via thejas) PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)
PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend visibility of couple methods on same class (prkommireddi via billgraham) Notice how PIG-2943, PIG-2793, PIG-2908 are marked as appearing in trunk in trunk and in 0.11 in 0.11. PIG-2910 is in 0.11 in trunk but not in 0.11 (I guess it is a small mistake). So, what's the correct behavior? Do we mark a patch in CHANGES.txt at the earliest place it appears in the code (so that CHANGES.txt is consistent across releases)? Or do we treat the branches independently, and thus we put each patch always at the top? Personally, I put PIG-2947 in the 0.11 section in trunk, but I don't have a strong opinion on it (as long as we are consistent). Cheers, -- Gianmarco
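The kind of misalignment described in the message above could be detected mechanically. The following Python sketch is only an illustration, not a tool that exists in the Pig repository: the function names are hypothetical, and it assumes that section headers start with `Trunk` or `Release <version>` and that each entry starts with its `PIG-NNNN` id on its own line. The two sample texts are abbreviated from the listings quoted in the message.

```python
import re

# Assumed header and entry shapes; adjust if the real file differs.
SECTION_RE = re.compile(r"^(Trunk|Release \S+)\b")
ISSUE_RE = re.compile(r"^(PIG-\d+)")


def issues_by_section(changes_text):
    """Map each PIG issue id to the release section it is listed under."""
    section = None
    placement = {}
    for line in changes_text.splitlines():
        line = line.strip()
        header = SECTION_RE.match(line)
        if header:
            section = header.group(1)
            continue
        issue = ISSUE_RE.match(line)
        if issue and section is not None:
            placement[issue.group(1)] = section
    return placement


def misaligned(branch_text, trunk_text):
    """Return issues present in both files but filed under different sections."""
    branch = issues_by_section(branch_text)
    trunk = issues_by_section(trunk_text)
    return {
        issue: (branch[issue], trunk[issue])
        for issue in sorted(branch.keys() & trunk.keys())
        if branch[issue] != trunk[issue]
    }


BRANCH_0_11 = """\
Release 0.11.0 (unreleased)
IMPROVEMENTS
PIG-2947: Documentation for Rank operator (xalan via azaroth)
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
"""

TRUNK = """\
Trunk (unreleased changes)
IMPROVEMENTS
PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS method for code health (jgordon via dvryaboy)
PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
Release 0.11.0 (unreleased)
IMPROVEMENTS
PIG-2947: Documentation for Rank operator (xalan via azaroth)
"""

report = misaligned(BRANCH_0_11, TRUNK)
```

With these samples, `report` flags PIG-2908 and PIG-2943 as filed under different sections in the two files, while PIG-2947 is consistent. An issue whose number is misspelled in one file would not be matched at all, so a real check would probably also want to list ids that appear in only one file.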
[jira] [Commented] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query
[ https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475423#comment-13475423 ] Gianmarco De Francisci Morales commented on PIG-2970: - Haven't had time to look at the patch, but I guess it is related to PIG-2119. I thought we had solved it though. Nested foreach getting incorrect schema when having unrelated inner query - Key: PIG-2970 URL: https://issues.apache.org/jira/browse/PIG-2970 Project: Pig Issue Type: Bug Components: parser Reporter: Koji Noguchi Assignee: Koji Noguchi Priority: Minor Attachments: pig-2970-trunk-v01.txt While looking at PIG-2968, hit a weird error message. {noformat} $ cat -n test/foreach2.pig 1 daily = load 'nyse' as (exchange, symbol); 2 grpd = group daily by exchange; 3 unique = foreach grpd { 4 sym = daily.symbol; 5 uniq_sym = distinct sym; 6 --ignoring uniq_sym result 7 generate group, daily; 8 }; 9 describe unique; 10 zzz = foreach unique generate group; 11 explain zzz; % pig -x local -t ColumnMapKeyPrune test/foreach2.pig ... unique: {symbol: bytearray} 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1025: file test/foreach2.pig, line 10, column 30 Invalid field projection. Projected field [group] does not exist in schema: symbol:bytearray. ... {noformat}
Re: Pig 0.11
We are missing some documentation on RANK, but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this.
Re: Pig 0.11
I added it as a dependency, as it already has its own Jira issue. I hope that is OK. Cheers, -- Gianmarco On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote: +1 for me. There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few documentation issues that should block Pig 0.11, but they can also be done on the trunk and merged to the branch. Gianmarco, you can add a rank subtask there to serve as a reminder. On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales g...@apache.org wrote: We are missing some documentation on the RANK but I guess we could add that to the branch and trunk in parallel. All the patches I was keeping an eye on are in. So +1 for me. -- Gianmarco On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote: I think all of the major patches are in, no? Now it's just bug testing? Just wanted to touch base on where we are at with this. -- *Note that I'm no longer using my Yahoo! email address. Please email me at billgra...@gmail.com going forward.*
[jira] [Updated] (PIG-2941) Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices
[ https://issues.apache.org/jira/browse/PIG-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2941: Assignee: John Gordon Status: Open (was: Patch Available) Can you regenerate the patch with the --no-prefix option and without the email headers? Also, can you explain why maven2 is on both the internal and external resolvers? Canceling patch for the moment. Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices Key: PIG-2941 URL: https://issues.apache.org/jira/browse/PIG-2941 Project: Pig Issue Type: Bug Components: build Affects Versions: 0.10.0 Reporter: John Gordon Assignee: John Gordon Fix For: 0.10.0 Attachments: 0001-IvySettings.xml-refactor-to-simplify-resolution.patch The Ivy resolvers in Pig are split into default, external, and internal -- and they are all actually distinct. There isn't a resolver that rolls over all three, and fallbacks aren't in place. Ideally, these resolvers should chain right through with the default following a best practice fallback for novices.
[jira] [Commented] (PIG-2947) Documentation for Rank operator
[ https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471580#comment-13471580 ] Gianmarco De Francisci Morales commented on PIG-2947: - There is no single place where all the reserved keywords are collected AFAIK. If we decide to create one, it should be automated. However, on the rank issues, I have used 'count' as a column name in many of my Pig scripts and it never blew up. I guess the situation is similar. Documentation for Rank operator --- Key: PIG-2947 URL: https://issues.apache.org/jira/browse/PIG-2947 Project: Pig Issue Type: Improvement Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Attachments: patch_01 User documentation for recently released Rank operator, with some basic explanation of usage and examples
[jira] [Updated] (PIG-2946) Documentation of history and clear commands
[ https://issues.apache.org/jira/browse/PIG-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2946: Attachment: patch_02 I just reworded a couple of sentences. Apart from that, +1. Uploading the patch committed to trunk. Thanks Allan! Documentation of history and clear commands Key: PIG-2946 URL: https://issues.apache.org/jira/browse/PIG-2946 Project: Pig Issue Type: Improvement Components: documentation Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Attachments: patch_01, patch_02 After adding these two commands history and clear to the Pig Grunt Shell, this is a basic user documentation.
[jira] [Resolved] (PIG-2946) Documentation of history and clear commands
[ https://issues.apache.org/jira/browse/PIG-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales resolved PIG-2946. - Resolution: Fixed Fix Version/s: 0.11 Documentation of history and clear commands Key: PIG-2946 URL: https://issues.apache.org/jira/browse/PIG-2946 Project: Pig Issue Type: Improvement Components: documentation Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Fix For: 0.11 Attachments: patch_01, patch_02 After adding these two commands history and clear to the Pig Grunt Shell, this is a basic user documentation.
[jira] [Commented] (PIG-2947) Documentation for Rank operator
[ https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471391#comment-13471391 ] Gianmarco De Francisci Morales commented on PIG-2947: - There seems to be some imprecision in the Syntax of RANK. The BY clause should be optional (so use square brackets). Same for the DENSE option. The examples look good. The wording could use some cleaning up to make it more clear (for example using less passive voice, clearly stating what RANK is supposed to do). Documentation for Rank operator --- Key: PIG-2947 URL: https://issues.apache.org/jira/browse/PIG-2947 Project: Pig Issue Type: Improvement Reporter: Allan Avendaño Assignee: Allan Avendaño Priority: Trivial Labels: documentation Attachments: patch_01 User documentation for recently released Rank operator, with some basic explanation of usage and examples
[jira] [Updated] (PIG-2920) e2e tests override PERL5LIB environment variable
[ https://issues.apache.org/jira/browse/PIG-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2920: Resolution: Fixed Fix Version/s: 0.11 Status: Resolved (was: Patch Available) e2e tests override PERL5LIB environment variable Key: PIG-2920 URL: https://issues.apache.org/jira/browse/PIG-2920 Project: Pig Issue Type: Bug Reporter: Gianmarco De Francisci Morales Assignee: Gianmarco De Francisci Morales Priority: Minor Fix For: 0.11 Attachments: PIG-2920.2.patch, PIG-2920.patch I am not sure why but e2e tests set PERL5LIB like this: {code} <env key="PERL5LIB" value="./libexec/"/> {code} This overrides any env variable, so there is no way to use custom Perl installations. This patch just removes the line, thus we will rely on the user to configure PERL5LIB appropriately. With this modification I am able to use my custom Perl installation.
[jira] [Updated] (PIG-2920) e2e tests override PERL5LIB environment variable
[ https://issues.apache.org/jira/browse/PIG-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gianmarco De Francisci Morales updated PIG-2920: Attachment: PIG-2920.2.patch Addressed the comments by Rohini. Now the user can set the property harness.PERL5LIB to control the PERL5LIB environment variable in the tests. e2e tests override PERL5LIB environment variable Key: PIG-2920 URL: https://issues.apache.org/jira/browse/PIG-2920 Project: Pig Issue Type: Bug Reporter: Gianmarco De Francisci Morales Assignee: Gianmarco De Francisci Morales Priority: Minor Attachments: PIG-2920.2.patch, PIG-2920.patch I am not sure why but e2e tests set PERL5LIB like this: {code} <env key="PERL5LIB" value="./libexec/"/> {code} This overrides any env variable, so there is no way to use custom Perl installations. This patch just removes the line, thus we will rely on the user to configure PERL5LIB appropriately. With this modification I am able to use my custom Perl installation.
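The difference between always overwriting PERL5LIB and overriding it only on request can be sketched as follows. This is a Python illustration of the idea, not the actual Ant build logic from the patch; `build_harness_env` is a hypothetical helper, its `perl5lib` argument stands in for the value of the `harness.PERL5LIB` property, and the assumption that an unset property leaves the inherited value alone is mine rather than something the message states.

```python
def build_harness_env(base_env, perl5lib=None):
    """Return the environment for the test harness process.

    `perl5lib` stands in for the value of the harness.PERL5LIB property
    (hypothetical mapping); when it is not given, whatever PERL5LIB the
    caller already has is kept.
    """
    env = dict(base_env)  # copy, so the caller's mapping is left untouched
    if perl5lib:
        env["PERL5LIB"] = perl5lib
    return env


# Example paths are made up for illustration.
user_env = {"PATH": "/usr/bin", "PERL5LIB": "/opt/perl/lib"}
inherited = build_harness_env(user_env)
overridden = build_harness_env(user_env, perl5lib="./libexec/")
```

In this sketch a user with a custom Perl installation keeps their own PERL5LIB by default, and the old hard-coded `./libexec/` value is still available by passing it explicitly.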
[jira] [Commented] (PIG-2932) Setting high default_parallel causes IOException in local mode
[ https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465232#comment-13465232 ]

Gianmarco De Francisci Morales commented on PIG-2932:

Cheolsoo, thanks for the explanation. It is clearer now. I agree with your proposals. Will test and commit the patch tomorrow.

Setting high default_parallel causes IOException in local mode
    Key: PIG-2932
    URL: https://issues.apache.org/jira/browse/PIG-2932
    Project: Pig
    Issue Type: Bug
    Reporter: Gianmarco De Francisci Morales
    Priority: Critical
    Attachments: PIG-2932.patch

This bug has been confirmed only in local mode. When setting a high default_parallel, Pig fails on some operations. The following data and script reproduce the bug.

Data:
{code}
grunt> cat file.txt
111 qwer
122 qwerty
133 ert
133 ertyu
144 zxcv
166 fsdfg
166 fdfghj
188 fjklopi
{code}

Script:
{code}
SET default_parallel 9;
a = load 'file.txt' as (id1:int, id2:int, str:chararray);
b = group a by (id1,id2);
c = foreach b generate flatten(group), a;
d = order c by group::id1 ASC, group::id2 ASC;
dump d;
{code}

Error:
{code}
2012-09-26 15:28:13,230 [Thread-32] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C: R:
2012-09-26 15:28:13,232 [Thread-32] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
{code}

The script succeeds if default_parallel is set to 2. My guess is that the failure occurs when default_parallel is higher than the number of unique keys, probably because of some quirk in ORDER BY.
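
The exception text in the trace above has the shape `Illegal partition for <key> (<partition>)`. As far as I understand it, Hadoop's map output collector raises this when the partitioner returns an index outside the range of available reduce partitions. The sketch below is a toy Python model of that range check, not Hadoop's or Pig's code. The names `collect` and `try_collect` are hypothetical, and the scenario (a partitioner planned for more partitions than actually exist) is an assumption that this report does not confirm.

```python
def collect(key, partition, num_partitions):
    """Accept a record only if its partition index is within range."""
    if partition < 0 or partition >= num_partitions:
        # Same message shape as the exception in the trace above.
        raise IOError(f"Illegal partition for {key} ({partition})")
    return partition


def try_collect(key, partition, num_partitions):
    """Return 'ok' or the error message, for demonstration purposes."""
    try:
        collect(key, partition, num_partitions)
        return "ok"
    except IOError as err:
        return str(err)


# Assumed scenario: the partitioner hands out indices 0 and 1,
# but only one reduce partition actually exists.
key = "Null: false index: 0 (12,2)"
in_range = try_collect(key, 0, 1)
out_of_range = try_collect(key, 1, 1)
print(in_range)
print(out_of_range)
```

Under that assumption, any record routed to index 1 or above fails, while records routed to index 0 pass, which would be consistent with the failure depending on how the keys are spread across partitions.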
Re: e2e tests for Rank function
I was able to reproduce the bug and opened PIG-2932 to track it.

Cheers,
--
Gianmarco

On Wed, Sep 26, 2012 at 12:07 PM, Gianmarco De Francisci Morales g...@apache.org wrote:

Forwarding to pig-dev.
In summary, it looks like we have a regression in trunk. We need to investigate it before branching 0.11.

Cheers,
--
Gianmarco

---------- Forwarded message ----------
From: Allan aaven...@gmail.com
Date: Wed, Sep 26, 2012 at 11:21 AM
Subject: Re: e2e tests for Rank function
To: cheolsoo cheol...@cloudera.com, Gianmarco De Francisci Morales g...@apache.org

Hi Cheolsoo and Gianmarco,

I double-checked the e2e tests and reproduced the scenario. It's correct: they are failing.

Then, looking for a possible reason, I tried the following script:

SET default_parallel 9;
A = LOAD 'prerank' using PigStorage(',') as (rownumber:long,rankcabd:long,rankbdaa:long,rankbdca:long,rankaacd:long,rankaaba:long,a:int,b:int,c:int,tail:bytearray);
B = group A by (a, b);
C = foreach B generate flatten(group),A;
D = order C by group::a ASC, group::b ASC;

It fails with the same exception message. Then I tried the same script without the SET default_parallel 9; line, and it works. So I'm really surprised that in local mode it doesn't work with parallelism.

I used this script because the RANK (RANK BY) operator uses the same chain of operators: a GROUP (B), a flatten (C), and a sort (D).

Best regards,

On Sun, Sep 23, 2012 at 10:43 PM, Cheolsoo Park cheol...@cloudera.com wrote:

Hello,

The e2e tests for Rank function in trunk do not pass for me when running in local mode. I am wondering whether they all pass for everyone.

What I am doing is as follows:

ant clean
ant -Dhadoopversion=20 ... test-e2e-deploy-local
ant -Dhadoopversion=20 ... test-e2e-local -Dtests.to.run="-t Rank"

All tests except Rank_4 fail with errors similar to this:

java.io.IOException: Illegal partition for Null: false index: 0 (1,7) (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

I wanted to double check whether I am doing something wrong before I open a jira.

Thanks,
Cheolsoo

--
Allan Avendaño S.
Computer Engineer
SWY22 Participant
GSOC 2012 Participant
Rome - Italy
Gmail: aaven...@gmail.com
[jira] [Updated] (PIG-2353) RANK function like in SQL
[ https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2353:
    Release Note:
Pig includes a new RANK operator:

RANK relation ( BY column (ASC|DESC)? (DENSE)? )?

This operator prepends a consecutive integer to each tuple in the relation, starting from 1. If the BY clause is present, RANK sorts the relation before ranking it; otherwise it uses the order in which it receives the relation (e.g. the order in which the relation is stored if RANK is performed right after a LOAD). The DENSE modifier produces a dense rank, which has no gaps in it regardless of ties.

    was:
Pig includes a new RANK operator:

RANK relation ( BY column (ASC|DESC)? )?

This operator prepends a consecutive integer to each tuple in the relation, starting from 1. If the BY clause is present, RANK sorts the relation before ranking it; otherwise it uses the order in which it receives the relation (e.g. the order in which the relation is stored if RANK is performed right after a LOAD).

RANK function like in SQL
    Key: PIG-2353
    URL: https://issues.apache.org/jira/browse/PIG-2353
    Project: Pig
    Issue Type: New Feature
    Reporter: Gianmarco De Francisci Morales
    Assignee: Allan Avendaño
    Labels: gsoc2012, mentor
    Fix For: 0.11
    Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, PIG-2353-5.txt, PIG2353.patch

Implement a function that, given a (sorted) bag, adds to each tuple a unique, increasing identifier without gaps, as RANK does in SQL.
This is a candidate project for Google Summer of Code 2012. More information about the program can be found at https://cwiki.apache.org/confluence/display/PIG/GSoc2012
The functionality implemented so far is available at https://reviews.apache.org/r/5523/diff/#index_header
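
The three behaviours in the release note (RANK without BY, RANK BY, and RANK BY with DENSE) can be illustrated with a small Python model. This is a sketch of the semantics as described in the note, not Pig's implementation. The function names `rank_rows` and `rank_by` and the sample rows are hypothetical, and the assumption that the non-dense form leaves a gap after ties is implied by the note rather than stated in it.

```python
def rank_rows(rows):
    """RANK without BY: prepend 1, 2, 3, ... in arrival order."""
    return [(i, *row) for i, row in enumerate(rows, start=1)]


def rank_by(rows, key, descending=False, dense=False):
    """RANK ... BY: sort first, then rank; tied keys share a rank."""
    ordered = sorted(rows, key=key, reverse=descending)
    ranked = []
    rank = 0
    prev = object()  # sentinel that compares unequal to any real key
    for position, row in enumerate(ordered, start=1):
        k = key(row)
        if k != prev:
            # Dense rank advances by one; standard rank jumps to the
            # row position, which leaves a gap after a group of ties.
            rank = rank + 1 if dense else position
            prev = k
        ranked.append((rank, *row))
    return ranked


rows = [("b", 20), ("a", 10), ("c", 20), ("d", 30)]
plain = rank_rows(rows)
standard = rank_by(rows, key=lambda r: r[1])
dense = rank_by(rows, key=lambda r: r[1], dense=True)
for name, result in (("plain", plain), ("standard", standard), ("dense", dense)):
    print(name, [r[0] for r in result])
```

With the two rows tied on 20, the standard form assigns ranks 1, 2, 2, 4 while the dense form assigns 1, 2, 2, 3.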
Re: e2e tests for Rank function
Hi,

Weird, they should be passing. I will double check them tomorrow.

Cheers,
--
Gianmarco

On Sun, Sep 23, 2012 at 10:43 PM, Cheolsoo Park cheol...@cloudera.com wrote:

Hello,

The e2e tests for Rank function in trunk do not pass for me when running in local mode. I am wondering whether they all pass for everyone.

What I am doing is as follows:

ant clean
ant -Dhadoopversion=20 ... test-e2e-deploy-local
ant -Dhadoopversion=20 ... test-e2e-local -Dtests.to.run="-t Rank"

All tests except Rank_4 fail with errors similar to this:

java.io.IOException: Illegal partition for Null: false index: 0 (1,7) (1)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

I wanted to double check whether I am doing something wrong before I open a jira.

Thanks,
Cheolsoo
[jira] [Updated] (PIG-2879) Pig current releases lack a UDF startsWith.This UDF tests if a given string starts with the specified prefix.
[ https://issues.apache.org/jira/browse/PIG-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-2879:
    Resolution: Fixed
    Fix Version/s: 0.11
    Release Note: Pig now includes a STARTSWITH built-in UDF that checks for the presence of a given prefix in a chararray.
    Status: Resolved (was: Patch Available)

+1
Committed to trunk. Thanks Eli!

Pig current releases lack a UDF startsWith.This UDF tests if a given string starts with the specified prefix.
    Key: PIG-2879
    URL: https://issues.apache.org/jira/browse/PIG-2879
    Project: Pig
    Issue Type: New Feature
    Components: piggybank
    Affects Versions: 0.10.0
    Reporter: Anuroopa George
    Assignee: Eli Reisman
    Labels: features, patch
    Fix For: 0.11
    Attachments: PIG-2879-1.patch, PIG-2879-2.patch, PIG-2879-3.patch, PIG-2879-4.patch

Current Pig releases lack a startsWith UDF. This UDF tests whether a given string starts with the specified prefix. It returns true if the prefix argument is a prefix of the given string, and false otherwise. It also returns true if the prefix is an empty string or is equal to the given string.
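
The prefix rules in the description above, including the two edge cases (empty prefix, prefix equal to the string), can be checked with a short Python model. This is only an illustration of the stated semantics using Python's own `str.startswith`, not the UDF's code. The helper name `starts_with` and the sample strings are hypothetical, and null handling is not modeled because the description does not cover it.

```python
def starts_with(value, prefix):
    """Return True if `value` begins with `prefix`."""
    return value.startswith(prefix)


checks = {
    "proper prefix": starts_with("foobar", "foo"),
    "empty prefix": starts_with("foobar", ""),
    "prefix equals string": starts_with("foobar", "foobar"),
    "not a prefix": starts_with("foobar", "bar"),
    "prefix longer than string": starts_with("foo", "foobar"),
}
for case, result in checks.items():
    print(f"{case}: {result}")
```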