[jira] [Resolved] (PIG-3715) CLONE - Default split destination

2015-05-07 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3715?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-3715.
-
Resolution: Fixed

 CLONE - Default split destination
 -

 Key: PIG-3715
 URL: https://issues.apache.org/jira/browse/PIG-3715
 Project: Pig
  Issue Type: New Feature
Reporter: Hardik
Assignee: Gianmarco De Francisci Morales
  Labels: gsoc2011
 Fix For: 0.10.0


 The SPLIT statement would benefit from a default destination, e.g.:
 {code}
 SPLIT A INTO X IF f1<7, Y IF f2==5, Z IF (f3<6 OR f3>6), OTHER otherwise;
 -- OTHER has all tuples with f1>=7 && f2!=5 && f3==6
 {code}
 This is a candidate project for Google summer of code 2011. More information 
 about the program can be found at http://wiki.apache.org/pig/GSoc2011
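
The intended behaviour of a default destination can be sketched outside Pig. The Python below is only an illustration, not Pig's implementation: it assumes a tuple is sent to every branch whose condition it satisfies and to the default branch only when it satisfies none. The function name `split_with_default` and the sample rows are made up for this sketch, and the predicates are illustrative conditions on (f1, f2, f3).

```python
# Sketch of SPLIT ... OTHERWISE semantics on plain Python tuples.
# Assumption: a row goes to every branch whose condition holds, and to the
# default branch only when no condition holds.

def split_with_default(rows, branches, default_name):
    """Route each row to all matching branches, else to the default."""
    out = {name: [] for name, _ in branches}
    out[default_name] = []
    for row in rows:
        matched = False
        for name, predicate in branches:
            if predicate(row):
                out[name].append(row)
                matched = True
        if not matched:
            out[default_name].append(row)
    return out

# Rows are (f1, f2, f3) tuples; the data is invented for the example.
rows = [(1, 2, 3), (4, 5, 6), (7, 8, 9), (7, 1, 6)]
branches = [
    ("X", lambda r: r[0] < 7),
    ("Y", lambda r: r[1] == 5),
    ("Z", lambda r: r[2] < 6 or r[2] > 6),
]
result = split_with_default(rows, branches, "OTHER")
```

Only the row with f1 >= 7, f2 != 5 and f3 == 6 ends up in the default branch here; every other row matches at least one condition.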



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (PIG-4392) RANK BY fails when default_parallel is greater than cardinality of field being ranked by

2015-02-04 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-4392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14304961#comment-14304961
 ] 

Gianmarco De Francisci Morales commented on PIG-4392:
-

Yes, the order is guaranteed by RANK, which uses ORDER BY.
Not sure about the reversed order either, otherwise LGTM +1.

 RANK BY fails when default_parallel is greater than cardinality of field 
 being ranked by
 

 Key: PIG-4392
 URL: https://issues.apache.org/jira/browse/PIG-4392
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11.1
Reporter: Anthony Hsu
Assignee: Daniel Dai
 Fix For: 0.15.0

 Attachments: PIG-4392-1.patch


 To reproduce:
 {code:title=input.txt}
 1 2 3
 4 5 6
 7 8 9
 {code}
 {code:title=rank.pig}
 set default_parallel 4;
 d = load 'input.txt' using PigStorage(' ') as (a:int, b:int, c:int);
 e = rank d by a;
 dump e;
 {code}
 If {{default_parallel}} is set to {{3}}, the script succeeds. So I'm guessing 
 RANK BY has issues if the {{default_parallel}} exceeds the cardinality of the 
 field being ranked by.
 I'm seeing this issue with Pig 0.11.1 (which has the PIG-2932 patch applied) 
 and Hadoop 2.3.0.





Re: [RESULT] [VOTE] Drop support for Hadoop 0.20 from Pig 0.14

2014-10-01 Thread Gianmarco De Francisci Morales
We should add this in the release notes for 0.14 as well.

Cheers,

--
Gianmarco

On 29 September 2014 19:25, Rohini Palaniswamy rohini.adi...@gmail.com
wrote:

 My +1 as well.

 With 6 binding +1s, 8 non-binding +1s and no -1s this vote passes.

 Nothing special to address for this. PIG-3507 which went into Pig 0.14 used
 UserGroupInformation class without reflection and so Pig 0.14 is already
 incompatible with Hadoop 0.20.

 Regards,
 Rohini

 On Mon, Sep 22, 2014 at 5:56 PM, Thejas Nair the...@hortonworks.com
 wrote:

  +1
 
  On Thu, Sep 18, 2014 at 5:50 PM, Mona Chitnis mona.chit...@yahoo.in
  wrote:
  
   +1 (non-binding)
Mona Chitnis
   Yahoo!
  
On Thursday, September 18, 2014 8:48 AM, Ashutosh Chauhan 
  hashut...@apache.org wrote:
  
  
+1
  
   On Wed, Sep 17, 2014 at 7:02 PM, Daniel Dai da...@hortonworks.com
  wrote:
  
   +1
  
   On Wed, Sep 17, 2014 at 11:12 AM, Prashant Kommireddi
   prash1...@gmail.com wrote:
+1
   
On Wed, Sep 17, 2014 at 8:44 AM, Cheolsoo Park 
 piaozhe...@gmail.com
   wrote:
   
+1
   
On Wed, Sep 17, 2014 at 7:09 AM, Xuefu Zhang xzh...@cloudera.com
   wrote:
   
 +1

 On Wed, Sep 17, 2014 at 7:04 AM, Julien Le Dem jul...@ledem.net
 
   wrote:

  +1
 
  Julien
 
   -Original Message-
   From: Rohini Palaniswamy [mailto:rohini.adi...@gmail.com]
   Sent: Wednesday, September 17, 2014 12:38 PM
   To: dev@pig.apache.org
   Subject: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14
  
   Hi,
   Hadoop has matured far from Hadoop 0.20 and has had two major releases
   after that and there has been no development on branch-0.20 (
   http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3
   years now. It is high time we drop support for Hadoop 0.20 and only support
   Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance
   effort and also enable us to write more efficient code and cut down on
   reflections.

   Vote closes on Tuesday, Sep 23 2014.

   Thanks,
   Rohini
 

   
  
   --
   CONFIDENTIALITY NOTICE
   NOTICE: This message is intended for the use of the individual or
  entity to
   which it is addressed and may contain information that is
 confidential,
   privileged and exempt from disclosure under applicable law. If the
  reader
   of this message is not the intended recipient, you are hereby notified
  that
   any printing, copying, dissemination, distribution, disclosure or
   forwarding of this communication is strictly prohibited. If you have
   received this communication in error, please contact the sender
  immediately
   and delete it from your system. Thank You.
  
  
  
  
 
 



Re: [VOTE] Drop support for JDK 6 from Pig 0.14

2014-09-17 Thread Gianmarco De Francisci Morales
+1

--
Gianmarco

On 17 September 2014 10:11, Lorand Bendig lben...@gmail.com wrote:

 +1


 On 17/09/14 06:47, Rohini Palaniswamy wrote:

 Hi,
 Hadoop is dropping support for JDK6 from hadoop-2.7 this year as
 mentioned in the mail below. Pig should also move to JDK7 to be able to
 compile against future hadoop 2.x releases and start making releases with
 jars (binaries, maven repo) compiled in JDK 7. This would also open it up
 for developers to code with JDK7 specific APIs.

 Vote closes on Tuesday, Sep 23 2014.

 Thanks,
 Rohini




 -- Forwarded message --
 From: Arun C Murthy a...@hortonworks.com
 Date: Tue, Aug 19, 2014 at 10:52 AM
 Subject: Dropping support for JDK6 in Apache Hadoop
 To: d...@hbase.apache.org d...@hbase.apache.org, d...@hive.apache.org,
 dev@pig.apache.org, d...@oozie.apache.org
 Cc: common-...@hadoop.apache.org common-...@hadoop.apache.org


 [Apologies for the wide distribution.]

 Dear HBase/Hive/Pig/Oozie communities,

   We, over at Hadoop are considering dropping support for JDK6 this year.

   As you may be aware we just released hadoop-2.5.0 and are now considering
 making the next release i.e. hadoop-2.6.0 the *last* release of Apache
 Hadoop which supports JDK6. This means, from hadoop-2.7.0 onwards we will
 not support JDK6 anymore and we *may* start relying on JDK7-specific apis.

   Now, the above is a proposal and we do not want to pull the trigger
 without talking to projects downstream - hence the request for your
 feedback.

   Please feel free to forward this to other communities you might deem to be
 at risk from this too.

 thanks,
 Arun





Re: [VOTE] Drop support for Hadoop 0.20 from Pig 0.14

2014-09-17 Thread Gianmarco De Francisci Morales
+1

--
Gianmarco

On 17 September 2014 10:11, Lorand Bendig lben...@gmail.com wrote:

 +1


 On 17/09/14 06:38, Rohini Palaniswamy wrote:

 Hi,
 Hadoop has matured far from Hadoop 0.20 and has had two major releases
 after that and there has been no development on branch-0.20 (
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-0.20/) for 3
 years now. It is high time we drop support for Hadoop 0.20 and only support
 Hadoop 1.x and 2.x lines going forward. This will reduce the maintenance
 effort and also enable us to write more efficient code and cut down on
 reflections.

 Vote closes on Tuesday, Sep 23 2014.

 Thanks,
 Rohini





[jira] [Commented] (PIG-3900) SAMPLE and RANDOM should optionally stabilize their output from run-to-run, even across a large input set

2014-04-17 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972525#comment-13972525
 ] 

Gianmarco De Francisci Morales commented on PIG-3900:
-

+1 on the idea.

 SAMPLE and RANDOM should optionally stabilize their output from run-to-run, 
 even across a large input set
 -

 Key: PIG-3900
 URL: https://issues.apache.org/jira/browse/PIG-3900
 Project: Pig
  Issue Type: Bug
Reporter: Philip (flip) Kromer
Priority: Minor
  Labels: features, random, sample, seed

 SAMPLE and RANDOM should be able to give output that is stable from 
 run-to-run, yet random across a large input set. Although PIG-2965 allows the 
 RANDOM function to be constructed with a seed, each mapper will generate the 
 same sequence of values, which is unacceptable.
 It's typically undesirable to have the output of a large job be completely 
 non-deterministic. Testing becomes difficult, and failed map tasks don't 
 provide the same output from attempt to attempt, which complicates debugging.
 The most desirable implementation would provide a guarantee that a given seed 
 and input data would produce an identical result in any environment. I 
 believe this is difficult in a distributed environment, however.
 If each mapper added the index of its task ID to the provided seed, then the 
 output would be stable for most practical purposes -- as long as the 
 assignment of input splits to mappers doesn't change from job to job, the 
 number produced for each row won't change from job to job. Doing it this way 
 would be backwards compatible with the current Pig 0.12.0 implementation 
 (PIG-2965) in the case of a single mapper (which is the only justifiable use 
 of the current seed feature). Alternatively, one could use a hash of the 
 input file path, the split offset, and the provided seed. Both approaches are 
 not stable if the splitCombination logic is not stable. 
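
The two seeding schemes described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Pig code: the function names are hypothetical, SHA-256 is an arbitrary choice of hash, and Python's `random.Random` stands in for whatever generator RANDOM uses.

```python
import hashlib
import random

def seed_from_task_index(user_seed, task_index):
    """First option: add the mapper's task index to the user seed."""
    return user_seed + task_index

def seed_from_split(user_seed, input_path, split_offset):
    """Second option: hash the input path, split offset and user seed."""
    key = f"{input_path}:{split_offset}:{user_seed}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as a 64-bit integer seed.
    return int.from_bytes(digest[:8], "big")

def random_values(seed, count):
    """Generate `count` floats in [0, 1) from a dedicated generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

# Two mappers sharing user seed 42 get different but repeatable series.
mapper0 = random_values(seed_from_task_index(42, 0), 3)
mapper1 = random_values(seed_from_task_index(42, 1), 3)
```

With the first scheme, task index 0 leaves the seed unchanged, which is what keeps the single-mapper case compatible with the existing behaviour. Both schemes still depend on the assignment of splits to tasks staying the same between runs.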
 Suggested documentation for new functionality of RANDOM:
 {quote}
 This example constructs a function, providing a seed to control the series of 
 numbers generated. Each of the three fields will have an  independent series 
 of random values, and the output will be stable from run to run. (Note that 
 the result is only stable if the input splits remain stable).
 {code:sql}
 DEFINE rollRand  RANDOM('12345');
 DEFINE yawRand   RANDOM('69');
 DEFINE pitchRand RANDOM('42');
 position = LOAD 'position.tsv';
 orientation = FOREACH position GENERATE rollRand() AS roll:double, 
 pitchRand() AS pitch:double, yawRand() AS yaw:double;
 {code}
 {quote}
 Suggested documentation for new functionality of SAMPLE:
 {quote}
 In this example, we provide a seed that stabilizes which rows are selected 
 from run to run. (Note that the result is only stable if the input splits 
 remain stable).
 {code:sql}
 a = LOAD 'a.txt';
 b = SAMPLE a 0.1 SEED 42;
 {code}
 {quote}
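
What a seed buys for sampling can be shown with a small Python sketch. It assumes that sampling keeps each row independently with the given probability; `sample_rows` is a hypothetical helper, not a Pig function, and it ignores the multi-mapper issue discussed above.

```python
import random

def sample_rows(rows, fraction, seed=None):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)  # seed=None falls back to OS entropy
    return [row for row in rows if rng.random() < fraction]

rows = list(range(1000))
first_run = sample_rows(rows, 0.1, seed=42)
second_run = sample_rows(rows, 0.1, seed=42)
```

With the same seed and the same input order, both runs select exactly the same rows; without a seed, the selection changes from run to run.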





Re: Pig 0.13.0 release

2014-02-13 Thread Gianmarco De Francisci Morales
+1 on releasing a 0.13 and if somebody feels strongly about releasing a
0.12.1 that fixes PIG-3492 I am +1 on that too.

--
Gianmarco


On 13 February 2014 03:08, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 So I think we agree that we should branch 0.13 at this point, right?

 (and possibly look at releasing an incremental bump to 12 or 10? I'm not
 sure what should be included there but I support the general idea).


 On Thu, Feb 6, 2014 at 2:22 PM, Koji Noguchi knogu...@yahoo-inc.com
 wrote:

   Releasing 0.13 and 0.10.1 is totally independent in my opinion
  
  I should have referenced my previous request on including 0.10.1 on the
  top release page.
 http://www.mail-archive.com/dev@pig.apache.org/msg20629.html
 
   By minor I meant 0.13
   0.10.1 is a bug fix release.
   as in Major.Minor.BugFix
  
  I see. Then I should have said,
but I'd like to request we make BugFix releases more often.
 
  Thanks for correcting my mistake.
 
  Koji
 
 
  On Feb 6, 2014, at 5:05 PM, Julien Le Dem jul...@ledem.net wrote:
 
   Releasing 0.13 and 0.10.1 is totally independent in my opinion.
   It just takes the time of a committer that needs the release to happen
  to do it.
  
   By minor I meant 0.13
   0.10.1 is a bug fix release.
   as in Major.Minor.BugFix
  
   Our Major version is still 0
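
The Major.Minor.BugFix convention can be made concrete with a small Python sketch. `classify_release` is a hypothetical helper written for this illustration; it assumes version strings with at most three numeric parts.

```python
def classify_release(previous, new):
    """Classify `new` relative to `previous` under Major.Minor.BugFix."""
    def parse(version):
        parts = [int(p) for p in version.split(".")]
        # Pad so that "0.13" is treated as "0.13.0".
        return tuple(parts + [0] * (3 - len(parts)))

    old_major, old_minor, _ = parse(previous)
    new_major, new_minor, _ = parse(new)
    if new_major != old_major:
        return "major"
    if new_minor != old_minor:
        return "minor"
    return "bugfix"
```

Under this reading, going from 0.12.0 to 0.13 is a minor release, while 0.10.1 is a bug fix release on the 0.10 line.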
  
   On Feb 6, 2014, at 1:43 PM, Koji Noguchi wrote:
  
   To add to the discussion, I think we should release more often, based
  on time elapsed rather than volume of change.
  
   I don't have preference on the frequency, but I'd like to request we
  make minor releases more often.
  
   At this moment, stable pig release (to me) is still 0.10.1.
   0.11.1 and 0.12.0 both have regression bug PIG-3492 that caused
  multiple production pig scripts in our clusters to fail randomly.
   (unless user is disabling ColumnMapKeyPrune)
  
   If releasing 0.13 means 0.10.1 gets kicked out from the front release
  list, I'd like to see minor release on 0.11 or 0.12 first.
  
   Koji
  
  
  
  
   On Feb 6, 2014, at 4:25 PM, Cheolsoo Park piaozhe...@gmail.com
   wrote:
  
   +1 to 0.13 release. Why not if someone is volunteering?
  
   On Thu, Feb 6, 2014 at 4:06 PM, Julien Le Dem jul...@ledem.net
  wrote:
  
   To add to the discussion, I think we should release more often, based on
   time elapsed rather than volume of change.
   The more often we release, the easier it is to release.
   Also that makes it easier for contributors to use their own contributions
   in official releases.
   It is also probably a good idea to have a clean starting point before
   merging the Tez branch

   That said, I think those changes by themselves are enough to warrant a
   minor release.
  
   Julien
  
   On Feb 6, 2014, at 12:24 PM, Dmitriy Ryaboy wrote:
  
   Major updates since we released 12 that are currently in trunk:

   - lazy output (don't generate empty part files)
   - jar caching optimization
   - automatic local mode for small jobs (big wall-clock wins for long-tail
     jobs)
   - improved support for BigInteger, BigDecimal
   - hbase loader improvements
   - debug mode that leaves temp files around for examination (!)
   - fixes to a few nasty bugs (PIG-3641)
   - pluggable execution engine allowing work like Tez and Spork
   - .. and more
  
   I'd say this justifies a release.
  
   D
  
  
   On Wed, Feb 5, 2014 at 3:55 PM, Aniket Mokashi 
 aniket...@gmail.com
   wrote:
  
   List I mentioned is pending tasks before we can make a release.
  
   A complete list of contributions can be seen at -
   http://svn.apache.org/viewvc/pig/trunk/CHANGES.txt?view=markup.
  
   Some of the things that make it a good candidate for a release-
   - PIG-3419 (has several backwards incompatible api changes)
   - PIG-2672
   - PIG-3642
   - PIG-3463
   - PIG-3511
   - PIG-3657
  
   Thanks,
   Aniket
  
  
  
   On Wed, Feb 5, 2014 at 3:23 PM, Olga Natkovich 
  onatkov...@yahoo.com
   wrote:
  
   Just going by the list that Aniket provided, I don't really see enough for
   a full release. Two mentioned JIRAs are doc updates and one is a bug fix
   that was ported into Pig 12.
  
  
  
   On Wednesday, February 5, 2014 3:13 PM, Aniket Mokashi 
   aniket...@gmail.com wrote:
  
   Hi All,
  
   A good number of improvements and bug fixes have gone into trunk recently.
   I'd like to know if we can roll out a Pig 0.13 release around mid-March?

   I am aware that we are planning to merge tez branch into trunk soon.
   However, making a release before tez branch is merged will be good. Any
   objections?
  
   Following are few jiras we need to wrap up before 0.13 release-
   PIG-3591
   PIG-3740
   PIG-3745
   PIG-3347
   PIG-3731
   Any other?
  
   Thanks,
   Aniket
  
  
  
  
   --
   ...:::Aniket:::... Quetzalco@tl
  
  
  
  
  
 
 



Re: GSOC 2014

2014-02-13 Thread Gianmarco De Francisci Morales
I had a partial implementation of Pig mavenization last year (that of
course now is obsolete).
I think it's not as complicated as it sounds, just time consuming.
+1 on the idea, might even have some time to co-mentor.

--
Gianmarco


On 11 February 2014 20:49, Julien Le Dem jul...@ledem.net wrote:

 Some project ideas:
   - mavenize Pig (I know... but still, I think it should be done)
   - Compile physical operators to bytecode. I have a prototype, that could
 be made real by a student:
 https://github.com/julienledem/pig/compare/trunk...compile_physical_plan

 Julien

 On Feb 11, 2014, at 9:42 AM, Daniel Dai wrote:

  Any committer interested in mentoring? Any project ideas? We need to
  make project description ready by 2/24.
 
  Thanks,
  Daniel
 




[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)

2014-01-02 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860156#comment-13860156
 ] 

Gianmarco De Francisci Morales commented on PIG-3642:
-

I am -0 on this idea.
Skipping MR requires rewriting a good part of the execution logic, and might 
introduce weird optimization bugs.
More importantly, the added advantage brought by this feature is small.
Usually, if you want to test your program on a small input, you copy it locally 
and run Pig in local mode.

 Direct HDFS access for small jobs (fetch) 
 --

 Key: PIG-3642
 URL: https://issues.apache.org/jira/browse/PIG-3642
 Project: Pig
  Issue Type: Improvement
Reporter: Lorand Bendig
Assignee: Lorand Bendig
 Fix For: 0.13.0

 Attachments: PIG-3642.patch


 With this patch I'd like to add the possibility to directly read data from 
 HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive 
 already has this feature (fetch). This patch shares some similarities with 
 the local mode of Pig 0.6. Here, fetching kicks off when the following holds 
 for a script:
 * it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, 
 (nested) FOREACH with expression operators, custom UDFs..etc
 * no scalar aliases
 * no SampleLoader
 * single leaf job
 * DUMP (no STORE)
 The feature is enabled by default and can be toggled with:
 * -N or -no_fetch 
 * set opt.fetch true/false; 
 There's no STORE support because I wanted to make it explicit that this 
 optimization is for launching small/simple scripts during development, 
 rather than querying and filtering large number of rows on the client 
 machine. However, a threshold could be given on the input size (an 
 estimation) to determine whether to prefer fetch over MR jobs, similar to 
 what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's 
 LoadMetadata#getStatistic ?)
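
The eligibility rules listed in the description can be read as a simple predicate. The Python below is a sketch of that reading, not the patch's logic: `ScriptSummary`, `can_fetch` and the `max_input_bytes` parameter are hypothetical names, the operator set only contains what the description names explicitly (its "..etc" is not filled in), and the size check models the threshold that the description only suggests.

```python
from dataclasses import dataclass

# Operators the description names as compatible with fetch.
FETCHABLE_OPS = {"LIMIT", "FILTER", "UNION", "STREAM", "FOREACH"}

@dataclass
class ScriptSummary:
    """Hypothetical summary of a script; not a real Pig class."""
    operators: list              # relational operators between LOAD and the sink
    has_scalar_alias: bool = False
    uses_sample_loader: bool = False
    leaf_jobs: int = 1
    ends_with: str = "DUMP"      # "DUMP" or "STORE"
    input_bytes: int = 0

def can_fetch(script, fetch_enabled=True, max_input_bytes=None):
    """Return True when the script qualifies for direct fetch."""
    if not fetch_enabled:
        return False
    if any(op not in FETCHABLE_OPS for op in script.operators):
        return False
    if script.has_scalar_alias or script.uses_sample_loader:
        return False
    if script.leaf_jobs != 1 or script.ends_with != "DUMP":
        return False
    # Optional size threshold, as suggested at the end of the description.
    if max_input_bytes is not None and script.input_bytes > max_input_bytes:
        return False
    return True
```

For example, a script that only filters and limits before a DUMP qualifies, while the same script ending in STORE, or one containing a GROUP, falls back to a regular MR job.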





[jira] [Updated] (PIG-3453) Implement a Storm backend to Pig

2014-01-02 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3453:


Status: Open  (was: Patch Available)

Canceling patch as it is not ready to be committed.

 Implement a Storm backend to Pig
 

 Key: PIG-3453
 URL: https://issues.apache.org/jira/browse/PIG-3453
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.13.0
Reporter: Pradeep Gollakota
Assignee: Jacob Perkins
  Labels: storm
 Fix For: 0.13.0

 Attachments: storm-integration.patch


 There is a lot of interest around implementing a Storm backend to Pig for 
 streaming processing. The proposal and initial discussions can be found at 
 https://cwiki.apache.org/confluence/display/PIG/Pig+on+Storm+Proposal





[jira] [Commented] (PIG-3642) Direct HDFS access for small jobs (fetch)

2014-01-02 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13860228#comment-13860228
 ] 

Gianmarco De Francisci Morales commented on PIG-3642:
-

I haven't reviewed the patch thoroughly, so take my comments with due care.
I am just afraid that we will repeat the mistake we made with the local 
mode execution of Pig that you mention in the ticket.
That mode of execution was removed because it was a burden to maintain, and in 
the end the two implementations (MR and local mode) were out of sync, 
resulting in the same script doing different things.
I just want to avoid the same thing happening again.

If [~cheolsoo] has reviewed the patch, I would like to hear his comments on 
this issue.

 Direct HDFS access for small jobs (fetch) 
 --

 Key: PIG-3642
 URL: https://issues.apache.org/jira/browse/PIG-3642
 Project: Pig
  Issue Type: Improvement
Reporter: Lorand Bendig
Assignee: Lorand Bendig
 Fix For: 0.13.0

 Attachments: PIG-3642.patch


 With this patch I'd like to add the possibility to directly read data from 
 HDFS instead of launching MR jobs in case of simple (map-only) tasks. Hive 
 already has this feature (fetch). This patch shares some similarities with 
 the local mode of Pig 0.6. Here, fetching kicks off when the following holds 
 for a script:
 * it contains only LIMIT, FILTER, UNION (if no split is generated), STREAM, 
 (nested) FOREACH with expression operators, custom UDFs..etc
 * no scalar aliases
 * no SampleLoader
 * single leaf job
 * DUMP (no STORE)
 The feature is enabled by default and can be toggled with:
 * -N or -no_fetch 
 * set opt.fetch true/false; 
 There's no STORE support because I wanted to make it explicit that this 
 optimization is for launching small/simple scripts during development, 
 rather than querying and filtering large number of rows on the client 
 machine. However, a threshold could be given on the input size (an 
 estimation) to determine whether to prefer fetch over MR jobs, similar to 
 what Hive's '{{hive.fetch.task.conversion.threshold}}' does. (through Pig's 
 LoadMetadata#getStatistic ?)





Re: Welcome our newest committer Prashant Kommireddi

2013-05-02 Thread Gianmarco De Francisci Morales
Congrats!

--
Gianmarco


On 2 May 2013 22:26, Johnny Zhang xiao...@cloudera.com wrote:

 Congrats Prashant!


 On Thu, May 2, 2013 at 1:17 PM, Bill Graham billgra...@gmail.com wrote:

  Congrats Prashant!
 
 
  On Thu, May 2, 2013 at 1:11 PM, Daniel Dai da...@hortonworks.com
 wrote:
 
   Congratulations!
  
  
   On Thu, May 2, 2013 at 1:06 PM, Cheolsoo Park piaozhe...@gmail.com
   wrote:
  
Congrats Prashant!
   
   
On Thu, May 2, 2013 at 12:56 PM, Julien Le Dem jul...@ledem.net
  wrote:
   
 All,

 Please join me in welcoming Prashant Kommireddi as our newest Pig
 committer.
 He's been contributing to Pig for a while now. We look forward to
 him
 being a part of the project.

 Julien
   
  
 
 
 
  --
  *Note that I'm no longer using my Yahoo! email address. Please email me
 at
  billgra...@gmail.com going forward.*
 



[jira] [Commented] (PIG-3225) Stratified sampling

2013-04-22 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637830#comment-13637830
 ] 

Gianmarco De Francisci Morales commented on PIG-3225:
-

Hi Saiph,
I am happy to see interest in this project idea.

This idea should be combined with the other sampling projects in Pig as shown 
in  https://cwiki.apache.org/confluence/display/PIG/GSoc2013 to prepare a GSoC 
project proposal.

In my view, reservoir and bootstrap sampling are the easiest, while stratified 
sampling might be more complicated.

 Stratified sampling
 ---

 Key: PIG-3225
 URL: https://issues.apache.org/jira/browse/PIG-3225
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a stratified sampling option ( 
 http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.
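
For readers unfamiliar with the technique, a minimal single-machine sketch of stratified sampling in Python follows. It is not a proposal for how SAMPLE should implement it: `stratified_sample` is a hypothetical helper, it holds every stratum in memory, and keeping at least one row per stratum is a design choice made for this example.

```python
import random
from collections import defaultdict

def stratified_sample(rows, key, fraction, seed=None):
    """Sample the same fraction from every stratum defined by `key`."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        # Round to the nearest whole row, but keep at least one per stratum.
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# 90 rows in stratum "a" and 10 in stratum "b"; the data is invented.
rows = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
picked = stratified_sample(rows, key=lambda r: r[0], fraction=0.1, seed=7)
```

Unlike a plain 10% sample, which could miss the small stratum entirely, this keeps the proportions of the strata in the output.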

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3221) Bootstrap sampling

2013-04-22 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13637831#comment-13637831
 ] 

Gianmarco De Francisci Morales commented on PIG-3221:
-

Hi Vicky,

Thanks for your interest in this project idea.

Given that Pig is not a statistics-only tool, my current understanding is that 
we want the samples to be materialized because they can be used, e.g., to train 
an ensemble classifier.
Of course the case where we are only interested in statistics can be optimized.
Maybe a UDF would do the trick in this latter case.

 Bootstrap sampling
 --

 Key: PIG-3221
 URL: https://issues.apache.org/jira/browse/PIG-3221
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a bootstrap sampling option ( 
 http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
 operator.
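
As an illustration of what materialized bootstrap samples look like, here is a minimal in-memory Python sketch. `bootstrap_samples` is a hypothetical helper and says nothing about how the SAMPLE operator would do this at scale: each resample draws as many rows as the input, with replacement.

```python
import random

def bootstrap_samples(rows, num_samples, seed=None):
    """Draw `num_samples` resamples of len(rows) rows, with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    return [[rows[rng.randrange(n)] for _ in range(n)]
            for _ in range(num_samples)]

rows = [2.0, 4.0, 6.0, 8.0]
samples = bootstrap_samples(rows, num_samples=3, seed=1)
# When only a statistic is needed, it can be computed per resample.
means = [sum(s) / len(s) for s in samples]
```

The materialized `samples` could each feed one member of an ensemble, whereas the `means` line corresponds to the statistics-only case, where the resamples themselves need not be kept.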



[jira] [Created] (PIG-3279) Support nested RANK

2013-04-16 Thread Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created PIG-3279:
---

 Summary: Support nested RANK
 Key: PIG-3279
 URL: https://issues.apache.org/jira/browse/PIG-3279
 Project: Pig
  Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales






Re: GSoC 2013

2013-04-09 Thread Gianmarco De Francisci Morales
+1 to what Dmitriy says.

Cheers,

--
Gianmarco


On Mon, Apr 8, 2013 at 8:57 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Hi,
 I think this is an interesting project but is not core to Pig itself --
 it may be more interesting / viable as a standalone project on github that
 uses Pig to implement graph algorithms.
 At this point in its development, I feel that Pig needs to concentrate on
 doing the things it already does, and do them better (operator efficiency,
 storage efficiency, better MR plan generation, etc) rather than expand to
 specific verticals; we should allow our users to create their own solution
 suites that use Pig for specific purposes. A successful example of such a
 standalone project is PacketPig (https://github.com/packetloop/packetpig)
 ,
 a PCAP network capture analysis tool.

 D


 On Tue, Apr 2, 2013 at 9:48 AM, burakkk burak.isi...@gmail.com wrote:

  I know that but giraph tries to use bsp. What I'm saying is nothing shared
  model except reducers. Besides I don't want to divide iteration. One phase
  is still responsible for whole iteration. Every different origin vertex
  will be processed in parallel.
 
  Thanks
  Best regards...
 
 
  On Tue, Apr 2, 2013 at 7:20 PM, Gianmarco De Francisci Morales 
  g...@gdfm.me
   wrote:
 
   FYI, Giraph has a Random Walk implementation.
  
   Pig does not support iteration natively, so any iterative algorithm is
  not
   a very good fit for it. Just my 2c.
  
   Cheers,
  
   --
   Gianmarco
  
  
   On Tue, Apr 2, 2013 at 10:04 AM, burakkk burak.isi...@gmail.com
 wrote:
  
So what do you suggest? Is it clear?
   
   
On Mon, Apr 1, 2013 at 9:35 PM, burakkk burak.isi...@gmail.com
  wrote:
   
 I'm using only WTF graph representation to fit the memory. By the way I
 haven't seen any explanation from the pig 0.11 release page about WTF or
 graph models.
 I don't wanna use Cassovary. I believe it can be done with pig. I
 implement a graph representation using WTF paper to pig and then I'll use
 it to implement random walk algorithm. To do that maybe I need to improve
 some features such as joins(fuzzy join) etc or implement a new operator. I
 can implement it using either existing operators or new operators. That's
 up to us and it doesn't really matter. If there is already a implementation
 to random walker algorithm, please feel free to tell. Because I haven't
 found it.
 Are you proposing to create an open-source implementation of those
 algorithms?
 Yes, I'm proposing to implement a random walk algorithm, new data model
 which is representing graph. After that, people can use it coding the pig.

 Do you suggest they should be Pig scripts added to the Pig project, or do
 you want to create some new operators?
 Maybe, it can be UDF or new operator.

 I made a quick example. It may not be completely accurate, I've just tried
 to explain it.
 Think about you have a graph file just like that
 user_id follower
 1 2
 1 3
 1 10
 2 3
 3 4
 3 5
 ...

 Vertex List is an array including sorted vertex ids
 node List is a matrix including vertex id and its starting position


 graph = load 'graph' using PigStorage() as (vertex:int, follower:int);
 --load the graph file
 vertex = COGROUP graph BY (vertex);
 list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) as
 vertexList; --load the whole vertexes from HDFS into the memory
 list = FOREACH graph GENERATE org.apache.pig.generateNode(list) as
 nodeList; --load the whole vertexes from HDFS into the memory
 randomWalk = FOREACH vertex GENERATE
 flatten(org.apache.pig.RandomWalk(list, endVertex)) as score;
 -- generate a score: using the node list you can traverse the graph to your
 -- finishing position
 store...


 Thanks
 Best Regards...


 On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy dvrya...@gmail.com
 
wrote:

 I'm somewhat familiar with WTF code (my day job is managing the analytics
 infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in fact
 some of the Pig 11 features/improvements are directly due to this
 project...), and mostly has to do with clever algorithms implemented in Pig
 (an earlier version of WTF loaded the graph into main memory on large-mem
 machines -- that system is open sourced, too, under
 github.com/twitter/cassovary). Are you proposing to create an open-source
 implementation of those algorithms? Do you suggest they should be Pig
 scripts added to the Pig project, or do you want to create some new
 operators? I'm not totally sure where you are going here.

 GSoC proposals for Pig are usually made by students who want to work on
 issues labeled as GSoC candidates

Re: GSoC 2013

2013-04-02 Thread Gianmarco De Francisci Morales
FYI, Giraph has a Random Walk implementation.

Pig does not support iteration natively, so any iterative algorithm is not
a very good fit for it. Just my 2c.

Cheers,

--
Gianmarco


On Tue, Apr 2, 2013 at 10:04 AM, burakkk burak.isi...@gmail.com wrote:

 So what do you suggest? Is it clear?


 On Mon, Apr 1, 2013 at 9:35 PM, burakkk burak.isi...@gmail.com wrote:

  I'm only using the WTF graph representation so that the graph fits in
  memory. By the way, I haven't seen any explanation of WTF or graph models
  on the Pig 0.11 release page.
  I don't want to use Cassovary. I believe it can be done with Pig. I will
  implement a graph representation in Pig based on the WTF paper, and then
  use it to implement a random walk algorithm. To do that I may need to
  improve some features, such as joins (fuzzy join), or implement a new
  operator. I can implement it using either existing operators or new
  operators. That's up to us and it doesn't really matter. If there is
  already an implementation of the random walk algorithm, please let me
  know, because I haven't found one.
  Are you proposing to create an open-source implementation of those
  algorithms?
  Yes, I'm proposing to implement a random walk algorithm and a new data
  model that represents a graph. After that, people can use them when
  writing Pig scripts.
  
  Do you suggest they should be Pig scripts added to the Pig project, or do
  you want to create some new operators?
  Maybe it can be a UDF or a new operator.
  
  I made a quick example. It may not be completely accurate; I've just tried
  to explain the idea.
  Suppose you have a graph file like this:
  user_id follower
  1 2
  1 3
  1 10
  2 3
  3 4
  3 5
  ...
 
  Vertex List is an array including sorted vertex ids
  node List is a matrix including vertex id and its starting position
 
 
  -- load the graph file
  graph = load 'graph' using PigStorage() as (vertex:int, follower:int);
  vertex = COGROUP graph BY (vertex);
  -- load all vertices from HDFS into memory
  list = FOREACH vertex GENERATE org.apache.pig.generateVertex(vertex) as vertexList;
  -- load all vertices from HDFS into memory
  list = FOREACH graph GENERATE org.apache.pig.generateNode(list) as nodeList;
  -- generate a score: using the node list you can traverse the graph to
  -- your finishing position
  randomWalk = FOREACH vertex GENERATE flatten(org.apache.pig.RandomWalk(list, endVertex)) as score;
  store...
 
 
  Thanks
  Best Regards...
 
 
  On Mon, Apr 1, 2013 at 7:20 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
  
  I'm somewhat familiar with WTF code (my day job is managing the analytics
  infrastructure team at Twitter). WTF is implemented using Pig 0.11 (in
  fact some of the Pig 11 features/improvements are directly due to this
  project...), and mostly has to do with clever algorithms implemented in
  Pig (an earlier version of WTF loaded the graph into main memory on
  large-mem machines -- that system is open sourced, too, under
  github.com/twitter/cassovary). Are you proposing to create an open-source
  implementation of those algorithms? Do you suggest they should be Pig
  scripts added to the Pig project, or do you want to create some new
  operators? I'm not totally sure where you are going here.
  
  GSoC proposals for Pig are usually made by students who want to work on
  issues labeled as GSoC candidates on the apache jira. The students spend
  some time to understand the problem stated in the jira, familiarize
  themselves with the existing codebase, and put a basic technical
  implementation plan and schedule into their proposal. Since in this case
  you are proposing something we haven't scoped or defined well for
  ourselves, we need you to be very clear and specific about what you are
  trying to do, and how you plan to go about it. I think that Graph
  processing in Pig (or other Hadoop-based systems) is a really interesting
  topic and there is a lot of work to be done, but we really need you to be
  far more detailed to be able to give you good guidance with regards to
  GSoC.
 
  Best,
  Dmitriy
 
 
  On Sat, Mar 30, 2013 at 10:12 AM, burakkk burak.isi...@gmail.com wrote:
  
   Sure. We can implement a graph model based on the article "WTF: The Who
   to Follow Service at Twitter". The article says that, with this
   representation, the graph can be stored in a single machine's memory, so
   every node will read the graph from HDFS and cache it in memory. Every
   node is responsible for processing its own bucket of edges; I mean the
   work can be split. Every node can process its bucket using, for
   instance, a random walk algorithm. Finally, the outputs can be reduced
   to get the final results. I hope it's clear :)
  
   Thanks
   Best Regards...
  
  
   On Fri, Mar 29, 2013 at 6:10 PM, Dmitriy Ryaboy dvrya...@gmail.com
   wrote:
  
Hi Burakk,
The general idea of making graph processing easier is a good one.
 I'm
  not
sure what exactly you are proposing to do, though. Could you be more

[jira] [Commented] (PIG-3225) Stratified sampling

2013-03-26 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614617#comment-13614617
 ] 

Gianmarco De Francisci Morales commented on PIG-3225:
-

Hi Dishara,
Happy to see your interest.
While we haven't discussed it in detail with the rest of the Committers, my 
personal view on this project is that it should be combined with the one on 
Bootstrap sampling (PIG-3221) to be worthy of a GSoC project.

Regarding the sampling, this part of the project requires designing and 
changing the parser to recognize a new part of the syntax for the SAMPLE 
operator (to specify the strata), and implementing the logical and physical 
operators connected to it.

 Stratified sampling
 ---

 Key: PIG-3225
 URL: https://issues.apache.org/jira/browse/PIG-3225
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a stratified sampling option ( 
 http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3221) Bootstrap sampling

2013-03-26 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13614618#comment-13614618
 ] 

Gianmarco De Francisci Morales commented on PIG-3221:
-

Here is an example: http://hortonworks.com/blog/bootstrap-sampling-with-apache-pig

 Bootstrap sampling
 --

 Key: PIG-3221
 URL: https://issues.apache.org/jira/browse/PIG-3221
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a bootstrap sampling option ( 
 http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
 operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3225) Stratified sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created PIG-3225:
---

 Summary: Stratified sampling
 Key: PIG-3225
 URL: https://issues.apache.org/jira/browse/PIG-3225
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales


Implement a stratified sampling option ( 
http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.
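
For readers unfamiliar with the technique, the following is a minimal sketch
of stratified sampling in plain Python. It is not Pig code and not a proposed
design for the SAMPLE operator; the function name, the `fraction` parameter
and the "at least one row per stratum" rule are assumptions made only for
illustration. The idea is to split the rows into strata by some key and draw
the same fraction from each stratum, so small groups are not lost.

```python
import random
from collections import defaultdict


def stratified_sample(rows, stratum_of, fraction, rng):
    """Draw the same fraction of rows from every stratum, without replacement."""
    strata = defaultdict(list)
    for row in rows:
        strata[stratum_of(row)].append(row)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Illustrative rule: keep at least one row so small strata stay represented.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# 90 rows in stratum "a" and 10 rows in stratum "b".
rows = [("a", i) for i in range(90)] + [("b", i) for i in range(10)]
sample = stratified_sample(
    rows, stratum_of=lambda r: r[0], fraction=0.1, rng=random.Random(0)
)
counts = {key: sum(1 for r in sample if r[0] == key) for key in ("a", "b")}
print(counts)
```

With a 10% fraction the sample keeps the 9:1 proportion of the two strata,
which a plain uniform sample only achieves on average.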

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3224) Reservoir sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created PIG-3224:
---

 Summary: Reservoir sampling
 Key: PIG-3224
 URL: https://issues.apache.org/jira/browse/PIG-3224
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales


Implement a reservoir sampling option, or make it the default ( 
http://en.wikipedia.org/wiki/Reservoir_sampling ) in Pig's SAMPLE operator.
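
As a reference for the technique, here is a minimal sketch of reservoir
sampling (the classic "Algorithm R") in plain Python. It is not Pig code and
says nothing about how SAMPLE would implement it; the function name and
parameters are made up for illustration. The point is that it returns exactly
k items in one pass over a stream whose length is not known in advance.

```python
import random


def reservoir_sample(stream, k, rng):
    """Algorithm R: uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for index, item in enumerate(stream):
        if index < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # The item at position index (0-based) is kept with probability k / (index + 1).
            slot = rng.randint(0, index)  # randint is inclusive on both ends
            if slot < k:
                reservoir[slot] = item
    return reservoir


sample = reservoir_sample(iter(range(1000)), k=10, rng=random.Random(7))
print(sample)
```

If the stream has fewer than k items, the sketch simply returns all of them.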

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3224) Reservoir sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3224:


Labels: gsoc2013  (was: )

 Reservoir sampling
 --

 Key: PIG-3224
 URL: https://issues.apache.org/jira/browse/PIG-3224
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a reservoir sampling option, or make it the default ( 
 http://en.wikipedia.org/wiki/Reservoir_sampling ) in Pig's SAMPLE operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3225) Stratified sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3225:


Labels: gsoc2013  (was: )

 Stratified sampling
 ---

 Key: PIG-3225
 URL: https://issues.apache.org/jira/browse/PIG-3225
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a stratified sampling option ( 
 http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3225) Stratified sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3225:


Tags:   (was: gsoc2013)

 Stratified sampling
 ---

 Key: PIG-3225
 URL: https://issues.apache.org/jira/browse/PIG-3225
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a stratified sampling option ( 
 http://en.wikipedia.org/wiki/Stratified_sampling ) in Pig's SAMPLE operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3221) Bootstrap sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3221:


Tags:   (was: gsoc2013)

 Bootstrap sampling
 --

 Key: PIG-3221
 URL: https://issues.apache.org/jira/browse/PIG-3221
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a bootstrap sampling option ( 
 http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
 operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3221) Bootstrap sampling

2013-02-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3221:


Labels: gsoc2013  (was: )

 Bootstrap sampling
 --

 Key: PIG-3221
 URL: https://issues.apache.org/jira/browse/PIG-3221
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
  Labels: gsoc2013

 Implement a bootstrap sampling option ( 
 http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE 
 operator.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (PIG-3221) Bootstrap sampling

2013-02-25 Thread Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created PIG-3221:
---

 Summary: Bootstrap sampling
 Key: PIG-3221
 URL: https://issues.apache.org/jira/browse/PIG-3221
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales


Implement a bootstrap sampling option ( 
http://en.wikipedia.org/wiki/Bootstrap_(statistics) ) in Pig's SAMPLE operator.
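
As a reference for the technique, here is a minimal sketch of bootstrap
sampling in plain Python. It is not Pig code and not a proposed syntax for
SAMPLE; the function name, the toy data and the choice of the mean as the
statistic are assumptions made only for illustration. Each resample draws as
many values as the original data set, with replacement, and the spread of
the resampled statistic gives a rough confidence interval.

```python
import random
import statistics


def bootstrap_means(values, n_resamples, rng):
    """Resample with replacement and return the mean of each resample."""
    n = len(values)
    return [statistics.mean(rng.choices(values, k=n)) for _ in range(n_resamples)]


values = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
means = sorted(bootstrap_means(values, n_resamples=1000, rng=random.Random(1)))

# Rough 95% percentile interval for the mean (2.5th and 97.5th percentiles).
low, high = means[24], means[974]
print(low, high)
```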

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [ANNOUNCE] Welcome Bill Graham to join Pig PMC

2013-02-20 Thread Gianmarco De Francisci Morales
Congrats Bill! :)

--
Gianmarco


On Wed, Feb 20, 2013 at 10:00 AM, Jonathan Coveney jcove...@gmail.com wrote:

 congrats :)


 2013/2/20 Jarek Jarcec Cecho jar...@apache.org

  Congratulations Bill, good job!
 
  Jarcec
 
  On Tue, Feb 19, 2013 at 01:48:18PM -0800, Daniel Dai wrote:
   Please welcome Bill Graham as our latest Pig PMC member.
  
   Congrats Bill!
 



[jira] [Updated] (PIG-2353) RANK function like in SQL

2013-01-07 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2353:


Release Note: 
Pig includes a new RANK operator:
RANK relation ( BY column (ASC|DESC)? (DENSE)? )?
This operator prepends a consecutive integer to each tuple in the relation 
starting from 1.
If the BY clause is present, RANK sorts the relation before ranking it, 
otherwise it uses the order in which it receives the relation (e.g. the order 
in which the relation is stored if RANK is performed right after a LOAD).
The DENSE modifier produces a dense rank, which has no gaps in it regardless of 
ties.

RANK is now a reserved keyword and is *not* backward compatible.
Please review your scripts to avoid usage of RANK as a relation name.
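
The behaviour described in the release note can be sketched in plain Python
as follows. This is only an illustration, not the Pig implementation. The
function name and arguments are made up, and the handling of ties without
DENSE (tied rows share a rank and the next rank skips ahead, as SQL's RANK
does) is an assumption, since the note only states that DENSE leaves no gaps.

```python
def rank(rows, key=None, descending=False, dense=False):
    """Prepend a rank to each row, starting from 1."""
    if key is None:
        # No BY clause: number the rows in the order they arrive.
        return [(i,) + tuple(row) for i, row in enumerate(rows, start=1)]

    ordered = sorted(rows, key=key, reverse=descending)
    ranked = []
    previous = object()  # sentinel that is not equal to any real key value
    current_rank = 0
    for position, row in enumerate(ordered, start=1):
        value = key(row)
        if value != previous:
            # New key value: a dense rank moves up by one,
            # a standard rank jumps to the row's position.
            current_rank = current_rank + 1 if dense else position
            previous = value
        ranked.append((current_rank,) + tuple(row))
    return ranked


rows = [("x", 10), ("y", 20), ("z", 10), ("w", 30)]
print(rank(rows))                                   # arrival order
print(rank(rows, key=lambda r: r[1]))               # ties share a rank, gap after them
print(rank(rows, key=lambda r: r[1], dense=True))   # ties share a rank, no gap
```

For the four rows above, ranking by the second field gives ranks 1, 1, 3, 4,
and the dense variant gives 1, 1, 2, 3.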

  was:
Pig includes a new RANK operator:
RANK relation ( BY column (ASC|DESC)? (DENSE)? )?
This operator prepends a consecutive integer to each tuple in the relation 
starting from 1.
If the BY clause is present, RANK sorts the relation before ranking it, 
otherwise it uses the order in which it receives the relation (e.g. the order 
in which the relation is stored if RANK is performed right after a LOAD).
The DENSE modifier produces a dense rank, which has no gaps in it regardless of 
ties.




 RANK function like in SQL
 -

 Key: PIG-2353
 URL: https://issues.apache.org/jira/browse/PIG-2353
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: gsoc2012, mentor
 Fix For: 0.11

 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, 
 PIG-2353-5.txt, PIG2353.patch


 Implement a function that given a (sorted) bag adds to each tuple a unique, 
 increasing identifier without gaps, like what RANK does for SQL.
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012
 Functionality implemented so far, is available at 
 https://reviews.apache.org/r/5523/diff/#index_header

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2353) RANK function like in SQL

2013-01-07 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13546296#comment-13546296
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Hi, sorry I guess I misunderstood.
I thought that PIG-2947 was sufficient as documentation and that we just wanted 
to clarify the release notes.

Should I open a separate Jira to include the release notes of the Jira inside 
RELEASE_NOTES.txt ?

 RANK function like in SQL
 -

 Key: PIG-2353
 URL: https://issues.apache.org/jira/browse/PIG-2353
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: gsoc2012, mentor
 Fix For: 0.11

 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, 
 PIG-2353-5.txt, PIG2353.patch


 Implement a function that given a (sorted) bag adds to each tuple a unique, 
 increasing identifier without gaps, like what RANK does for SQL.
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012
 Functionality implemented so far, is available at 
 https://reviews.apache.org/r/5523/diff/#index_header

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall

2013-01-04 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13543726#comment-13543726
 ] 

Gianmarco De Francisci Morales commented on PIG-2362:
-

Hi Cheolsoo,

Good catch, thanks. I wasn't familiar with PIG-2748.
I verified that {{ant mvn-jar}} passes.
Also ran the {{eclipse-files}}, {{src-release}}, {{tar-release}} targets and 
verified their output.

+1 to the last patch

 Rework Ant build.xml to use macrodef instead of antcall
 ---

 Key: PIG-2362
 URL: https://issues.apache.org/jira/browse/PIG-2362
 Project: Pig
  Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.12

 Attachments: PIG-2362.10.patch, PIG-2362.1.patch, PIG-2362.2.patch, 
 PIG-2362.3.patch, PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, 
 PIG-2362.7.patch, PIG-2362.8.patch, PIG-2362.9.patch, 
 PIG-2362.9.patch.nowhitespace


 Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/
 We'd better use macrodef and let Ant build a clean dependency graph.
 http://ant.apache.org/manual/Tasks/macrodef.html
 Right now we do it like this:
 {code}
 <target name="buildAllJars">
   <antcall target="buildJar">
     <param name="build.dir" value="jar-A"/>
   </antcall>
   <antcall target="buildJar">
     <param name="build.dir" value="jar-B"/>
   </antcall>
   <antcall target="buildJar">
     <param name="build.dir" value="jar-C"/>
   </antcall>
 </target>
 <target name="buildJar">
   <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
 </target>
 {code}
 But it would be better if we did it like this:
 {code}
 <target name="buildAllJars">
   <buildJar build.dir="jar-A"/>
   <buildJar build.dir="jar-B"/>
   <buildJar build.dir="jar-C"/>
 </target>
 <!-- macrodef attributes are expanded with @{...}; the body goes inside <sequential> -->
 <macrodef name="buildJar">
   <attribute name="build.dir"/>
   <sequential>
     <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
   </sequential>
 </macrodef>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall

2013-01-02 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13542216#comment-13542216
 ] 

Gianmarco De Francisci Morales commented on PIG-2362:
-

Hi Cheolsoo,
Would you mind rebasing it to trunk once again?
It does not apply cleanly anymore.
I will review it as soon as it is rebased.

 Rework Ant build.xml to use macrodef instead of antcall
 ---

 Key: PIG-2362
 URL: https://issues.apache.org/jira/browse/PIG-2362
 Project: Pig
  Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.12

 Attachments: PIG-2362.1.patch, PIG-2362.2.patch, PIG-2362.3.patch, 
 PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, PIG-2362.7.patch, 
 PIG-2362.8.patch


 Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/
 We'd better use macrodef and let Ant build a clean dependency graph.
 http://ant.apache.org/manual/Tasks/macrodef.html
 Right now we do it like this:
 {code}
 <target name="buildAllJars">
   <antcall target="buildJar">
     <param name="build.dir" value="jar-A"/>
   </antcall>
   <antcall target="buildJar">
     <param name="build.dir" value="jar-B"/>
   </antcall>
   <antcall target="buildJar">
     <param name="build.dir" value="jar-C"/>
   </antcall>
 </target>
 <target name="buildJar">
   <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
 </target>
 {code}
 But it would be better if we did it like this:
 {code}
 <target name="buildAllJars">
   <buildJar build.dir="jar-A"/>
   <buildJar build.dir="jar-B"/>
   <buildJar build.dir="jar-C"/>
 </target>
 <!-- macrodef attributes are expanded with @{...}; the body goes inside <sequential> -->
 <macrodef name="buildJar">
   <attribute name="build.dir"/>
   <sequential>
     <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
   </sequential>
 </macrodef>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2362) Rework Ant build.xml to use macrodef instead of antcall

2013-01-02 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2362:


Status: Open  (was: Patch Available)

 Rework Ant build.xml to use macrodef instead of antcall
 ---

 Key: PIG-2362
 URL: https://issues.apache.org/jira/browse/PIG-2362
 Project: Pig
  Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.12

 Attachments: PIG-2362.1.patch, PIG-2362.2.patch, PIG-2362.3.patch, 
 PIG-2362.4.patch, PIG-2362.5.patch, PIG-2362.6.patch, PIG-2362.7.patch, 
 PIG-2362.8.patch


 Antcall is evil: http://www.build-doctor.com/2008/03/13/antcall-is-evil/
 We'd better use macrodef and let Ant build a clean dependency graph.
 http://ant.apache.org/manual/Tasks/macrodef.html
 Right now we do it like this:
 {code}
 <target name="buildAllJars">
   <antcall target="buildJar">
     <param name="build.dir" value="jar-A"/>
   </antcall>
   <antcall target="buildJar">
     <param name="build.dir" value="jar-B"/>
   </antcall>
   <antcall target="buildJar">
     <param name="build.dir" value="jar-C"/>
   </antcall>
 </target>
 <target name="buildJar">
   <jar destfile="target/${build.dir}.jar" basedir="${build.dir}/classfiles"/>
 </target>
 {code}
 But it would be better if we did it like this:
 {code}
 <target name="buildAllJars">
   <buildJar build.dir="jar-A"/>
   <buildJar build.dir="jar-B"/>
   <buildJar build.dir="jar-C"/>
 </target>
 <!-- macrodef attributes are expanded with @{...}; the body goes inside <sequential> -->
 <macrodef name="buildJar">
   <attribute name="build.dir"/>
   <sequential>
     <jar destfile="target/@{build.dir}.jar" basedir="@{build.dir}/classfiles"/>
   </sequential>
 </macrodef>
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-2353) RANK function like in SQL

2012-12-18 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13535455#comment-13535455
 ] 

Gianmarco De Francisci Morales commented on PIG-2353:
-

Hi Jonathan,
Yes, RANK is now an operator and thus a reserved keyword.
We can add it to the release notes.

The parser is definitely a bit rough and could use some reworking, especially 
in the error messages, so I am all in for it. Not sure if it is a known issue. 
Can you use LOAD or FOREACH as column names?

 RANK function like in SQL
 -

 Key: PIG-2353
 URL: https://issues.apache.org/jira/browse/PIG-2353
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: gsoc2012, mentor
 Fix For: 0.11

 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, 
 PIG-2353-5.txt, PIG2353.patch


 Implement a function that given a (sorted) bag adds to each tuple a unique, 
 increasing identifier without gaps, like what RANK does for SQL.
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012
 Functionality implemented so far, is available at 
 https://reviews.apache.org/r/5523/diff/#index_header

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Jenkins / Clover

2012-11-13 Thread Gianmarco De Francisci Morales
Hi Daniel,
Thanks for adding me to the group.
I will have a look at it ASAP.

Cheers,
--
Gianmarco


On Mon, Nov 12, 2012 at 10:22 AM, Daniel Dai da...@hortonworks.com wrote:

 Hi, Gianmarco
 I added you to hudson-jobadmin group.

 Thanks,
 Daniel

 On Thu, Jul 19, 2012 at 12:33 AM, Gianmarco De Francisci Morales
 g...@apache.org wrote:
  Fine,
  Alan, could you add me to the hudson-jobadmin group?
 
  modify_appgroups.pl hudson-jobadmin --add=gdfm
 
  On people.apache.org, according to the page.
 
  I have subscribed to infrastructure and builds.
 
  Cheers,
  --
  Gianmarco
 
 
 
 
  On Thu, Jul 19, 2012 at 12:17 AM, Alan Gates ga...@hortonworks.com
 wrote:
 
 
  http://wiki.apache.org/general/Jenkins?action=show&redirect=Hudson describes how
  to get an account so you can administer the Jenkins builds.
 
  Alan.
 
  On Jul 18, 2012, at 12:27 PM, Gianmarco De Francisci Morales wrote:
 
   What is the procedure to modify the nightly build?
   If everyone agrees (and somebody explains to me how) I volunteer to fix it.
  
   Cheers,
   --
   Gianmarco
  
  
  
  
   On Wed, Jul 18, 2012 at 8:25 AM, Jonathan Coveney jcove...@gmail.com
  wrote:
  
   +1
  
   A while ago I tried to get apache builds to deal with this, and nothing.
   Very annoying, but pending a fix, we should remove it from the nightly.
  
   2012/7/17 Alan Gates ga...@hortonworks.com
  
   I'm fine with removing it from the nightly build.  I don't see any reason
   to run that every day, especially since it slows down the tests. Let's not
   remove it from ant, as it's useful to run occasionally.
  
   Alan.
  
   On Jul 17, 2012, at 3:17 PM, Gianmarco De Francisci Morales wrote:
  
   Hi,
  
   Clover constantly makes a number of our Jenkins builds fail (usually
   because of license issues, I think it is a misconfiguration).
   Do we actually use it?
   If we don't I would propose to remove it from our build.
   What do you think?
  
   Cheers,
   --
   Gianmarco
  
  
  
 
 



[jira] [Commented] (PIG-2989) Illustrate for Rank Operator

2012-11-13 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496344#comment-13496344
 ] 

Gianmarco De Francisci Morales commented on PIG-2989:
-

Has this been committed?
When I try to apply the patch to trunk I get an error:
{code}
patch -p0 < patch_1
patching file src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
Reversed (or previously applied) patch detected!  Assume -R? [n] n
Apply anyway? [n] y
Hunk #1 FAILED at 326.
1 out of 1 hunk FAILED -- saving rejects to file src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java.rej
patching file src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java
Reversed (or previously applied) patch detected!  Assume -R? [n] n
Apply anyway? [n] y
Hunk #1 FAILED at 156.
1 out of 1 hunk FAILED -- saving rejects to file src/org/apache/pig/backend/hadoop/executionengine/physicalLayer/relationalOperators/PORank.java.rej
{code}


 Illustrate for Rank Operator
 

 Key: PIG-2989
 URL: https://issues.apache.org/jira/browse/PIG-2989
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.11
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Minor
 Attachments: patch_1


 Especially useful when a quick view of the final results of the Rank 
 operator is required.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (PIG-2325) Make e2e test directory for data configurable in HDFS

2012-11-09 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2325.
-

   Resolution: Invalid
Fix Version/s: 0.12
 Assignee: Gianmarco De Francisci Morales

Thanks!

 Make e2e test directory for data configurable in HDFS
 -

 Key: PIG-2325
 URL: https://issues.apache.org/jira/browse/PIG-2325
 Project: Pig
  Issue Type: Improvement
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.12


 Right now the place for the data generated in e2e tests is hardcoded in 
 test/e2e/pig/conf/default.conf as:
 {code}
  $cfg = {
  #HDFS
   'inpathbase'  => '/user/pig/tests/data'
 , 'outpathbase' => '/user/pig/out'
 {code}
 It would be better to make it configurable (with an environment variable?) as 
 the rest of the paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3006) Modernize a chunk of the tests

2012-11-07 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13492477#comment-13492477
 ] 

Gianmarco De Francisci Morales commented on PIG-3006:
-

Great job guys!

 Modernize a chunk of the tests
 --

 Key: PIG-3006
 URL: https://issues.apache.org/jira/browse/PIG-3006
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3006-0.patch, PIG-3006-1.patch, PIG-3006-2.patch, 
 PIG-3006-3.patch, PIG-3006-4.patch


 A lot of the tests use antiquated patterns. My goal was to refactor them in a 
 couple ways:
 - get rid of the annotation specifying Junit 4. All should use JUnit 4 
 (question: where is the Junit 3 dependency even being pulled in?)
 - Nothing should extend TestCase. Everything should be annotation driven.
 - Properly use asserts. There was a lot of assertTrue(null==thing), so I 
 replaced it with assertNull(thing), and so on.
 - Get rid of MiniCluster use in a handful of cases.
 I've run every test and they pass, EXCEPT TestLargeFile which is failing on 
 trunk anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Our release process

2012-11-05 Thread Gianmarco De Francisci Morales
Hi,

Sure, we don't want to commit patches that destabilize the code base.
However, unfortunately, there is no way to know whether a patch will
destabilize the code or not. Even testing is only a heuristic. So how do we
draw the line?
We seem to agree that only bug fixes should go into branches. However, it
seems that we have two different views on the policy: Olga is proposing to
have only P1 bugs fixed, while Alan is suggesting to be more lax about what
goes into the branches.
Regardless of the policy chosen, how do we define the priority of a bug? By
how many users are affected? By whether it can corrupt data? Is there a
formal definition we can agree on? Otherwise defining a policy becomes hard.

The test-commit task does not run full regression because the full test
suite takes too long to execute. And I agree that asking to run the full
test suite before committing any change slows down the (already slow)
review process.
However, I would be fine with running the full test suite for bug fixes
that need to go into branches, in order to guarantee absence of regressions.

Cheers,
--
Gianmarco



On Sun, Nov 4, 2012 at 5:17 PM, Olga Natkovich onatkov...@yahoo.com wrote:

 I can see how this would work for research projects but for real
 production this will not work. And I actually meant much more stringent
 stability. I don't think we should commit patches to either trunk or branch
 that destabilize the tree. We used to run full regression before each
 commit - is this no longer the case? By stability I meant very few things
 go into the branch. I know that pig has pretty decent tests - better
 coverage than many other projects. However, we do not have any testing at
 scale and inevitably, users end up doing testing. So any time we deploy new
 major version, it takes us at least a month to get it stable and once it is
 stabilized we want to keep it this way.

 So for us at Yahoo, the only way to work directly from the branch is to go
 by our original plan. If that is not possible, we would go with the private
 git branch.

 Olga


 
  From: Alan Gates ga...@hortonworks.com
 To: dev@pig.apache.org
 Sent: Friday, November 2, 2012 8:19 PM
 Subject: Re: Our release process

 I am all for maintaining stability of branches, and the trunk, as everyone
 benefits from it.  But I do not think this means we should limit bug fixing
 in the branches to only critical issues.  As Pig gets more users we have
 more and more people on older branches who will want fixes for bugs without
 dealing with bigger version changes.  So I am not in favor of limiting
 checkins to branches to P1 issues.

 What if we maintain stability on the branches by quickly reverting any
 patches that break the build, the unit tests, or the e2e tests?  This
 allows us to move forward with bug fix versions, it allows those who depend
 on branch stability (which I suspect is everyone in the distribution
 business plus everyone rolling their own Pig), and it should promote
 developer responsibility (no one likes having their patches reverted).

 Alan.

 On Nov 2, 2012, at 3:58 PM, Olga Natkovich wrote:

  Hi guys,
 
  Mid last year, we agreed on a release process documented in this thread:
 http://www.mail-archive.com/dev@pig.apache.org/msg04172.html.
 
  Since then, we have not really followed either of its two rules:
 
  (1) Frequent releases (every 3 months)
  (2) Branch stability (only P1 issues on the branch).
 
  So I wanted to revisit our release procedure to make sure we have one
 that we can actually follow.
 
  For us at Yahoo, branch stability is very important since we release all
 the patches directly from the branch. If we can't rely on the fact that
 only critical fixes go in, we will need to resort to git branches that will
 make the whole process very cumbersome because we will then need to hand-pick
 patches from the Apache branch and port them onto our private branch. I
 would imagine that others using Pig in production would have similar issues.
 
  Olga
 
 
  Olga



[jira] [Updated] (PIG-2315) Make as clause work in generate

2012-11-05 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2315:


Fix Version/s: (was: 0.11)
   0.12

 Make as clause work in generate
 ---

 Key: PIG-2315
 URL: https://issues.apache.org/jira/browse/PIG-2315
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Gianmarco De Francisci Morales
 Fix For: 0.12


 Currently, the following syntax is accepted but silently ignored, which 
 confuses users:
 A1 = foreach A1 generate a as a:chararray ;
 After this statement, a simply retains its previous type.



Re: Our release process

2012-11-05 Thread Gianmarco De Francisci Morales
Hi,

On Mon, Nov 5, 2012 at 10:48 AM, Olga Natkovich onatkov...@yahoo.com wrote:
 Hi Gianmarco,

 Thanks for your comments. Here is a little more information.

 At Yahoo, we consider the following issues to be P1:

 (1) Bugs that cause wrong results to be produced silently
 (2) Bugs that cause failures with no easy workaround


Thanks Olga, now I get what you mean.
I don't have a strong opinion on this.
On one hand I see why you don't want to put too many patches in the
branches in order to keep things stable.
On the other hand, when we do a 0.10.x release with x > 0, the users would
like to have as many bugs fixed as possible.

 Regarding tests. I would suggest we have different rules for trunk and 
 branches:

 (1) For branches, I think we should run the full regression suite (including 
 e2e) prior to commit. This way we can ensure branch stability and, as the number 
 of patches should be small, it will not be a burden.
 (2) For trunk, we can go with test-commit only and fix things quickly when 
 things break.

I think this makes sense. +1

 Olga

Cheers,
--
Gianmarco


Re: Our release process

2012-11-03 Thread Gianmarco De Francisci Morales
Hi,

I agree with what Alan says and I like his proposal.
However, to make it feasible, we need to make the Jenkins builds stable;
otherwise a real problem introduced by a patch might be lost among the
hundreds of failures due to Clover licenses, MiniCluster issues, etc.

I am not keen on having Jenkins post the build results to JIRA
after a patch is committed, as it pollutes the JIRA itself. However, in this
case it might be a good way to promote developer responsibility. Is it
possible to activate this only for branches?

Cheers,
--
Gianmarco



On Fri, Nov 2, 2012 at 8:19 PM, Alan Gates ga...@hortonworks.com wrote:

 I am all for maintaining stability of branches, and the trunk, as everyone
 benefits from it.  But I do not think this means we should limit bug fixing
 in the branches to only critical issues.  As Pig gets more users we have
 more and more people on older branches who will want fixes for bugs without
 dealing with bigger version changes.  So I am not in favor of limiting
 checkins to branches to P1 issues.

 What if we maintain stability on the branches by quickly reverting any
 patches that break the build, the unit tests, or the e2e tests?  This
 allows us to move forward with bug fix versions, it allows those who depend
 on branch stability (which I suspect is everyone in the distribution
 business plus everyone rolling their own Pig), and it should promote
 developer responsibility (no one likes having their patches reverted).

 Alan.

 On Nov 2, 2012, at 3:58 PM, Olga Natkovich wrote:

  Hi guys,
 
  Mid last year, we agreed on a release process documented in this thread:
 http://www.mail-archive.com/dev@pig.apache.org/msg04172.html.
 
  Since then, we have not really followed either of its two rules:
 
  (1) Frequent releases (every 3 months)
  (2) Branch stability (only P1 issues on the branch).
 
  So I wanted to revisit our release procedure to make sure we have one
 that we can actually follow.
 
  For us at Yahoo, branch stability is very important since we release all
 the patches directly from the branch. If we can't rely on the fact that
 only critical fixes go in, we will need to resort to git branches that will
 make the whole process very cumbersome because we will then need to hand-pick
 patches from the Apache branch and port them onto our private branch. I
 would imagine that others using Pig in production would have similar issues.
 
  Olga
 
 
  Olga




Re: [DISCUSS] Remove Penny from contrib

2012-11-01 Thread Gianmarco De Francisci Morales
+1
--
Gianmarco




On Wed, Oct 31, 2012 at 10:01 PM, Russell Jurney
russell.jur...@gmail.com wrote:

 I'll be the +1 :)

 Russell Jurney http://datasyndrome.com

 On Nov 1, 2012, at 12:53 AM, Bill Graham billgra...@gmail.com wrote:

  +1
 
  On Wed, Oct 31, 2012 at 5:36 PM, Cheolsoo Park cheol...@cloudera.com
 wrote:
 
  +1. I agree.
 
  On Wed, Oct 31, 2012 at 2:54 PM, Alan Gates ga...@hortonworks.com
 wrote:
 
  I propose we remove Penny from contrib.  Currently it does not compile
 in
  trunk.  Looking through the commit logs no significant work has been
 done
  on it since it was initially committed.  There are 3 open JIRAs that
  reference it (
 
 
 https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+%28summary+%7E+penny+OR+description+%7E+penny%29+AND+status+%3D+Open
 ).
  At this point I do not think anyone is using it or maintaining it.
 
  If someone is interested in moving this piece forward I would propose
 we
  move it to Apache Extras or something similar.
 
  Alan.
 
 
 
 
  --
  *Note that I'm no longer using my Yahoo! email address. Please email me
 at
  billgra...@gmail.com going forward.*



[jira] [Commented] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query

2012-10-31 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13488197#comment-13488197
 ] 

Gianmarco De Francisci Morales commented on PIG-2970:
-

If it works and does not interfere with other parts of the plan building, I 
think this is a good approach.

 Nested foreach getting incorrect schema when having unrelated inner query
 -

 Key: PIG-2970
 URL: https://issues.apache.org/jira/browse/PIG-2970
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.10.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Fix For: 0.11, 0.12

 Attachments: pig-2970-trunk-v01.txt


 While looking at PIG-2968, hit a weird error message.
 {noformat}
 $ cat -n test/foreach2.pig
  1  daily = load 'nyse' as (exchange, symbol);
  2  grpd = group daily by exchange;
  3  unique = foreach grpd {
  4  sym = daily.symbol;
  5  uniq_sym = distinct sym;
  6  --ignoring uniq_sym result
  7  generate group, daily;
  8  };
  9  describe unique;
 10  zzz = foreach unique generate group;
 11  explain zzz;
 % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
 ...
 unique: {symbol: bytearray}
 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1025: 
 file test/foreach2.pig, line 10, column 30 Invalid field projection. 
 Projected field [group] does not exist in schema: symbol:bytearray.
 ...
 {noformat}



[jira] [Updated] (PIG-19) A=load causes parse error

2012-10-30 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-19?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-19:
--

Assignee: Gianmarco De Francisci Morales

 A=load causes parse error
 -

 Key: PIG-19
 URL: https://issues.apache.org/jira/browse/PIG-19
 Project: Pig
  Issue Type: Bug
  Components: grunt
Reporter: Olga Natkovich
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.12


 Parser expects spaces around =. This should be a minor change in 
 src/org/apache/pig/tools/grunt/GruntParser.jj



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486350#comment-13486350
 ] 

Gianmarco De Francisci Morales commented on PIG-3008:
-

1. I am assuming we will integrate this with Ant. The checkstyle Ant target 
wants a list of files to run on. So if you are able to identify the list of 
files you changed automatically, then yes. How do you define "changed"?
2. Yes, as long as there is a compatible formatter profile (i.e. it is not too 
complex). For our use case it should be fine. I know it can be done with 
Eclipse. For other IDEs I guess it can be done as well, but I don't know how.

We also need integration with Jenkins for automatic builds.

 Fix whitespace in Pig code
 --

 Key: PIG-3008
 URL: https://issues.apache.org/jira/browse/PIG-3008
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12

 Attachments: checkstyle.xml


 This JIRA exists mainly to get a conversation started. We've talked about it 
 before, and it's a tricky issue. That said, some of the Pig code is super, 
 super gnarly. We need some sort of path that will let it eventually be 
 fix-able.
 I posit: any file that hasn't been touched for over 6 months is eligible for 
 a whitespace patch.



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-28 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485689#comment-13485689
 ] 

Gianmarco De Francisci Morales commented on PIG-3008:
-

I agree that we should fix whitespace, and the policy you suggest seems reasonable.
However, this conversation belongs to the more general topic of style.
Are we willing to define and enforce a code style policy for Pig?
Other projects are using checkstyle to do so, so that new patches comply.
I am fine with being stricter on style, but only for things that can be 
automated within an IDE (whitespace is one of those).

 Fix whitespace in Pig code
 --

 Key: PIG-3008
 URL: https://issues.apache.org/jira/browse/PIG-3008
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12


 This JIRA exists mainly to get a conversation started. We've talked about it 
 before, and it's a tricky issue. That said, some of the Pig code is super, 
 super gnarly. We need some sort of path that will let it eventually be 
 fix-able.
 I posit: any file that hasn't been touched for over 6 months is eligible for 
 a whitespace patch.



[jira] [Commented] (PIG-3001) TestExecutableManager.testAddJobConfToEnv fails randomly

2012-10-28 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485710#comment-13485710
 ] 

Gianmarco De Francisci Morales commented on PIG-3001:
-

Thanks Rohini, I had missed this bit.
Then I also vote for removing the conversion.

 TestExecutableManager.testAddJobConfToEnv fails randomly
 

 Key: PIG-3001
 URL: https://issues.apache.org/jira/browse/PIG-3001
 Project: Pig
  Issue Type: Sub-task
Reporter: Gianmarco De Francisci Morales
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: PIG-3001-2.patch, PIG-3001-3.patch, PIG-3001-4.patch, 
 PIG-3001.patch


 The test in the Summary fails intermittently.
 This is due to using a random number generator without seeding it.
 We should avoid stochastic tests.
 Furthermore, the test itself is ill-conceived.
 Here is the failure summary:
 {code}
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: ⻨ꢏ切歯
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: 狓偝
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: 墣챟㌌̀썬鼹騷
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: 훎滼
 {code}
 {code}
 Error Message:
 There should be no remaining pairs in the included map
 Stacktrace:
 junit.framework.AssertionFailedError: There should be no remaining pairs in 
 the included map
   at 
 org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)
 {code}
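
The point about unseeded random generators can be illustrated with a minimal sketch in plain Java. This is not the TestExecutableManager code: the class name, the helper randomKey, and the seed value are assumptions made purely for illustration. It shows that two java.util.Random instances created with the same seed produce the same sequence, so test data generated from a fixed seed is reproducible from run to run.

```java
import java.util.Random;

/** Sketch: a fixed seed makes pseudo-random test data reproducible. */
class SeededRandomDemo {

    /** Hypothetical helper: builds a pseudo-random lowercase key of the given length. */
    static String randomKey(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 42L; // arbitrary value; what matters is that it is fixed
        String first = randomKey(new Random(seed), 8);
        String second = randomKey(new Random(seed), 8);
        // Same seed, same sequence, so both keys are identical.
        System.out.println(first.equals(second)); // prints true
    }
}
```

With a fixed seed, a failing input can be regenerated exactly. Restricting the generated characters to a known-safe range, as the helper does here, also avoids feeding arbitrary code points into the code under test.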



[jira] [Updated] (PIG-3008) Fix whitespace in Pig code

2012-10-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-3008:


Attachment: checkstyle.xml

Agreed, let's start with this.
Here is a first stab at a checkstyle configuration with FileTabCharacter, Indentation, and 
RegexpSingleLine for trailing whitespace.
I threw in a few other checks which I think are useful as well:
checking that all files have the Apache header, checking imports, and flagging the use of ==/!= with 
Strings.
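
As a reminder of why comparing Strings with ==/!= is worth flagging, here is a minimal sketch in plain Java (the class and method names are made up for illustration and are not taken from the Pig code base). The == operator compares object references, while equals compares contents.

```java
import java.util.Objects;

/** Sketch: reference comparison versus content comparison for Strings. */
class StringEqualityDemo {

    /** Content comparison that also tolerates null on either side. */
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String literal = "pig";
        String built = new String("pig"); // distinct object, same characters

        System.out.println(literal == built);         // false: different references
        System.out.println(literal.equals(built));    // true: same contents
        System.out.println(sameText(literal, built)); // true
    }
}
```

Code that uses == on Strings may appear to work when both operands happen to be interned literals, which is why a static check is a convenient way to catch it.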

 Fix whitespace in Pig code
 --

 Key: PIG-3008
 URL: https://issues.apache.org/jira/browse/PIG-3008
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12

 Attachments: checkstyle.xml


 This JIRA exists mainly to get a conversation started. We've talked about it 
 before, and it's a tricky issue. That said, some of the Pig code is super, 
 super gnarly. We need some sort of path that will let it eventually be 
 fix-able.
 I posit: any file that hasn't been touched for over 6 months is eligible for 
 a whitespace patch.



[jira] [Commented] (PIG-3006) Modernize a chunk of the tests

2012-10-26 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485355#comment-13485355
 ] 

Gianmarco De Francisci Morales commented on PIG-3006:
-

I am fine with the changes, but it's a huge patch, so I only skimmed through it and 
might have missed something.
+0, let's wait for another pair of eyeballs.

 Modernize a chunk of the tests
 --

 Key: PIG-3006
 URL: https://issues.apache.org/jira/browse/PIG-3006
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3006-0.patch, PIG-3006-1.patch


 A lot of the tests use antiquated patterns. My goal was to refactor them in a 
 couple of ways:
 - get rid of the annotation specifying JUnit 4. All should use JUnit 4 
 (question: where is the JUnit 3 dependency even being pulled in?)
 - Nothing should extend TestCase. Everything should be annotation driven.
 - Properly use asserts. There was a lot of assertTrue(null==thing), so I 
 replaced it with assertNull(thing), and so on.
 - Get rid of MiniCluster use in a handful of cases.
 I've run every test and they pass, EXCEPT TestLargeFile which is failing on 
 trunk anyway.



[jira] [Commented] (PIG-3001) TestExecutableManager.testAddJobConfToEnv fails randomly

2012-10-26 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13485357#comment-13485357
 ] 

Gianmarco De Francisci Morales commented on PIG-3001:
-

I would also remove it; however, backwards compatibility is an issue.
I am fine with deprecating it and showing a warning to the user.
We can remove it altogether in the next release
(if we choose this way, let's open a JIRA to keep track of it).

 TestExecutableManager.testAddJobConfToEnv fails randomly
 

 Key: PIG-3001
 URL: https://issues.apache.org/jira/browse/PIG-3001
 Project: Pig
  Issue Type: Sub-task
Reporter: Gianmarco De Francisci Morales
Assignee: Cheolsoo Park
Priority: Minor
 Attachments: PIG-3001-2.patch, PIG-3001-3.patch, PIG-3001-4.patch, 
 PIG-3001.patch


 The test in the Summary fails intermittently.
 This is due to using a random number generator without seeding it.
 We should avoid stochastic tests.
 Furthermore, the test itself is ill-conceived.
 Here is the failure summary:
 {code}
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: ⻨ꢏ切歯
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: 狓偝
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: 墣챟㌌̀썬鼹騷
 12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
 pig.streaming.environment not found in Configuration: 훎滼
 {code}
 {code}
 Error Message:
 There should be no remaining pairs in the included map
 Stacktrace:
 junit.framework.AssertionFailedError: There should be no remaining pairs in 
 the included map
   at 
 org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)
 {code}



Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy

2012-10-26 Thread Gianmarco De Francisci Morales
Congratulations Rohini!
Welcome onboard :)
--
Gianmarco


On Fri, Oct 26, 2012 at 7:32 PM, Prasanth J buckeye.prasa...@gmail.com wrote:
 Congrats Rohini!

 Thanks
 -- Prasanth

 On Oct 26, 2012, at 10:21 PM, Santhosh Srinivasan santhosh_mut...@yahoo.com 
 wrote:

 Congrats Rohini! Full speed ahead now :)

 On Oct 26, 2012, at 4:37 PM, Daniel Dai da...@hortonworks.com wrote:

 Here is another Pig committer announcement today. Please welcome
 Rohini Palaniswamy to be a Pig committer!

 Thanks,
 Daniel



Re: Review Request: Modernize a chunk of the tests

2012-10-25 Thread Gianmarco De Francisci Morales

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7734/#review12786
---



test/org/apache/pig/test/TestConversions.java
https://reviews.apache.org/r/7734/#comment27353

Do we really need Integer.valueOf() ?



test/org/apache/pig/test/TestInputOutputFileValidator.java
https://reviews.apache.org/r/7734/#comment27354

We should use @Test(expected = Exception.class) instead.



test/org/apache/pig/test/TestInputOutputFileValidator.java
https://reviews.apache.org/r/7734/#comment27355

Same here,  @Test(expected = Exception.class) 
Possibly a proper subclass of Exception



test/org/apache/pig/test/TestInputOutputMiniClusterFileValidator.java
https://reviews.apache.org/r/7734/#comment27356

Same here, @Test(expected = Exception.class) or a proper subclass.



test/org/apache/pig/test/TestLargeFile.java
https://reviews.apache.org/r/7734/#comment27357

Why the change in name?



test/org/apache/pig/test/TestNullConstant.java
https://reviews.apache.org/r/7734/#comment27358

What is the accepted way of creating temporary datasets?
Are we suggesting that everybody use mock.Storage()?


- Gianmarco De Francisci Morales


On Oct. 25, 2012, 6:05 p.m., Jonathan Coveney wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/7734/
 ---
 
 (Updated Oct. 25, 2012, 6:05 p.m.)
 
 
 Review request for pig and Julien Le Dem.
 
 
 Description
 ---
 
 A lot of the tests use antiquated patterns. My goal was to refactor them in a 
 couple of ways:
 
 - get rid of the annotation specifying JUnit 4. All should use JUnit 4 
 (question: where is the JUnit 3 dependency even being pulled in?)
 - Nothing should extend TestCase. Everything should be annotation driven.
 - Properly use asserts. There was a lot of assertTrue(null==thing), so I 
 replaced it with assertNull(thing), and so on.
 - Get rid of MiniCluster use in a handful of cases.
 
 
 This addresses bug PIG-3006.
 https://issues.apache.org/jira/browse/PIG-3006
 
 
 Diffs
 -
 
   test/org/apache/pig/test/PigExecTestCase.java 32a502c 
   test/org/apache/pig/test/TestAlgebraicEval.java 0bbd83d 
   test/org/apache/pig/test/TestAlgebraicEvalLocal.java df4b76a 
   test/org/apache/pig/test/TestBagFormat.java 09298d4 
   test/org/apache/pig/test/TestBatchAliases.java 6e952c7 
   test/org/apache/pig/test/TestCompressedFiles.java d54ffaa 
   test/org/apache/pig/test/TestConversions.java 152ad5c 
   test/org/apache/pig/test/TestDeleteOnFail.java 7070285 
   test/org/apache/pig/test/TestFilterOpNumeric.java 730e808 
   test/org/apache/pig/test/TestFilterOpString.java b65965f 
   test/org/apache/pig/test/TestFilterSimplification.java ade97b6 
   test/org/apache/pig/test/TestForEachNestedPlanLocal.java a78568e 
   test/org/apache/pig/test/TestFuncSpec.java bc7144c 
   test/org/apache/pig/test/TestInfixArithmetic.java cdf6948 
   test/org/apache/pig/test/TestInputOutputFileValidator.java 67b2873 
   test/org/apache/pig/test/TestInputOutputMiniClusterFileValidator.java 
 caa62cb 
   test/org/apache/pig/test/TestInstantiateFunc.java 31c37b1 
   test/org/apache/pig/test/TestJoin.java a4f3aff 
   test/org/apache/pig/test/TestKeyTypeDiscoveryVisitor.java 2bbeca1 
   test/org/apache/pig/test/TestLargeFile.java 79590ce 
   test/org/apache/pig/test/TestLocal.java 5680196 
   test/org/apache/pig/test/TestLocal2.java eea7b2f 
   test/org/apache/pig/test/TestMapReduce2.java 30574db 
   test/org/apache/pig/test/TestNewPlanColumnPrune.java bed006e 
   test/org/apache/pig/test/TestNewPlanListener.java 7701182 
   test/org/apache/pig/test/TestNewPlanOperatorPlan.java 1f8fe56 
   test/org/apache/pig/test/TestNewPlanPruneMapKeys.java d1cce22 
   test/org/apache/pig/test/TestNewPlanRule.java 4a7ff0a 
   test/org/apache/pig/test/TestNullConstant.java 3ae25d9 
   test/org/apache/pig/test/TestOrderBy2.java 4ee4f26 
   test/org/apache/pig/test/TestOrderBy3.java 2067d7a 
   test/org/apache/pig/test/TestPOBinCond.java 20bd734 
   test/org/apache/pig/test/TestPODistinct.java 60f9d73 
   test/org/apache/pig/test/TestPOGenerate.java e0fd796 
   test/org/apache/pig/test/TestPOMapLookUp.java 3ed0900 
   test/org/apache/pig/test/TestPONegative.java 220c409 
   test/org/apache/pig/test/TestPORegexp.java d6e15ac 
   test/org/apache/pig/test/TestPOSort.java 600ee0c 
   test/org/apache/pig/test/TestPOUserFunc.java 3a90d6c 
   test/org/apache/pig/test/TestParamSubPreproc.java 1a52691 
   test/org/apache/pig/test/TestParser.java 17dc42a 
   test/org/apache/pig/test/TestPi.java f0883d1 
   test/org/apache/pig/test/TestPigProgressReporting.java e4f76ec 
   test/org/apache/pig/test/TestPigScriptParser.java 2acb1a8 
   test/org/apache/pig/test/TestPigSplit.java af70e9d 
   test

[jira] [Commented] (PIG-3006) Modernize a chunk of the tests

2012-10-25 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484424#comment-13484424
 ] 

Gianmarco De Francisci Morales commented on PIG-3006:
-

Great job Jon! Our test suite definitely needed some cleanup.

 Modernize a chunk of the tests
 --

 Key: PIG-3006
 URL: https://issues.apache.org/jira/browse/PIG-3006
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3006-0.patch


 A lot of the tests use antiquated patterns. My goal was to refactor them in a 
 couple of ways:
 - get rid of the annotation specifying JUnit 4. All should use JUnit 4 
 (question: where is the JUnit 3 dependency even being pulled in?)
 - Nothing should extend TestCase. Everything should be annotation driven.
 - Properly use asserts. There was a lot of assertTrue(null==thing), so I 
 replaced it with assertNull(thing), and so on.
 - Get rid of MiniCluster use in a handful of cases.
 I've run every test and they pass, EXCEPT TestLargeFile which is failing on 
 trunk anyway.



[jira] [Commented] (PIG-3006) Modernize a chunk of the tests

2012-10-25 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13484437#comment-13484437
 ] 

Gianmarco De Francisci Morales commented on PIG-3006:
-

I did a quick review and it looked good (I left a couple of comments), but I 
would feel better if one more committer reviews it.

 Modernize a chunk of the tests
 --

 Key: PIG-3006
 URL: https://issues.apache.org/jira/browse/PIG-3006
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3006-0.patch


 A lot of the tests use antiquated patterns. My goal was to refactor them in a 
 couple of ways:
 - get rid of the annotation specifying JUnit 4. All should use JUnit 4 
 (question: where is the JUnit 3 dependency even being pulled in?)
 - Nothing should extend TestCase. Everything should be annotation driven.
 - Properly use asserts. There was a lot of assertTrue(null==thing), so I 
 replaced it with assertNull(thing), and so on.
 - Get rid of MiniCluster use in a handful of cases.
 I've run every test and they pass, EXCEPT TestLargeFile which is failing on 
 trunk anyway.



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-24 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483338#comment-13483338
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

I see these tests fail:
org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv
junit.framework.AssertionFailedError: There should be no remaining pairs 
in the included map at

org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)

org.apache.pig.test.TestDataModel.testMultiFieldTupleCompareTo
less than tuple with greater value expected:<-1> but was:<-2>


For the second one, the solution proposed by Koji is fine.
Not sure the first one is related (probably not).

These tests also fail but I think they are not directly related to this patch.
org.apache.pig.test.TestJobSubmission.testReducerNumEstimation
org.apache.pig.test.TestMacroExpansion.testMacroAliasConversion
org.apache.pig.test.TestScriptLanguage.bindLocalVariableTest2
org.apache.pig.test.TestStreaming.testInputCacheSpecs 

Once the testMultiFieldTupleCompareTo test case is fixed I am OK with the patch.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt


 I think I broke the build from PIG-2975.  I see a couple of tests failing at 
 BinInterSedesTupleRawComparator. 
 {noformat}
 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
 java.nio.BufferUnderflowException
   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
   at 
 org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
   at 
 org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
   at 
 org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
   at 
 org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
   at 
 org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
 {noformat}
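
The top frames of the trace show `HeapByteBuffer.getLong` throwing `BufferUnderflowException`, which is what a relative `getLong()` does when fewer than 8 bytes remain in the buffer. The following is only a minimal sketch of that mechanism, not Pig's comparator code; the class and helper names are hypothetical.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class UnderflowSketch {
    /** Reads a long only if 8 bytes remain; returns null otherwise. */
    static Long safeGetLong(ByteBuffer buf) {
        if (buf.remaining() < Long.BYTES) {
            return null;
        }
        return buf.getLong();
    }

    /** Returns true when a relative getLong() underflows the buffer. */
    static boolean underflows(ByteBuffer buf) {
        try {
            buf.getLong();
            return false;
        } catch (BufferUnderflowException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // Four bytes are enough for an int but not for a long.
        ByteBuffer fourBytes = ByteBuffer.wrap(new byte[] {0, 0, 0, 1});
        System.out.println(underflows(fourBytes.duplicate()));
        System.out.println(safeGetLong(fourBytes.duplicate()));
    }
}
```

A raw comparator that decodes a field as a long while the serialized value is shorter would fail in the same way, although whether that is the actual cause here is not stated in the report.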



[jira] [Created] (PIG-3001) TestExecutableManager.testAddJobConfToEnv fails randomly

2012-10-24 Thread Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created PIG-3001:
---

 Summary: TestExecutableManager.testAddJobConfToEnv fails randomly
 Key: PIG-3001
 URL: https://issues.apache.org/jira/browse/PIG-3001
 Project: Pig
  Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Priority: Minor


The test in the Summary fails intermittently.
This is due to using a random number generator without seeding it.
We should avoid stochastic tests.
Furthermore, the test itself is ill-conceived.

Here is the failure summary:
{code}
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
pig.streaming.environment not found in Configuration: ⻨ꢏ切歯
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
pig.streaming.environment not found in Configuration: 狓偝
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
pig.streaming.environment not found in Configuration: 墣챟㌌̀썬鼹騷
12/10/23 11:02:48 WARN streaming.ExecutableManager: Property set in 
pig.streaming.environment not found in Configuration: 훎滼
{code}

{code}
Error Message:
There should be no remaining pairs in the included map

Stacktrace:
junit.framework.AssertionFailedError: There should be no remaining pairs in the 
included map
at 
org.apache.pig.impl.streaming.TestExecutableManager.testAddJobConfToEnv(TestExecutableManager.java:84)
{code}
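
As an illustration of the seeding point, here is a minimal sketch, assuming the test builds its property names from `java.util.Random`. The class name, helper, and seed value are hypothetical and not taken from TestExecutableManager.

```java
import java.util.Random;

public class SeededRandomSketch {
    /** Builds a pseudo-random lowercase ASCII string from the given generator. */
    static String randomKey(Random rng, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('a' + rng.nextInt(26)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long seed = 42L; // hypothetical fixed seed; any constant works
        String first = randomKey(new Random(seed), 8);
        String second = randomKey(new Random(seed), 8);
        // Two generators with the same seed produce the same sequence,
        // so a failing run can be reproduced exactly.
        System.out.println(first.equals(second));
    }
}
```

With a fixed seed the test input is the same on every run, so a failure is either always present or never present.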



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-24 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483352#comment-13483352
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

I checked TestExecutableManager and the patch has nothing to do with it.
I opened PIG-3001 to address the issue.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt





[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-24 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13483388#comment-13483388
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Hi Cheolsoo,

Thanks for the summary.
I guess it is an intermittent failure:
{code}
Error Message

Unable to find region for  after 10 tries.
Stacktrace

org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find 
region for  after 10 tries.
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:677)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:586)
at 
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:555)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:171)
at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:369)
at 
org.apache.pig.test.TestJobSubmission.testReducerNumEstimation(TestJobSubmission.java:545)
{code}

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt





[jira] [Updated] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-24 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2999:


   Resolution: Fixed
Fix Version/s: 0.11
 Assignee: Cheolsoo Park  (was: Jonathan Coveney)
   Status: Resolved  (was: Patch Available)

Verified that failing tests pass locally.
+1

Committed to trunk and 0.11.

Thanks for fixing this Cheolsoo, Koji.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Cheolsoo Park
 Fix For: 0.11

 Attachments: pig-2999-v1.txt, pig-2999-v2.txt, pig-2999-v3.txt





Re: PROPOSAL: how to handle release documentation going forward

2012-10-23 Thread Gianmarco De Francisci Morales
I guess this is the only way to ensure documented code.

+1

We need to put this rule somewhere, maybe in the Wiki?

Cheers,
--
Gianmarco


On Tue, Oct 23, 2012 at 12:37 AM, Santhosh M S
santhosh_mut...@yahoo.com wrote:
 +1


 
  From: Jonathan Coveney jcove...@gmail.com
 To: dev@pig.apache.org; Olga Natkovich onatkov...@yahoo.com
 Sent: Monday, October 22, 2012 5:09 PM
 Subject: Re: PROPOSAL: how to handle release documentation going forward

 As someone who chronically under-documents, I think that this is a good
 idea. +1

 2012/10/22 Olga Natkovich onatkov...@yahoo.com

 Hi,

 Since we lost the dedicated document writer for Pig, would it make sense
 to require, going forward (0.12 and beyond), that documentation updates
 are included in the patch together with code changes and tests? I think
 that should work for most features/updates except perhaps big items that
 might require more than one JIRA to be completed before documentation
 changes make sense.

 Comments?

 Olga



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482489#comment-13482489
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

The patch looks good.
Running tests to make sure everything works.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt





[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482538#comment-13482538
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Sure, I will take care of it.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt





[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482837#comment-13482837
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Not yet. I am running the full test suite to be sure we don't break other 
things, but it takes a while.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt





[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-23 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482936#comment-13482936
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Makes sense. Comparable only guarantees <0, ==0 and >0.
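
To make the point about the Comparable contract concrete: a comparison may return any negative or positive magnitude, so a test should check only the sign of the result. This is a minimal sketch with hypothetical names, not the code in TestDataModel.

```java
public class CompareSignSketch {
    /** A comparison that returns an arbitrary magnitude, which the contract allows. */
    static int compareLengths(String a, String b) {
        return a.length() - b.length();
    }

    /** Collapses any comparison result to -1, 0, or 1. */
    static int sign(int comparison) {
        return Integer.signum(comparison);
    }

    public static void main(String[] args) {
        int raw = compareLengths("ab", "abcd");
        System.out.println(raw);       // -2
        System.out.println(sign(raw)); // -1
    }
}
```

Asserting on `sign(result)` or on `result < 0` keeps a test valid whichever negative value the implementation happens to return.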

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi
Assignee: Jonathan Coveney
 Attachments: pig-2999-v1.txt, pig-2999-v2.txt





[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-22 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481770#comment-13481770
 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-

Guys, great job in moving this forward!
I am sold on all the improvements in the patch.
+1

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt, 
 pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withouttest.txt, 
 pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest2.txt, 
 pig-2975-trunk_v05-BinInterSedesRawComparatorAndlightweight-withtest.txt


 Looked at 
 {noformat}
 junit.framework.AssertionFailedError
 at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
 {noformat}
 This looks like a valid test case failing with incorrect result.
 {noformat}
 % cat test/orderby.txt
 [key#1,key9#23]
 [key#3,key3#2]
 [key#22]
 % cat test/orderby.pig
 a = load 'test/orderby.txt' as (m:[]);
 b = foreach a generate m#'key' as b0;
 dump b;
 c = order b by b0;
 dump c;
 % java ... org.apache.pig.Main -x local test/orderby.pig 
 [dump b]
 (1)
 (3)
 (22)
 ...
 [dump c]
 (1)
 (1)
 (22)
 %
 where did the '(3)' go?
 {noformat}



[jira] [Resolved] (PIG-2993) Fix local mode on Hadoop-0.23

2012-10-22 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2993.
-

   Resolution: Duplicate
Fix Version/s: (was: 0.11)

Thanks for the walkthrough.
Indeed, Pig was picking the Hadoop installed on my machine.
All the rest is as you described.

Closing as duplicate.

 Fix local mode on Hadoop-0.23
 -

 Key: PIG-2993
 URL: https://issues.apache.org/jira/browse/PIG-2993
 Project: Pig
  Issue Type: Sub-task
Reporter: Gianmarco De Francisci Morales

 When compiling with -Dhadoopversion=23 and launching Pig in local mode (-x 
 local) the shell just fills up with error notifications:
 {code}
 2012-10-19 15:10:17,360 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2998: Unhandled internal error. Could not initialize class 
 org.apache.pig.tools.pigstats.PigStatsUtil
 {code}
 Here is the stack trace:
 {code}
 Pig Stack Trace
 ---
 ERROR 2998: Unhandled internal error. 
 org/apache/hadoop/mapreduce/task/JobContextImpl
 java.lang.NoClassDefFoundError: 
 org/apache/hadoop/mapreduce/task/JobContextImpl
 at 
 org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:54)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
 at org.apache.pig.Main.run(Main.java:538)
 at org.apache.pig.Main.main(Main.java:154)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.mapreduce.task.JobContextImpl
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 ... 9 more
 
 Pig Stack Trace
 ---
 ERROR 2998: Unhandled internal error. Could not initialize class 
 org.apache.pig.tools.pigstats.PigStatsUtil
 java.lang.NoClassDefFoundError: Could not initialize class 
 org.apache.pig.tools.pigstats.PigStatsUtil
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
 at org.apache.pig.Main.run(Main.java:538)
 at org.apache.pig.Main.main(Main.java:154)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 
 {code}
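
The two traces above follow a general JVM pattern: the first use of a class whose static initializer fails reports the real cause, and every later use only reports "Could not initialize class". The sketch below reproduces that pattern with a hypothetical class. It is not Pig code, and its first failure surfaces as ExceptionInInitializerError because the initializer throws a RuntimeException, whereas an Error such as the NoClassDefFoundError in the report is propagated unwrapped.

```java
public class InitFailureSketch {
    /** Hypothetical stand-in for a class whose static initializer cannot complete. */
    static class Broken {
        static final int VALUE = compute();

        static int compute() {
            throw new IllegalStateException("dependency missing");
        }
    }

    /** Touches Broken and reports the simple name of whatever is thrown. */
    static String touch() {
        try {
            return String.valueOf(Broken.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(touch()); // first access: the initializer runs and fails
        System.out.println(touch()); // later access: the class is already marked unusable
    }
}
```

When only the "Could not initialize class" message is visible, the earlier trace is the one that names the missing dependency.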



[jira] [Updated] (PIG-2941) Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices

2012-10-22 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2941:


   Resolution: Fixed
Fix Version/s: (was: 0.10.0)
   0.12
   Status: Resolved  (was: Patch Available)

+1
Committed to trunk.

Thanks John!

 Ivy resolvers in pig don't have consistent chaining and don't have a kitchen 
 sink option for novices
 

 Key: PIG-2941
 URL: https://issues.apache.org/jira/browse/PIG-2941
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
 Fix For: 0.12

 Attachments: 
 0001-IvySettings.xml-refactor-to-simplify-resolution.patch, 
 PIG-2941.trunk.002.patch


 The Ivy resolvers in Pig are split into default, external, and internal -- 
 and they are all actually distinct.  There isn't a resolver that rolls over 
 all three, and fallbacks aren't in place.  Ideally, these resolvers should 
 chain right through with the default following a best practice fallback for 
 novices.



[jira] [Commented] (PIG-2999) Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort failing

2012-10-22 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13482082#comment-13482082
 ] 

Gianmarco De Francisci Morales commented on PIG-2999:
-

Most likely you are correct Jonathan.

 Regression after PIG-2975: BinInterSedesTupleRawComparator secondary sort 
 failing
 -

 Key: PIG-2999
 URL: https://issues.apache.org/jira/browse/PIG-2999
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.11, 0.12
Reporter: Koji Noguchi

 I think I broke the build from PIG-2975.  I see a couple of tests failing at 
 BinInterSedesTupleRawComparator. 
 {noformat}
 12/10/22 22:26:15 WARN mapred.LocalJobRunner: job_local_0022
 java.nio.BufferUnderflowException
   at java.nio.Buffer.nextGetIndex(Buffer.java:478)
   at java.nio.HeapByteBuffer.getLong(HeapByteBuffer.java:387)
   at 
 org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinInterSedesDatum(BinInterSedes.java:829)
   at 
 org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compareBinSedesTuple(BinInterSedes.java:732)
   at 
 org.apache.pig.data.BinInterSedes$BinInterSedesTupleRawComparator.compare(BinInterSedes.java:695)
   at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigSecondaryKeyComparator.compare(PigSecondaryKeyComparator.java:78)
   at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
   at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:139)
   at 
 org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
   at 
 org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
   at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
   at org.apache.hadoop.mapred.ReduceTask$4.next(ReduceTask.java:625)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:117)
   at 
 org.apache.hadoop.mapreduce.ReduceContext.nextKey(ReduceContext.java:92)
   at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:175)
   at 
 org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
   at 
 org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
 {noformat}



[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480247#comment-13480247
 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-

Hi,
We use ByteBuffer in the comparator for convenience.

However, I don't think we should read too much into the comparison between the 
6 minutes of the incorrect version and the 10 minutes of the correct version.
IMHO correctness is more important than performance.
The slowness is due to the fact that we need to unnest the ByteArray from the 
Tuple and that we are using a Tuple to store any kind of data.

That said, BinInterSedes.BinInterSedesRawComparator is meant for performance, 
so if there is a way to make it faster it's more than welcome.
My guess is that it won't be easy to recover the original speed.
I would suggest profiling the code with a micro-benchmark to see where the 
time is spent.
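
As a starting point for that kind of profiling, here is a throwaway sketch. It is not Pig code: RawCompareBench, compareWithByteBuffer, and compareRaw are hypothetical names, and the two methods only stand in for "read through a ByteBuffer" versus "index the byte array directly". Hand-rolled timing like this is only indicative; a proper harness such as JMH would give more trustworthy numbers.

```java
import java.nio.ByteBuffer;
import java.util.Random;

/** Hypothetical micro-benchmark sketch; none of these names exist in Pig. */
final class RawCompareBench {

    /** Unsigned lexicographic comparison that reads through ByteBuffer wrappers. */
    static int compareWithByteBuffer(byte[] a, byte[] b) {
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (x.get() & 0xff) - (y.get() & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Same ordering, but indexing the arrays directly. */
    static int compareRaw(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    /** Times one strategy over adjacent pairs; returns elapsed nanoseconds. */
    static long time(boolean useBuffer, byte[][] data, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            for (int i = 1; i < data.length; i++) {
                sink += useBuffer
                        ? compareWithByteBuffer(data[i - 1], data[i])
                        : compareRaw(data[i - 1], data[i]);
            }
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // use the result so the loop is not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        Random random = new Random(42);
        byte[][] data = new byte[10_000][];
        for (int i = 0; i < data.length; i++) {
            data[i] = new byte[16];
            random.nextBytes(data[i]);
        }
        time(true, data, 20);   // warm-up so the JIT compiles both paths
        time(false, data, 20);
        System.out.println("ByteBuffer: " + time(true, data, 200) / 1_000_000 + " ms");
        System.out.println("raw array:  " + time(false, data, 200) / 1_000_000 + " ms");
    }
}
```

The real comparator also has to unnest the ByteArray from the Tuple, which this sketch ignores, so it can only show the cost of the byte access itself.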


 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt


 Looked at 
 {noformat}
 junit.framework.AssertionFailedError
 at org.apache.pig.test.TestTypedMap.testOrderBy(TestTypedMap.java:352)
 {noformat}
 This looks like a valid test case failing with incorrect result.
 {noformat}
 % cat test/orderby.txt
 [key#1,key9#23]
 [key#3,key3#2]
 [key#22]
 % cat test/orderby.pig
 a = load 'test/orderby.txt' as (m:[]);
 b = foreach a generate m#'key' as b0;
 dump b;
 c = order b by b0;
 dump c;
 % java ... org.apache.pig.Main -x local test/orderby.pig 
 [dump b]
 (1)
 (3)
 (22)
 ...
 [dump c]
 (1)
 (1)
 (22)
 %
 where did the '(3)' go?
 {noformat}



[jira] [Assigned] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales reassigned PIG-2975:
---

Assignee: Gianmarco De Francisci Morales  (was: Koji Noguchi)

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Gianmarco De Francisci Morales
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt




[jira] [Updated] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2975:


Assignee: Koji Noguchi  (was: Gianmarco De Francisci Morales)

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt




[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480309#comment-13480309
 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-

Personally, I don't care about the byte order; it has no well-defined semantics anyway.
However, by including the 4 bytes in the comparison I am afraid we are exposing 
ourselves to further bugs when the serialization format changes.

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt




[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480373#comment-13480373
 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-

Yes, I was referring to the last alternative.

If the serialization format changes (say we redefine the codes for TINYTUPLE) 
then we end up with a new order for tuples.
As I said, ByteArray sorting does not have well-defined semantics, but I feel that 
it would be good to keep it stable across releases, if possible.
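
To make the concern concrete, here is a small sketch under made-up assumptions. The one-byte type code in front of the payload and the values 0x10, 0x11, 0x20, 0x21 are invented for illustration and are not the codes BinInterSedes uses; TypeCodeOrderSketch and its methods are hypothetical names. It compares two serialized values once over all bytes and once over the payload only, before and after the codes are "redefined".

```java
/** Hypothetical illustration; the type codes below are invented, not Pig's. */
final class TypeCodeOrderSketch {

    /** Serializes a value as [typeCode][payload...]. */
    static byte[] serialize(int typeCode, byte... payload) {
        byte[] out = new byte[payload.length + 1];
        out[0] = (byte) typeCode;
        System.arraycopy(payload, 0, out, 1, payload.length);
        return out;
    }

    /** Unsigned lexicographic comparison starting at the given offsets. */
    static int compareBytes(byte[] a, int aFrom, byte[] b, int bFrom) {
        int n = Math.min(a.length - aFrom, b.length - bFrom);
        for (int i = 0; i < n; i++) {
            int diff = (a[aFrom + i] & 0xff) - (b[bFrom + i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return (a.length - aFrom) - (b.length - bFrom);
    }

    /** Compares everything, including the leading type code. */
    static int compareWhole(byte[] a, byte[] b) {
        return compareBytes(a, 0, b, 0);
    }

    /** Skips the leading type code and compares the payload only. */
    static int comparePayload(byte[] a, byte[] b) {
        return compareBytes(a, 1, b, 1);
    }

    public static void main(String[] args) {
        // "Old" format: x gets code 0x10, y gets code 0x11.
        byte[] xOld = serialize(0x10, (byte) 5);
        byte[] yOld = serialize(0x11, (byte) 3);
        // "New" format: the same values after the codes are redefined.
        byte[] xNew = serialize(0x21, (byte) 5);
        byte[] yNew = serialize(0x20, (byte) 3);

        System.out.println("whole,   old codes: " + Integer.signum(compareWhole(xOld, yOld)));
        System.out.println("whole,   new codes: " + Integer.signum(compareWhole(xNew, yNew)));
        System.out.println("payload, old codes: " + Integer.signum(comparePayload(xOld, yOld)));
        System.out.println("payload, new codes: " + Integer.signum(comparePayload(xNew, yNew)));
    }
}
```

With the whole-bytes comparison the relative order of x and y flips when the codes change, while the payload-only comparison gives the same order in both formats. That is the stability I had in mind.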

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt




[jira] [Commented] (PIG-2975) TestTypedMap.testOrderBy failing with incorrect result

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13480393#comment-13480393
 ] 

Gianmarco De Francisci Morales commented on PIG-2975:
-

Indeed, my idea of keeping the order stable is only a nice-to-have.
There is certainly no strict requirement to keep it, so I am OK with forgoing 
it and directly comparing all the bytes.

Koji, that's a good question. I guess that it could happen if we lose the 
schema during the execution of a plan, e.g. because of a UDF.

 TestTypedMap.testOrderBy failing with incorrect result 
 ---

 Key: PIG-2975
 URL: https://issues.apache.org/jira/browse/PIG-2975
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Blocker
 Fix For: 0.11

 Attachments: PIG-2975-0_jco.patch, PIG-2975-0_jco-v2.patch, 
 pig-2975-trunk_v01.txt, pig-2975-trunk_v02-broken.txt, 
 pig-2975-trunk_v03-unionapproach.txt, pig-2975-trunk_v04-purerawcompare.txt




[jira] [Created] (PIG-2993) Fix local mode on Hadoop-0.23

2012-10-19 Thread Gianmarco De Francisci Morales (JIRA)
Gianmarco De Francisci Morales created PIG-2993:
---

 Summary: Fix local mode on Hadoop-0.23
 Key: PIG-2993
 URL: https://issues.apache.org/jira/browse/PIG-2993
 Project: Pig
  Issue Type: Sub-task
Reporter: Gianmarco De Francisci Morales


When compiling with -Dhadoopversion=23 and launching Pig in local mode (-x 
local) the shell just fills up with error notifications:
{code}
2012-10-19 15:10:17,360 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
2998: Unhandled internal error. Could not initialize class 
org.apache.pig.tools.pigstats.PigStatsUtil
{code}

Here is the stack trace:
{code}
Pig Stack Trace
---
ERROR 2998: Unhandled internal error. 
org/apache/hadoop/mapreduce/task/JobContextImpl

java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/task/JobContextImpl
at 
org.apache.pig.tools.pigstats.PigStatsUtil.<clinit>(PigStatsUtil.java:54)
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.mapreduce.task.JobContextImpl
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 9 more

Pig Stack Trace
---
ERROR 2998: Unhandled internal error. Could not initialize class 
org.apache.pig.tools.pigstats.PigStatsUtil

java.lang.NoClassDefFoundError: Could not initialize class 
org.apache.pig.tools.pigstats.PigStatsUtil
at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:67)
at org.apache.pig.Main.run(Main.java:538)
at org.apache.pig.Main.main(Main.java:154)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

{code}
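
The two traces differ because the JVM tries a class's static initializer only once. The first failure reports the real cause, and every later use of the class reports only "Could not initialize class". The following standalone sketch reproduces that pattern. ClinitFailureSketch and Broken are hypothetical names, with Broken standing in for a class like PigStatsUtil. One difference from the trace above: the sketch simulates the failure with an ordinary exception, so the first error is an ExceptionInInitializerError, whereas in Pig the initializer hits a NoClassDefFoundError for JobContextImpl, which is an Error and propagates unwrapped.

```java
/** Hypothetical demo; Broken stands in for a class whose static init fails. */
final class ClinitFailureSketch {

    static final class Broken {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = load();

        private static int load() {
            throw new IllegalStateException("simulated missing dependency");
        }
    }

    /** Touches Broken and returns whatever linkage error the JVM raises. */
    static LinkageError touch() {
        try {
            System.out.println(Broken.VALUE);
            return null;
        } catch (LinkageError e) {
            return e;
        }
    }

    public static void main(String[] args) {
        System.out.println("first:  " + touch());
        System.out.println("second: " + touch());
    }
}
```

So when the shell fills up with the second kind of message, the useful information is in the very first trace.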



[jira] [Commented] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479213#comment-13479213
 ] 

Gianmarco De Francisci Morales commented on PIG-2985:
-

This is a simple bug fix, should go to both 0.11 and trunk.

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11

 Attachments: PIG-2985.patch


 To reproduce the error, please run:
 {code}
 ant clean test -Dhadoopversion=23 -Dtestcase=TestRank1
 {code}
 This fails with the following error:
 {code}
 Caused by: java.lang.RuntimeException: Error to read counters into Rank 
 operation counterSize 0
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:386)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.updateMROpPlan(JobControlCompiler.java:330)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:370)
 at org.apache.pig.PigServer.launchPlan(PigServer.java:1264)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler.saveCounters(JobControlCompiler.java:359)
 {code}
 I see the failures with hadoop-2.0.x only.



[jira] [Updated] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2985:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

+1, committed to both trunk and 0.11.
Thanks Rohini!

Interestingly, tests with hadoop-2.0 take 1/3 of the time they take with hadoop-1.0.

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
Assignee: Rohini Palaniswamy
 Fix For: 0.11

 Attachments: PIG-2985.patch




Re: CHANGES.txt in branches

2012-10-18 Thread Gianmarco De Francisci Morales
OK, I fixed a bunch of them.
There were also some misspelled Pig issue numbers, which made
everything even more confusing :)

Cheers,
--
Gianmarco


On Tue, Oct 16, 2012 at 10:59 PM, Bill Graham billgra...@gmail.com wrote:
 Also guilty as of about 15 minutes ago. I just moved my entry for PIG-2976
 to the Pig 0.11 section on the trunk. Great catch.

 On Tue, Oct 16, 2012 at 9:46 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:

 Guilty.. I guess we should be putting them under 0.11 in trunk.

 On Tue, Oct 16, 2012 at 8:18 PM, Jonathan Coveney jcove...@gmail.com
 wrote:
  AFAIK (and I don't really know), I thought that if we put it in both,
 that
  it'd go in the pig 11 section in trunk, and if not, we don't.
 
  Is this correct?
 
  Good job noticing this.
 
  2012/10/16 Gianmarco De Francisci Morales g...@apache.org
 
  Hi devs,
 
  I noticed there is a misalignment in CHANGES.txt between 0.11 and trunk.
  It seems some people are putting patches on top in both versions of
  the file, while others are putting changes that get into 0.11 in the
  0.11 section of the trunk file.
 
  Let me show an example:
 
  This is 0.11
 
  Pig Change Log
 
  Release 0.11.0 (unreleased)
 
  INCOMPATIBLE CHANGES
  PIG-1891 Enable StoreFunc to make intelligent decision based on job
  success or failure (initialcontext via gates)
 
  IMPROVEMENTS
  PIG-2947: Documentation for Rank operator (xalan via azaroth)
 
  PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS
  method for code health (jgordon via dvryaboy)
 
  PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon
 via
  gates)
 
  PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
 
  PIG-2965: RANDOM should allow seed initialization for ease of testing
  (jcoveney)
 
  PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend
  visibility of couple methods on same class (prkommireddi via
  billgraham)
 
 
 
 
 
 
  And this is trunk:
 
  Pig Change Log
 
  Trunk (unreleased changes)
 
  INCOMPATIBLE CHANGES
 
  IMPROVEMENTS
 
  PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS
  method for code health (jgordon via dvryaboy)
 
  PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not
  set (cheolsoo via sms)
 
  PIG-2793: Pig test: add utils to simplify testing on Windows (jgordon
 via
  gates)
 
  PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)
 
  OPTIMIZATIONS
 
  BUG FIXES
 
  PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24
  (cheolsoo via dvryaboy)
 
  Release 0.11.0 (unreleased)
 
  INCOMPATIBLE CHANGES
  PIG-1891 Enable StoreFunc to make intelligent decision based on job
  success or failure (initialcontext via gates)
 
  IMPROVEMENTS
 
  PIG-2947: Documentation for Rank operator (xalan via azaroth)
 
  PIG-2910: Add function to read schema from outout of Schema.toString()
  (initialcontext via thejas)
 
  PIG-2965: RANDOM should allow seed initialization for ease of testing
  (jcoveney)
 
  PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend
  visibility of couple methods on same class (prkommireddi via
  billgraham)
 
 
 
  Notice how PIG-2943, PIG-2793, PIG-2908 are marked as appearing in
  trunk in trunk and in 0.11 in 0.11.
  PIG-2910 is in 0.11 in trunk but not in 0.11 (I guess it is a small
  mistake).
 
 
  So, what's the correct behavior?
  Do we mark a patch in CHANGES.txt at the earliest place it appears in
  the code (so that CHANGES.txt is consistent across releases)?
  Or do we treat the branches independently, and thus we put each patch
  always at the top?
 
  Personally, I put PIG-2947 in the 0.11 section in trunk, but I don't
  have a strong opinion on it (as long as we are consistent).
 
  Cheers,
  --
  Gianmarco
 




 --
 *Note that I'm no longer using my Yahoo! email address. Please email me at
 billgra...@gmail.com going forward.*


[jira] [Resolved] (PIG-2947) Documentation for Rank operator

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2947.
-

Resolution: Fixed

 Documentation for Rank operator
 ---

 Key: PIG-2947
 URL: https://issues.apache.org/jira/browse/PIG-2947
 Project: Pig
  Issue Type: Improvement
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Fix For: 0.11

 Attachments: patch_01, patch_02, patch_03


 User documentation for recently released Rank operator, with some basic 
 explanation of usage and examples



[jira] [Resolved] (PIG-2922) Documentation and examples for RANK

2012-10-18 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2922.
-

Resolution: Duplicate

 Documentation and examples for RANK
 ---

 Key: PIG-2922
 URL: https://issues.apache.org/jira/browse/PIG-2922
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: documentation

 We need documentation and examples for the newly introduced RANK command.



[jira] [Commented] (PIG-2985) TestRank1,2,3 fail with hadoop-2.0.x

2012-10-16 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477253#comment-13477253
 ] 

Gianmarco De Francisci Morales commented on PIG-2985:
-

Did hadoop-2.0 change the way to access counters?

 TestRank1,2,3 fail with hadoop-2.0.x
 

 Key: PIG-2985
 URL: https://issues.apache.org/jira/browse/PIG-2985
 Project: Pig
  Issue Type: Sub-task
Reporter: Cheolsoo Park
 Fix For: 0.11




[jira] [Updated] (PIG-2947) Documentation for Rank operator

2012-10-16 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2947:


Attachment: patch_03

Thanks Allan, the documentation looks good!
Attaching a version with some minor changes.

Will commit to both trunk and 0.11 branch.

 Documentation for Rank operator
 ---

 Key: PIG-2947
 URL: https://issues.apache.org/jira/browse/PIG-2947
 Project: Pig
  Issue Type: Improvement
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Fix For: 0.11

 Attachments: patch_01, patch_02, patch_03




[jira] [Commented] (PIG-2947) Documentation for Rank operator

2012-10-16 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13477544#comment-13477544
 ] 

Gianmarco De Francisci Morales commented on PIG-2947:
-

+1
Committed to both trunk and 0.11 branch.

 Documentation for Rank operator
 ---

 Key: PIG-2947
 URL: https://issues.apache.org/jira/browse/PIG-2947
 Project: Pig
  Issue Type: Improvement
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Fix For: 0.11

 Attachments: patch_01, patch_02, patch_03




CHANGES.txt in branches

2012-10-16 Thread Gianmarco De Francisci Morales
Hi devs,

I noticed there is a misalignment in CHANGES.txt between 0.11 and trunk.
It seems some people are putting patches on top in both versions of
the file, while others are putting changes that get into 0.11 in the
0.11 section of the trunk file.

Let me show an example:

This is 0.11

Pig Change Log

Release 0.11.0 (unreleased)

INCOMPATIBLE CHANGES
PIG-1891 Enable StoreFunc to make intelligent decision based on job
success or failure (initialcontext via gates)

IMPROVEMENTS
PIG-2947: Documentation for Rank operator (xalan via azaroth)

PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS
method for code health (jgordon via dvryaboy)

PIG-2794: Pig test: add utils to simplify testing on Windows (jgordon via gates)

PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)

PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)

PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend
visibility of couple methods on same class (prkommireddi via
billgraham)






And this is trunk:

Pig Change Log

Trunk (unreleased changes)

INCOMPATIBLE CHANGES

IMPROVEMENTS

PIG-2943: DevTests, Refactor Windows checks to use new Util.WINDOWS
method for code health (jgordon via dvryaboy)

PIG-2966: Test failures on CentOS 6 because MALLOC_ARENA_MAX is not
set (cheolsoo via sms)

PIG-2793: Pig test: add utils to simplify testing on Windows (jgordon via gates)

PIG-2908: Fix unit tests to work with jdk7 (rohini via dvryaboy)

OPTIMIZATIONS

BUG FIXES

PIG-2928: Fix e2e test failures in trunk: FilterBoolean_23/24
(cheolsoo via dvryaboy)

Release 0.11.0 (unreleased)

INCOMPATIBLE CHANGES
PIG-1891 Enable StoreFunc to make intelligent decision based on job
success or failure (initialcontext via gates)

IMPROVEMENTS

PIG-2947: Documentation for Rank operator (xalan via azaroth)

PIG-2910: Add function to read schema from outout of Schema.toString()
(initialcontext via thejas)

PIG-2965: RANDOM should allow seed initialization for ease of testing (jcoveney)

PIG-2964: Add helper method getJobList() to PigStats.JobGraph. Extend
visibility of couple methods on same class (prkommireddi via
billgraham)



Notice how PIG-2943, PIG-2793, and PIG-2908 are listed under the trunk
section in trunk's CHANGES.txt, but under the 0.11 section in the 0.11 branch's.
PIG-2910 is in the 0.11 section in trunk but missing from the 0.11 branch (I guess it is a small mistake).
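
A comparison like the one above could be automated. The following is only a rough sketch in Python, assuming the layout shown in the two excerpts (a "Trunk (unreleased changes)" or "Release X" heading followed by entries that start with a PIG-NNNN id); the function names are made up for illustration and are not part of any Pig tooling.

```python
import re

# Section headings as they appear in the excerpts above (assumed layout).
SECTION = re.compile(r"^(Trunk \(unreleased changes\)|Release \S+.*)$")
# Entries start with an issue id, with or without a trailing colon.
ISSUE = re.compile(r"^(PIG-\d+)")


def sections_by_issue(changes_text):
    """Map each PIG-NNNN id to the release section it is listed under."""
    current = None
    placement = {}
    for line in changes_text.splitlines():
        line = line.strip()
        if SECTION.match(line):
            current = line
            continue
        match = ISSUE.match(line)
        if match and current is not None:
            placement[match.group(1)] = current
    return placement


def misaligned(branch_text, trunk_text):
    """Return issues listed under different sections (or missing) in the two files."""
    branch = sections_by_issue(branch_text)
    trunk = sections_by_issue(trunk_text)
    report = {}
    for issue in sorted(set(branch) | set(trunk)):
        where_branch, where_trunk = branch.get(issue), trunk.get(issue)
        if where_branch != where_trunk:
            report[issue] = (where_branch, where_trunk)
    return report
```

Calling misaligned() with the contents of the two CHANGES.txt files returns, for each issue that differs, the section it sits under in the branch and in trunk (None when it is absent from one file).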


So, what's the correct behavior?
Do we mark a patch in CHANGES.txt at the earliest place it appears in
the code (so that CHANGES.txt is consistent across releases)?
Or do we treat the branches independently, and thus we put each patch
always at the top?

Personally, I put PIG-2947 in the 0.11 section in trunk, but I don't
have a strong opinion on it (as long as we are consistent).

Cheers,
--
Gianmarco


[jira] [Commented] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query

2012-10-12 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13475423#comment-13475423
 ] 

Gianmarco De Francisci Morales commented on PIG-2970:
-

Haven't had time to look at the patch, but I guess it is related to PIG-2119.
I thought we had solved it though.

 Nested foreach getting incorrect schema when having unrelated inner query
 -

 Key: PIG-2970
 URL: https://issues.apache.org/jira/browse/PIG-2970
 Project: Pig
  Issue Type: Bug
  Components: parser
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Attachments: pig-2970-trunk-v01.txt


 While looking at PIG-2968, hit a weird error message.
 {noformat}
 $ cat -n test/foreach2.pig
  1  daily = load 'nyse' as (exchange, symbol);
  2  grpd = group daily by exchange;
  3  unique = foreach grpd {
  4  sym = daily.symbol;
  5  uniq_sym = distinct sym;
  6  --ignoring uniq_sym result
  7  generate group, daily;
  8  };
  9  describe unique;
 10  zzz = foreach unique generate group;
 11  explain zzz;
 % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
 ...
 unique: {symbol: bytearray}
 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1025: 
 file test/foreach2.pig, line 10, column 30 Invalid field projection. 
 Projected field [group] does not exist in schema: symbol:bytearray.
 ...
 {noformat}



Re: Pig 0.11

2012-10-11 Thread Gianmarco De Francisci Morales
We are missing some documentation on the RANK but I guess we could add that
to the branch and trunk in parallel.
All the patches I was keeping an eye on are in.

So +1 for me.
--
Gianmarco



On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com wrote:

 I think all of the major patches are in, no? Now it's just bug testing?
 Just wanted to touch base on where we are at with this.



Re: Pig 0.11

2012-10-11 Thread Gianmarco De Francisci Morales
I added it as a dependency, as it already has its own Jira.
I hope it is OK.

Cheers,
--
Gianmarco



On Wed, Oct 10, 2012 at 11:23 PM, Bill Graham billgra...@gmail.com wrote:

 +1 for me.

 There's https://issues.apache.org/jira/browse/PIG-2756 which tracks a few
 documentation issues that should block Pig 0.11, but they can also be done
 on the trunk and merged to the branch. Gianmarco, you can add a rank
 subtask there to serve as a reminder.


 On Wed, Oct 10, 2012 at 11:03 PM, Gianmarco De Francisci Morales 
 g...@apache.org wrote:

  We are missing some documentation on the RANK but I guess we could add that
  to the branch and trunk in parallel.
  All the patches I was keeping an eye on are in.
 
  So +1 for me.
  --
  Gianmarco
 
 
 
  On Wed, Oct 10, 2012 at 5:31 PM, Jonathan Coveney jcove...@gmail.com
  wrote:
 
   I think all of the major patches are in, no? Now it's just bug testing?
   Just wanted to touch base on where we are at with this.
  
 



 --
 *Note that I'm no longer using my Yahoo! email address. Please email me at
 billgra...@gmail.com going forward.*



[jira] [Updated] (PIG-2941) Ivy resolvers in pig don't have consistent chaining and don't have a kitchen sink option for novices

2012-10-09 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2941:


Assignee: John Gordon
  Status: Open  (was: Patch Available)

Can you regenerate the patch with --no-prefix option and without the email 
headers?

Also, can you explain why maven2 is on both the internal and the external resolver?

Canceling patch for the moment.

 Ivy resolvers in pig don't have consistent chaining and don't have a kitchen 
 sink option for novices
 

 Key: PIG-2941
 URL: https://issues.apache.org/jira/browse/PIG-2941
 Project: Pig
  Issue Type: Bug
  Components: build
Affects Versions: 0.10.0
Reporter: John Gordon
Assignee: John Gordon
 Fix For: 0.10.0

 Attachments: 
 0001-IvySettings.xml-refactor-to-simplify-resolution.patch


 The Ivy resolvers in Pig are split into default, external, and internal -- 
 and they are all actually distinct.  There isn't a resolver that rolls over 
 all three, and fallbacks aren't in place.  Ideally, these resolvers should 
 chain right through with the default following a best practice fallback for 
 novices.



[jira] [Commented] (PIG-2947) Documentation for Rank operator

2012-10-08 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471580#comment-13471580
 ] 

Gianmarco De Francisci Morales commented on PIG-2947:
-

There is no single place where all the reserved keywords are collected AFAIK.
If we decide to create one, it should be automated.

However, on the rank issues, I have used 'count' as a column name in many of my 
Pig scripts and it never blew up. I guess the situation is similar.

 Documentation for Rank operator
 ---

 Key: PIG-2947
 URL: https://issues.apache.org/jira/browse/PIG-2947
 Project: Pig
  Issue Type: Improvement
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Attachments: patch_01


 User documentation for recently released Rank operator, with some basic 
 explanation of usage and examples



[jira] [Updated] (PIG-2946) Documentation of history and clear commands

2012-10-07 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2946:


Attachment: patch_02

I just reworded a couple of sentences.
Apart from that, +1.

Uploading the patch committed to trunk.

Thanks Allan!

 Documentation of history and clear commands 
 

 Key: PIG-2946
 URL: https://issues.apache.org/jira/browse/PIG-2946
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Attachments: patch_01, patch_02


 Basic user documentation for the history and clear commands recently added 
 to the Pig Grunt shell.



[jira] [Resolved] (PIG-2946) Documentation of history and clear commands

2012-10-07 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales resolved PIG-2946.
-

   Resolution: Fixed
Fix Version/s: 0.11

 Documentation of history and clear commands 
 

 Key: PIG-2946
 URL: https://issues.apache.org/jira/browse/PIG-2946
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Fix For: 0.11

 Attachments: patch_01, patch_02


 Basic user documentation for the history and clear commands recently added 
 to the Pig Grunt shell.



[jira] [Commented] (PIG-2947) Documentation for Rank operator

2012-10-07 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13471391#comment-13471391
 ] 

Gianmarco De Francisci Morales commented on PIG-2947:
-

There seems to be some imprecision in the Syntax of RANK.
The BY clause should be optional (so use square brackets).
Same for the DENSE option.

The examples look good.
The wording could use some cleaning up to make it more clear (for example using 
less passive voice, clearly stating what RANK is supposed to do).

 Documentation for Rank operator
 ---

 Key: PIG-2947
 URL: https://issues.apache.org/jira/browse/PIG-2947
 Project: Pig
  Issue Type: Improvement
Reporter: Allan Avendaño
Assignee: Allan Avendaño
Priority: Trivial
  Labels: documentation
 Attachments: patch_01


 User documentation for recently released Rank operator, with some basic 
 explanation of usage and examples



[jira] [Updated] (PIG-2920) e2e tests override PERL5LIB environment variable

2012-10-02 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2920:


   Resolution: Fixed
Fix Version/s: 0.11
   Status: Resolved  (was: Patch Available)

 e2e tests override PERL5LIB environment variable
 

 Key: PIG-2920
 URL: https://issues.apache.org/jira/browse/PIG-2920
 Project: Pig
  Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.11

 Attachments: PIG-2920.2.patch, PIG-2920.patch


 I am not sure why but e2e tests set PERL5LIB like this:
 {code}
 <env key="PERL5LIB" value="./libexec/"/>
 {code}
 This overrides any env variable, so there is no way to use custom Perl 
 installations.
 This patch just removes the line, thus we will rely on the user to configure 
 PERL5LIB appropriately.
 With this modification I am able to use my custom Perl installation.



[jira] [Updated] (PIG-2920) e2e tests override PERL5LIB environment variable

2012-09-28 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2920:


Attachment: PIG-2920.2.patch

Addressed the comments by Rohini.

Now the user can set the property harness.PERL5LIB to control the PERL5LIB 
environment variable in the tests.

 e2e tests override PERL5LIB environment variable
 

 Key: PIG-2920
 URL: https://issues.apache.org/jira/browse/PIG-2920
 Project: Pig
  Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Attachments: PIG-2920.2.patch, PIG-2920.patch


 I am not sure why but e2e tests set PERL5LIB like this:
 {code}
 <env key="PERL5LIB" value="./libexec/"/>
 {code}
 This overrides any env variable, so there is no way to use custom Perl 
 installations.
 This patch just removes the line, thus we will rely on the user to configure 
 PERL5LIB appropriately.
 With this modification I am able to use my custom Perl installation.



[jira] [Commented] (PIG-2932) Setting high default_parallel causes IOException in local mode

2012-09-27 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465232#comment-13465232
 ] 

Gianmarco De Francisci Morales commented on PIG-2932:
-

Cheolsoo, thanks for the explanation.
Now it is more clear.

I agree with your proposals.
Will test and commit the patch tomorrow.

 Setting high default_parallel causes IOException in local mode
 --

 Key: PIG-2932
 URL: https://issues.apache.org/jira/browse/PIG-2932
 Project: Pig
  Issue Type: Bug
Reporter: Gianmarco De Francisci Morales
Priority: Critical
 Attachments: PIG-2932.patch


 This bug has been confirmed only in local mode.
 When setting a high default_parallel, Pig fails on some operations.
 The following data and script reproduce the bug.
 Data:
 {code}
 grunt> cat file.txt
 11	1	qwer
 12	2	qwerty
 13	3	ert
 13	3	ertyu
 14	4	zxcv
 16	6	fsdfg
 16	6	fdfghj
 18	8	fjklopi
 {code}
 Script:
 {code}
 SET default_parallel 9;
 a = load 'file.txt' as (id1:int, id2:int, str:chararray);
 b = group a by (id1,id2);
 c = foreach b generate flatten(group), a;
 d = order c by group::id1 ASC, group::id2 ASC;
 dump d;
 {code}
 Error:
 {code}
 2012-09-26 15:28:13,230 [Thread-32] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map - Aliases being processed per job phase (AliasName[line,offset]): M: d[12,4] C:  R: 
 2012-09-26 15:28:13,232 [Thread-32] WARN  org.apache.hadoop.mapred.LocalJobRunner - job_local_0009
 java.io.IOException: Illegal partition for Null: false index: 0 (12,2) (1)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)
 {code}
 The script succeeds if default_parallel is set to 2.
 I guess it depends on the fact that the default_parallel is higher than the 
 number of unique keys, probably some quirk with ORDER BY.
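
 For context, the "Illegal partition" message is raised by a range check in
 Hadoop's map output collector: a record whose partition index is negative or
 not smaller than the number of reduce partitions is rejected. The sketch
 below paraphrases that check in Python. The function names and the split
 points are hypothetical, chosen only to illustrate how a range partitioner
 built for several partitions can produce an index that a job running with a
 single reduce partition will refuse. Whether that is what happens here is an
 assumption, not a confirmed diagnosis.

```python
import bisect


def range_partition(key, boundaries):
    """Hypothetical range partitioner: N sorted split points define N+1 partitions."""
    return bisect.bisect_left(boundaries, key)


def collect(key, partition, num_partitions):
    """Paraphrase of the collector's range check (not Hadoop's actual Java code)."""
    if partition < 0 or partition >= num_partitions:
        raise IOError("Illegal partition for %s (%d)" % (key, partition))
    return partition


# Illustrative split points only; they are not taken from the failing job.
BOUNDARIES = [(11, 1), (13, 3), (16, 6)]
```

 With these split points the key (12, 2) maps to partition 1, which is
 accepted when four partitions exist but rejected when only one does.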



Re: e2e tests for Rank function

2012-09-26 Thread Gianmarco De Francisci Morales
I was able to reproduce the bug, I opened PIG-2932 to track it.

Cheers,
--
Gianmarco



On Wed, Sep 26, 2012 at 12:07 PM, Gianmarco De Francisci Morales 
g...@apache.org wrote:

 Forwarding to pig-dev.

 In summary, it looks like we have a regression in trunk.
 We need to investigate it before branching 0.11.

 Cheers,
 --
 Gianmarco



 -- Forwarded message --
 From: Allan aaven...@gmail.com
 Date: Wed, Sep 26, 2012 at 11:21 AM
 Subject: Re: e2e tests for Rank function
 To: cheolsoo cheol...@cloudera.com, Gianmarco De Francisci Morales 
 g...@apache.org


 Hi Cheolsoo and Gianmarco,

 I double-checked the e2e tests and reproduced the scenario; you are
 correct, they are failing.

 Then, looking for a possible reason, I tried the following script:

 SET default_parallel 9;
 A = LOAD 'prerank' using PigStorage(',') as
 (rownumber:long,rankcabd:long,rankbdaa:long,rankbdca:long,rankaacd:long,rankaaba:long,a:int,b:int,c:int,tail:bytearray);
 B = group A by (a, b);
 C = foreach B generate flatten(group),A;
 D = order C by group::a ASC, group::b ASC;


 And it fails with the same exception message.

 Then I tried the same script but omitting the SET default_parallel 9;
 and it works. So, I'm really surprised that in local mode it doesn't work
 with parallelism.

 I used this script because the RANK (RANK BY) operator uses
 the same chain of operators: GROUP (B), a flatten (C), SORT (D).

 Best regards,

 On Sun, Sep 23, 2012 at 10:43 PM, Cheolsoo Park cheol...@cloudera.com wrote:

 Hello,

 The e2e tests for Rank function in trunk do not pass for me when running in
 local mode. I am wondering whether they all pass for everyone.

 What I am doing is as follows:

 ant clean
  ant -Dhadoopversion=20 ... test-e2e-deploy-local
 ant -Dhadoopversion=20 ... test-e2e-local -Dtests.to.run=-t Rank

 All tests except Rank_4 fail with errors similar to this:

 java.io.IOException: Illegal partition for Null: false index: 0 (1,7) (1)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

 I wanted to double check whether I am doing something wrong before I open a
 jira.

 Thanks,
 Cheolsoo




 --

 Allan Avendaño S.
 Computer Engineer
 SWY22 Participant
 GSOC 2012 Participant
 Rome - Italy
 Gmail: aaven...@gmail.com
 --





[jira] [Updated] (PIG-2353) RANK function like in SQL

2012-09-23 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2353:


Release Note: 
Pig includes a new RANK operator:
RANK relation ( BY column (ASC|DESC)? (DENSE)? )?
This operator prepends a consecutive integer to each tuple in the relation 
starting from 1.
If the BY clause is present, RANK sorts the relation before ranking it, 
otherwise it uses the order in which it receives the relation (e.g. the order 
in which the relation is stored if RANK is performed right after a LOAD).
The DENSE modifier produces a dense rank, which has no gaps in it regardless of 
ties.



  was:
Pig includes a new RANK operator:
RANK relation ( BY column (ASC|DES)? )?
This operator prepends a consecutive integer to each tuple in the relation 
starting from 1.
If the BY clause is present, RANK sorts the relation before ranking it, 
otherwise it uses the order in which it receives the relation (e.g. the order 
in which the relation is stored if RANK is performed right after a LOAD).




 RANK function like in SQL
 -

 Key: PIG-2353
 URL: https://issues.apache.org/jira/browse/PIG-2353
 Project: Pig
  Issue Type: New Feature
Reporter: Gianmarco De Francisci Morales
Assignee: Allan Avendaño
  Labels: gsoc2012, mentor
 Fix For: 0.11

 Attachments: PIG-2353-2, PIG-2353-3.txt, PIG-2353-4.txt, 
 PIG-2353-5.txt, PIG2353.patch


 Implement a function that given a (sorted) bag adds to each tuple a unique, 
 increasing identifier without gaps, like what RANK does for SQL.
 This is a candidate project for Google summer of code 2012. More information 
 about the program can be found at 
 https://cwiki.apache.org/confluence/display/PIG/GSoc2012
 Functionality implemented so far, is available at 
 https://reviews.apache.org/r/5523/diff/#index_header
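
 To make the intended behavior concrete, here is a small Python model of the
 ranking described above. It is only a sketch of the semantics as stated in
 the release note, not Pig's implementation: the function name and
 parameters are invented for illustration, and it assumes that tied rows
 share a rank and that only the non-dense variant leaves gaps after ties.

```python
def rank(rows, by=None, descending=False, dense=False):
    """Prepend a rank to each tuple, loosely following the release note."""
    if by is None:
        # No BY clause: keep the incoming order and number the rows from 1.
        return [(i,) + tuple(row) for i, row in enumerate(rows, start=1)]

    ordered = sorted(rows, key=lambda row: row[by], reverse=descending)
    ranked = []
    previous = object()  # sentinel that compares unequal to any field value
    current = 0
    for position, row in enumerate(ordered, start=1):
        if row[by] != previous:
            # Dense: next consecutive integer. Otherwise: the row position,
            # which leaves a gap after a group of tied rows.
            current = current + 1 if dense else position
            previous = row[by]
        ranked.append((current,) + tuple(row))
    return ranked
```

 For three rows with values 10, 10, 20 in the ranked column, this model
 yields ranks 1, 1, 3 without the dense flag and 1, 1, 2 with it.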



Re: e2e tests for Rank function

2012-09-23 Thread Gianmarco De Francisci Morales
Hi,

Weird, they should be passing.
I will double check them tomorrow.

Cheers,
--
Gianmarco



On Sun, Sep 23, 2012 at 10:43 PM, Cheolsoo Park cheol...@cloudera.com wrote:

 Hello,

 The e2e tests for Rank function in trunk do not pass for me when running in
 local mode. I am wondering whether they all pass for everyone.

 What I am doing is as follows:

 ant clean
  ant -Dhadoopversion=20 ... test-e2e-deploy-local
 ant -Dhadoopversion=20 ... test-e2e-local -Dtests.to.run=-t Rank

 All tests except Rank_4 fail with errors similar to this:

 java.io.IOException: Illegal partition for Null: false index: 0 (1,7) (1)
   at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1073)
   at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:691)
   at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:123)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:285)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:278)
   at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
   at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
   at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

 I wanted to double check whether I am doing something wrong before I open a
 jira.

 Thanks,
 Cheolsoo



[jira] [Updated] (PIG-2879) Pig current releases lack a UDF startsWith.This UDF tests if a given string starts with the specified prefix.

2012-09-21 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-2879:


   Resolution: Fixed
Fix Version/s: 0.11
 Release Note: Pig now includes a STARTSWITH built-in UDF that checks for 
presence of a given prefix in a chararray.
   Status: Resolved  (was: Patch Available)

+1
Committed to trunk.
Thanks Eli!

 Pig current releases lack a UDF startsWith.This UDF tests if a given string 
 starts with the specified prefix. 
 --

 Key: PIG-2879
 URL: https://issues.apache.org/jira/browse/PIG-2879
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Anuroopa George
Assignee: Eli Reisman
  Labels: features, patch
 Fix For: 0.11

 Attachments: PIG-2879-1.patch, PIG-2879-2.patch, PIG-2879-3.patch, 
 PIG-2879-4.patch


 Current Pig releases lack a startsWith UDF. This UDF tests whether a given 
 string starts with the specified prefix. It returns true if the character 
 sequence given as the prefix argument is a prefix of the character sequence 
 represented by the given string, and false otherwise. It also returns true 
 if the given prefix is an empty string or is equal to the given string.
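
 The described behavior can be modeled in a few lines of Python. This is a
 sketch of the semantics only, with an invented function name; it is not the
 UDF's Java source and it leaves null handling aside.

```python
def starts_with(string, prefix):
    """Return True if `string` begins with `prefix`, as described above."""
    # Slicing to len(prefix) makes the empty prefix and the equal-string
    # cases fall out naturally: both compare equal and return True.
    return string[:len(prefix)] == prefix
```

 For example, starts_with("pigstorage", "pig") and starts_with("pig", "") are
 both true, while starts_with("pig", "pigs") is false.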


