[jira] [Updated] (PIG-2968) ColumnMapKeyPrune fails to prune a subtree inside foreach

2012-10-29 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-2968:
--

Fix Version/s: 0.11

I'd like to see this fixed in 0.11.

 ColumnMapKeyPrune fails to prune a subtree inside foreach
 -

 Key: PIG-2968
 URL: https://issues.apache.org/jira/browse/PIG-2968
 Project: Pig
  Issue Type: Bug
  Components: parser
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Fix For: 0.11

 Attachments: pig-2968-trunk_v01.txt, pig-2968-trunk_v02.txt


 Sample code 
 {noformat}
 $ cat test/foreach.pig 
 daily = load 'nyse' as (exchange, symbol);
 grpd = group daily by exchange;
 uniquecnt = foreach grpd {
 sym = daily.symbol;
 uniq_sym = distinct sym;
 generate group, uniq_sym;
 };
 another = FOREACH uniquecnt GENERATE group;
 explain another;
 {noformat}
 This breaks when it tries to prune uniq_sym->sym->innerload_daily:
 bq. 2012-10-12 14:54:11,031 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
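
A minimal sketch of what pruning this subtree means. It uses a toy model, not Pig's LogicalPlan or the actual ColumnMapKeyPrune code. The nested plan of uniquecnt is a map from each node to the nodes it reads from. Since `another` only projects `group`, every node that cannot be reached by walking upstream from `group` can be dropped. The class NestedPlanPruner and the node names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model of a nested foreach plan: each node (every node must appear as a key)
 * maps to the nodes it reads from. Hypothetical illustration, not Pig's plan API.
 */
final class NestedPlanPruner {

    /** Returns the nodes that no still-needed output depends on, i.e. the prunable subtree(s). */
    static Set<String> prunable(Map<String, List<String>> inputsOf, Set<String> neededOutputs) {
        Set<String> live = new HashSet<>();
        Deque<String> work = new ArrayDeque<>(neededOutputs);
        while (!work.isEmpty()) {
            String node = work.pop();
            if (live.add(node)) {
                // Walk upstream: everything a live node reads from is live too.
                work.addAll(inputsOf.getOrDefault(node, Collections.<String>emptyList()));
            }
        }
        Set<String> dead = new HashSet<>(inputsOf.keySet());
        dead.removeAll(live);
        return dead;
    }
}
```

With `group` as the only needed output, the sketch reports uniq_sym, sym and innerload_daily as prunable, which is the same chain the rule fails on above.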

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Julien Le Dem (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486153#comment-13486153
 ] 

Julien Le Dem commented on PIG-3008:


I support a whitespace policy as defined by Cheolsoo above: no tabs, no
trailing whitespace, 4-space indentation. As RB (Review Board) has a "hide
whitespace changes" feature, we should just use it more.
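
A minimal sketch of checking the first two rules. It assumes the Java source is handled as a plain string. WhitespacePolicy is a hypothetical helper, not part of Pig or checkstyle. It does not check the 4-space indentation rule, because that needs a real parser (checkstyle's Indentation check, for example). A CRLF line ending shows up here as trailing whitespace, because the stray carriage return stays at the end of each line.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: reports violations of "no tabs, no trailing whitespace". */
final class WhitespacePolicy {

    static List<String> violations(String source) {
        List<String> problems = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings; a '\r' left at the end of a
        // line (CRLF files) is reported as trailing whitespace.
        String[] lines = source.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            int lineNo = i + 1;
            if (line.indexOf('\t') >= 0) {
                problems.add(lineNo + ": tab character");
            }
            if (!line.isEmpty() && Character.isWhitespace(line.charAt(line.length() - 1))) {
                problems.add(lineNo + ": trailing whitespace");
            }
        }
        return problems;
    }
}
```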


 Fix whitespace in Pig code
 --

 Key: PIG-3008
 URL: https://issues.apache.org/jira/browse/PIG-3008
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12

 Attachments: checkstyle.xml


 This JIRA exists mainly to get a conversation started. We've talked about it 
 before, and it's a tricky issue. That said, some of the Pig code is super, 
 super gnarly. We need some sort of path that will let it eventually be 
 fixable.
 I posit: any file that hasn't been touched for over 6 months is eligible for 
 a whitespace patch.



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486160#comment-13486160
 ] 

Jonathan Coveney commented on PIG-3008:
---

I agree that we should make sure that all new code passes checkstyle (and we
should have a checkstyle.xml and we should enforce it). But when will we be able
to update old files?



[jira] [Updated] (PIG-2970) Nested foreach getting incorrect schema when having unrelated inner query

2012-10-29 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-2970:


Affects Version/s: 0.10.0
Fix Version/s: 0.12
   0.11

 Nested foreach getting incorrect schema when having unrelated inner query
 -

 Key: PIG-2970
 URL: https://issues.apache.org/jira/browse/PIG-2970
 Project: Pig
  Issue Type: Bug
  Components: parser
Affects Versions: 0.10.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Fix For: 0.11, 0.12

 Attachments: pig-2970-trunk-v01.txt


 While looking at PIG-2968, I hit a weird error message.
 {noformat}
 $ cat -n test/foreach2.pig
  1  daily = load 'nyse' as (exchange, symbol);
  2  grpd = group daily by exchange;
  3  unique = foreach grpd {
  4  sym = daily.symbol;
  5  uniq_sym = distinct sym;
  6  --ignoring uniq_sym result
  7  generate group, daily;
  8  };
  9  describe unique;
 10  zzz = foreach unique generate group;
 11  explain zzz;
 % pig -x local -t ColumnMapKeyPrune test/foreach2.pig
 ...
 unique: {symbol: bytearray}
 2012-10-12 16:55:44,226 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1025: 
 file test/foreach2.pig, line 10, column 30 Invalid field projection. 
 Projected field [group] does not exist in schema: symbol:bytearray.
 ...
 {noformat}



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486221#comment-13486221
 ] 

Prashant Kommireddi commented on PIG-3008:
--

Here's a formatter that hbase project uses 
http://svn.apache.org/repos/asf/hbase/trunk/dev-support/hbase_eclipse_formatter.xml



[jira] [Created] (PIG-3014) CurrentTime() UDF has undesirable characteristics

2012-10-29 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-3014:
-

 Summary: CurrentTime() UDF has undesirable characteristics
 Key: PIG-3014
 URL: https://issues.apache.org/jira/browse/PIG-3014
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney


As part of the explanation of the new DateTime datatype, I noticed that we had
added a CurrentTime() UDF. The issue with this UDF is that it returns the
current time _of every exec invocation_, which can lead to confusing results.
In PIG-1431 I proposed a way for every instance of the same NOW() to
return the same time, which I think is better. I'd welcome thoughts.
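
To make the difference concrete, here is a hypothetical Java sketch with an injectable clock. It is not Pig's EvalFunc API, and it is not the actual CurrentTime() or proposed NOW() code. The per-call variant reads the clock on every exec(), and the fixed variant reads it once and reuses that value. Sharing one value across every task would presumably also require capturing the time once on the client side and shipping it to the tasks. That part is not modeled here.

```java
import java.util.function.LongSupplier;

/** Hypothetical sketch contrasting two semantics for a "current time" function. */
final class TimeFunctions {

    /** CurrentTime()-style: reads the clock on every exec() call. */
    static final class PerCallTime {
        private final LongSupplier clock;

        PerCallTime(LongSupplier clock) { this.clock = clock; }

        long exec() { return clock.getAsLong(); }
    }

    /** NOW()-style: reads the clock once at construction and returns that value every time. */
    static final class FixedTime {
        private final long fixedMillis;

        FixedTime(LongSupplier clock) { this.fixedMillis = clock.getAsLong(); }

        long exec() { return fixedMillis; }
    }
}
```

Injecting the clock keeps the example deterministic. A test can use a counter in place of System::currentTimeMillis.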



[jira] [Updated] (PIG-2924) PigStats should not be assuming all Storage classes to be file-based storage

2012-10-29 Thread Bill Graham (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bill Graham updated PIG-2924:
-

Status: Open  (was: Patch Available)

This looks great, thanks for taking this one. I think we need to make a few
changes to the pattern used in PIG-2574, though, because we could have a case
where we have multiple store funcs that each write to a different data source.

* Instead of registering a single new computer it would be ideal if we could 
register a list of computers.
* Each computer could have a {{boolean supports(POStore poStore)}} method that 
returns whether this class supports a given POStore. This can often be done by 
inspecting the output path. A default URI-based abstract class could help with 
that part.
* The computers would then be consulted in order, where the first to support 
the POStore wins.
* If a computer can't determine a size for some reason (e.g., it doesn't
support it or an exception occurred), it shouldn't return 0. Instead, maybe we
reserve -1 for this case and document it as such.
* Having the word Computer in the interface name and configs could cause
confusion, since it's an overloaded term. I don't have any great
suggestions though. {{PigStatsOutputSizeReader}}?

Thoughts? 
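
A minimal sketch of the ordered, first-match-wins lookup described in the bullets, under some assumptions. OutputSizeReader, UriSchemeSizeReader and OutputSizeReaders are hypothetical names; the comment itself only suggests {{PigStatsOutputSizeReader}}. A plain location string stands in for POStore so the example compiles without Pig on the classpath. -1 is the reserved "unknown" value, and a reader that throws is treated the same way.

```java
import java.util.List;

/** Hypothetical interface along the lines proposed above (location string instead of POStore). */
interface OutputSizeReader {
    /** Sentinel for "size unknown": unsupported store or an error while reading. */
    long UNKNOWN_SIZE = -1L;

    boolean supports(String storeLocation);

    long getOutputSize(String storeLocation);
}

/** Hypothetical URI-based base class: supports locations such as "hbase://..." by scheme prefix. */
abstract class UriSchemeSizeReader implements OutputSizeReader {
    private final String scheme;

    UriSchemeSizeReader(String scheme) { this.scheme = scheme; }

    @Override
    public boolean supports(String storeLocation) {
        return storeLocation.startsWith(scheme + "://");
    }
}

final class OutputSizeReaders {
    /** Consults readers in order; the first one that supports the location wins. */
    static long sizeOf(List<OutputSizeReader> readers, String storeLocation) {
        for (OutputSizeReader reader : readers) {
            if (reader.supports(storeLocation)) {
                try {
                    return reader.getOutputSize(storeLocation);
                } catch (RuntimeException e) {
                    return OutputSizeReader.UNKNOWN_SIZE; // never report 0 on failure
                }
            }
        }
        return OutputSizeReader.UNKNOWN_SIZE; // nobody supports this store
    }
}
```

With an HBase reader registered, an hbase:// location never reaches a file-based reader that would try listStatus on a table name.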
 


 PigStats should not be assuming all Storage classes to be file-based storage
 

 Key: PIG-2924
 URL: https://issues.apache.org/jira/browse/PIG-2924
 Project: Pig
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.0, 0.9.2
Reporter: Harsh J
Assignee: Cheolsoo Park
 Attachments: PIG-2924-2.patch, PIG-2924.patch


 Using PigStatsUtil (like Oozie does) to collect JobStats for jobs that use
 HBaseStorage blows up when the stats are accumulated.
 This is because JobStats (which adds the numbers up) assumes all storages are
 file-based and that it can do listStatus/etc. operations on their
 filespec-provided filename. For HBaseStorage, this is set to the table name,
 and there's no such file, leading to an exception (FileNotFound or Invalid
 URI, depending on whether 'tablename' or 'hbase://tablename' is used).



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486237#comment-13486237
 ] 

Daniel Dai commented on PIG-3008:
-

Agree with Cheolsoo, we should start with a limited scope. A giant patch would
make code tracing (svn blame) harder. Adding to the list:
- Use Unix newline style



Re: [ANNOUNCE] Welcome new Apache Pig Committers Rohini Palaniswamy

2012-10-29 Thread Julien Le Dem
Congrats Rohini !


On Sun, Oct 28, 2012 at 9:42 AM, Bill Graham billgra...@gmail.com wrote:
 Congrats Rohini! Great news indeed.

 On Saturday, October 27, 2012, Jon Coveney wrote:

 Wonderful news!

 On Oct 26, 2012, at 9:51 PM, Gianmarco De Francisci Morales 
 g...@apache.org wrote:

  Congratulations Rohini!
  Welcome onboard :)
  --
  Gianmarco
 
 
  On Fri, Oct 26, 2012 at 7:32 PM, Prasanth J 
  buckeye.prasa...@gmail.com
 wrote:
  Congrats Rohini!
 
  Thanks
  -- Prasanth
 
  On Oct 26, 2012, at 10:21 PM, Santhosh Srinivasan 
 santhosh_mut...@yahoo.com wrote:
 
  Congrats Rohini! Full speed ahead now :)
 
  On Oct 26, 2012, at 4:37 PM, Daniel Dai 
  da...@hortonworks.com
 wrote:
 
  Here is another Pig committer announcement today. Please welcome
  Rohini Palaniswamy to be a Pig committer!
 
  Thanks,
  Daniel
 





[jira] [Updated] (PIG-2409) Tracking URL for hadoop 23 does not show up

2012-10-29 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-2409:


Fix Version/s: (was: 0.11)
   0.12

 Tracking URL for hadoop 23 does not show up
 ---

 Key: PIG-2409
 URL: https://issues.apache.org/jira/browse/PIG-2409
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.9.2, 0.10.0, 0.11
Reporter: Daniel Dai
Assignee: Daniel Dai
Priority: Minor
  Labels: hadoop023
 Fix For: 0.12


 Pig used to show a tracking URL for each Hadoop job:
 More information at: 
 http://localhost:50030/jobdetails.jsp?jobid=job_201112071119_0001
 This information does not show up in hadoop 23.



[jira] [Resolved] (PIG-3011) Add Cheolsoo to list of Apache Pig committers

2012-10-29 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park resolved PIG-3011.


Resolution: Fixed

 Add Cheolsoo to list of Apache Pig committers
 -

 Key: PIG-3011
 URL: https://issues.apache.org/jira/browse/PIG-3011
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
Assignee: Cheolsoo Park
Priority: Critical

 It's always fun for someone's first commit to be adding themselves to this 
 page: http://pig.apache.org/whoweare.html (it's also a good chance to make 
 sure your dev setup is properly configured to allow committing)
 Welcome aboard!



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486302#comment-13486302
 ] 

Jonathan Coveney commented on PIG-3008:
---

2 questions:

1. Is there a way to run checkstyle just on the files you changed?
2. Is there a way to make an IDE (or just the command line) fix the indents for
you? I can do it by hand, but some of it is really bad and would be tedious to
fix.

I like the idea that if you touch a file for other reasons, then we can
automate the flow to try and get the whitespace formatting right.
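
For question 2, here is a rough sketch of a mechanical fixer. It assumes files are read and written as whole strings. WhitespaceFixer is a hypothetical helper that does three things: it converts CRLF and lone CR to LF, expands each leading tab to 4 spaces (with no tab-stop alignment), and strips trailing whitespace. It does not re-indent badly nested code; an IDE formatter profile is better suited for that.

```java
/** Hypothetical normalizer for the whitespace rules discussed in this thread. */
final class WhitespaceFixer {
    private static final String INDENT = "    "; // each leading tab becomes 4 spaces

    static String normalize(String source) {
        // Unix newlines: turn CRLF, then any lone CR, into LF.
        String text = source.replace("\r\n", "\n").replace('\r', '\n');
        String[] lines = text.split("\n", -1);
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            // Only leading tabs are expanded; tabs later in the line are left untouched.
            StringBuilder indent = new StringBuilder();
            int j = 0;
            while (j < line.length() && (line.charAt(j) == '\t' || line.charAt(j) == ' ')) {
                indent.append(line.charAt(j) == '\t' ? INDENT : " ");
                j++;
            }
            // Strip trailing whitespace from the rest of the line.
            String rest = line.substring(j);
            int end = rest.length();
            while (end > 0 && Character.isWhitespace(rest.charAt(end - 1))) {
                end--;
            }
            rest = rest.substring(0, end);
            // Whitespace-only lines become empty lines.
            out.append(rest.isEmpty() ? "" : indent + rest);
            if (i < lines.length - 1) {
                out.append('\n');
            }
        }
        return out.toString();
    }
}
```

The transformation is idempotent, so running it again on an already-clean file leaves the file unchanged and adds no noise to svn blame.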



[jira] [Commented] (PIG-2968) ColumnMapKeyPrune fails to prune a subtree inside foreach

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486329#comment-13486329
 ] 

Cheolsoo Park commented on PIG-2968:


+1.

I will commit it after running tests. Thanks Koji!



[jira] [Commented] (PIG-2968) ColumnMapKeyPrune fails to prune a subtree inside foreach

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486342#comment-13486342
 ] 

Cheolsoo Park commented on PIG-2968:


Actually, can you please fix the indentation of testPruneSubTreeForEach() to use
4 spaces? We're trying to clean up whitespace in the code base.

Thanks!




[jira] [Commented] (PIG-2968) ColumnMapKeyPrune fails to prune a subtree inside foreach

2012-10-29 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486344#comment-13486344
 ] 

Rohini Palaniswamy commented on PIG-2968:
-

Koji/Cheolsoo,
bq. But I couldn't come up with expected raw query that matches the pruned 
result.
   I missed commenting on this in the initial review. Should we take a crack at
this before committing? Or at least add some asserts?



[jira] [Updated] (PIG-2990) the -secretDebugCmd shouldn't be a secret and should just be...a command

2012-10-29 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-2990:
--

   Resolution: Fixed
Fix Version/s: 0.12
   Status: Resolved  (was: Patch Available)

Thanks Rohini! Committed.

 the -secretDebugCmd shouldn't be a secret and should just be...a command
 

 Key: PIG-2990
 URL: https://issues.apache.org/jira/browse/PIG-2990
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
Priority: Minor
  Labels: newbie
 Fix For: 0.12

 Attachments: PIG-2990-0.patch


 It's a useful command, and it's weird that it's not in -help



[jira] [Commented] (PIG-3008) Fix whitespace in Pig code

2012-10-29 Thread Gianmarco De Francisci Morales (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486350#comment-13486350
 ] 

Gianmarco De Francisci Morales commented on PIG-3008:
-

1. I am assuming we will integrate this with Ant. The checkstyle Ant target
wants a list of files to run on. So if you are able to identify the list of
files you changed automatically, then yes. How do you define "changed"?
2. Yes, as long as there is a compatible formatter profile (i.e., it is not too
complex). For our use case it should be fine. I know it can be done with
Eclipse. For other IDEs I guess it can be done as well, but I don't know how.

We also need integration with Jenkins for automatic builds.
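
One possible answer to "how do you define changed", sketched under some assumptions. A file counts as changed if plain `svn status` (Subversion 1.6 or later, without `-u`) reports it as modified (M) or added (A). The resulting .java paths would then be handed to the checkstyle Ant target. ChangedJavaFiles is a hypothetical helper. The fixed path column assumes a status format where seven one-character status columns and a space come before the path.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks the files a checkstyle run should cover from plain `svn status` lines. */
final class ChangedJavaFiles {
    private static final int PATH_COLUMN = 8; // 7 one-char status columns plus a separator space

    static List<String> fromSvnStatus(List<String> statusLines) {
        List<String> files = new ArrayList<>();
        for (String line : statusLines) {
            if (line.length() <= PATH_COLUMN) {
                continue; // blank or malformed line
            }
            char status = line.charAt(0);
            String path = line.substring(PATH_COLUMN);
            // "Changed" here means modified (M) or scheduled for addition (A);
            // deleted, unversioned and conflicted entries are skipped.
            if ((status == 'M' || status == 'A') && path.endsWith(".java")) {
                files.add(path);
            }
        }
        return files;
    }
}
```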



[jira] [Commented] (PIG-2898) Parallel execution of e2e tests

2012-10-29 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486352#comment-13486352
 ] 

Daniel Dai commented on PIG-2898:
-

+1. Rohini, can you commit once you get permission?

 Parallel execution of e2e tests
 ---

 Key: PIG-2898
 URL: https://issues.apache.org/jira/browse/PIG-2898
 Project: Pig
  Issue Type: Improvement
  Components: e2e harness
Affects Versions: 0.10.0
Reporter: Andrey Klochkov
Assignee: Ivan A. Veselovsky
  Labels: test
 Fix For: 0.11, 0.12

 Attachments: PIG-2898-branch-0.10-6-final.patch, 
 PIG-2898-branch-0.10-7.patch, PIG-2898-trunk-3.patch, 
 PIG-2898-trunk-6-final.patch, PIG-2898-trunk-7.patch


 Today it takes ~19 hours to run the full set of e2e tests in mapred mode. The
 bottleneck here is the client side, and per our observations it would help a
 lot if the e2e harness were able to run tests in parallel threads.
 We prototyped changes in the e2e harness that allow running tests in a
 configurable number of threads. Preliminary results show more than a 6x
 reduction in execution time when using a small 3-node M/R cluster with a modest
 configuration. Going to share a patch shortly.
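
The e2e harness is Perl-based, so the following is only an illustration of the idea, not the patch. A hypothetical ParallelTestRunner submits independent tests to a fixed-size thread pool, whose size is the configurable thread count, and collects the failures in submission order. It assumes each test can run in isolation, i.e. tests do not share output paths or other client-side state.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

/** Hypothetical sketch: run independent tests on a configurable number of client threads. */
final class ParallelTestRunner {

    /**
     * Runs each named test via runTest (true = passed) on a fixed-size pool and
     * returns the names of the tests that failed or threw, in submission order.
     */
    static List<String> runAll(List<String> testNames, Predicate<String> runTest, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String name : testNames) {
                Callable<Boolean> task = () -> runTest.test(name);
                results.add(pool.submit(task));
            }
            List<String> failed = new ArrayList<>();
            for (int i = 0; i < testNames.size(); i++) {
                boolean passed;
                try {
                    passed = results.get(i).get(); // blocks until this test finishes
                } catch (ExecutionException e) {
                    passed = false; // a test that throws counts as a failure
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for tests", e);
                }
                if (!passed) {
                    failed.add(testNames.get(i));
                }
            }
            return failed;
        } finally {
            pool.shutdown(); // lets running tests finish, accepts no new ones
        }
    }
}
```

With N client threads, the wall-clock time approaches the total test time divided by N, as long as the cluster is not the bottleneck. That matches the observation above that the client side is the limiting factor.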



[jira] [Commented] (PIG-2968) ColumnMapKeyPrune fails to prune a subtree inside foreach

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486365#comment-13486365
 ] 

Cheolsoo Park commented on PIG-2968:


Hi Rohini, sure I can wait until your concern is addressed.

If Koji can improve the test case, that will be of course great. But since I 
don't have a better suggestion, I won't insist.

Thanks!



[jira] [Created] (PIG-3015) Rewrite of AvroStorage

2012-10-29 Thread Joseph Adler (JIRA)
Joseph Adler created PIG-3015:
-

 Summary: Rewrite of AvroStorage
 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler


The current AvroStorage implementation has a lot of issues: it requires old 
versions of Avro, it copies data much more than needed, and it's verbose and 
complicated. (One pet peeve of mine is that old versions of Avro don't support 
Snappy compression.)

I rewrote AvroStorage from scratch to fix these issues. In early tests, the new 
implementation is significantly faster, and the code is a lot simpler. 
Rewriting AvroStorage also enabled me to implement support for Trevni.

I'm opening this ticket to facilitate discussion while I figure out the best 
way to contribute the changes back to Apache.



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486396#comment-13486396
 ] 

Jonathan Coveney commented on PIG-2582:
---

Given that it is unstable, I say just go ahead with the plan to make it private
and keep the deprecated getter/setter.

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
 Attachments: PIG-2582.patch


 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see mBytes is public, and has a public getter/setter.
 {code}
 public Long mBytes; // size in megabytes
 // ...
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Sizes are typically stored in bytes, potentially with convenience functions
 that return other units.
 If mBytes can be marked private without causing woes, it might be worth
 storing size as bytes instead.
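
A minimal sketch of that proposal, with its assumptions stated. SizeStats stands in for ResourceStatistics, getSizeInBytes/setSizeInBytes are hypothetical names, and a megabyte is taken to be 1024 * 1024 bytes. The size lives in a private field in bytes. The old megabyte accessors remain as deprecated conversions and keep the existing fluent {{return this}} style.

```java
/**
 * Hypothetical sketch: size kept in bytes in a private field, with the
 * megabyte accessors kept only as deprecated conversions.
 */
final class SizeStats {
    private static final long BYTES_PER_MB = 1024L * 1024L; // assumption: binary megabytes

    private Long bytes; // null means "unknown"

    public Long getSizeInBytes() {
        return bytes;
    }

    public SizeStats setSizeInBytes(Long bytes) {
        this.bytes = bytes;
        return this;
    }

    /** @deprecated use {@link #getSizeInBytes()}; rounds down to whole megabytes. */
    @Deprecated
    public Long getmBytes() {
        return bytes == null ? null : bytes / BYTES_PER_MB;
    }

    /** @deprecated use {@link #setSizeInBytes(Long)}. */
    @Deprecated
    public SizeStats setmBytes(Long mBytes) {
        this.bytes = mBytes == null ? null : mBytes * BYTES_PER_MB;
        return this;
    }
}
```

For example, setmBytes(2L) stores 2097152 bytes. getmBytes() rounds down, so a size set in bytes does not always round-trip exactly through the deprecated getter.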



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486410#comment-13486410
 ] 

Prashant Kommireddi commented on PIG-2582:
--

Also, there is a common theme of returning the object ({{return this;}}) from all
setters. I don't think this should exist, but we should probably tackle that in
the next release, to make sure we are fine with the existing changes first.




[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486427#comment-13486427
 ] 

Cheolsoo Park commented on PIG-3015:


Hi Joseph,

Thank you very much for opening the jira. I have recently worked on AvroStorage 
by myself, and I totally agree with you. Since you already have code to 
contribute, this is even better. :-)

As part of the rewrite, I would also like to propose migrating AvroStorage from
Piggybank to core Pig. I have 2 reasons for this:
# AvroStorage is widely used, so it makes sense to include it in core Pig
rather than in Piggybank.
# Until the migration is complete, we can maintain both versions (the new one in
core Pig and the old one in Piggybank) to avoid breaking backward compatibility.
Another motivation for the rewrite, to me, is to clean up the funny options that
the current AvroStorage has, so I think that breaking backward compatibility is
unavoidable.

I asked this question on the [user mailing 
list|http://mail-archives.apache.org/mod_mbox/pig-user/201208.mbox/%3C27EE5059-F811-4E19-B1A3-951B4BB3BDDF%40hortonworks.com%3E]
 a while ago, and nobody disagreed. But please let me know if anyone has 
objections.


To start with, I am wondering if you can post your code as a patch to this jira 
and the review board. Assuming that we're going to move AvroStorage to the core 
Pig, you can probably create a new package called 
org.apache.pig.backend.hadoop.avro and add your code there. If you could 
break your patch into smaller pieces and attach them to sub-tasks of this jira, 
that would be helpful too. 

Please let me know what you think.

Thanks!

 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler

 The current AvroStorage implementation has a lot of issues: it requires old 
 versions of Avro, it copies data much more than needed, and it's verbose and 
 complicated. (One pet peeve of mine is that old versions of Avro don't 
 support Snappy compression.)
 I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
 new implementation is significantly faster, and the code is a lot simpler. 
 Rewriting AvroStorage also enabled me to implement support for Trevni.
 I'm opening this ticket to facilitate discussion while I figure out the best 
 way to contribute the changes back to Apache.



Build failed in Jenkins: Pig-trunk #1349

2012-10-29 Thread Apache Jenkins Server
See https://builds.apache.org/job/Pig-trunk/1349/changes

Changes:

[jcoveney] PIG-2990: the -secretDebugCmd shouldnt be a secret and should just 
be...a command (jcoveney)

--
[...truncated 6641 lines...]
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile$Reader$Scanner$Entry
 [findbugs]   org.apache.hadoop.fs.FSDataInputStream
 [findbugs]   org.python.core.PyObject
 [findbugs]   jline.History
 [findbugs]   org.jruby.embed.internal.LocalContextProvider
 [findbugs]   org.apache.hadoop.io.BooleanWritable
 [findbugs]   org.apache.log4j.Logger
 [findbugs]   org.apache.hadoop.hbase.filter.FamilyFilter
 [findbugs]   org.codehaus.jackson.annotate.JsonPropertyOrder
 [findbugs]   groovy.lang.Tuple
 [findbugs]   org.antlr.runtime.IntStream
 [findbugs]   org.apache.hadoop.util.ReflectionUtils
 [findbugs]   org.apache.hadoop.fs.ContentSummary
 [findbugs]   org.jruby.runtime.builtin.IRubyObject
 [findbugs]   org.jruby.RubyInteger
 [findbugs]   org.python.core.PyTuple
 [findbugs]   org.mortbay.log.Log
 [findbugs]   org.apache.hadoop.conf.Configuration
 [findbugs]   com.google.common.base.Joiner
 [findbugs]   org.apache.hadoop.mapreduce.lib.input.FileSplit
 [findbugs]   org.apache.hadoop.mapred.Counters$Counter
 [findbugs]   com.jcraft.jsch.Channel
 [findbugs]   org.apache.hadoop.mapred.JobPriority
 [findbugs]   org.apache.commons.cli.Options
 [findbugs]   org.apache.hadoop.mapred.JobID
 [findbugs]   org.apache.hadoop.util.bloom.BloomFilter
 [findbugs]   org.python.core.PyFrame
 [findbugs]   org.apache.hadoop.hbase.filter.CompareFilter
 [findbugs]   org.apache.hadoop.util.VersionInfo
 [findbugs]   org.python.core.PyString
 [findbugs]   org.apache.hadoop.io.Text$Comparator
 [findbugs]   org.jruby.runtime.Block
 [findbugs]   org.antlr.runtime.MismatchedSetException
 [findbugs]   org.apache.hadoop.io.BytesWritable
 [findbugs]   org.apache.hadoop.fs.FsShell
 [findbugs]   org.joda.time.Months
 [findbugs]   org.mozilla.javascript.ImporterTopLevel
 [findbugs]   org.apache.hadoop.hbase.mapreduce.TableOutputFormat
 [findbugs]   org.apache.hadoop.mapred.TaskReport
 [findbugs]   org.apache.hadoop.security.UserGroupInformation
 [findbugs]   org.antlr.runtime.tree.RewriteRuleSubtreeStream
 [findbugs]   org.apache.commons.cli.HelpFormatter
 [findbugs]   com.google.common.collect.Maps
 [findbugs]   org.joda.time.ReadableInstant
 [findbugs]   org.mozilla.javascript.NativeObject
 [findbugs]   org.apache.hadoop.hbase.HConstants
 [findbugs]   org.apache.hadoop.io.serializer.Deserializer
 [findbugs]   org.antlr.runtime.FailedPredicateException
 [findbugs]   org.apache.hadoop.io.compress.CompressionCodec
 [findbugs]   org.jruby.RubyNil
 [findbugs]   org.apache.hadoop.fs.FileStatus
 [findbugs]   org.apache.hadoop.hbase.client.Result
 [findbugs]   org.apache.hadoop.mapreduce.JobContext
 [findbugs]   org.codehaus.jackson.JsonGenerator
 [findbugs]   org.apache.hadoop.mapreduce.TaskAttemptContext
 [findbugs]   org.apache.hadoop.io.LongWritable$Comparator
 [findbugs]   org.codehaus.jackson.map.util.LRUMap
 [findbugs]   org.apache.hadoop.hbase.util.Bytes
 [findbugs]   org.antlr.runtime.MismatchedTokenException
 [findbugs]   org.codehaus.jackson.JsonParser
 [findbugs]   com.jcraft.jsch.UserInfo
 [findbugs]   org.python.core.PyException
 [findbugs]   org.apache.commons.cli.ParseException
 [findbugs]   org.apache.hadoop.io.compress.CompressionOutputStream
 [findbugs]   org.apache.hadoop.hbase.filter.WritableByteArrayComparable
 [findbugs]   org.antlr.runtime.tree.CommonTreeNodeStream
 [findbugs]   org.apache.log4j.Level
 [findbugs]   org.apache.hadoop.hbase.client.Scan
 [findbugs]   org.jruby.anno.JRubyMethod
 [findbugs]   org.apache.hadoop.mapreduce.Job
 [findbugs]   com.google.common.util.concurrent.Futures
 [findbugs]   org.apache.commons.logging.LogFactory
 [findbugs]   org.apache.commons.collections.IteratorUtils
 [findbugs]   org.apache.commons.codec.binary.Base64
 [findbugs]   org.codehaus.jackson.map.ObjectMapper
 [findbugs]   org.apache.hadoop.fs.FileSystem
 [findbugs]   org.jruby.embed.LocalContextScope
 [findbugs]   org.apache.hadoop.hbase.filter.FilterList$Operator
 [findbugs]   org.jruby.RubySymbol
 [findbugs]   org.apache.hadoop.hbase.io.ImmutableBytesWritable
 [findbugs]   org.apache.hadoop.io.serializer.SerializationFactory
 [findbugs]   org.antlr.runtime.tree.TreeAdaptor
 [findbugs]   org.apache.hadoop.mapred.RunningJob
 [findbugs]   org.antlr.runtime.CommonTokenStream
 [findbugs]   org.apache.hadoop.io.DataInputBuffer
 [findbugs]   org.apache.hadoop.io.file.tfile.TFile
 [findbugs]   org.apache.commons.cli.GnuParser
 [findbugs]   org.mozilla.javascript.Context
 [findbugs]   org.apache.hadoop.io.FloatWritable
 [findbugs]   org.antlr.runtime.tree.RewriteEarlyExitException
 [findbugs]   org.apache.hadoop.hbase.HBaseConfiguration
 [findbugs]   org.codehaus.jackson.JsonGenerationException
 [findbugs]   org.apache.hadoop.mapreduce.TaskInputOutputContext
 [findbugs]   

[jira] [Updated] (PIG-2997) Provide a convenience constructor on PigServer that accepts Configuration

2012-10-29 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2997:
-

Attachment: PIG-2997_1.patch

Thanks for the review, Rohini. 

Adding a test case. Also, I have moved the Configuration-to-Properties conversion 
into PigContext so that code other than PigServer can use it too.
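A Configuration-to-Properties conversion can lean on the fact that Hadoop's Configuration implements Iterable<Map.Entry<String, String>>. The sketch below is illustrative only: ConfToProperties and toProperties are hypothetical names, not the code in the attached patch or in PigContext, and the method accepts any such Iterable so it compiles without Hadoop on the classpath. One caveat worth checking against the Hadoop docs: iterating a Configuration may yield raw values without ${...} variable expansion.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not Pig's PigContext code. Because Hadoop's Configuration
// implements Iterable<Map.Entry<String, String>>, a Configuration can be passed in directly.
final class ConfToProperties {
    private ConfToProperties() {}

    /** Copies every key/value pair from a Configuration-like source into a new Properties object. */
    static Properties toProperties(Iterable<Map.Entry<String, String>> conf) {
        Properties props = new Properties();
        for (Map.Entry<String, String> entry : conf) {
            // Properties is a Hashtable and rejects null keys or values, so skip such entries.
            if (entry.getKey() != null && entry.getValue() != null) {
                props.setProperty(entry.getKey(), entry.getValue());
            }
        }
        return props;
    }
}
```

With Hadoop available, a caller would write ConfToProperties.toProperties(conf) and hand the result to the existing Properties-based PigServer constructor.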

 Provide a convenience constructor on PigServer that accepts Configuration
 -

 Key: PIG-2997
 URL: https://issues.apache.org/jira/browse/PIG-2997
 Project: Pig
  Issue Type: Improvement
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.11

 Attachments: PIG-2997_1.patch, PIG-2997.patch


 PigServer currently has a Properties-based constructor. Hadoop in general deals 
 with Configuration, and it would be good to have a PigServer constructor that 
 accepts one. With this, users do not have to create a Properties object from a 
 Configuration and can simply invoke the new constructor.



[jira] [Updated] (PIG-2461) Simplify schema syntax for cast

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2461:


Fix Version/s: (was: 0.11)

 Simplify schema syntax for cast
 ---

 Key: PIG-2461
 URL: https://issues.apache.org/jira/browse/PIG-2461
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Daniel Dai
 Fix For: 0.12


 The syntax for casting to a bag or tuple is confusing:
 {code}
 b = foreach a generate (bag{tuple(int,double)})bag0;
 {code}
 It is hard for users to get this right. We should make the bag/tuple 
 keywords optional.



[jira] [Updated] (PIG-2625) Allow use of JRuby for control flow

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2625:


Fix Version/s: (was: 0.11)

 Allow use of JRuby for control flow
 ---

 Key: PIG-2625
 URL: https://issues.apache.org/jira/browse/PIG-2625
 Project: Pig
  Issue Type: New Feature
Reporter: Jonathan Coveney
 Fix For: 0.12


 Just as people can use Jython for iterative computation, it would be great to 
 be able to use JRuby for the same purpose.



[jira] [Updated] (PIG-2625) Allow use of JRuby for control flow

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2625:


Fix Version/s: 0.12

Moving to 0.12 since this is an improvement with no activity at this point.

 Allow use of JRuby for control flow
 ---

 Key: PIG-2625
 URL: https://issues.apache.org/jira/browse/PIG-2625
 Project: Pig
  Issue Type: New Feature
Reporter: Jonathan Coveney
 Fix For: 0.12


 Just as people can use Jython for iterative computation, it would be great to 
 be able to use JRuby for the same purpose.



[jira] [Commented] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486474#comment-13486474
 ] 

Prashant Kommireddi commented on PIG-2933:
--

1. Makes sense
2. I think the ordering of inner classes in imports is different; they are usually 
placed below. Let me know if you think otherwise per the Apache Pig guidelines. 

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow. If you want, I can fix this next 
 week.



[jira] [Updated] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-29 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2933:
-

Attachment: PIG-2933_1.patch

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933_1.patch, PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow. If you want, I can fix this next 
 week.



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-10-29 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486489#comment-13486489
 ] 

Joseph Adler commented on PIG-3015:
---

Here's the working version: https://github.com/josephadler/fast-avro-storage

I can break that up into multiple Jira tickets, though that feels like a lot of 
extra work, since I threw away all the existing code and started from scratch. I 
do think it's reasonable to separate AvroStorage and TrevniStorage for now (though 
they are very closely related).

 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler

 The current AvroStorage implementation has a lot of issues: it requires old 
 versions of Avro, it copies data much more than needed, and it's verbose and 
 complicated. (One pet peeve of mine is that old versions of Avro don't 
 support Snappy compression.)
 I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
 new implementation is significantly faster, and the code is a lot simpler. 
 Rewriting AvroStorage also enabled me to implement support for Trevni.
 I'm opening this ticket to facilitate discussion while I figure out the best 
 way to contribute the changes back to Apache.



[jira] [Updated] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Prashant Kommireddi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prashant Kommireddi updated PIG-2582:
-

Attachment: PIG-2582_1.patch

Making instance variables private. The setter for mBytes is being deprecated; 
however, the getter can be thought of as a convenience method going forward, 
with its implementation changed once we get rid of the mBytes variable in the 
next iteration. We cannot *safely* get rid of mBytes until the setter is 
completely taken out of this class.
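The end state described here (size stored only in bytes, with getmBytes() kept as a derived convenience getter once the deprecated setter and the mBytes field are gone) might look roughly like the following. This is a minimal sketch, not the attached patch: the class name SizeStats and the field sizeInBytes are hypothetical, and it assumes "megabytes" means 1024 * 1024 bytes.

```java
// Hypothetical end state sketched from the comment above; not the attached patch.
// Assumption: one megabyte is 1024 * 1024 bytes.
class SizeStats {
    private static final long BYTES_PER_MB = 1024L * 1024L;

    // The only stored size; null means "unknown", like the Long field it replaces.
    private Long sizeInBytes;

    public Long getSizeInBytes() {
        return sizeInBytes;
    }

    // Fluent setter, returning this like the original setmBytes did.
    public SizeStats setSizeInBytes(Long sizeInBytes) {
        this.sizeInBytes = sizeInBytes;
        return this;
    }

    /** Convenience getter derived from the byte count, rounded down to whole megabytes. */
    public Long getmBytes() {
        return sizeInBytes == null ? null : Long.valueOf(sizeInBytes / BYTES_PER_MB);
    }
}
```

Keeping a single stored field means the two units can never disagree; the megabyte value is always computed on demand.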

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
 Attachments: PIG-2582_1.patch, PIG-2582.patch


 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see that mBytes is public and has a public getter and setter.
 {code}
 public Long mBytes; // size in megabytes
 // ...
 public Long getmBytes() {
     return mBytes;
 }
 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Sizes are typically stored in bytes, possibly with convenience functions 
 that return the size in other units.
 If mBytes can be made private without causing problems, it might be worth 
 storing the size in bytes instead.



[jira] [Commented] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-29 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486496#comment-13486496
 ] 

Rohini Palaniswamy commented on PIG-2933:
-

bq. I think inner classes ordering on imports is different, it's usually placed 
below. Let me know if you think otherwise as per Apache pig guidelines.
It's just the standard ordering and should work with any IDE. I checked that 
pressing Ctrl+Shift+O in Eclipse puts Map.Entry immediately after Map. 

+1. Will commit this once I get access. 

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933_1.patch, PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow. If you want, I can fix this next 
 week.



[jira] [Commented] (PIG-2881) Add SUBTRACT eval function

2012-10-29 Thread Joel Costigliola (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486497#comment-13486497
 ] 

Joel Costigliola commented on PIG-2881:
---

Jon,

It turns out that the trunk (rev 1403547) did not pass the following tests on 
my machine :(
{noformat}
[junit] Running org.apache.pig.test.TestLoad
[junit] Tests run: 15, Failures: 9, Errors: 0, Time elapsed: 59.125 sec

[junit] Running org.apache.pig.test.TestStore
[junit] Tests run: 17, Failures: 2, Errors: 0, Time elapsed: 406.874 sec
{noformat}

My environment is:
- Apache Ant(TM) version 1.8.2 compiled on December 3 2011
- Java version: 1.6.0_17, vendor: Sun Microsystems Inc.
- Default locale: en_GB, platform encoding: UTF-8
- OS name: linux, version: 3.2.0-33-generic, arch: amd64, family: unix
 
I have improved the SUBTRACT unit tests and run all the tests; no tests failed 
other than the ones I mentioned above.

Question: do you want me to make a patch, or wait until all tests pass on my 
machine?

Regards,

Joel

 Add SUBTRACT eval function
 --

 Key: PIG-2881
 URL: https://issues.apache.org/jira/browse/PIG-2881
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Joel Costigliola
Priority: Minor
 Attachments: Subtract.java, SubtractTest.java


 Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements 
 of bag2 from bag1.
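To make the intended semantics concrete: Pig's built-in DIFF returns the tuples that appear in one bag but not the other, in either direction, while SUBTRACT as described is one-sided. The sketch below is not the attached Subtract.java. It uses java.util.List as a stand-in for Pig tuples and bags, the class name BagSubtract is hypothetical, and it assumes that a tuple of bag1 is dropped whenever an equal tuple appears anywhere in bag2, with any remaining duplicates in bag1 kept.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of one-sided bag subtraction; not the attached Subtract.java.
// List<Object> stands in for a Pig tuple and List<List<Object>> for a bag.
final class BagSubtract {
    private BagSubtract() {}

    /** Returns the tuples of bag1, in order, that have no equal tuple anywhere in bag2. */
    static List<List<Object>> subtract(List<List<Object>> bag1, List<List<Object>> bag2) {
        // Hash bag2 once; List.equals/hashCode compare tuples field by field.
        Set<List<Object>> toRemove = new HashSet<List<Object>>(bag2);
        List<List<Object>> result = new ArrayList<List<Object>>();
        for (List<Object> tuple : bag1) {
            if (!toRemove.contains(tuple)) {
                result.add(tuple);
            }
        }
        return result;
    }
}
```

Hashing bag2 keeps the cost roughly linear in the total size of both bags, at the price of holding bag2 in memory; a real UDF over very large bags might need a different strategy.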
   



[jira] [Updated] (PIG-2968) ColumnMapKeyPrune fails to prune a subtree inside foreach

2012-10-29 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-2968:
--

Attachment: pig-2968-trunk_v03.txt

bq. Actually, can you please fix the indentation of testPruneSubTreeForEach() 
using 4 spaces?

Changed. (Sorry, I recently learned it's 4 spaces instead of 2.)

bq. Or at least some Assert? 

Added a catch and fail call.


bq. Should we take a crack at this before committing this

I tried simply taking out the lines, but a nested foreach with pruned inputs was 
different from a simple foreach with just one output (the former had one extra 
foreach).


 ColumnMapKeyPrune fails to prune a subtree inside foreach
 -

 Key: PIG-2968
 URL: https://issues.apache.org/jira/browse/PIG-2968
 Project: Pig
  Issue Type: Bug
  Components: parser
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Fix For: 0.11

 Attachments: pig-2968-trunk_v01.txt, pig-2968-trunk_v02.txt, 
 pig-2968-trunk_v03.txt


 Sample code 
 {noformat}
 $ cat test/foreach.pig 
 daily = load 'nyse' as (exchange, symbol);
 grpd = group daily by exchange;
 uniquecnt = foreach grpd {
 sym = daily.symbol;
 uniq_sym = distinct sym;
 generate group, uniq_sym;
 };
 another = FOREACH uniquecnt GENERATE group;
 explain another;
 {noformat}
 This breaks when it tries to prune uniq_sym-sym-innerload_daily
 bq. 2012-10-12 14:54:11,031 [main] ERROR org.apache.pig.tools.grunt.Grunt - 
 ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune



[jira] [Commented] (PIG-2933) HBaseStorage is using setScannerCaching which is deprecated

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486499#comment-13486499
 ] 

Prashant Kommireddi commented on PIG-2933:
--

Sounds good, thanks.

 HBaseStorage is using setScannerCaching which is deprecated
 ---

 Key: PIG-2933
 URL: https://issues.apache.org/jira/browse/PIG-2933
 Project: Pig
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Prashant Kommireddi
Priority: Minor
  Labels: hbase
 Attachments: PIG-2933_1.patch, PIG-2933.patch


 HTable.setScannerCaching is deprecated; use Scan.setCaching(int) instead.
 Note: I'm on vacation starting tomorrow. If you want, I can fix this next 
 week.



[jira] [Commented] (PIG-2881) Add SUBTRACT eval function

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486504#comment-13486504
 ] 

Cheolsoo Park commented on PIG-2881:


Hi Joel, what errors do you see in the test logs for your test failures?

build/test/logs/TEST-org.apache.pig.test.TestLoad.txt
build/test/logs/TEST-org.apache.pig.test.TestStore.txt

Both tests use MiniCluster, so they may be failing due to environment issues. 
For example, did you set umask 0022 in the shell where you're running tests?

 Add SUBTRACT eval function
 --

 Key: PIG-2881
 URL: https://issues.apache.org/jira/browse/PIG-2881
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Joel Costigliola
Priority: Minor
 Attachments: Subtract.java, SubtractTest.java


 Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements 
 of bag2 from bag1.
   



[jira] [Updated] (PIG-2628) Allow in line scripting UDF definitions

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2628:


Fix Version/s: (was: 0.11)
   0.12

Moving to 0.12 since it is an improvement with no work done yet

 Allow in line scripting UDF definitions
 ---

 Key: PIG-2628
 URL: https://issues.apache.org/jira/browse/PIG-2628
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12


 For small UDFs in scripting languages, it may be cumbersome to force users to 
 write a script, put it on the classpath, ship it, etc. It would be great to 
 support a syntax that allows people to declare UDFs inline (essentially, to 
 define a snippet of code that will be interpreted as a scriptlet).



[jira] [Updated] (PIG-2624) Handle recursive inclusion of scripts in JRuby UDFs

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2624:


Fix Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12

Moving to 0.12 since there has been no work and no one assigned to date.

 Handle recursive inclusion of scripts in JRuby UDFs
 ---

 Key: PIG-2624
 URL: https://issues.apache.org/jira/browse/PIG-2624
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.10.0, 0.11
Reporter: Jonathan Coveney
  Labels: JRuby
 Fix For: 0.12


 Currently, if you have a script that requires another script, the 
 dependency won't be handled properly.



[jira] [Updated] (PIG-2631) Pig should allow self joins

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2631:


Fix Version/s: 0.12

Moving to 0.12 since no work has been done and the ticket is unassigned.

 Pig should allow self joins
 ---

 Key: PIG-2631
 URL: https://issues.apache.org/jira/browse/PIG-2631
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.11, 0.12


 This doesn't even have to be optimized, and it can still involve a double scan 
 of the data, but there is no reason why the following works:
 {code}
 a = load 'thing' as (x:int);
 b = join a by x, (foreach a generate *) by x;
 {code}
 but this does not:
 {code}
 a = load 'thing' as (x:int);
 b = join a by x, a by x;
 {code}



[jira] [Updated] (PIG-2631) Pig should allow self joins

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2631:


Fix Version/s: (was: 0.11)

 Pig should allow self joins
 ---

 Key: PIG-2631
 URL: https://issues.apache.org/jira/browse/PIG-2631
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
 Fix For: 0.12


 This doesn't even have to be optimized, and it can still involve a double scan 
 of the data, but there is no reason why the following works:
 {code}
 a = load 'thing' as (x:int);
 b = join a by x, (foreach a generate *) by x;
 {code}
 but this does not:
 {code}
 a = load 'thing' as (x:int);
 b = join a by x, a by x;
 {code}



[jira] [Assigned] (PIG-2641) Create toJSON function for all complex types: tuples, bags and maps

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-2641:
---

Assignee: Russell Jurney

Hi Russell,

Is this going to be done in the next couple of weeks? If not, should we move it 
to 0.12?

 Create toJSON function for all complex types: tuples, bags and maps
 ---

 Key: PIG-2641
 URL: https://issues.apache.org/jira/browse/PIG-2641
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.11, 0.10.1
 Environment: Foggy. Damn foggy.
Reporter: Russell Jurney
Assignee: Russell Jurney
  Labels: chararray, fun, happy, input, json, output, pants, pig, 
 piggybank, string, wonderdog
 Fix For: 0.11, 0.10.1

   Original Estimate: 96h
  Remaining Estimate: 96h

 It is a travesty that there are no UDFs in Piggybank that, given an 
 arbitrary Pig datatype, return a JSON string representing it. I intend to fix 
 this problem.
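A recursive walk over nested values is one straightforward way to turn an arbitrary complex value into JSON. The sketch below is hypothetical and not a Piggybank API: ToJsonSketch is an invented name, java.util.List stands in for both tuples and bags, and java.util.Map stands in for Pig maps. A real UDF would take Pig's Tuple and DataBag types and would have to decide, for example, whether tuples become JSON arrays or objects keyed by schema field names.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a Piggybank UDF: List stands in for Pig tuples and bags,
// Map for Pig maps; anything else is written as a number, boolean, or string.
final class ToJsonSketch {
    private ToJsonSketch() {}

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");
        } else if (value instanceof Number || value instanceof Boolean) {
            // NaN and Infinity would produce invalid JSON; not handled in this sketch.
            sb.append(value);
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                // Pig map keys are chararrays, so keys are always written as JSON strings.
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                write(item, sb);
            }
            sb.append(']');
        } else {
            writeString(value.toString(), sb);
        }
    }

    // Escapes quotes, backslashes and control characters as JSON requires.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }
}
```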



[jira] [Updated] (PIG-2591) Unit tests should not write to /tmp but respect java.io.tmpdir

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2591:


Fix Version/s: (was: 0.11)
   0.12

Moving to 0.12 since no work has been done and the ticket is unassigned.

 Unit tests should not write to /tmp but respect java.io.tmpdir
 --

 Key: PIG-2591
 URL: https://issues.apache.org/jira/browse/PIG-2591
 Project: Pig
  Issue Type: Bug
  Components: tools
Reporter: Thomas Weise
 Fix For: 0.12


 Several tests use /tmp but should derive the temporary file location from 
 java.io.tmpdir to avoid side effects (java.io.tmpdir is already set to a 
 test-run-specific location in build.xml).
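In Java, a test can get a run-specific scratch directory without naming /tmp at all by resolving everything under System.getProperty("java.io.tmpdir"). The helper below is a minimal sketch, not existing Pig test code: the class TestTmpDirs and its methods are hypothetical, and it needs Java 7 or later for java.nio.file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical test helper, not existing Pig code. It never hard-codes /tmp: every
// location is resolved under java.io.tmpdir, which build.xml sets per test run.
final class TestTmpDirs {
    private TestTmpDirs() {}

    /** The JVM's temp directory, i.e. the value passed via -Djava.io.tmpdir=... */
    static Path baseTmpDir() {
        return Paths.get(System.getProperty("java.io.tmpdir"));
    }

    /** Creates a uniquely named new directory directly under java.io.tmpdir. */
    static Path newScratchDir(String prefix) throws IOException {
        return Files.createTempDirectory(baseTmpDir(), prefix);
    }
}
```

Passing the base directory explicitly, rather than relying on the no-directory overload, makes the dependency on java.io.tmpdir visible at the call site.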



[jira] [Commented] (PIG-2881) Add SUBTRACT eval function

2012-10-29 Thread Joel Costigliola (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486519#comment-13486519
 ] 

Joel Costigliola commented on PIG-2881:
---

No, I did not set umask 0022 because the contributor guide 
(https://cwiki.apache.org/PIG/howtocontribute.html) does not mention it.

TestStore failures:

{noformat}
Testcase: testStoreRemoteRel took 0.028 sec
FAILED
expected:... but was:hdfs://joe-desktop:43074...

...

Testcase: testStoreRemoteRelScheme took 0.04 sec
FAILED
expected:... but was:hdfs://joe-desktop:43074...
{noformat}


TestLoad failures:

{noformat}
Testcase: testLoadRemoteRel took 0.333 sec
FAILED
expected:[hdfs://joe-desktop:49682]/tmp/test but was:[]/tmp/test
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682]/tmp/test but was:[]/tmp/test
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testLoadRemoteRel(TestLoad.java:122)

Testcase: testLoadRemoteAbs took 0.074 sec
Testcase: testLoadRemoteRelScheme took 0.03 sec
FAILED
expected:[hdfs://joe-desktop:49682]/tmp/test but was:[]/tmp/test
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682]/tmp/test but was:[]/tmp/test
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testLoadRemoteRelScheme(TestLoad.java:139)

Testcase: testLoadRemoteAbsScheme took 27.36 sec
Testcase: testLoadRemoteAbsAuth took 0.083 sec
FAILED
expected:[hdfs://joe-desktop:49682]/test but was:[]/test
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682]/test but was:[]/test
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testLoadRemoteAbsAuth(TestLoad.java:158)

Testcase: testLoadRemoteNormalize took 0.041 sec
Testcase: testGlobChars took 0.035 sec
FAILED
expected:[hdfs://joe-desktop:49682]/tmp/t?s* but was:[]/tmp/t?s*
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682]/tmp/t?s* but was:[]/tmp/t?s*
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testGlobChars(TestLoad.java:174)

Testcase: testCommaSeparatedString took 0.031 sec
FAILED
expected:[hdfs://joe-desktop:49682/tmp/usr/pig/a,hdfs://joe-desktop:49682]/tmp/usr/pig/b
 but was:[/tmp/usr/pig/a,]/tmp/usr/pig/b
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682/tmp/usr/pig/a,hdfs://joe-desktop:49682]/tmp/usr/pig/b
 but was:[/tmp/usr/pig/a,]/tmp/usr/pig/b
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testCommaSeparatedString(TestLoad.java:182)

Testcase: testCommaSeparatedString2 took 0.045 sec
FAILED
expected:[hdfs://joe-desktop:49682/tmp/t?s*,hdfs://joe-desktop:49682]/tmp/test
 but was:[/tmp/t?s*,]/tmp/test
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682/tmp/t?s*,hdfs://joe-desktop:49682]/tmp/test
 but was:[/tmp/t?s*,]/tmp/test
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testCommaSeparatedString2(TestLoad.java:190)

Testcase: testCommaSeparatedString3 took 23.653 sec
Testcase: testCommaSeparatedString4 took 0.094 sec
FAILED
expected:[hdfs://joe-desktop:49682/tmp/usr/pig/{a,c},hdfs://joe-desktop:49682]/tmp/usr/pig/b
 but was:[/tmp/usr/pig/{a,c},]/tmp/usr/pig/b
junit.framework.AssertionFailedError: 
expected:[hdfs://joe-desktop:49682/tmp/usr/pig/{a,c},hdfs://joe-desktop:49682]/tmp/usr/pig/b
 but was:[/tmp/usr/pig/{a,c},]/tmp/usr/pig/b
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at org.apache.pig.test.TestLoad.testCommaSeparatedString4(TestLoad.java:218)

Testcase: testCommaSeparatedString5 took 0.035 sec
FAILED
expected:/usr/pig/{a,c},[hdfs://joe-desktop:49682]/tmp/usr/pig/b but 
was:/usr/pig/{a,c},[]/tmp/usr/pig/b
junit.framework.AssertionFailedError: 
expected:/usr/pig/{a,c},[hdfs://joe-desktop:49682]/tmp/usr/pig/b but 
was:/usr/pig/{a,c},[]/tmp/usr/pig/b
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:332)
at org.apache.pig.test.TestLoad.checkLoadPath(TestLoad.java:300)
at 

[jira] [Commented] (PIG-1919) order-by on bag gives error only at runtime

2012-10-29 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486524#comment-13486524
 ] 

Olga Natkovich commented on PIG-1919:
-

Jonathan, should this be assigned to you? Is this going to be finished for 0.11, 
or should it be moved to 0.12?

 order-by on bag gives error only at runtime
 ---

 Key: PIG-1919
 URL: https://issues.apache.org/jira/browse/PIG-1919
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Thejas M Nair
 Fix For: 0.11, 0.10.1

 Attachments: PIG-1919-0.patch, PIG-1919-1.patch, PIG-1919-1.patch


 Order-by on a bag or tuple should give an error at query compile time, instead 
 of giving an error at runtime.
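A minimal sketch, in Java, of the kind of front-end check this issue asks for. It rejects bag and tuple sort keys while the script is being validated, before any job runs. The `OrderByKeyCheck` class, its `FieldType` enum, and the use of `IllegalArgumentException` are hypothetical stand-ins. They do not reflect Pig's actual `DataType` constants or its `FrontendException` handling.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: validate ORDER BY key types at compile time instead of failing at runtime.
// FieldType is an illustrative stand-in for Pig's real type constants.
final class OrderByKeyCheck {
    enum FieldType { INTEGER, LONG, FLOAT, DOUBLE, CHARARRAY, BYTEARRAY, TUPLE, BAG }

    // Per the issue, bags and tuples should be rejected as sort keys.
    private static final Set<FieldType> UNSORTABLE = EnumSet.of(FieldType.BAG, FieldType.TUPLE);

    private OrderByKeyCheck() {}

    /** Throws if any sort key (alias -> inferred type) cannot be ordered. */
    static void validate(Map<String, FieldType> sortKeys) {
        for (Map.Entry<String, FieldType> key : sortKeys.entrySet()) {
            if (UNSORTABLE.contains(key.getValue())) {
                throw new IllegalArgumentException("ORDER BY key '" + key.getKey()
                        + "' has type " + key.getValue() + ", which cannot be sorted");
            }
        }
    }
}
```

Running a check like this while the query is parsed would surface the error before execution, which is the behavior the issue asks for.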



[jira] [Commented] (PIG-2881) Add SUBTRACT eval function

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486525#comment-13486525
 ] 

Cheolsoo Park commented on PIG-2881:


Yes, please attach the logs. Thanks!

 Add SUBTRACT eval function
 --

 Key: PIG-2881
 URL: https://issues.apache.org/jira/browse/PIG-2881
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Joel Costigliola
Priority: Minor
 Attachments: Subtract.java, SubtractTest.java


 Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements of 
 bag2 from bag1.
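DIFF returns the tuples found in one bag but not the other, in either direction. SUBTRACT is one-directional. Below is a minimal Java sketch of those semantics, using `List`s as stand-ins for Pig bags. It assumes that every element of bag1 appearing anywhere in bag2 is dropped. The attached Subtract.java may handle duplicates differently, for example by removing one occurrence per match. `SubtractSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of SUBTRACT(bag1, bag2) using Lists as stand-ins for Pig bags.
// Assumption: set-difference style, i.e. any bag1 element present in bag2 is removed.
final class SubtractSketch {
    private SubtractSketch() {}

    static <T> List<T> subtract(List<T> bag1, List<T> bag2) {
        Set<T> toRemove = new HashSet<>(bag2); // constant-time membership checks on average
        List<T> result = new ArrayList<>();
        for (T item : bag1) {
            if (!toRemove.contains(item)) {
                result.add(item); // keeps bag1's order and any duplicates not present in bag2
            }
        }
        return result;
    }
}
```

Building a `HashSet` from bag2 keeps each lookup cheap but holds all of bag2 in memory. A real UDF over very large bags might need a different strategy.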
   



[jira] [Commented] (PIG-2423) document use case where co-group is better choice than join

2012-10-29 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2423?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486527#comment-13486527
 ] 

Olga Natkovich commented on PIG-2423:
-

Thejas, should this be assigned to you? Is this going to go into 0.11 or 0.12?

 document use case where co-group is better choice than join 
 

 Key: PIG-2423
 URL: https://issues.apache.org/jira/browse/PIG-2423
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Reporter: Thejas M Nair
 Fix For: 0.11


 Optimization rules 2 and 3 suggested in 
 https://issues.apache.org/jira/secure/attachment/12506841/pig_tpch.ppt 
 (PIG-2397) recommend the use of co-group instead of join in certain cases. 
 These should be documented on the Pig performance page.



[jira] [Updated] (PIG-2595) BinCond only works inside parentheses

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2595:


Fix Version/s: (was: 0.11)
   0.12

Moving to 0.12 since no work has been done and the ticket is unassigned.

 BinCond only works inside parentheses
 -

 Key: PIG-2595
 URL: https://issues.apache.org/jira/browse/PIG-2595
 Project: Pig
  Issue Type: Bug
Reporter: Daniel Dai
 Fix For: 0.12


 Not sure if we have had a JIRA for this before. This script does not work:
 {code}
 a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() 
 as (name, age:int, gpa:double, instate:chararray);
 b = foreach a generate name, instate=='true'?gpa:gpa+1;
 dump b;
 {code}
 If we put the bincond into parentheses, it works:
 {code}
 a = load '/user/pig/tests/data/singlefile/studenttab10k' using PigStorage() 
 as (name, age:int, gpa:double, instate:chararray);
 b = foreach a generate name, (instate=='true'?gpa:gpa+1);
 dump b;
 {code}
 Exception:
 ERROR 1200: file 40.pig, line 2, column 36  mismatched input '==' expecting SEMI_COLON
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during parsing. file 40.pig, line 2, column 36  mismatched input '==' expecting SEMI_COLON
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1598)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1541)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
 at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945)
 at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
 at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
 at org.apache.pig.Main.run(Main.java:599)
 at org.apache.pig.Main.main(Main.java:153)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
 Caused by: Failed to parse: file 40.pig, line 2, column 36  mismatched input '==' expecting SEMI_COLON
 at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:226)
 at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:168)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1590)
 ... 14 more



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486529#comment-13486529
 ] 

Jonathan Coveney commented on PIG-2582:
---

As long as the instance variables are private, why can't we get rid of the 
mBytes variable?

 Store size in bytes (not mbytes) in ResourceStatistics
 --

 Key: PIG-2582
 URL: https://issues.apache.org/jira/browse/PIG-2582
 Project: Pig
  Issue Type: Bug
Reporter: Travis Crawford
Assignee: Prashant Kommireddi
Priority: Minor
 Attachments: PIG-2582_1.patch, PIG-2582.patch


 In 
 [ResourceStatistics.java|http://svn.apache.org/viewvc/pig/trunk/src/org/apache/pig/ResourceStatistics.java?view=markup]
  we see that mBytes is public and has a public getter/setter:
 {code}
 public Long mBytes; // size in megabytes

 public Long getmBytes() {
     return mBytes;
 }

 public ResourceStatistics setmBytes(Long mBytes) {
     this.mBytes = mBytes;
     return this;
 }
 {code}
 Sizes are typically stored in bytes, with convenience functions to return 
 them in other units. If mBytes can be made private without breaking existing 
 callers, it might be worth storing the size in bytes instead.



[jira] [Commented] (PIG-2997) Provide a convenience constructor on PigServer that accepts Configuration

2012-10-29 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486532#comment-13486532
 ] 

Rohini Palaniswamy commented on PIG-2997:
-

Prashant,
   I did not mean that you need to write a unit test for this. It would be 
overkill :). Sorry for not being clear. There are already unit tests which use 
ConfigurationUtil.toProperties() - TestHBaseStorage.beforeTest() and 
TestFRJoin.FRJoin.setUpHashTable(). It would be good if you could change those 
existing tests to use your new API instead of adding a new test.

 Provide a convenience constructor on PigServer that accepts Configuration
 -

 Key: PIG-2997
 URL: https://issues.apache.org/jira/browse/PIG-2997
 Project: Pig
  Issue Type: Improvement
Reporter: Prashant Kommireddi
Assignee: Prashant Kommireddi
 Fix For: 0.11

 Attachments: PIG-2997_1.patch, PIG-2997.patch


 PigServer currently has a Properties-based constructor. Hadoop in general deals 
 with Configuration, and it would be good to have a PigServer constructor that 
 accepts one. With this, users do not have to worry about creating a 
 Properties object out of the conf and can simply invoke the new constructor.
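Internally, such a constructor mostly needs to copy the configuration's entries into a `Properties` object. That is roughly what callers do today with `ConfigurationUtil.toProperties()`. Below is a minimal sketch of that copy. It assumes only that the configuration can be iterated as key/value pairs, which holds because Hadoop's `Configuration` implements `Iterable<Map.Entry<String, String>>`. The `ConfToProps` class is hypothetical, and a plain `Map`'s entry set stands in for a real `Configuration`.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: copies every key/value pair of a Hadoop-style configuration into Properties.
// A Hadoop Configuration could be passed directly, since it is Iterable<Map.Entry<String, String>>.
final class ConfToProps {
    private ConfToProps() {}

    static Properties toProperties(Iterable<Map.Entry<String, String>> conf) {
        Properties props = new Properties();
        for (Map.Entry<String, String> e : conf) {
            props.setProperty(e.getKey(), e.getValue());
        }
        return props;
    }
}
```

A `Configuration`-accepting constructor could then delegate to the existing `Properties`-based one. The exact signature is up to the patch.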



[jira] [Commented] (PIG-2997) Provide a convenience constructor on PigServer that accepts Configuration

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486536#comment-13486536
 ] 

Prashant Kommireddi commented on PIG-2997:
--

I knew what you meant but wanted to keep my changes isolated. It wasn't a huge 
deal to add a 10-line test case, but I will change the above-mentioned tests to 
reuse this.



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486543#comment-13486543
 ] 

Prashant Kommireddi commented on PIG-2582:
--

What would we have the deprecated mBytes setter set the value on?



[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486552#comment-13486552
 ] 

Jonathan Coveney commented on PIG-2582:
---

Couldn't it just set realBytes = mBytes * 1024 * 1024? There would be no loss 
of precision. And if they happened to call getmBytes, we would just divide (in this 
case, we can even just use shifts).
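Below is a minimal Java sketch of that suggestion. It keeps a single private byte count, and the deprecated megabyte accessors convert by shifting 20 bits, since 1024 * 1024 = 2^20. `ResourceStatisticsSketch`, `getSizeInBytes`, and `setSizeInBytes` are hypothetical names, not the API in the attached patches. Setting megabytes loses no precision, but reading megabytes back truncates to whole megabytes.

```java
// Hypothetical sketch: store the size once, in bytes, and derive megabytes on demand.
class ResourceStatisticsSketch {
    private Long bytes; // canonical size in bytes; null means "unknown"

    public Long getSizeInBytes() {
        return bytes;
    }

    public ResourceStatisticsSketch setSizeInBytes(Long bytes) {
        this.bytes = bytes;
        return this;
    }

    /** @deprecated use {@link #getSizeInBytes()}; truncates to whole megabytes. */
    @Deprecated
    public Long getmBytes() {
        // Right shift by 20 divides by 1024 * 1024 (rounding down for non-negative sizes)
        return bytes == null ? null : bytes >> 20;
    }

    /** @deprecated use {@link #setSizeInBytes(Long)}. */
    @Deprecated
    public ResourceStatisticsSketch setmBytes(Long mBytes) {
        // Left shift by 20 multiplies by 1024 * 1024 without losing precision
        this.bytes = mBytes == null ? null : mBytes << 20;
        return this;
    }
}
```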



[jira] [Updated] (PIG-2881) Add SUBTRACT eval function

2012-10-29 Thread Joel Costigliola (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Costigliola updated PIG-2881:
--

Attachment: TEST-org.apache.pig.test.TestStore.txt
TEST-org.apache.pig.test.TestLoad.txt

 Add SUBTRACT eval function
 --

 Key: PIG-2881
 URL: https://issues.apache.org/jira/browse/PIG-2881
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.10.0
Reporter: Joel Costigliola
Priority: Minor
 Attachments: Subtract.java, SubtractTest.java, 
 TEST-org.apache.pig.test.TestLoad.txt, TEST-org.apache.pig.test.TestStore.txt


 Similar to the DIFF function, but SUBTRACT(bag1, bag2) subtracts the elements of 
 bag2 from bag1.
   



[jira] [Commented] (PIG-1919) order-by on bag gives error only at runtime

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486556#comment-13486556
 ] 

Jonathan Coveney commented on PIG-1919:
---

Move to 0.12, and yeah, I'll assign it to myself. It could go into 0.11 but 
it's not that important, really.

 order-by on bag gives error only at runtime
 ---

 Key: PIG-1919
 URL: https://issues.apache.org/jira/browse/PIG-1919
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Thejas M Nair
 Fix For: 0.12

 Attachments: PIG-1919-0.patch, PIG-1919-1.patch, PIG-1919-1.patch


 Order-by on a bag or tuple should give an error at query compile time, instead 
 of giving an error at runtime.



[jira] [Updated] (PIG-1919) order-by on bag gives error only at runtime

2012-10-29 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-1919:
--

Fix Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12

 order-by on bag gives error only at runtime
 ---

 Key: PIG-1919
 URL: https://issues.apache.org/jira/browse/PIG-1919
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.8.0, 0.9.0
Reporter: Thejas M Nair
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-1919-0.patch, PIG-1919-1.patch, PIG-1919-1.patch


 Order-by on a bag or tuple should give an error at query compile time, instead 
 of giving an error at runtime.



[jira] [Assigned] (PIG-1919) order-by on bag gives error only at runtime

2012-10-29 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney reassigned PIG-1919:
-

Assignee: Jonathan Coveney



[jira] [Commented] (PIG-2881) Add SUBTRACT eval function

2012-10-29 Thread Joel Costigliola (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486557#comment-13486557
 ] 

Joel Costigliola commented on PIG-2881:
---

Logs attached, from running the tests without umask set to any special 
value.





[jira] [Commented] (PIG-2582) Store size in bytes (not mbytes) in ResourceStatistics

2012-10-29 Thread Prashant Kommireddi (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486561#comment-13486561
 ] 

Prashant Kommireddi commented on PIG-2582:
--

:)

If we are OK with abusing this setter for now, we should definitely open a 
JIRA to remove the deprecated method completely in 0.12.



[jira] Subscription: PIG patch available

2012-10-29 Thread jira
Issue Subscription
Filter: PIG patch available (39 issues)

Subscriber: pigdaily

Key Summary
PIG-3013 BinInterSedes improve chararray sort performance
https://issues.apache.org/jira/browse/PIG-3013
PIG-3010 Allow UDF's to flatten themselves
https://issues.apache.org/jira/browse/PIG-3010
PIG-3006 Modernize a chunk of the tests
https://issues.apache.org/jira/browse/PIG-3006
PIG-3001 TestExecutableManager.testAddJobConfToEnv fails randomly
https://issues.apache.org/jira/browse/PIG-3001
PIG-2979 Pig.jar doesn't work with hadoop-2.0.x
https://issues.apache.org/jira/browse/PIG-2979
PIG-2978 TestLoadStoreFuncLifeCycle fails with hadoop-2.0.x
https://issues.apache.org/jira/browse/PIG-2978
PIG-2973 TestStreaming test times out
https://issues.apache.org/jira/browse/PIG-2973
PIG-2968 ColumnMapKeyPrune fails to prune a subtree inside foreach
https://issues.apache.org/jira/browse/PIG-2968
PIG-2960 Increase the timeout for unit test
https://issues.apache.org/jira/browse/PIG-2960
PIG-2959 Add a pig.cmd for Pig to run under Windows
https://issues.apache.org/jira/browse/PIG-2959
PIG-2957 TetsScriptUDF fail due to volume prefix in jar
https://issues.apache.org/jira/browse/PIG-2957
PIG-2956 Invalid cache specification for some streaming statement
https://issues.apache.org/jira/browse/PIG-2956
PIG-2955 Fix bunch of Pig e2e tests on Windows
https://issues.apache.org/jira/browse/PIG-2955
PIG-2954 TestParamSubPreproc still depends on bash to run
https://issues.apache.org/jira/browse/PIG-2954
PIG-2953 which utility does not exist on Windows
https://issues.apache.org/jira/browse/PIG-2953
PIG-2942 DevTests, TestLoad has a false failure on Windows
https://issues.apache.org/jira/browse/PIG-2942
PIG-2937 generated field in nested foreach does not inherit the variable name as the field name
https://issues.apache.org/jira/browse/PIG-2937
PIG-2898 Parallel execution of e2e tests
https://issues.apache.org/jira/browse/PIG-2898
PIG-2881 Add SUBTRACT eval function
https://issues.apache.org/jira/browse/PIG-2881
PIG-2873 Converting bin/pig shell script to python
https://issues.apache.org/jira/browse/PIG-2873
PIG-2834 MultiStorage requires unused constructor argument
https://issues.apache.org/jira/browse/PIG-2834
PIG-2824 Pushing checking number of fields into LoadFunc
https://issues.apache.org/jira/browse/PIG-2824
PIG-2801 grunt sh command should invoke the shell implicitly instead of calling exec directly with the command tokens
https://issues.apache.org/jira/browse/PIG-2801
PIG-2799 Update pig streaming interface to run correctly on Windows without Cygwin
https://issues.apache.org/jira/browse/PIG-2799
PIG-2798 pig streaming tests assume interpreters are auto-resolved
https://issues.apache.org/jira/browse/PIG-2798
PIG-2796 Local temporary paths are not always valid HDFS path names.
https://issues.apache.org/jira/browse/PIG-2796
PIG-2795 Fix test cases that generate pig scripts with load  + pathStr to encode \ in the path
https://issues.apache.org/jira/browse/PIG-2795
PIG-2661 Pig uses an extra job for loading data in Pigmix L9
https://issues.apache.org/jira/browse/PIG-2661
PIG-2657 Print warning if using wrong jython version
https://issues.apache.org/jira/browse/PIG-2657
PIG-2507 Semicolon in paramenters for UDF results in parsing error
https://issues.apache.org/jira/browse/PIG-2507
PIG-2495 Using merge JOIN from a HBaseStorage produces an error
https://issues.apache.org/jira/browse/PIG-2495
PIG-2433 Jython import module not working if module path is in classpath
https://issues.apache.org/jira/browse/PIG-2433
PIG-2417 Streaming UDFs -  allow users to easily write UDFs in scripting languages with no JVM implementation.
https://issues.apache.org/jira/browse/PIG-2417
PIG-2405 svn tags/release-0.9.1: some unit test case failed with open JDK
https://issues.apache.org/jira/browse/PIG-2405
PIG-2362 Rework Ant build.xml to use macrodef instead of antcall
https://issues.apache.org/jira/browse/PIG-2362
PIG-2312 NPE when relation and column share the same name and used in Nested Foreach
https://issues.apache.org/jira/browse/PIG-2312
PIG-1942 script UDF (jython) should utilize the intended output schema to more directly convert Py objects to Pig objects
https://issues.apache.org/jira/browse/PIG-1942
PIG-1431 Current DateTime UDFs: ISONOW(), UNIXNOW()
https://issues.apache.org/jira/browse/PIG-1431
PIG-1237 Piggybank MutliStorage - specify field to write in output
 

[jira] [Updated] (PIG-3006) Modernize a chunk of the tests

2012-10-29 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-3006:
--

Attachment: PIG-3006-2.patch

This patch takes into account Cheolsoo's RB comments. It ALSO includes 
whitespace changes. We will see how that goes!

 Modernize a chunk of the tests
 --

 Key: PIG-3006
 URL: https://issues.apache.org/jira/browse/PIG-3006
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3006-0.patch, PIG-3006-1.patch, PIG-3006-2.patch


 A lot of the tests use antiquated patterns. My goal was to refactor them in a 
 couple of ways:
 - get rid of the annotation specifying JUnit 4. All should use JUnit 4 
 (question: where is the JUnit 3 dependency even being pulled in?)
 - Nothing should extend TestCase. Everything should be annotation driven.
 - Properly use asserts. There was a lot of assertTrue(null==thing), so I 
 replaced it with assertNull(thing), and so on.
 - Get rid of MiniCluster use in a handful of cases.
 I've run every test and they pass, EXCEPT TestLargeFile which is failing on 
 trunk anyway.



Re: Review Request: Modernize a chunk of the tests

2012-10-29 Thread Jonathan Coveney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7734/
---

(Updated Oct. 30, 2012, 1:10 a.m.)


Review request for pig and Julien Le Dem.


Changes
---

I have taken the comments into account, and this time I didn't use -w, so the diff 
includes whitespace changes. We'll see how that goes...


Description
---

A lot of the tests use antiquated patterns. My goal was to refactor them in a 
couple of ways:

- get rid of the annotation specifying JUnit 4. All should use JUnit 4 
(question: where is the JUnit 3 dependency even being pulled in?)
- Nothing should extend TestCase. Everything should be annotation driven.
- Properly use asserts. There was a lot of assertTrue(null==thing), so I 
replaced it with assertNull(thing), and so on.
- Get rid of MiniCluster use in a handful of cases.


This addresses bug PIG-3006.
https://issues.apache.org/jira/browse/PIG-3006


Diffs (updated)
-

  test/org/apache/pig/test/PigExecTestCase.java 32a502c 
  test/org/apache/pig/test/TestAlgebraicEval.java 0bbd83d 
  test/org/apache/pig/test/TestAlgebraicEvalLocal.java df4b76a 
  test/org/apache/pig/test/TestBagFormat.java 09298d4 
  test/org/apache/pig/test/TestBatchAliases.java 6e952c7 
  test/org/apache/pig/test/TestCompressedFiles.java d54ffaa 
  test/org/apache/pig/test/TestConversions.java 152ad5c 
  test/org/apache/pig/test/TestDeleteOnFail.java 7070285 
  test/org/apache/pig/test/TestFilterOpNumeric.java 730e808 
  test/org/apache/pig/test/TestFilterOpString.java b65965f 
  test/org/apache/pig/test/TestFilterSimplification.java ade97b6 
  test/org/apache/pig/test/TestForEachNestedPlanLocal.java a78568e 
  test/org/apache/pig/test/TestFuncSpec.java bc7144c 
  test/org/apache/pig/test/TestInfixArithmetic.java cdf6948 
  test/org/apache/pig/test/TestInputOutputFileValidator.java 67b2873 
  test/org/apache/pig/test/TestInputOutputMiniClusterFileValidator.java caa62cb 
  test/org/apache/pig/test/TestInstantiateFunc.java 31c37b1 
  test/org/apache/pig/test/TestJoin.java a4f3aff 
  test/org/apache/pig/test/TestKeyTypeDiscoveryVisitor.java 2bbeca1 
  test/org/apache/pig/test/TestLargeFile.java 79590ce 
  test/org/apache/pig/test/TestLocal.java 5680196 
  test/org/apache/pig/test/TestLocal2.java eea7b2f 
  test/org/apache/pig/test/TestMapReduce2.java 30574db 
  test/org/apache/pig/test/TestNewPlanColumnPrune.java bed006e 
  test/org/apache/pig/test/TestNewPlanListener.java 7701182 
  test/org/apache/pig/test/TestNewPlanOperatorPlan.java 1f8fe56 
  test/org/apache/pig/test/TestNewPlanPruneMapKeys.java d1cce22 
  test/org/apache/pig/test/TestNewPlanRule.java 4a7ff0a 
  test/org/apache/pig/test/TestNullConstant.java 3ae25d9 
  test/org/apache/pig/test/TestOrderBy2.java 4ee4f26 
  test/org/apache/pig/test/TestOrderBy3.java 2067d7a 
  test/org/apache/pig/test/TestPOBinCond.java 20bd734 
  test/org/apache/pig/test/TestPODistinct.java 60f9d73 
  test/org/apache/pig/test/TestPOGenerate.java e0fd796 
  test/org/apache/pig/test/TestPOMapLookUp.java 3ed0900 
  test/org/apache/pig/test/TestPONegative.java 220c409 
  test/org/apache/pig/test/TestPORegexp.java d6e15ac 
  test/org/apache/pig/test/TestPOSort.java 600ee0c 
  test/org/apache/pig/test/TestPOUserFunc.java 3a90d6c 
  test/org/apache/pig/test/TestParamSubPreproc.java 1a52691 
  test/org/apache/pig/test/TestParser.java 17dc42a 
  test/org/apache/pig/test/TestPi.java f0883d1 
  test/org/apache/pig/test/TestPigProgressReporting.java e4f76ec 
  test/org/apache/pig/test/TestPigScriptParser.java 2acb1a8 
  test/org/apache/pig/test/TestPigSplit.java af70e9d 
  test/org/apache/pig/test/TestPinOptions.java a730ce7 
  test/org/apache/pig/test/TestPruneColumn.java 03139a5 
  test/org/apache/pig/test/TestRank1.java fbc6a7d 
  test/org/apache/pig/test/TestRank2.java d4daf8b 
  test/org/apache/pig/test/TestRank3.java 6dd2624 
  test/org/apache/pig/test/TestRelationToExprProject.java 1411451 
  test/org/apache/pig/test/TestSchemaUtil.java e1d1133 
  test/org/apache/pig/test/TestStore.java 7f1c77b 
  test/org/apache/pig/test/TestStoreOld.java 37ad3bf 
  test/org/apache/pig/test/TestStreamingLocal.java b745074 
  test/org/apache/pig/test/TestToolsPigServer.java e021b8c 
  test/org/apache/pig/test/TestUDF.java f1b10f8 
  test/org/apache/pig/test/TestUDFGroovy.java e5b8c8e 
  test/org/apache/pig/test/TestUDFWithoutParameter.java 2527afa 
  test/org/apache/pig/test/TestUTF8.java 42aab25 

Diff: https://reviews.apache.org/r/7734/diff/


Testing
---

I ran every affected test and they pass, except for TestLargeFile, which is 
failing independently (my changes to TestLargeFile were small and cosmetic and 
should not affect whether it passes).


Thanks,

Jonathan Coveney



[jira] [Created] (PIG-3016) Modernize more tests

2012-10-29 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-3016:
-

 Summary: Modernize more tests
 Key: PIG-3016
 URL: https://issues.apache.org/jira/browse/PIG-3016
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12


This takes the same idea as PIG-3006 and applies it to the remaining tests. 
Note that the one thing I did not do was get rid of MiniCluster. That can be 
for another JIRA. All of this refactoring is effort enough for the time being :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-3016) Modernize more tests

2012-10-29 Thread Jonathan Coveney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Coveney updated PIG-3016:
--

Attachment: PIG-3016-0.patch

Here is a patch! It has whitespace changes. Will attach RB shortly.

 Modernize more tests
 

 Key: PIG-3016
 URL: https://issues.apache.org/jira/browse/PIG-3016
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3016-0.patch


 This takes the same idea as PIG-3006 and applies it to the remaining tests. 
 Note that the one thing I did not do was get rid of MiniCluster. That can be 
 for another JIRA. All of this refactoring is effort enough for the time being :)



Review Request: Modernize more tests

2012-10-29 Thread Jonathan Coveney

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7771/
---

Review request for pig, Julien Le Dem, Cheolsoo Park, and Gianmarco De 
Francisci Morales.


Description
---

I refactored the remaining tests. I did not remove MiniCluster usage.


This addresses bug PIG-3016.
https://issues.apache.org/jira/browse/PIG-3016


Diffs
-

  test/org/apache/pig/test/PigExecTestCase.java 32a502c 
  test/org/apache/pig/test/TestAccumulator.java 73ba0ce 
  test/org/apache/pig/test/TestAdd.java cdeabb3 
  test/org/apache/pig/test/TestAlgebraicEval.java 0bbd83d 
  test/org/apache/pig/test/TestAlgebraicEvalLocal.java df4b76a 
  test/org/apache/pig/test/TestBagFormat.java 09298d4 
  test/org/apache/pig/test/TestBatchAliases.java 6e952c7 
  test/org/apache/pig/test/TestBestFitCast.java a1bf5c5 
  test/org/apache/pig/test/TestBloom.java e9fe355 
  test/org/apache/pig/test/TestBoolean.java b3f9e3e 
  test/org/apache/pig/test/TestCharArrayToNumeric.java bd0880d 
  test/org/apache/pig/test/TestCmdLineParser.java eb91cae 
  test/org/apache/pig/test/TestCombiner.java af7b678 
  test/org/apache/pig/test/TestCompressedFiles.java d54ffaa 
  test/org/apache/pig/test/TestConstExpr.java a7a0e3b 
  test/org/apache/pig/test/TestConversions.java 152ad5c 
  test/org/apache/pig/test/TestDataModel.java d04658c 
  test/org/apache/pig/test/TestDeleteOnFail.java 7070285 
  test/org/apache/pig/test/TestDivide.java be7c6a3 
  test/org/apache/pig/test/TestEqualTo.java edefa35 
  test/org/apache/pig/test/TestFRJoin.java 829b8ae 
  test/org/apache/pig/test/TestFilter.java e954ea3 
  test/org/apache/pig/test/TestFilterOpNumeric.java 730e808 
  test/org/apache/pig/test/TestFilterOpString.java b65965f 
  test/org/apache/pig/test/TestFilterSimplification.java ade97b6 
  test/org/apache/pig/test/TestFilterUDF.java ffc589d 
  test/org/apache/pig/test/TestFinish.java d213a1d 
  test/org/apache/pig/test/TestForEach.java e47646f 
  test/org/apache/pig/test/TestForEachNestedPlanLocal.java a78568e 
  test/org/apache/pig/test/TestFuncSpec.java bc7144c 
  test/org/apache/pig/test/TestGTOrEqual.java a35bc74 
  test/org/apache/pig/test/TestGreaterThan.java d522975 
  test/org/apache/pig/test/TestInfixArithmetic.java cdf6948 
  test/org/apache/pig/test/TestInputOutputFileValidator.java 67b2873 
  test/org/apache/pig/test/TestInputOutputMiniClusterFileValidator.java caa62cb 
  test/org/apache/pig/test/TestInstantiateFunc.java 31c37b1 
  test/org/apache/pig/test/TestJoin.java a4f3aff 
  test/org/apache/pig/test/TestJoinSmoke.java afe7e60 
  test/org/apache/pig/test/TestKeyTypeDiscoveryVisitor.java 2bbeca1 
  test/org/apache/pig/test/TestLOLoadDeterminedSchema.java 2177b58 
  test/org/apache/pig/test/TestLTOrEqual.java 6b7c795 
  test/org/apache/pig/test/TestLargeFile.java 79590ce 
  test/org/apache/pig/test/TestLessThan.java 7df1dc1 
  test/org/apache/pig/test/TestLocal.java 5680196 
  test/org/apache/pig/test/TestLocal2.java eea7b2f 
  test/org/apache/pig/test/TestMapReduce2.java 30574db 
  test/org/apache/pig/test/TestMod.java 523238c 
  test/org/apache/pig/test/TestMultiply.java 7cb1d41 
  test/org/apache/pig/test/TestNewPlanColumnPrune.java bed006e 
  test/org/apache/pig/test/TestNewPlanListener.java 7701182 
  test/org/apache/pig/test/TestNewPlanLogicalOptimizer.java 5eb841d 
  test/org/apache/pig/test/TestNewPlanOperatorPlan.java 1f8fe56 
  test/org/apache/pig/test/TestNewPlanPruneMapKeys.java d1cce22 
  test/org/apache/pig/test/TestNewPlanRule.java 4a7ff0a 
  test/org/apache/pig/test/TestNotEqualTo.java e6c3bca 
  test/org/apache/pig/test/TestNull.java 64deaaf 
  test/org/apache/pig/test/TestNullConstant.java 3ae25d9 
  test/org/apache/pig/test/TestOrderBy2.java 4ee4f26 
  test/org/apache/pig/test/TestOrderBy3.java 2067d7a 
  test/org/apache/pig/test/TestPOBinCond.java 20bd734 
  test/org/apache/pig/test/TestPODistinct.java 60f9d73 
  test/org/apache/pig/test/TestPOGenerate.java e0fd796 
  test/org/apache/pig/test/TestPOMapLookUp.java 3ed0900 
  test/org/apache/pig/test/TestPONegative.java 220c409 
  test/org/apache/pig/test/TestPORegexp.java d6e15ac 
  test/org/apache/pig/test/TestPOSort.java 600ee0c 
  test/org/apache/pig/test/TestPOUserFunc.java 3a90d6c 
  test/org/apache/pig/test/TestPackage.java 7bc6c5c 
  test/org/apache/pig/test/TestParamSubPreproc.java 1a52691 
  test/org/apache/pig/test/TestParser.java 17dc42a 
  test/org/apache/pig/test/TestPhyOp.java 0c9ecb0 
  test/org/apache/pig/test/TestPi.java f0883d1 
  test/org/apache/pig/test/TestPigContext.java ca0ef84 
  test/org/apache/pig/test/TestPigProgressReporting.java e4f76ec 
  test/org/apache/pig/test/TestPigScriptParser.java 2acb1a8 
  test/org/apache/pig/test/TestPigSplit.java af70e9d 
  test/org/apache/pig/test/TestPinOptions.java a730ce7 
  test/org/apache/pig/test/TestPlanGeneration.java c1bc806 
  

[jira] [Commented] (PIG-3016) Modernize more tests

2012-10-29 Thread Jonathan Coveney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486584#comment-13486584
 ] 

Jonathan Coveney commented on PIG-3016:
---

RB link. Keep in mind that this diff includes all of the changes from PIG-3006 
(maybe I should have based the diff on that patch), so files changed there were 
not necessarily changed by this patch. Hopefully the diff will be cleaner once 
PIG-3006 is committed.

https://reviews.apache.org/r/7771/

 Modernize more tests
 

 Key: PIG-3016
 URL: https://issues.apache.org/jira/browse/PIG-3016
 Project: Pig
  Issue Type: Improvement
Reporter: Jonathan Coveney
Assignee: Jonathan Coveney
 Fix For: 0.12

 Attachments: PIG-3016-0.patch


 This takes the same idea as PIG-3006 and applies it to the remaining tests. 
 Note that the one thing I did not do was get rid of MiniCluster. That can be 
 for another JIRA. All of this refactoring is effort enough for the time being :)



[jira] [Commented] (PIG-2641) Create toJSON function for all complex types: tuples, bags and maps

2012-10-29 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486618#comment-13486618
 ] 

Russell Jurney commented on PIG-2641:
-

Moving to 0.12. If I find the time, I'll finish it and move it back.

 Create toJSON function for all complex types: tuples, bags and maps
 ---

 Key: PIG-2641
 URL: https://issues.apache.org/jira/browse/PIG-2641
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.12
 Environment: Foggy. Damn foggy.
Reporter: Russell Jurney
Assignee: Russell Jurney
  Labels: chararray, fun, happy, input, json, output, pants, pig, 
 piggybank, string, wonderdog
 Fix For: 0.12

   Original Estimate: 96h
  Remaining Estimate: 96h

 It is a travesty that there are no UDFs in Piggybank that, given an 
 arbitrary Pig datatype, return a JSON string of same. I intend to fix this 
 problem.
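A minimal sketch of the requested behavior, assuming plain java.util types as stand-ins for Pig's data model: a List for a tuple, a List of Lists for a bag, and a Map with String keys for a map. The class and method names (ToJsonSketch, toJson) are hypothetical; this is not the piggybank UDF, and a real UDF would walk Pig's Tuple and DataBag interfaces instead of java.util collections.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: plain java.util types stand in for Pig's data model.
// List = tuple, List of Lists = bag, Map with chararray (String) keys = Pig map.
public class ToJsonSketch {

    public static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, sb);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb) {
        if (value == null) {
            sb.append("null");                 // Pig null becomes JSON null
        } else if (value instanceof String) {
            writeString((String) value, sb);   // chararray
        } else if (value instanceof Number || value instanceof Boolean) {
            // int, long, float, double, boolean
            // (NaN/Infinity would not be valid JSON; not handled here)
            sb.append(value);
        } else if (value instanceof Map) {
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                writeString(String.valueOf(e.getKey()), sb);
                sb.append(':');
                write(e.getValue(), sb);       // values may themselves be complex
            }
            sb.append('}');
        } else if (value instanceof List) {
            sb.append('[');
            boolean first = true;
            for (Object item : (List<?>) value) {
                if (!first) sb.append(',');
                first = false;
                write(item, sb);               // nested tuples/bags recurse
            }
            sb.append(']');
        } else {
            writeString(value.toString(), sb); // fallback: unknown types as strings
        }
    }

    // Emits a JSON string literal, escaping quotes, backslashes and control characters.
    private static void writeString(String s, StringBuilder sb) {
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
    }

    public static void main(String[] args) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("k", 1);
        List<?> tuple = Arrays.asList("nyse", 42L, m);
        List<?> bag = Arrays.asList(Arrays.asList("a"), Arrays.asList("b"));
        System.out.println(toJson(tuple)); // ["nyse",42,{"k":1}]
        System.out.println(toJson(bag));   // [["a"],["b"]]
    }
}
```

Recursing on every value keeps arbitrarily nested tuples, bags and maps working with one method; only the leaf types need special handling.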



[jira] [Updated] (PIG-2641) Create toJSON function for all complex types: tuples, bags and maps

2012-10-29 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2641?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-2641:


Affects Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12
Fix Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12

 Create toJSON function for all complex types: tuples, bags and maps
 ---

 Key: PIG-2641
 URL: https://issues.apache.org/jira/browse/PIG-2641
 Project: Pig
  Issue Type: New Feature
  Components: piggybank
Affects Versions: 0.12
 Environment: Foggy. Damn foggy.
Reporter: Russell Jurney
Assignee: Russell Jurney
  Labels: chararray, fun, happy, input, json, output, pants, pig, 
 piggybank, string, wonderdog
 Fix For: 0.12

   Original Estimate: 96h
  Remaining Estimate: 96h

 It is a travesty that there are no UDFs in Piggybank that, given an 
 arbitrary Pig datatype, return a JSON string of same. I intend to fix this 
 problem.



[jira] [Commented] (PIG-2434) investigate 5% slowdown in TPC-H Q6 query in 0.10

2012-10-29 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486632#comment-13486632
 ] 

Olga Natkovich commented on PIG-2434:
-

Thejas, any plan to address this for 0.11?

 investigate 5% slowdown in TPC-H Q6 query in 0.10
 -

 Key: PIG-2434
 URL: https://issues.apache.org/jira/browse/PIG-2434
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0
Reporter: Thejas M Nair
 Fix For: 0.11


 0.10 is about 5% slower than 0.9 on the TPC-H Q6 query, as observed in 
 https://issues.apache.org/jira/browse/PIG-2228?focusedCommentId=13171461page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171461
 This needs to be investigated.



[jira] [Assigned] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-2812:
---

Assignee: Haitao Yao

 Spill InternalCachedBag into only 1 file
 

 Key: PIG-2812
 URL: https://issues.apache.org/jira/browse/PIG-2812
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Haitao Yao
Assignee: Haitao Yao
 Fix For: 0.11

 Attachments: aa.jpg, spill.patch


 I encountered a reducer OOM caused by java.io.DeleteOnExitHook. I found that 
 InternalCachedBag creates a separate tmp file for each spill, and each tmp file 
 is registered to be deleted on exit, so the delete-on-exit hook caused the OOM. 
 Why not just hold the tmp file handle and spill into only one tmp file?
 Too many tmp files may also block the tasktracker startup if they are not 
 cleaned up in time and the tasktracker restarts at that moment.
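The reporter's proposal can be sketched as follows, assuming a bag that spills batches of strings. java.io.DeleteOnExitHook keeps every path passed to File.deleteOnExit() until the JVM exits, so registering one file per spill grows without bound; here a single temp file is created lazily, registered once, and each spill is appended to it as a length-prefixed batch. SingleFileSpiller and its methods are hypothetical and are not taken from InternalCachedBag or the attached spill.patch.

```java
import java.io.*;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Pig's InternalCachedBag: every spill is appended to ONE
// temp file, so File.deleteOnExit() (whose registry only grows until JVM exit)
// is called once per bag instead of once per spill.
public class SingleFileSpiller implements Closeable {
    private File spillFile;          // created lazily on the first spill
    private DataOutputStream out;    // held open across spills
    private int spillCount = 0;

    /** Appends one batch of records to the shared spill file. */
    public void spill(List<String> records) throws IOException {
        if (out == null) {
            spillFile = File.createTempFile("pigbag", ".spill");
            spillFile.deleteOnExit();  // registered exactly once
            out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream(spillFile)));
        }
        out.writeInt(records.size());  // length-prefix each batch
        for (String r : records) {
            out.writeUTF(r);
        }
        out.flush();
        spillCount++;
    }

    /** Reads every spilled record back, in spill order. */
    public List<String> readAll() throws IOException {
        List<String> result = new ArrayList<>();
        if (spillFile == null) return result;
        out.flush();
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(spillFile)))) {
            for (int batch = 0; batch < spillCount; batch++) {
                int n = in.readInt();
                for (int i = 0; i < n; i++) {
                    result.add(in.readUTF());
                }
            }
        }
        return result;
    }

    public int spillCount() { return spillCount; }
    public File spillFile() { return spillFile; }

    /** Closes the stream and deletes the file eagerly instead of waiting for JVM exit. */
    @Override
    public void close() throws IOException {
        if (out != null) out.close();
        if (spillFile != null) spillFile.delete();
    }

    public static void main(String[] args) throws IOException {
        try (SingleFileSpiller s = new SingleFileSpiller()) {
            s.spill(Arrays.asList("a", "b"));
            s.spill(Arrays.asList("c"));
            System.out.println(s.spillCount() + " spills, records " + s.readAll());
            // prints: 2 spills, records [a, b, c]
        }
    }
}
```

Deleting the file in close() also addresses the second concern in the description: the file does not have to survive until the JVM exits.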



[jira] [Commented] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-10-29 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486634#comment-13486634
 ] 

Olga Natkovich commented on PIG-2812:
-

Alan - are you planning to review this one? Do we need to include this in 0.11?

 Spill InternalCachedBag into only 1 file
 

 Key: PIG-2812
 URL: https://issues.apache.org/jira/browse/PIG-2812
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Haitao Yao
Assignee: Haitao Yao
 Fix For: 0.11

 Attachments: aa.jpg, spill.patch


 I encountered a reducer OOM caused by java.io.DeleteOnExitHook. I found that 
 InternalCachedBag creates a separate tmp file for each spill, and each tmp file 
 is registered to be deleted on exit, so the delete-on-exit hook caused the OOM. 
 Why not just hold the tmp file handle and spill into only one tmp file?
 Too many tmp files may also block the tasktracker startup if they are not 
 cleaned up in time and the tasktracker restarts at that moment.



[jira] [Updated] (PIG-2681) TestDriverPig.countStores() does not correctly count the number of stores for pig scripts using variables for the alias

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2681:


Fix Version/s: (was: 0.10.1)
   (was: 0.9.3)
   (was: 0.11)
   0.12

 TestDriverPig.countStores() does not correctly count the number of stores for 
 pig scripts using variables for the alias
 ---

 Key: PIG-2681
 URL: https://issues.apache.org/jira/browse/PIG-2681
 Project: Pig
  Issue Type: Test
  Components: e2e harness
Affects Versions: 0.9.0, 0.9.1, 0.9.2, 0.10.0
Reporter: Araceli Henley
 Fix For: 0.12

 Attachments: PIG-2681.patch


 For Pig macros where the out parameter is referenced in a store statement, 
 TestDriverPig.countStores() does not correctly count the number of stores.
 For example, the store is not counted in:
 define myMacro(in1,in2) returns A {
  A  = load '$in1' using PigStorage('$delimeter') as (intnum1000: int,id: 
 int,intnum5: int,intnum100: int,intnum: int,longnum: long,floatnum: 
 float,doublenum: double);
store $A into '$out';
 }
  countStores() matches with:
  $count += $q[$i] =~ /store\s+[a-zA-Z][a-zA-Z0-9_]*\s+into/i;
 Because the alias starts with the special character $, the regex does not 
 match it, so the store is not counted and the test fails.
 This needs to be changed to:
$count += $q[$i] =~ /store\s+(\$)?[a-zA-Z][a-zA-Z0-9_]*\s+into/i;
 I'll submit a patch shortly.
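To illustrate the difference between the two patterns, here is a hypothetical Java port of the check (CountStoresSketch is not part of the e2e harness; the harness code quoted above is Perl). As with `$count += $q[$i] =~ /.../i`, each line contributes at most one match.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical Java port of the Perl check in TestDriverPig.countStores();
// like `$count += $q[$i] =~ /.../i`, each line adds at most 1.
public class CountStoresSketch {
    // Original pattern: the alias must start with a letter.
    static final Pattern OLD = Pattern.compile(
            "store\\s+[a-zA-Z][a-zA-Z0-9_]*\\s+into", Pattern.CASE_INSENSITIVE);
    // Proposed pattern: an optional leading '$' allows macro parameters such as $A.
    static final Pattern NEW = Pattern.compile(
            "store\\s+(\\$)?[a-zA-Z][a-zA-Z0-9_]*\\s+into", Pattern.CASE_INSENSITIVE);

    static int countStores(List<String> lines, Pattern p) {
        int count = 0;
        for (String line : lines) {
            if (p.matcher(line).find()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> script = Arrays.asList(
                "define myMacro(in1,in2) returns A {",
                "    A = load '$in1';",
                "    store $A into '$out';",
                "}",
                "B = load 'x';",
                "STORE B INTO 'y';");
        System.out.println(countStores(script, OLD)); // 1
        System.out.println(countStores(script, NEW)); // 2
    }
}
```

Only the proposed pattern counts `store $A into '$out';`; the plain alias `B` is matched by both.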



[jira] [Commented] (PIG-2981) add e2e tests for DateTime data type

2012-10-29 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486636#comment-13486636
 ] 

Olga Natkovich commented on PIG-2981:
-

Is anybody planning to add this or should it be moved to 0.12?

 add e2e tests for DateTime data type
 -

 Key: PIG-2981
 URL: https://issues.apache.org/jira/browse/PIG-2981
 Project: Pig
  Issue Type: Test
Reporter: Thejas M Nair
 Fix For: 0.11


 e2e tests for DateTime datatype need to be added.



[jira] [Resolved] (PIG-2974) StreamingLocal_11 e2e test hangs

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-2974.
-

Resolution: Duplicate

 StreamingLocal_11 e2e test hangs
 

 Key: PIG-2974
 URL: https://issues.apache.org/jira/browse/PIG-2974
 Project: Pig
  Issue Type: Sub-task
Affects Versions: 0.11
Reporter: Rohini Palaniswamy
 Fix For: 0.11






[jira] [Updated] (PIG-2630) Issue with setting b = a;

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-2630:


Fix Version/s: (was: 0.10.1)
   (was: 0.11)
   0.12

Moving to 0.12, as no work has been done so far.

 Issue with setting b = a;
 ---

 Key: PIG-2630
 URL: https://issues.apache.org/jira/browse/PIG-2630
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.10.0, 0.11
Reporter: Jonathan Coveney
 Fix For: 0.12


 The following gives an error:
 {code}
 a = load 'thing' as (x:int);
 b = a; c = join a by x, b by x;
 {code}
 Error:
 {code}
 2012-04-03 14:02:47,434 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1200: Pig script failed to parse: 
 line 14, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
 {code}
 No issue with the following, however
 {code}
 a = load 'thing' as (x:int);
 b = foreach a generate *;
 c = join a by x, b by x;
 {code}
 oh and here is the log:
 {code}
 $ cat pig_1333487146863.log
 Pig Stack Trace
 ---
 ERROR 1200: Pig script failed to parse: 
 line 3, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
 Failed to parse: Pig script failed to parse: 
 line 3, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection 
 with nothing to reference!
   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:182)
   at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1566)
   at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1539)
   at org.apache.pig.PigServer.registerQuery(PigServer.java:541)
   at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:945)
   at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:392)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:190)
   at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
   at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
   at org.apache.pig.Main.run(Main.java:535)
   at org.apache.pig.Main.main(Main.java:153)
 Caused by: 
 line 3, column 4 pig script failed to validate: 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2225: Projection with nothing to reference!
   at org.apache.pig.parser.LogicalPlanBuilder.buildJoinOp(LogicalPlanBuilder.java:363)
   at org.apache.pig.parser.LogicalPlanGenerator.join_clause(LogicalPlanGenerator.java:11441)
   at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1491)
   at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:791)
   at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:509)
   at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:384)
   at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:175)
   ... 10 more
 
 {code}



[jira] [Updated] (PIG-3005) TestLargeFile#testOrderBy is failing

2012-10-29 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-3005:


Affects Version/s: (was: 0.12)
   (was: 0.11)
Fix Version/s: (was: 0.11)

 TestLargeFile#testOrderBy is failing
 

 Key: PIG-3005
 URL: https://issues.apache.org/jira/browse/PIG-3005
 Project: Pig
  Issue Type: Sub-task
 Environment: Mac OSX 10.6.8
Reporter: Jonathan Coveney
 Fix For: 0.12


 When run locally, at least, this test is failing for me.
 Has anyone else noticed this failing?



[jira] [Updated] (PIG-2924) PigStats should not be assuming all Storage classes to be file-based storage

2012-10-29 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2924:
---

Attachment: PIG-2924-3.patch

I updated the patch as follows:
{quote}
Having the word Computer in the interface name and configs could cause 
confusion, due to how it's an overloaded term. I don't have any great 
suggestions though. PigStatsOutputSizeReader?
{quote}
Changed to {{PigStatsOutputSizeReader}}.
{quote}
Instead of registering a single new computer it would be ideal if we could 
register a list of computers.
{quote}
Fixed.
{quote}
Each computer could have a boolean supports(POStore poStore) method that 
returns whether this class supports a given POStore. This can often be done by 
inspecting the output path. A default URI-based abstract class could help with 
that part.
{quote}
Each reader implements a {{boolean supports(String uri)}} method. For 
{{FileBasedOutputSizeReader}}, it returns the result of 
{{UriUtil.isHDFSFileOrLocalOrS3N()}}.
{quote}
The computers would then be consulted in order, where the first to support the 
POStore wins.
{quote}
Fixed.
{quote}
If a computer can't determine a size for some reason (i.e., it doesn't support 
it or an exception occurred), it shouldn't return 0. Instead maybe we reserve 
-1 for this case and document it as such.
{quote}
Fixed.

In addition, I replaced {{POStore}} with {{String}}. Please let me know what 
you think.

Thanks!
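The reader contract described in this comment can be sketched as below. Only `supports(String uri)`, the first-match-wins ordering, and the -1 convention come from the comment; the names OutputSizeReader, getOutputSize, LocalFileSizeReader and outputSize are hypothetical, and the local-file scheme check stands in for `UriUtil.isHDFSFileOrLocalOrS3N()` so that the example runs without Hadoop.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the design discussed above; these are NOT Pig's classes.
public class OutputSizeSketch {

    // Mirrors the described PigStatsOutputSizeReader contract; getOutputSize is an assumed name.
    interface OutputSizeReader {
        boolean supports(String uri);   // can this reader handle the location?
        long getOutputSize(String uri); // size in bytes, or -1 if it cannot be determined
    }

    // Stand-in for a file-based reader: handles scheme-less or file: locations on local disk
    // (the patch instead checks HDFS/local/S3N via UriUtil).
    static class LocalFileSizeReader implements OutputSizeReader {
        @Override
        public boolean supports(String uri) {
            try {
                String scheme = URI.create(uri).getScheme();
                return scheme == null || scheme.equals("file");
            } catch (IllegalArgumentException e) {
                return false;           // not a parseable URI
            }
        }

        @Override
        public long getOutputSize(String uri) {
            try {
                URI u = URI.create(uri);
                Path p = (u.getScheme() == null) ? Paths.get(uri) : Paths.get(u);
                return Files.size(p);
            } catch (IOException | RuntimeException e) {
                return -1;              // unknown size is reported as -1, never 0
            }
        }
    }

    // Readers are consulted in order; the first one that supports the URI wins.
    static long outputSize(List<OutputSizeReader> readers, String uri) {
        for (OutputSizeReader r : readers) {
            if (r.supports(uri)) {
                return r.getOutputSize(uri);
            }
        }
        return -1;                      // e.g. hbase://table: no registered reader applies
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("part-", ".out");
        Files.write(tmp, new byte[] {1, 2, 3});
        List<OutputSizeReader> readers =
                Arrays.<OutputSizeReader>asList(new LocalFileSizeReader());
        System.out.println(outputSize(readers, tmp.toString()));   // 3
        System.out.println(outputSize(readers, "hbase://mytable")); // -1
        Files.delete(tmp);
    }
}
```

On a Unix-like file system, the local file reports its size, while an `hbase://` location falls through every reader and yields -1 instead of the FileNotFound or Invalid URI exception described in this issue.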

 PigStats should not be assuming all Storage classes to be file-based storage
 

 Key: PIG-2924
 URL: https://issues.apache.org/jira/browse/PIG-2924
 Project: Pig
  Issue Type: Bug
  Components: tools
Affects Versions: 0.9.2, 0.10.0
Reporter: Harsh J
Assignee: Cheolsoo Park
 Attachments: PIG-2924-2.patch, PIG-2924-3.patch, PIG-2924.patch


 Using PigStatsUtil (as Oozie does) to collect JobStats for jobs that use 
 HBaseStorage blows up when the stats are accumulated.
 This is because JobStats (which aggregates the stats) assumes that all storages 
 are file-based and that it can run listStatus and similar operations on the 
 filename provided by their FileSpec. For HBaseStorage, this is set to the table 
 name, and there is no such file, leading to an exception (FileNotFound or 
 Invalid URI, depending on whether 'tablename' or 'hbase://tablename' is used).



[jira] [Updated] (PIG-2924) PigStats should not be assuming all Storage classes to be file-based storage

2012-10-29 Thread Cheolsoo Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-2924:
---

Status: Patch Available  (was: Open)

 PigStats should not be assuming all Storage classes to be file-based storage
 

 Key: PIG-2924
 URL: https://issues.apache.org/jira/browse/PIG-2924
 Project: Pig
  Issue Type: Bug
  Components: tools
Affects Versions: 0.10.0, 0.9.2
Reporter: Harsh J
Assignee: Cheolsoo Park
 Attachments: PIG-2924-2.patch, PIG-2924-3.patch, PIG-2924.patch


 Using PigStatsUtil (as Oozie does) to collect JobStats for jobs that use 
 HBaseStorage blows up when the stats are accumulated.
 This is because JobStats (which aggregates the stats) assumes that all storages 
 are file-based and that it can run listStatus and similar operations on the 
 filename provided by their FileSpec. For HBaseStorage, this is set to the table 
 name, and there is no such file, leading to an exception (FileNotFound or 
 Invalid URI, depending on whether 'tablename' or 'hbase://tablename' is used).



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-10-29 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13486662#comment-13486662
 ] 

Cheolsoo Park commented on PIG-3015:


Thanks for the link.

You can upload the entire code as a single patch if you prefer. I suggested it 
only because large patches usually take longer to review and commit, but I will 
at least review this one.

 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler

 The current AvroStorage implementation has a lot of issues: it requires old 
 versions of Avro, it copies data much more than needed, and it's verbose and 
 complicated. (One pet peeve of mine is that old versions of Avro don't 
 support Snappy compression.)
 I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
 new implementation is significantly faster, and the code is a lot simpler. 
 Rewriting AvroStorage also enabled me to implement support for Trevni.
 I'm opening this ticket to facilitate discussion while I figure out the best 
 way to contribute the changes back to Apache.
