[jira] [Commented] (PIG-3072) Pig job reporting negative progress

2012-12-04 Thread Rohini Palaniswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509920#comment-13509920
 ] 

Rohini Palaniswamy commented on PIG-3072:
-

Koji,
   Can you use HadoopShims to create the TaskAttemptContext in your test? The 
test fails to compile with H23.

{noformat}
 [javac] /apache/pig/trunk/test/org/apache/pig/test/TestTmpFileCompression.java:369: org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
 [javac] new TaskAttemptContext(conf, new TaskAttemptID()));
{noformat}

 Pig job reporting negative progress
 ---

 Key: PIG-3072
 URL: https://issues.apache.org/jira/browse/PIG-3072
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Fix For: 0.12

 Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt


 Our users pointed out that their jobs were reporting negative progress.
 2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete
 ...
 (due to TFileRecordReader)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file

2012-12-04 Thread Julien Le Dem (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Julien Le Dem updated PIG-2812:
---

Fix Version/s: (was: 0.11)

I'm detaching this from pig-0.11 as it is not ready yet

 Spill InternalCachedBag into only 1 file
 

 Key: PIG-2812
 URL: https://issues.apache.org/jira/browse/PIG-2812
 Project: Pig
  Issue Type: Bug
  Components: data
Reporter: Haitao Yao
Assignee: Haitao Yao
 Attachments: aa.jpg, spill.patch


 I encountered an OOM in a reducer because of java.io.DeleteOnExitHook. I 
 found out that the InternalCachedBag creates a separate tmp file for each 
 spill, and the tmp files are deleted on exit. So the file delete hook caused 
 the OOM. 
 Why not just hold the tmp file handle and spill into only one tmp file?
 Too many tmp files may also block the tasktracker start process, if the tmp 
 files are not cleaned up in time and the tasktracker restarts at that moment.
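
 A minimal sketch of the single-file idea, written in Python rather than Pig's 
 Java and not taken from spill.patch. The class name SingleFileSpiller and its 
 methods are hypothetical; the point is only that one open handle receives 
 every spill, so a single file needs cleanup instead of one per spill.

```python
import os
import pickle
import tempfile


class SingleFileSpiller:
    """Hypothetical illustration: append every spill to one temp file."""

    def __init__(self):
        # One handle for the lifetime of the bag, instead of one file per spill.
        self._file = tempfile.NamedTemporaryFile(prefix="bag-spill-", delete=False)
        self.path = self._file.name
        self.spill_count = 0

    def spill(self, records):
        # Records are pickled back to back onto the same file.
        for record in records:
            pickle.dump(record, self._file)
        self._file.flush()
        self.spill_count += 1

    def read_all(self):
        # Re-open for reading and yield records until the end of the file.
        with open(self.path, "rb") as f:
            while True:
                try:
                    yield pickle.load(f)
                except EOFError:
                    return

    def close(self):
        # Explicit cleanup of the single file; no per-file exit hook is involved.
        self._file.close()
        os.remove(self.path)
```

 With this shape, the number of files to delete stays at one no matter how 
 many times the bag spills.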



[jira] [Created] (PIG-3076) make TestScalarAliases more reliable

2012-12-04 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3076:
--

 Summary: make TestScalarAliases more reliable
 Key: PIG-3076
 URL: https://issues.apache.org/jira/browse/PIG-3076
 Project: Pig
  Issue Type: Test
Reporter: Julien Le Dem
Assignee: Julien Le Dem
 Fix For: 0.11, 0.12


Currently, this test writes in the root directory, so its output is not deleted 
by ant clean.
It also deletes its output at the end instead of at the beginning.
The consequence is that if the test fails once, it will keep failing until 
the directory is manually cleaned up (not good for CI).
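
The clean-at-the-beginning pattern can be sketched as follows. This is an 
illustration in Python, not the actual test code; the helper name 
prepare_output_dir and the example path build/test/TestScalarAliases are 
assumptions.

```python
import os
import shutil


def prepare_output_dir(path):
    """Remove leftovers from a previous (possibly failed) run, then recreate."""
    # ignore_errors=True makes this a no-op when the directory does not exist.
    shutil.rmtree(path, ignore_errors=True)
    os.makedirs(path)
    return path


# Hypothetical usage in a test's setup step, under a directory that
# "ant clean" already removes:
# out = prepare_output_dir(os.path.join("build", "test", "TestScalarAliases"))
```

Because the cleanup runs before the test writes anything, a failed run cannot 
leave state behind that breaks the next run.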



[jira] [Updated] (PIG-3072) Pig job reporting negative progress

2012-12-04 Thread Koji Noguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Noguchi updated PIG-3072:
--

Attachment: pig-3072-v04.txt

bq. Can you use HadoopShims to create the TaskAttemptContext in your test? The 
test fails to compile with H23.

Thanks Rohini.  Uploading another patch with your suggestion.  

Ran both
$ ant clean test -Dtestcase=TestTmpFileCompression
$ ant -Dhadoopversion=23 clean test -Dtestcase=TestTmpFileCompression

 Pig job reporting negative progress
 ---

 Key: PIG-3072
 URL: https://issues.apache.org/jira/browse/PIG-3072
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.10.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
 Fix For: 0.12

 Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt, 
 pig-3072-v04.txt


 Our users pointed out that their jobs were reporting negative progress.
 2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete
 ...
 (due to TFileRecordReader)



[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-12-04 Thread Joseph Adler (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509992#comment-13509992
 ] 

Joseph Adler commented on PIG-3015:
---

I think that approach makes sense; each object in a file should be wrapped in a 
Tuple. Suppose that a file example.avro contained the data:

  {[1, 2, 3, 4, 5]}
  {[6, 7, 8, 9, 10]}

and had this schema: {name : IntArray, type : array, items : int}, 
and we loaded this as

  A = LOAD 'example.avro' USING AvroStorage;

The bag A would have the Pig schema A:{(IntArray:{(int)})}; it would contain 
two tuples, which would in turn each contain one bag of integers. Does that 
sound correct? If so, I'll go implement that.
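
The proposed wrapping can be sketched in Python, modelling a Pig bag as a list 
of tuples. The function name wrap_record is hypothetical and this is not the 
AvroStorage code, only an illustration of the shape described above.

```python
def wrap_record(avro_array):
    """Wrap one top-level Avro array as a Pig-style tuple holding one bag.

    A bag is modelled as a list of tuples; each int becomes a 1-field tuple.
    """
    bag = [(item,) for item in avro_array]
    return (bag,)


# The two records from example.avro above.
records = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]

# A corresponds to the schema A:{(IntArray:{(int)})}.
A = [wrap_record(r) for r in records]
```

Here A holds two tuples, and each tuple holds a single bag of five 
one-field tuples.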


 Rewrite of AvroStorage
 --

 Key: PIG-3015
 URL: https://issues.apache.org/jira/browse/PIG-3015
 Project: Pig
  Issue Type: Improvement
  Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
 Attachments: PIG-3015.patch


 The current AvroStorage implementation has a lot of issues: it requires old 
 versions of Avro, it copies data much more than needed, and it's verbose and 
 complicated. (One pet peeve of mine is that old versions of Avro don't 
 support Snappy compression.)
 I rewrote AvroStorage from scratch to fix these issues. In early tests, the 
 new implementation is significantly faster, and the code is a lot simpler. 
 Rewriting AvroStorage also enabled me to implement support for Trevni.
 I'm opening this ticket to facilitate discussion while I figure out the best 
 way to contribute the changes back to Apache.



Re: Our release process

2012-12-04 Thread Olga Natkovich
I am ok with tests running nightly and reverting patches that cause failures. 
We used to have that. Does anybody know what happened? Is anybody volunteering 
to make it work again?

I would like to see specific criteria for what goes into the branch being 
published (rather than case-by-case). This way each team can decide if the 
criteria are stringent enough or if they need to run a private branch.

Olga



 From: Santhosh M S santhosh_mut...@yahoo.com
To: Julien Le Dem jul...@twitter.com; dev@pig.apache.org 
dev@pig.apache.org 
Cc: billgra...@gmail.com billgra...@gmail.com 
Sent: Friday, November 30, 2012 11:46 PM
Subject: Re: Our release process
 
Hi Julien,

You are making most of the points that I did on this thread (CI for e2e, not 
burdening clean e2e prior to every commit for a release branch). The only point 
on which there is no clear agreement is the definition of a bug that can be 
included in a previously released branch. I am fine with a case by case 
inclusion. 

Hi Olga,

Are you fine with Julien's proposal as it stands, with the bugs to be included 
determined at the time of inclusion instead of now?

Santhosh



From: Julien Le Dem jul...@twitter.com
To: dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com 
Cc: billgra...@gmail.com billgra...@gmail.com 
Sent: Friday, November 30, 2012 5:37 PM
Subject: Re: Our release process

Proposed criteria:
- it makes the tests fail. targets test-commit + test + e2e tests
- a critical bug is reported in a short time frame (definition of
critical not needed as it is rare and can be decided on a case by case
basis)

That raises another question: what are the existing CI servers running
the tests?
- the Apache CI runs test-commit and test (is it more stable now?)
and not e2e. It would be great if it did.
- we have a Jenkins build at Twitter where we run test-commit and
test, we could not run e2e easily in our environment.
- I understand there's a Yahoo/Hortonworks build (test-commit + test + e2e ???)

Whenever those builds fail we should open or reopen JIRAs and fix them.

The time it takes to run the full test suite makes it impractical to
run on a desktop/laptop.

For the release Pig-0.11.0 we need to get this list of JIRAs down to 0
and publish the jar.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+fixVersion+%3D+%220.11%22+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+due+ASC%2C+priority+DESC

Julien

On Thu, Nov 29, 2012 at 11:16 PM, Santhosh M S
santhosh_mut...@yahoo.com wrote:
 Looks like everyone is interested in having frequent releases - I don't see 
 anyone disagreeing with that.

 Regarding "If a patch makes the release branch unstable, we revert it": what 
 are the criteria? If we can't decide on the criteria on this thread (already 
 pretty long), then let's get the release trains going. We can revisit the 
 criteria for inclusion of bug fixes when that happens.

 Santhosh


 
  From: Julien Le Dem jul...@twitter.com
 To: dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com
 Cc: billgra...@gmail.com billgra...@gmail.com
 Sent: Thursday, November 29, 2012 9:45 AM
 Subject: Re: Our release process

 The release branch receives only bug fixes. Patch level releases (3rd
 version number) are issued out of the release branch and introduce
 only bug fixes and no new features.
 Deciding whether a patch is applied to the release branch is based on
 preserving stability (as Bill said). If a patch makes the release
 branch unstable, we revert it.
 New features are added to trunk where new major and minor releases will 
 happen.
 If we need a new feature out then we make a new minor release.
 Doing frequent releases is the industry standard and will resolve
 conflicts around what should go in a release branch.

 Making a new release is currently painful *because* we wait so long in
 between two releases. Let's fix that.

 Julien

 On Wed, Nov 28, 2012 at 10:09 PM, Santhosh M S
 santhosh_mut...@yahoo.com wrote:
 Since releasing a major version once a month is aggressive and we have not 
 released on a quarterly basis, we should allow commits to a released branch 
 to facilitate dot releases.

 If we are allowing commits to a released branch, the criteria for inclusion 
 can be created anew or we use the industry standards for severity (or 
 priority). It could be painful for a few folks but I don't see better 
 alternatives.

 Regarding reverting commits based on e2e tests breaking:
         1. Who is running the tests?
         2. How often are they run?
 If we have nightly e2e runs then it's easier to catch these errors early. If 
 not, the barrier for inclusion is pretty high and time-consuming, making it 
 harder to develop.

 Santhosh


 
  From: Bill Graham billgra...@gmail.com
 To: dev@pig.apache.org
 Sent: Wednesday, November 28, 2012 11:39 AM
 

[jira] [Commented] (PIG-3015) Rewrite of AvroStorage

2012-12-04 Thread Cheolsoo Park (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510033#comment-13510033
 ] 

Cheolsoo Park commented on PIG-3015:


Yes, it does. Thank you, sir!




[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail

2012-12-04 Thread Will Oberman (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510045#comment-13510045
 ] 

Will Oberman commented on PIG-2684:
---

I was just bitten by this same bug. For me it was because I'm changing from 
running Hadoop directly against Cassandra to doing Cassandra -> Amazon EMR -> 
Cassandra (using Pig as my Hadoop language of choice, and S3 as the data 
interchange layer). My output schema, which is Cassandra-compatible, seems 
to have autogenerated ::'s.

 :: in field name causes AvroStorage to fail
 ---

 Key: PIG-2684
 URL: https://issues.apache.org/jira/browse/PIG-2684
 Project: Pig
  Issue Type: Bug
  Components: piggybank
Reporter: Fabian Alenius

 There appears to be a bug in AvroStorage which causes it to fail when there 
 are field names that contain ::

 For example, the following will fail:

 data = load 'test.txt' as (one, two);
 grp = GROUP data by (one, two);
 result = foreach grp generate FLATTEN(group);
 store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

 ERROR 2999: Unexpected internal error. Illegal character in: group::one

 While the following will succeed:

 data = load 'test.txt' as (one, two);
 grp = GROUP data by (one, two);
 result = foreach grp generate FLATTEN(group) as (one,two);
 store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

 Here is a minimal test case:

 data = load 'test.txt' as (one::two, three);
 store data into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
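
 The "Illegal character" error is consistent with the Avro specification's 
 rule for names, which allows only [A-Za-z_] followed by [A-Za-z0-9_]. The 
 Python sketch below checks that rule. The sanitize helper is a hypothetical 
 workaround shown for illustration, not what AvroStorage does, and renaming 
 this way can produce colliding field names.

```python
import re

# Name rule from the Avro specification: [A-Za-z_][A-Za-z0-9_]*
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_avro_name(name):
    """Return True if the whole string is a legal Avro name."""
    return AVRO_NAME.fullmatch(name) is not None


def sanitize(name, replacement="_"):
    """Hypothetical workaround: swap Pig's '::' disambiguator for another string."""
    return name.replace("::", replacement)
```

 Under this rule group::one is rejected while one is accepted, which matches 
 the failing and succeeding scripts above.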



[jira] [Updated] (PIG-3072) Pig job reporting negative progress

2012-12-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3072:


  Resolution: Fixed
Release Note: Committed to trunk. Thanks Koji.
  Status: Resolved  (was: Patch Available)




[jira] [Updated] (PIG-3072) Pig job reporting negative progress

2012-12-04 Thread Rohini Palaniswamy (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rohini Palaniswamy updated PIG-3072:


Release Note:   (was: Committed to trunk. Thanks Koji.)

Committed to trunk. Thanks Koji.




[jira] [Created] (PIG-3077) TestMultiQueryLocal should not write in /tmp

2012-12-04 Thread Julien Le Dem (JIRA)
Julien Le Dem created PIG-3077:
--

 Summary: TestMultiQueryLocal should not write in /tmp
 Key: PIG-3077
 URL: https://issues.apache.org/jira/browse/PIG-3077
 Project: Pig
  Issue Type: Test
Reporter: Julien Le Dem


Temporary files from tests should be under build/test so that they are cleaned 
by ant clean.
Currently two test suites running on the same machine step on each other and 
create flaky test results.



[jira] [Created] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string

2012-12-04 Thread Jonathan Coveney (JIRA)
Jonathan Coveney created PIG-3078:
-

 Summary: Make a UDF that, given a string, returns just the columns 
prefixed by that string
 Key: PIG-3078
 URL: https://issues.apache.org/jira/browse/PIG-3078
 Project: Pig
  Issue Type: Bug
Reporter: Jonathan Coveney
 Fix For: 0.12


This comes up fairly often, usually as the result of a join. Given that the 
resulting schema has the column name prepended, a udf in the following form 
could give just the columns from the desired relation:

Pluck('relation_name', *)
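
A rough sketch of the selection logic, in Python rather than as a Pig UDF. It 
assumes the prepended name is joined with "::" (as in group::one elsewhere in 
this digest); the function name pluck and the list-of-names schema 
representation are hypothetical.

```python
def pluck(prefix, schema, row):
    """Keep only the fields whose name starts with '<prefix>::'."""
    marker = prefix + "::"
    keep = [i for i, name in enumerate(schema) if name.startswith(marker)]
    return [schema[i] for i in keep], tuple(row[i] for i in keep)


# Example: the schema of a join of relations a and b.
schema = ["a::id", "a::name", "b::id", "b::score"]
row = (1, "x", 1, 0.5)
names, values = pluck("a", schema, row)
```

Matching on the full "a::" marker rather than the bare prefix avoids picking 
up columns from a relation whose name merely starts with the same letters.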
