[jira] [Commented] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509920#comment-13509920 ]

Rohini Palaniswamy commented on PIG-3072:
-----------------------------------------

Koji, can you use HadoopShims to create the TaskAttemptContext in your test? The test fails to compile with H23.

{noformat}
[javac] /apache/pig/trunk/test/org/apache/pig/test/TestTmpFileCompression.java:369: org.apache.hadoop.mapreduce.TaskAttemptContext is abstract; cannot be instantiated
[javac] new TaskAttemptContext(conf, new TaskAttemptID()));
{noformat}

Pig job reporting negative progress
-----------------------------------

Key: PIG-3072
URL: https://issues.apache.org/jira/browse/PIG-3072
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.10.0
Reporter: Koji Noguchi
Assignee: Koji Noguchi
Priority: Minor
Fix For: 0.12
Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt

Our users pointed out that their jobs are reporting negative progress.

2012-11-02 21:43:11,538 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - -795% complete

... (due to TFileRecordReader)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
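The attached patches are not reproduced in this message, so the actual fix is not shown here. As a purely illustrative Python sketch (not the Pig code), one way to keep a record reader from ever reporting a value such as -795% is to compute progress from byte positions and clamp the result. The function name `clamped_progress` and its parameters are hypothetical.

```python
def clamped_progress(start: int, end: int, pos: int) -> float:
    """Fraction of the byte range [start, end) consumed so far.

    Hypothetical helper: the result is clamped to [0.0, 1.0], so a position
    outside the expected range can never yield negative or >100% progress.
    """
    length = end - start
    if length <= 0:
        # Empty (or malformed) range: treat it as fully read.
        return 1.0
    fraction = (pos - start) / length
    return min(1.0, max(0.0, fraction))
```

Clamping only hides the symptom; a real fix would also need `pos`, `start` and `end` to be measured in the same units (for example, all as offsets into the same compressed or uncompressed stream).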
[jira] [Updated] (PIG-2812) Spill InternalCachedBag into only 1 file
[ https://issues.apache.org/jira/browse/PIG-2812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Julien Le Dem updated PIG-2812:
------------------------------

Fix Version/s: (was: 0.11)

I'm detaching this from pig-0.11 as it is not ready yet.

Spill InternalCachedBag into only 1 file
----------------------------------------

Key: PIG-2812
URL: https://issues.apache.org/jira/browse/PIG-2812
Project: Pig
Issue Type: Bug
Components: data
Reporter: Haitao Yao
Assignee: Haitao Yao
Attachments: aa.jpg, spill.patch

I encountered an OOM in a reducer caused by java.io.DeleteOnExitHook. I found out that InternalCachedBag creates separate tmp files, and the tmp files are deleted on exit. So the file delete hook caused the OOM.

Why not just hold the tmp file handle and spill into only one tmp file?

Too many tmp files may also block the tasktracker start process, if the tmp files are not cleaned up in time and the tasktracker restarts at that moment.
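The idea of holding one tmp file handle and appending every spill to it can be sketched as follows. This is a Python illustration of the technique, not the contents of spill.patch or of InternalCachedBag; the class name `SingleFileSpillBag` and the `memory_limit` parameter are hypothetical.

```python
import pickle
import tempfile


class SingleFileSpillBag:
    """Hypothetical bag that keeps at most `memory_limit` items in memory
    and appends every spill to a single temporary file."""

    def __init__(self, memory_limit=1000):
        self._memory_limit = memory_limit
        self._in_memory = []
        self._spill_file = None   # one handle, created lazily on first spill
        self._spilled_count = 0

    def add(self, item):
        self._in_memory.append(item)
        if len(self._in_memory) >= self._memory_limit:
            self._spill()

    def _spill(self):
        if self._spill_file is None:
            # Anonymous temp file: removed when the handle is closed, so no
            # per-file delete-on-exit bookkeeping accumulates.
            self._spill_file = tempfile.TemporaryFile()
        self._spill_file.seek(0, 2)  # always append at the end of the file
        for item in self._in_memory:
            pickle.dump(item, self._spill_file)
        self._spilled_count += len(self._in_memory)
        self._in_memory.clear()

    def __iter__(self):
        # Spilled items come first (in insertion order), then in-memory ones.
        if self._spill_file is not None:
            self._spill_file.flush()
            self._spill_file.seek(0)
            for _ in range(self._spilled_count):
                yield pickle.load(self._spill_file)
        yield from list(self._in_memory)

    def close(self):
        if self._spill_file is not None:
            self._spill_file.close()
            self._spill_file = None
            self._spilled_count = 0
```

With a single handle per bag, the number of open tmp files is bounded by the number of live bags rather than by the number of spills.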
[jira] [Created] (PIG-3076) make TestScalarAliases more reliable
Julien Le Dem created PIG-3076:
-------------------------------

Summary: make TestScalarAliases more reliable
Key: PIG-3076
URL: https://issues.apache.org/jira/browse/PIG-3076
Project: Pig
Issue Type: Test
Reporter: Julien Le Dem
Assignee: Julien Le Dem
Fix For: 0.11, 0.12

Currently, this test writes in the root directory, so its output is not deleted by ant clean. It also deletes its output at the end instead of at the beginning. The consequence is that if the test fails once, it will keep failing until the directory is manually cleaned up (not good for CI).
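The pattern described above (write under the build directory, and clean up before the test rather than after it) can be sketched in Python as follows. The helper name `fresh_output_dir` and the `test/<test name>` layout under the build root are assumptions for illustration, not the actual test code.

```python
import shutil
from pathlib import Path


def fresh_output_dir(build_root, test_name):
    """Return an empty output directory for `test_name` under `build_root`.

    Hypothetical helper: anything left over from an earlier (possibly failed)
    run is removed first, so one failure cannot poison later runs.
    """
    out = Path(build_root) / "test" / test_name
    shutil.rmtree(out, ignore_errors=True)  # clean at the beginning
    out.mkdir(parents=True)
    return out
```

Because the directory lives under the build root, a clean of the build tree removes it as well, and no cleanup step at the end of the test is required for the next run to pass.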
[jira] [Updated] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Noguchi updated PIG-3072:
-----------------------------

Attachment: pig-3072-v04.txt

bq. Can you use HadoopShims to create the TaskAttemptContext in your test. The test fails to compile with H23.

Thanks Rohini. Uploading another patch with your suggestion. Ran both

$ ant clean test -Dtestcase=TestTmpFileCompression
$ ant -Dhadoopversion=23 clean test -Dtestcase=TestTmpFileCompression

Pig job reporting negative progress
-----------------------------------

Key: PIG-3072
URL: https://issues.apache.org/jira/browse/PIG-3072
Attachments: pig-3072-v01.txt, pig-3072-v02.txt, pig-3072-v03.txt, pig-3072-v04.txt
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509992#comment-13509992 ]

Joseph Adler commented on PIG-3015:
-----------------------------------

I think that approach makes sense; each object in a file should be wrapped in a Tuple.

Suppose that a file example.avro contained the data:

{[1, 2, 3, 4, 5]}
{[6, 7, 8, 9, 10]}

and had this schema:

{name : IntArray, type : array, items : int}

and we loaded this as

A = LOAD 'example.avro' USING AvroStorage;

The bag A would have the Pig schema A:{(IntArray:{(int)})}; it would contain two tuples, each of which would in turn contain one bag of integers.

Does that sound correct? If so, I'll go implement that.

Rewrite of AvroStorage
----------------------

Key: PIG-3015
URL: https://issues.apache.org/jira/browse/PIG-3015
Project: Pig
Issue Type: Improvement
Components: piggybank
Reporter: Joseph Adler
Assignee: Joseph Adler
Attachments: PIG-3015.patch

The current AvroStorage implementation has a lot of issues: it requires old versions of Avro, it copies data much more than needed, and it's verbose and complicated. (One pet peeve of mine is that old versions of Avro don't support Snappy compression.)

I rewrote AvroStorage from scratch to fix these issues. In early tests, the new implementation is significantly faster, and the code is a lot simpler. Rewriting AvroStorage also enabled me to implement support for Trevni.

I'm opening this ticket to facilitate discussion while I figure out the best way to contribute the changes back to Apache.
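The mapping proposed in the comment above (wrap each top-level Avro object in a tuple, and turn an Avro array into a bag of single-field tuples) can be sketched in Python. This is an illustration of the rule only, not the AvroStorage implementation; it handles just arrays and a few primitives, and the names `avro_to_pig`, `load_schema` and `PRIMITIVES` are hypothetical.

```python
# Assumed primitive mapping for this sketch only.
PRIMITIVES = {
    "int": "int",
    "long": "long",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "string": "chararray",
    "bytes": "bytearray",
}


def avro_to_pig(schema, name=None):
    """Render a (simplified) Avro schema as a Pig schema fragment."""
    if isinstance(schema, str):
        schema = {"type": schema}
    name = schema.get("name", name)
    kind = schema["type"]
    if kind == "array":
        # An Avro array becomes a bag whose tuples hold one item each.
        pig = "{(" + avro_to_pig(schema["items"]) + ")}"
    elif kind in PRIMITIVES:
        pig = PRIMITIVES[kind]
    else:
        raise ValueError("type not covered by this sketch: %r" % (kind,))
    return f"{name}:{pig}" if name else pig


def load_schema(alias, avro_schema):
    """Schema of the loaded relation: every object is wrapped in a tuple."""
    return alias + ":{(" + avro_to_pig(avro_schema) + ")}"


schema = {"name": "IntArray", "type": "array", "items": "int"}
print(load_schema("A", schema))  # prints A:{(IntArray:{(int)})}
```

For the example schema this reproduces the Pig schema given in the comment, A:{(IntArray:{(int)})}.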
Re: Our release process
I am ok with tests running nightly and reverting patches that cause failures. We used to have that. Does anybody know what happened? Is anybody volunteering to make it work again?

I would like to see specific criteria for what goes into the branch being published (rather than case-by-case). This way each team can decide whether the criteria are stringent enough or whether they need to run a private branch.

Olga

From: Santhosh M S santhosh_mut...@yahoo.com
To: Julien Le Dem jul...@twitter.com; dev@pig.apache.org dev@pig.apache.org
Cc: billgra...@gmail.com billgra...@gmail.com
Sent: Friday, November 30, 2012 11:46 PM
Subject: Re: Our release process

Hi Julien,

You are making most of the points that I did on this thread (CI for e2e, not burdening clean e2e prior to every commit for a release branch). The only point on which there is no clear agreement is the definition of a bug that can be included in a previously released branch. I am fine with a case by case inclusion.

Hi Olga,

Are you fine with Julien's proposal as it stands, with the bugs to include determined at the time of inclusion instead of now?

Santhosh

From: Julien Le Dem jul...@twitter.com
To: dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com
Cc: billgra...@gmail.com billgra...@gmail.com
Sent: Friday, November 30, 2012 5:37 PM
Subject: Re: Our release process

Proposed criteria:
- it makes the tests fail (targets: test-commit + test + e2e tests)
- a critical bug is reported in a short time frame (definition of critical not needed as it is rare and can be decided on a case by case basis)

That raises another question: what are the existing CI servers running the tests?
- the Apache CI runs test-commit and test (is it more stable now?) and not e2e. It would be great if it did.
- we have a Jenkins build at Twitter where we run test-commit and test; we could not run e2e easily in our environment.
- I understand there's a Yahoo/Hortonworks build (test-commit + test + e2e ???)
Whenever those builds fail we should open or reopen JIRAs and fix them. The time it takes to run the full test suite makes it impractical to run on a desktop/laptop.

For the release Pig-0.11.0 we need to get this list of JIRAs down to 0 and publish the jar.
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+PIG+AND+fixVersion+%3D+%220.11%22+AND+resolution+%3D+Unresolved+ORDER+BY+updated+DESC%2C+due+ASC%2C+priority+DESC

Julien

On Thu, Nov 29, 2012 at 11:16 PM, Santhosh M S santhosh_mut...@yahoo.com wrote:

Looks like everyone is interested in having frequent releases - I don't see anyone disagreeing with that.

Regarding "If a patch makes the release branch unstable, we revert it" - what are the criteria? If we can't decide on the criteria on this thread (already pretty long) then let's get the release trains going. We can revisit the criteria for inclusion of bug fixes when that happens.

Santhosh

From: Julien Le Dem jul...@twitter.com
To: dev@pig.apache.org; Santhosh M S santhosh_mut...@yahoo.com
Cc: billgra...@gmail.com billgra...@gmail.com
Sent: Thursday, November 29, 2012 9:45 AM
Subject: Re: Our release process

The release branch receives only bug fixes. Patch level releases (3rd version number) are issued out of the release branch and introduce only bug fixes and no new features.

Deciding whether a patch is applied to the release branch is based on preserving stability (as Bill said). If a patch makes the release branch unstable, we revert it.

New features are added to trunk where new major and minor releases will happen. If we need a new feature out then we make a new minor release.

Doing frequent releases is the industry standard and will resolve conflicts around what should go in a release branch. Making a new release is currently painful *because* we wait so long in between two releases. Let's fix that.
Julien

On Wed, Nov 28, 2012 at 10:09 PM, Santhosh M S santhosh_mut...@yahoo.com wrote:

Since releasing a major version once a month is aggressive and we have not released on a quarterly basis, we should allow commits to a released branch to facilitate dot releases.

If we are allowing commits to a released branch, the criteria for inclusion can be created anew or we use the industry standards for severity (or priority). It could be painful for a few folks but I don't see better alternatives.

Regarding reverting commits based on e2e tests breaking:
1. Who is running the tests?
2. How often are they run?

If we have nightly e2e runs then it's easier to catch these errors early. If not, the barrier for inclusion is pretty high and time-consuming, making it harder to develop.

Santhosh

From: Bill Graham billgra...@gmail.com
To: dev@pig.apache.org
Sent: Wednesday, November 28, 2012 11:39 AM
[jira] [Commented] (PIG-3015) Rewrite of AvroStorage
[ https://issues.apache.org/jira/browse/PIG-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510033#comment-13510033 ]

Cheolsoo Park commented on PIG-3015:
------------------------------------

Yes, it does. Thank you, sir!

Rewrite of AvroStorage
----------------------

Key: PIG-3015
URL: https://issues.apache.org/jira/browse/PIG-3015
[jira] [Commented] (PIG-2684) :: in field name causes AvroStorage to fail
[ https://issues.apache.org/jira/browse/PIG-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13510045#comment-13510045 ]

Will Oberman commented on PIG-2684:
-----------------------------------

I was just bitten by this same bug. For me it was because I'm changing from running Hadoop directly against Cassandra, to doing Cassandra - Amazon EMR - Cassandra (using Pig as my Hadoop language of choice, and S3 as the data interchange layer). And my output schema that is Cassandra compatible seems to have autogenerated ::'s.

:: in field name causes AvroStorage to fail
-------------------------------------------

Key: PIG-2684
URL: https://issues.apache.org/jira/browse/PIG-2684
Project: Pig
Issue Type: Bug
Components: piggybank
Reporter: Fabian Alenius

There appears to be a bug in AvroStorage which causes it to fail when there are field names that contain ::

For example, the following will fail:

data = load 'test.txt' as (one, two);
grp = GROUP data by (one, two);
result = foreach grp generate FLATTEN(group);
store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

ERROR 2999: Unexpected internal error. Illegal character in: group::one

While the following will succeed:

data = load 'test.txt' as (one, two);
grp = GROUP data by (one, two);
result = foreach grp generate FLATTEN(group) as (one,two);
store result into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();

Here is a minimal test case:

data = load 'test.txt' as (one::two, three);
store data into 'test.avro' using org.apache.pig.piggybank.storage.avro.AvroStorage();
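The error arises because Avro names must start with a letter or underscore and contain only letters, digits and underscores, so an alias such as group::one is rejected. The rename workaround shown in the issue can be automated along these lines; the Python below only illustrates the renaming rule and is not part of AvroStorage. The function name `to_avro_field_name` and the choice of "_" as the replacement are assumptions.

```python
import re

# Avro name rule: [A-Za-z_] followed by any number of [A-Za-z0-9_].
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_avro_field_name(pig_alias, separator="_"):
    """Map a Pig alias such as 'group::one' to a legal Avro field name.

    Hypothetical helper: '::' is replaced by `separator`; anything that is
    still not a legal Avro name is reported instead of silently mangled.
    """
    candidate = pig_alias.replace("::", separator)
    if not AVRO_NAME.match(candidate):
        raise ValueError("not a legal Avro name: %r" % (candidate,))
    return candidate
```

Note that replacing :: can make two distinct aliases collide (for example a::b_c and a_b::c both become a_b_c), so a caller would still need to check the resulting names for uniqueness.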
[jira] [Updated] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3072:
-----------------------------------

Resolution: Fixed
Release Note: Committed to trunk. Thanks Koji.
Status: Resolved (was: Patch Available)

Pig job reporting negative progress
-----------------------------------

Key: PIG-3072
URL: https://issues.apache.org/jira/browse/PIG-3072
[jira] [Updated] (PIG-3072) Pig job reporting negative progress
[ https://issues.apache.org/jira/browse/PIG-3072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rohini Palaniswamy updated PIG-3072:
-----------------------------------

Release Note: (was: Committed to trunk. Thanks Koji.)

Committed to trunk. Thanks Koji.

Pig job reporting negative progress
-----------------------------------

Key: PIG-3072
URL: https://issues.apache.org/jira/browse/PIG-3072
[jira] [Created] (PIG-3077) TestMultiQueryLocal should not write in /tmp
Julien Le Dem created PIG-3077:
-------------------------------

Summary: TestMultiQueryLocal should not write in /tmp
Key: PIG-3077
URL: https://issues.apache.org/jira/browse/PIG-3077
Project: Pig
Issue Type: Test
Reporter: Julien Le Dem

Temporary files from tests should be under build/test so that they are cleaned by ant clean.

Currently, two test suites running on the same machine step on each other and produce flaky test results.
[jira] [Created] (PIG-3078) Make a UDF that, given a string, returns just the columns prefixed by that string
Jonathan Coveney created PIG-3078:
----------------------------------

Summary: Make a UDF that, given a string, returns just the columns prefixed by that string
Key: PIG-3078
URL: https://issues.apache.org/jira/browse/PIG-3078
Project: Pig
Issue Type: Bug
Reporter: Jonathan Coveney
Fix For: 0.12

This comes up fairly often, usually as the result of a join. Given that the resulting schema has the column name prepended with the relation name, a UDF in the following form could give just the columns from the desired relation:

Pluck('relation_name', *)
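The selection logic such a UDF would need can be sketched in Python. This is an illustration of the idea only, not a Pig UDF; the function name `pluck` and its argument layout are hypothetical, and it assumes that post-join column names have the form relation::column.

```python
def pluck(relation_name, field_names, row):
    """Keep only the values whose column name is prefixed by `relation_name::`.

    Hypothetical sketch: `field_names` are the schema's column names and
    `row` holds the values in the same order.
    """
    prefix = relation_name + "::"
    return tuple(
        value
        for name, value in zip(field_names, row)
        if name.startswith(prefix)
    )


# Example: the columns of a join between relations a and b.
fields = ["a::id", "a::name", "b::id", "b::score"]
row = (1, "x", 1, 9.5)
print(pluck("a", fields, row))  # prints (1, 'x')
```

Matching on relation_name followed by :: rather than on the bare string avoids picking up columns from a relation whose name merely starts with the same characters (for example ab::id when plucking a).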