[ANNOUNCE] Registration for ApacheCon Europe 2009 is now open!
All,

I'm broadcasting this to all of the Hadoop dev and users lists; however, in the future I'll only send cross-subproject announcements to gene...@hadoop.apache.org. Please subscribe over there too! It is very low traffic.

Anyway, ApacheCon Europe is coming up in March. A range of Hadoop talks will be given:

- Introduction to Hadoop by Owen O'Malley
- Hadoop Map/Reduce: Tuning and Debugging by Arun Murthy
- Pig - Making Hadoop Easy by Olga Natkovich
- Running Hadoop in the Cloud by Tom White
- Architectures for the Cloud by Steve Loughran
- Configuring Hadoop for Grid Services by Allen Wittenauer
- Dynamic Hadoop Clusters by Steve Loughran
- HBasics: An Introduction to Hadoop's Big Data Database by Michael Stack
- Hadoop Tools and Tricks for Data Pipelines by Christophe Bisciglia
- Introducing Mahout: Apache Machine Learning by Grant Ingersoll

-- Owen

Begin forwarded message:

From: Shane Curcuru
Date: January 27, 2009 6:15:25 AM PST
Subject: [ANN] Registration for ApacheCon Europe 2009 is now open!

PMC moderators - please forward the below to any appropriate dev@ or users@ lists so your larger community can hear about ApacheCon Europe. Remember, ACEU09 has scheduled sessions spanning the breadth of the ASF's projects, subprojects, and podlings, including at least: ActiveMQ, ServiceMix, CXF, Axis2, Hadoop, Felix, Sling, Maven, Struts, Roller, Shindig, Geronimo, Lucene, Solr, BSF, Mina, Directory, Tomcat, httpd, Mahout, Bayeux, CouchDB, AntUnit, Jackrabbit, Archiva, Wicket, POI, Pig, Synapse, Droids, Continuum.

ApacheCon EU 2009 registration is now open!
23-27 March -- Mövenpick Hotel, Amsterdam, Netherlands
http://www.eu.apachecon.com/

Registration for ApacheCon Europe 2009 is now open - act before early bird prices expire on 6 February. Remember to book a room at the Mövenpick and use the Registration Code: Special package attendees for the conference registration, and get 150 Euros off your full conference registration.
Lower Costs - Thanks to new VAT laws, our prices this year are 19% lower than last year in Europe! We've also negotiated a Mövenpick rate of a maximum of 155 Euros per night for attendees in our room block.

Quick Links:
- See the schedule: http://xrl.us/aceu09sp
- Get your hotel room: http://xrl.us/aceu09hp
- Register for the conference: http://xrl.us/aceu09rp

Other important notes:
- Geeks for Geeks is a new mini-track where we can feature advanced technical content from project committers. And our Hackathon on Monday and Tuesday is open to all attendees - be sure to check it off in your registration.
- The Call for Papers for ApacheCon US 2009, held 2-6 November 2009 in Oakland, CA, is open through 28 February, so get your submissions in now. This ApacheCon will feature special events with some of the ASF's original founders in celebration of the 10th anniversary of The Apache Software Foundation. http://www.us.apachecon.com/c/acus2009/
- Interested in sponsoring the ApacheCon conferences? There are plenty of sponsor packages available - please contact Delia Frees at de...@apachecon.com for further information.

==
ApacheCon EU 2009: A week of Open Source at its best!
Hackathon - open to all! | Geeks for Geeks | Lunchtime Sessions
In-Depth Trainings | Multi-Track Sessions | BOFs | Business Panel
Lightning Talks | Receptions | Fast Feather Track | Expo... and more!

- Shane Curcuru, on behalf of
Noirin Shirley, Conference Lead,
and the whole ApacheCon Europe 2009 Team
http://www.eu.apachecon.com/ 23-27 March -- Amsterdam, Netherlands
[jira] Commented: (PIG-632) Improved error message for binary operators
[ https://issues.apache.org/jira/browse/PIG-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667766#action_12667766 ]

Olga Natkovich commented on PIG-632:

+1

> Improved error message for binary operators
> ---
>
> Key: PIG-632
> URL: https://issues.apache.org/jira/browse/PIG-632
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Olga Natkovich
> Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-632.patch
>
> Current message: "Incompatible types in Add Operator LHS:chararray RHS:int".
> LHS and RHS might not be meaningful to users. Let's try to stick to the
> non-abbreviated version of English :).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-632) Improved error message for binary operators
[ https://issues.apache.org/jira/browse/PIG-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan resolved PIG-632.
-

Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch has been committed.
[jira] Commented: (PIG-636) PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner
[ https://issues.apache.org/jira/browse/PIG-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667778#action_12667778 ]

Olga Natkovich commented on PIG-636:

+1 on the patch. The only minor thing I saw that needs to be changed is that the size of a single tuple bag should be 1 rather than 0.

> PERFORMANCE: Use lightweight bag implementations which do not register with
> SpillableMemoryManager with Combiner
> ---
>
> Key: PIG-636
> URL: https://issues.apache.org/jira/browse/PIG-636
> Project: Pig
> Issue Type: Improvement
> Affects Versions: types_branch
> Reporter: Pradeep Kamath
> Assignee: Pradeep Kamath
> Fix For: types_branch
>
> Attachments: PIG-636.patch
>
> Currently, whenever the Combiner is used in Pig, the POPrecombinerLocalRearrange
> operator in the map puts the single "value" tuple corresponding to a key into a
> DataBag and passes it to the foreach that is being combined. This generates as
> many bags as there are input records. Each of these bags holds a single tuple, so
> they are small and should never need to be spilled to disk. However, since the
> bags are created through the BagFactory mechanism, each bag creation is registered
> with the SpillableMemoryManager, and a weak reference to the bag is stored in a
> linked list. This linked list grows very large over time, causing unnecessary
> garbage collection runs. This can be avoided by having a simple, lightweight
> implementation of the DataBag interface to store the single tuple in a bag.
> These SingleTupleBags should also be created without registering with the
> SpillableMemoryManager. Likewise, the bags created in POCombinePackage are
> supposed to fit in memory and not spill. Again, a NonSpillableDataBag
> implementation of the DataBag interface, which does not register with the
> SpillableMemoryManager, would help.
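The cost described in the PIG-636 report (one registered weak reference per single-tuple bag) can be illustrated with a small, self-contained Java sketch. `MemoryManagerSketch`, `RegisteringBag`, and `UnregisteredBag` are hypothetical stand-ins, not Pig's actual `SpillableMemoryManager` or `DataBag` classes; the sketch only counts how many weak references pile up when one bag is created per input record.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class RegistrationCostSketch {

    // Hypothetical stand-in for a memory manager that keeps a weak reference
    // to every bag it might later ask to spill (not Pig's real class).
    static final class MemoryManagerSketch {
        private final List<WeakReference<Object>> tracked = new LinkedList<>();

        void register(Object bag) {
            tracked.add(new WeakReference<>(bag));
        }

        // The WeakReference wrappers stay in the list even after the bags
        // themselves are collected, so the list only shrinks if it is pruned.
        int trackedCount() {
            return tracked.size();
        }
    }

    // Hypothetical bag that registers itself on construction (old behavior).
    static final class RegisteringBag {
        final List<Object> tuples = new ArrayList<>();

        RegisteringBag(MemoryManagerSketch manager) {
            manager.register(this);
        }
    }

    // Hypothetical lightweight bag that never registers (proposed behavior).
    static final class UnregisteredBag {
        final List<Object> tuples = new ArrayList<>();
    }

    // Simulates the combiner path: one single-tuple bag per input record.
    static int simulate(int records, boolean register) {
        MemoryManagerSketch manager = new MemoryManagerSketch();
        for (int i = 0; i < records; i++) {
            if (register) {
                new RegisteringBag(manager).tuples.add("value-" + i);
            } else {
                new UnregisteredBag().tuples.add("value-" + i);
            }
        }
        return manager.trackedCount();
    }

    public static void main(String[] args) {
        System.out.println(simulate(100_000, true));  // 100000
        System.out.println(simulate(100_000, false)); // 0
    }
}
```

Every entry left in the tracked list is one more object for the garbage collector to handle, which is the overhead the issue describes; bags that skip registration avoid it entirely.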
[jira] Created: (PIG-641) Fragment replicate join does not work in local mode
Fragment replicate join does not work in local mode
---

Key: PIG-641
URL: https://issues.apache.org/jira/browse/PIG-641
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Shubham Chopra
[jira] Created: (PIG-642) Limit after FRJ causes problems
Limit after FRJ causes problems
---

Key: PIG-642
URL: https://issues.apache.org/jira/browse/PIG-642
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Shravan Matthur Narayanamurthy

Script:

    a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
    b = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age, gpa);
    c = join a by name, b by name using "replicated";
    d = limit c 10;
    dump d;

Error: ERROR 2013: Moving LOLimit in front of LOFRJoin is not implemented

It is fine not to move the limit and to apply it after the join instead.
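The suggestion in PIG-642 (leave the limit after the join rather than fail) amounts to a fallback in the optimizer's decision. The Java sketch below is hypothetical: `OpKind`, `Decision`, and `placeLimitAfter` are invented names, not Pig's optimizer API, and the operator list is deliberately reduced to two cases.

```java
public class LimitPlacementSketch {

    // Hypothetical, simplified kinds of operator that can sit directly
    // in front of a LIMIT in a logical plan.
    enum OpKind { FOREACH_NO_FLATTEN, FR_JOIN, OTHER }

    enum Decision { PUSH_LIMIT_UP, KEEP_LIMIT_IN_PLACE }

    // Per the report, trying to move a LIMIT in front of a fragment replicate
    // join failed with ERROR 2013. This sketch instead falls back to leaving
    // the LIMIT after any operator it cannot safely move past.
    static Decision placeLimitAfter(OpKind predecessor) {
        switch (predecessor) {
            case FOREACH_NO_FLATTEN:
                // One output row per input row, so limiting first keeps
                // the same number of rows.
                return Decision.PUSH_LIMIT_UP;
            case FR_JOIN:
            default:
                // A join can produce more or fewer rows than its inputs, so
                // limiting an input first could change the result.
                return Decision.KEEP_LIMIT_IN_PLACE;
        }
    }

    public static void main(String[] args) {
        System.out.println(placeLimitAfter(OpKind.FR_JOIN)); // KEEP_LIMIT_IN_PLACE
    }
}
```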
[jira] Created: (PIG-643) Using enum instead of numbers for error codes
Using enum instead of numbers for error codes
-

Key: PIG-643
URL: https://issues.apache.org/jira/browse/PIG-643
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan

Assigning names to error codes would make the code more readable: PIG_DISKFULL is more meaningful than 512.
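A minimal Java sketch of the idea in PIG-643, assuming each named constant carries its legacy numeric code so existing numbers stay meaningful. `PigErrorCode` is a hypothetical name; `PIG_DISKFULL`/512 is the example from the issue, while `LIMIT_PUSHUP_NOT_IMPLEMENTED` is an invented name for the code 2013 seen in PIG-642.

```java
public class ErrorCodeSketch {

    // Hypothetical enum. PIG_DISKFULL/512 is the example given in PIG-643;
    // LIMIT_PUSHUP_NOT_IMPLEMENTED is an invented name for the numeric code
    // 2013 seen in PIG-642, used here purely for illustration.
    enum PigErrorCode {
        PIG_DISKFULL(512),
        LIMIT_PUSHUP_NOT_IMPLEMENTED(2013);

        private final int code;

        PigErrorCode(int code) {
            this.code = code;
        }

        int code() {
            return code;
        }

        // Maps a legacy numeric code back to its named constant.
        static PigErrorCode fromCode(int code) {
            for (PigErrorCode e : values()) {
                if (e.code == code) {
                    return e;
                }
            }
            throw new IllegalArgumentException("Unknown error code: " + code);
        }
    }

    public static void main(String[] args) {
        PigErrorCode e = PigErrorCode.fromCode(512);
        System.out.println(e + " = " + e.code()); // PIG_DISKFULL = 512
    }
}
```

Keeping the number inside the enum means log messages can still print "ERROR 512" while code refers to `PIG_DISKFULL`.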
[jira] Updated: (PIG-636) PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner
[ https://issues.apache.org/jira/browse/PIG-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-636:
---

Attachment: PIG-636-v2.patch

Attached a new version of the patch with the following two changes, per the review comments:
1) SingleTupleBag now has only one constructor, which takes the tuple the bag is meant to contain. This way, a SingleTupleBag can only be created with its member tuple.
2) size() now returns 1.
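The two review-driven changes (a single constructor that takes the tuple, and `size()` returning 1), together with the non-spilling bag proposed for POCombinePackage, can be sketched in Java as follows. `SimpleBag` is a hypothetical, heavily reduced interface, not Pig's real `DataBag`, and rejecting `add()` on a single-tuple bag is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

public class LightweightBags {

    // Hypothetical, heavily simplified bag interface; Pig's real DataBag
    // interface has many more methods (spilling, serialization, etc.).
    interface SimpleBag<T> extends Iterable<T> {
        long size();

        void add(T t);
    }

    // Holds exactly one tuple, supplied through its only constructor,
    // and never registers with any memory manager.
    static final class SingleTupleBag<T> implements SimpleBag<T> {
        private final T item;

        SingleTupleBag(T item) {
            this.item = item;
        }

        @Override
        public long size() {
            return 1; // per the review comment: 1, not 0
        }

        // Assumption of this sketch: adding to a single-tuple bag is rejected.
        @Override
        public void add(T t) {
            throw new UnsupportedOperationException("SingleTupleBag holds exactly one tuple");
        }

        @Override
        public Iterator<T> iterator() {
            return Collections.singletonList(item).iterator();
        }
    }

    // Plain in-memory bag for data expected to fit in memory (as in the
    // POCombinePackage case); it also never registers for spilling.
    static final class NonSpillableBag<T> implements SimpleBag<T> {
        private final List<T> items = new ArrayList<>();

        @Override
        public long size() {
            return items.size();
        }

        @Override
        public void add(T t) {
            items.add(t);
        }

        @Override
        public Iterator<T> iterator() {
            return items.iterator();
        }
    }

    public static void main(String[] args) {
        SimpleBag<String> single = new SingleTupleBag<>("(alice,20,3.5)");
        System.out.println(single.size()); // 1

        SimpleBag<String> combined = new NonSpillableBag<>();
        combined.add("(alice,20,3.5)");
        combined.add("(bob,22,3.1)");
        System.out.println(combined.size()); // 2
    }
}
```

Making the tuple a constructor argument means a `SingleTupleBag` can never exist in an empty state, which is why `size()` can be a constant.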
[jira] Updated: (PIG-636) PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner
[ https://issues.apache.org/jira/browse/PIG-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-636:
---

Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch committed.
[jira] Updated: (PIG-642) Limit after FRJ causes problems
[ https://issues.apache.org/jira/browse/PIG-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-642:
---

Attachment: PIG-642.patch

Here is the patch for reference.
[jira] Commented: (PIG-631) 4 Unit test failures on Windows
[ https://issues.apache.org/jira/browse/PIG-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667851#action_12667851 ]

Olga Natkovich commented on PIG-631:

Daniel, increasing the timeout worked fine for Lee. Can we change the HBase unit test to not run on Windows for now?

> 4 Unit test failures on Windows
> ---
>
> Key: PIG-631
> URL: https://issues.apache.org/jira/browse/PIG-631
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Environment: Windows
> Reporter: Lee Tucker
>
> 4 Windows unit test failures, all timeouts. The errors occur at the tip of
> the branch and have for several days.
>     [junit] Running org.apache.pig.test.TestAlgebraicEval
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
>     [junit] Test org.apache.pig.test.TestAlgebraicEval FAILED (timeout)
>     [junit] Running org.apache.pig.test.TestEvalPipeline
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
>     [junit] Test org.apache.pig.test.TestEvalPipeline FAILED (timeout)
>     [junit] Running org.apache.pig.test.TestHBaseStorage
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
>     [junit] Test org.apache.pig.test.TestHBaseStorage FAILED (timeout)
>     [junit] Running org.apache.pig.test.TestMapReduce
>     [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
>     [junit] Test org.apache.pig.test.TestMapReduce FAILED (timeout)
[jira] Commented: (PIG-631) 4 Unit test failures on Windows
[ https://issues.apache.org/jira/browse/PIG-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667854#action_12667854 ]

Daniel Dai commented on PIG-631:

I am not sure why TestHBaseStorage blocks under Cygwin; HBase is supposed to work under Cygwin, as its developers claim. I think we can exclude it for now, and I will look into it later.
[jira] Commented: (PIG-642) Limit after FRJ causes problems
[ https://issues.apache.org/jira/browse/PIG-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667864#action_12667864 ]

Olga Natkovich commented on PIG-642:

I am reviewing this patch now.
[jira] Updated: (PIG-631) 4 Unit test failures on Windows
[ https://issues.apache.org/jira/browse/PIG-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-631:
---

Attachment: PIG-631.temp.patch

Temporary patch to disable TestHBaseStorage under Windows.
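The contents of PIG-631.temp.patch are not shown here, so the following is only one possible approach, sketched under that caveat: decide at runtime from the `os.name` system property whether to skip the HBase test. `WindowsSkipSketch` and `isWindows` are hypothetical names; excluding the test class in the build file would be an equally valid way to do the same thing.

```java
import java.util.Locale;

public class WindowsSkipSketch {

    // True when a value of the "os.name" system property denotes Windows.
    // A JVM launched from a Cygwin shell is still a native Windows JVM, so
    // it also reports a "Windows ..." name.
    static boolean isWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    public static void main(String[] args) {
        if (isWindows(System.getProperty("os.name"))) {
            System.out.println("Skipping TestHBaseStorage on Windows (see PIG-631)");
            return;
        }
        System.out.println("Running TestHBaseStorage");
    }
}
```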
[jira] Updated: (PIG-560) UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536 bytes (in UTF8 form) using BinStorage()
[ https://issues.apache.org/jira/browse/PIG-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Laukik Chitnis updated PIG-560:
---

Attachment: utf-limit-patch.diff

The patch uses the String object's getBytes(charsetName) method to convert the string to UTF-8 bytes, instead of the writeUTF() function. This allows an int to be used for storing the length, instead of the 2 bytes used by writeUTF(). It also includes the corresponding change for reading in a CHARARRAY.

> UTFDataFormatException (encoded string too long) is thrown when storing
> strings > 65536 bytes (in UTF8 form) using BinStorage()
> ---
>
> Key: PIG-560
> URL: https://issues.apache.org/jira/browse/PIG-560
> Project: Pig
> Issue Type: Bug
> Affects Versions: types_branch
> Reporter: Pradeep Kamath
> Fix For: types_branch
>
> Attachments: utf-limit-patch.diff
>
> BinStorage() uses the DataOutput.writeUTF() and DataInput.readUTF() Java APIs to
> write out Strings as UTF-8 bytes and to read them back. From the Javadoc:
> "First, the total number of bytes needed to represent all the characters of s
> is calculated. If this number is larger than 65535, then a
> UTFDataFormatException is thrown." (This is because writeUTF() uses 2 bytes
> to represent the number of bytes.) A way to get around this would be to not
> use writeUTF()/readUTF(), and instead hand-convert the string to the
> corresponding UTF-8 byte[] (using String.getBytes("UTF-8")) and then write
> the length of the byte array as an int. This will allow a size of up to
> 2^31 - 1 bytes (the largest positive int value).
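The length-prefixed encoding described in PIG-560 can be sketched in Java as below. `LongStringCodec` and its methods are hypothetical names, not BinStorage's actual code; the sketch uses `StandardCharsets.UTF_8` (Java 7+) in place of `getBytes("UTF-8")` to avoid the checked `UnsupportedEncodingException`. Note that the resulting format differs from `writeUTF()`, which writes a 2-byte length followed by modified UTF-8, so data written the old way cannot be read back with `readLongString`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.nio.charset.StandardCharsets;

public class LongStringCodec {

    // Writes s as a 4-byte int length followed by its standard UTF-8 bytes,
    // avoiding the 65535-byte limit of DataOutput.writeUTF().
    static void writeLongString(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length);
        out.write(utf8);
    }

    // Reads back a string written by writeLongString.
    static String readLongString(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Round-trips s through an in-memory stream.
    static String roundTrip(String s) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeLongString(new DataOutputStream(bytes), s);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        return readLongString(in);
    }

    // Builds a string of n copies of 'x'.
    static String repeatX(int n) {
        StringBuilder sb = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String big = repeatX(70_000); // 70000 UTF-8 bytes, over the writeUTF limit

        try {
            new DataOutputStream(new ByteArrayOutputStream()).writeUTF(big);
            System.out.println("writeUTF succeeded");
        } catch (UTFDataFormatException e) {
            System.out.println("writeUTF failed: " + e.getMessage());
        }

        System.out.println(roundTrip(big).equals(big)); // true
    }
}
```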