[ANNOUNCE] Registration for ApacheCon Europe 2009 is now open!

2009-01-27 Thread Owen O'Malley

All,
   I'm broadcasting this to all of the Hadoop dev and users lists.
In the future, however, I'll only send cross-subproject announcements
to gene...@hadoop.apache.org. Please subscribe over there too! It is
very low traffic.
  Anyway, ApacheCon Europe is coming up in March. A range of Hadoop
talks is being given:


Introduction to Hadoop by Owen O'Malley
Hadoop Map/Reduce: Tuning and Debugging by Arun Murthy
Pig - Making Hadoop Easy by Olga Natkovich
Running Hadoop in the Cloud by Tom White
Architectures for the Cloud by Steve Loughran
Configuring Hadoop for Grid Services by Allen Wittenauer
Dynamic Hadoop Clusters by Steve Loughran
HBasics: An Introduction to Hadoop's Big Data Database by Michael Stack
Hadoop Tools and Tricks for Data Pipelines by Christophe Bisciglia
Introducing Mahout: Apache Machine Learning by Grant Ingersoll

-- Owen

Begin forwarded message:


From: Shane Curcuru 
Date: January 27, 2009 6:15:25 AM PST
Subject: [ANN] Registration for ApacheCon Europe 2009 is now open!

PMC moderators - please forward the below to any appropriate dev@ or  
users@ lists so your larger community can hear about ApacheCon  
Europe. Remember, ACEU09 has scheduled sessions spanning the breadth  
of the ASF's projects, subprojects, and podlings, including at  
least: ActiveMQ, ServiceMix, CXF, Axis2, Hadoop, Felix, Sling,
Maven, Struts, Roller, Shindig, Geronimo, Lucene, Solr, BSF, Mina,  
Directory, Tomcat, httpd, Mahout, Bayeux, CouchDB, AntUnit,  
Jackrabbit, Archiva, Wicket, POI, Pig, Synapse, Droids, Continuum.



ApacheCon EU 2009 registration is now open!
23-27 March -- Mövenpick Hotel, Amsterdam, Netherlands
http://www.eu.apachecon.com/


Registration for ApacheCon Europe 2009 is now open - act before early
bird prices expire 6 February.  Remember to book a room at the Mövenpick
and use the Registration Code: Special package attendees for the
conference registration, and get 150 Euros off your full conference
registration.

Lower Costs - Thanks to new VAT tax laws, our prices this year are 19%
lower than last year in Europe!  We've also negotiated a Mövenpick rate
of a maximum of 155 Euros per night for attendees in our room block.

Quick Links:

  http://xrl.us/aceu09sp  See the schedule
  http://xrl.us/aceu09hp  Get your hotel room
  http://xrl.us/aceu09rp  Register for the conference

Other important notes:

- Geeks for Geeks is a new mini-track where we can feature advanced
technical content from project committers.  And our Hackathon on Monday
and Tuesday is open to all attendees - be sure to check it off in your
registration.

- The Call for Papers for ApacheCon US 2009, held 2-6 November
2009 in Oakland, CA, is open through 28 February, so get your
submissions in now.  This ApacheCon will feature special events with
some of the ASF's original founders in celebration of the 10th
anniversary of The Apache Software Foundation.

  http://www.us.apachecon.com/c/acus2009/

- Interested in sponsoring the ApacheCon conferences?  There are plenty
of sponsor packages available - please contact Delia Frees at
de...@apachecon.com for further information.

==
ApacheCon EU 2008: A week of Open Source at its best!

Hackathon - open to all! | Geeks for Geeks | Lunchtime Sessions
In-Depth Trainings | Multi-Track Sessions | BOFs | Business Panel
Lightning Talks | Receptions | Fast Feather Track | Expo... and more!

- Shane Curcuru, on behalf of
 Noirin Shirley, Conference Lead,
 and the whole ApacheCon Europe 2009 Team
 http://www.eu.apachecon.com/  23-27 March -- Amsterdam, Netherlands






[jira] Commented: (PIG-632) Improved error message for binary operators

2009-01-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667766#action_12667766
 ] 

Olga Natkovich commented on PIG-632:


+1

> Improved error message for binary operators
> ---
>
> Key: PIG-632
> URL: https://issues.apache.org/jira/browse/PIG-632
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Olga Natkovich
>Assignee: Santhosh Srinivasan
> Fix For: types_branch
>
> Attachments: PIG-632.patch
>
>
> Current message: "Incompatible types in Add Operator LHS:chararray RHS:int". 
> LHS and RHS might not be meaningful to users. Let's try to stick to 
> non-abbreviated English :).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-632) Improved error message for binary operators

2009-01-27 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-632.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

Patch has been committed.




[jira] Commented: (PIG-636) PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner

2009-01-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667778#action_12667778
 ] 

Olga Natkovich commented on PIG-636:


+1 on the patch. The only minor thing I saw that needs to change is that the 
size of a single-tuple bag should be 1 rather than 0.

> PERFORMANCE: Use lightweight bag implementations which do not register with 
> SpillableMemoryManager with Combiner
> 
>
> Key: PIG-636
> URL: https://issues.apache.org/jira/browse/PIG-636
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: types_branch
>Reporter: Pradeep Kamath
>Assignee: Pradeep Kamath
> Fix For: types_branch
>
> Attachments: PIG-636.patch
>
>
> Currently, whenever the Combiner is used in Pig, the 
> POPrecombinerLocalRearrange operator in the map puts the single "value" tuple 
> corresponding to a key into a DataBag and passes this to the foreach which is 
> being combined. This generates as many bags as there are input records. 
> Each of these bags holds a single tuple, so they are small and should not 
> need to be spilled to disk. However, since the bags are created through the 
> BagFactory mechanism, each bag creation is registered with the 
> SpillableMemoryManager and a weak reference to the bag is stored in a linked 
> list. This linked list grows very large over time, causing unnecessary garbage 
> collection runs. This can be avoided by having a simple lightweight 
> implementation of the DataBag interface to store the single tuple in a bag. 
> These SingleTupleBags should also be created without registering with the 
> SpillableMemoryManager. Likewise, the bags created in POCombinePackage are 
> supposed to fit in memory and not spill. Again, a NonSpillableDataBag 
> implementation of the DataBag interface which does not register with the 
> SpillableMemoryManager would help.




[jira] Created: (PIG-641) Fragment replicate join does not work in local mode

2009-01-27 Thread Olga Natkovich (JIRA)
Fragment replicate join does not work in local mode
---

 Key: PIG-641
 URL: https://issues.apache.org/jira/browse/PIG-641
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Shubham Chopra







[jira] Created: (PIG-642) Limit after FRJ causes problems

2009-01-27 Thread Olga Natkovich (JIRA)
Limit after FRJ causes problems
---

 Key: PIG-642
 URL: https://issues.apache.org/jira/browse/PIG-642
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Shravan Matthur Narayanamurthy


Script:

a = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age,gpa);
b = load '/user/pig/tests/data/singlefile/studenttab10k' as (name, age,gpa);
c = join a by name, b by name using "replicated";
d = limit c 10;
dump d;

Error: ERROR 2013: Moving LOLimit in front of LOFRJoin is not implemented

It is fine not to move the limit and to apply it after the join instead.





[jira] Created: (PIG-643) Using enum instead of numbers for error codes

2009-01-27 Thread Olga Natkovich (JIRA)
Using enum instead of numbers for error codes
-

 Key: PIG-643
 URL: https://issues.apache.org/jira/browse/PIG-643
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Santhosh Srinivasan


Make the code more readable by assigning names to error codes. PIG_DISKFULL is 
more meaningful than 512.
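
As a rough illustration of the idea, here is a minimal Java sketch of named
error codes. It is not taken from the Pig source: the enum name, the
fromNumber() lookup and the second constant's name are all assumptions made
for this example. Only the PIG_DISKFULL/512 pairing comes from this issue,
and 2013 is the number reported in PIG-642.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical named error codes. Only the PIG_DISKFULL/512 pairing comes
 * from the issue text; everything else is invented for illustration.
 */
enum PigErrorCode {
    PIG_DISKFULL(512),
    // Number taken from the PIG-642 report; the constant name is an assumption.
    LIMIT_BEFORE_FRJOIN_NOT_IMPLEMENTED(2013);

    // Reverse index so a numeric code found in a log can be mapped to its name.
    private static final Map<Integer, PigErrorCode> BY_NUMBER = new HashMap<>();

    static {
        for (PigErrorCode c : values()) {
            BY_NUMBER.put(c.number, c);
        }
    }

    private final int number;

    PigErrorCode(int number) {
        this.number = number;
    }

    /** Returns the numeric value that is still printed in error messages. */
    int number() {
        return number;
    }

    /** Looks up the named constant for a numeric code, or fails loudly. */
    static PigErrorCode fromNumber(int number) {
        PigErrorCode c = BY_NUMBER.get(number);
        if (c == null) {
            throw new IllegalArgumentException("Unknown error code: " + number);
        }
        return c;
    }
}

class ErrorCodeDemo {
    public static void main(String[] args) {
        PigErrorCode code = PigErrorCode.fromNumber(512);
        // Prints: ERROR 512: PIG_DISKFULL
        System.out.println("ERROR " + code.number() + ": " + code);
    }
}
```

Keeping the number inside the enum means existing messages such as
"ERROR 2013" would not have to change, while the code that raises them can
refer to a readable name.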




[jira] Updated: (PIG-636) PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner

2009-01-27 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-636:
---

Attachment: PIG-636-v2.patch

Attached new version of patch with the following two changes as per review 
comments:
1) SingleTupleBag now only has one constructor which takes the tuple the bag is 
meant to contain. This way SingleTupleBags can only be created with the member 
Tuple.
2) size() now returns 1.
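
For readers without the patch at hand, the two changes can be sketched
roughly as follows. This is not the committed code: SimpleBag is a cut-down
stand-in for Pig's DataBag interface, the class name SingleTupleBagSketch is
made up, and a List<Object> stands in for a Pig Tuple.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

/** Cut-down stand-in for Pig's DataBag interface (an assumption for this sketch). */
interface SimpleBag<T> extends Iterable<T> {
    long size();
}

/**
 * Holds exactly one tuple. It is built with a plain constructor call, so it is
 * never handed to a factory that would register it with a memory manager.
 */
final class SingleTupleBagSketch<T> implements SimpleBag<T> {
    private final T tuple;

    // The only constructor: a bag cannot exist without its member tuple.
    SingleTupleBagSketch(T tuple) {
        this.tuple = Objects.requireNonNull(tuple, "tuple");
    }

    // Always 1, since the bag always contains its single tuple.
    @Override
    public long size() {
        return 1;
    }

    // Iterates over the one tuple without copying it into a growable list.
    @Override
    public Iterator<T> iterator() {
        return Collections.singletonList(tuple).iterator();
    }
}

class SingleTupleBagDemo {
    public static void main(String[] args) {
        List<Object> tuple = Arrays.<Object>asList("alice", 20, 3.5);
        SimpleBag<List<Object>> bag = new SingleTupleBagSketch<>(tuple);
        for (List<Object> t : bag) {
            System.out.println(bag.size() + " tuple: " + t);
        }
    }
}
```

With a single constructor there is no way to create an empty instance, which
is what makes a constant size() of 1 safe.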




[jira] Updated: (PIG-636) PERFORMANCE: Use lightweight bag implementations which do not register with SpillableMemoryManager with Combiner

2009-01-27 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-636:
---

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Patch committed.




[jira] Updated: (PIG-642) Limit after FRJ causes problems

2009-01-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-642:
---

Attachment: PIG-642.patch

Here is the patch for reference.




[jira] Commented: (PIG-631) 4 Unit test failures on Windows

2009-01-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667851#action_12667851
 ] 

Olga Natkovich commented on PIG-631:


Daniel, increasing the timeout worked fine for Lee. Can we change the HBase unit 
test to not run on Windows for now?

> 4 Unit test failures on Windows 
> 
>
> Key: PIG-631
> URL: https://issues.apache.org/jira/browse/PIG-631
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
> Environment: Windows
>Reporter: Lee Tucker
>
> 4 Windows unit test failures.  All timeouts.  Errors occur at tip of branch 
> and have for several days.
> [junit] Running org.apache.pig.test.TestAlgebraicEval
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> [junit] Test org.apache.pig.test.TestAlgebraicEval FAILED (timeout)
> [junit] Running org.apache.pig.test.TestEvalPipeline
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> [junit] Test org.apache.pig.test.TestEvalPipeline FAILED (timeout)
> [junit] Running org.apache.pig.test.TestHBaseStorage
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> [junit] Test org.apache.pig.test.TestHBaseStorage FAILED (timeout)
> [junit] Running org.apache.pig.test.TestMapReduce
> [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
> [junit] Test org.apache.pig.test.TestMapReduce FAILED (timeout)




[jira] Commented: (PIG-631) 4 Unit test failures on Windows

2009-01-27 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667854#action_12667854
 ] 

Daniel Dai commented on PIG-631:


I am not sure why TestHBaseStorage blocks under Cygwin. HBase is claimed to 
work under Cygwin, so it should. I think we can exclude the test for now and I 
will look into it later.




[jira] Commented: (PIG-642) Limit after FRJ causes problems

2009-01-27 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667864#action_12667864
 ] 

Olga Natkovich commented on PIG-642:


I am reviewing this patch now.




[jira] Updated: (PIG-631) 4 Unit test failures on Windows

2009-01-27 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-631:
---

Attachment: PIG-631.temp.patch

Temporary patch to disable TestHBaseStorage under Windows




[jira] Updated: (PIG-560) UTFDataFormatException (encoded string too long) is thrown when storing strings > 65536 bytes (in UTF8 form) using BinStorage()

2009-01-27 Thread Laukik Chitnis (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Laukik Chitnis updated PIG-560:
---

Attachment: utf-limit-patch.diff

The patch uses the String object's getBytes(charsetName) method to convert the 
string to UTF-8 bytes, instead of the writeUTF() function. An int can now be 
used for storing the length instead of the 2 bytes used by writeUTF(). The patch 
also includes the corresponding change for reading in a CHARARRAY.
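
The approach can be sketched as below. This is an illustration, not the
attached patch: the class and method names are invented, and it uses
StandardCharsets.UTF_8 (Java 7 and later) where the description mentions
getBytes("UTF-8").

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper showing a 4-byte length prefix followed by UTF-8 bytes. */
final class LongStringCodec {
    private LongStringCodec() {}

    /** Writes the byte length as an int, then the UTF-8 bytes themselves. */
    static void writeString(DataOutput out, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(utf8.length); // 4-byte length instead of writeUTF()'s 2 bytes
        out.write(utf8);
    }

    /** Reads the int length, then exactly that many bytes, and decodes them. */
    static String readString(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] utf8 = new byte[length];
        in.readFully(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Convenience for the demo: encode to memory and decode again. */
    static String roundTrip(String s) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            writeString(new DataOutputStream(buffer), s);
            return readString(new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class LongStringCodecDemo {
    public static void main(String[] args) {
        // 70000 one-byte characters: more than writeUTF() can encode.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 70000; i++) {
            sb.append('x');
        }
        String big = sb.toString();
        System.out.println(LongStringCodec.roundTrip(big).equals(big));
    }
}
```

Note that writeUTF() emits Java's "modified UTF-8" while getBytes() emits
standard UTF-8, so data written one way cannot simply be read back the other
way. That is why the read side has to change together with the write side.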

> UTFDataFormatException (encoded string too long) is thrown when storing 
> strings > 65536 bytes (in UTF8 form) using BinStorage()
> ---
>
> Key: PIG-560
> URL: https://issues.apache.org/jira/browse/PIG-560
> Project: Pig
>  Issue Type: Bug
>Affects Versions: types_branch
>Reporter: Pradeep Kamath
> Fix For: types_branch
>
> Attachments: utf-limit-patch.diff
>
>
> BinStorage() uses the DataOutput.writeUTF() and DataInput.readUTF() Java APIs to 
> write out Strings as UTF-8 bytes and to read them back. From the Javadoc: 
> "First, the total number of bytes needed to represent all the characters of s 
> is calculated. If this number is larger than 65535, then a 
> UTFDataFormatException is thrown." (This is because the writeUTF() API uses 2 
> bytes to represent the number of bytes.) A way to get around this would be to 
> not use writeUTF()/readUTF() and instead hand-convert the string to the 
> corresponding UTF-8 byte[] (using String.getBytes("UTF-8")) and then write 
> the length of the byte array as an int. Since a Java int is signed, this 
> allows a size of up to 2^31 - 1 bytes.
