Since we have not heard any objections, we are going to proceed with
this plan. Stay tuned for details on when the change will happen.


-----Original Message-----
From: Olga Natkovich
Sent: Friday, September 11, 2009 11:54 AM
Subject: proposed changes to Pig UDFs



As you know, a lot of work this year went into performance optimization
of Pig. One of the main sources of performance problems is high memory
usage. In an effort to address this problem, we propose switching the
internal implementation of strings from Java Strings to Hadoop Text,
because Text has lower memory overhead. Examples (assumes ASCII data;
sizes are in bytes):


Real String    Java String    Hadoop Text
5              46             37
10             56             42
20             76             52
40             116            72
80             196            112


As the size of the strings grows, so does the gap between the two.
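
The five rows of the table above are consistent with a simple linear
model: roughly 36 + 2n bytes for a Java String and 32 + n bytes for a
Hadoop Text, where n is the number of ASCII characters. These formulas
are an assumption fitted to the table, not figures stated in the
proposal, and the class and method names below are hypothetical. A
minimal sketch that reproduces the table and the gap between the two:

```java
public class StringFootprint {
    // Assumed linear fit to the table: fixed overhead plus 2 bytes per character.
    static int javaStringBytes(int n) {
        return 36 + 2 * n;
    }

    // Assumed linear fit to the table: fixed overhead plus 1 byte per ASCII character.
    static int hadoopTextBytes(int n) {
        return 32 + n;
    }

    public static void main(String[] args) {
        int[] lengths = {5, 10, 20, 40, 80};
        System.out.println("length  javaString  hadoopText  gap");
        for (int n : lengths) {
            int js = javaStringBytes(n);
            int ht = hadoopTextBytes(n);
            // Under this model the gap is 4 + n, so it grows with the string length.
            System.out.printf("%-7d %-11d %-11d %d%n", n, js, ht, js - ht);
        }
    }
}
```

Under this fit the gap works out to 4 + n bytes, which matches the
observation that the difference widens as strings get longer.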


Making this change would have no impact on Pig users; however, it will
have an impact on existing UDFs that work with Strings. Our question is
whether UDF writers/owners are comfortable with the proposed transition
and will update their UDFs.


Please let us know by the end of next week if you strongly object to
this proposal. Otherwise, we will go forward with this plan.






