Olga Natkovich commented on PIG-793:

Clarification from Alan on the String vs. Text comparison:

The 16/36 and 24/52 numbers noted in the bug are correct.  Let me explain them.
Text has a 16-byte overhead of its own, plus 16 bytes for the array that holds
the data, plus 20 bytes for the data (hence 16 and 16 + 20 = 36).  String has a
24-byte overhead of its own, plus 12 bytes for whatever it holds the data in,
plus 40 bytes for the data (hence 24 and 12 + 40 = 52).  So overall, it would
have been clearer had I said that Text has a 32-byte overhead and String a
36-byte overhead, and that Text stores the data in one byte per character
(assuming ASCII) while String stores it in two bytes per character (ASCII or
not).  There is some guesswork involved here, since I'm just looking at output
from Java memory tools.  We could retest this with larger strings to make sure
the results are consistent.
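
As a quick check on the arithmetic, the sketch below plugs the quoted figures
into a per-length estimate: Text is roughly 32 + n bytes and String roughly
36 + 2n bytes for n ASCII characters.  The constants are only the estimates
from the comment (read off memory tools), and the class and method names are
illustrative, not part of Pig or Hadoop.  Actual sizes vary with the JVM,
pointer width, and object alignment.  The 20 and 40 data bytes are consistent
with a 20-character test string, for which the model gives about 52 bytes for
Text and 76 for String.

```java
// Rough footprint model built only from the figures quoted in the comment.
// These are estimates from Java memory tools, not guarantees; real sizes
// depend on the JVM, pointer size, and padding.
class StringVsTextEstimate {

    // Text: 16-byte object + 16-byte array header, 1 byte per ASCII char.
    static final int TEXT_OBJECT = 16;
    static final int TEXT_ARRAY = 16;

    // String: 24-byte object + 12-byte array header, 2 bytes per char.
    static final int STRING_OBJECT = 24;
    static final int STRING_ARRAY = 12;

    static int textBytes(int chars) {
        return TEXT_OBJECT + TEXT_ARRAY + chars;           // 32 + n
    }

    static int stringBytes(int chars) {
        return STRING_OBJECT + STRING_ARRAY + 2 * chars;   // 36 + 2n
    }

    public static void main(String[] args) {
        int[] lengths = {20, 100, 1000};
        for (int n : lengths) {
            System.out.printf("%5d chars: Text ~%d bytes, String ~%d bytes%n",
                    n, textBytes(n), stringBytes(n));
        }
    }
}
```

Because the per-character cost differs (one byte vs. two), the gap grows with
string length under this model, which is why retesting with larger strings is
a useful check on the measured constants.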

> Improving memory efficiency of Tuple implementation
> ---------------------------------------------------
>                 Key: PIG-793
>                 URL: https://issues.apache.org/jira/browse/PIG-793
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
> Currently, our tuple is a real pig and uses a lot of extra memory. 
> There are several places where we can improve memory efficiency:
> (1) Laying out memory for the fields rather than using Java objects, since
> each object for a numeric field takes 16 bytes.
> (2) For cases where we know the schema, using Java arrays rather than
> ArrayList.
> There might be more.
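
A minimal sketch of ideas (1) and (2), assuming a schema of known arity whose
fields are all longs.  BoxedTuple, LongArrayTuple, and TupleDemo are
hypothetical names for illustration only; they are not Pig's Tuple interface
or any of its implementations.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "before": each field is a reference to a boxed object
// (e.g. a Long), held in an ArrayList that is itself backed by an Object[].
class BoxedTuple {
    private final List<Object> fields = new ArrayList<>();

    void append(Object value) { fields.add(value); }
    Object get(int i) { return fields.get(i); }
    int size() { return fields.size(); }
}

// Hypothetical "after": when the schema is known to be N long fields, the
// values live directly in a primitive array, with no per-field wrapper
// object and no ArrayList.
class LongArrayTuple {
    private final long[] fields;

    LongArrayTuple(int arity) { fields = new long[arity]; }

    void set(int i, long value) { fields[i] = value; }
    long get(int i) { return fields[i]; }
    int size() { return fields.length; }
}

class TupleDemo {
    public static void main(String[] args) {
        BoxedTuple boxed = new BoxedTuple();
        LongArrayTuple packed = new LongArrayTuple(3);
        for (int i = 0; i < 3; i++) {
            boxed.append(i * 1000L);   // autoboxed to a Long
            packed.set(i, i * 1000L);  // stored inline in the long[]
        }
        System.out.println(boxed.get(2) + " " + packed.get(2)); // prints "2000 2000"
    }
}
```

A real tuple would still need to handle mixed field types and nulls (for
example with per-type arrays or a byte layout); the sketch only shows why a
primitive array avoids the per-field object that the issue prices at 16 bytes.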

This message is automatically generated by JIRA.