David Ciemiewicz commented on PIG-793:


This sounds good, but it looks like you are saving only 12 out of 174 bytes, 
or less than 10%.

Amdahl's law says this isn't sufficient in the grand scheme of things, so I 
wouldn't expect a huge payback.

It seems like an "optimal" encoding of the same tuple would be something like:

1 or 2 bytes for an index to the structure describing the contents of the tuple 
(keep a list of these tuple structures)
4 bytes for the int
8 bytes for the double
1 or 2 bytes for string length in fixed positions
20 bytes for string

Total is at most 36 bytes, or roughly an 80% reduction in memory versus 174 bytes.
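
As a rough illustration of the layout above, here is a minimal Java sketch that packs one (int, double, 20-character string) tuple into a flat byte array. It is not Pig code. The class name PackedTupleSketch, its methods, and the choice of 2 bytes for both the structure index and the string length are assumptions made for the example, and the registry of tuple structures that the index would point into is left out.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the packed layout; not part of Pig.
class PackedTupleSketch {
    // Assumed fixed offsets: 2-byte structure index, 4-byte int,
    // 8-byte double, 2-byte string length, then the string bytes.
    static final int INT_OFFSET = 2;
    static final int DOUBLE_OFFSET = 6;
    static final int STRLEN_OFFSET = 14;
    static final int STRING_OFFSET = 16;

    static byte[] pack(short structureIndex, int i, double d, String s) {
        byte[] str = s.getBytes(StandardCharsets.UTF_8);
        if (str.length > 0xFFFF) {
            throw new IllegalArgumentException("string too long for a 2-byte length");
        }
        ByteBuffer buf = ByteBuffer.allocate(STRING_OFFSET + str.length);
        buf.putShort(structureIndex);      // index into a list of tuple structures
        buf.putInt(i);                     // 4 bytes for the int
        buf.putDouble(d);                  // 8 bytes for the double
        buf.putShort((short) str.length);  // string length in a fixed position
        buf.put(str);                      // string bytes
        return buf.array();
    }

    static int readInt(byte[] packed) {
        return ByteBuffer.wrap(packed).getInt(INT_OFFSET);
    }

    static double readDouble(byte[] packed) {
        return ByteBuffer.wrap(packed).getDouble(DOUBLE_OFFSET);
    }

    static String readString(byte[] packed) {
        // Mask so lengths above 32767 are read back as unsigned.
        int len = ByteBuffer.wrap(packed).getShort(STRLEN_OFFSET) & 0xFFFF;
        return new String(packed, STRING_OFFSET, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] packed = pack((short) 0, 42, 3.14, "abcdefghijklmnopqrst");
        System.out.println(packed.length); // 2 + 4 + 8 + 2 + 20 = 36
    }
}
```

Because every field except the string sits at a fixed offset, a reader can pull out a single field without materializing a Java object for each of the others.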

If memory and not CPU is what is slowing down Pig processing, then Hong Tang's 
"LazyTuple" or something like it is ultimately going to be what is needed.

> Improving memory efficiency of Tuple implementation
> ---------------------------------------------------
>                 Key: PIG-793
>                 URL: https://issues.apache.org/jira/browse/PIG-793
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
>            Assignee: Alan Gates
> Currently, our tuple is a real pig and uses a lot of extra memory. 
> There are several places where we can improve memory efficiency:
> (1) Laying out memory for the fields rather than using Java objects, since 
> each object for a numeric field takes 16 bytes
> (2) For the cases where we know the schema, using Java arrays rather than 
> ArrayList.
> There might be more.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
