[
https://issues.apache.org/jira/browse/PIG-1474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Thejas M Nair updated PIG-1474:
-------------------------------
Fix Version/s: 0.9.0
(was: 0.8.0)
Unlinking from 0.8 release.
I was planning to use the lazy implementations of Map and Bag proposed in
PIG-1473 for this. Those objects would have held a copy of the serialized
versions of the map and bag. But the plan in that jira had to be abandoned for
the reasons mentioned there, so a different approach is required to solve this
issue.
> Avoid serialization/deserialization costs for PigStorage data - Use custom
> Tuple
> --------------------------------------------------------------------------------
>
> Key: PIG-1474
> URL: https://issues.apache.org/jira/browse/PIG-1474
> Project: Pig
> Issue Type: Improvement
> Affects Versions: 0.8.0
> Reporter: Thejas M Nair
> Assignee: Thejas M Nair
> Fix For: 0.9.0
>
>
> Avoid sedes when possible for data loaded using PigStorage by implementing
> approach #4 proposed in http://wiki.apache.org/pig/AvoidingSedes .
> The write() and readFields() functions of the tuple returned by TupleFactory
> are used to serialize data between Map and Reduce. By using a tuple that knows
> the serialization format of the loader, we avoid sedes at the Map/Reduce
> boundary and use the load function's serialized format between Map and Reduce.
> To use a new custom tuple for this purpose, a custom TupleFactory that
> returns tuples of this type has to be specified using the property
> "pig.data.tuple.factory.name" .
> This approach will work only for a set of load functions in the query that
> share the same serialization format for maps and bags. If this approach proves
> to be very useful, it will build a case for a more extensible approach.
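
The idea in the description can be sketched roughly as follows. This is only
an illustration under stated assumptions, not the patch for this issue: the
class RawLineTuple, its tab delimiter, and the roundTrip() helper are
hypothetical names, and the class deliberately does not implement Pig's Tuple
interface so that it stays self-contained. It shows a tuple that keeps the
loader's raw bytes, copies them through write() and readFields() unchanged,
and parses fields only when one is requested.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical tuple that keeps the loader's raw delimited bytes. Not a Pig class. */
public class RawLineTuple {
    private static final byte DELIMITER = '\t'; // assumed field delimiter
    private byte[] raw = new byte[0];           // record bytes exactly as loaded
    private List<String> fields;                // parsed lazily, only on access

    public RawLineTuple() {}

    public RawLineTuple(byte[] line) {
        this.raw = line;
    }

    /** Serialize by copying the loader's bytes; no per-field encoding. */
    public void write(DataOutput out) throws IOException {
        out.writeInt(raw.length);
        out.write(raw);
    }

    /** Deserialize by reading the bytes back; fields stay unparsed. */
    public void readFields(DataInput in) throws IOException {
        int len = in.readInt();
        raw = new byte[len];
        in.readFully(raw);
        fields = null; // drop any fields parsed from earlier contents
    }

    /** Parse the raw bytes only when a field is actually requested. */
    public String get(int index) {
        if (fields == null) {
            fields = new ArrayList<>();
            int start = 0;
            for (int i = 0; i <= raw.length; i++) {
                if (i == raw.length || raw[i] == DELIMITER) {
                    fields.add(new String(raw, start, i - start, StandardCharsets.UTF_8));
                    start = i + 1;
                }
            }
        }
        return fields.get(index);
    }

    /** Round-trips a tuple through write()/readFields(), as the Map/Reduce boundary would. */
    public static RawLineTuple roundTrip(RawLineTuple t) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        t.write(new DataOutputStream(buf));
        RawLineTuple copy = new RawLineTuple();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        return copy;
    }
}
```

In a real implementation, a custom TupleFactory named through the
"pig.data.tuple.factory.name" property would return tuples of this kind, and
the raw bytes would have to cover the loader's encoding of maps and bags as
well, which is why the approach is limited to load functions that share that
format.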