On Jun 28, 2010, at 5:51 PM, Dmitriy Ryaboy wrote:
For what it's worth, I saw very significant speed improvements (an order
of magnitude for wide tables with few projected columns) when I
implemented (2) for our protocol buffer-based loaders.
I have a feeling that propagating schemas when known, and using them for
(de)serialization instead of reflecting on every field, would also be a
big win.
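To make the two points above concrete, here is a minimal sketch in Python. It is not Pig's code and not the protocol buffer loaders mentioned above; the length-prefixed record layout, the SCHEMA list, and the helper names encode_row and decode_projected are all hypothetical, chosen only for illustration. It assumes the schema is known up front, so the reader picks a decoder per field from the schema instead of inspecting each value, and it skips the bytes of any column that was not projected.

```python
import struct

# Hypothetical schema, known before any record is read (an assumption of
# this sketch): an ordered list of (field name, field type) pairs.
SCHEMA = [("id", "int"), ("name", "str"), ("score", "float"), ("notes", "str")]

# One encoder/decoder per declared type, looked up via the schema rather
# than by inspecting each value at read time.
_ENCODERS = {
    "int": lambda v: struct.pack(">q", v),
    "float": lambda v: struct.pack(">d", v),
    "str": lambda v: v.encode("utf-8"),
}
_DECODERS = {
    "int": lambda b: struct.unpack(">q", b)[0],
    "float": lambda b: struct.unpack(">d", b)[0],
    "str": lambda b: b.decode("utf-8"),
}


def encode_row(row):
    """Serialize a row as a sequence of 4-byte length prefixes plus payloads."""
    out = bytearray()
    for (_name, ftype), value in zip(SCHEMA, row):
        payload = _ENCODERS[ftype](value)
        out += struct.pack(">I", len(payload)) + payload
    return bytes(out)


def decode_projected(buf, wanted):
    """Decode only the fields named in `wanted`; skip the rest undecoded."""
    result = {}
    offset = 0
    for name, ftype in SCHEMA:
        (length,) = struct.unpack_from(">I", buf, offset)
        offset += 4
        if name in wanted:
            result[name] = _DECODERS[ftype](buf[offset:offset + length])
        # Unwanted fields cost only a length read and a pointer bump.
        offset += length
    return result


record = encode_row((7, "alice", 2.5, "x" * 1000))
print(decode_projected(record, {"id", "score"}))
```

In this sketch the large "notes" field is never turned into a string when only "id" and "score" are requested, which is the kind of saving one would expect to grow with the width of the table.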
Thoughts on just using Avro for the internal PigStorage?
I've been trying to play with this in my spare time but haven't gotten
far yet. We're certainly open to looking at it and seeing how it
performs.
Alan.
-D
On Mon, Jun 28, 2010 at 5:08 PM, Thejas Nair <te...@yahoo-inc.com>
wrote:
I have created a wiki page that puts together some ideas that can help
improve performance by avoiding or delaying
serialization/deserialization.
http://wiki.apache.org/pig/AvoidingSedes
These are ideas that don't involve changes to the optimizer. Most of them
involve changes in the load/store functions.
Your feedback is welcome.
Thanks,
Thejas