On Jun 28, 2010, at 5:51 PM, Dmitriy Ryaboy wrote:

For what it's worth, I saw very significant speed improvements (an order of magnitude for wide tables with few projected columns) when I implemented (2)
for our protocol-buffer-based loaders.
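
A plausible reason for the large win on wide tables is that a loader can skip over fields that are never projected instead of decoding them. The sketch below is not the actual protocol-buffer loader code. It assumes a hypothetical row format in which each field is a length-prefixed UTF-8 string, so unwanted fields can be skipped by their length alone.

```python
import struct

def encode_row(fields):
    """Encode a row as length-prefixed UTF-8 fields (hypothetical format)."""
    out = bytearray()
    for f in fields:
        data = f.encode("utf-8")
        out += struct.pack(">I", len(data))  # 4-byte big-endian length
        out += data
    return bytes(out)

def decode_projected(buf, wanted):
    """Decode only the field indexes in `wanted`; other payloads are skipped."""
    wanted = set(wanted)
    result = {}
    pos = 0
    idx = 0
    # Stop as soon as every projected field has been decoded.
    while pos < len(buf) and len(result) < len(wanted):
        (length,) = struct.unpack_from(">I", buf, pos)
        pos += 4
        if idx in wanted:
            result[idx] = buf[pos:pos + length].decode("utf-8")
        pos += length  # jump past the payload without decoding it
        idx += 1
    return result

row = encode_row([f"col{i}" for i in range(100)])
print(decode_projected(row, [3, 42]))  # {3: 'col3', 42: 'col42'}
```

With 2 of 100 columns projected, only 2 payloads are decoded; the remaining work per skipped field is a single length read.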

I have a feeling that propagating schemas when known, and using them for (de)serialization instead of reflecting on every field, would also be a big
win.
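
To illustrate the difference, here is a minimal sketch contrasting a self-describing encoding, where every field carries a type tag that the reader must inspect, with a schema-driven codec built once from a known schema. The tag values, the `int`/`double` schema strings and the function names are hypothetical and do not reflect Pig's actual internal format.

```python
import struct

# Hypothetical type tags for a self-describing encoding.
INT, DOUBLE = 1, 2

def encode_tagged(values):
    """Each field carries a 1-byte type tag, so the writer checks every value."""
    out = bytearray()
    for v in values:
        if isinstance(v, int):
            out += struct.pack(">Bq", INT, v)
        else:
            out += struct.pack(">Bd", DOUBLE, v)
    return bytes(out)

def decode_tagged(buf):
    """The reader must dispatch on the tag of every single field."""
    values, pos = [], 0
    while pos < len(buf):
        tag = buf[pos]
        pos += 1
        fmt = ">q" if tag == INT else ">d"
        values.append(struct.unpack_from(fmt, buf, pos)[0])
        pos += 8
    return values

def make_schema_codec(schema):
    """Build one fixed layout from a known schema; no per-field tags or checks."""
    fmt = ">" + "".join("q" if t == "int" else "d" for t in schema)
    codec = struct.Struct(fmt)
    return codec.pack, codec.unpack

pack, unpack = make_schema_codec(["int", "double", "int"])
row = (7, 2.5, -3)
print(len(encode_tagged(row)), len(pack(*row)))  # 27 24
print(unpack(pack(*row)))                        # (7, 2.5, -3)
```

Besides saving a tag byte per field, the schema-driven path resolves all type decisions once, when the codec is built, rather than once per field per record.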

Thoughts on just using Avro for the internal PigStorage?
I've been trying to play with this in my spare time but haven't gotten far yet. We're certainly open to looking at it and seeing how it performs.

Alan.


-D

On Mon, Jun 28, 2010 at 5:08 PM, Thejas Nair <te...@yahoo-inc.com> wrote:

I have created a wiki page that puts together some ideas that can help
improve performance by avoiding or delaying serialization/deserialization.

http://wiki.apache.org/pig/AvoidingSedes

These are ideas that don't involve changes to the optimizer. Most of them
involve changes in the load/store functions.
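
One way to delay deserialization inside a load/store pair is a tuple that keeps the loader's raw bytes, decodes fields only on first access, and hands the original bytes straight back to the store function if nothing was modified. This is a minimal sketch under assumed names (`LazyTuple`, `to_bytes`, and the length-prefixed string format are hypothetical), not the design described on the wiki page.

```python
import struct

def encode_row(fields):
    """Length-prefixed UTF-8 fields (hypothetical on-disk format)."""
    out = bytearray()
    for f in fields:
        data = f.encode("utf-8")
        out += struct.pack(">I", len(data)) + data
    return bytes(out)

class LazyTuple:
    """Holds raw bytes from the loader; fields are decoded only when needed."""

    def __init__(self, raw):
        self._raw = raw
        self._fields = None  # decoded values, filled in on first access
        self._dirty = False  # set once any field is modified

    def _decode(self):
        if self._fields is None:
            fields, pos = [], 0
            while pos < len(self._raw):
                (n,) = struct.unpack_from(">I", self._raw, pos)
                pos += 4
                fields.append(self._raw[pos:pos + n].decode("utf-8"))
                pos += n
            self._fields = fields
        return self._fields

    def get(self, i):
        return self._decode()[i]

    def set(self, i, value):
        self._decode()[i] = value
        self._dirty = True

    def is_decoded(self):
        return self._fields is not None

    def to_bytes(self):
        """Store path: reuse the original bytes unless a field was changed."""
        if not self._dirty:
            return self._raw
        return encode_row(self._fields)

raw = encode_row(["alice", "30"])
t = LazyTuple(raw)
print(t.to_bytes() is raw, t.is_decoded())  # True False
t.set(1, "31")
print(t.to_bytes() == encode_row(["alice", "31"]))  # True
```

A record that only passes through (for example, a load followed by a store with no field access) is then never decoded or re-encoded at all.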

Your feedback is welcome.

Thanks,
Thejas


