For what it's worth, I saw very significant speed improvements (an order of
magnitude for wide tables with few projected columns) when I implemented (2)
for our protocol buffer-based loaders.

I have a feeling that propagating schemas when known, and using them for
(de)serialization instead of reflecting on every field, would also be a big
win.
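
To make the schema idea concrete, here is a rough sketch in Python (not Pig
code). The schema string, the one-byte type tags, and all function names are
made up for illustration. It compares a self-describing encoding, which
inspects every value and writes a type tag for it, with an encoding driven by
a schema that is known up front.

```python
import struct

# Hypothetical schema: one struct format code per column
# (int32, float64, int64), little-endian with no padding.
SCHEMA = "<idq"


def encode_tagged(row):
    """Self-describing encoding: inspect each value and write a type tag before it."""
    out = bytearray()
    for value in row:
        if isinstance(value, int):
            out += b"i" + struct.pack("<q", value)
        elif isinstance(value, float):
            out += b"d" + struct.pack("<d", value)
        else:
            raise TypeError(f"unsupported type: {type(value).__name__}")
    return bytes(out)


def decode_tagged(buf):
    """Read the tag of every field to find out how to decode it."""
    row, pos = [], 0
    while pos < len(buf):
        tag = buf[pos:pos + 1]
        if tag == b"i":
            fmt = "<q"
        elif tag == b"d":
            fmt = "<d"
        else:
            raise ValueError(f"unknown tag: {tag!r}")
        (value,) = struct.unpack_from(fmt, buf, pos + 1)
        row.append(value)
        pos += 1 + 8  # one tag byte plus an 8-byte payload
    return tuple(row)


def encode_with_schema(row, schema=SCHEMA):
    """Schema-driven encoding: no per-field inspection and no tags on the wire."""
    return struct.pack(schema, *row)


def decode_with_schema(buf, schema=SCHEMA):
    """Schema-driven decoding: the whole row is unpacked in a single call."""
    return struct.unpack(schema, buf)


row = (7, 2.5, 1234567890123)
tagged = encode_tagged(row)
compact = encode_with_schema(row)
print(len(tagged), len(compact))
```

Both encodings round-trip the same row. The schema-driven one skips the
per-field type checks and the tag bytes, which is the kind of saving I would
expect from propagating schemas, though the real gain in Pig would have to be
measured.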

Thoughts on just using Avro for the internal PigStorage?

-D

On Mon, Jun 28, 2010 at 5:08 PM, Thejas Nair <te...@yahoo-inc.com> wrote:

> I have created a wiki which puts together some ideas that can help in
> improving performance by avoiding/delaying serialization/de-serialization.
>
> http://wiki.apache.org/pig/AvoidingSedes
>
> These are ideas that don't involve changes to the optimizer. Most of them
> involve changes in the load/store functions.
>
> Your feedback is welcome.
>
> Thanks,
> Thejas
>
>
