Re: Avoiding serialization/de-serialization in pig

2010-06-30 Thread Alan Gates
On Jun 28, 2010, at 5:51 PM, Dmitriy Ryaboy wrote: For what it's worth, I saw very significant speed improvements (order of magnitude for wide tables with few projected columns) when I implemented (2) for our protocol buffer-based loaders. I have a feeling that propagating schemas when

Re: Avoiding serialization/de-serialization in pig

2010-06-30 Thread Thejas Nair
On 6/28/10 5:51 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: I have a feeling that propagating schemas when known, and using them for (de)serialization instead of reflecting every field, would also be a big win. Thoughts on just using Avro for the internal PigStorage? When I

Re: Avoiding serialization/de-serialization in pig

2010-06-28 Thread Dmitriy Ryaboy
For what it's worth, I saw very significant speed improvements (order of magnitude for wide tables with few projected columns) when I implemented (2) for our protocol buffer-based loaders. I have a feeling that propagating schemas when known, and using them for (de)serialization instead of

Re: Avoiding serialization/de-serialization in pig

2010-06-28 Thread Russell Jurney
I don't fully understand the repercussions of this, but I like it. We're moving from our VoldemortStorage stuff to Avro and it would be great to pipe Avro all the way through.
Russ
On Mon, Jun 28, 2010 at 5:51 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote: For what it's worth, I saw very