On 6/28/10 5:51 PM, "Dmitriy Ryaboy" wrote:
>
> I have a feeling that propagating schemas when known, and using them for
> (de)serialization instead of reflecting every field, would also be a big
> win.
>
> Thoughts on just using Avro for the internal PigStorage?
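To make the schema idea above concrete, here is a minimal sketch in Python, not Pig's actual code. It contrasts a self-describing encoding, which writes a type tag before every field and inspects each value at runtime, with an encoding driven by a schema known up front. The schema, field names, and function names are all hypothetical, chosen only for illustration.

```python
import struct

# Hypothetical schema for illustration: (field name, struct format code).
SCHEMA = [("user_id", "q"), ("score", "d"), ("clicks", "i")]


def serialize_tagged(row):
    """Self-describing encoding: inspect each value, write a type tag, then the value."""
    out = bytearray()
    for value in row:
        if isinstance(value, float):
            out += b"d" + struct.pack(">d", value)
        elif isinstance(value, int):
            out += b"q" + struct.pack(">q", value)
        else:
            raise TypeError("unsupported type: %r" % type(value))
    return bytes(out)


def deserialize_tagged(data):
    """Read a tag byte, then decode the field that the tag describes."""
    row, pos = [], 0
    while pos < len(data):
        code = chr(data[pos])
        (value,) = struct.unpack_from(">" + code, data, pos + 1)
        row.append(value)
        pos += 1 + struct.calcsize(">" + code)
    return tuple(row)


# With a known schema the row layout is compiled once and reused for every
# row, so no per-field type inspection or tag bytes are needed.
ROW_FORMAT = struct.Struct(">" + "".join(code for _, code in SCHEMA))


def serialize_with_schema(row):
    return ROW_FORMAT.pack(*row)


def deserialize_with_schema(data):
    return ROW_FORMAT.unpack(data)


if __name__ == "__main__":
    row = (42, 0.75, 7)
    tagged = serialize_tagged(row)
    compact = serialize_with_schema(row)
    print(len(tagged), deserialize_tagged(tagged))
    print(len(compact), deserialize_with_schema(compact))
```

The schema-driven path also lets narrow types such as the 4-byte `clicks` field be stored compactly, whereas the tagged path in this sketch widens every integer to 8 bytes. Whether the gain carries over to Pig depends on how its tuples are actually serialized, which this sketch does not model.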
When I profiled pig quer
Agreed. I have compared the performance of Hive and Pig using some
simple scripts, and Hive performs much better than Pig. The mapper
task time of Pig and Hive is almost the same. The difference is almost
entirely caused by the reduce task, and much of that time is spent on transfer from
ma
On Jun 28, 2010, at 5:51 PM, Dmitriy Ryaboy wrote:
For what it's worth, I saw very significant speed improvements (order of
magnitude for wide tables with few projected columns) when I implemented (2)
for our protocol buffer-based loaders.
I have a feeling that propagating schemas when k
I don't fully understand the repercussions of this, but I like it. We're
moving from our VoldemortStorage stuff to Avro and it would be great to pipe
Avro all the way through.
Russ
On Mon, Jun 28, 2010 at 5:51 PM, Dmitriy Ryaboy wrote:
> For what it's worth, I saw very significant speed improv
For what it's worth, I saw very significant speed improvements (order of
magnitude for wide tables with few projected columns) when I implemented (2)
for our protocol buffer-based loaders.
I have a feeling that propagating schemas when known, and using them for
(de)serialization instead of re
I have created a wiki page that puts together some ideas that can help
improve performance by avoiding or delaying serialization/de-serialization.
http://wiki.apache.org/pig/AvoidingSedes
These are ideas that don't involve changes to the optimizer. Most of them
involve changes in the load/store functi