Avro and Hive

Ken Krugler Thu, 28 Oct 2010 11:43:42 -0700

Hi all,

I'd seen past emails from Scott and Doug about using Avro as the data format for Hive.

This was back in April/May, and I'm wondering about the current state of the world.

Specifically, what's the recommended approach (& known issues) with using Avro files with Hive?

E.g. Scott mentioned that "Avro files should be better performing and more compact than sequence files." Has that been proven out?

He also discussed a minor issue with maps - "Their maps however can have any intrinsic type as a key (int, long, string, float, double)."

And a more serious issue with unions, though this wouldn't directly impact us as we wouldn't be using that feature.

In our situation, we're trying to get the best of both worlds by leveraging Hive for analytics, and Cascading for workflow, so having one store in HDFS for both would be a significant win.


Thanks for any input!

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g
