Hi all,

I'd seen past emails from Scott and Doug about using Avro as the data format for Hive.

This was back in April/May, and I'm wondering about the current state of the world.

Specifically, what's the recommended approach for using Avro files with Hive, and what are the known issues?

For example, Scott mentioned that "Avro files should be better performing and more compact than sequence files." Has that been borne out in practice?

He also discussed a minor issue with maps: "Their maps however can have any intrinsic type as a key (int, long, string, float, double)." (Avro map keys, by contrast, are always strings.)

He also mentioned a more serious issue with unions, though this wouldn't directly affect us, as we wouldn't be using that feature.

In our situation, we're trying to get the best of both worlds by leveraging Hive for analytics, and Cascading for workflow, so having one store in HDFS for both would be a significant win.

Thanks for any input!

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




