Thanks Edward. I'm actually populating this table periodically from another
temporary table, so ORC sounds like a good fit. But unfortunately we are
stuck with Hive 0.9.
I wonder how easy or hard it would be to use data stored as RCFile or ORC
from Java MapReduce?
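For reference, here is a rough sketch of what reading an RCFile from a plain
Java MapReduce job could look like, using the RCFileInputFormat that ships in
the Hive jars (old org.apache.hadoop.mapred API; the class names are from
memory of the 0.9-era jars and this is untested, so verify against your
distribution):

import java.io.IOException;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.io.RCFileInputFormat;
import org.apache.hadoop.hive.serde2.ColumnProjectionUtils;
import org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable;
import org.apache.hadoop.hive.serde2.columnar.BytesRefWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class RCFileDump {

  // Each map() call receives one row; the columns arrive as raw byte ranges.
  public static class RowMapper extends MapReduceBase
      implements Mapper<LongWritable, BytesRefArrayWritable, NullWritable, Text> {
    public void map(LongWritable key, BytesRefArrayWritable row,
        OutputCollector<NullWritable, Text> out, Reporter reporter)
        throws IOException {
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < row.size(); i++) {
        BytesRefWritable col = row.get(i);
        if (i > 0) sb.append('\t');
        // Raw column bytes; real decoding depends on the table's SerDe.
        sb.append(new String(col.getData(), col.getStart(), col.getLength(), "UTF-8"));
      }
      out.collect(NullWritable.get(), new Text(sb.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RCFileDump.class);
    conf.setJobName("rcfile-dump");
    // Ask the reader to materialize every column (no projection).
    ColumnProjectionUtils.setFullyReadColumns(conf);
    conf.setInputFormat(RCFileInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);
    conf.setOutputKeyClass(NullWritable.class);
    conf.setOutputValueClass(Text.class);
    conf.setMapperClass(RowMapper.class);
    conf.setNumReduceTasks(0);
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));
    JobClient.runJob(conf);
  }
}

The values come back as BytesRefArrayWritable, one per row, so you still have
to decode each column's bytes yourself according to whatever SerDe wrote the
table.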
thanks,
Thilina
On Mon, Jan 27, 2014 at 3 PM, Edward wrote:
The thing about ORC is that it is great for tables created from other
tables (like the other columnar formats), but if you are logging directly
to HDFS, a columnar format is not easy (or even possible) to write directly.
Normally people store data in a very direct, row-oriented form, and then
their first map/reduce job converts it into a columnar format.
In general, use SequenceFiles with GZip or Snappy compression.
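A minimal sketch of that kind of job setup, assuming the old
org.apache.hadoop.mapred API (the compression calls are standard Hadoop; the
key/value types here are just illustrative):

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileOutputFormat;

public class SeqFileOutputConfig {
  public static void configure(JobConf conf) {
    conf.setOutputFormat(SequenceFileOutputFormat.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);
    // BLOCK compression packs many records per compressed block,
    // which usually beats per-record compression for log-style data.
    SequenceFileOutputFormat.setOutputCompressionType(conf,
        SequenceFile.CompressionType.BLOCK);
    FileOutputFormat.setCompressOutput(conf, true);
    FileOutputFormat.setOutputCompressorClass(conf, SnappyCodec.class);
  }
}

Note that Snappy needs the native library installed on the cluster nodes;
GZip works everywhere out of the box and compresses harder, but costs more
CPU than Snappy.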
On Mon, Jan 27, 2014 at 2:44 PM, Thilina Gunarathne wrote:
Thanks Eric and Sharath for the pointers to ORC. Unfortunately ORC would
not be an option for us as our cluster still runs Hive 0.9 and we won't be
migrating any time soon.
thanks,
Thilina
On Mon, Jan 27, 2014 at 2:35 PM, Sharath Punreddy wrote:
Quick insights:
http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/
On Mon, Jan 27, 2014 at 1:29 PM, Eric Hanson (BIG DATA) <
eric.n.han...@microsoft.com> wrote:
It sounds like ORC would be best.
-Eric
From: Thilina Gunarathne [mailto:cset...@gmail.com]
Sent: Monday, January 27, 2014 11:05 AM
To: user@hive.apache.org
Subject: RCFile vs SequenceFile vs text files
Dear all,
We are trying to pick the right data storage format for the Hive tables...