Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Thilina Gunarathne
Thanks, Edward. I'm actually populating this table periodically from another temporary table, and ORC sounds like a good fit. But unfortunately we are stuck with Hive 0.9. I wonder how easy or hard it is to use data stored as RCFile or ORC with Java MapReduce? thanks, Thilina On Mon, Jan 27, 2014 at 3

Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Edward Capriolo
The thing about ORC is that it is great for tables created from other tables (like the other columnar formats), but if you are logging directly to HDFS, a columnar format is not easy (or possible) to write directly. Normally people store data in a very direct row-oriented form and then their first map
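To illustrate the point about direct logging: a row-oriented writer can append each record the moment it arrives, while a columnar writer has to hold back a group of rows before it can lay them out column by column. The following is only a toy sketch in Python, not the actual RCFile or ORC on-disk layout; the class names `RowWriter` and `RowGroupColumnarWriter` and the `group_size` parameter are made up for illustration.

```python
from typing import Dict, List


class RowWriter:
    """Row-oriented output: every record is appended as soon as it arrives."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self.lines: List[str] = []

    def write(self, row: Dict[str, object]) -> None:
        # One record becomes one tab-separated line; no buffering needed.
        self.lines.append("\t".join(str(row[c]) for c in self.columns))


class RowGroupColumnarWriter:
    """Columnar output: rows are buffered, then transposed one group at a time."""

    def __init__(self, columns: List[str], group_size: int) -> None:
        self.columns = columns
        self.group_size = group_size
        self.pending: List[Dict[str, object]] = []   # rows not yet laid out
        self.groups: List[Dict[str, list]] = []      # finished row groups

    def write(self, row: Dict[str, object]) -> None:
        self.pending.append(row)
        if len(self.pending) == self.group_size:
            self.flush()

    def flush(self) -> None:
        # Transpose the buffered rows so each column's values sit together.
        if not self.pending:
            return
        group = {c: [r[c] for r in self.pending] for c in self.columns}
        self.groups.append(group)
        self.pending = []


if __name__ == "__main__":
    cols = ["user", "ms"]
    row_out = RowWriter(cols)
    col_out = RowGroupColumnarWriter(cols, group_size=2)
    for i in range(5):
        record = {"user": f"u{i}", "ms": i}
        row_out.write(record)
        col_out.write(record)
    # The fifth record is still waiting in the columnar writer's buffer.
    print(len(row_out.lines), len(col_out.groups), len(col_out.pending))
    col_out.flush()
    print(col_out.groups)
```

In the sketch, a record that has been handed to the columnar writer is not part of any finished group until its group fills up or `flush()` is called, which is one reason a row-oriented landing table followed by a conversion job is the more comfortable pattern.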

Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Edward Capriolo
In general, use SequenceFiles with GZip or Snappy compression. On Mon, Jan 27, 2014 at 2:44 PM, Thilina Gunarathne wrote: > Thanks Eric and Sharath for the pointers to ORC. Unfortunately ORC would > not be an option for us as our cluster still runs Hive 0.9 and we won't be > migrating any tim

Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Thilina Gunarathne
Thanks Eric and Sharath for the pointers to ORC. Unfortunately ORC would not be an option for us as our cluster still runs Hive 0.9 and we won't be migrating any time soon. thanks, Thilina On Mon, Jan 27, 2014 at 2:35 PM, Sharath Punreddy wrote: > Quick insights: > > > http://hortonworks.com/bl

Re: RCFile vs SequenceFile vs text files

2014-01-27 Thread Sharath Punreddy
Quick insights: http://hortonworks.com/blog/orcfile-in-hdp-2-better-compression-better-performance/ On Mon, Jan 27, 2014 at 1:29 PM, Eric Hanson (BIG DATA) < eric.n.han...@microsoft.com> wrote: > It sounds like ORC would be best. > > > > -Eric > > > > *From:* Thilina Gunarath

RE: RCFile vs SequenceFile vs text files

2014-01-27 Thread Eric Hanson (BIG DATA)
It sounds like ORC would be best. -Eric From: Thilina Gunarathne [mailto:cset...@gmail.com] Sent: Monday, January 27, 2014 11:05 AM To: user@hive.apache.org Subject: RCFile vs SequenceFile vs text files Dear all, We are trying to pick the right data storage format for the Hive ta