Re: Is it worth storing in ORC for one time read. And can be replace hive with HBase

2015-08-06 Thread Jörn Franke
Yes, you should use ORC: it is much faster and more compact. Additionally, you can apply compression (Snappy) to increase performance. Your data processing pipeline does not seem to be very optimized. You should use the newest Hive version, enabling storage indexes and bloom filters on appropriate …
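A minimal sketch of what that advice could look like in a table definition. The Python function below only renders a HiveQL `CREATE TABLE` statement as a string; the table name, columns and bloom-filter column in the usage line are hypothetical, and `orc_table_ddl` is an illustrative helper, not part of Hive. The table properties it emits (`orc.compress`, `orc.create.index`, `orc.bloom.filter.columns`) are standard ORC table properties, but which columns deserve a bloom filter depends on your own query predicates.

```python
def orc_table_ddl(table, columns, bloom_filter_columns=(), compression="SNAPPY"):
    """Render a HiveQL CREATE TABLE statement for an ORC-backed table.

    `columns` is a list of (name, hive_type) pairs. This is a hypothetical
    helper for illustration; the names passed in are assumptions.
    """
    col_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    props = {
        "orc.compress": compression,   # e.g. SNAPPY (faster) or ZLIB (smaller)
        "orc.create.index": "true",    # row-group min/max indexes for predicate pushdown
    }
    if bloom_filter_columns:
        # Bloom filters help selective equality lookups on these columns.
        props["orc.bloom.filter.columns"] = ",".join(bloom_filter_columns)
    prop_sql = ",\n  ".join(f"'{k}'='{v}'" for k, v in props.items())
    return (
        f"CREATE TABLE {table} (\n  {col_defs}\n)\n"
        f"STORED AS ORC\n"
        f"TBLPROPERTIES (\n  {prop_sql}\n);"
    )


if __name__ == "__main__":
    # Hypothetical daily table; adjust names and types to the real data.
    print(orc_table_ddl(
        "daily_events",
        [("event_id", "BIGINT"), ("customer_id", "INT"), ("amount", "DECIMAL(12,2)")],
        bloom_filter_columns=["customer_id"],
    ))
```

The generated statement can be pasted into the Hive CLI or Beeline; keeping the properties in one place makes it easy to compare Snappy against the default ZLIB codec on the same data.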

Is it worth storing in ORC for one time read. And can be replace hive with HBase

2015-08-06 Thread venkatesh b
Hi, there are two things I would like to know. First: in our project we use Hive. We get new data daily, and we need to process this new data only once and then send the processed data to an RDBMS. For the processing we mostly use complex queries with joins, WHERE conditions, and grouping functions. There are …

Re: Is it worth storing in ORC for one time read. And can be replace hive with HBase

2015-08-06 Thread Jörn Franke
Additionally, it is of key importance to use the right data types for the columns: use int for IDs, and int, decimal, float, double, etc. for numeric values. A bad data model that uses varchar and string where they are not appropriate is a significant bottleneck. Furthermore, include partition columns …
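To make the data-type point concrete, here is a minimal sketch that inspects sample values of a column, assuming they arrive as raw strings (for example from delimited text files), and suggests a narrower Hive type than STRING. `suggest_hive_type` is a hypothetical helper, not a Hive feature, and it only distinguishes INT (32-bit), BIGINT (64-bit), DECIMAL(p,s) (Hive allows precision up to 38), DOUBLE and STRING; real type choices should also consider value ranges you expect in the future, not just the sample.

```python
from decimal import Decimal, InvalidOperation

HIVE_MAX_DECIMAL_PRECISION = 38
INT_MIN, INT_MAX = -(2**31), 2**31 - 1          # Hive INT: 32-bit signed
BIGINT_MIN, BIGINT_MAX = -(2**63), 2**63 - 1    # Hive BIGINT: 64-bit signed


def _as_int(value):
    try:
        return int(value)
    except ValueError:
        return None


def _as_decimal(value):
    try:
        d = Decimal(value)
    except InvalidOperation:
        return None
    return d if d.is_finite() else None  # reject NaN / Infinity


def suggest_hive_type(values):
    """Suggest a Hive column type for a sample of raw string values (hypothetical helper)."""
    if not values:
        return "STRING"
    ints = [_as_int(v) for v in values]
    if all(i is not None for i in ints):
        if all(INT_MIN <= i <= INT_MAX for i in ints):
            return "INT"
        if all(BIGINT_MIN <= i <= BIGINT_MAX for i in ints):
            return "BIGINT"
        # Too large even for BIGINT: fall through to the DECIMAL check.
    decs = [_as_decimal(v) for v in values]
    if all(d is not None for d in decs):
        int_digits, scale = 0, 0
        for d in decs:
            t = d.as_tuple()
            scale = max(scale, -t.exponent)                      # digits after the point
            int_digits = max(int_digits, len(t.digits) + t.exponent)  # digits before it
        precision = max(int_digits + scale, 1)
        if precision <= HIVE_MAX_DECIMAL_PRECISION:
            return f"DECIMAL({precision},{scale})"
        return "DOUBLE"  # approximate; exact DECIMAL would exceed Hive's limit
    return "STRING"


if __name__ == "__main__":
    print(suggest_hive_type(["17", "42", "1001"]))   # an ID-like column
    print(suggest_hive_type(["12.50", "3.1"]))       # a monetary column
    print(suggest_hive_type(["ABC-1", "ABC-2"]))     # genuinely textual
```

Only the last column genuinely needs STRING; the first two would otherwise be stored and compared as text even though a numeric type is smaller and cheaper to filter and join on.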

Re: Is it worth storing in ORC for one time read. And can be replace hive with HBase

2015-08-06 Thread venkatesh b
I'm really sorry, I posted to the Spark mailing list by mistake. Jörn Franke, thanks for your reply. I have many joins and many complex queries, and all of them are table scans, so I think HBase will not work for me. On Thursday, August 6, 2015, Jörn Franke jornfra...@gmail.com wrote: Additionally it is of key …