Re: when to use hive vs hbase

2014-04-30 Thread Jean-Marc Spaggiari
Hi Shushant, Have you looked at OpenTSDB? If you use a timestamp in your rowkey you will create what we call hotspots, and you want to avoid that. OpenTSDB might help you with that. The key you propose will create a hotspot with the default HBase version, and you want to avoid that. You can place the ID fi
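
To illustrate the hotspot point above: keys that start with a timestamp sort next to each other, so all new writes land on the same region. Below is a minimal sketch of two common workarounds, assuming string row keys. The helper names, the bucket count of 16, and the `_` separator are hypothetical choices for illustration, not anything prescribed by HBase or OpenTSDB.

```python
import hashlib

# Hypothetical number of salt buckets; in practice this would be tuned
# to the number of regions or region servers.
NUM_BUCKETS = 16


def salted_row_key(timestamp: str, request_id: str, buckets: int = NUM_BUCKETS) -> str:
    """Prefix the key with a bucket derived from the request ID.

    Consecutive timestamps then spread over `buckets` key ranges
    instead of all sorting next to each other.
    """
    digest = hashlib.md5(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % buckets
    return f"{bucket:02d}_{timestamp}_{request_id}"


def id_first_row_key(timestamp: str, request_id: str) -> str:
    """Put the request ID ahead of the timestamp so that keys are not
    ordered by arrival time."""
    return f"{request_id}_{timestamp}"


if __name__ == "__main__":
    print(salted_row_key("2014-04-30-17-15-00", "REQ42"))
    print(id_first_row_key("2014-04-30-17-15-00", "REQ42"))
```

The trade-off is that a scan over one day no longer maps to a single contiguous key range: with salting, a daily read has to issue one scan per bucket and merge the results.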

Re: when to use hive vs hbase

2014-04-30 Thread Shushant Arora
Thanks Jean! A few more questions: what are good practices for key column design in HBase? Say my web logs contain a timestamp and a request ID which together uniquely identify each row. 1. Shall I make YYYY-MM-DD-HH-MM-SS_REQ_ID the row key? In a scenario where this data will be fetched from HBase on a daily basis a

Re: when to use hive vs hbase

2014-04-30 Thread Jean-Marc Spaggiari
With HBase you have some overhead. The Region Server will do a lot for you: manage all the column families, the columns, the delete markers, the compactions, etc. If you read a file directly from HDFS it will be faster for sure because you will not have all those validations and all this extra memo

Re: when to use hive vs hbase

2014-04-30 Thread Shushant Arora
Hi Jean, Thanks for the explanation. I still have one doubt: why is HBase not good for bulk loads and aggregations (full table scans)? Hive will also read each row for aggregation, just as HBase does. Can you explain more? On Wed, Apr 30, 2014 at 5:15 PM, Jean-Marc Spaggiari < jean-m...@spaggiari.or

Re: when to use hive vs hbase

2014-04-30 Thread Shahab Yunus
Hive and HBase are 2 different tools/technologies. They are used together but they are not interchangeable. Hive is for on-demand, RDBMS-style, SQL-like data access, while HBase is the actual data store. Hive can run on top of HBase, providing an on-demand, SQL-like API. Regards, Shahab On Wed, Apr 30, 2014 at 4:34

Re: when to use hive vs hbase

2014-04-30 Thread Jean-Marc Spaggiari
Hi Shushant, Hive and HBase are 2 different things. You cannot really use one vs. the other. Hive is a query engine against HDFS data. Data can be stored in different formats like flat text, sequence files, Parquet files, or even HBase tables. HBase is both a query engine (Gets and Scans) and a st

when to use hive vs hbase

2014-04-30 Thread Shushant Arora
I have a requirement to process huge weblogs on a daily basis. 1. Data will arrive incrementally in the datastore on a daily basis, and I need cumulative and daily distinct user counts from the logs; after that, the aggregated data will be loaded into an RDBMS like MySQL. 2. Data will be loaded into an HDFS data warehouse o
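
For reference, the aggregation asked for here (daily and cumulative distinct user counts over incrementally arriving logs) can be sketched in a few lines. This is only an in-memory illustration of the logic, assuming each log record has already been reduced to a `(day, user_id)` pair; the function name and record shape are hypothetical, and at real weblog volume the same computation would be expressed as a Hive or MapReduce job rather than run in a single process.

```python
from collections import defaultdict


def distinct_user_counts(records):
    """Return (day, daily_distinct, cumulative_distinct) tuples.

    `records` is an iterable of (day, user_id) pairs, where `day` is a
    string that sorts chronologically, e.g. "2014-04-30".
    """
    users_by_day = defaultdict(set)
    for day, user_id in records:
        users_by_day[day].add(user_id)

    seen_so_far = set()  # every user observed up to and including the current day
    result = []
    for day in sorted(users_by_day):
        seen_so_far |= users_by_day[day]
        result.append((day, len(users_by_day[day]), len(seen_so_far)))
    return result


if __name__ == "__main__":
    logs = [
        ("2014-04-29", "alice"),
        ("2014-04-29", "bob"),
        ("2014-04-29", "alice"),
        ("2014-04-30", "bob"),
        ("2014-04-30", "carol"),
    ]
    for row in distinct_user_counts(logs):
        print(row)
```

Note that the cumulative count cannot be derived by summing the daily counts, because returning users would be counted twice; the set of users seen so far (or an approximation of it) has to be carried forward from day to day.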