Hi Shushant,
Have you looked at OpenTSDB? If you put the timestamp at the start of your
rowkey, you will create what we call hotspots, and you want to avoid that.
OpenTSDB might help you with that.
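
As a minimal sketch of one common way to avoid such hotspots (not necessarily what OpenTSDB does internally), you can prefix the key with a small salt derived from the request ID, so that rows with consecutive timestamps spread over several key ranges. The function name, the bucket count of 16, and the key layout below are assumptions made up for illustration:

```python
import hashlib

# Hypothetical number of salt buckets; in practice it would be tuned
# to the number of regions/region servers.
NUM_BUCKETS = 16


def salted_row_key(timestamp: str, request_id: str, buckets: int = NUM_BUCKETS) -> str:
    """Build a row key of the form <bucket>_<timestamp>_<request_id>.

    The bucket is derived from a hash of the request ID, so the same
    (timestamp, request_id) pair always maps to the same key, while
    rows written at the same moment land in different key ranges.
    """
    digest = hashlib.md5(request_id.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}_{timestamp}_{request_id}"


if __name__ == "__main__":
    for req in ("req-1", "req-2", "req-3"):
        print(salted_row_key("2014-04-30-17-15-00", req))
```

The trade-off is that reading one day's data then takes one scan per bucket instead of a single range scan.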
The key you propose will create hotspots with the default HBase version, and
you want to avoid that. You can place the ID fi
Thanks Jean!
A few more questions:
What are good practices for key column design in HBase?
Say my web logs contain a timestamp and a request ID, which uniquely identify
each row.
1. Shall I make -MM-DD-HH-MM-SS_REQ_ID the row key? In a scenario where
this data will be fetched from HBase on a daily basis a
With HBase you have some overhead. The Region Server will do a lot for you:
it manages all the column families, the columns, the delete markers, the
compactions, etc. If you read a file directly from HDFS, it will be faster
for sure, because you will not have all those validations and all this extra
memo
Hi Jean,
Thanks for the explanation.
I still have one doubt:
Why is HBase not good for bulk loads and aggregations
(full table scans)? Hive will also read each row for aggregation, just as
HBase does.
Can you explain more?
On Wed, Apr 30, 2014 at 5:15 PM, Jean-Marc Spaggiari <
jean-m...@spaggiari.or
Hive and HBase are 2 different tools/technologies. They are used together,
but they are not interchangeable.
Hive is for on-demand, RDBMS SQL-like data access, while HBase is the actual
data store. Hive runs on HBase, providing an on-demand, SQL-like API.
Regards,
Shahab
On Wed, Apr 30, 2014 at 4:34
Hi Shushant,
Hive and HBase are 2 different things. You cannot really use one in place of
the other.
Hive is a query engine against HDFS data. Data can be stored in different
formats like flat text, sequence files, Parquet files, or even HBase tables.
HBase is both a query engine (Gets and Scans) and a st
I have a requirement to process huge web logs on a daily basis.
1. Data will arrive incrementally in the datastore on a daily basis, and I need
cumulative and daily
distinct user counts from the logs; after that, the aggregated data will be
loaded into an RDBMS like MySQL.
2. Data will be loaded into an HDFS data warehouse o