Re: Performance of Hive external tables on HBase with several terabytes or more of data

Sathi Chowdhury Wed, 11 May 2016 20:04:03 -0700

Hi Yang,
Did you consider the bulk loading option?

http://blog.cloudera.com/blog/2013/09/how-to-use-hbase-bulk-loading-and-why/
This may be a way to go.
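The core idea of bulk loading is to write data directly into HBase's storage file format (HFiles) and then hand those files to the region servers, which bypasses the normal write path (write-ahead log and MemStore). Below is a minimal sketch of one preparatory step. It assumes string row keys compared lexicographically and a known, sorted list of region start keys. Each row is routed to the region whose key range contains it, and each per-region batch is sorted, because an HFile must be written in key order. The function name is hypothetical. Real jobs do this with HBase's own MapReduce tooling, for example `HFileOutputFormat2` followed by the `completebulkload` tool, not with code like this.

```python
import bisect
from collections import defaultdict


def partition_for_bulk_load(rows, region_start_keys):
    """Group (row_key, value) pairs into one sorted batch per region.

    region_start_keys must be sorted and begin with "" (HBase's first
    region has an empty start key). Each returned batch stands in for
    the HFile that would be handed to that region. Hypothetical helper.
    """
    batches = defaultdict(list)
    for key, value in rows:
        # Index of the last region whose (inclusive) start key <= key.
        region = bisect.bisect_right(region_start_keys, key) - 1
        batches[region].append((key, value))
    # HFiles must be written in sorted row-key order.
    return {r: sorted(b, key=lambda kv: kv[0]) for r, b in batches.items()}


# Three regions: ["", "g"), ["g", "p"), ["p", end).
starts = ["", "g", "p"]
rows = [("zeta", 1), ("alpha", 2), ("hotel", 3), ("beta", 4)]
batches = partition_for_bulk_load(rows, starts)
# Region 0 gets "alpha" and "beta", region 1 "hotel", region 2 "zeta".
```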
Thanks
Sathi



On May 11, 2016, at 6:07 PM, Yi Jiang <yi.ji...@ubisoft.com> wrote:

Hi guys,
We are currently debating whether to use HBase as the destination for our data 
pipeline job.
Basically, we want to save our logs into HBase, and our pipeline can generate 
2-4 terabytes of data every day. Our IT department thinks it is not a good idea 
to scan that much data in HBase, since it will cause performance and memory 
issues, and they asked us to keep only 15 minutes' worth of data in HBase for 
real-time analysis.
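For scale, a 15-minute window is a small slice of the daily volume. The back-of-the-envelope sketch below assumes the 2-4 TB arrives at an even rate over 24 hours. It uses decimal units (1 TB = 1000 GB) and ignores HDFS replication. The function name is hypothetical. To enforce the retention itself, HBase column families accept a TTL in seconds (900 for 15 minutes). Expired cells are no longer returned by reads and are physically removed during compactions.

```python
MINUTES_PER_DAY = 24 * 60


def window_volume_gb(tb_per_day, window_minutes=15):
    """Approximate GB held for a sliding window of `window_minutes`,
    assuming an even ingest rate, 1 TB = 1000 GB, and no replication."""
    return tb_per_day * 1000 * window_minutes / MINUTES_PER_DAY


for tb in (2, 4):
    # Roughly 20.8 GB at 2 TB/day and 41.7 GB at 4 TB/day.
    print(f"{tb} TB/day -> {window_volume_gb(tb):.1f} GB per 15 minutes")
```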
For now, I am using a Hive external table on top of HBase. What I am wondering 
is: for the MapReduce job, what kind of mapper does it use to scan the data from 
HBase? Is it TableInputFormatBase? How many mappers will Hive use to scan 
HBase, and is that efficient? Will it cause performance issues if we have a 
couple of terabytes or more of data?
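To my understanding, HBase's `TableInputFormatBase` creates one input split by default for every region that overlaps the scan's row-key range, and so one map task per region. Hive's HBase storage handler builds on this class. The sketch below models that rule under simplifying assumptions. Regions are given as `(start_key, end_key)` byte-string pairs in key order. `b""` means unbounded, following HBase's convention. The scan stop key is exclusive. The function name is hypothetical. Newer HBase versions also have options that change the split policy, so treat this as the default behaviour only.

```python
def splits_for_scan(region_bounds, scan_start=b"", scan_stop=b""):
    """Return the (start, end) key ranges that would each become one map task.

    region_bounds: list of (start_key, end_key) per region, in key order;
    b"" marks an unbounded first start / last end. Empty scan_start and
    scan_stop mean a full-table scan. Hypothetical model, not HBase code.
    """
    splits = []
    for start, end in region_bounds:
        # Skip regions that end at or before the scan start.
        if scan_start and end and end <= scan_start:
            continue
        # Skip regions that start at or after the (exclusive) scan stop.
        if scan_stop and start >= scan_stop:
            continue
        # Clip the region's key range to the scan range.
        split_start = max(start, scan_start) if scan_start else start
        if scan_stop and (not end or end > scan_stop):
            split_end = scan_stop
        else:
            split_end = end
        splits.append((split_start, split_end))
    return splits


regions = [(b"", b"g"), (b"g", b"p"), (b"p", b"")]
full = splits_for_scan(regions)               # one split per region: 3 mappers
narrow = splits_for_scan(regions, b"h", b"k")  # only the middle region: 1 mapper
```

The number of mappers grows with the number of regions a scan touches, not directly with the table's total size. A row key that lets common queries be expressed as a key range therefore matters a great deal once the table holds terabytes.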
I am also trying to index some columns that we might use in queries, but I am 
not sure it is a good idea to keep so much history data in HBase for querying.
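HBase itself only indexes the row key and has no built-in secondary index on other columns. One common workaround is a separate index table. Its row key combines the indexed value with the data row key, so looking up a value becomes a short prefix scan over sorted keys. The sketch below simulates such a table in memory. `IndexTable`, the `"\x00"` separator, and the key layout are hypothetical. A real version would also need the writer to keep the data and index tables in sync, because HBase does not update two tables atomically.

```python
import bisect


class IndexTable:
    """In-memory stand-in for a hypothetical HBase index table whose row key
    is <column value> + SEP + <data row key>. Keys are kept sorted, as HBase
    keeps row keys sorted, so a lookup is a prefix range scan."""

    SEP = "\x00"  # assumed never to appear inside indexed values

    def __init__(self):
        self._keys = []

    def put(self, value, data_row_key):
        # Insert while keeping the key list in sorted order.
        bisect.insort(self._keys, value + self.SEP + data_row_key)

    def lookup(self, value):
        """Return the data row keys indexed under `value`, in key order."""
        prefix = value + self.SEP
        i = bisect.bisect_left(self._keys, prefix)  # first key >= prefix
        result = []
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            result.append(self._keys[i][len(prefix):])
            i += 1
        return result


idx = IndexTable()
idx.put("error", "row3")
idx.put("info", "row1")
idx.put("error", "row1")
error_rows = idx.lookup("error")  # data row keys for level == "error"
```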
Thank you
Jacky
