Re: parallel scanning?

2016-02-06 Thread Ted Yu
bq. we can write twice/multi-time with no problem

If you always write twice, the latency would go up. Yet, there is no guarantee that one of the writes would be successful.

On Fri, Feb 5, 2016 at 9:48 PM, Jameson Li wrote:
> ''By line, did you mean number of rows ?
>
> Yes,

Re: parallel scanning?

2016-02-05 Thread Jameson Li
2016-01-26 2:29 GMT+08:00 Henning Blohm:
> I am looking for advice on an HBase mass data access optimization problem.

For multi-get and multi-scan: in my opinion, multi-get (fetching fewer rows) can work for realtime queries; multi-scan may also work, but it will let the server
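Jameson's multi-get suggestion is typically implemented by partitioning the wanted row keys into fixed-size batches and sending each batch to the server as one multi-get (HBase's `Table.get(List<Get>)`). A minimal sketch of just the batching step; the `chunk` helper and key names are illustrative, not part of any HBase API:

```python
def chunk(keys, batch_size):
    """Split a list of row keys into fixed-size batches, each of which
    could be issued to the server as a single multi-get."""
    return [keys[i:i + batch_size] for i in range(0, len(keys), batch_size)]

# 10 hypothetical row keys split into batches of 4 -> 3 batches
batches = chunk(["row%02d" % i for i in range(10)], 4)
```

Fewer, larger batches mean fewer round trips but bigger single responses; the batch size is the knob between latency and server load that the thread is debating.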

Re: parallel scanning?

2016-02-05 Thread Ted Yu
bq. when the result line is so much lines

By line, did you mean number of rows ?

bq. one table with rowkey as A_B_time, another as B_A_time

In the above case, handling a failed write (to the second table) becomes a bit tricky.

Cheers

On Fri, Feb 5, 2016 at 12:08 AM, Jameson Li
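The dual-table layout Ted quotes (one table keyed `A_B_time`, a mirror table keyed `B_A_time`) reduces to pure key construction; the inverted timestamp (Long.MAX_VALUE minus the event time) makes newer rows sort first. A sketch under those assumptions — the function name and key format are illustrative, not from the thread:

```python
LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE, commonly used to invert timestamps

def row_keys(a, b, ts_millis):
    """Return the row key for the A_B_time table and for the mirror
    B_A_time table. Both puts must succeed for the two tables to stay
    consistent -- the failed-write case Ted calls tricky."""
    inverted = LONG_MAX - ts_millis
    return "%s_%s_%d" % (a, b, inverted), "%s_%s_%d" % (b, a, inverted)
```

Since the second write can fail independently, applications usually need a repair path (retry queue or periodic reconciliation) alongside this scheme.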

Re: parallel scanning?

2016-02-05 Thread Jameson Li
''By line, did you mean number of rows ?

Yes, sorry for my poor English.

''In the above case, handling failed write (to the second table) becomes a bit tricky.

Yes. But I think the write problem can sometimes be solved more easily than the read problem, and sometimes we can write twice/multiple times with no

Re: parallel scanning?

2016-02-01 Thread Stack
On Mon, Jan 25, 2016 at 10:29 AM, Henning Blohm wrote:
> Hi,
>
> I am looking for advice on an HBase mass data access optimization problem.
>
> In our application all data records stored in HBase have a time dimension
> (as inverted time) and a GUID in the row key.

parallel scanning?

2016-01-25 Thread Henning Blohm
Hi, I am looking for advice on an HBase mass data access optimization problem. In our application all data records stored in HBase have a time dimension (as inverted time) and a GUID in the row key. Retrieving a record requires issuing a scan with the GUID as prefix. In order to get to
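Henning's row-key scheme (GUID prefix plus inverted time) and the prefix scan it implies can be sketched as follows. The zero-padding and the stop-row trick (increment the last byte of the prefix) are assumptions about how such a scan is typically set up, not taken from his code:

```python
LONG_MAX = 2**63 - 1

def row_key(guid, ts_millis):
    """GUID prefix + inverted time, zero-padded so lexicographic order
    matches numeric order; newer records sort first under each GUID."""
    return "%s_%019d" % (guid, LONG_MAX - ts_millis)

def prefix_scan_range(guid):
    """Start/stop rows equivalent to scanning every row under one GUID:
    the stop row is the prefix with its last byte incremented."""
    prefix = guid + "_"
    return prefix, prefix[:-1] + chr(ord(prefix[-1]) + 1)
```

In the HBase client this range corresponds to `Scan.setStartRow`/`setStopRow` (or `Scan.setRowPrefixFilter` in newer client versions), and because time is inverted, the first row returned is the most recent record for that GUID.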

how to do parallel scanning in map reduce using hbase as input?

2014-06-26 Thread Li Li
My table has about 700 million rows and about 80 regions. Each task tracker is configured with 4 mappers and 4 reducers at the same time. The Hadoop/HBase cluster has 5 nodes, so at any given time it has 20 mappers running. It takes more than an hour to finish the mapper stage. The HBase cluster's load
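Since HBase's TableInputFormat creates one map task per region, Li Li's numbers mean the 80 map tasks run in waves over the 20 available slots. A quick sanity check of that arithmetic (the function name is just for illustration):

```python
import math

def mapper_waves(regions, nodes, map_slots_per_node):
    """One map task per region; with limited slots the tasks run in waves."""
    concurrent = nodes * map_slots_per_node
    return math.ceil(regions / concurrent)

# The cluster in this thread: 80 regions, 5 nodes, 4 map slots each
# -> 20 concurrent mappers, so the job needs 4 full waves
```

This is why Ted asks about region count and size below: more (evenly sized) regions raise the available parallelism only up to the slot limit, after which extra regions just add more waves.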

Re: how to do parallel scanning in map reduce using hbase as input?

2014-06-26 Thread Ted Yu
80 regions over 5 nodes - that's 16 per server. How big is the average region size? Have you considered splitting existing regions?

Cheers

On Jun 26, 2014, at 12:34 AM, Li Li fancye...@gmail.com wrote:
> my table has about 700 million rows and about 80 regions. each task tracker is configured

Re: how to do parallel scanning in map reduce using hbase as input?

2014-06-26 Thread Li Li
I don't think splitting will help. Adding more mappers per tasktracker will use more resources (heap memory). BTW, how can I view the average region size? I found this in the web UI:

ServerName | Num. Stores | Num. Storefiles | Storefile Size | Uncompressed Storefile Size | Index Size | Bloom Size