bq. we can write twice/multiple times with no problem
If you always write twice, latency goes up. Moreover, there is no
guarantee that both writes will succeed.
On Fri, Feb 5, 2016 at 9:48 PM, Jameson Li wrote:
> By line, did you mean number of rows?
>
> Yes,
2016-01-26 2:29 GMT+08:00 Henning Blohm :
> I am looking for advice on an HBase mass data access optimization problem.
>
For multi-get and multi-scan:
In my opinion, multi-get (with fewer rows per request) can work for
realtime queries, while multi-scan may work but it will let the server
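One way to keep each multi-get small, as suggested above, is to split the row keys into batches before handing each batch to the client. This is a minimal pure-Python sketch; the `chunk` helper and the key names are hypothetical, and the actual multi-get call would go through whatever HBase client is in use:

```python
def chunk(row_keys, batch_size):
    """Split a list of row keys into batches of at most batch_size,
    so each multi-get request stays small enough for realtime latency."""
    return [row_keys[i:i + batch_size]
            for i in range(0, len(row_keys), batch_size)]

# Example: 10 keys in batches of 4 -> batch sizes 4, 4, 2
batches = chunk([f"row{i:02d}" for i in range(10)], 4)
```

Each batch would then be issued as one multi-get, trading a few extra round trips for bounded per-request latency.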
bq. when the result is so many lines
By line, did you mean number of rows?
bq. one table with rowkey as A_B_time, another as B_A_time
In the above case, handling failed write (to the second table) becomes a
bit tricky.
Cheers
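The dual-table layout quoted above (one table keyed A_B_time, a mirror table keyed B_A_time) comes down to building two row keys per logical record. A minimal sketch, assuming the common `Long.MAX_VALUE - timestamp` idiom for the inverted time; the function name and the entity names are made up for illustration:

```python
LONG_MAX = 2**63 - 1  # analogous to Java's Long.MAX_VALUE

def row_keys(a, b, ts_millis):
    """Build the two row keys for one logical record: one for the table
    keyed A_B_invertedTime, one for the mirror table keyed B_A_invertedTime."""
    inv = LONG_MAX - ts_millis  # inverted time: newer records sort first
    return f"{a}_{b}_{inv}", f"{b}_{a}_{inv}"

k1, k2 = row_keys("userA", "userB", 1454660880000)
```

The tricky part Ted points out is not the key construction but atomicity: if the put to the mirror table fails, the application has to retry or reconcile it, since HBase offers no cross-table transaction.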
On Fri, Feb 5, 2016 at 12:08 AM, Jameson Li wrote:
''By line, did you mean number of rows?
Yes, sorry for my poor English.
''In the above case, handling failed write (to the second table) becomes a
bit tricky.
Yes, but I think the write problem can sometimes be solved more easily
than the read problem, and sometimes we can write twice/multiple times
with no problem.
On Mon, Jan 25, 2016 at 10:29 AM, Henning Blohm wrote:
Hi,
I am looking for advice on an HBase mass data access optimization problem.
In our application, all data records stored in HBase have a time
dimension (as inverted time) and a GUID in the row key. Retrieving a
record requires issuing a scan with the GUID as prefix.
In order to get to
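The key layout described above (GUID prefix plus inverted time) can be sketched as plain string construction. The zero-padding is an assumption needed for the lexicographic sort to match numeric order; the GUID value is a made-up placeholder:

```python
LONG_MAX = 2**63 - 1  # analogous to Java's Long.MAX_VALUE

def row_key(guid, ts_millis):
    """Row key = GUID prefix + zero-padded inverted time, so that for a
    given GUID the newest record sorts lexicographically first."""
    return f"{guid}_{LONG_MAX - ts_millis:019d}"

# HBase returns rows in lexicographic key order; sorting simulates that.
keys = sorted(row_key("guid-42", ts) for ts in (1000, 3000, 2000))
```

Because all keys for one GUID share the same prefix, a prefix scan over `guid-42_` visits them newest-first and can stop after the first row when only the latest record is needed.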
My table has about 700 million rows and about 80 regions. Each task
tracker is configured to run 4 mappers and 4 reducers concurrently.
The Hadoop/HBase cluster has 5 nodes, so at any given time it has 20
mappers running. It takes more than an hour to finish the mapper stage.
The HBase cluster's load
80 regions over 5 nodes - that's 16 per server.
How big is the average region size?
Have you considered splitting existing regions ?
Cheers
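The arithmetic above, plus the average-region-size estimate the thread is asking about, sketched in a few lines; the 100 GB storefile total is a made-up placeholder for whatever the web UI actually reports:

```python
regions, servers = 80, 5
regions_per_server = regions // servers  # 80 regions over 5 nodes = 16 each

# Hypothetical total storefile size read off the web UI:
total_storefile_gb = 100.0
avg_region_size_gb = total_storefile_gb / regions
```

If the average region size is well below the configured split threshold, splitting alone won't add parallelism without also raising the mapper count.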
On Jun 26, 2014, at 12:34 AM, Li Li fancye...@gmail.com wrote:
My table has about 700 million rows and about 80 regions. Each task
tracker is configured
I don't think splitting will help. Adding more mappers per task
tracker will use more resources (heap memory).
By the way, how do I view the average region size?
I found this in the web UI:
ServerName | Num. Stores | Num. Storefiles | Storefile Size |
Uncompressed Storefile Size | Index Size | Bloom Size