Hi,

I haven't heard of a way to split by regex. Maybe somebody else will post here 
if it's possible.

But you could work around that by mapping from regex to region in your client 
code.
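
A minimal sketch of what such a client-side mapping might look like, in plain 
Java with no HBase calls. The class RegexKeyRouter, its methods and the prefix 
values are all hypothetical names chosen for illustration. The idea is that 
each regex is mapped to a row-key prefix, and the table is then pre-split on 
those prefixes, so that keys matching one pattern end up in one key range.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical client-side helper (not part of HBase): the regex is evaluated
// in the client and turned into a row-key prefix, because regions are only
// ever split on key ranges.
class RegexKeyRouter {
    // Insertion order matters: the first matching pattern wins.
    private final Map<Pattern, String> prefixByPattern = new LinkedHashMap<>();
    private final String defaultPrefix;

    RegexKeyRouter(String defaultPrefix) {
        this.defaultPrefix = defaultPrefix;
    }

    // Registers a pattern and the prefix that matching keys should receive.
    RegexKeyRouter route(String regex, String prefix) {
        prefixByPattern.put(Pattern.compile(regex), prefix);
        return this;
    }

    // Returns the row key to write: "<prefix>_<originalKey>".
    String toRowKey(String originalKey) {
        for (Map.Entry<Pattern, String> e : prefixByPattern.entrySet()) {
            if (e.getKey().matcher(originalKey).matches()) {
                return e.getValue() + "_" + originalKey;
            }
        }
        return defaultPrefix + "_" + originalKey;
    }
}
```

With routes such as route("typeA.*", "1") and route("typeB.*", "2") and a 
default prefix of "9", a table pre-split at '2' and '9' would keep each group 
in its own region. Reads have to apply the same mapping to find a row again.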

If that's not an option and it's too difficult to decide how to pre-split, you 
could rely on the auto-splitting that occurs when a region reaches 
hbase.hregion.max.filesize.


An often helpful online reference: http://hbase.apache.org/book.html -> see 
2.8.2.7. Managed Splitting

regards
Chris




________________________________
From: Prakrati Agrawal <[email protected]>
To: "[email protected]" <[email protected]>; Christian Schäfer 
<[email protected]> 
Sent: 6:30 Thursday, 5 July 2012
Subject: RE: Presplit regions when creating a table

Hi

Can I do splits on regular expressions instead of specific keys? For example, 
keys having a particular pattern go to node#1 and others go to node#2 etc.

Thanks and Regards
Prakrati
-----Original Message-----
From: Christian Schäfer [mailto:[email protected]]
Sent: Wednesday, July 04, 2012 5:14 PM
To: [email protected]
Subject: Re: Presplit regions when creating a table

The simplest way to pre-split a table is at creation time in the HBase shell, 
by specifying the split keys.

This could look like this:

create 'mytable', 'myfamily', {SPLITS => ['111111', '222222', '333333', '444444']}

resulting in 5 regions: [below-111111[, [111111-222222[, [222222-333333[, 
[333333-444444[, [444444-above[
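
To make the half-open ranges above concrete, here is a small plain-Java sketch 
that works out which of the 5 regions a given row key would fall into. It 
makes no HBase calls. The class SplitRanges and the method regionIndex are 
hypothetical names, and the sketch assumes ASCII row keys, for which Java's 
String ordering agrees with the byte-wise ordering HBase uses.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only (not an HBase API).
class SplitRanges {
    // Returns the 0-based index of the region that would hold rowKey.
    // Region i covers [splits[i-1], splits[i]): the first region has no
    // lower bound and the last region has no upper bound.
    // Assumes sortedSplits is sorted ascending and keys are ASCII.
    static int regionIndex(String[] sortedSplits, String rowKey) {
        int pos = Arrays.binarySearch(sortedSplits, rowKey);
        // Exact match: the key equals splits[pos], which is the inclusive
        // start of region pos + 1. Otherwise binarySearch returns
        // -(insertionPoint) - 1, and insertionPoint is the region index.
        return pos >= 0 ? pos + 1 : -pos - 1;
    }
}
```

For the split keys from the shell example, a key such as '150000' lands in 
the second region (index 1), and '444444' itself starts the last region 
(index 4), because each split key is the inclusive lower bound of its region.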

If you store only a limited number of attributes per row, you should consider 
OpenTSDB, which is built on top of HBase and is aimed at time series data.

regards
Chris



----- Original Message -----
From: Prakrati Agrawal <[email protected]>
To: "[email protected]" <[email protected]>
CC:
Sent: 13:23 Wednesday, 4 July 2012
Subject: Presplit regions when creating a table

Dear all,

I am using HBase 0.90.6.
I have streaming data that I want to store in an HBase table. I designed the 
row key as "typeString_date_Id", where typeString takes one of 5 values. The 
problem is that the types are not evenly distributed, i.e. one type occurs far 
more often than the others, so when I start inserting data I will see 
hotspotting on some region servers. To avoid this, I thought I would pre-split 
the regions. I do not understand how to use the region splitter to my benefit. 
Can I get a code snippet showing how to do it? I am using the RegionSplitter 
interface for this.

Thanks
Prakrati

________________________________
This email message may contain proprietary, private and confidential 
information. The information transmitted is intended only for the person(s) or 
entities to which it is addressed. Any review, retransmission, dissemination or 
other use of, or taking of any action in reliance upon, this information by 
persons or entities other than the intended recipient is prohibited and may be 
illegal. If you received this in error, please contact the sender and delete 
the message from your system.

Mu Sigma takes all reasonable steps to ensure that its electronic 
communications are free from viruses. However, given Internet accessibility, 
the Company cannot accept liability for any virus introduced by this e-mail or 
any attachment and you are advised to use up-to-date virus checking software.


