What HBase version are you planning to use? In 0.94, you can refer to:
src/main/java/org/apache/hadoop/hbase/regionserver/KeyPrefixRegionSplitPolicy.java
You can write a policy which splits along category boundaries.

There are other split policies in case you're interested:
./src/main/java/org/apache/hadoop/hbase/regionserver/ConstantSizeRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/DelimitedKeyPrefixRegionSplitPolicy.java
./src/main/java/org/apache/hadoop/hbase/regionserver/IncreasingToUpperBoundRegionSplitPolicy.java

Cheers

On Mon, Mar 4, 2013 at 12:55 PM, Lukáš Drbal <[email protected]> wrote:

> Hi Jilal,
> thanks for response, but can you give me please any link or explain it more?
> I don't know what you mean with regular expression splitting. My data are
> not fixed and will grow in time.
>
> Thanks.
>
> Regards
>
> Lukas Drbal
>
>
> 2013/3/4 Jilal Oussama <[email protected]>
>
> > You can split in your application using a regular expression on the
> > underscore char if the language supports them (like splitting data of a
> > csv file)
> >
> >
> > 2013/3/4 Lukáš Drbal <[email protected]>
> >
> > > Hi,
> > >
> > > i have one question about rowkey design and presplit table.
> > >
> > > My usecase:
> > > I need store a lot of comments where each comment are for one article
> > > and this article has one category.
> > >
> > > What i need:
> > > 1) read one comment by id (where i know commentId, articleId and
> > > categoryId)
> > > 2) read all comments for article (i know categoryId and articleId)
> > > 3) read all comments for category (i know categoryId)
> > >
> > > From this read pattern i see one good rowkey:
> > > <categoryId>_<articleId>_<commentId>
> > >
> > > But here i don't have fixed size of rowkey, so i don't know how to
> > > define split pattern. How can be this solved?
> > > This id's come from external system and grow very fast, so add some
> > > like "padding" for each part are hard.
> > >
> > > Maybe i can use hash function for each part
> > > md5(<categoryId>)_md5(<articleId>)_md5(<commentId>), but this rowkey is
> > > very long (3*32+2 bytes), i don't have experience with this long rowkeys.
> > >
> > > Can someone give me a suggestions please?
> > >
> > > Regards
> > >
> > > Lukas Drbal
> > >
> >
>
>
>
> --
> Save The World - http://www.worldcommunitygrid.org/
> http://www.worldcommunitygrid.org/stat/viewMemberInfo.do?userName=LesTR
>
> LesTR
>
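
P.S. To make "splits along category boundaries" from the top of this reply a bit more concrete, here is a rough standalone sketch of the idea. It does not use any HBase classes; the class name CategoryBoundarySplit and the method alignToCategory are made up for illustration. It assumes the rowkey layout <categoryId>_<articleId>_<commentId> from the original question and that '_' never appears inside a categoryId. Given a candidate split key (for example, the midpoint row of a region), it cuts the key back to the category prefix so that all rows of one category land in the same daughter region. A real policy would apply something like this to the split point chosen by the region server.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HBase: illustrates the idea only.
public class CategoryBoundarySplit {

    // Truncate a candidate split key to the bytes before the first delimiter.
    // If the delimiter is absent, the key is returned unchanged.
    public static byte[] alignToCategory(byte[] candidateSplitKey, byte delimiter) {
        for (int i = 0; i < candidateSplitKey.length; i++) {
            if (candidateSplitKey[i] == delimiter) {
                return Arrays.copyOf(candidateSplitKey, i);
            }
        }
        return candidateSplitKey;
    }

    public static void main(String[] args) {
        // Assumed example rowkey: category 42, article 1001, comment 77.
        byte[] midpoint = "42_1001_77".getBytes(StandardCharsets.UTF_8);
        byte[] split = alignToCategory(midpoint, (byte) '_');
        // Prints "42": every row starting with "42_" sorts after this key,
        // so category 42 is not divided between two regions.
        System.out.println(new String(split, StandardCharsets.UTF_8));
    }
}
```

Because the split key is only the category prefix, the ids do not need padding or a fixed width for this to work; the trade-off is that one very large category cannot be split further by such a rule.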
