OK. I ran minor and major compactions, and it has split the table. I now have many regions. That's perfect! I still think a MIN_REGIONS option might be useful, or something like EVENLY_SPLIT. But at least I can adjust my settings with MAX_FILESIZE.
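For reference, this is roughly the shell sequence that got me there (assuming a 0.92/0.94-style shell; older shells need METHOD => 'table_att' in the alter):

  hbase> disable 'work_proposed'
  hbase> alter 'work_proposed', MAX_FILESIZE => '104857600'   # 100MB per region
  hbase> enable 'work_proposed'
  hbase> major_compact 'work_proposed'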
Thanks,

JM

2012/11/19, Jean-Marc Spaggiari <[email protected]>:
> Hi Kevin,
>
> Thanks for the suggestion.
>
> I have disabled the table, set the MAX_FILESIZE value and enabled the
> table.
>
> I can see that in the UI:
>
> work_proposed {NAME => 'work_proposed', MAX_FILESIZE => '104857600',
> FAMILIES => [{NAME => '@'}]}
>
> But there is still only one region in the table.
>
> 104857600 is 100MB.
>
> And here are the files in hadoop:
> hadoop@node3:~/hadoop-1.0.3$ bin/hadoop fs -ls /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@
> Found 2 items
> -rw-r--r--   3 hbase supergroup 1340467822 2012-11-19 16:06 /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@/157867160e684800946dd129900d3f77
> -rw-r--r--   3 hbase supergroup  834894008 2012-11-19 16:06 /hbase/work_proposed/daca55e25f5ce23b358851990bd9d6a5/@/72bb17a94dc946da8db5841a37463713
>
> The smaller one is almost 800MB.
>
> Something else which might be interesting would be a "MIN_REGIONS"
> attribute, where you can set the minimum number of regions you want for
> this table, without any consideration of the size of the files. The goal
> is to make sure the table is spread over enough servers to distribute
> the work when major MapReduce jobs are running... Here, I have an 800MB
> file and 8 region servers. I would set MIN_REGIONS to 8 and let HBase
> make sure there are at least 8 regions for this table.
>
> JM
>
> 2012/11/19, Kevin O'dell <[email protected]>:
>> JM,
>>
>> You can go into the shell -> disable table -> alter table command and
>> change MAX_FILESIZE (I think that is what it is called); this will set
>> it on a per-table basis.
>>
>> On Mon, Nov 19, 2012 at 4:29 AM, Jean-Marc Spaggiari <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I have a 400M-line table that I merged yesterday into a single
>>> region. I had previously split it wrongly, so I would like HBase to
>>> split it its own way.
>>>
>>> The issue is that the keys in this table are very small, so the 400M
>>> lines are stored in a single <10G HFile.
>>>
>>> I can still use the split option in the HTML interface, but I was
>>> wondering if there is a way to tell HBase that the max filesize for
>>> this specific table is 1G, while it remains 10G for the other tables?
>>>
>>> My goal is to split this table into at least 8 pieces. Worst case,
>>> since I know the number of lines, I can "simply" scan the table, note
>>> the key every x/8 lines, and then do the splits by hand. But is there
>>> a more "automatic" way to do it?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>
>>
>> --
>> Kevin O'Dell
>> Customer Operations Engineer, Cloudera
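PS: About the manual x/8 fallback from my original email above: the shell can do that part directly too, since split accepts an explicit split point. Once the keys are noted, it would be something like this (the keys below are made-up examples; the real ones have to come from scanning the table):

  hbase> split 'work_proposed', 'key_at_1_8th'
  hbase> split 'work_proposed', 'key_at_2_8th'
  hbase> # ... and so on for the remaining split points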
