Some more results:
1. Splitting regions:
   1. In the Cloudera distribution (i.e. HBase 0.90.1) there is a bug: before
      compaction it checks for splitting but never checks
      HRegion.splitRequest... In trunk it's already fixed.
   2. Trunk also gained a way to split at a specific row
      (HBASE-3328 <https://issues.apache.org/jira/browse/HBASE-3328>
      and HBASE-3437 <https://issues.apache.org/jira/browse/HBASE-3437>).
   3. What is still missing is the update of the JSP page:
      HBASE-3462 <https://issues.apache.org/jira/browse/HBASE-3462>
2. I've filed an issue for splitting regions at a requested position rather
   than at the middle of the file:
   HBASE-3879 <https://issues.apache.org/jira/browse/HBASE-3879>
   For me it'll be a great update...
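
To make the difference behind HBASE-3879 concrete, here is a minimal Python sketch. It is not HBase code: `mid_split` and `requested_split` are hypothetical helpers, the `custA-1` style keys are made up for illustration, and a sorted list of keys only approximates how HBase derives its midpoint from a store file.

```python
from bisect import bisect_left

def mid_split(sorted_keys):
    """Split point taken from the middle of the sorted keys (simplified)."""
    return sorted_keys[len(sorted_keys) // 2]

def requested_split(sorted_keys, requested):
    """First stored key >= requested, or None if that would leave one side empty."""
    i = bisect_left(sorted_keys, requested)
    if i == 0 or i == len(sorted_keys):
        return None  # requested row falls outside the interior of the region
    return sorted_keys[i]

# Hypothetical region holding two customers with uneven amounts of data.
keys = ["custA-1", "custA-2", "custB-1", "custB-2", "custB-3", "custB-4"]
print(mid_split(keys))                 # lands inside custB's rows
print(requested_split(keys, "custB"))  # lands on the customer boundary
```

With a requested split row, the boundary can follow the data layout (here, one customer per region) instead of wherever the midpoint happens to fall.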
So to summarize all my findings:
Retention using region deletion can be a good solution, assuming:
1. Your data is sorted by the retention key.
2. You have HBase 0.92 or higher.
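
The first condition is what makes whole-region deletion safe, and a small Python sketch may help show why. Everything here is an assumption for illustration: row keys are taken to start with a sortable date such as `20110401`, regions are plain `(name, start_key, end_key)` tuples with an exclusive end key, and `droppable_regions` is a hypothetical helper, not an HBase API.

```python
def droppable_regions(regions, cutoff_key):
    """Return names of regions whose rows all sort before cutoff_key.

    A region covers [start_key, end_key). An empty end_key means the region
    is open-ended (the last region), so it is never dropped.
    """
    return [
        name
        for name, start_key, end_key in regions
        if end_key != "" and end_key <= cutoff_key
    ]

# Hypothetical table layout: keys sorted by date, three regions.
regions = [
    ("r1", "", "20110201"),
    ("r2", "20110201", "20110401"),
    ("r3", "20110401", ""),
]
print(droppable_regions(regions, "20110301"))
```

If the data were not sorted by the retention key, old and new rows would be mixed inside each region and no region could be dropped as a whole.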
So, now with that and with the security/coprocessors, I can ask: when do you
think 0.92 is going to be deployed?
BTW, do you have any simulator that runs an HBase master and region server, so
I can check this code?
Ophir
On Wed, May 11, 2011 at 10:32 PM, Ophir Cohen <[email protected]> wrote:
> Thanks for the comments,
>
> Going to work on it tomorrow - I'll keep you updated.
> Ophir
>
> On Wed, May 11, 2011 at 8:01 PM, Stack <[email protected]> wrote:
>
>> On Wed, May 11, 2011 at 6:14 AM, Ophir Cohen <[email protected]> wrote:
>> > My results from today's research:
>> >
>> > I tried to delete a region as Stack suggested:
>> >
>> > 1. *close_region*
>> > 2. Remove files from file system.
>> > 3. *assign* the region again.
>> >
>>
>> Try inserting something into that region and then getting it back out.
>> Flush it explicitly. See that a file is added to hdfs. Again get
>> the result back out. That'll tell you for sure if it works.
>>
>
> Tried that already - and got the results back. Going to try it tomorrow
> with a larger amount of data.
>
>>
>> > 1. Can I split a region by a specific key? It looks like it splits
>> > automatically.
>>
>> You can pass a key in the UI and in shell. If the key exists, I
>> believe it will split on the passed key (You should confirm). If the
>> key does not exist, it'll split on the closest.
>>
>
> The web page just states that the split will be on the region that contains
> this key. I'll try to trace it in the code as it seems not to work right now.
>
>>
>>
>> > 2. It seems that splitting from command line does not work... I get the
>> > message in the log but nothing really happened. Actually in the code it
>> > stated that it triggered compaction and that should be enough (????).
>>
>>
>> This sounds like a bug. The UI uses same code path so bug is probably
>> in it too. We might have to do some fixing herein. Want to try
>> tracing where it goes awry?
>>
>
> I'll trace it out and let you know. I'll file a bug if needed... And yes, it
> works neither from the page nor from the shell.
>
>>
>>
>> > 3. Is there a way to choose my method of region splitting? I think it can
>> > be a great option - a way to state when and how a region is split...
>> >
>>
>> No. It's the size of the biggest store file that determines when we
>> split. It's not currently pluggable. But it's a good idea (File an
>> issue?). I'm not sure if coprocessors have influence over when a
>> split runs.
>>
> OK. I'll see - it looks like a nice feature. For me it'll be exactly what
> I need - I'll split it by customers.
>
>
>> FYI, split check happens after compaction check. That might be why
>> you see the compaction message in the above though you invoked a
>> split.
>>
>
> Yep, that explains it. The comments in the code also state that compaction
> is enough to make the split happen (and then the split doesn't happen :()
>
>>
>> St.Ack
>>
>
>