Ram,

As I mentioned below, the region is up, thanks to the "hbase.master.assignment.timeoutmonitor.timeout" setting. However, it is still located on a single region server. How do I split it?
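For reference, a manual split can be requested from the HBase shell. A minimal sketch, assuming the table is named `mytable` as in the status output quoted below (a shell-initiated split only splits each region once, so for a single huge region the operation may need to be repeated on the daughter regions, and a major compaction first would rewrite the ~14,900 small storefiles into fewer, larger ones):

```
hbase(main):001:0> major_compact 'mytable'   # optional: merge the many small storefiles first
hbase(main):002:0> split 'mytable'           # request a split of the table's region(s)
hbase(main):003:0> status 'detailed'         # verify that daughter regions appear and get balanced
```

Both `split` and `major_compact` are asynchronous requests; the region server performs the work in the background.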
>> Hi,
>> What I would suggest is trying a forceful assign of the region that is
>> showing this log, using the shell (if the log entries still continue to appear).
>> Regards,
>> Ram

On Mon, Oct 31, 2011 at 21:26, Matthew Tovbin <[email protected]> wrote:

> Mike, thanks for responding.
>
> BTW, I have a small update. I succeeded in opening the table by setting
> "hbase.master.assignment.timeoutmonitor.timeout" to 1 hour.
> Now the table is hosted on a single region server, which is bad (see status
> below). Should I compact the table and then split it?
>
> >>>> What did you set your max region size to be for this table?
> I did not set it explicitly, so the default settings of 0.90.3-cdh3u1 are
> used. What setting should I use?
>
> >>>> 14K files totalling 650GB means you have a lot of small files...
> >>>> On average ~45MB (rough calc).
> Correct, I'd like to minimize this number, but I am not sure how.
> Maybe the splits generated by my bulkloader MR job are just wrong, because
> now I have only one region with a bunch of small files.
>
> >> How many regions?
> Here is the status:
>
> hbase(main):012:0> status 'detailed'
> version 0.90.3-cdh3u1
> 0 regionsInTransition
> 3 live servers
>     slave113:60020 1320067636128
>         requests=0, regions=1, usedHeap=7296, maxHeap=16346
>         mytable,,1319730467540.69e5825d3fea11030d9f370a9219328e.
>             stores=2, storefiles=14917, storefileSizeMB=677337,
>             memstoreSizeMB=0, storefileIndexSizeMB=5774
>     slave115:60020 1320067640784
>         requests=0, regions=2, usedHeap=37, maxHeap=16346
>         .META.,,1
>             stores=1, storefiles=2, storefileSizeMB=0, memstoreSizeMB=0,
>             storefileIndexSizeMB=0
>         -ROOT-,,0
>             stores=1, storefiles=1, storefileSizeMB=0, memstoreSizeMB=0,
>             storefileIndexSizeMB=0
>     slave114:60020 1320067640288
>         requests=0, regions=1, usedHeap=30, maxHeap=16346
> 0 dead servers
>
> >>>> Do you have mslabs set up?
> Nope. Should I?
>
> >>>> GC tuning?
> Nope. Should I?
> I use: "-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>
> Best regards,
> Matthew Tovbin =)
>
> On Mon, Oct 31, 2011 at 15:48, Michel Segel <[email protected]> wrote:
>
>> What did you set your max region size to?
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Oct 31, 2011, at 5:07 AM, Matthew Tovbin <[email protected]> wrote:
>>
>> > Ted, thanks for such a rapid response.
>> >
>> > You're right, we use HBase 0.90.3 from cdh3u1.
>> >
>> > So, I suppose I need to do the bulk loading in smaller batches then.
>> > Any other suggestions?
>> >
>> > Best regards,
>> > Matthew Tovbin =)
>> >
>> >> I assume you're using HBase 0.90.x, where HBASE-4015 isn't available.
>> >>
>> >>>> 5. And so on, until some of the slaves fail with
>> >>>> "java.net.SocketException: Too many open files".
>> >> Do you have some monitoring set up so that you can track the number of
>> >> open file handles?
>> >>
>> >> Cheers
>> >>
>> >> On Sun, Oct 30, 2011 at 7:21 AM, Matthew Tovbin <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> I've bulkloaded a solid amount of data (650GB, ~14,000 files) into
>> >>> HBase (1 master + 3 region servers), and now enabling the table
>> >>> results in the following behavior on the cluster:
>> >>>
>> >>> 1. The master says that opening has started -
>> >>> "org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> >>> transition=RS_ZK_REGION_OPENING, server=slave..."
>> >>> 2. Slaves report that opening files is in progress -
>> >>> "org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://...."
>> >>> 3. Then, after ~10 mins, the following error occurs on the hmaster -
>> >>> "org.apache.hadoop.hbase.master.AssignmentManager: Regions in
>> >>> transition timed out / Region has been OPENING for too long,
>> >>> reassigning region=..."
>> >>> 4. More slaves report that opening files is in progress -
>> >>> "org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://...."
>> >>> 5. And so on, until some of the slaves fail with
>> >>> "java.net.SocketException: Too many open files".
>> >>>
>> >>> What I've done already to try to solve the issue (which did NOT help,
>> >>> though):
>> >>>
>> >>> 1. Set 'ulimit -n 65536' for the hbase user
>> >>> 2. Set hbase.hbasemaster.maxregionopen=3600000 (1 hour) in
>> >>> hbase-site.xml
>> >>>
>> >>> What else can I try?!
>> >>>
>> >>> Best regards,
>> >>> Matthew Tovbin =)
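For readers following this thread: the configuration knobs discussed above all live in hbase-site.xml. A sketch for 0.90.x, with illustrative values only (not tuning recommendations; the 1-hour timeout is the value Matthew reports using):

```xml
<!-- hbase-site.xml fragment; values are illustrative, not recommendations -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- max region size: 1 GB here; 0.90.x default is 256 MB -->
</property>
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>3600000</value> <!-- 1 hour, the setting that let the region finish opening -->
</property>
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value> <!-- MSLAB ("mslabs" above); off by default in 0.90.x -->
</property>
```

A restart of the affected daemons is needed for these to take effect; the assignment timeout is read by the master, the other two by the region servers.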
