Ram,

As I mentioned below, the region is up, thanks to the "hbase.master.assignment.timeoutmonitor.timeout" setting. However, it is still located on a single region server. How do I split it?
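For reference, a manual split can be requested from the HBase shell. A minimal sketch, assuming the table is named `mytable` as in the status output quoted below (a shell-initiated split only splits each region once, so for a single huge region the operation may need to be repeated on the daughter regions, and a major compaction first would rewrite the ~14,900 small storefiles into fewer, larger ones):

```
hbase(main):001:0> major_compact 'mytable'   # optional: merge the many small storefiles first
hbase(main):002:0> split 'mytable'           # request a split of the table's region(s)
hbase(main):003:0> status 'detailed'         # verify that daughter regions appear and get balanced
```

Both `split` and `major_compact` are asynchronous requests; the region server performs the work in the background.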
>> Hi,
>> What I would suggest is trying a forceful assign of the region that is
>> showing this log, using the shell (if the log entries still continue to appear).
>> Regards,
>> Ram

On Mon, Oct 31, 2011 at 21:26, Matthew Tovbin <[email protected]> wrote:

> Mike, thanks for responding.
>
> BTW, I have a small update. I succeeded in opening the table by setting
> "hbase.master.assignment.timeoutmonitor.timeout" to 1 hour.
> Now the table is hosted on a single region server, which is bad (see status
> below). Should I compact the table and then split it?
>
> >>>> What did you set your max region size to be for this table?
> I did not set it explicitly, so the default settings of 0.90.3-cdh3u1 are
> used. What setting should I use?
>
> >>>> 14K files totalling 650GB means you have a lot of small files...
> >>>> On average ~45MB (rough calc).
> Correct, I'd like to minimize this number, but I am not sure how.
> Maybe the splits generated by my bulkloader MR job are just wrong, because
> now I have only one region with a bunch of small files.
>
> >> How many regions?
> Here is the status:
>
> hbase(main):012:0> status 'detailed'
> version 0.90.3-cdh3u1
> 0 regionsInTransition
> 3 live servers
>     slave113:60020 1320067636128
>         requests=0, regions=1, usedHeap=7296, maxHeap=16346
>         mytable,,1319730467540.69e5825d3fea11030d9f370a9219328e.
>             stores=2, storefiles=14917, storefileSizeMB=677337,
>             memstoreSizeMB=0, storefileIndexSizeMB=5774
>     slave115:60020 1320067640784
>         requests=0, regions=2, usedHeap=37, maxHeap=16346
>         .META.,,1
>             stores=1, storefiles=2, storefileSizeMB=0, memstoreSizeMB=0,
>             storefileIndexSizeMB=0
>         -ROOT-,,0
>             stores=1, storefiles=1, storefileSizeMB=0, memstoreSizeMB=0,
>             storefileIndexSizeMB=0
>     slave114:60020 1320067640288
>         requests=0, regions=1, usedHeap=30, maxHeap=16346
> 0 dead servers
>
> >>>> Do you have mslabs set up?
> Nope. Should I?
>
> >>>> GC tuning?
> Nope. Should I?
> I use: "-ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>
> Best regards,
> Matthew Tovbin =)
>
> On Mon, Oct 31, 2011 at 15:48, Michel Segel <[email protected]> wrote:
>
>> What did you set your max region size to?
>>
>> Sent from a remote device. Please excuse any typos...
>>
>> Mike Segel
>>
>> On Oct 31, 2011, at 5:07 AM, Matthew Tovbin <[email protected]> wrote:
>>
>> > Ted, thanks for such a rapid response.
>> >
>> > You're right, we use HBase 0.90.3 from cdh3u1.
>> >
>> > So, I suppose I need to do the bulk loading in smaller batches then.
>> > Any other suggestions?
>> >
>> > Best regards,
>> > Matthew Tovbin =)
>> >
>> >> I assume you're using HBase 0.90.x, where HBASE-4015 isn't available.
>> >>
>> >>>> 5. And so on, until some of the slaves fail with
>> >>>> "java.net.SocketException: Too many open files".
>> >> Do you have some monitoring set up so that you can track the number of
>> >> open file handles?
>> >>
>> >> Cheers
>> >>
>> >> On Sun, Oct 30, 2011 at 7:21 AM, Matthew Tovbin <[EMAIL PROTECTED]> wrote:
>> >>
>> >>> Hi guys,
>> >>>
>> >>> I've bulkloaded a solid amount of data (650GB, ~14,000 files) into
>> >>> HBase (1 master + 3 region servers), and now enabling the table
>> >>> results in the following behavior on the cluster:
>> >>>
>> >>> 1. The master says that opening has started -
>> >>> "org.apache.hadoop.hbase.master.AssignmentManager: Handling
>> >>> transition=RS_ZK_REGION_OPENING, server=slave..."
>> >>> 2. Slaves report that opening files is in progress -
>> >>> "org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://...."
>> >>> 3. Then, after ~10 mins, the following error occurs on the hmaster -
>> >>> "org.apache.hadoop.hbase.master.AssignmentManager: Regions in
>> >>> transition timed out / Region has been OPENING for too long,
>> >>> reassigning region=..."
>> >>> 4. More slaves report that opening files is in progress -
>> >>> "org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://...."
>> >>> 5. And so on, until some of the slaves fail with
>> >>> "java.net.SocketException: Too many open files".
>> >>>
>> >>> What I've done already to try to solve the issue (which did NOT help,
>> >>> though):
>> >>>
>> >>> 1. Set 'ulimit -n 65536' for the hbase user
>> >>> 2. Set hbase.hbasemaster.maxregionopen=3600000 (1 hour) in
>> >>> hbase-site.xml
>> >>>
>> >>> What else can I try?!
>> >>>
>> >>> Best regards,
>> >>> Matthew Tovbin =)
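For readers following this thread: the configuration knobs discussed above all live in hbase-site.xml. A sketch for 0.90.x, with illustrative values only (not tuning recommendations; the 1-hour timeout is the value Matthew reports using):

```xml
<!-- hbase-site.xml fragment; values are illustrative, not recommendations -->
<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- max region size: 1 GB here; 0.90.x default is 256 MB -->
</property>
<property>
  <name>hbase.master.assignment.timeoutmonitor.timeout</name>
  <value>3600000</value> <!-- 1 hour, the setting that let the region finish opening -->
</property>
<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value> <!-- MSLAB ("mslabs" above); off by default in 0.90.x -->
</property>
```

A restart of the affected daemons is needed for these to take effect; the assignment timeout is read by the master, the other two by the region servers.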
