If the split takes too long (longer than 30 secs), I would say you may have too many store files in the region. A split has to write two tiny reference files per store file. The other thing may be that the region has to be closed before the split, so it has to do a flush. If it cannot complete the flush in time, it might cancel the split as well. Did you check that? Are your compactions working as intended?
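
To make JM's earlier suggestion in this thread concrete, here is a rough sketch of what a temporary override might look like in hbase-site.xml. The property name and the 30-second default come from this thread; the value 120000 and the assumption that the setting is read in milliseconds are mine, so please check them against your HBase version before relying on this.

```xml
<property>
  <!-- Assumption: the value is in milliseconds (the default would then be 30000). -->
  <!-- 120000 is only an illustrative value, not a recommendation. -->
  <name>hbase.regionserver.fileSplitTimeout</name>
  <value>120000</value>
</property>
```

As JM said, this should go back to the default once the big regions have split. A longer timeout only hides the symptom if the real cause is too many store files or a slow flush.
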
Enis

On Wed, Dec 18, 2013 at 10:06 AM, Timo Schaepe <[email protected]> wrote:

> @Ted Yu:
> Yep, nevertheless thanks a lot!
>
> On 18.12.2013 at 10:03, Ted Yu <[email protected]> wrote:
>
> > Timo:
> > I went through the namenode log and didn't find much of a clue.
> >
> > Cheers
> >
> > On Tue, Dec 17, 2013 at 9:37 PM, Timo Schaepe <[email protected]> wrote:
> >
> >> Hey Ted Yu,
> >>
> >> I have been digging through the namenode log and so far I've found
> >> nothing special. No Exception, FATAL or ERROR messages, nor any other
> >> peculiarities. I only see a lot of messages like this:
> >>
> >> 2013-12-12 13:53:22,541 INFO org.apache.hadoop.hdfs.StateChange: Removing lease on /hbase/Sessions_1091/d04cadb1b2252dafc476c138e9651ca7/.splits/9717de41277e207c24359a18dae72cd3/l/58ab2c11ca9b4b4994ce54bac0bb4c68.d04cadb1b2252dafc476c138e9651ca7 from client DFSClient_hb_rs_baur-hbase7.baur.boreus.de,60020,1386712527761_1295065721_26
> >> 2013-12-12 13:53:22,541 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /hbase/Sessions_1091/d04cadb1b2252dafc476c138e9651ca7/.splits/9717de41277e207c24359a18dae72cd3/l/58ab2c11ca9b4b4994ce54bac0bb4c68.d04cadb1b2252dafc476c138e9651ca7 is closed by DFSClient_hb_rs_baur-hbase7.baur.boreus.de,60020,1386712527761_1295065721_26
> >>
> >> But maybe that is normal. If you want to have a look, you can find the
> >> log snippet at
> >> https://www.dropbox.com/s/8sls714knn4yqp3/hadoop-hadoop-namenode-baur-hbase1.log.2013-12-12.snip
> >>
> >> Thanks,
> >>
> >> Timo
> >>
> >> On 14.12.2013 at 09:12, Ted Yu <[email protected]> wrote:
> >>
> >>> Timo:
> >>> Other than two occurrences of 'Took too long to split the files'
> >>> @ 13:54:20,194 and 13:55:10,533, I don't find much of a clue in the
> >>> posted log.
> >>>
> >>> If you have time, would you mind checking the namenode log for the
> >>> 1 minute interval leading up to 13:54:20,194 and 13:55:10,533,
> >>> respectively?
> >>>
> >>> Thanks
> >>>
> >>> On Sat, Dec 14, 2013 at 5:21 AM, Timo Schaepe <[email protected]> wrote:
> >>>
> >>>> Hey,
> >>>>
> >>>> @JM: Thanks for the hint with hbase.regionserver.fileSplitTimeout.
> >>>> At the moment (the import is actually working), and after I split
> >>>> the specific regions manually, we do not have growing regions
> >>>> anymore.
> >>>>
> >>>> hbase hbck says everything is fine:
> >>>> 0 inconsistencies detected.
> >>>> Status: OK
> >>>>
> >>>> @Ted Yu: Sure, have a look here: http://pastebin.com/2ANFVZEU
> >>>> The relevant table name is data_1091.
> >>>>
> >>>> Thanks for your time.
> >>>>
> >>>> Timo
> >>>>
> >>>> On 13.12.2013 at 20:18, Ted Yu <[email protected]> wrote:
> >>>>
> >>>>> Timo:
> >>>>> Can you pastebin the regionserver log around 2013-12-12 13:54:20 so
> >>>>> that we can see what happened?
> >>>>>
> >>>>> Thanks
> >>>>>
> >>>>> On Fri, Dec 13, 2013 at 11:02 AM, Jean-Marc Spaggiari <[email protected]> wrote:
> >>>>>
> >>>>>> Try to increase hbase.regionserver.fileSplitTimeout, but put it
> >>>>>> back to its default value afterwards.
> >>>>>>
> >>>>>> The default value is 30 seconds. I think it's not normal for a
> >>>>>> split to take more than that.
> >>>>>>
> >>>>>> What is your hardware configuration?
> >>>>>>
> >>>>>> Have you run hbck to see if everything is correct?
> >>>>>>
> >>>>>> JM
> >>>>>>
> >>>>>> 2013/12/13 Timo Schaepe <[email protected]>
> >>>>>>
> >>>>>>> Hello again,
> >>>>>>>
> >>>>>>> digging in the logs of the specific regionserver shows me this:
> >>>>>>>
> >>>>>>> 2013-12-12 13:54:20,194 INFO org.apache.hadoop.hbase.regionserver.SplitRequest: Running rollback/cleanup of failed split of data,OR\x83\xCF\x02\x82\xAE\xF3U,1386851456415.d04cadb1b2252dafc476c138e9651ca7.; Took too long to split the files and create the references, aborting split
> >>>>>>>
> >>>>>>> This message appears twice, so it seems that HBase tried to split
> >>>>>>> the region but failed. I don't know why. How does HBase behave if
> >>>>>>> a region split fails? Does it try to split the region again? I
> >>>>>>> didn't find any new attempts in the log. I have now split the big
> >>>>>>> regions manually, and this works. It also seems that HBase splits
> >>>>>>> the new regions again to crunch them down to the given limit.
> >>>>>>>
> >>>>>>> But it is also a mystery to me why the split size in Hannibal
> >>>>>>> shows 10 GB when I put 2 GB in hbase-site.xml…
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>>
> >>>>>>> Timo
> >>>>>>>
> >>>>>>> On 13.12.2013 at 10:22, Timo Schaepe <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> Hello,
> >>>>>>>>
> >>>>>>>> during the loading of data into our cluster I noticed some
> >>>>>>>> strange behavior of some regions that I don't understand.
> >>>>>>>>
> >>>>>>>> Scenario:
> >>>>>>>> We convert data from a mysql database to HBase. The data is
> >>>>>>>> inserted with a put into the specific HBase table. The row key
> >>>>>>>> is a timestamp. I know the problem with timestamp keys, but for
> >>>>>>>> our requirements it works quite well. The problem now is that
> >>>>>>>> there are some regions which keep growing and growing.
> >>>>>>>>
> >>>>>>>> Take, for example, the table in the picture [1]. At first, all
> >>>>>>>> data was distributed over regions and nodes. Now the data is
> >>>>>>>> written into only one region, which is growing, and I can see no
> >>>>>>>> splitting at all. Actually the size of the big region is nearly
> >>>>>>>> 60 GB.
> >>>>>>>>
> >>>>>>>> The HBase version is 0.94.11. I cannot understand why the
> >>>>>>>> splitting is not happening. In hbase-site.xml I limit
> >>>>>>>> hbase.hregion.max.filesize to 2 GB, and HBase accepted this
> >>>>>>>> value.
> >>>>>>>>
> >>>>>>>> <property>
> >>>>>>>> <!--Loaded from hbase-site.xml-->
> >>>>>>>> <name>hbase.hregion.max.filesize</name>
> >>>>>>>> <value>2147483648</value>
> >>>>>>>> </property>
> >>>>>>>>
> >>>>>>>> First mystery: Hannibal shows me that the split size is 10 GB
> >>>>>>>> (see screenshot).
> >>>>>>>> Second mystery: HBase is not splitting some regions, neither at
> >>>>>>>> 2 GB nor at 10 GB.
> >>>>>>>>
> >>>>>>>> Any ideas? Could the timestamp rowkey be causing this problem?
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>>
> >>>>>>>> Timo
> >>>>>>>>
> >>>>>>>> [1] https://www.dropbox.com/s/lm286xkcpglnj1t/big_region.png
