Timo:
I went through the namenode log and didn't find much of a clue.

Cheers
On Tue, Dec 17, 2013 at 9:37 PM, Timo Schaepe <[email protected]> wrote:

> Hey Ted Yu,
>
> I have been digging through the namenode log and so far I've found nothing
> special. No exceptions, FATAL or ERROR messages, nor any other
> peculiarities. I only see a lot of messages like this:
>
> 2013-12-12 13:53:22,541 INFO org.apache.hadoop.hdfs.StateChange: Removing
> lease on
> /hbase/Sessions_1091/d04cadb1b2252dafc476c138e9651ca7/.splits/9717de41277e207c24359a18dae72cd3/l/58ab2c11ca9b4b4994ce54bac0bb4c68.d04cadb1b2252dafc476c138e9651ca7
> from client DFSClient_hb_rs_baur-hbase7.baur.boreus.de,60020,1386712527761_1295065721_26
> 2013-12-12 13:53:22,541 INFO org.apache.hadoop.hdfs.StateChange: DIR*
> completeFile:
> /hbase/Sessions_1091/d04cadb1b2252dafc476c138e9651ca7/.splits/9717de41277e207c24359a18dae72cd3/l/58ab2c11ca9b4b4994ce54bac0bb4c68.d04cadb1b2252dafc476c138e9651ca7
> is closed by DFSClient_hb_rs_baur-hbase7.baur.boreus.de,60020,1386712527761_1295065721_26
>
> But maybe that is normal. If you wanna have a look, you can find the log
> snippet at
> https://www.dropbox.com/s/8sls714knn4yqp3/hadoop-hadoop-namenode-baur-hbase1.log.2013-12-12.snip
>
> Thanks,
>
> Timo
>
> On 14.12.2013 at 09:12, Ted Yu <[email protected]> wrote:
>
> > Timo:
> > Other than two occurrences of 'Took too long to split the files'
> > @ 13:54:20,194 and 13:55:10,533, I don't find much of a clue in the
> > posted log.
> >
> > If you have time, mind checking the namenode log for the 1-minute
> > interval leading up to 13:54:20,194 and 13:55:10,533, respectively?
> >
> > Thanks
> >
> > On Sat, Dec 14, 2013 at 5:21 AM, Timo Schaepe <[email protected]> wrote:
> >
> >> Hey,
> >>
> >> @JM: Thanks for the hint with hbase.regionserver.fileSplitTimeout. At
> >> the moment (the import is actually running), and after I split the
> >> specific regions manually, we do not have growing regions anymore.
> >>
> >> hbase hbck says everything is fine:
> >> 0 inconsistencies detected.
> >> Status: OK
> >>
> >> @Ted Yu: Sure, have a look here: http://pastebin.com/2ANFVZEU
> >> The relevant table name is data_1091.
> >>
> >> Thanks for your time.
> >>
> >> Timo
> >>
> >> On 13.12.2013 at 20:18, Ted Yu <[email protected]> wrote:
> >>
> >>> Timo:
> >>> Can you pastebin the regionserver log around 2013-12-12 13:54:20 so
> >>> that we can see what happened?
> >>>
> >>> Thanks
> >>>
> >>> On Fri, Dec 13, 2013 at 11:02 AM, Jean-Marc Spaggiari <
> >>> [email protected]> wrote:
> >>>
> >>>> Try to increase hbase.regionserver.fileSplitTimeout, but put it back
> >>>> to its default value afterwards.
> >>>>
> >>>> The default value is 30 seconds. I don't think it's normal for a
> >>>> split to take longer than that.
> >>>>
> >>>> What is your hardware configuration?
> >>>>
> >>>> Have you run hbck to see if everything is correct?
> >>>>
> >>>> JM
> >>>>
> >>>> 2013/12/13 Timo Schaepe <[email protected]>
> >>>>
> >>>>> Hello again,
> >>>>>
> >>>>> Digging in the logs of the specific regionserver shows me this:
> >>>>>
> >>>>> 2013-12-12 13:54:20,194 INFO
> >>>>> org.apache.hadoop.hbase.regionserver.SplitRequest: Running
> >>>>> rollback/cleanup of failed split of
> >>>>> data,OR\x83\xCF\x02\x82\xAE\xF3U,1386851456415.d04cadb1b2252dafc476c138e9651ca7.;
> >>>>> Took too long to split the files and create the references,
> >>>>> aborting split
> >>>>>
> >>>>> This message appears two times, so it seems that HBase tried to
> >>>>> split the region but failed. I don't know why. How does HBase
> >>>>> behave if a region split fails? Does it retry splitting the region?
> >>>>> I didn't find any new attempts in the log. Now I have split the big
> >>>>> regions manually and this works. It also seems that HBase splits
> >>>>> the new regions again to crunch them down to the given limit.
> >>>>>
> >>>>> But it is also a mystery to me why Hannibal shows me a split size
> >>>>> of 10 GB while I put 2 GB in hbase-site.xml…
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Timo
> >>>>>
> >>>>> On 13.12.2013 at 10:22, Timo Schaepe <[email protected]> wrote:
> >>>>>
> >>>>>> Hello,
> >>>>>>
> >>>>>> During the loading of data into our cluster I noticed some strange
> >>>>>> behavior of some regions that I don't understand.
> >>>>>>
> >>>>>> Scenario:
> >>>>>> We convert data from a MySQL database to HBase. The data is
> >>>>>> inserted with a put to the specific HBase table. The row key is a
> >>>>>> timestamp. I know the problem with timestamp keys, but for our
> >>>>>> requirements it works quite well. The problem now is that some
> >>>>>> regions keep growing and growing.
> >>>>>>
> >>>>>> For example, the table in the picture [1]. First, all data was
> >>>>>> distributed over regions and nodes. And now the data is written
> >>>>>> into only one region, which keeps growing, and I can see no
> >>>>>> splitting at all. Currently the size of the big region is nearly
> >>>>>> 60 GB.
> >>>>>>
> >>>>>> The HBase version is 0.94.11. I cannot understand why the
> >>>>>> splitting is not happening. In hbase-site.xml I limited
> >>>>>> hbase.hregion.max.filesize to 2 GB and HBase accepted this value:
> >>>>>>
> >>>>>> <property>
> >>>>>>   <!--Loaded from hbase-site.xml-->
> >>>>>>   <name>hbase.hregion.max.filesize</name>
> >>>>>>   <value>2147483648</value>
> >>>>>> </property>
> >>>>>>
> >>>>>> First mystery: Hannibal shows me the split size is 10 GB (see
> >>>>>> screenshot).
> >>>>>> Second mystery: HBase is not splitting some regions, neither at
> >>>>>> 2 GB nor at 10 GB.
> >>>>>>
> >>>>>> Any ideas? Could the timestamp row key be causing this problem?
> >>>>>>
> >>>>>> Thanks,
> >>>>>>
> >>>>>> Timo
> >>>>>>
> >>>>>> [1] https://www.dropbox.com/s/lm286xkcpglnj1t/big_region.png
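
For reference, JM's hbase.regionserver.fileSplitTimeout suggestion maps to an
hbase-site.xml fragment like the one below, mirroring the property block Timo
posted. The 120000 ms value is illustrative, not from the thread; the idea is
to give the split daughters time to be created, then revert to the default.

    <property>
      <!-- Illustrative value: 120000 ms = 2 minutes. The default is the
           30 seconds JM mentions; revert once the oversized regions
           have finished splitting. -->
      <name>hbase.regionserver.fileSplitTimeout</name>
      <value>120000</value>
    </property>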
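
The manual-split workaround Timo describes can be driven from the HBase shell.
The split point below is a placeholder, not a key from his table:

    split 'data_1091'                     # request a split of every splittable region of the table
    split 'data_1091', 'PLACEHOLDER_KEY'  # or split one region at an explicit row key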
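
On the timestamp row key question: monotonically increasing keys direct every
new write to the region holding the highest keys, which is consistent with the
single growing region Timo observed. A minimal Java sketch of the usual
mitigation, salting the key with a small bucket prefix; the class name, bucket
count, and column names are illustrative assumptions, not from the thread:

    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedTimestampKey {
        // Assumption: the table is pre-split into 16 salt buckets.
        static final int BUCKETS = 16;

        // Prefix the timestamp with a deterministic one-byte salt so
        // consecutive timestamps land in different regions instead of
        // all hitting the region with the highest keys.
        static byte[] saltedKey(long timestampMillis) {
            byte salt = (byte) (timestampMillis % BUCKETS);
            return Bytes.add(new byte[] { salt }, Bytes.toBytes(timestampMillis));
        }

        static Put buildPut(long timestampMillis) {
            Put put = new Put(saltedKey(timestampMillis));
            // 0.94-era API; placeholder family/qualifier/value.
            put.add(Bytes.toBytes("l"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            return put;
        }
    }

The trade-off is on the read side: a time-range scan must then fan out across
all BUCKETS prefixes, which is the price paid for spreading the write load.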
