http://pastebin.com/S7ETUpSb
and too many hlog files: http://pastebin.com/j3GMynww

Why do I have so many hlogs?

-Jack

On Mon, Sep 27, 2010 at 1:33 PM, Jean-Daniel Cryans <[email protected]> wrote:
> You could set the blocking store files setting higher (we have it at
> 17 here), but looking at the log I see it was blocking for 90 secs only
> to flush a 1MB file. Why was that flush requested? Global memstore
> size reached? The log from a few lines before should tell.
>
> J-D
>
> On Mon, Sep 27, 2010 at 1:18 PM, Jack Levin <[email protected]> wrote:
>> I see it: http://pastebin.com/tgQHBSLj
>>
>> Interesting situation indeed. Any thoughts on how to avoid it? Have
>> compaction running more aggressively?
>>
>> -Jack
>>
>> On Mon, Sep 27, 2010 at 1:00 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> Can you grep around the region server log files to see what was going
>>> on with that region during the previous run? There's only one way I see
>>> this happening, and it would require that your region server was
>>> serving thousands of regions, that this region was queued for
>>> compaction behind all those thousands of regions, and that in the
>>> meantime the 90-second flush blocker timed out enough times that
>>> you ended up with all those store files (which, according to
>>> my quick calculation, would mean it took about 23 hours before
>>> the region server was able to compact that region, which is something
>>> I've never seen, and it would have killed your region server with an
>>> OOME). Do you see this message often?
>>>
>>> LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
>>>   "ms on a compaction to clean up 'too many store files'; waited " +
>>>   "long enough... proceeding with flush of " +
>>>   region.getRegionNameAsString());
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Mon, Sep 27, 2010 at 12:54 PM, Jack Levin <[email protected]> wrote:
>>>> Strange: this is what I have:
>>>>
>>>> <property>
>>>>   <name>hbase.hstore.blockingStoreFiles</name>
>>>>   <value>7</value>
>>>>   <description>
>>>>     If more than this number of StoreFiles exist in any one Store
>>>>     (one StoreFile is written per flush of MemStore), then updates are
>>>>     blocked for this HRegion until a compaction is completed, or
>>>>     until hbase.hstore.blockingWaitTime has been exceeded.
>>>>   </description>
>>>> </property>
>>>>
>>>> I wonder how it got there; I've deleted the files.
>>>>
>>>> -jack
>>>>
>>>> On Mon, Sep 27, 2010 at 12:42 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>> I'd say it's the:
>>>>>
>>>>> 2010-09-27 12:16:15,291 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store: Started compaction of 943
>>>>> file(s) in att of
>>>>> img833,dsc03711s.jpg,1285493435306.da57612ee69d7baaefe84eeb0e49f240.
>>>>> into
>>>>> hdfs://namenode-rd.imageshack.us:9000/hbase/img833/da57612ee69d7baaefe84eeb0e49f240/.tmp,
>>>>> sequenceid=618626242
>>>>>
>>>>> That killed you. I wonder how it was able to get there, since the
>>>>> Memstore blocks flushing if the upper threshold for compactions is
>>>>> reached (the default is 7; did you set it to 1000 by any chance?).
>>>>>
>>>>> J-D
>>>>>
>>>>> On Mon, Sep 27, 2010 at 12:29 PM, Jack Levin <[email protected]> wrote:
>>>>>> Strange situation: cold-start the cluster, and one of the servers just
>>>>>> started consuming more and more RAM, as you can see from the
>>>>>> screenshot I am attaching.
>>>>>>
>>>>>> Here is the log: http://pastebin.com/MDPJzLQJ
>>>>>>
>>>>>> There seems to be nothing happening, and then it just runs out of
>>>>>> memory and, of course, shuts down.
>>>>>>
>>>>>> Here is the GC log before the crash: http://pastebin.com/GwdC3nhx
>>>>>>
>>>>>> Strange that other region servers stay up and consume little
>>>>>> memory (or rather stay stable).
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>
>>>>
>>>
>>
>
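
For reference, the change J-D suggests (raising the blocking store files limit, e.g. to the 17 they run with) is an hbase-site.xml override of the same property Jack pasted above. A minimal sketch; the value 17 is only the number mentioned in this thread, not a general recommendation, so tune it to your own write and compaction load:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>17</value>
  <description>
    Number of StoreFiles in a single Store above which updates are
    blocked until a compaction finishes or hbase.hstore.blockingWaitTime
    (90 seconds by default) expires. Raising it from the default of 7
    keeps flushes from being held behind a long compaction queue, at
    the cost of accumulating more files to compact later.
  </description>
</property>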
