http://pastebin.com/S7ETUpSb
and too many hlog files: http://pastebin.com/j3GMynww

Why do I have so many hlogs?

-Jack

On Mon, Sep 27, 2010 at 1:33 PM, Jean-Daniel Cryans <[email protected]> wrote:
> You could set the blocking store files setting higher (we have it at
> 17 here), but looking at the log I see it was blocking for 90 secs only
> to flush a 1MB file. Why was that flush requested? Global memstore
> size reached? The log from a few lines before should tell.
>
> J-D
>
> On Mon, Sep 27, 2010 at 1:18 PM, Jack Levin <[email protected]> wrote:
>> I see it: http://pastebin.com/tgQHBSLj
>>
>> Interesting situation indeed. Any thoughts on how to avoid it? Have
>> compaction running more aggressively?
>>
>> -Jack
>>
>> On Mon, Sep 27, 2010 at 1:00 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>> Can you grep around the region server log files to see what was going
>>> on with that region during the previous run? There's only one way I see
>>> this happening, and it would require that your region server was
>>> serving thousands of regions, that this region was queued for
>>> compaction behind all those thousands of regions, and that in the
>>> meantime the 90-second flush blocker timed out enough times that
>>> you ended up with all those store files (which, according to
>>> my quick calculation, would mean it took about 23 hours before
>>> the region server was able to compact that region, which is something
>>> I've never seen, and it would have killed your region server with an
>>> OOME). Do you see this message often?
>>>
>>> LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime) +
>>>   "ms on a compaction to clean up 'too many store files'; waited " +
>>>   "long enough... proceeding with flush of " +
>>>   region.getRegionNameAsString());
>>>
>>> Thx,
>>>
>>> J-D
>>>
>>> On Mon, Sep 27, 2010 at 12:54 PM, Jack Levin <[email protected]> wrote:
>>>> Strange: this is what I have:
>>>>
>>>> <property>
>>>>   <name>hbase.hstore.blockingStoreFiles</name>
>>>>   <value>7</value>
>>>>   <description>
>>>>     If more than this number of StoreFiles exist in any one Store
>>>>     (one StoreFile is written per flush of MemStore), then updates are
>>>>     blocked for this HRegion until a compaction is completed, or
>>>>     until hbase.hstore.blockingWaitTime has been exceeded.
>>>>   </description>
>>>> </property>
>>>>
>>>> I wonder how it got there; I've deleted the files.
>>>>
>>>> -jack
>>>>
>>>> On Mon, Sep 27, 2010 at 12:42 PM, Jean-Daniel Cryans <[email protected]> wrote:
>>>>> I'd say it's the:
>>>>>
>>>>> 2010-09-27 12:16:15,291 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store: Started compaction of 943
>>>>> file(s) in att of
>>>>> img833,dsc03711s.jpg,1285493435306.da57612ee69d7baaefe84eeb0e49f240.
>>>>> into
>>>>> hdfs://namenode-rd.imageshack.us:9000/hbase/img833/da57612ee69d7baaefe84eeb0e49f240/.tmp,
>>>>> sequenceid=618626242
>>>>>
>>>>> That killed you. I wonder how it was able to get there, since the
>>>>> Memstore blocks flushing if the upper threshold for compactions is
>>>>> reached (the default is 7; did you set it to 1000 by any chance?).
>>>>>
>>>>> J-D
>>>>>
>>>>> On Mon, Sep 27, 2010 at 12:29 PM, Jack Levin <[email protected]> wrote:
>>>>>> Strange situation: cold-start the cluster, and one of the servers just
>>>>>> started consuming more and more RAM, as you can see from the
>>>>>> screenshot I am attaching.
>>>>>>
>>>>>> Here is the log: http://pastebin.com/MDPJzLQJ
>>>>>>
>>>>>> There seems to be nothing happening, and then it just runs out of
>>>>>> memory and, of course, shuts down.
>>>>>>
>>>>>> Here is the GC log before the crash: http://pastebin.com/GwdC3nhx
>>>>>>
>>>>>> Strange that other region servers stay up and consume little
>>>>>> memory (or rather stay stable).
>>>>>>
>>>>>> Any ideas?
>>>>>>
>>>>>> -Jack
>>>>>>
>>>>>
>>>>
>>>
>>
>
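
For reference, the change J-D suggests (raising the blocking store files limit, e.g. to the 17 they run with) is an hbase-site.xml override of the same property Jack pasted above. A minimal sketch; the value 17 is only the number mentioned in this thread, not a general recommendation, so tune it to your own write and compaction load:

<property>
  <name>hbase.hstore.blockingStoreFiles</name>
  <value>17</value>
  <description>
    Number of StoreFiles in a single Store above which updates are
    blocked until a compaction finishes or hbase.hstore.blockingWaitTime
    (90 seconds by default) expires. Raising it from the default of 7
    keeps flushes from being held behind a long compaction queue, at
    the cost of accumulating more files to compact later.
  </description>
</property>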
