One idea we took from the 0.89-FB branch is setting the internal scanner read
batching for compactions (compactionKVMax) to 1, since there isn't much
server-side benefit to a larger batch during compaction, and we sometimes run
with heaps up around 90% utilization for a while, as observed via JMX.  I
wonder if that would have had an impact here.  Just a random thought; pardon
me if the default is already 1 (IIRC it's 10) or something silly like that.
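
In case it's useful, this is roughly the hbase-site.xml entry involved. I'm
assuming hbase.hstore.compaction.kv.max is the property name that maps to
compactionKVMax in your version, so double-check against your Store.java:

    <!-- Max KeyValues the compaction scanner reads per batch; default is 10 -->
    <property>
      <name>hbase.hstore.compaction.kv.max</name>
      <value>1</value>
    </property>

Region servers would need a (rolling) restart to pick that up.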

Best regards,

    - Andy


On Apr 11, 2012, at 6:17 PM, Bryan Beaudreault <[email protected]> wrote:

> Hi Stack,
> 
> Thanks for the reply.  Unfortunately, our first instinct was to restart the
> region servers, and when they came back up the compaction appears to have
> succeeded (perhaps because on a fresh restart the heap was low enough).  I
> listed the files under that region and there is now only 1 file.
> 
> We will be running this job again in the near future and are going to try
> to rate-limit the writes a bit (though only 10 reducers were running at
> once to begin with).  I will keep your suggestions in mind if it happens
> again despite that.
> 
> - Bryan
> 
> On Wed, Apr 11, 2012 at 4:35 PM, Stack <[email protected]> wrote:
> 
>> On Wed, Apr 11, 2012 at 10:24 AM, Bryan Beaudreault
>> <[email protected]> wrote:
>>> We have 16 m1.xlarge ec2 machines as region servers, running cdh3u2,
>>> hosting about 17k regions.
>> 
>> That's too many, but that's another story.
>> 
>>> That pattern repeats on all of the region servers, every 5-8 minutes
>>> until all are down.  Should there be some safeguards on a compaction
>>> causing a region server to go OOM?  The region appears to only be
>>> around 425MB in size.
>>> 
>> 
>> My guess is that Region A has a massive or corrupt record in it.
>> 
>> You could disable the region for now while you are figuring out what's
>> wrong with it.
>> 
>> If you list files under this region, what do you see?  Are there many?
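>> 
>> For example, something along these lines (the path is only illustrative;
>> substitute your table and the region's encoded name):
>> 
>>   hadoop fs -lsr /hbase/TABLE/REGION_ENCODED_NAME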
>> 
>> Can you see which files are selected for compaction?  That will narrow
>> the set to look at.  You could poke at them with the hfile tool.  See
>> '8.7.5.2.2. HFile Tool' in the reference guide.
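>> 
>> Something like the following (the path is again illustrative):
>> 
>>   ${HBASE_HOME}/bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -v -m -f \
>>     hdfs://NAMENODE:8020/hbase/TABLE/REGION_ENCODED_NAME/FAMILY/HFILE
>> 
>> The -m option prints the file metadata (entry count, average key and
>> value lengths, etc.), which should help an oversized record stand out.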
>> 
>> St.Ack
>> 
