One idea we took from the 0.89-FB branch is setting the internal scanner read
batching for compaction (compactionKVMax) to 1. There is no server-side benefit
to batching during compaction, and we sometimes run with heaps up at 90%
utilization for a time, as observed with JMX. I wonder if that would have had
an impact here. Just a random thought; pardon if the default is already 1
(IIRC it's 10) or something silly like that.
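
For reference, a minimal sketch of what that change might look like in
hbase-site.xml. The property name below is an assumption on my part (I believe
compactionKVMax is read from hbase.hstore.compaction.kv.max, but I have not
checked cdh3u2), so please verify it against the Store code or
hbase-default.xml in your build before relying on it:

```xml
<!-- hbase-site.xml fragment for the region servers -->
<!-- ASSUMPTION: compactionKVMax is configured via this property name; -->
<!-- confirm against your HBase version before deploying. -->
<property>
  <name>hbase.hstore.compaction.kv.max</name>
  <!-- Read one KeyValue per internal scanner call during compaction -->
  <value>1</value>
</property>
```

The idea is only to limit how many KeyValues the compaction scanner holds in
memory at once, which matters most when individual records are very large.
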

Best regards,

- Andy

On Apr 11, 2012, at 6:17 PM, Bryan Beaudreault <[email protected]> wrote:
> Hi Stack,
>
> Thanks for the reply. Unfortunately, our first instinct was to restart the
> region servers and when they came up it appears the compaction was able to
> succeed (perhaps because on a fresh restart the heap was low enough to
> succeed). I listed the files under that region and there is now only 1
> file.
>
> We are going to be running this job again in the near future. We are going
> to try to rate limit the writes a bit (though only 10 reducers were running
> at once to begin with), and I will keep in mind your suggestions if it
> happens despite that.
>
> - Bryan
>
> On Wed, Apr 11, 2012 at 4:35 PM, Stack <[email protected]> wrote:
>
>> On Wed, Apr 11, 2012 at 10:24 AM, Bryan Beaudreault
>> <[email protected]> wrote:
>>> We have 16 m1.xlarge ec2 machines as region servers, running cdh3u2,
>>> hosting about 17k regions.
>>
>> That's too many, but that's another story.
>>
>>> That pattern repeats on all of the region servers, every 5-8 minutes
>>> until all are down. Should there be some safeguards on a compaction
>>> causing a region server to go OOM? The region appears to only be around
>>> 425 MB in size.
>>>
>>
>> My guess is that Region A has a massive or corrupt record in it.
>>
>> You could disable the region for now while you are figuring out what's
>> wrong w/ it.
>>
>> If you list files under this region, what do you see? Are there many?
>>
>> Can you see what files are selected for compaction? This will narrow
>> the set to look at. You could poke at them w/ the hfile tool. See
>> '8.7.5.2.2. HFile Tool' in the reference guide.
>>
>> St.Ack
>>