On Wed, Feb 16, 2011 at 3:09 PM, Jason Rutherglen <[email protected]> wrote:
> This comment
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=12991734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12991734
> is interesting as in Lucene the IO cache is relied on, one would
> assume that HBase'd be the same?
>
> On Wed, Feb 16, 2011 at 11:48 AM, Ryan Rawson <[email protected]> wrote:
>> That would be cool, I think we should probably also push for HDFS-347
>> while we are at it as well. The situation for HDFS improvements has
>> not been good, but might improve in the mid-future.
>>
>> Thanks for the pointer!
>> -ryan
>>
>> On Wed, Feb 16, 2011 at 11:40 AM, Jason Rutherglen
>> <[email protected]> wrote:
>>>> One of my coworkers is reminding me that major compactions do have the
>>>> well-known side effect of slowing down a busy system.
>>>
>>> I think where this is going is the system IO cache problem could be
>>> solved with something like DirectIOLinuxDirectory:
>>> https://issues.apache.org/jira/browse/LUCENE-2500 Of course the issue
>>> would be integrating DIOLD or its underlying native implementation
>>> into HDFS somehow?
>>>
>>
>
This seems to be a common issue across the "write once and compact" model: it tends to vaporize the page cache. Cassandra is working on similar trickery at the file system level. Another interesting idea is re-warming the cache after a compaction: https://issues.apache.org/jira/browse/CASSANDRA-1878.

I would assume that users of HBase rely more on the HBase block cache than on the VFS cache. Out of curiosity, for people who run with 24 GB of memory, is the split something like 4 GB Xmx for the DataNode, 16 GB Xmx for the RegionServer (block cache), and the remaining 4 GB left for the VFS cache?

I always suggest firing off the major compaction at a low-traffic time (if you have such a time) so it has the least impact.

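To make the 24 GB split above concrete, here is roughly how that layout would be expressed in the daemon env files; the numbers are the hypothetical example from this thread, not a recommendation (both HADOOP_HEAPSIZE and HBASE_HEAPSIZE are in MB):

```shell
# hadoop-env.sh -- on the hypothetical 24 GB box above
# DataNode JVM heap: 4 GB
export HADOOP_HEAPSIZE=4096

# hbase-env.sh
# RegionServer JVM heap: 16 GB, a large slice of which goes to the block
# cache (hfile.block.cache.size, a fraction of heap, in hbase-site.xml)
export HBASE_HEAPSIZE=16384

# The remaining ~4 GB is not configured anywhere: it is simply left
# unallocated so the kernel can use it for the VFS/page cache.
```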