On Wed, Feb 16, 2011 at 3:09 PM, Jason Rutherglen <[email protected]> wrote:
> This comment
> https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=12991734&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12991734
> is interesting as in Lucene the IO cache is relied on, one would
> assume that HBase'd be the same?
>
> On Wed, Feb 16, 2011 at 11:48 AM, Ryan Rawson <[email protected]> wrote:
>> That would be cool, I think we should probably also push for HDFS-347
>> while we are at it as well. The situation for HDFS improvements has
>> not been good, but might improve in the mid-future.
>>
>> Thanks for the pointer!
>> -ryan
>>
>> On Wed, Feb 16, 2011 at 11:40 AM, Jason Rutherglen
>> <[email protected]> wrote:
>>>> One of my coworkers is reminding me that major compactions do have the
>>>> well-known side effect of slowing down a busy system.
>>>
>>> I think where this is going is the system IO cache problem could be
>>> solved with something like DirectIOLinuxDirectory:
>>> https://issues.apache.org/jira/browse/LUCENE-2500 Of course the issue
>>> would be integrating DIOLD or its underlying native implementation
>>> into HDFS somehow?
>>>
>>
>
This seems to be a common issue across the "write once and compact" model: it tends to vaporize the page cache. Cassandra is working on similar trickery at the file system level. Another interesting idea is re-warming the cache after a compaction: https://issues.apache.org/jira/browse/CASSANDRA-1878.

I would assume that users of HBase rely more on the HBase block cache than on the VFS cache. Out of curiosity, for people who run with 24 GB of memory, is the split something like 4 GB Xmx for the DataNode, 16 GB Xmx for the RegionServer (block cache), and the remaining 4 GB left for the VFS cache?

I always suggest firing off the major compaction at a low-traffic time (if you have such a time) so it has the least impact.

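To make the 24 GB split above concrete, here is roughly how that layout would be expressed in the daemon env files; the numbers are the hypothetical example from this thread, not a recommendation (both HADOOP_HEAPSIZE and HBASE_HEAPSIZE are in MB):

```shell
# hadoop-env.sh -- on the hypothetical 24 GB box above
# DataNode JVM heap: 4 GB
export HADOOP_HEAPSIZE=4096

# hbase-env.sh
# RegionServer JVM heap: 16 GB, a large slice of which goes to the block
# cache (hfile.block.cache.size, a fraction of heap, in hbase-site.xml)
export HBASE_HEAPSIZE=16384

# The remaining ~4 GB is not configured anywhere: it is simply left
# unallocated so the kernel can use it for the VFS/page cache.
```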