With the introduction of byte-addressable storage devices with low 
latencies, it becomes difficult to decide how to expose these devices to 
user space applications. Do we treat them as traditional block devices, 
or expose them as DAX-capable devices? A traditional block device 
lets us use the page cache to take advantage of locality in access 
patterns, but comes at the expense of extra memory copies that are 
extremely costly for random workloads. A DAX-capable device seems great 
for the aforementioned random-access workloads, but suffers once there is 
some locality in the access pattern.

When DAX-capable devices are used as slower/cheaper volatile memory, 
treating them as a slower NUMA node with an associated NUMA migration 
policy would allow taking advantage of access pattern locality. 
However, this approach suffers from a few drawbacks. First, when those 
devices are also persistent, the tiering approach used in NUMA migration 
may not guarantee persistence. Second, for devices with significantly 
higher latencies than DRAM, the cost of moving clean pages may be 
significant. Finally, pages handled via NUMA migration are a common 
resource subject to thrashing under memory pressure.

I would like to discuss an alternative approach in which memory-intensive 
applications mmap these storage devices into their address space. The 
application can specify how much DRAM may be used as a cache and have 
some influence on the prefetching and eviction policies. The goal of such 
an approach is to minimize the impact that the slightly slower memory 
could have on the system when it is treated as a kernel-managed 
global resource, as well as to enable use of those devices as persistent 
memory. BTW, we criminally ;) used the vm_insert_page function in a 
prototype and found it faster than the page cache and 
swapping mechanisms when limited to a small amount of DRAM.
