On 24/04/29 07:32PM, Matthew Wilcox wrote:
> On Mon, Apr 29, 2024 at 12:04:16PM -0500, John Groves wrote:
> > This patch set introduces famfs[1] - a special-purpose fs-dax file system
> > for sharable disaggregated or fabric-attached memory (FAM). Famfs is not
> > CXL-specific in any way.
> > 
> > * Famfs creates a simple access method for storing and sharing data in
> >   sharable memory. The memory is exposed and accessed as memory-mappable
> >   dax files.
> > * Famfs supports multiple hosts mounting the same file system from the
> >   same memory (something existing fs-dax file systems don't do).
> 
> Yes, but we do already have two filesystems that support shared storage,
> and are rather more advanced than famfs -- GFS2 and OCFS2.  What are
> the pros and cons of improving either of those to support DAX rather
> than starting again with a new filesystem?
> 

Thanks for paying attention to this, Willy.

This is a fair question; I'll share some thoughts on the rationale, but it's
probably something that should be an ongoing dialog. We already have an LSFMM
session planned that will discuss whether the famfs functionality should be
merged into fuse, but GFS2 and OCFS2 are also potential candidates.

(I've already seen Kent's reply and will get to that next)

I work for a memory company, and the motivation here is to make disaggregated
shared memory practically usable. Any approach that moves in that direction 
is goodness as far as we're concerned -- provided it doesn't insert years of 
delay. 

Some thoughts on famfs:

* Famfs is not, not, not a general purpose file system.
* One can think of famfs as a shared memory allocator where allocations can be
  accessed as files. For certain data analytics workflows (especially
  involving Apache Arrow data frames) this is really powerful. Consumers of
  data frames commonly use mmap(MAP_SHARED), so they benefit from the memory
  de-duplication of shared memory without needing any new abstractions.
* Famfs is not really a data storage tool. It's more of a shared-memory
  allocation tool that has the benefit of allocations being accessible
  (and memory-mappable) as files. So a lot of software can automatically use
  it.
* Famfs is oriented to dumping sharable data into files and then allowing a
  scale-out cluster to share it (often read-only) by accessing a single copy
  in shared memory.
* Although this audience probably already understands this, please forgive me
  for putting a fine point on it: memory mapping a famfs/fs-dax file does
  not use system-ram as a cache - it directly accesses the memory associated
  with a file. This would be true of all file systems with proper fs-dax
  support (of which there are not many, and currently only famfs supports
  shared access to media/memory).
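
To make the mmap(MAP_SHARED) point concrete, here is a minimal sketch in
Python of how a consumer might map a file read-only. It is not taken from
famfs or its tooling: the helper name map_shared_readonly and the temporary
file standing in for a file on a famfs mount are assumptions for
illustration. On an ordinary file system, as in this demo, the mapping goes
through the page cache; on famfs/fs-dax the same call would access the
memory backing the file directly.

```python
import mmap
import os
import tempfile


def map_shared_readonly(path):
    """Map a whole (non-empty) file read-only with MAP_SHARED."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        # MAP_SHARED: every mapper of this file refers to the same backing
        # pages rather than a private copy.
        return mmap.mmap(fd, size, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)
    finally:
        # The mapping remains valid after the descriptor is closed.
        os.close(fd)


def demo():
    # Assumption: a temp file stands in for a file on a famfs mount.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"shared payload")
        path = f.name
    try:
        # Two independent mappings of the same file, as two consumers
        # of the same data frame might create.
        a = map_shared_readonly(path)
        b = map_shared_readonly(path)
        result = (bytes(a[:6]), bytes(b[7:]))
        a.close()
        b.close()
        return result
    finally:
        os.unlink(path)
```

Nothing in the consumer code is famfs-specific, which is the point: software
that already maps files this way should work unchanged.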

Some thoughts on shared-storage file systems:

* I'm no expert on GFS2 or OCFS2, but I've been around memory, file systems 
  and storage since well before the turn of the century...
* If you had brought up the existing fs-dax file systems, I would have pointed
  out that they use write-back metadata, which does not reconcile with shared
  access to media - but the shared-storage file systems do handle that.
* The shared media file systems are still oriented to block devices that
  provide durable storage and page-oriented access. CXL DRAM is a character 
  dax (devdax) device and does not provide durable storage.
* fs-dax-style memory mapping for volatile CXL memory requires the
  dev_dax_iomap portion of this patch set - or something similar.
* A scale-out shared media file system presumably requires some commitment to
  configure and manage some complexity in a distributed environment; whether
  that should be mandatory for enablement of shared memory is worthy of
  discussion.
* Adding memory to the storage tier for GFS2/OCFS2 would add non-persistent
  media to the storage tier; whether this makes sense would be a topic that
  GFS2/OCFS2 developers/architects should get involved in if they're 
  interested.

Although disaggregated shared memory is not commercially available yet, famfs 
is being actively tested by multiple companies for several use cases and 
patterns with real and simulated shared memory. Demonstrations will start to
surface in the coming weeks & months.

Regards,
John


