Hi,

On 09/06/2020 13:41, Matthew Wilcox wrote:
I have a modest proposal ...

  struct inode {
-       struct address_space i_data;
  }

+struct minode {
+       struct inode i;
+       struct address_space m;
+};

  struct address_space {
-       struct inode *host;
  }

This saves one pointer per inode, and cuts all the pagecache support
from inodes which don't need to have a page cache (symlinks, directories,
pipes, sockets, char devices).

This was born from the annoyance of going from a struct page to a filesystem:
page->mapping->host->i_sb->s_type

That's four pointer dereferences.  This would bring it down to three:
i_host(page->mapping)->i_sb->s_type

I could see (eventually) interfaces changing to pass around a
struct minode *mapping instead of a struct address_space *mapping.  But
I know mapping->host and inode->i_mapping sometimes have some pretty
weird relationships and maybe there's a legitimate usage that can't be
handled by this change.

Every filesystem which does use the page cache would have to be changed
to use a minode instead of an inode, which is why this proposal is so
very modest.  But before I start looking into it properly, I thought
somebody might know why this isn't going to work.

I know about raw devices:
                 file_inode(filp)->i_mapping =
                         bdev->bd_inode->i_mapping;

and this seems like it should work for that.  I know about coda:
                 coda_inode->i_mapping = host_inode->i_mapping;

and this seems like it should work there too.

DAX just seems confused:
         inode->i_mapping = __dax_inode->i_mapping;
         inode->i_mapping->host = __dax_inode;
         inode->i_mapping->a_ops = &dev_dax_aops;

GFS2 might need to embed an entire minode instead of just a mapping in its
glocks and its superblock:
fs/gfs2/glock.c:                mapping->host = s->s_bdev->bd_inode;
fs/gfs2/ops_fstype.c:   mapping->host = sb->s_bdev->bd_inode;

I don't think that will scale. We did gain a big reduction in overhead for each cached inode when we stopped using two struct inodes and just embedded an address_space in the glock. However, I'm fairly sure that for the glock address_space case, we already have our own way to find the associated inode. So it might well be ok to do this anyway, and not need to embed a full minode.

Also, if there was a better way to track metadata on a per inode basis, then that would be an even better solution, but a much bigger project too.

The issue that you might run across is for stacked filesystems... will you land up finding the correct layer in the stack?

Steve.



NILFS ... I don't understand at all.  It seems to allocate its own
private address space in nilfs_inode_info instead of using i_data (why?)
and also allocate more address spaces for metadata inodes.
fs/nilfs2/page.c:       mapping->host = inode;

So that will need to be understood, but is there a fundamental reason
this won't work?

Advantages:
  - Eliminates a pointer dereference when moving from mapping to host
  - Shrinks all inodes by one pointer
  - Shrinks inodes used for symlinks, directories, sockets, pipes & char
    devices by an entire struct address_space.

Disadvantages:
  - Churn
  - Seems like it'll grow a few data structures in less common filesystems
    (but may be important for some users)


Reply via email to