Hi,
On 09/06/2020 13:41, Matthew Wilcox wrote:
I have a modest proposal ...
struct inode {
- struct address_space i_data;
}
+struct minode {
+ struct inode i;
+ struct address_space m;
+};
struct address_space {
- struct inode *host;
}
This saves one pointer per inode, and cuts all the pagecache support
from inodes which don't need to have a page cache (symlinks, directories,
pipes, sockets, char devices).
This was born from the annoyance of going from a struct page to a filesystem:
page->mapping->host->i_sb->s_type
That's four pointer dereferences. This would bring it down to three:
i_host(page->mapping)->i_sb->s_type
I could see (eventually) interfaces changing to pass around a
struct minode *mapping instead of a struct address_space *mapping. But
I know mapping->host and inode->i_mapping sometimes have some pretty
weird relationships and maybe there's a legitimate usage that can't be
handled by this change.
Every filesystem which does use the page cache would have to be changed
to use a minode instead of an inode, which is why this proposal is so
very modest. But before I start looking into it properly, I thought
somebody might know why this isn't going to work.
I know about raw devices:
file_inode(filp)->i_mapping =
bdev->bd_inode->i_mapping;
and this seems like it should work for that. I know about coda:
coda_inode->i_mapping = host_inode->i_mapping;
and this seems like it should work there too.
DAX just seems confused:
inode->i_mapping = __dax_inode->i_mapping;
inode->i_mapping->host = __dax_inode;
inode->i_mapping->a_ops = &dev_dax_aops;
GFS2 might need to embed an entire minode instead of just a mapping in its
glocks and its superblock:
fs/gfs2/glock.c: mapping->host = s->s_bdev->bd_inode;
fs/gfs2/ops_fstype.c: mapping->host = sb->s_bdev->bd_inode;
I don't think that will scale. We did gain a big reduction in overhead
for each cached inode when we stopped using two struct inodes and just
embedded an address_space in the glock. However, I'm fairly sure that
for the glock address_space case, we already have our own way to find
the associated inode. So it might well be ok to do this anyway, and not
need to embed a full minode.
Also, if there was a better way to track metadata on a per inode basis,
then that would be an even better solution, but a much bigger project too.
The issue that you might run across is for stacked filesystems... will
you land up finding the correct layer in the stack?
Steve.
NILFS ... I don't understand at all. It seems to allocate its own
private address space in nilfs_inode_info instead of using i_data (why?)
and also allocate more address spaces for metadata inodes.
fs/nilfs2/page.c: mapping->host = inode;
So that will need to be understood, but is there a fundamental reason
this won't work?
Advantages:
- Eliminates a pointer dereference when moving from mapping to host
- Shrinks all inodes by one pointer
- Shrinks inodes used for symlinks, directories, sockets, pipes & char
devices by an entire struct address_space.
Disadvantages:
- Churn
- Seems like it'll grow a few data structures in less common filesystems
(but may be important for some users)