Re: [PATCH] erofs: fix erofs_insert_workgroup() lockref usage
On Mon, 30 Oct 2023 at 20:08, Gao Xiang wrote: > > As Linus pointed out [1], lockref_put_return() is fundamentally > designed to be something that can fail. It behaves as a fastpath-only > thing, and the failure case needs to be handled anyway. > > Actually, since the new pcluster was just allocated without being > populated, it won't be accessed by others until it is inserted into > XArray, so lockref helpers are actually unneeded here. > > Let's just set the proper reference count on initializing. >From a quick superficial look this looks like the right approach. Thanks for the quick response. Linus
Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers
On Thu, 28 Sept 2023 at 20:50, Amir Goldstein wrote: > > OTOH, it is perfectly fine if the vfs wants to stop providing sub 100ns > services to filesystems. It's just going to be the fs problem and the > preserved pre-historic/fine-grained time on existing files would only > need to be provided in getattr(). It does not need to be in __i_mtime. Hmm. That sounds technically sane, but for one thing: if the aim is to try to do (a) atomic timestamp access (b) shrink the inode then having the filesystem maintain its own timestamp for fine-grained data will break both of those goals. Yes, we'd make 'struct inode' smaller if we pack the times into one 64-bit entity, but if btrfs responds by adding mtime fields to "struct btrfs_inode", we lost the size advantage and only made things worse. And if ->getattr() then reads those fields without locking (and we definitely don't want locking in that path), then we lost the atomicity thing too. So no. A "but the filesystem can maintain finer granularity" model is not acceptable, I think. If we do require nanoseconds for compatibility, what we could possibly do is say "we guarantee nanosecond values for *legacy* dates", and say that future dates use 100ns resolution. We'd define "legacy dates" to be the traditional 32-bit signed time_t. So with a 64-bit fstime_t, we'd have the "legacy format": - top 32 bits are seconds, bottom 32 bits are ns which gives us that ns format. Then, because only 30 bits are needed for nanosecond resolution, we use the top two bits of that ns field as flags. '00' means that legacy format, and '01' would mean "we're not doing nanosecond resolution, we're doing 64ns resolution, and the low 6 bits of the ns field are actually bits 32-37 of the seconds field". That still gives us some extensibility (unless the multi-grain code still wants to use the other top bit), and it gives us 40 bits of seconds, which is quite a lot. And all the conversion functions will be simple bit field manipulations, so there are no expensive ops here. Anyway, I agree with the "let's introduce the accessor functions first, we can do the 'pack into one word' decisions later". Linus
Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers
On Thu, 28 Sept 2023 at 14:28, Theodore Ts'o wrote: > > I don't think anyone will complain about breaking the userspace API > --- especially since if, say, the CIA was using this for their spies' > drop boxes, they probably wouldn't want to admit it. :-) Well, you will find that real apps do kind of of care. Just to take a very real example, "git" will very much notice time granularity issues and care - because git will cache the 'stat' times in the index. So if you get a different stat time (because the vfs layer has changed some granularity), git will then have to check the files carefully again and update the index. You can simulate this "re-check all files" with something like this: $ time git diff real 0m0.040s user 0m0.035s sys 0m0.264s $ rm .git/index && git read-tree HEAD $ time git diff real 0m9.595s user 0m7.287s sys 0m2.810s so the difference between just doing a "look, index information matches current 'stat' information" and "oops, index does not have the stat data" is "40 milliseconds" vs "10 seconds". That's a big difference, and you'd see that each time the granularity changes. But then once the index file has been updated, it's back to the good case. So yes, real programs to cache stat information, and it matters for performance. But I don't think any actual reasonable program will have *correctness* issues, though - because there are certainly filesystems out there that don't do nanosecond resolution (and other operations like copying trees around will obviously also change times). Anybody doing steganography in the timestamps is already not going to have a great time, really. Linus
Re: [PATCH 87/87] fs: move i_blocks up a few places in struct inode
On Thu, 28 Sept 2023 at 04:06, Jeff Layton wrote: > > Move i_blocks up above the i_lock, which moves the new 4 byte hole to > just after the timestamps, without changing the size of the structure. I'm sure others have mentioned this, but 'struct inode' is marked with __randomize_layout, so the actual layout may end up being very different. I'm personally not convinced the whole structure randomization is worth it - it's easy enough to figure out for any distro kernel since the seed has to be the same across machines for modules to work, so even if the seed isn't "public", any layout is bound to be fairly easily discoverable. So the whole randomization only really works for private kernel builds, and it adds this kind of pain where "optimizing" the structure layout is kind of pointless depending on various options. I certainly *hope* no distro enables that pointless thing, but it's a worry. Linus
Re: [GIT PULL] erofs updates for 6.2-rc1 (fscache part inclusive)
On Sun, Dec 11, 2022 at 11:01 PM Gao Xiang wrote: > > Comparing with the latest -next on the last Thursday, there was one-liner > addition commit 927e5010ff5b ("erofs: use kmap_local_page() > only for erofs_bread()") as below: [...] > Because there is no -next version on Monday, if you would like to > take all commits in -next you could take > git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git > tags/erofs-for-6.2-rc1-alt No worries. That one-liner isn't a problem, and you sent the pull request to me early, so I'm perfectly happy with your original request and have pulled it. Thanks for being careful, Linus
Re: [PATCH v4 0/3] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache
On Wed, Nov 23, 2022 at 2:48 PM David Howells wrote: > > I've also got rid of the bit clearances > from the network filesystem evict_inode functions as they doesn't seem to > be necessary. Well, the patches look superficially cleaner to me, at least. That "doesn't seem to be necessary" makes me a bit worried, and I'd have liked to see a more clear-cut "clearing it isn't necessary because X", but I _assume_ it's not necessary simply because the 'struct address_space" is released and never re-used. But making the lifetime of that bit explicit might just be a good idea. Linus
Re: [RFC][PATCH 0/3] netfs, afs: Cleanups
On Fri, Jun 10, 2022 at 12:56 PM David Howells wrote: > > Here are some cleanups, one for afs and a couple for netfs: Pulled, Linus
Re: [GIT PULL] erofs updates for 5.15-rc1
On Thu, Sep 2, 2021 at 11:21 AM Gao Xiang wrote: > > Yeah, thanks. That was my first time to merge another tree due to hard > dependency like this. I've gained some experience from this and will be > more confident on this if such things happen in the future. :) Well, being nervous about cross-tree merges is probably a good thing, and they *should* be fairly rare. So don't get over-confident and cocky ;^) Linus
Re: [GIT PULL] erofs updates for 5.15-rc1
On Tue, Aug 31, 2021 at 4:00 PM Gao Xiang wrote: > > All commits have been tested and have been in linux-next. Note that > in order to support iomap tail-packing inline, I had to merge iomap > core branch (I've created a merge commit with the reason) in advance > to resolve such functional dependency, which is now merged into > upstream. Hopefully I did the right thing... It all looks fine to me. You have all the important parts: what you are merging, and _why_ you are merging it. So no complaints, and thanks for making it explicit in your pull request too so that I'm not taken by surprise. Linus
Re: [PATCH v8 07/24] erofs: add directory operations
On Wed, Aug 14, 2019 at 9:42 PM Gao Xiang wrote: > > + > +static const unsigned char erofs_filetype_table[EROFS_FT_MAX] = { > + [EROFS_FT_UNKNOWN] = DT_UNKNOWN, > + [EROFS_FT_REG_FILE] = DT_REG, > + [EROFS_FT_DIR] = DT_DIR, > + [EROFS_FT_CHRDEV] = DT_CHR, > + [EROFS_FT_BLKDEV] = DT_BLK, > + [EROFS_FT_FIFO] = DT_FIFO, > + [EROFS_FT_SOCK] = DT_SOCK, > + [EROFS_FT_SYMLINK] = DT_LNK, > +}; Hmm. The EROFS_FT_XYZ values seem to match the normal FT_XYZ values, and we've lately tried to just have filesystems use the standard ones instead of having a (pointless) duplicate conversion between the two. And then you can use the common "fs_ftype_to_dtype()" to convert from FT_XYZ to DT_XYZ. Maybe I'm missing something, and the EROFS_FT_x list actually differs from the normal FT_x list some way, but it would be good to not introduce another case of this in normal filesystems, just as we've been getting rid of them. See for example commit e10892189428 ("ext2: use common file type conversion"). Linus