Re: [PATCH] erofs: fix erofs_insert_workgroup() lockref usage

2023-10-31 Thread Linus Torvalds
On Mon, 30 Oct 2023 at 20:08, Gao Xiang  wrote:
>
> As Linus pointed out [1], lockref_put_return() is fundamentally
> designed to be something that can fail.  It behaves as a fastpath-only
> thing, and the failure case needs to be handled anyway.
>
> Actually, since the new pcluster was just allocated without being
> populated, it won't be accessed by others until it is inserted into
> XArray, so lockref helpers are actually unneeded here.
>
> Let's just set the proper reference count on initializing.

>From a quick superficial look this looks like the right approach.
Thanks for the quick response.

  Linus


Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers

2023-10-06 Thread Linus Torvalds
On Thu, 28 Sept 2023 at 20:50, Amir Goldstein  wrote:
>
> OTOH, it is perfectly fine if the vfs wants to stop providing sub 100ns
> services to filesystems. It's just going to be the fs problem and the
> preserved pre-historic/fine-grained time on existing files would only
> need to be provided in getattr(). It does not need to be in __i_mtime.

Hmm. That sounds technically sane, but for one thing: if the aim is to try to do

 (a) atomic timestamp access

 (b) shrink the inode

then having the filesystem maintain its own timestamp for fine-grained
data will break both of those goals.

Yes, we'd make 'struct inode' smaller if we pack the times into one
64-bit entity, but if btrfs responds by adding mtime fields to "struct
btrfs_inode", we lost the size advantage and only made things worse.

And if ->getattr() then reads those fields without locking (and we
definitely don't want locking in that path), then we lost the
atomicity thing too.

So no. A "but the filesystem can maintain finer granularity" model is
not acceptable, I think.

If we do require nanoseconds for compatibility, what we could possibly
do is say "we guarantee nanosecond values for *legacy* dates", and say
that future dates use 100ns resolution. We'd define "legacy dates" to
be the traditional 32-bit signed time_t.

So with a 64-bit fstime_t, we'd have the "legacy format":

 - top 32 bits are seconds, bottom 32 bits are ns

which gives us that ns format.

Then, because only 30 bits are needed for nanosecond resolution, we
use the top two bits of that ns field as flags. '00' means that legacy
format, and '01' would mean "we're not doing nanosecond resolution,
we're doing 64ns resolution, and the low 6 bits of the ns field are
actually bits 32-37 of the seconds field".

That still gives us some extensibility (unless the multi-grain code
still wants to use the other top bit), and it gives us 40 bits of
seconds, which is quite a lot.

And all the conversion functions will be simple bit field
manipulations, so there are no expensive ops here.

Anyway, I agree with the "let's introduce the accessor functions
first, we can do the 'pack into one word' decisions later".

Linus


Re: [PATCH 86/87] fs: switch timespec64 fields in inode to discrete integers

2023-10-06 Thread Linus Torvalds
On Thu, 28 Sept 2023 at 14:28, Theodore Ts'o  wrote:
>
> I don't think anyone will complain about breaking the userspace API
> --- especially since if, say, the CIA was using this for their spies'
> drop boxes, they probably wouldn't want to admit it.  :-)

Well, you will find that real apps do kind of of care.

Just to take a very real example, "git" will very much notice time
granularity issues and care - because git will cache the 'stat' times
in the index.

So if you get a different stat time (because the vfs layer has changed
some granularity), git will then have to check the files carefully
again and update the index.

You can simulate this "re-check all files" with something like this:

$ time git diff

real 0m0.040s
user 0m0.035s
sys 0m0.264s

$ rm .git/index && git read-tree HEAD

$ time git diff

real 0m9.595s
user 0m7.287s
sys 0m2.810s

so the difference between just doing a "look, index information
matches current 'stat' information" and "oops, index does not have the
stat data" is "40 milliseconds" vs "10 seconds".

That's a big difference, and you'd see that each time the granularity
changes. But then once the index file has been updated, it's back to
the good case.

So yes, real programs to cache stat information, and it matters for performance.

But I don't think any actual reasonable program will have
*correctness* issues, though - because there are certainly filesystems
out there that don't do nanosecond resolution (and other operations
like copying trees around will obviously also change times).

Anybody doing steganography in the timestamps is already not going to
have a great time, really.

 Linus


Re: [PATCH 87/87] fs: move i_blocks up a few places in struct inode

2023-10-06 Thread Linus Torvalds
On Thu, 28 Sept 2023 at 04:06, Jeff Layton  wrote:
>
> Move i_blocks up above the i_lock, which moves the new 4 byte hole to
> just after the timestamps, without changing the size of the structure.

I'm sure others have mentioned this, but 'struct inode' is marked with
__randomize_layout, so the actual layout may end up being very
different.

I'm personally not convinced the whole structure randomization is
worth it - it's easy enough to figure out for any distro kernel since
the seed has to be the same across machines for modules to work, so
even if the seed isn't "public", any layout is bound to be fairly
easily discoverable.

So the whole randomization only really works for private kernel
builds, and it adds this kind of pain where "optimizing" the structure
layout is kind of pointless depending on various options.

I certainly *hope* no distro enables that pointless thing, but it's a worry.

   Linus


Re: [GIT PULL] erofs updates for 6.2-rc1 (fscache part inclusive)

2022-12-12 Thread Linus Torvalds
On Sun, Dec 11, 2022 at 11:01 PM Gao Xiang  wrote:
>
> Comparing with the latest -next on the last Thursday, there was one-liner
> addition  commit 927e5010ff5b ("erofs: use kmap_local_page()
> only for erofs_bread()") as below:
[...]
> Because there is no -next version on Monday, if you would like to
> take all commits in -next you could take
>   git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs.git 
> tags/erofs-for-6.2-rc1-alt

No worries. That one-liner isn't a problem, and you sent the pull
request to me early, so I'm perfectly happy with your original request
and have pulled it.

Thanks for being careful,

 Linus


Re: [PATCH v4 0/3] mm, netfs, fscache: Stop read optimisation when folio removed from pagecache

2022-11-23 Thread Linus Torvalds
On Wed, Nov 23, 2022 at 2:48 PM David Howells  wrote:
>
>   I've also got rid of the bit clearances
> from the network filesystem evict_inode functions as they doesn't seem to
> be necessary.

Well, the patches look superficially cleaner to me, at least. That
"doesn't seem to be necessary" makes me a bit worried, and I'd have
liked to see a more clear-cut "clearing it isn't necessary because X",
but I _assume_ it's not necessary simply because the 'struct
address_space" is released and never re-used.

But making the lifetime of that bit explicit might just be a good idea.

 Linus


Re: [RFC][PATCH 0/3] netfs, afs: Cleanups

2022-06-10 Thread Linus Torvalds
On Fri, Jun 10, 2022 at 12:56 PM David Howells  wrote:
>
> Here are some cleanups, one for afs and a couple for netfs:

Pulled,

   Linus


Re: [GIT PULL] erofs updates for 5.15-rc1

2021-09-02 Thread Linus Torvalds
On Thu, Sep 2, 2021 at 11:21 AM Gao Xiang  wrote:
>
> Yeah, thanks. That was my first time to merge another tree due to hard
> dependency like this. I've gained some experience from this and will be
> more confident on this if such things happen in the future. :)

Well, being nervous about cross-tree merges is probably a good thing,
and they *should* be fairly rare.

So don't get over-confident and cocky ;^)

  Linus


Re: [GIT PULL] erofs updates for 5.15-rc1

2021-09-02 Thread Linus Torvalds
On Tue, Aug 31, 2021 at 4:00 PM Gao Xiang  wrote:
>
> All commits have been tested and have been in linux-next. Note that
> in order to support iomap tail-packing inline, I had to merge iomap
> core branch (I've created a merge commit with the reason) in advance
> to resolve such functional dependency, which is now merged into
> upstream. Hopefully I did the right thing...

It all looks fine to me. You have all the important parts: what you
are merging, and _why_ you are merging it.

So no complaints, and thanks for making it explicit in your pull
request too so that I'm not taken by surprise.

 Linus


Re: [PATCH v8 07/24] erofs: add directory operations

2019-08-15 Thread Linus Torvalds
On Wed, Aug 14, 2019 at 9:42 PM Gao Xiang  wrote:
>
> +
> +static const unsigned char erofs_filetype_table[EROFS_FT_MAX] = {
> +   [EROFS_FT_UNKNOWN]  = DT_UNKNOWN,
> +   [EROFS_FT_REG_FILE] = DT_REG,
> +   [EROFS_FT_DIR]  = DT_DIR,
> +   [EROFS_FT_CHRDEV]   = DT_CHR,
> +   [EROFS_FT_BLKDEV]   = DT_BLK,
> +   [EROFS_FT_FIFO] = DT_FIFO,
> +   [EROFS_FT_SOCK] = DT_SOCK,
> +   [EROFS_FT_SYMLINK]  = DT_LNK,
> +};

Hmm.

The EROFS_FT_XYZ values seem to match the normal FT_XYZ values, and
we've lately tried to just have filesystems use the standard ones
instead of having a (pointless) duplicate conversion between the two.

And then you can use the common "fs_ftype_to_dtype()" to convert from
FT_XYZ to DT_XYZ.

Maybe I'm missing something, and the EROFS_FT_x list actually differs
from the normal FT_x list some way, but it would be good to not
introduce another case of this in normal filesystems, just as we've
been getting rid of them.

See for example commit e10892189428 ("ext2: use common file type conversion").

   Linus