Re: Libfuse interoperability/ABI broken.

2024-03-22 Thread Ashley Pittman


I’m back from holiday now and working on this properly.  Yes, the below is what 
I had in mind but haven’t had a chance to go through all the permutations yet 
and see if we can patch this up at run-time without potentially missing any 
calls.

The good news at least is once you detect one one incorrect/invalid option you 
can then make assumptions about what version the binary was with and then just 
apply the fix from that point out without further needing the run the 
heuristics to detect on every call.  The heuristics and what flags collied with 
which other flags depend on if we choose to revert to the older ABI or stick 
with the most recent one, my initial though was to revert on the basis that 
most distros haven’t updated past the change point yet but Bernd was arguing 
for keeping the new interface.  Being able to construct a set of heuristics 
which is safe in either direction could swing this decision.

One option we do have is that fuse_session_new passes in struct 
fuse_lowlevel_ops and it’s size, whilst it’s also a hack I could add padding to 
this structure to increase it’s size and then have a way of knowing 
conclusively at run-time that this bug and any work-arounds are not applicable.

Thank you everyone who's contributed to this discussion, hopefully we can find 
a path where we can detect this and patch it up entirely in the libfuse source.

Ashley.

> On 9 Mar 2024, at 02:46, Amir Goldstein  wrote:
> 
> On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
>  wrote:
>> 
>> Hi all,
>> 
>> this is certainly not kind of the mail I was hoping for as a new libfuse
>> maintainer.
>> 
>> As you can see from the title and from discussion below (sorry this is
>> not typical ML discussion style), we have a bit of of problem with
>> libfuse ABI compatibility.
>> 
>> While scanning through git history, Ashley found a commit that was adding
>> members in the middle of struct - doesn't break API, but binary
>> compatibility. That commit already landed a year ago in these releases:
>> 
>> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
>> fuse-3.14.1
>> fuse-3.15.0
>> fuse-3.15.1
>> fuse-3.16.1
>> fuse-3.16.2
>> 
>> 
>> Obviously this needs improved testing for, but right now we wonder what
>> would be the preferred action to avoid issues.
>> 
>> a) We could fix it and move bits to the right place. Fixes everything
>> compiled with old versions, but breaks everything compiled with the
>> versions above
>> 
>> b) Increase the .so version - enforces recompilation of all binaries.
>> Intrusive, especially for distributions, but probably safest.
>> 
>> c) Other ideas?
>> 
> 
> Heuristically, you can detect most of the shifted flags at runtime
> because...
> 
>> 
>> 
>> I don't think there is anything in libfuse that would allow us to
>> detect which version of libfuse a library was linked to.
>> 
> 
> I think we do know for sure if fs was linked with libfuse < 3.12
> without fuse_loop_mt_312?
> so only left to detect  3.12..3.14 vs. 3.14..3.16
> 
>> 
>> The commit shifted these members in struct fuse_file_info {
>> 
>> struct fuse_file_info {
>> ...
>>/** Can be filled by open/create, to allow parallel direct writes on 
>> this
>> *  file */
>>unsigned int parallel_direct_writes : 1; --> introduced the shift
> 
> Not expected in flush/release, so can be heuristically interpreted as flush
> 
>> 
>>/** Indicates a flush operation.  Set in flush operation, also
>>maybe set in highlevel lock operation and lowlevel release
>>operation. */
>>unsigned int flush : 1;
>> 
> 
> Not expected in open/create, so can be heuristically interpreted as 
> nonseekable
> 
>>/** Can be filled in by open, to indicate that the file is not
>>seekable. */
>>unsigned int nonseekable : 1;
>> 
> 
> Not expected in release, so can be heuristically interpreted as flock_release
> 
>>/* Indicates that flock locks for this file should be
>>   released.  If set, lock_owner shall contain a valid value.
>>   May only be set in ->release(). */
>>unsigned int flock_release : 1;
>> 
> 
> Not expected in opendir, so can be heuristically interpreted as cache_readdir
> 
>>/** Can be filled in by opendir. It signals the kernel to
>>enable caching of entries returned by readdir().  Has no
>>effect when set in other contexts (in particular it does
>>nothing when set by open()). */
>>unsigned int cache_readdir : 1;
>> 
> 
> Ignored in open, but based on the comment above, it may be
> implied that some fs may set it in open() reply
> 
>>/** Can be filled in by open, to indicate that flush is not needed
>>on close. */
>>unsigned int noflush : 1;
> 
> noflush is just an optimization, which the kernel ignores anyway
> when writeback cache is enabled, so no harm done if it is wrongly
> interpreted as readdir_cache in open() and ignored.
> It is 

Re: Libfuse interoperability/ABI broken.

2024-03-13 Thread Amir Goldstein
On Wed, Mar 13, 2024 at 1:57 AM Bernd Schubert
 wrote:
>
> Hi Amir,
>
> thanks for your help!
>
> On 3/9/24 03:46, Amir Goldstein wrote:
> > On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
> >  wrote:
> >>
> >> Hi all,
> >>
> >> this is certainly not kind of the mail I was hoping for as a new libfuse
> >> maintainer.
> >>
> >> As you can see from the title and from discussion below (sorry this is
> >> not typical ML discussion style), we have a bit of of problem with
> >> libfuse ABI compatibility.
> >>
> >> While scanning through git history, Ashley found a commit that was adding
> >> members in the middle of struct - doesn't break API, but binary
> >> compatibility. That commit already landed a year ago in these releases:
> >>
> >> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
> >> fuse-3.14.1
> >> fuse-3.15.0
> >> fuse-3.15.1
> >> fuse-3.16.1
> >> fuse-3.16.2
> >>
> >>
> >> Obviously this needs improved testing for, but right now we wonder what
> >> would be the preferred action to avoid issues.
> >>
> >> a) We could fix it and move bits to the right place. Fixes everything
> >> compiled with old versions, but breaks everything compiled with the
> >> versions above
> >>
> >> b) Increase the .so version - enforces recompilation of all binaries.
> >> Intrusive, especially for distributions, but probably safest.
> >>
> >> c) Other ideas?
> >>
> >
> > Heuristically, you can detect most of the shifted flags at runtime
> > because...
> >
> >>
> >>
> >> I don't think there is anything in libfuse that would allow us to
> >> detect which version of libfuse a library was linked to.
> >>
> >
> > I think we do know for sure if fs was linked with libfuse < 3.12
> > without fuse_loop_mt_312?
> > so only left to detect  3.12..3.14 vs. 3.14..3.16
>
> Hmm, I guess I miss something, but how would I know if it was linked
> with fuse_loop_mt_312? That needs an elf reader? Assuming we would put
> this into the library, somehow, how does this work with stripped binaries?
>
> bschubert2@imesrv6 example>nm passthrough_ll | head -n1
> 03b4 r __abi_tag
> bschubert2@imesrv6 example>strip passthrough_ll
> bschubert2@imesrv6 example>nm passthrough_ll | head -n1
> nm: passthrough_ll: no symbols
>
>

I thought that invoking fuse_loop_mt_*() indicated the lib version
that binary was built with.
I wasn't taking FUSE_USE_VERSION into account.

>
> >
> >>
> >> The commit shifted these members in struct fuse_file_info {
> >>
> >> struct fuse_file_info {
> >> ...
> >> /** Can be filled by open/create, to allow parallel direct writes 
> >> on this
> >>  *  file */
> >> unsigned int parallel_direct_writes : 1; --> introduced the shift
> >
> > Not expected in flush/release, so can be heuristically interpreted as flush
> >
> >>
> >> /** Indicates a flush operation.  Set in flush operation, also
> >> maybe set in highlevel lock operation and lowlevel release
> >> operation. */
> >> unsigned int flush : 1;
> >>
> >
> > Not expected in open/create, so can be heuristically interpreted as 
> > nonseekable
> >
> >> /** Can be filled in by open, to indicate that the file is not
> >> seekable. */
> >> unsigned int nonseekable : 1;
> >>
> >
> > Not expected in release, so can be heuristically interpreted as 
> > flock_release
> >
> >> /* Indicates that flock locks for this file should be
> >>released.  If set, lock_owner shall contain a valid value.
> >>May only be set in ->release(). */
> >> unsigned int flock_release : 1;
> >>
> >
> > Not expected in opendir, so can be heuristically interpreted as 
> > cache_readdir
> >
> >> /** Can be filled in by opendir. It signals the kernel to
> >> enable caching of entries returned by readdir().  Has no
> >> effect when set in other contexts (in particular it does
> >> nothing when set by open()). */
> >> unsigned int cache_readdir : 1;
> >>I am not sure I know how versioned symbols work, but
doesn't new lib always have all the functions
fuse_loop_mt_*() and old binary will be invoking the
No, I was not looking closely.

> >
> > Ignored in open, but based on the comment above, it may be
> > implied that some fs may set it in open() reply
> >
> >> /** Can be filled in by open, to indicate that flush is not needed
> >> on close. */
> >> unsigned int noflush : 1;
> >
> > noflush is just an optimization, which the kernel ignores anyway
> > when writeback cache is enabled, so no harm done if it is wrongly
> > interpreted as readdir_cache in open() and ignored.
> > It is also quite recent (3.11) so not very likely used.
> >
> >> };
> >>
> >> I.e. setting flush would actually set parallel_direct_writes
> >> (note: with binaries compiled against older libfuse versions)
> >>
> >
> > It would be suspicious if fs sets parallel_direct_writes flush at
> > flush() or release(), so that can be 

Re: Libfuse interoperability/ABI broken.

2024-03-12 Thread Bernd Schubert



On 3/11/24 14:32, Andrea Bolognani wrote:
> On Thu, Mar 07, 2024 at 07:47:23PM +0100, Bernd Schubert wrote:
>> Hi all,
>>
>> this is certainly not kind of the mail I was hoping for as a new libfuse
>> maintainer.
>>
>> As you can see from the title and from discussion below (sorry this is 
>> not typical ML discussion style), we have a bit of of problem with 
>> libfuse ABI compatibility. 
> 
> [...]
> 
 On 3/7/24 16:43, Ashley Pittman wrote:
> I’ve spotted an issue with the linked commit, the fuse_file_info struct 
> should have been modified by adding new entries just before the padding, 
> with this commit then members after the new entry will be moved creating 
> a change in the ABI for members after this.
>
> https://github.com/libfuse/libfuse/commit/a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
>
> This affects the flush, nonseekable, flock_release, cache_readdir and 
> noflush flags, each one of which could be set or cleared accidentally 
> with one of the flags before or after it depending on what version of 
> libfuse the application is compiled and linked with.
>
> This isn’t a failure mode that I’ve experience before when using linux so 
> I don’t have a playbook to work from in how to correct this but 
> essentially fuse3 releases up to and including 3.13 have one ABI and 3.13 
> to 3.16 have a different ABI.
> 
> Not strictly related to the change at hand, but since we're
> discussing recent-ish changes that negatively affected backwards
> compatibility in libfuse I will mention this too:
> 
>   https://bugs.debian.org/1031802
> 
> It has been reported downstream a year ago, but I'm not sure if it
> ever got upstream's attention. Now it should have :)
> 

Arg thanks, I don't think that was ever posted. Clearly my fault, I had
added that symbol when I noticed it is missing.

Going to follow up in the debian report.


Thanks,
Bernd



Re: Libfuse interoperability/ABI broken.

2024-03-12 Thread Bernd Schubert
Hi Amir,

thanks for your help!

On 3/9/24 03:46, Amir Goldstein wrote:
> On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
>  wrote:
>>
>> Hi all,
>>
>> this is certainly not kind of the mail I was hoping for as a new libfuse
>> maintainer.
>>
>> As you can see from the title and from discussion below (sorry this is
>> not typical ML discussion style), we have a bit of of problem with
>> libfuse ABI compatibility.
>>
>> While scanning through git history, Ashley found a commit that was adding
>> members in the middle of struct - doesn't break API, but binary
>> compatibility. That commit already landed a year ago in these releases:
>>
>> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
>> fuse-3.14.1
>> fuse-3.15.0
>> fuse-3.15.1
>> fuse-3.16.1
>> fuse-3.16.2
>>
>>
>> Obviously this needs improved testing for, but right now we wonder what
>> would be the preferred action to avoid issues.
>>
>> a) We could fix it and move bits to the right place. Fixes everything
>> compiled with old versions, but breaks everything compiled with the
>> versions above
>>
>> b) Increase the .so version - enforces recompilation of all binaries.
>> Intrusive, especially for distributions, but probably safest.
>>
>> c) Other ideas?
>>
> 
> Heuristically, you can detect most of the shifted flags at runtime
> because...
> 
>>
>>
>> I don't think there is anything in libfuse that would allow us to
>> detect which version of libfuse a library was linked to.
>>
> 
> I think we do know for sure if fs was linked with libfuse < 3.12
> without fuse_loop_mt_312?
> so only left to detect  3.12..3.14 vs. 3.14..3.16

Hmm, I guess I miss something, but how would I know if it was linked
with fuse_loop_mt_312? That needs an elf reader? Assuming we would put
this into the library, somehow, how does this work with stripped binaries?

bschubert2@imesrv6 example>nm passthrough_ll | head -n1
03b4 r __abi_tag
bschubert2@imesrv6 example>strip passthrough_ll
bschubert2@imesrv6 example>nm passthrough_ll | head -n1
nm: passthrough_ll: no symbols



> 
>>
>> The commit shifted these members in struct fuse_file_info {
>>
>> struct fuse_file_info {
>> ...
>> /** Can be filled by open/create, to allow parallel direct writes on 
>> this
>>  *  file */
>> unsigned int parallel_direct_writes : 1; --> introduced the shift
> 
> Not expected in flush/release, so can be heuristically interpreted as flush
> 
>>
>> /** Indicates a flush operation.  Set in flush operation, also
>> maybe set in highlevel lock operation and lowlevel release
>> operation. */
>> unsigned int flush : 1;
>>
> 
> Not expected in open/create, so can be heuristically interpreted as 
> nonseekable
> 
>> /** Can be filled in by open, to indicate that the file is not
>> seekable. */
>> unsigned int nonseekable : 1;
>>
> 
> Not expected in release, so can be heuristically interpreted as flock_release
> 
>> /* Indicates that flock locks for this file should be
>>released.  If set, lock_owner shall contain a valid value.
>>May only be set in ->release(). */
>> unsigned int flock_release : 1;
>>
> 
> Not expected in opendir, so can be heuristically interpreted as cache_readdir
> 
>> /** Can be filled in by opendir. It signals the kernel to
>> enable caching of entries returned by readdir().  Has no
>> effect when set in other contexts (in particular it does
>> nothing when set by open()). */
>> unsigned int cache_readdir : 1;
>>
> 
> Ignored in open, but based on the comment above, it may be
> implied that some fs may set it in open() reply
> 
>> /** Can be filled in by open, to indicate that flush is not needed
>> on close. */
>> unsigned int noflush : 1;
> 
> noflush is just an optimization, which the kernel ignores anyway
> when writeback cache is enabled, so no harm done if it is wrongly
> interpreted as readdir_cache in open() and ignored.
> It is also quite recent (3.11) so not very likely used.
> 
>> };
>>
>> I.e. setting flush would actually set parallel_direct_writes
>> (note: with binaries compiled against older libfuse versions)
>>
> 
> It would be suspicious if fs sets parallel_direct_writes flush at
> flush() or release(), so that can be used as a hint of old ABI
> interpreted as flush and issue a warning.
> 
> It would be pretty ugly to use these heuristics in the library forever,
> but perhaps as safety measure for binaries built with a range of
> libfuse version that is not likely to be found in the wild (3.14..3.16)
> it is an acceptable compromise?

I really like your idea about heuristics, I just don't know how to limit
the libfuse version range.

> 
> and perhaps the next libfuse version can pass the header version
> fs was compiled with as argument to fuse_loop_mt_317()?


I think adding it to fuse_loop_mt_317 would be easy and could be
auto-added 

Re: Libfuse interoperability/ABI broken.

2024-03-11 Thread Andrea Bolognani
On Thu, Mar 07, 2024 at 07:47:23PM +0100, Bernd Schubert wrote:
> Hi all,
> 
> this is certainly not kind of the mail I was hoping for as a new libfuse
> maintainer.
> 
> As you can see from the title and from discussion below (sorry this is 
> not typical ML discussion style), we have a bit of of problem with 
> libfuse ABI compatibility. 

[...]

> >> On 3/7/24 16:43, Ashley Pittman wrote:
> >>> I’ve spotted an issue with the linked commit, the fuse_file_info struct 
> >>> should have been modified by adding new entries just before the padding, 
> >>> with this commit then members after the new entry will be moved creating 
> >>> a change in the ABI for members after this.
> >>>
> >>> https://github.com/libfuse/libfuse/commit/a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
> >>>
> >>> This affects the flush, nonseekable, flock_release, cache_readdir and 
> >>> noflush flags, each one of which could be set or cleared accidentally 
> >>> with one of the flags before or after it depending on what version of 
> >>> libfuse the application is compiled and linked with.
> >>>
> >>> This isn’t a failure mode that I’ve experience before when using linux so 
> >>> I don’t have a playbook to work from in how to correct this but 
> >>> essentially fuse3 releases up to and including 3.13 have one ABI and 3.13 
> >>> to 3.16 have a different ABI.

Not strictly related to the change at hand, but since we're
discussing recent-ish changes that negatively affected backwards
compatibility in libfuse I will mention this too:

  https://bugs.debian.org/1031802

It has been reported downstream a year ago, but I'm not sure if it
ever got upstream's attention. Now it should have :)

-- 
Andrea Bolognani 
Resistance is futile, you will be garbage collected.


signature.asc
Description: PGP signature


Re: Libfuse interoperability/ABI broken.

2024-03-08 Thread Amir Goldstein
On Thu, Mar 7, 2024 at 8:47 PM Bernd Schubert
 wrote:
>
> Hi all,
>
> this is certainly not kind of the mail I was hoping for as a new libfuse
> maintainer.
>
> As you can see from the title and from discussion below (sorry this is
> not typical ML discussion style), we have a bit of of problem with
> libfuse ABI compatibility.
>
> While scanning through git history, Ashley found a commit that was adding
> members in the middle of struct - doesn't break API, but binary
> compatibility. That commit already landed a year ago in these releases:
>
> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
> fuse-3.14.1
> fuse-3.15.0
> fuse-3.15.1
> fuse-3.16.1
> fuse-3.16.2
>
>
> Obviously this needs improved testing for, but right now we wonder what
> would be the preferred action to avoid issues.
>
> a) We could fix it and move bits to the right place. Fixes everything
> compiled with old versions, but breaks everything compiled with the
> versions above
>
> b) Increase the .so version - enforces recompilation of all binaries.
> Intrusive, especially for distributions, but probably safest.
>
> c) Other ideas?
>

Heuristically, you can detect most of the shifted flags at runtime
because...

>
>
> I don't think there is anything in libfuse that would allow us to
> detect which version of libfuse a library was linked to.
>

I think we do know for sure if fs was linked with libfuse < 3.12
without fuse_loop_mt_312?
so only left to detect  3.12..3.14 vs. 3.14..3.16

>
> The commit shifted these members in struct fuse_file_info {
>
> struct fuse_file_info {
> ...
> /** Can be filled by open/create, to allow parallel direct writes on 
> this
>  *  file */
> unsigned int parallel_direct_writes : 1; --> introduced the shift

Not expected in flush/release, so can be heuristically interpreted as flush

>
> /** Indicates a flush operation.  Set in flush operation, also
> maybe set in highlevel lock operation and lowlevel release
> operation. */
> unsigned int flush : 1;
>

Not expected in open/create, so can be heuristically interpreted as nonseekable

> /** Can be filled in by open, to indicate that the file is not
> seekable. */
> unsigned int nonseekable : 1;
>

Not expected in release, so can be heuristically interpreted as flock_release

> /* Indicates that flock locks for this file should be
>released.  If set, lock_owner shall contain a valid value.
>May only be set in ->release(). */
> unsigned int flock_release : 1;
>

Not expected in opendir, so can be heuristically interpreted as cache_readdir

> /** Can be filled in by opendir. It signals the kernel to
> enable caching of entries returned by readdir().  Has no
> effect when set in other contexts (in particular it does
> nothing when set by open()). */
> unsigned int cache_readdir : 1;
>

Ignored in open, but based on the comment above, it may be
implied that some fs may set it in open() reply

> /** Can be filled in by open, to indicate that flush is not needed
> on close. */
> unsigned int noflush : 1;

noflush is just an optimization, which the kernel ignores anyway
when writeback cache is enabled, so no harm done if it is wrongly
interpreted as readdir_cache in open() and ignored.
It is also quite recent (3.11) so not very likely used.

> };
>
> I.e. setting flush would actually set parallel_direct_writes
> (note: with binaries compiled against older libfuse versions)
>

It would be suspicious if fs sets parallel_direct_writes flush at
flush() or release(), so that can be used as a hint of old ABI
interpreted as flush and issue a warning.

It would be pretty ugly to use these heuristics in the library forever,
but perhaps as safety measure for binaries built with a range of
libfuse version that is not likely to be found in the wild (3.14..3.16)
it is an acceptable compromise?

and perhaps the next libfuse version can pass the header version
fs was compiled with as argument to fuse_loop_mt_317()?

Thanks,
Amir.



Re: Libfuse interoperability/ABI broken.

2024-03-08 Thread Wookey
On 2024-03-07 19:47 +0100, Bernd Schubert wrote:
> Hi all,
> 
> this is certainly not kind of the mail I was hoping for as a new libfuse
> maintainer.
> 
> As you can see from the title and from discussion below (sorry this is 
> not typical ML discussion style), we have a bit of of problem with 
> libfuse ABI compatibility. 

Oops. 

> While scanning through git history, Ashley found a commit that was adding
> members in the middle of struct - doesn't break API, but binary
> compatibility. That commit already landed a year ago in these releases:
> 
> git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
> fuse-3.14.1
> fuse-3.15.0
> fuse-3.15.1
> fuse-3.16.1
> fuse-3.16.2

> > Perhaps we should bring in the distro package maintainers here for their 
> > guidance/input?  Like I say I’ve not had experience of this type of issue 
> > before but I’m sure the distros will have.

Looking at
https://tracker.debian.org/pkg/fuse
and
https://tracker.debian.org/pkg/fuse3

It looks like Debian (and thus most debian-derived distros) has not
yet moved past 3.14.0, so we appear to have narrowly avoided building anything 
using the broken ABI.
So a revert to the old ABI in 3.17 along with advice not to use 3.14.1-3.16.x 
would work fine for us.

I have no particular expertise in libfuse, so I don't know if somehow
the offending commit has been included depsite the version
numbers. https://sources.debian.org/src/fuse3/3.14.0-5/debian/patches/
suggests not. Hopefully the maintainer will weigh in on this.

So I don't think we care whether you revert or do a soname bump -
that's something you need to decide based on the fallout elsewhere in
the ecosystem. I have no idea how many people/groups/distros/softwares
are following fuse development more closely and thus have already got
'wrong' binaries.

A soname bump is probably your best bet, but if no-one even noticed
this for a year then maybe it's not that bad and just reverting will
suffice? (And send everyone on a crash course about ABIs and struct padding!)

Wookey
-- 
Principal hats:  Debian, Wookware, ARM
http://wookware.org/


signature.asc
Description: PGP signature


Re: Libfuse interoperability/ABI broken.

2024-03-07 Thread Bernd Schubert
Hi all,

this is certainly not kind of the mail I was hoping for as a new libfuse
maintainer.

As you can see from the title and from discussion below (sorry this is 
not typical ML discussion style), we have a bit of of problem with 
libfuse ABI compatibility. 

While scanning through git history, Ashley found a commit that was adding
members in the middle of struct - doesn't break API, but binary
compatibility. That commit already landed a year ago in these releases:

git tag --contains a5eb7f2a0117ab43119ef5724cf5f4f2f181804a
fuse-3.14.1
fuse-3.15.0
fuse-3.15.1
fuse-3.16.1
fuse-3.16.2


Obviously this needs improved testing for, but right now we wonder what
would be the preferred action to avoid issues.

a) We could fix it and move bits to the right place. Fixes everything
compiled with old versions, but breaks everything compiled with the
versions above

b) Increase the .so version - enforces recompilation of all binaries.
Intrusive, especially for distributions, but probably safest.

c) Other ideas?



I don't think there is anything in libfuse that would allow us to
detect which version of libfuse a library was linked to.


The commit shifted these members in struct fuse_file_info {

struct fuse_file_info {
...
/** Can be filled by open/create, to allow parallel direct writes on 
this
 *  file */
unsigned int parallel_direct_writes : 1; --> introduced the shift

/** Indicates a flush operation.  Set in flush operation, also
maybe set in highlevel lock operation and lowlevel release
operation. */
unsigned int flush : 1;

/** Can be filled in by open, to indicate that the file is not
seekable. */
unsigned int nonseekable : 1;

/* Indicates that flock locks for this file should be
   released.  If set, lock_owner shall contain a valid value.
   May only be set in ->release(). */
unsigned int flock_release : 1;

/** Can be filled in by opendir. It signals the kernel to
enable caching of entries returned by readdir().  Has no
effect when set in other contexts (in particular it does
nothing when set by open()). */
unsigned int cache_readdir : 1;

/** Can be filled in by open, to indicate that flush is not needed
on close. */
unsigned int noflush : 1;
};

I.e. setting flush would actually set parallel_direct_writes
(note: with binaries compiled against older libfuse versions)


For the high level API it is probably less critical, in struct fuse_config
these fields are shifted:

struct fuse_config {
...
/**
 *  Allow parallel direct-io writes to operate on the same file.
 *
 *  FUSE implementations which do not handle parallel writes on
 *  same file/region should NOT enable this option at all as it
 *  might lead to data inconsistencies.
 *
 *  For the FUSE implementations which have their own mechanism
 *  of cache/data integrity are beneficiaries of this setting as
 *  it now open doors to parallel writes on the same file (without
 *  enabling this setting, all direct writes on the same file are
 *  serialized, resulting in huge data bandwidth loss).
 */
int parallel_direct_writes;

/**
 * The remaining options are used by libfuse internally and
 * should not be touched.
 */
int show_help;
char *modules;
int debug;
};


I don't think there is a security concern, but probably more a
data safety issue. So I included open mailing lists.


Thanks,
Bernd


On 3/7/24 19:02, Ashley Pittman wrote:
> 
> Simply bumping the .so number and forcing a rebuild would certainly work.  It 
> would probably be the safest option but also put the highest cost on users, 
> although I think for this kind of bug then a manual step of verifying the 
> versions are correct is needed so forcing a build failure isn’t a bad option. 
>  It would massively increase the visibility if this if nothing else.
> 
> Perhaps we should bring in the distro package maintainers here for their 
> guidance/input?  Like I say I’ve not had experience of this type of issue 
> before but I’m sure the distros will have.
> 
> Ashley.
> 
>> On 7 Mar 2024, at 16:05, Bernd Schubert  wrote:
>>
>> Hi Ashley,
>>
>> thanks for spotting and embarrassing as I had been involved in
>> development that feature.
>>
>> Hard to decide if revert it right - issue is, if we revert, everything
>> linked with the new version would broken as well. And it is now already
>> a year...
>>
>> I think we can only increase the .so version and send out a warning
>> (mailing list, distributions).
>>
>> In order to avoid it in the future, we can probably add some compile
>> time asserts about struct member positions.
>>
>>
>> Bernd
>>
>> On 3/7/24 16:43, Ashley Pittman wrote:
>>>
>>> I’ve spotted an issue with the