Re: [RFC] should read(2) update the position if it returns an error?

2013-06-18 Thread Al Viro
On Fri, Jun 14, 2013 at 08:38:19AM -0700, Linus Torvalds wrote:
> On Fri, Jun 14, 2013 at 1:05 AM, Al Viro  wrote:
> >
> > Comments?  I'd obviously prefer to solve it that way (i.e. leave
> > ->f_pos untouched if vfs_read() returns an error), but I might be missing
> > some case where we want position updated even though read() returns an
> > error.  I can't come up with one, but then I hadn't RTFS through every
> > ->read() instance in drivers in search of weird cases like that - we've
> > too many instances ;-/
> 
> Not updating f_pos on errors sounds like the right thing to do to me,
> and if it also ends up fixing some nasty issues with hpfs and
> potentially other cases, I'd say "go for it".
> 
> Not for 3.10, though. It's not like this is a new - or acute - problem.

Grr...  I've just crawled through ->llseek() instances and I really wonder
what to do with that crap.  Debugfs idiocies had been obvious, but there
are much stranger turds floating there.

Some random examples:
drivers/sbus/char/flash.c:
SEEK_CUR beyond the EOF silently trims position to EOF
SEEK_SET to the same position just sets it
SEEK_END sets position to EOF, period - offset is simply ignored

cris eeprom:
SEEK_END inverts the sign of offset
any seek to negative returns EOVERFLOW (instead of usual EINVAL)
*and* sets position to 0
any seek to or beyond the EOF (there's an off-by-one) also
returns EOVERFLOW *and* sets position to EOF - 1.  Yes, that includes
lseek(fd, SEEK_END, 0)

carma-fpga-program.c:
SEEK_END inverts sign of offset
seeks past EOF give EINVAL
seeks to negative offsets pass on 32bit and give EINVAL on 64bit

drivers/s390/char/vmur.c:
fails when offset is not a multiple of 4Kb; the trouble
being, read() can bloody well leave you with position not being such
a multiple, so that check buys us nothing - SEEK_CUR for 12Kb might
be an equivalent of SEEK_SET to 13Kb; the former is allowed, the
latter isn't.

drivers/staging/vme/devices/vme_user.c:
apparent off-by-one - SEEK_END:-1 succeeds, SEEK_END:0 gives
EINVAL.

Then there's a bunch of drivers that either don't allow SEEK_END while
having an upper limit to seekable positions acting as EOF would or
allow seeks past what SEEK_END:0 yields.

All that mess is a user-visible ABI and at least for the weird eeprom
devices I would be very cautious with fixing it - the userland code
dealing with those animals is rather obscure, can easily turn out
to rely on the idiotic behaviour and brick the boxen if things go
wrong.

I'm switching everything I can to use of generic_file_llseek_size();
quite a few can be switched, often along with dropping the locking
that served only to serialize lseek vs. lseek.  The rest is going to
be unpleasant.

There's another pile of fun around f_pos manipulations - try to run
ecryptfs on top of tmpfs or ceph and do seeks on directories.
ecryptfs ->readdir() assigns ->f_pos of underlying file and calls
underlying ->readdir(); the trouble is, underlying fs has things it
wants done on ->f_pos changes (cursor needs to be moved, etc.).
ecryptfs on top of hpfs is also a lot of fun, not that anybody had been
using that combination.

Sigh...
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] should read(2) update the position if it returns an error?

2013-06-18 Thread Al Viro
On Fri, Jun 14, 2013 at 08:38:19AM -0700, Linus Torvalds wrote:
 On Fri, Jun 14, 2013 at 1:05 AM, Al Viro v...@zeniv.linux.org.uk wrote:
 
  Comments?  I'd obviously prefer to solve it that way (i.e. leave
  -f_pos untouched if vfs_read() returns an error), but I might be missing
  some case where we want position updated even though read() returns an
  error.  I can't come up with one, but then I hadn't RTFS through every
  -read() instance in drivers in search of weird cases like that - we've
  too many instances ;-/
 
 Not updating f_pos on errors sounds like the right thing to do to me,
 and if it also ends up fixing some nasty issues with hpfs and
 potentially other cases, I'd say go for it.
 
 Not for 3.10, though. It's not like this is a new - or acute - problem.

Grr...  I've just crawled through -llseek() instances and I really wonder
what to do with that crap.  Debugfs idiocies had been obvious, but there
are much stranger turds floating there.

Some random examples:
drivers/sbus/char/flash.c:
SEEK_CUR beyond the EOF silently trims position to EOF
SEEK_SET to the same position just sets it
SEEK_END sets position to EOF, period - offset is simply ignored

cris eeprom:
SEEK_END inverts the sign of offset
any seek to negative returns EOVERFLOW (instead of usual EINVAL)
*and* sets position to 0
any seek to or beyond the EOF (there's an off-by-one) also
returns EOVERFLOW *and* sets position to EOF - 1.  Yes, that includes
lseek(fd, SEEK_END, 0)

carma-fpga-program.c:
SEEK_END inverts sign of offset
seeks past EOF give EINVAL
seeks to negative offsets pass on 32bit and give EINVAL on 64bit

drivers/s390/char/vmur.c:
fails when offset is not a multiple of 4Kb; the trouble
being, read() can bloody well leave you with position not being such
a multiple, so that check buys us nothing - SEEK_CUR for 12Kb might
be an equivalent of SEEK_SET to 13Kb; the former is allowed, the
latter isn't.

drivers/staging/vme/devices/vme_user.c:
apparent off-by-one - SEEK_END:-1 succeeds, SEEK_END:0 gives
EINVAL.

Then there's a bunch of drivers that either don't allow SEEK_END while
having an upper limit to seekable positions acting as EOF would or
allow seeks past what SEEK_END:0 yields.

All that mess is a user-visible ABI and at least for the weird eeprom
devices I would be very cautious with fixing it - the userland code
dealing with those animals is rather obscure, can easily turn out
to rely on the idiotic behaviour and brick the boxen if things go
wrong.

I'm switching everything I can to use of generic_file_llseek_size();
quite a few can be switched, often along with dropping the locking
that served only to serialize lseek vs. lseek.  The rest is going to
be unpleasant.

There's another pile of fun around f_pos manipulations - try to run
ecryptfs on top of tmpfs or ceph and do seeks on directories.
ecryptfs -readdir() assigns -f_pos of underlying file and calls
underlying -readdir(); the trouble is, underlying fs has things it
wants done on -f_pos changes (cursor needs to be moved, etc.).
ecryptfs on top of hpfs is also a lot of fun, not that anybody had been
using that combination.

Sigh...
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] should read(2) update the position if it returns an error?

2013-06-14 Thread Linus Torvalds
On Fri, Jun 14, 2013 at 1:05 AM, Al Viro  wrote:
>
> Comments?  I'd obviously prefer to solve it that way (i.e. leave
> ->f_pos untouched if vfs_read() returns an error), but I might be missing
> some case where we want position updated even though read() returns an
> error.  I can't come up with one, but then I hadn't RTFS through every
> ->read() instance in drivers in search of weird cases like that - we've
> too many instances ;-/

Not updating f_pos on errors sounds like the right thing to do to me,
and if it also ends up fixing some nasty issues with hpfs and
potentially other cases, I'd say "go for it".

Not for 3.10, though. It's not like this is a new - or acute - problem.

 Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC] should read(2) update the position if it returns an error?

2013-06-14 Thread Al Viro
We have an unpleasant HPFS (at least) race with read(2) vs. unlink(2)
The thing is, HPFS and several other filesystems keep track of all opened
struct file for directory and update the position in it upon directory
modifications.  For HPFS it's particulary painful, since it encodes the
location of directory block (times 64 + number of entry in block) as ->f_pos.
lseek() validates the offset under i_mutex, directory modifications update
->f_pos of all files (again, under i_mutex) and readdir does all accesses
under i_mutex.  The trouble with that scheme is sys_read().  There we
save the current position, pass the address of local copy to vfs_read()
and, once vfs_read() has returned an error, store the value in that local
copy back into ->f_pos.  The value is unmodified, of course, and everything
would be fine if not for the following problem: directory modification
done in parallel with that has no idea of that local copy and does *not*
update it.  It does update ->f_pos, but that update gets reverted as soon
as vfs_read() returns and sys_read() does file_pos_write().

It's not just HPFS - HFS, HFS+ and sysfs have a similar (but probably
milder) issue.  For HPFS it's really nasty - we might end up with subsequent
readdir() reading from a completely unrelated directory ;-/

We obviously don't want to grab i_mutex in sys_read(), but I really
wonder if we should do file_pos_write() there in case of vfs_read() returning
an error.  Most of the ->read() instances leave *pos unchanged if they
decide to return an error - after all, if we'd consumed some data, the right
thing is short read, with whatever errors happening on subsequent reads.
I'm not sure if it's mandated by POSIX, but it seems to be what the userland
would reasonably expect.  And that matches the usual logics with e.g.
interruptible waits catching signals in ->read() instances...

Comments?  I'd obviously prefer to solve it that way (i.e. leave
->f_pos untouched if vfs_read() returns an error), but I might be missing
some case where we want position updated even though read() returns an
error.  I can't come up with one, but then I hadn't RTFS through every
->read() instance in drivers in search of weird cases like that - we've
too many instances ;-/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC] should read(2) update the position if it returns an error?

2013-06-14 Thread Al Viro
We have an unpleasant HPFS (at least) race with read(2) vs. unlink(2)
The thing is, HPFS and several other filesystems keep track of all opened
struct file for directory and update the position in it upon directory
modifications.  For HPFS it's particulary painful, since it encodes the
location of directory block (times 64 + number of entry in block) as -f_pos.
lseek() validates the offset under i_mutex, directory modifications update
-f_pos of all files (again, under i_mutex) and readdir does all accesses
under i_mutex.  The trouble with that scheme is sys_read().  There we
save the current position, pass the address of local copy to vfs_read()
and, once vfs_read() has returned an error, store the value in that local
copy back into -f_pos.  The value is unmodified, of course, and everything
would be fine if not for the following problem: directory modification
done in parallel with that has no idea of that local copy and does *not*
update it.  It does update -f_pos, but that update gets reverted as soon
as vfs_read() returns and sys_read() does file_pos_write().

It's not just HPFS - HFS, HFS+ and sysfs have a similar (but probably
milder) issue.  For HPFS it's really nasty - we might end up with subsequent
readdir() reading from a completely unrelated directory ;-/

We obviously don't want to grab i_mutex in sys_read(), but I really
wonder if we should do file_pos_write() there in case of vfs_read() returning
an error.  Most of the -read() instances leave *pos unchanged if they
decide to return an error - after all, if we'd consumed some data, the right
thing is short read, with whatever errors happening on subsequent reads.
I'm not sure if it's mandated by POSIX, but it seems to be what the userland
would reasonably expect.  And that matches the usual logics with e.g.
interruptible waits catching signals in -read() instances...

Comments?  I'd obviously prefer to solve it that way (i.e. leave
-f_pos untouched if vfs_read() returns an error), but I might be missing
some case where we want position updated even though read() returns an
error.  I can't come up with one, but then I hadn't RTFS through every
-read() instance in drivers in search of weird cases like that - we've
too many instances ;-/
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] should read(2) update the position if it returns an error?

2013-06-14 Thread Linus Torvalds
On Fri, Jun 14, 2013 at 1:05 AM, Al Viro v...@zeniv.linux.org.uk wrote:

 Comments?  I'd obviously prefer to solve it that way (i.e. leave
 -f_pos untouched if vfs_read() returns an error), but I might be missing
 some case where we want position updated even though read() returns an
 error.  I can't come up with one, but then I hadn't RTFS through every
 -read() instance in drivers in search of weird cases like that - we've
 too many instances ;-/

Not updating f_pos on errors sounds like the right thing to do to me,
and if it also ends up fixing some nasty issues with hpfs and
potentially other cases, I'd say go for it.

Not for 3.10, though. It's not like this is a new - or acute - problem.

 Linus
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/