Re: Splicing to/from a tty

2021-01-26 Thread Al Viro
On Tue, Jan 26, 2021 at 10:49:11AM -0800, Linus Torvalds wrote:
> On Mon, Jan 25, 2021 at 10:07 PM Al Viro  wrote:
> >
> > On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote:
> > >
> > > I agree that it would be better fixed by just having sendfile()
> > > basically turn into splice() for the pipe target case, but I haven't
> > > seen any patches from you, so I assume it wasn't 100% trivial.
> >
> > Just to make clear - sendfile() regular-to-pipe is *not* the same
> > issue as splice to/from tty.
> 
> That's not what I meant.
> 
> sendfile() to a pipe is basically the same thing as splice() to a pipe.
> 
> Except I think it might have different looping behavior. And as you
> noted earlier, the error returns may be randomly different.

Sure.  What I'm saying is that there'd been two different regressions
discussed in that thread, and the one dealt with by this patch series
is not the one brought up in the original posting.  IOW, it's an
alternative not to "let's add ->splice_{read,write}() for tty" stuff
but to "let's add ->splice_write() for pipes" one-liner.

I do not doubt that you know that.  But the thread had been confusing
enough (hell, the sendfile-related subthread has kept the original
subject explicitly refering to tty), so I wanted to avoid any confusion
for somebody looking through it a year or two down the road.


Re: Splicing to/from a tty

2021-01-26 Thread Linus Torvalds
On Mon, Jan 25, 2021 at 10:07 PM Al Viro  wrote:
>
> On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote:
> >
> > I agree that it would be better fixed by just having sendfile()
> > basically turn into splice() for the pipe target case, but I haven't
> > seen any patches from you, so I assume it wasn't 100% trivial.
>
> Just to make clear - sendfile() regular-to-pipe is *not* the same
> issue as splice to/from tty.

That's not what I meant.

sendfile() to a pipe is basically the same thing as splice() to a pipe.

Except I think it might have different looping behavior. And as you
noted earlier, the error returns may be randomly different.

 Linus


Re: Splicing to/from a tty

2021-01-26 Thread Al Viro
On Sun, Jan 24, 2021 at 11:11:42AM -0800, Linus Torvalds wrote:
> Al,
>  coming back to this because rc5 is imminent..
> 
> On Mon, Jan 18, 2021 at 11:45 AM Al Viro  wrote:
> >
> > do_splice_direct() does something that do_splice() won't - it
> > handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> > normally used, of course.
> >
> > I'll look into that in more details, but IMO bothering with
> > internal pipe is just plain wrong for those cases.
> 
> You clearly thought about this, with the emails about odd error cases,
> but I get the feeling that for fixing the current "you can't
> sendfile() to a pipe" regression (including stable) we should do the
> one-liner. No?
> 
> I agree that it would be better fixed by just having sendfile()
> basically turn into splice() for the pipe target case, but I haven't
> seen any patches from you, so I assume it wasn't 100% trivial.
> 
> Hmm?

Just to make clear - sendfile() regular-to-pipe is *not* the same
issue as splice to/from tty.  The latter needs ->splice_read()
and ->splice_write() in tty_fops; the former can be dealt with
by teaching do_sendifile() to use the guts of do_splice() in
case when in or out happens to be a pipe (with some rearrangement
of checks) instead of bothering with do_splice_direct().

The only commonality between these two is that both got broken
by the same patch series.  Johannes' patch is an attempt to
deal with regular-to-pipe sendfile(2), and it's not a good
way to handle that.  Neither it, nor the variant I proposed
would do a damn thing for tty (and sendfile(2) never worked
for source other than regular or block anyway).  FWIW, the real
check in do_splice_direct() should be "has FMODE_PREAD", regardless
of the position we are asking to read from - do_splice_direct()
is basically
while left to copy
splice_read from in to internal pipe
splice_write from internal pipe to out
and if splice_write ends up with short copy, we advance the position
by the amount written and discard everything left in the internal pipe.
Using it for something non-seekable would end up with data silently
lost on short copy.

Note that decision against splice(2) between non-pipes appears
to have been deliberate and unless Jens (and mingo?) decide
they are OK with that change, I'm *not* adding that functionality
to do_splice().

Anyway, the series is in vfs.git #work.sendfile, 5.11-rc1-based

Shortlog:
Al Viro (3):
  do_splice_to(): move the logics for limiting the read length in
  take the guts of file-to-pipe splice into a helper function
  teach sendfile(2) to handle send-to-pipe directly

Diffstat:
 fs/internal.h   |  9 +
 fs/read_write.c | 19 +--
 fs/splice.c | 42 +++---
 3 files changed, 45 insertions(+), 25 deletions(-)

Individual patches in followups; first two are equivalent transformations
(fairly obvious cleanup and taking a part of do_splice() into a helper),
while the third one makes do_sendfile() use that new helper for file-to-pipe
case.


Re: Splicing to/from a tty

2021-01-24 Thread Linus Torvalds
Al,
 coming back to this because rc5 is imminent..

On Mon, Jan 18, 2021 at 11:45 AM Al Viro  wrote:
>
> do_splice_direct() does something that do_splice() won't - it
> handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> normally used, of course.
>
> I'll look into that in more details, but IMO bothering with
> internal pipe is just plain wrong for those cases.

You clearly thought about this, with the emails about odd error cases,
but I get the feeling that for fixing the current "you can't
sendfile() to a pipe" regression (including stable) we should do the
one-liner. No?

I agree that it would be better fixed by just having sendfile()
basically turn into splice() for the pipe target case, but I haven't
seen any patches from you, so I assume it wasn't 100% trivial.

Hmm?

  Linus


Re: Splicing to/from a tty

2021-01-21 Thread Linus Torvalds
On Thu, Jan 21, 2021 at 9:03 AM Robert Karszniewicz
 wrote:
>
> I confirm that the 4 patches, as well as the 4+2 patches, fix the regression I
> noticed with cat failing on sendfile() to ttymxc0.

Thanks. Now we just need to find somebody with HDLC use cases and
we'll actually have this series properly tested...

  Linus


Re: tty splice branch (was "Re: Splicing to/from a tty")

2021-01-21 Thread Linus Torvalds
On Thu, Jan 21, 2021 at 12:58 AM Jiri Slaby  wrote:
>
> Which is weird as you Cced me. Let me check what is wrong with my e-mail
> setup.

Yeah, I think you were cc'd on not just the patches, but on all the
discussion that preceded them.

But better late than never. I was actually much more nervous about the
much more subtle "cookie continuation" stuff, not the trivial
conversion patches.

Which just goes to show that most bugs are when you don't really think
about it because it's "trivial".

Linus


Re: Splicing to/from a tty

2021-01-21 Thread Robert Karszniewicz
On 1/20/21 5:44 AM, Linus Torvalds wrote:
> On Tue, Jan 19, 2021 at 5:29 PM Oliver Giles  wrote:
>>
>> Writing this from a kernel with those patches in; happily splice()ing
>> to and from a pty.
> 
> Ok, good.
> 
> I have a couple of improvement patches on top of those, that I'm attaching 
> here.
> 
> [...]
> 
> But in the meantime, here's two more patches to try on top of the
> previous four. They shouldn't matter, other than making the non-icanon
> throughput a lot better. But the more coverage they get, the happier
> I'll be.

I confirm that the 4 patches, as well as the 4+2 patches, fix the regression I
noticed with cat failing on sendfile() to ttymxc0.

Thanks,
Robert


RE: Splicing to/from a tty

2021-01-21 Thread David Laight
From: Linus Torvalds
> Sent: 21 January 2021 01:04
> On Wed, Jan 20, 2021 at 4:38 PM Al Viro  wrote:
> >
> > OK...  I wonder how many debugfs writable files allow pwrite() with
> > BS results...
> 
> I hope some of them check for "pos == 0" when they start parsing integers.
> 
> But honestly, I don't think it's a big deal. We've had these things
> that just basically assume that whenever you write, the offset just
> doesn't matter at all, and as long as some number comes in one single
> write call, we accept it.
> 
> Because even if you end up doing something like just
> 
>echo $SOMETHING > /sys/xyz/abc
> 
> and that "$SOMETHING" could be done multiple writes, in practice it
> all works out just fine and it never really is. You almost have to try
> to screw up with something like
> 
>   (echo -n 3; echo -n 4) > /sys/xyz/abc
> 
> to actually see two writes of "3" and "4" instead of one write with
> "34". And honestly, if somebody does something like that, do we really
> care? They might get 3, they might get 4, and they might get 34. They
> get what they deserve.

Or worse:
echo abc >/sys/xyz/abc
echo x | dd bs=1 count=2 oseek=1 conv=notrunc of=/sys/xyz/abc
which (if I got the dd command right) would generate "axc" on a real file.

OTOH multiple short reads are quite likely.
Best not done on a counter...

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: tty splice branch (was "Re: Splicing to/from a tty")

2021-01-21 Thread Greg Kroah-Hartman
On Thu, Jan 21, 2021 at 09:50:39AM +0100, Jiri Slaby wrote:
> On 21. 01. 21, 2:18, Linus Torvalds wrote:
> > On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
> >  wrote:
> > > 
> > > I'll come back to this tomorrow and do the line-buffered icanon case
> > > too (unless pull requests pile up), and then I'll be happy with the
> > > tty changes, and I think I can submit this series for real to Greg.
> > 
> > Greg, I don't know how you want to handle this.
> > 
> > I have a branch with my tty splice patches at
> > 
> >  git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> > tty-splice
> > 
> > and that now includes doing that "cookie continuation" thing even for
> > the N_TTY icanon modes.
> > 
> > It passes my local tests, and I did try a few rather odd things. And
> > Oliver tested an ealier version without that final commit on his load.
> > But...
> 
> Hm, I would like to review this first. I noticed the changes only because a
> new branch appeared when I grabbed your tree and the branch has "tty" in its
> name.
> 
> So for example:
> 
> > @@ -1038,18 +1045,15 @@ static ssize_t tty_write(struct file *file, const 
> > char __user *buf,
> > if (tty->ops->write_room == NULL)
> > tty_err(tty, "missing write_room method\n");
> > ld = tty_ldisc_ref_wait(tty);
> > -   if (!ld)
> > -   return hung_up_tty_write(file, buf, count, ppos);
> > -   if (!ld->ops->write)
> > +   if (!ld || !ld->ops->write)
> > ret = -EIO;
> > else
> > -   ret = do_tty_write(ld->ops->write, tty, file, buf, count);
> > +   ret = do_tty_write(ld->ops->write, tty, file, from);
> > tty_ldisc_deref(ld);
> 
> if ld == NULL => crash here.
> 
> So can you send the patches to the list and Cc me too?

I'll send them out right now, I've already merged them to my branches.

greg k-h


Re: tty splice branch (was "Re: Splicing to/from a tty")

2021-01-21 Thread Jiri Slaby

On 21. 01. 21, 9:50, Jiri Slaby wrote:
Hm, I would like to review this first. I noticed the changes only 
because a new branch appeared when I grabbed your tree and the branch 
has "tty" in its name.


Which is weird as you Cced me. Let me check what is wrong with my e-mail 
setup.


thanks,
--
js


Re: tty splice branch (was "Re: Splicing to/from a tty")

2021-01-21 Thread Jiri Slaby

On 21. 01. 21, 2:18, Linus Torvalds wrote:

On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
 wrote:


I'll come back to this tomorrow and do the line-buffered icanon case
too (unless pull requests pile up), and then I'll be happy with the
tty changes, and I think I can submit this series for real to Greg.


Greg, I don't know how you want to handle this.

I have a branch with my tty splice patches at

 git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice

and that now includes doing that "cookie continuation" thing even for
the N_TTY icanon modes.

It passes my local tests, and I did try a few rather odd things. And
Oliver tested an ealier version without that final commit on his load.
But...


Hm, I would like to review this first. I noticed the changes only 
because a new branch appeared when I grabbed your tree and the branch 
has "tty" in its name.


So for example:


@@ -1038,18 +1045,15 @@ static ssize_t tty_write(struct file *file, const char 
__user *buf,
if (tty->ops->write_room == NULL)
tty_err(tty, "missing write_room method\n");
ld = tty_ldisc_ref_wait(tty);
-   if (!ld)
-   return hung_up_tty_write(file, buf, count, ppos);
-   if (!ld->ops->write)
+   if (!ld || !ld->ops->write)
ret = -EIO;
else
-   ret = do_tty_write(ld->ops->write, tty, file, buf, count);
+   ret = do_tty_write(ld->ops->write, tty, file, from);
tty_ldisc_deref(ld);


if ld == NULL => crash here.

So can you send the patches to the list and Cc me too?


return ret;
 }

thanks,
--
js


Re: tty splice branch (was "Re: Splicing to/from a tty")

2021-01-21 Thread Greg Kroah-Hartman
On Wed, Jan 20, 2021 at 05:18:36PM -0800, Linus Torvalds wrote:
> On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
>  wrote:
> >
> > I'll come back to this tomorrow and do the line-buffered icanon case
> > too (unless pull requests pile up), and then I'll be happy with the
> > tty changes, and I think I can submit this series for real to Greg.
> 
> Greg, I don't know how you want to handle this.
> 
> I have a branch with my tty splice patches at
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> tty-splice
> 
> and that now includes doing that "cookie continuation" thing even for
> the N_TTY icanon modes.
> 
> It passes my local tests, and I did try a few rather odd things. And
> Oliver tested an ealier version without that final commit on his load.
> But...
> 
> That tty splice thing is clearly a regression, but it's not like we
> have seen a lot of reports of it, so it's clearly a very special
> thing.
> 
> End result: I'm leaving it to you to decide how you want to handle it.
> You can tell me to just merge it myself as a regression fix, despite
> it being fairly late in the 5.11 series.  Or you can pull it into your
> tty tree for linux-next and 5.12. And we can just plan to backport it
> (for 5.10 and 5.11) later when it has had more wide testing.
> 
> Another alternative is to do just that first patch immediately (the
> "tty: implement write_iter" one), because that one should be the
> simple case that gets sendfile() and splice() working when the
> _destination_ is a tty. The "source is a tty" is the much more complex
> case that the other patches deal with.

Let me do this last thing.  I've taken your one patch into my
"tty-linus" branch and will go beat on it for a day and then ask you to
pull it in for the next 5.11-rc release, and I've taken your full series
into my "tty-next" branch so it will get much wider testing in
linux-next for a few weeks.  If it turns out that we get reports of the
"splice/sendfile from a tty", we can always merge them into 5.11 and
5.10 as needed.

Thanks for doing this work,

greg k-h


Re: Splicing to/from a tty

2021-01-21 Thread Johannes Berg
On Thu, 2021-01-21 at 07:05 +0100, Willy Tarreau wrote:

> I think that most users of splice() on sockets got used to falling back
> to recv/send on splice failure due to various cases not being supported
> historically (UNIX family sockets immediately come to my mind but I seem
> to remember other combinations).

Note, however, that I got here because cgit, if using sendfile(), does
*not* fall back if it fails (and thus my git tree view is currently down
because I haven't downgraded the kernel so far). That may not be common
for splice() though.

johannes



Re: Splicing to/from a tty

2021-01-20 Thread Linus Torvalds
On Wed, Jan 20, 2021 at 3:14 PM Al Viro  wrote:
>
> Umm...  Why do we clear FMODE_PWRITE there [seq_open - ed], anyway?

I think it's pointless and historical, and comes from "several /proc
files supported the simple single-write model, nothing ever supported
moving around and writing".

The seq_file stuff was always about reading, and then the writing part
was generally random special-case hacks on the side.

So I think that "clear PWRITE" thing is to make sure we get sane error
cases if somebody tries something funny, knowing that none of the
hacky stuff support it.

And then the very special kernfs thing adds it back in, because it
does in fact allow seeking writes.

 Linus


Re: Splicing to/from a tty

2021-01-20 Thread Willy Tarreau
On Wed, Jan 20, 2021 at 07:38:38PM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 5:45 PM Al Viro  wrote:
> >
> > splice() triggers an error for seekable destination with O_APPEND and
> > with NULL off_out.
> 
> Ok, that's just broken.
> 
> > Same for splice() to socket with
> > fcntl(sock_fd, F_SETFL, O_APPEND);
> > done first.
> 
> Same.
> 
> As long as you don't pass a position pointer, I think both should just work.
> 
> Not that I imagine it matters for a lot of people..

I think that most users of splice() on sockets got used to falling back
to recv/send on splice failure due to various cases not being supported
historically (UNIX family sockets immediately come to my mind but I seem
to remember other combinations). Thus I guess that most users of splice()
detect that it doesn't work either due to lower than expected performance
or while running strace.

Willy


Re: Splicing to/from a tty

2021-01-20 Thread Linus Torvalds
On Wed, Jan 20, 2021 at 5:45 PM Al Viro  wrote:
>
> splice() triggers an error for seekable destination with O_APPEND and
> with NULL off_out.

Ok, that's just broken.

> Same for splice() to socket with
> fcntl(sock_fd, F_SETFL, O_APPEND);
> done first.

Same.

As long as you don't pass a position pointer, I think both should just work.

Not that I imagine it matters for a lot of people..

  Linus

   Linus


Re: Splicing to/from a tty

2021-01-20 Thread Al Viro
On Wed, Jan 20, 2021 at 05:04:17PM -0800, Linus Torvalds wrote:

> The whole point of O_APPEND is that the position shouldn't matter.
> 
> And the whole point of "pwrite()" is that you specify a position.
> 
> So the two just do not go together - although we may have legacy
> issues, of course.

Our pwrite(2):
BUGS
   POSIX  requires  that  opening  a  file  with the O_APPEND flag
   should have no effect on the location at which pwrite() writes
   data.  However, on Linux, if a file is opened with O_APPEND,
   pwrite() appends data to the  end of the file, regardless of
   the value of offset.
POSIX pwrite(2):
The pwrite() function shall be equivalent to write(), except that
it writes into a given position and does not change the file offset
(regardless of whether O_APPEND is set).  The first three arguments
to pwrite() are the same as write() with the addition of a fourth
argument offset for the desired position inside the file.  An attempt
to perform a pwrite() on a file that is incapable of seeking shall
result in an error.

I don't believe that we could change behaviour of our pwrite(2) without
breaking userland, even if we wanted to.  It's been that way since
2.1.60 when pwrite() had been first introduced - 23 years ago is more
than enough to have it cast in stone.  We do allow pwrite(2) with O_APPEND
and on such descriptors it acts like write(2) on the same.

> Now, splice() is able to do *both* write() and pwrite(), because
> unlike pwrite() it doesn't take a "pos" argument, it takes a _pointer_
> to pos. So with a NULL pointer, it's like read/write, and with a
> non-NULL pointer it is like pread/pwrite.
> 
> So I do think that "splice with non-NULL off_out and O_APPEND" should
> cause an error in general.

splice() triggers an error for seekable destination with O_APPEND and
with NULL off_out.  Same for splice() to socket with
fcntl(sock_fd, F_SETFL, O_APPEND);
done first.
 
> Honestly, I don't think it's a huge deal. O_APPEND isn't that
> interesting, but I do hope that if we allow O_APPEND and a file
> position, then O_APPEND always overrides it.

It does, when it is allowed.


tty splice branch (was "Re: Splicing to/from a tty")

2021-01-20 Thread Linus Torvalds
On Tue, Jan 19, 2021 at 8:44 PM Linus Torvalds
 wrote:
>
> I'll come back to this tomorrow and do the line-buffered icanon case
> too (unless pull requests pile up), and then I'll be happy with the
> tty changes, and I think I can submit this series for real to Greg.

Greg, I don't know how you want to handle this.

I have a branch with my tty splice patches at

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git tty-splice

and that now includes doing that "cookie continuation" thing even for
the N_TTY icanon modes.

It passes my local tests, and I did try a few rather odd things. And
Oliver tested an ealier version without that final commit on his load.
But...

That tty splice thing is clearly a regression, but it's not like we
have seen a lot of reports of it, so it's clearly a very special
thing.

End result: I'm leaving it to you to decide how you want to handle it.
You can tell me to just merge it myself as a regression fix, despite
it being fairly late in the 5.11 series.  Or you can pull it into your
tty tree for linux-next and 5.12. And we can just plan to backport it
(for 5.10 and 5.11) later when it has had more wide testing.

Another alternative is to do just that first patch immediately (the
"tty: implement write_iter" one), because that one should be the
simple case that gets sendfile() and splice() working when the
_destination_ is a tty. The "source is a tty" is the much more complex
case that the other patches deal with.

 Linus


Re: Splicing to/from a tty

2021-01-20 Thread Linus Torvalds
On Wed, Jan 20, 2021 at 4:38 PM Al Viro  wrote:
>
> OK...  I wonder how many debugfs writable files allow pwrite() with
> BS results...

I hope some of them check for "pos == 0" when they start parsing integers.

But honestly, I don't think it's a big deal. We've had these things
that just basically assume that whenever you write, the offset just
doesn't matter at all, and as long as some number comes in one single
write call, we accept it.

Because even if you end up doing something like just

   echo $SOMETHING > /sys/xyz/abc

and that "$SOMETHING" could be done multiple writes, in practice it
all works out just fine and it never really is. You almost have to try
to screw up with something like

  (echo -n 3; echo -n 4) > /sys/xyz/abc

to actually see two writes of "3" and "4" instead of one write with
"34". And honestly, if somebody does something like that, do we really
care? They might get 3, they might get 4, and they might get 34. They
get what they deserve.

> Anyway, possibly more interesting question is why do we care about
> O_APPEND at all - why not treat it the same way we do in write()?

The whole point of O_APPEND is that the position shouldn't matter.

And the whole point of "pwrite()" is that you specify a position.

So the two just do not go together - although we may have legacy
issues, of course.

In contrast, the whole point of just a plain "write()" is that the
position is the "current file position", with O_APPEND is just a
special rule for what the current position for a write is.

Now, splice() is able to do *both* write() and pwrite(), because
unlike pwrite() it doesn't take a "pos" argument, it takes a _pointer_
to pos. So with a NULL pointer, it's like read/write, and with a
non-NULL pointer it is like pread/pwrite.

So I do think that "splice with non-NULL off_out and O_APPEND" should
cause an error in general.

That said, we probabyl have legacy behavior with splice and pipes in
particular, and that legacy behavior would override any "this is
conceptually the sane model".

> So... why do we ban O_APPEND on destination for splice() or for sendfile()?

sendfile() shouldn't be an issue. The offset pointer for sendfile is
for the _source_, not the destination.

For splice(), I do think that O_APPEND and a position pointer don't
make sense as a combination, although if we do allow it for regular
file pwrite() (I didn't check), maybe we could allow it for splice()
too just to be erqually inconsistent.

Honestly, I don't think it's a huge deal. O_APPEND isn't that
interesting, but I do hope that if we allow O_APPEND and a file
position, then O_APPEND always overrides it.

   Linus


Re: Splicing to/from a tty

2021-01-20 Thread Al Viro
On Wed, Jan 20, 2021 at 03:40:29PM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 3:14 PM Al Viro  wrote:
> >
> > Umm...  Why do we clear FMODE_PWRITE there [seq_open - ed], anyway?
> 
> I think it's pointless and historical, and comes from "several /proc
> files supported the simple single-write model, nothing ever supported
> moving around and writing".
> 
> The seq_file stuff was always about reading, and then the writing part
> was generally random special-case hacks on the side.
> 
> So I think that "clear PWRITE" thing is to make sure we get sane error
> cases if somebody tries something funny, knowing that none of the
> hacky stuff support it.
> 
> And then the very special kernfs thing adds it back in, because it
> does in fact allow seeking writes.

OK...  I wonder how many debugfs writable files allow pwrite() with
BS results...

Anyway, possibly more interesting question is why do we care about
O_APPEND at all - why not treat it the same way we do in write()?
Hell, even our pwrite() just goes ahead and writes to the end of
file, whatever position it had been given.  Yes, for pwrite(2) that's
contrary to POSIX, but it's probably cast in stone by that point
anyway...

Looking through the instances of ->splice_write(), iter_file_splice_write()
will end up appending the data to EOF and so will gfs2_file_splice_write().
For sockets (generic_splice_sendpage()) we definitely don't give a toss
about O_APPEND (F_SETFL can set it, so that case is possible to hit),
ditto for splice_write_null() and port_fops_splice_write().  Which leaves
only one instance: fuse_dev_splice_write(), which also should ignore
O_APPEND (IMO fuse_dev_open() ought to call nonseekable_open() anyway).

So... why do we ban O_APPEND on destination for splice() or for sendfile()?
AFAICS, if we simply remove that test, we'll end up with write going to
the end of O_APPEND file. same as for write()/pwrite().

Comments?


Re: Splicing to/from a tty

2021-01-20 Thread Al Viro
On Wed, Jan 20, 2021 at 10:25:56PM +, David Laight wrote:

> I also wonder if pread/pwrite with offset == 0 should be valid
> on things where the offset makes no sense.
> 
> I'm rather surprised the offset isn't just silently ignored
> for devices where seeking is non-sensical.
> You might want to error it for mag tapes, but not pipes,
> ttys, sockets etc.
> 
> I really can't remember what SYSV, Solaris or NetBSD do.

... nor can you be arsed to RTFPOSIX.   Why am I not surprised?

In https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html
(located by arcane action known as googling for pwrite POSIX):
==
The pwrite() function shall fail if:

[EINVAL]
The file is a regular file or block special file, and the offset 
argument is negative.
The file offset shall remain unchanged.
[ESPIPE]
The file is incapable of seeking.
==


Re: Splicing to/from a tty

2021-01-20 Thread Al Viro
On Wed, Jan 20, 2021 at 11:27:26AM -0800, Linus Torvalds wrote:
> On Wed, Jan 20, 2021 at 11:11 AM Al Viro  wrote:
> >
> > Why do we care about O_APPEND on anything without FMODE_PWRITE (including
> > pipes), anyway?  All writes there ignore position, after all...
> 
> We shouldn't care.
> 
> Also, I think we should try to move away from FMODE_PWRITE/PREAD
> entirely, and use FMODE_STREAM as the primary "this thing doesn't have
> a position at all".
> 
> That's what gets rid of all the f_pos locking etc after all. The
> FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to
> have the seq_file case that normally allows position on reads, but not
> on writes, so we may need to keep all three bits).

Umm...  Why do we clear FMODE_PWRITE there, anyway?  It came in
commit 915a29ec1c5e34283a6231af1036114e4d612cb0
Author: Linus Torvalds 
Date:   Sat Aug 7 02:08:23 2004 -0700

Add pread/pwrite support bits to match the lseek bit.

This also removes the ESPIPE logic from pipes and seq_files,
since the VFS layer now supports it.

with seq_read() losing the special-cased pread prevention and
seq_open() getting a ban on both pread and pwrite.  With
pread() support added in 2009, and (pointless) pwrite prohibition
left in place.


RE: Splicing to/from a tty

2021-01-20 Thread David Laight
From: Linus Torvalds
> Sent: 20 January 2021 19:27
> On Wed, Jan 20, 2021 at 11:11 AM Al Viro  wrote:
> >
> > Why do we care about O_APPEND on anything without FMODE_PWRITE (including
> > pipes), anyway?  All writes there ignore position, after all...
> 
> We shouldn't care.
> 
> Also, I think we should try to move away from FMODE_PWRITE/PREAD
> entirely, and use FMODE_STREAM as the primary "this thing doesn't have
> a position at all".
> 
> That's what gets rid of all the f_pos locking etc after all. The
> FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to
> have the seq_file case that normally allows position on reads, but not
> on writes, so we may need to keep all three bits).
> 
> Anyway, I think that with FMODE_STREAM, O_APPEND definitely should be a no-op.

I also wonder if pread/pwrite with offset == 0 should be valid
on things where the offset makes no sense.

I'm rather surprised the offset isn't just silently ignored
for devices where seeking is non-sensical.
You might want to error it for mag tapes, but not pipes,
ttys, sockets etc.

I really can't remember what SYSV, Solaris or NetBSD do.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: Splicing to/from a tty

2021-01-20 Thread Linus Torvalds
On Wed, Jan 20, 2021 at 11:11 AM Al Viro  wrote:
>
> Why do we care about O_APPEND on anything without FMODE_PWRITE (including
> pipes), anyway?  All writes there ignore position, after all...

We shouldn't care.

Also, I think we should try to move away from FMODE_PWRITE/PREAD
entirely, and use FMODE_STREAM as the primary "this thing doesn't have
a position at all".

That's what gets rid of all the f_pos locking etc after all. The
FMODE_PWRITE/PREAD flags are I think legacy (although we do seem to
have the seq_file case that normally allows position on reads, but not
on writes, so we may need to keep all three bits).

Anyway, I think that with FMODE_STREAM, O_APPEND definitely should be a no-op.

Linus


Re: Splicing to/from a tty

2021-01-20 Thread Al Viro
On Wed, Jan 20, 2021 at 04:26:08PM +, Al Viro wrote:

> [1] yes, it is possible to have O_APPEND opened pipes - open a FIFO with
> O_APPEND and you've got it.  We are not quite consistent in handling
> those - sendfile() to such is rejected, splice() is not.

BTW, according to manpages of splice(2) and sendfile(2), we have
   EINVAL [snip] target file is opened in append mode [snip]
and
   EINVAL out_fd has the O_APPEND flag set.  This is not currently 
supported by sendfile().

However, splice(2) to FIFO opened with O_APPEND works just fine.  So
it doesn't match the manpage either.

Why do we care about O_APPEND on anything without FMODE_PWRITE (including
pipes), anyway?  All writes there ignore position, after all...


Re: Splicing to/from a tty

2021-01-20 Thread Al Viro
On Mon, Jan 18, 2021 at 07:54:00PM +, Al Viro wrote:
> On Mon, Jan 18, 2021 at 11:46:42AM -0800, Linus Torvalds wrote:
> > On Mon, Jan 18, 2021 at 11:35 AM Al Viro  wrote:
> > >
> > > I'd rather have sendfile(2) do what splice(2) does and handle pipes
> > > directly.  Let me take a look,,,
> > 
> > It's not technically just the sendfile() thing. Some things do
> > sendfile "internally" (ie they use do_splice_direct()).
> > 
> > Although maybe they always happen just on non-pipe destinations (ie
> > file-to-file copy). I didn't check.
> > 
> > But particularly if it can be handled in do_splice_direct(), that
> > would be a good thing.
> 
> FWIW, it might make sense to merge do_splice_direct() and do_splice();
> interfaces are very similar and do_splice() is basically
>   if both are pipes
>   
>   else if input is pipe
>   
>   else if output is pipe
>   
>   else
>   return -EINVAL
> with do_splice_direct() being fairly close to the missing case.

Yecchhh...  The situation with checks is interesting.
do_splice():
in needs FMODE_READ, out - FMODE_WRITE [EBADF]
if in is a pipe, off_in must be NULL.  [ESPIPE]
if out is a pipe, off_out must be NULL.  [ESPIPE]
if in is not a pipe
non-NULL off_in is allowed only with FMODE_PREAD 
[EINVAL]
rw_verify_area done on in
if out is not a pipe
non-NULL off_out is allowed only with FMODE_PWRITE 
[EINVAL]
no O_APPEND on out [EINVAL]
rw_verify_area done on out
either in or out must be a pipe [EINVAL]
do_splice_direct():
out needs FMODE_WRITE [EBADF]
rw_verify_area done on out
no O_APPEND on out [EINVAL]
Callers:
__do_splice() -> do_splice():
if in is a pipe, off_in must be NULL.  [ESPIPE, duplicate]
if out is a pipe, off_out must be NULL.  [ESPIPE, duplicate]

io_uring io_splice() -> do_splice():
no extra checks

do_sendfile() -> do_splice_direct():
in needs FMODE_READ [EBADF]
out needs FMODE_WRITE [EBADF, duplicate]
non-NULL off_in is allowed only with FMODE_PREAD [EINVAL]
for out we act as splice with NULL off_out (== use ->f_pos)
rw_verify_area done on in
rw_verify_area done on out [duplicate]

vfs_copy_file_range() -> do_copy_file_range() ->
-> generic_copy_file_range() or __ceph_copy_file_range() ->
-> do_splice_direct():
both in and out must be regular [EINVAL or EISDIR]
in needs FMODE_READ [EBADF]
out needs FMODE_WRITE [EBADF]
no O_APPEND on out [EBADF]
on immutable for out [EPERM]
rw_verify_area done on in
rw_verify_area done on out [duplicate]
FMODE_P{READ,WRITE} is ignored.  I wonder what happens if
somebody starts playing with copy_file_range(2) on
e.g. procfs or sysfs...

ovl_copy_up_date() -> do_splice_direct():
both in and out are hopefully regular - I'm not sure that copyup
logics is sufficiently protected, TBH.
FMODE_READ on in and FMODE_WRITE on out are guaranteed
lack of O_APPEND on out is guaranteed
FMODE_P{READ,WRITE} is completely ignored.  Should be OK, modulo
the same issues with protection against manipulation of
layers.

Checks in vfs_copy_file_range() need to remain there - do_splice_direct() is
not guaranteed downstream of that.

AFAICS, we have the following:
in must have FMODE_READ
out must have FMODE_WRITE
no O_APPEND on out, unless a pipe[1]
for non-pipe out we do rw_verify_area
for non-pipe in we do rw_verify_area
no offsets for pipes

The lack of offsets for pipes should've been "not without FMODE_P{READ,WRITE}",
but that's not entirely consistent - we have splice(2) fail with EINVAL in
case of non-NULL off_in on a file that doesn't allow pread(), except that
when this file happens to be a pipe (which obviously doesn't allow pread())
we fail with ESPIPE instead.  Note that for pread(2)/pwrite(2)/sendfile(2)
we fail with consistent ESPIPE in all such cases.

We also do not check FMODE_PREAD/FMODE_PWRITE for copy_file_range()/copyup.
Which is probably wrong.

Another inconsistency is that rw_verify_area() in copyup is done for
output (once for each chunk), but not for input.  The tricky part here
is that there's an LSM hook in the end of rw_verify_area()...

Can we turn "not a pipe, but lacks FMODE_PREAD" error value for 

Re: Splicing to/from a tty

2021-01-20 Thread Oliver Giles
On Wed Jan 20, 2021 at 5:44 PM NZDT, Linus Torvalds wrote:
>
> But in the meantime, here's two more patches to try on top of the
> previous four. They shouldn't matter, other than making the non-icanon
> throughput a lot better. But the more coverage they get, the happier
> I'll be.
>

I tried these out too, my use case is still working well.


Re: Splicing to/from a tty

2021-01-19 Thread Linus Torvalds
On Tue, Jan 19, 2021 at 5:29 PM Oliver Giles  wrote:
>
> Writing this from a kernel with those patches in; happily splice()ing
> to and from a pty.

Ok, good.

I have a couple of improvement patches on top of those, that I'm attaching here.

The first four patches worked fine for me (and apparently for you
too), but due to the small buffer, the regular N_TTY case for tty
read() calls are now limited to 64 bytes each.

That is unfortunate for performance, but it might also cause some
actual breakage: anybody doing "read()" calls on a tty file descriptor
_should_ be perfectly fine with short reads since they happen for
signals etc, but I could well imagine that not everybody is.

And that new kernel buffer interface was _designed_ to allow stitching
small buffers together efficiently (since the hdlc case needed it), so
this implements that for the non-icanon case for n_tty too.

I wasted an embarrasing amount of time today on that final patch - I
spent something like 6 hours chasing a truly stupid one-liner bug I
had initially, and couldn't see what was wrong.

Which is why this only does the non-icanon case, because after I
finally had my "Duh!" moment, I was so annoyed with it that I had to
just walk away.

I'll come back to this tomorrow and do the line-buffered icanon case
too (unless pull requests pile up), and then I'll be happy with the
tty changes, and I think I can submit this series for real to Greg.

But in the meantime, here's two more patches to try on top of the
previous four. They shouldn't matter, other than making the non-icanon
throughput a lot better. But the more coverage they get, the happier
I'll be.

   Linus
From b12a1652c91648e96ae11946f7489515dd063789 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Tue, 19 Jan 2021 13:46:28 -0800
Subject: [PATCH 5/6] tty: clean up legacy leftovers from n_tty line discipline

Back when the line disciplines did their own direct user accesses, they
had to deal with the data copy possibly failing in the middle.

Now that the user copy is done by the tty_io.c code, that failure case
no longer exists.

Remove the left-over error handling code that cannot trigger.

Signed-off-by: Linus Torvalds 
---
 drivers/tty/n_tty.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c
index 2f2f57a53968..a02fe661f617 100644
--- a/drivers/tty/n_tty.c
+++ b/drivers/tty/n_tty.c
@@ -1957,19 +1957,17 @@ static inline int input_available_p(struct tty_struct *tty, int poll)
  *		read_tail published
  */
 
-static int copy_from_read_buf(struct tty_struct *tty,
+static void copy_from_read_buf(struct tty_struct *tty,
   unsigned char **kbp,
   size_t *nr)
 
 {
 	struct n_tty_data *ldata = tty->disc_data;
-	int retval;
 	size_t n;
 	bool is_eof;
 	size_t head = smp_load_acquire(>commit_head);
 	size_t tail = ldata->read_tail & (N_TTY_BUF_SIZE - 1);
 
-	retval = 0;
 	n = min(head - ldata->read_tail, N_TTY_BUF_SIZE - tail);
 	n = min(*nr, n);
 	if (n) {
@@ -1986,7 +1984,6 @@ static int copy_from_read_buf(struct tty_struct *tty,
 		*kbp += n;
 		*nr -= n;
 	}
-	return retval;
 }
 
 /**
@@ -2228,20 +2225,15 @@ static ssize_t n_tty_read(struct tty_struct *tty, struct file *file,
 			if (retval)
 break;
 		} else {
-			int uncopied;
-
 			/* Deal with packet mode. */
 			if (packet && kb == kbuf) {
 *kb++ = TIOCPKT_DATA;
 nr--;
 			}
 
-			uncopied = copy_from_read_buf(tty, , );
-			uncopied += copy_from_read_buf(tty, , );
-			if (uncopied) {
-retval = -EFAULT;
-break;
-			}
+			/* See comment above copy_from_read_buf() why twice */
+			copy_from_read_buf(tty, , );
+			copy_from_read_buf(tty, , );
 		}
 
 		n_tty_check_unthrottle(tty);
-- 
2.29.2.157.g1d47791a39

From e724cc9c4b101a4de1a56bcca6b5ec1d3493b173 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Tue, 19 Jan 2021 18:14:20 -0800
Subject: [PATCH 6/6] tty: teach n_tty line discipline about the new "cookie
 continuations"

With the conversion to do the tty ldisc read operations in small chunks,
the n_tty line discipline became noticeably slower for throughput
oriented loads, because rather than read things in up to 2kB chunks, it
would return at most 64 bytes per read() system call.

The cost is mainly all in the "do system calls over and over", not
really in the new "copy to an extra kernel buffer".

This can be fixed by teaching the n_tty line discipline about the
"cookie continuation" model, which the chunking code supports because
things like hdlc need to be able to handle packets up to 64kB in size.

Doing that doesn't just get us back to the old performace, but to much
better performance: my stupid "copy 10MB of data over a pty" test
program is now almost twice as fast as it used to be (going down from
0.1s to 0.054s).

This is entirely because it now creates maximal chunks (which happens to
be "one byte less than one page" due to how we do the circular tty
buffers).

NOTE! This case only handles 

Re: Splicing to/from a tty

2021-01-19 Thread Oliver Giles
On Wed Jan 20, 2021 at 9:24 AM NZDT, Linus Torvalds wrote:
> Anyway, anybody willing to test these tty/pipe patches on the loads
> that failed? Oliver?

Writing this from a kernel with those patches in; happily splice()ing
to and from a pty.


Re: Splicing to/from a tty

2021-01-19 Thread Oliver Giles
On Wed Jan 20, 2021 at 5:56 AM NZDT, Robert Karszniewicz wrote:
>
> I have bisected this issue down to this commit:
> 4d03e3cc5982 ("fs: don't allow kernel reads and writes without iter
> ops")
>
> Another case I've also noticed is writing to a serial connection:
> kernel write not supported for file /ttymxc0 (pid: 252 comm: cat)
>

Tangentially, I hit the same thing when hacking on this. Here's a diff
making the implementation match the comment:

--- a/fs/read_write.c
+++ b/fs/read_write.c
@@ -541,7 +541,7 @@ ssize_t __kernel_write(struct file *file, const void *buf, 
size_t count, loff_t
 * Also fail if ->write_iter and ->write are both wired up as that
 * implies very convoluted semantics.
 */
-   if (unlikely(!file->f_op->write_iter || file->f_op->write))
+   if (unlikely(file->f_op->write_iter && file->f_op->write))
return warn_unsupported(file, "write");
 
init_sync_kiocb(, file);



Re: Splicing to/from a tty

2021-01-19 Thread Christoph Hellwig
On Tue, Jan 19, 2021 at 12:24:22PM -0800, Linus Torvalds wrote:
> But that sysfs binary file case really isn't interesting, and just
> more of a "Christoph broke it, I think he should just fix it".
> Christoph?

I'll give it a spin tomorrow.  Should be simple enough.


Re: Splicing to/from a tty

2021-01-19 Thread Linus Torvalds
On Tue, Jan 19, 2021 at 3:54 AM Greg Kroah-Hartman
 wrote:
>
> This looks sane, but I'm still missing what the goal of this is here.
> It's nice from a "don't make the ldisc do the userspace copy", point of
> view, but what is the next step in order to tie that into splice?

Ok, so here's a series of four patches that make ttys possible sources
and destinations for splice() again.

Well, the first patch is just the pipe one for sendfile() - and it's
the hacky one-liner, not the proper one that Al will hopefully add.

NOTE! I've signed off on these, because I think they are fine for
testing - but they are really meant for testing ONLY.

I'm running a kernel with these in place, so they kind of work. And
yes, I verified that sendfile() now works with a pipe or tty target. I
didn't actually check splicing _from_ a tty, nor did I check that
readv/writev now works properly, but it all LoosGood(tm) to me.

HOWEVER.

The reason these are for testing only is that

 (a) my tests are pretty limited, and I'd like the actual people who
reported this to really test them out

 (b) the new read iterator model is going to be quite horribly slow
for big pty transfers because the n_tty ldisc isn't doing the cookie
batching

 (c) I really really want Al to take a look at that iov_iter_revert()
thing in do_tty_write() (in "[PATCH 2/4] tty: implement write_iter")

Note that I'm more than happy to do (b) as a patch on top of this, but
I'd like (a) and (c) to be clarified before I do that.

> I ask as I also have reports that sysfs binary files are now failing for
> this same reason, so I need to make the same change for them and it's
> not excatly obvious what to do:

Ok, so that would require those kernfs_fop_{read,write}() functions to
also be converted to read_iter/write_iter.

That doesn't look horrendous: it's not all that dissimilar from the
two patches to do that for tty's ("tty: implement {read,write}_iter").
The seq_file part already has a iter version for reading, and the main
change to kernfs_file_direct_read() and kernfs_fop_write() is to do
that

 (a) change the arguments from file/buf/count/ppos to kiocb/iov_iter

 (b) change the copy_to/from_user() calls to copy_to/from_iter()

Note that (b) involves changing the semantics of the return value:
"copy_to/from_user()" returns the number of bytes that were *NOT*
copied, while "copy_to/from_iter()" returns the number of bytes that
WERE copied.

So the error case check does from

if (copy_to/from_user()) **ERROR**

to

if (copy_to/from_iter(n) != n) **ERROR**

but that is fairly straightforward.

The two "tty: implement write/read_iter" patches (patch 2 and 4) can
be used as examples. That said, I want to again stress that they
haven't seen all that much testing, and I do want Al to spray his holy
penguin pee on that iov_iter_revert() thing in patch 2.

I'm honestly not that motivated on those sysfs files: the tty layer
was an interesting test-case that I wanted to look at just because the
conversion to kernel pointers was nontrivial for the read side.

But that sysfs binary file case really isn't interesting, and just
more of a "Christoph broke it, I think he should just fix it".
Christoph?

Anyway, anybody willing to test these tty/pipe patches on the loads
that failed? Oliver?

   Linus
From 95713b6e8b2247c55dd0a04174a55ea9a7fde7f6 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Tue, 19 Jan 2021 09:26:15 -0800
Subject: [PATCH 1/4] pipe: allow sendfile() destination with splice_write

Note that Al Viro is 100% right that this isn't needed for regular
splicing (that treats pipes specially, since pipes _are_ the splice
buffers).

So the correct thing to do is to teach do_splice_direct() the same "hey,
it's already a pipe", and fix sendfile() with a pipe destination that way.

But this is the one-liner "make it work" thing, rather than the "do it
properly" thing that Al will hopefully do.

Fixes: 36e2c7421f02 ("fs: don't allow splice read/write without explicit ops")
Reported-by: Johannes Berg 
Cc: Al Viro 
Cc: Christoph Hellwig 
Signed-off-by: Linus Torvalds 
---
 fs/pipe.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
 	.unlocked_ioctl	= pipe_ioctl,
 	.release	= pipe_release,
 	.fasync		= pipe_fasync,
+	.splice_write	= iter_file_splice_write,
 };
 
 /*
-- 
2.29.2.157.g1d47791a39

From 0dce8c5ef15f0aa7b4525721b86a20b7c4df8ca0 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Tue, 19 Jan 2021 11:41:16 -0800
Subject: [PATCH 2/4] tty: implement write_iter

This makes the tty layer use the .write_iter() function instead of the
traditional .write() functionality.

That allows writev(), but more importantly also makes it possible to
enable .splice_write() for ttys, reinstating the "splice to tty"
functionality that was lost in commit 36e2c7421f02 ("fs: don't allow
splice 

Re: Splicing to/from a tty

2021-01-19 Thread Linus Torvalds
On Tue, Jan 19, 2021 at 3:54 AM Greg Kroah-Hartman
 wrote:
>
> This looks sane, but I'm still missing what the goal of this is here.
> It's nice from a "don't make the ldisc do the userspace copy", point of
> view, but what is the next step in order to tie that into splice?

I'll cook something up. With this, it should be fairly easy to add
both the splice and iov_iter versions, because now it only needs the
wrappers in tty_io.c, not for each line discipline.

I hope. Let's see..

   Linus


Re: Splicing to/from a tty

2021-01-19 Thread Robert Karszniewicz
On 1/19/21 12:53 PM, Greg Kroah-Hartman wrote:
> This looks sane, but I'm still missing what the goal of this is here.
> It's nice from a "don't make the ldisc do the userspace copy", point of
> view, but what is the next step in order to tie that into splice?
> 
> I ask as I also have reports that sysfs binary files are now failing for
> this same reason, so I need to make the same change for them and it's
> not excatly obvious what to do:
>   
> https://lore.kernel.org/r/1adf9aa4-ed7e-8f05-a354-57419d61e...@codeaurora.org

I would like to confirm this. We are using firmwared and it returns EINVAL on
sendfile(), too. I have tried setting the .splice_write callback as in the
linked thread, but it didn't help, it just EINVAL'd in a different place.

I have bisected this issue down to this commit:
4d03e3cc5982 ("fs: don't allow kernel reads and writes without iter ops")

Another case I've also noticed is writing to a serial connection:
kernel write not supported for file /ttymxc0 (pid: 252 comm: cat)

(Which still prints, though, because cat falls back to write(2), I suppose.)

Thank you,
Robert


Re: Splicing to/from a tty

2021-01-19 Thread Robert Karszniewicz
On 1/19/21 5:56 PM, Robert Karszniewicz wrote:
> Another case I've also noticed is writing to a serial connection:
> kernel write not supported for file /ttymxc0 (pid: 252 comm: cat)

Which is actually a TTY and I just failed to see the forest for all the trees...

Cheers.


Re: Splicing to/from a tty

2021-01-19 Thread Greg Kroah-Hartman
On Mon, Jan 18, 2021 at 05:38:55PM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 2:20 PM Linus Torvalds
>  wrote:
> >
> > So it's not a "real" patch, but with improved buffer handling in
> > tty_read(), I think this is actually quite close.
> 
> Hmm.
> 
> I somehow ended up working on this all because it's a Monday, and I
> don't see a lot of pull requests early in the week.
> 
> And I think I have a solution for the HDLC "we may need to copy a
> packet that might be up to 64kB" issue, that isn't really all that
> ugly.
> 
> We can just iterate over a random "cookie" that the line discipline
> can use any way it wants to. In the case of n_hdlc, it can just put
> the 'rbuf' thing it has into that cookie, and then it can copy it all
> piece-meal until it is all used up. And if it runs out of space in the
> middle, it will return -EOVERFLOW, and we're all good.
> 
> The only other thing such a line discipline needs is the offset into
> the cookie, but the iterator has to maintain that anyway, so that's
> simple enough.
> 
> So here's a fourth patch for this thing today, this time with what I
> think is actually a working model for the buffer handling.
> 
> Other line disciplines *could* use the cookie if they want to. I
> didn't do any of that, though.
> 
> The normal n_tty line discipline, for example, could easily just loop
> over the data. It doesn't need an offset or that 'rbuf' pointer, but
> it still needs to know whether the call is the first one or not,
> because the first time the n_tty line discipline is called it may need
> to wait for a minimum number of characters or whatever the termios
> settings say - but obviously once it has waited for it once, it
> shouldn't wait for it again the next time around (only on the next
> actual full read()). IOW, it would be wrong if the termios said "wait
> for 5 characters", and then it saw 68 characters, copied the first 64,
> in the first iteration, and than saw "oh, now there are only 4
> characters left so now I have to wait for a fifth".
> 
> So n_tty could use the cookie purely to see whether it's the first
> iteration or not, and it could just set the cookie to a random value
> (it always starts out as NULL) to just show what state it is in.
> 
> I did *NOT* do that, because it's not technically necessary - unlike
> the hdlc packet case, n_tty returning a partial result is not wrong
> per se even if we might have reasons to improve on it later.
> 
> What do people think about this?
> 
> Also, does anybody have any test-code for the HDLC case? I did find an
> interesting comment when doing a Debian code search:
> 
>   /* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */
> 
> and yes, this model would allow us to handle readv() properly for hdlc
> (and no, the old one did not, because it really wanted to see the
> whole packet in *one* user buffer).
> 
> But I have no idea if hdlc is even relevant any more, and if anybody
> really cares.
> 
> Anybody?


This looks sane, but I'm still missing what the goal of this is here.
It's nice from a "don't make the ldisc do the userspace copy", point of
view, but what is the next step in order to tie that into splice?

I ask as I also have reports that sysfs binary files are now failing for
this same reason, so I need to make the same change for them and it's
not excatly obvious what to do:

https://lore.kernel.org/r/1adf9aa4-ed7e-8f05-a354-57419d61e...@codeaurora.org

thanks,

greg k-h


Re: Splicing to/from a tty

2021-01-19 Thread Greg Kroah-Hartman
On Mon, Jan 18, 2021 at 01:35:15PM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 12:24 PM Linus Torvalds
>  wrote:
> >
> > Anybody want to play with this? I'd suggest keeping that "dummy"
> > parameter around for a while - I did an allmodconfig build with this,
> > but if there are any architecture-specific non-x86-64 cases I wouldn't
> > have seen them.
> 
> Not architecture-specific, but I did find by some grepping that I
> missed one line discipline driver: the Siemens R3964.
> 
> The reason I missed that is because it's been marked BROKEN in the
> Kconfig for almost two years, so it didn't show up in my allmodconfig
> coverage.

I need to just delete that thing now, obviously no one uses it anymore,
sorry for it getting in the way...

greg k-h


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 2:20 PM Linus Torvalds
 wrote:
>
> So it's not a "real" patch, but with improved buffer handling in
> tty_read(), I think this is actually quite close.

Hmm.

I somehow ended up working on this all because it's a Monday, and I
don't see a lot of pull requests early in the week.

And I think I have a solution for the HDLC "we may need to copy a
packet that might be up to 64kB" issue, that isn't really all that
ugly.

We can just iterate over a random "cookie" that the line discipline
can use any way it wants to. In the case of n_hdlc, it can just put
the 'rbuf' thing it has into that cookie, and then it can copy it all
piece-meal until it is all used up. And if it runs out of space in the
middle, it will return -EOVERFLOW, and we're all good.

The only other thing such a line discipline needs is the offset into
the cookie, but the iterator has to maintain that anyway, so that's
simple enough.

So here's a fourth patch for this thing today, this time with what I
think is actually a working model for the buffer handling.

Other line disciplines *could* use the cookie if they want to. I
didn't do any of that, though.

The normal n_tty line discipline, for example, could easily just loop
over the data. It doesn't need an offset or that 'rbuf' pointer, but
it still needs to know whether the call is the first one or not,
because the first time the n_tty line discipline is called it may need
to wait for a minimum number of characters or whatever the termios
settings say - but obviously once it has waited for it once, it
shouldn't wait for it again the next time around (only on the next
actual full read()). IOW, it would be wrong if the termios said "wait
for 5 characters", and then it saw 68 characters, copied the first 64,
in the first iteration, and than saw "oh, now there are only 4
characters left so now I have to wait for a fifth".

So n_tty could use the cookie purely to see whether it's the first
iteration or not, and it could just set the cookie to a random value
(it always starts out as NULL) to just show what state it is in.

I did *NOT* do that, because it's not technically necessary - unlike
the hdlc packet case, n_tty returning a partial result is not wrong
per se even if we might have reasons to improve on it later.

What do people think about this?

Also, does anybody have any test-code for the HDLC case? I did find an
interesting comment when doing a Debian code search:

  /* Bloody hell... readv doesn't work with N_HDLC line discipline... GRR! */

and yes, this model would allow us to handle readv() properly for hdlc
(and no, the old one did not, because it really wanted to see the
whole packet in *one* user buffer).

But I have no idea if hdlc is even relevant any more, and if anybody
really cares.

Anybody?

  Linus
From 526258e18d13ebb113638d26eba18ca05a51dc2a Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Mon, 18 Jan 2021 13:31:30 -0800
Subject: [PATCH] tty: convert tty_ldisc_ops 'read()' function to take a kernel
 pointer

The tty line discipline .read() function was passed the final user
pointer destination as an argument, which doesn't match the 'write()'
function, and makes it very inconvenient to do a splice method for
tty's.

This is a conversion to use a kernel buffer instead.

NOTE! It does this by passing the tty line discipline ->read() function
an additional "cookie" to fill in, and an offset into the cookie data.

The line discipline can fill in the cookie data with its own private
information, and then the reader will repeat the read until either the
cookie is cleared or it runs out of data.

The only real user of this is N_HDLC, which can use this to handle big
packets, even if the kernel buffer is smaller than the whole packet.

Signed-off-by: Linus Torvalds 
---
 drivers/bluetooth/hci_ldisc.c | 34 +++
 drivers/input/serio/serport.c |  4 +-
 drivers/net/ppp/ppp_async.c   |  3 +-
 drivers/net/ppp/ppp_synctty.c |  3 +-
 drivers/tty/n_gsm.c   |  3 +-
 drivers/tty/n_hdlc.c  | 60 +
 drivers/tty/n_null.c  |  3 +-
 drivers/tty/n_r3964.c | 10 ++---
 drivers/tty/n_tracerouter.c   |  4 +-
 drivers/tty/n_tracesink.c |  4 +-
 drivers/tty/n_tty.c   | 82 +++
 drivers/tty/tty_io.c  | 64 +--
 include/linux/tty_ldisc.h |  3 +-
 net/nfc/nci/uart.c|  3 +-
 14 files changed, 178 insertions(+), 102 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index f83d67eafc9f..dd92aea15b8b 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -802,7 +802,8 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
  * We don't provide read/write/poll interface for user space.
  */
 static ssize_t hci_uart_tty_read(struct tty_struct *tty, struct file *file,
- unsigned char __user *buf, size_t nr)
+ unsigned char *buf, 

Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 2:03 PM Linus Torvalds
 wrote:
>
> I'll have a third patch in a moment, but while it's ready I want to
> actually reboot and confirm it first.

Here. This is an actually usable and tested starting point for this all.

Again - it doesn't handle HDLC correctly, because it will chop all
reads up into 32-byte chunks.

And some other apps that think they'll get whole lines might end up
being confused too.

So it's not a "real" patch, but with improved buffer handling in
tty_read(), I think this is actually quite close.

NOTE: due to security reasons (ie password data), we do want to clear
that buffer after we've copied the data to user space. Again, not
something I did in my patch, so it would be part of that "improved
buffer handling"

  Linus


patch
Description: Binary data


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 1:54 PM Linus Torvalds
 wrote:
>
> But it might well be some other conversion bug of mine even if I tried
> to keep it fairly minimal and straight-forward.

Duh. I completely forgot to handle the canon_copy_from_read_buf()
case. So ICANON mode was entirely scrogged and would just return
-EFAULT.

That would do it. I'm surprised how well everything I did actually
worked - because all my normal terminal apps (shell, editor etc)
obviously end up not using icanon at all.

I'll have a third patch in a moment, but while it's ready I want to
actually reboot and confirm it first.

 Linus


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 1:35 PM Linus Torvalds
 wrote:
>
> So here's a slightly updated version of that patch, but apart from
> slightly better coverage - even if it's a driver that is disabled
> anyway - I'd like to point out that all my previous caveats still
> apply.

I have now booted that version to see that I didn't make any horribly
obvious mistakes.

And I must have screwed something up. I can actually use the machine
normally (I'm writing this running that kernel), but when I decided to
test the actual virtual console (as opposed to the GUI terminal
windows that use pty's), I can't even log in.

That *may* just be due to the inexcusably lazy and stupid "chunk
things up into 32 byte pieces" I did. Most programs shouldn't care,
tty's can return partial results anyway, but it's obviously a fairly
fundamental and silly change.

But it might well be some other conversion bug of mine even if I tried
to keep it fairly minimal and straight-forward.

So I suggest taking that patch with a lot of salt, and really treating
it as a very rough starting point (which was the point of it - trying
to actually boot it as-is was more of a "let's see if it survives at
all" thing).

Linus


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 12:24 PM Linus Torvalds
 wrote:
>
> Anybody want to play with this? I'd suggest keeping that "dummy"
> parameter around for a while - I did an allmodconfig build with this,
> but if there are any architecture-specific non-x86-64 cases I wouldn't
> have seen them.

Not architecture-specific, but I did find by some grepping that I
missed one line discipline driver: the Siemens R3964.

The reason I missed that is because it's been marked BROKEN in the
Kconfig for almost two years, so it didn't show up in my allmodconfig
coverage.

But the fact that I found it makes me a bit happier about my patch
actually covering all the cases.

My grep exercise failed on the bluetooth hci_ldisc.c pattern, which I
feel is because the bluetooth code used the wrong pattern to
initialize the 'struct tty_ldisc_ops', so I fixed that up too.

So here's a slightly updated version of that patch, but apart from
slightly better coverage - even if it's a driver that is disabled
anyway - I'd like to point out that all my previous caveats still
apply.

  Linus


patch
Description: Binary data


Re: Splicing to/from a tty

2021-01-18 Thread Al Viro
On Mon, Jan 18, 2021 at 09:53:11AM +0100, Christoph Hellwig wrote:
> On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > > For my case, I attempted to instead implement splice_write and
> > > splice_read in tty_fops; I managed to get splice_write working calling
> > > ld->ops->write, but splice_read is not so simple because the
> > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > > how to implement this without either (a) using set_fs, or (b)
> > > implementing iter ops on all line disciplines.
> > > 
> > > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > > big deal for my use case at least, but it used to work.
> > 
> > Is it even strictly related to the tty?
> > 
> > I was just now looking into why my cgit/fcgi/nginx setup no longer
> > works, and the reason is getting -EINVAL from sendfile() when the input
> > is a file and the output is a pipe().
> 
> Yes, pipes do not support ->splice_write currenly.   I think just wiring
> up iter_file_splice_write would work.  Al?

I'd rather have sendfile(2) do what splice(2) does and handle pipes
directly.  Let me take a look,,,


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 11:36 AM Linus Torvalds
 wrote:
>
> Of course, it probably would be really nice to try to convert
> tty_read() to use the same model that we have for tty_write(), and
> then make the ld->ops->read() function actually take a kernel buffer
> instead.
>
> I wonder how painful that would be.

Oh, that seems _much_ less painful than I thought it would be.

This is a COMPLETELY BROKEN patch that builds cleanly for me. I say
that it's completely broken for three reasons:

 (a) that extra "int dummy" argument is there purely to force build
errors if some user hasn't been converted.

 (b) I intentionally made the conversion truly stupid. It's not
"broken", but it needs cleanup for nasty variable names (ie things
like me changing that already badly named "b" variable to "kbp" to
again catch all users at build time)

 (c) the buffer handling in tty_read() is actively broken. Things like
HDLC rely on getting a proper buffer that can hold the whole packet,
and the max packet size is actually potentially quite big.

But the only real problem does seem to be HDLC, which has a default
maximum frame size of 4k, but it can be set all the way up to 64kB as
a module parameter.

So that tty_read() conversion in this patch is complete garbage, but I
really just wrote this to see how many painful places there are. And
there are not many.

The HDLC case could probably be done fairly well by

 - have a small on-stack buffer for the "small read" case

 - have a kmalloc() buffer that defaults to one page for bigger reads

 - grow the kmalloc() buffer if the ->read function returns -EOVERFLOW

Comments?

NOTE! This does *not* do the actual splice_read() implementation. No,
this literally is just the preparatory stage for that by starting the
whole "make the tty ldisc read path use a kernel buffer instead".

Anybody want to play with this? I'd suggest keeping that "dummy"
parameter around for a while - I did an allmodconfig build with this,
but if there are any architecture-specific non-x86-64 cases I wouldn't
have seen them.

 Linus


patch
Description: Binary data


Re: Splicing to/from a tty

2021-01-18 Thread Al Viro
On Mon, Jan 18, 2021 at 11:49:53AM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 11:45 AM Al Viro  wrote:
> >
> > do_splice_direct() does something that do_splice() won't - it
> > handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> > normally used, of course.
> 
> Yeah, I agree, it's better if we do the pipe case specially, but if
> it's too painful for a stable backport I'll certainly take the
> one-liner..
> 
> Btw, your email setup is broken. Your emails now have a "From:" line like 
> this:
> 
> From: Al Viro 
> 
> and that is not a valid email address.

Just noticed, should be fixed now...


Re: Splicing to/from a tty

2021-01-18 Thread Al Viro
On Mon, Jan 18, 2021 at 11:46:42AM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 11:35 AM Al Viro  wrote:
> >
> > I'd rather have sendfile(2) do what splice(2) does and handle pipes
> > directly.  Let me take a look,,,
> 
> It's not technically just the sendfile() thing. Some things do
> sendfile "internally" (ie they use do_splice_direct()).
> 
> Although maybe they always happen just on non-pipe destinations (ie
> file-to-file copy). I didn't check.
> 
> But particularly if it can be handled in do_splice_direct(), that
> would be a good thing.

FWIW, it might make sense to merge do_splice_direct() and do_splice();
interfaces are very similar and do_splice() is basically
if both are pipes

else if input is pipe

else if output is pipe

else
return -EINVAL
with do_splice_direct() being fairly close to the missing case.


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 11:45 AM Al Viro  wrote:
>
> do_splice_direct() does something that do_splice() won't - it
> handles non-pipe to non-pipe case.  Which is how sendfile(2) is
> normally used, of course.

Yeah, I agree, it's better if we do the pipe case specially, but if
it's too painful for a stable backport I'll certainly take the
one-liner..

Btw, your email setup is broken. Your emails now have a "From:" line like this:

From: Al Viro 

and that is not a valid email address.

  Linus


Re: Splicing to/from a tty

2021-01-18 Thread Al Viro
On Mon, Jan 18, 2021 at 11:26:00AM -0800, Linus Torvalds wrote:
> On Mon, Jan 18, 2021 at 12:58 AM Johannes Berg
>  wrote:
> >
> > > I think just wiring up iter_file_splice_write would work.  Al?
> >
> > Seems to work for the simple test case that I had, at least:
> 
> Mind sending me a signed-off patch for this?
> 
> Yeah, I know it's a trivial one-liner, but I much prefer having an
> author with a patch and a sign-off to just doing it personally and
> reaping all that glory for it..

IMO it's a wrong way to handle that.   Look: do_sendfile() calls
do_splice_direct(), which calls splice_direct_to_actor().  There
we allocate an internal pipe and go through "feed from source into
that pipe, then shove what's there into destination".  Which is
insane for the case when destination (or source, for that matter)
is a pipe itself.

do_splice_direct() does something that do_splice() won't - it
handles non-pipe to non-pipe case.  Which is how sendfile(2) is
normally used, of course.

I'll look into that in more details, but IMO bothering with
internal pipe is just plain wrong for those cases.


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 11:35 AM Al Viro  wrote:
>
> I'd rather have sendfile(2) do what splice(2) does and handle pipes
> directly.  Let me take a look,,,

It's not technically just the sendfile() thing. Some things do
sendfile "internally" (ie they use do_splice_direct()).

Although maybe they always happen just on non-pipe destinations (ie
file-to-file copy). I didn't check.

But particularly if it can be handled in do_splice_direct(), that
would be a good thing.

 Linus


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 12:16 AM Christoph Hellwig  wrote:
>
> On Sat, Jan 16, 2021 at 08:35:41PM +1300, Oliver Giles wrote:
> > For my case, I attempted to instead implement splice_write and splice_read 
> > in tty_fops; I managed to get splice_write working calling ld->ops->write, 
> > but splice_read is not so simple because the tty_ldisc_ops read method 
> > expects a userspace buffer. So I cannot see how to implement this without 
> > either (a) using set_fs, or (b) implementing iter ops on all line 
> > disciplines.
>
> set_fs is gone for all the important platforms.  So yes, you basically
> need to convert to iov_iter or have a huge splice_write parallel
> infrastucture.

It might ok to try to limit it to just the pty cases and ldisc ops
that need it, apparently in this case pty (and presumably just one or
two line disciplines)

Of course, it probably would be really nice to try to convert
tty_read() to use the same model that we have for tty_write(), and
then make the ld->ops->read() function actually take a kernel buffer
instead.

I wonder how painful that would be.

 Linus


Re: Splicing to/from a tty

2021-01-18 Thread Linus Torvalds
On Mon, Jan 18, 2021 at 12:58 AM Johannes Berg
 wrote:
>
> > I think just wiring up iter_file_splice_write would work.  Al?
>
> Seems to work for the simple test case that I had, at least:

Mind sending me a signed-off patch for this?

Yeah, I know it's a trivial one-liner, but I much prefer having an
author with a patch and a sign-off to just doing it personally and
reaping all that glory for it..

   Linus


Re: Splicing to/from a tty

2021-01-18 Thread Johannes Berg
On Mon, 2021-01-18 at 09:53 +0100, Christoph Hellwig wrote:
> On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > > For my case, I attempted to instead implement splice_write and
> > > splice_read in tty_fops; I managed to get splice_write working calling
> > > ld->ops->write, but splice_read is not so simple because the
> > > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > > how to implement this without either (a) using set_fs, or (b)
> > > implementing iter ops on all line disciplines.
> > > 
> > > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > > big deal for my use case at least, but it used to work.
> > 
> > Is it even strictly related to the tty?
> > 
> > I was just now looking into why my cgit/fcgi/nginx setup no longer
> > works, and the reason is getting -EINVAL from sendfile() when the input
> > is a file and the output is a pipe().
> 
> Yes, pipes do not support ->splice_write currenly.

Well, it clearly used to work :-)

>I think just wiring
> up iter_file_splice_write would work.  Al?

Seems to work for the simple test case that I had, at least:

diff --git a/fs/pipe.c b/fs/pipe.c
index c5989cfd564d..39c96845a72f 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -1206,6 +1206,7 @@ const struct file_operations pipefifo_fops = {
.unlocked_ioctl = pipe_ioctl,
.release= pipe_release,
.fasync = pipe_fasync,
+   .splice_write   = iter_file_splice_write,
 };
 
 /*

johannes



Re: Splicing to/from a tty

2021-01-18 Thread Christoph Hellwig
On Sat, Jan 16, 2021 at 05:46:33PM +0100, Johannes Berg wrote:
> > For my case, I attempted to instead implement splice_write and
> > splice_read in tty_fops; I managed to get splice_write working calling
> > ld->ops->write, but splice_read is not so simple because the
> > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > how to implement this without either (a) using set_fs, or (b)
> > implementing iter ops on all line disciplines.
> > 
> > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > big deal for my use case at least, but it used to work.
> 
> Is it even strictly related to the tty?
> 
> I was just now looking into why my cgit/fcgi/nginx setup no longer
> works, and the reason is getting -EINVAL from sendfile() when the input
> is a file and the output is a pipe().

Yes, pipes do not support ->splice_write currenly.   I think just wiring
up iter_file_splice_write would work.  Al?

> So I wrote a simple test program (below) and that errors out on kernel
> 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
> tried reverting anything yet, but now that I haev a test program it
> should be simple to even bisect.
> 
> johannes
> 
> 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> 
> int main(int argc, char **argv)
> {
>   int in = open(argv[0], O_RDONLY);
>   int p[2], out;
>   off_t off = 0;
>   int err;
> 
>   assert(in >= 0);
>   assert(pipe(p) >= 0);
>   out = p[1];
>   err = sendfile(out, in, , 1024);
>   if (err < 0)
>   perror("sendfile");
>   assert(err == 1024);
> 
>   return 0;
> }
> 
---end quoted text---


Re: Splicing to/from a tty

2021-01-18 Thread Christoph Hellwig
On Sat, Jan 16, 2021 at 08:35:41PM +1300, Oliver Giles wrote:
> Commit 36e2c7421f02 (fs: don't allow splice read/write without explicit ops) 
> broke my userspace application which talks to an SSL VPN by splice()ing 
> between "openssl s_client" and "pppd". The latter operates over a pty, and 
> since that commit there is no fallback for splice()ing between a pipe and a 
> pty, or any tty for that matter.
> 
> The above commit mentions switching them to the iter ops and using 
> generic_file_splice_read. IIUC, this would require implementing iter ops also 
> on the line disciplines, which sounds pretty disruptive.
> 
> For my case, I attempted to instead implement splice_write and splice_read in 
> tty_fops; I managed to get splice_write working calling ld->ops->write, but 
> splice_read is not so simple because the tty_ldisc_ops read method expects a 
> userspace buffer. So I cannot see how to implement this without either (a) 
> using set_fs, or (b) implementing iter ops on all line disciplines.

set_fs is gone for all the important platforms.  So yes, you basically
need to convert to iov_iter or have a huge splice_write parallel
infrastucture.

> 
> Is splice()ing between a tty and a pipe worth supporting at all? Not a big 
> deal for my use case at least, but it used to work.

Our normal policy is no regressions for exiting userspace.  By that we'd
have to fix it.  Let me see if I can help you with this in any way.


Re: Splicing to/from a tty

2021-01-16 Thread Oliver Giles
On Sun Jan 17, 2021 at 5:46 AM NZDT, Johannes Berg wrote:
> On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote:
> > Commit 36e2c7421f02 (fs: don't allow splice read/write without
> > explicit ops) broke my userspace application which talks to an SSL VPN
> > by splice()ing between "openssl s_client" and "pppd". The latter
> > operates over a pty, and since that commit there is no fallback for
> > splice()ing between a pipe and a pty, or any tty for that matter.
> > 
> > The above commit mentions switching them to the iter ops and using
> > generic_file_splice_read. IIUC, this would require implementing iter
> > ops also on the line disciplines, which sounds pretty disruptive.
> > 
> > For my case, I attempted to instead implement splice_write and
> > splice_read in tty_fops; I managed to get splice_write working calling
> > ld->ops->write, but splice_read is not so simple because the
> > tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> > how to implement this without either (a) using set_fs, or (b)
> > implementing iter ops on all line disciplines.
> > 
> > Is splice()ing between a tty and a pipe worth supporting at all? Not a
> > big deal for my use case at least, but it used to work.
>
> Is it even strictly related to the tty?
>
> I was just now looking into why my cgit/fcgi/nginx setup no longer
> works, and the reason is getting -EINVAL from sendfile() when the input
> is a file and the output is a pipe().
>
> So I wrote a simple test program (below) and that errors out on kernel
> 5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
> tried reverting anything yet, but now that I haev a test program it
> should be simple to even bisect.
>
> johannes
>
>
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
> #include 
>
> int main(int argc, char **argv)
> {
> int in = open(argv[0], O_RDONLY);
> int p[2], out;
> off_t off = 0;
> int err;
>
> assert(in >= 0);
> assert(pipe(p) >= 0);
> out = p[1];
> err = sendfile(out, in, , 1024);
> if (err < 0)
> perror("sendfile");
> assert(err == 1024);
>
> return 0;
> }

I can confirm the behaviour you see, and that it starts occurring from the same
commit 36e2c7421f02a22 (fs: don't allow splice read/write without explicit ops).

In my tty case, it is clear that removing the default fallback would cause this
to fail, but assuming the sendfile() source is on a regular filesystem, I am
unsure why splice cannot find the appropriate splice_write method. Could be
connected to the fact that the message from warn_unsupported in fs/splice.c
outputs "splice write not supported for file  (pid: 983819 comm: 
test-sendfile)",
i.e. the file path is blank. In my case of directly calling splice on a pty, I
do see a path such as /ptyp0 in that error before implementing 
splice_(read/write)
callbacks. Maybe splice is getting a bogus file pointer from sendfile?


Re: Splicing to/from a tty

2021-01-16 Thread Johannes Berg
On Sat, 2021-01-16 at 20:35 +1300, Oliver Giles wrote:
> Commit 36e2c7421f02 (fs: don't allow splice read/write without
> explicit ops) broke my userspace application which talks to an SSL VPN
> by splice()ing between "openssl s_client" and "pppd". The latter
> operates over a pty, and since that commit there is no fallback for
> splice()ing between a pipe and a pty, or any tty for that matter.
> 
> The above commit mentions switching them to the iter ops and using
> generic_file_splice_read. IIUC, this would require implementing iter
> ops also on the line disciplines, which sounds pretty disruptive.
> 
> For my case, I attempted to instead implement splice_write and
> splice_read in tty_fops; I managed to get splice_write working calling
> ld->ops->write, but splice_read is not so simple because the
> tty_ldisc_ops read method expects a userspace buffer. So I cannot see
> how to implement this without either (a) using set_fs, or (b)
> implementing iter ops on all line disciplines.
> 
> Is splice()ing between a tty and a pipe worth supporting at all? Not a
> big deal for my use case at least, but it used to work.

Is it even strictly related to the tty?

I was just now looking into why my cgit/fcgi/nginx setup no longer
works, and the reason is getting -EINVAL from sendfile() when the input
is a file and the output is a pipe().

So I wrote a simple test program (below) and that errors out on kernel
5.10.4, while it works fine on the 5.9.16 I currently have. Haven't
tried reverting anything yet, but now that I haev a test program it
should be simple to even bisect.

johannes


#include 
#include 
#include 
#include 
#include 
#include 
#include 

int main(int argc, char **argv)
{
int in = open(argv[0], O_RDONLY);
int p[2], out;
off_t off = 0;
int err;

assert(in >= 0);
assert(pipe(p) >= 0);
out = p[1];
err = sendfile(out, in, , 1024);
if (err < 0)
perror("sendfile");
assert(err == 1024);

return 0;
}




Splicing to/from a tty

2021-01-16 Thread Oliver Giles
Commit 36e2c7421f02 (fs: don't allow splice read/write without explicit ops) 
broke my userspace application which talks to an SSL VPN by splice()ing between 
"openssl s_client" and "pppd". The latter operates over a pty, and since that 
commit there is no fallback for splice()ing between a pipe and a pty, or any 
tty for that matter.

The above commit mentions switching them to the iter ops and using 
generic_file_splice_read. IIUC, this would require implementing iter ops also 
on the line disciplines, which sounds pretty disruptive.

For my case, I attempted to instead implement splice_write and splice_read in 
tty_fops; I managed to get splice_write working calling ld->ops->write, but 
splice_read is not so simple because the tty_ldisc_ops read method expects a 
userspace buffer. So I cannot see how to implement this without either (a) 
using set_fs, or (b) implementing iter ops on all line disciplines.

Is splice()ing between a tty and a pipe worth supporting at all? Not a big deal 
for my use case at least, but it used to work.