Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Linus Torvalds
On Thu, Mar 30, 2017 at 12:21 PM, Andy Lutomirski  wrote:
>
> Huh?  Aren't those REX prefixes interpreted as INC instructions or
> similar in compat mode?

Hmm. I think you're right. So it's not like x32 that runs in long mode
but then would use the 64-bit ptrace interface anyway.

Your statement that 64-bit gdb sometimes uses 32-bit compat ptrace
makes me shudder. Why?

Don't even tell me. I suspect I'm happier not knowing.

   Linus


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Andy Lutomirski
On Thu, Mar 30, 2017 at 12:11 PM, Linus Torvalds
 wrote:
> On Thu, Mar 30, 2017 at 11:59 AM, Andy Lutomirski  wrote:
>>>
>>> And then actually run such a kernel on a 32-bit distro, and verifying
>>> that things like gdb and strace really work. But it needs real
>>> testing, not some kind of handwaving. It's a *big* change.
>>
>> I'll offer the following handwave: if there are problems, I'd expect
>> to see them in mixed-bitness uses, not 32-bit distros.  But the 32-bit
>> case is worth testing, too.
>
> I wouldn't worry too much about the mixed case, simply because you
> clearly cannot use a 32-bit gdb on a 64-bit process.
>
> So the mixed case already needs to use a 64-bit gdb, which presumably
> would never use the 32-bit ptrace paths in the first place, so this
> code never triggers.
>

Hah.  Hah hah.  IIRC 64-bit gdb *does* use the 32-bit paths, or at
least it uses some path that can't see the high regs.  I don't fully
recall, but this is the case that seems more likely to break to me.
It's a great big mess.

> Of course, the mroe testing the better, but the thing I'd really want
> to check is that there isn't some 32-bit distro that might have a
> library that is optimized and notices when it's running on a 64-bit
> capable CPU and uses REX prefixes to use special optimized versions.

Huh?  Aren't those REX prefixes interpreted as INC instructions or
similar in compat mode?  You can't just run 64-bit instructions in a
compat code segment.  You *can* use LAR to find a 64-bit code segment
and long-jump to it (and I've written code to do exactly that, and
it's even snuck it's way into linux.git, muahaha), but code like this
is terminally screwed under 32-bit gdb.


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Linus Torvalds
On Thu, Mar 30, 2017 at 11:59 AM, Andy Lutomirski  wrote:
>>
>> And then actually run such a kernel on a 32-bit distro, and verifying
>> that things like gdb and strace really work. But it needs real
>> testing, not some kind of handwaving. It's a *big* change.
>
> I'll offer the following handwave: if there are problems, I'd expect
> to see them in mixed-bitness uses, not 32-bit distros.  But the 32-bit
> case is worth testing, too.

I wouldn't worry too much about the mixed case, simply because you
clearly cannot use a 32-bit gdb on a 64-bit process.

So the mixed case already needs to use a 64-bit gdb, which presumably
would never use the 32-bit ptrace paths in the first place, so this
code never triggers.

Of course, the mroe testing the better, but the thing I'd really want
to check is that there isn't some 32-bit distro that might have a
library that is optimized and notices when it's running on a 64-bit
capable CPU and uses REX prefixes to use special optimized versions.

That would probably already break with a 32-bit gdb, but I can at
least in theory imagine code that simply depends on zero extension.

>> And in the signal handling path, the sign-extension and test of %eax
>> is reivially ok. Not because it's ok in general, but because we've
>> verified using %orig_eax that we're in a system call return path. We
>> could happily delete the whole TS_I386_REGS_POKED thing, I think.
>
> Then how do we know whether to sign extend?  TS_COMPAT does *not* work
> because it isn't set on signal delivery.

Ok, so we'd still need that nasty TS_I386_REGS_POKED. I still think
that's "ok" within the particular confines of just signal delivery.

Of course, I don't actually know why we don't set that flag
unconditionally when we poke things using that 32-bit ptrace
interface, but that's just more of the "yeah, this code is crazy
hacky"

Cleaning things up is good.

I just don't want people to dismiss that the current code seems to
work well enough in practice. That's *probably* because nobody
actually uses it, but I do know that people used to run 64-bit kernels
with 32-bit user land exactly because they needed the kernel for large
memory machines (PAE really doesn't work for shit) but were carrying
32-bit userland along for whatever lazy legacy reasons.

So it has gotten *some* use. It's probably (hopefully) fading.

.. there's Wine and dosemu that do crazy things too. Things like "we
only set TS_I386_REGS_POKED when modifying orig_eax" might be
something that those kinds of programs have actually noticed and are
actively working around etc. Because while *normal* programs hopefully
don't do insane things on purpose, both wine and dosemu definitely
have been known to very much intentionally do some truly crazy stuff.

Linus


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Andy Lutomirski
On Thu, Mar 30, 2017 at 11:35 AM, Linus Torvalds
 wrote:
> On Thu, Mar 30, 2017 at 11:23 AM, Andy Lutomirski  wrote:
>> On Thu, Mar 30, 2017 at 10:46 AM, Linus Torvalds
>>  wrote:
>>>
>>> We *really* shouldn't sign-extend that value if the debugger ends up
>>> updating the pointer (or maybe the debugger just reloads previous
>>> values, not really "updating" anything - I think that's what gdb does
>>> when you do a call within the context of the debugged program from
>>> within gdb, for example)
>>
>> Can you think of a case where this would actually matter?
>
> I'd actually be willing to have people try, but then you'd have to
> sign-extend *all* registers. No way in hell will I accept a patch that
> randomly sign-extends just %eax.
>
> And then actually run such a kernel on a 32-bit distro, and verifying
> that things like gdb and strace really work. But it needs real
> testing, not some kind of handwaving. It's a *big* change.

I'll offer the following handwave: if there are problems, I'd expect
to see them in mixed-bitness uses, not 32-bit distros.  But the 32-bit
case is worth testing, too.

>
>> --- issue 1 ---
>>
>> get_nr_restart_syscall() does:
>>
>>if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
>>return __NR_ia32_restart_syscall;
>>
>> This is very, very buggy
>
> Quite frankly, a bug somewhere else is not an excuse for then making
> other code buggier. I don't see what "issue 1" has to do with anything
> what-so-ever,.
>
>> --- issue 2 ---
>>
>> syscall_get_error().  This is available on all arches, but it appears
>> to be used *only* on x86.
>
> So I think that one is a red herring. We can trivially fix that by
> just (a) removing everybody elses syscall_get_error(), and just
> inlining the x86 case into the x86 signal handling. Boom, gone.
>
> And in the signal handling path, the sign-extension and test of %eax
> is reivially ok. Not because it's ok in general, but because we've
> verified using %orig_eax that we're in a system call return path. We
> could happily delete the whole TS_I386_REGS_POKED thing, I think.

Then how do we know whether to sign extend?  TS_COMPAT does *not* work
because it isn't set on signal delivery.

>
> So don't even _try_ to equate this code with the general sign
> extension code by ptrace. That is a totally different animal
> altogether.

Of course it's different.  But doing sign extension in general would
avoid the need for this special case, and it seems that the special
case isn't actually correct in cases that people try to use.  Oleg,
can you clarify what's broken?

--Andy


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Linus Torvalds
On Thu, Mar 30, 2017 at 11:23 AM, Andy Lutomirski  wrote:
> On Thu, Mar 30, 2017 at 10:46 AM, Linus Torvalds
>  wrote:
>>
>> We *really* shouldn't sign-extend that value if the debugger ends up
>> updating the pointer (or maybe the debugger just reloads previous
>> values, not really "updating" anything - I think that's what gdb does
>> when you do a call within the context of the debugged program from
>> within gdb, for example)
>
> Can you think of a case where this would actually matter?

I'd actually be willing to have people try, but then you'd have to
sign-extend *all* registers. No way in hell will I accept a patch that
randomly sign-extends just %eax.

And then actually run such a kernel on a 32-bit distro, and verifying
that things like gdb and strace really work. But it needs real
testing, not some kind of handwaving. It's a *big* change.

> --- issue 1 ---
>
> get_nr_restart_syscall() does:
>
>if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
>return __NR_ia32_restart_syscall;
>
> This is very, very buggy

Quite frankly, a bug somewhere else is not an excuse for then making
other code buggier. I don't see what "issue 1" has to do with anything
what-so-ever,.

> --- issue 2 ---
>
> syscall_get_error().  This is available on all arches, but it appears
> to be used *only* on x86.

So I think that one is a red herring. We can trivially fix that by
just (a) removing everybody elses syscall_get_error(), and just
inlining the x86 case into the x86 signal handling. Boom, gone.

And in the signal handling path, the sign-extension and test of %eax
is reivially ok. Not because it's ok in general, but because we've
verified using %orig_eax that we're in a system call return path. We
could happily delete the whole TS_I386_REGS_POKED thing, I think.

But then it's very much about the fact that within the particular case
of the signal handling code and system call return detection, the
whole sign extension checking makes sense.

So don't even _try_ to equate this code with the general sign
extension code by ptrace. That is a totally different animal
altogether.

  Linus


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Andy Lutomirski
On Thu, Mar 30, 2017 at 10:46 AM, Linus Torvalds
 wrote:
> For example, let's assume that %eax contains a 32-bit pointer with the
> high bit set, and we're using a 32-bit debugger on a 32-bit program
> (ie you're just running a 32-bit distro on a 64-bit kernel, which
> people have definitely done).
>
> We *really* shouldn't sign-extend that value if the debugger ends up
> updating the pointer (or maybe the debugger just reloads previous
> values, not really "updating" anything - I think that's what gdb does
> when you do a call within the context of the debugged program from
> within gdb, for example)

Can you think of a case where this would actually matter?

>
> So I really *really* don't think you can just sign-extend %eax. Which
> is exactly why we have that nasty odd sign-extension in the signal
> path instead, but then have to make it conditional on running a 32-bit
> program.
>
> But maybe there is still something I'm not understanding in your
> argument. This thread has been a series of mis-understandings.

As the daft kernel hacker who introduced TS_I386_REGS_POKED in the
first place, I'll try to explain what I think is going on.

TS_I386_REGS_POKED is an enormous kludge, and it serves two purposes.
It avoids a potential security bug that the old code had, and it at
least documents the code paths that are thoroughly broken.  (Before
they were TS_COMPAT instead, but most of the TS_COMPAT users are
fine.)

It's used in two places:

--- issue 1 ---

get_nr_restart_syscall() does:

if (current->thread.status & (TS_COMPAT|TS_I386_REGS_POKED))
return __NR_ia32_restart_syscall;

This is very, very buggy.  Fixing this appears to require somewhat
some surgery.  Proposals include adding new restart_syscall numbers
that match across 32-bit and 64-bit (interacts quite awkwardly with
seccomp) or trying to store syscall bitness along with restart_block
(ick, not actually 100% reliable depending on just how abusing the
debugger is).

--- issue 2 ---

syscall_get_error().  This is available on all arches, but it appears
to be used *only* on x86.  It's used to figure out whether we're
restarting a syscall.  It could plausibly matter if we have a buggy
compat syscall that returns int instead of long, but the main purpose
is for compatibility with 32-bit debuggers.

Neither Oleg nor I have thought of anything other than this code path
that cares at all about the high bits of RAX on a process that's being
poked using 32-bit ptrace.  Sign-extending RAX seems like it would get
rid of this code path entirely to me.

--Andy


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Linus Torvalds
On Thu, Mar 30, 2017 at 8:49 AM, Oleg Nesterov  wrote:
>>
>>   case offsetof(struct user32, regs.orig_eax):
>  ^^
> damn, just in case, of course this is typo, should be
>
> case offsetof(struct user32, regs.eax);

So honestly, the first version of your patch I went "Ok, that looks
perfectly reasonable", because orig_eax is special, and always
sign-extending it sounds fine.

The *fixed* version of your patch I go "no, that's obviously complete
garbage, that's completely bogus".

Because the regular eax register is *not* special, and sign-extending
it sounds very dangerous indeed. The user will actually *use* that
register as a register, unlike orig_eax. And eax is no different from
all the other random registers. Yes, it happens to be the return value
for system calls, but this codepath is not at all limited to system
calls, it's absolutely *any* context when you use gdb on a 32-bit
binary.

Now, you may well say that "the upper bits aren't well-defined in that
case anyway". And you'd be mostly right. Except the semantics of
x86-64 is that 32-bit operations zero-extend the upper bits, so
zero-extension really *is* the right thing to do.

For example, let's assume that %eax contains a 32-bit pointer with the
high bit set, and we're using a 32-bit debugger on a 32-bit program
(ie you're just running a 32-bit distro on a 64-bit kernel, which
people have definitely done).

We *really* shouldn't sign-extend that value if the debugger ends up
updating the pointer (or maybe the debugger just reloads previous
values, not really "updating" anything - I think that's what gdb does
when you do a call within the context of the debugged program from
within gdb, for example)

So I really *really* don't think you can just sign-extend %eax. Which
is exactly why we have that nasty odd sign-extension in the signal
path instead, but then have to make it conditional on running a 32-bit
program.

But maybe there is still something I'm not understanding in your
argument. This thread has been a series of mis-understandings.

   Linus


Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Oleg Nesterov
On 03/30, Oleg Nesterov wrote:
>
> On 03/29, Linus Torvalds wrote:
> >
> > On Wed, Mar 29, 2017 at 11:50 AM, Oleg Nesterov  wrote:
> > >
> > > Again, afaics we only need these compat checks because regs->ax could be
> > > changed by 32-bit debugger without sign-extension.
> >
> > You don't explain how you were planning on *fixing* that code. You
> > know why it exists, but then you just say "let's remove it", without
> > any explanation of what you'd replace it with.
>
> Hmm. I tried to explain... Let me quote my initial email,
>
>   So why we can't simply change putreg32() to always sign-extend regs->ax
>   regs->orig_ax and just do
>
>   static inline long syscall_get_error(struct task_struct *task,
>struct pt_regs *regs)
>   {
>   return regs-ax;
>   }
>
>   ? Or, better, simply kill it and use syscall_get_return_value() in
>   arch/x86/kernel/signal.c.
>
>   Of course, if the tracee is 64-bit and debugger is 32-bit then the
>   unconditional sign-extend can be wrong, but do we really care about
>   this case? This can't really work anyway. And the current code is not
>   right too. Say, debugger nacks the signal which interrupted syscall
>   and sets regs->ax = -ERESTARTSYS to force the restart, this won't work
>   because TS_COMPAT|TS_I386_REGS_POKED are not set.
>
> In short. can the patch below work?
>
> Oleg.
>
> diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
> index 9cc7d5a..96f21fc 100644
> --- a/arch/x86/kernel/ptrace.c
> +++ b/arch/x86/kernel/ptrace.c
> @@ -917,11 +917,14 @@ static int putreg32(struct task_struct *child, unsigned 
> regno, u32 value)
>   R32(edi, di);
>   R32(esi, si);
>   R32(ebp, bp);
> - R32(eax, ax);
>   R32(eip, ip);
>   R32(esp, sp);
>
>   case offsetof(struct user32, regs.orig_eax):
 ^^
damn, just in case, of course this is typo, should be

case offsetof(struct user32, regs.eax);


> + regs->ax = (long) (int) value;
> + break;
> +
> + case offsetof(struct user32, regs.orig_eax):
>   /*
>* Warning: bizarre corner case fixup here.  A 32-bit
>* debugger setting orig_eax to -1 wants to disable
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index b3b98ff..41023f8 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -710,7 +710,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
>   /* Are we from a system call? */
>   if (syscall_get_nr(current, regs) >= 0) {
>   /* If so, check system call restarting.. */
> - switch (syscall_get_error(current, regs)) {
> + switch (syscall_get_return_value(current, regs)) {
>   case -ERESTART_RESTARTBLOCK:
>   case -ERESTARTNOHAND:
>   regs->ax = -EINTR;
> @@ -813,7 +813,7 @@ void do_signal(struct pt_regs *regs)
>   /* Did we come from a system call? */
>   if (syscall_get_nr(current, regs) >= 0) {
>   /* Restart the system call - no handlers present */
> - switch (syscall_get_error(current, regs)) {
> + switch (syscall_get_return_value(current, regs)) {
>   case -ERESTARTNOHAND:
>   case -ERESTARTSYS:
>   case -ERESTARTNOINTR:



Re: syscall_get_error() && TS_ checks

2017-03-30 Thread Oleg Nesterov
On 03/29, Linus Torvalds wrote:
>
> On Wed, Mar 29, 2017 at 11:50 AM, Oleg Nesterov  wrote:
> >
> > Again, afaics we only need these compat checks because regs->ax could be
> > changed by 32-bit debugger without sign-extension.
>
> You don't explain how you were planning on *fixing* that code. You
> know why it exists, but then you just say "let's remove it", without
> any explanation of what you'd replace it with.

Hmm. I tried to explain... Let me quote my initial email,

So why we can't simply change putreg32() to always sign-extend regs->ax
regs->orig_ax and just do

static inline long syscall_get_error(struct task_struct *task,
 struct pt_regs *regs)
{
return regs-ax;
}

? Or, better, simply kill it and use syscall_get_return_value() in
arch/x86/kernel/signal.c.


Of course, if the tracee is 64-bit and debugger is 32-bit then the
unconditional sign-extend can be wrong, but do we really care about
this case? This can't really work anyway. And the current code is not
right too. Say, debugger nacks the signal which interrupted syscall
and sets regs->ax = -ERESTARTSYS to force the restart, this won't work
because TS_COMPAT|TS_I386_REGS_POKED are not set.

In short. can the patch below work?

Oleg.

diff --git a/arch/x86/kernel/ptrace.c b/arch/x86/kernel/ptrace.c
index 9cc7d5a..96f21fc 100644
--- a/arch/x86/kernel/ptrace.c
+++ b/arch/x86/kernel/ptrace.c
@@ -917,11 +917,14 @@ static int putreg32(struct task_struct *child, unsigned 
regno, u32 value)
R32(edi, di);
R32(esi, si);
R32(ebp, bp);
-   R32(eax, ax);
R32(eip, ip);
R32(esp, sp);
 
case offsetof(struct user32, regs.orig_eax):
+   regs->ax = (long) (int) value;
+   break;
+
+   case offsetof(struct user32, regs.orig_eax):
/*
 * Warning: bizarre corner case fixup here.  A 32-bit
 * debugger setting orig_eax to -1 wants to disable
diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
index b3b98ff..41023f8 100644
--- a/arch/x86/kernel/signal.c
+++ b/arch/x86/kernel/signal.c
@@ -710,7 +710,7 @@ handle_signal(struct ksignal *ksig, struct pt_regs *regs)
/* Are we from a system call? */
if (syscall_get_nr(current, regs) >= 0) {
/* If so, check system call restarting.. */
-   switch (syscall_get_error(current, regs)) {
+   switch (syscall_get_return_value(current, regs)) {
case -ERESTART_RESTARTBLOCK:
case -ERESTARTNOHAND:
regs->ax = -EINTR;
@@ -813,7 +813,7 @@ void do_signal(struct pt_regs *regs)
/* Did we come from a system call? */
if (syscall_get_nr(current, regs) >= 0) {
/* Restart the system call - no handlers present */
-   switch (syscall_get_error(current, regs)) {
+   switch (syscall_get_return_value(current, regs)) {
case -ERESTARTNOHAND:
case -ERESTARTSYS:
case -ERESTARTNOINTR:



Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Linus Torvalds
On Wed, Mar 29, 2017 at 11:50 AM, Oleg Nesterov  wrote:
>
> Again, afaics we only need these compat checks because regs->ax could be
> changed by 32-bit debugger without sign-extension.

You don't explain how you were planning on *fixing* that code. You
know why it exists, but then you just say "let's remove it", without
any explanation of what you'd replace it with.

If your suggestion is just that "let's remove it, breaking the known
reason it's there", I really really don't see the upside.

It may be hacky, but it *works*. You seem to be advocating replacing
it with something simpler - "cleaner, but broken".

I really don't see the point of "cleaner, but broken".

The fact is, reality is not "clean". But reality trumps :I wish" and
"make-believe" every single time.

 Linus


Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Oleg Nesterov
On 03/29, Linus Torvalds wrote:
>
> That said, I'm not sure why you want to change this in the first
> place? I think the current syscall_get_error() - with explicit compat
> handling and all - is fine.

To simplify this logic. To kill TS_I386_REGS_POKED (which doesn't really
work and can't) and to remove the subtle dependency on TS_COMPAT in ret-
with-signal paths.

Again, afaics we only need these compat checks because regs->ax could be
changed by 32-bit debugger without sign-extension. And TS_I386_REGS_POKED
means that if TS_COMPAT is not set, then the debugger should have also
changed regs->orig_ax. This mostly works, but imo too fragile.

Currently we have the same check in get_nr_restart_syscall() but it is
even more broken (see the patch/changelog), so it should go away and in
this case it would be nice to avoid these checks in do_signal() path too.

Oleg.



Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Linus Torvalds
On Wed, Mar 29, 2017 at 10:04 AM, Oleg Nesterov  wrote:
>
> Oh, I agree, and let me repeat the 3rd time that I suggest to kill this
> helper and use syscall_get_return_value() in arch/x86/kernel/signal.c,
> it has no other callers.

That is probably fine, I'm just arguing against the suggested changes
to syscall_get_error().

That said, I'm not sure why you want to change this in the first
place? I think the current syscall_get_error() - with explicit compat
handling and all - is fine.

But if the aim is to just remove syscall_get_error() entirely because
it's so unused, then I'm ok with that.

  Linus


Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Oleg Nesterov
On 03/29, Linus Torvalds wrote:
>
> On Wed, Mar 29, 2017 at 9:55 AM, Oleg Nesterov  wrote:
> >
> > Once again, it is only used in arch/x86/kernel/signal.c by do_signal() and
> > handle_signal(). We do not care if mmap() returns a valid pointer with the
> > high bit set, regs-ax can't be confused with -ERESTART code.
>
> Immaterial. If the function is called "get_error()", it sure as hell
> shouldn't return a random non-error value.

Oh, I agree, and let me repeat the 3rd time that I suggest to kill this
helper and use syscall_get_return_value() in arch/x86/kernel/signal.c,
it has no other callers.

Oleg.



Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Linus Torvalds
On Wed, Mar 29, 2017 at 9:55 AM, Oleg Nesterov  wrote:
>
> Once again, it is only used in arch/x86/kernel/signal.c by do_signal() and
> handle_signal(). We do not care if mmap() returns a valid pointer with the
> high bit set, regs-ax can't be confused with -ERESTART code.

Immaterial. If the function is called "get_error()", it sure as hell
shouldn't return a random non-error value.

Code should make sense, otherwise it's not going to be maintainable.
Naming matters. If the code doesn't match the name of the function,
that's a bug regardless of whether it has semantic effects or not in
the end - because somebody will eventually depend on the _expected_
semantics.

  Linus


Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Andy Lutomirski
On Wed, Mar 29, 2017 at 9:45 AM, Linus Torvalds
 wrote:
> On Wed, Mar 29, 2017 at 9:33 AM, Oleg Nesterov  wrote:
>>
>> Firstly, why do we need the IS_ERR_VALUE() check? This is only used by
>> do_signal/handle_signal, we do not care if it returns non-zero as long
>> as the value can't be confused with -ERESTART.* codes.
>
> There are system calls that can return "negative" values that aren't errors.
>
> Notably mmap() can return a valid pointer with the high bit set.
>
> So syscall_get_error() should return 0 for not just positive return
> values, but for those kinds of negative non-error values.
>
>> And why do we need the TS_ checks?
>
> Those may be bogus.
>
>> So why we can't simply change putreg32() to always sign-extend regs->ax
>> regs->orig_ax and just do
>>
>> static inline long syscall_get_error(struct task_struct *task,
>>  struct pt_regs *regs)
>> {
>> return regs-ax;
>> }
>
> That would be *complete* garbage. Lots of system calls return positive
> values that sure as hell aren't errors.

Does this cause an observable problem?  The only things that care are:

a) 32-bit debugger pokes some value with the high bit and a 64-bit
debugger reads it back.  I seriously doubt we care.

b) 32-bit debugger pokes some value with the high bit set and the user
code switches to 64-bit mode and reads RAX.  This case is so
terminally broken anyway that we definitely don't care.

c) 32-bit debugger pokes some value with the high bit set and
syscall_get_error happens.  Oleg's proposed change won't change what
we do, but it will dramatically simplify the code.


Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Oleg Nesterov
On 03/29, Linus Torvalds wrote:
>
> On Wed, Mar 29, 2017 at 9:33 AM, Oleg Nesterov  wrote:
> >
> > Firstly, why do we need the IS_ERR_VALUE() check? This is only used by
> > do_signal/handle_signal, we do not care if it returns non-zero as long
> > as the value can't be confused with -ERESTART.* codes.
>
> There are system calls that can return "negative" values that aren't errors.
>
> Notably mmap() can return a valid pointer with the high bit set.
>
> So syscall_get_error() should return 0 for not just positive return
> values, but for those kinds of negative non-error values.

Once again, it is only used in arch/x86/kernel/signal.c by do_signal() and
handle_signal(). We do not care if mmap() returns a valid pointer with the
high bit set, regs-ax can't be confused with -ERESTART code.

> > And why do we need the TS_ checks?
>
> Those may be bogus.
>
> > So why we can't simply change putreg32() to always sign-extend regs->ax
> > regs->orig_ax and just do
> >
> > static inline long syscall_get_error(struct task_struct *task,
> >  struct pt_regs *regs)
> > {
> > return regs-ax;
> > }
>
> That would be *complete* garbage. Lots of system calls return positive
> values that sure as hell aren't errors.

See above. And please note that I actually suggest to kill this helper and
just use syscall_get_return_value() in arch/x86/kernel/signal.c.

Oleg.



Re: syscall_get_error() && TS_ checks

2017-03-29 Thread Linus Torvalds
On Wed, Mar 29, 2017 at 9:33 AM, Oleg Nesterov  wrote:
>
> Firstly, why do we need the IS_ERR_VALUE() check? This is only used by
> do_signal/handle_signal, we do not care if it returns non-zero as long
> as the value can't be confused with -ERESTART.* codes.

There are system calls that can return "negative" values that aren't errors.

Notably mmap() can return a valid pointer with the high bit set.

So syscall_get_error() should return 0 for not just positive return
values, but for those kinds of negative non-error values.

> And why do we need the TS_ checks?

Those may be bogus.

> So why we can't simply change putreg32() to always sign-extend regs->ax
> regs->orig_ax and just do
>
> static inline long syscall_get_error(struct task_struct *task,
>  struct pt_regs *regs)
> {
> return regs-ax;
> }

That would be *complete* garbage. Lots of system calls return positive
values that sure as hell aren't errors.

Linus