Re: pselect/etc semantics

2019-05-30 Thread Eric Wong
"Eric W. Biederman"  wrote:
> Frankly the only reason this appears to be worth touching is that we
> have a userspace regression.  Otherwise I would call the current
> behavior more correct and desirable than ignoring the signal longer.
> 
> If I am reading sitaution properly I suggest we go back to resoring the
> sigmask by hand in epoll_pwait, and put in a big fat comment about a
> silly userspace program depending on that behavior.
> 
> That will be easier to backport and it will just fix the regression and
> not overfix the problem for reasonable applications.

Fwiw, the cmogstored (userspace program which regressed) only
hit this regression in an obscure code path for tuning; that
code path probably sees no use outside of the test suite.

Add to that, cmogstored is an unofficial and optional component
to the obscure, largely-forgotten MogileFS.

Finally, the main users of cmogstored are on FreeBSD.  They
hit the kqueue+self-pipe code path instead of epoll_pwait.

I only used epoll_pwait on Linux since I figured I could save a
few bytes of memory by skipping eventfd/self-pipe...

Anyways, I updated cmogstored a few weeks back to call `note_run'
(the signal dispatcher) if epoll_pwait (wrapped by `mog_idleq_wait_intr')
returns 0 instead of -1 (EINTR) to workaround this kernel change:

https://bogomips.org/cmogstored-public/20190511075630.17811-...@80x24.org/

I could easily make a change to call `note_run' unconditionally
when `mog_idleq_wait_intr' returns, too.

But that said, there could be other users hitting the same
problem I did.  So maybe cmogstored's primary use on Linux these
days is finding regressions :>


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-30 Thread Arnd Bergmann
On Thu, May 30, 2019 at 4:41 PM Oleg Nesterov  wrote:
> On 05/30, Arnd Bergmann wrote:
> Plus every file touched by this patch asks for more cleanups. Say, do_poll()
> should return -ERESTARTNOHAND, not -EINTR, after that we can remove the ugly
> EINTR->ERESTARTNOHAND in its callers. And more.
>
> > For the stable
> > kernels, I think we want just the addition of the 'bool interrupted' 
> > argument
> > to restore_user_sigmask()
>
> or simply revert this patch. I will check if this is possible today... At 
> first
> glance 854a6ed56839a40f6 fixed another bug by accident, do_pselect() did
> "ret == -ERESTARTNOHAND" after "ret = poll_select_copy_remaining()" which can
> turn ERESTARTNOHAND into EINTR, but this is simple. I'll check tomorrow.

Right, there were several differences between the system calls
that Deepa's original change got rid of. I don't know if any ones besides
the do_pselect() return code can be observed in practice.

> > > -   ret = set_user_sigmask(ksig.sigmask, , , 
> > > ksig.sigsetsize);
> > > +   ret = set_xxx(ksig.sigmask, ksig.sigsetsize);
> > > if (ret)
> > > return ret;
> > >
> > > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : 
> > > NULL);
> > > -   restore_user_sigmask(ksig.sigmask, );
> > > -   if (signal_pending(current) && !ret)
> > > +
> > > +   interrupted = signal_pending(current);
> > > +   update_xxx(interrupted);
> >
> > Maybe name this
> >
> >restore_saved_sigmask_if(!interrupted);
>
> Yes, I thought about restore_if(), but to me
>
> restore_saved_sigmask_if(ret != -EINTR);
>
> doesn't look readable... May be
>
> restore_saved_sigmask_unless(ret == -EINTR);
>
> ? but actually I agree with any naming.

Yes, restore_saved_sigmask_unless() probably better.

> > With some of the recent discussions about compat syscall handling,
> > I now think that we want to just fold set_compat_user_sigmask()
> > into set_user_sigmask()
>
> agreed, and I thought about this too. But again, I'd prefer to do this
> and other cleanups later, on top of this patch.

Ok, fair enough. I don't care much about the order as long as the
regression fix comes first.

 Arnd


Re: pselect/etc semantics

2019-05-30 Thread Arnd Bergmann
On Thu, May 30, 2019 at 3:54 AM Eric W. Biederman  wrote:
> Arnd Bergmann  writes:
> > On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov  wrote:

> >
> > Not sure about the order of the cleanups, but probably something like
> > this would work:
> >
> > 1. fix the race (to be backported)
> > 2. unify set_compat_user_sigmask/set_user_sigmask
> > 3. remove unneeded compat handlers
> > 4. replace restore_user_sigmask with restore_saved_sigmask_if()
> > 5. also unify compat_get_fd_set()/get_fd_set() and kill off
> > compat select() variants.
>
> Are new system calls added preventing a revert of the patch in question
> for stable kernels?

Yes, a straight revert would not work, as it was done as a cleanup in
order to simplify the following conversion. I suppose one could undo
the cleanup in both the time32 and time64 versions of each syscall,
but I would consider that a more risky change than just fixing the
bug that was reported.

   Arnd


Re: pselect/etc semantics

2019-05-30 Thread Eric W. Biederman
Oleg Nesterov  writes:

> On 05/30, Eric W. Biederman wrote:
>>
>> ebied...@xmission.com (Eric W. Biederman) writes:
>>
>> > Which means I believe we have a semantically valid change in behavior
>> > that is causing a regression.
>>
>> I haven't made a survey of all of the functions yet but
>> fucntions return -ENORESTARTNOHAND will never return -EINTR and are
>> immune from this problem.
>
> Hmm. handle_signal:
>
>   case -ERESTARTNOHAND:
>   regs->ax = -EINTR;
>   break;
>
> but I am not sure I understand which problem do you mean..

Yes.  My mistake.  I looked at the transparent restart case for when a
signal is not pending and failed to look at what happens when a signal
is delivered.

So yes.  Everything changed does appear to have a behavioral difference
where they can now succeed and not return -EINTR.

Eric


Re: pselect/etc semantics

2019-05-30 Thread Deepa Dinamani
On Thu, May 30, 2019 at 8:48 AM Deepa Dinamani  wrote:
>
> > On May 30, 2019, at 8:38 AM, Eric W. Biederman  
> > wrote:
> >
> > ebied...@xmission.com (Eric W. Biederman) writes:
> >
> >> Which means I believe we have a semantically valid change in behavior
> >> that is causing a regression.
> >
> > I haven't made a survey of all of the functions yet but
> > fucntions return -ENORESTARTNOHAND will never return -EINTR and are
> > immune from this problem.
> >
> > AKA pselect is fine.  While epoll_pwait can be affected.
>
> This was my understanding as well.

I think I was misremembered here. I had noted this before:
https://lore.kernel.org/linux-fsdevel/CABeXuvq7gCV2qPOo+Q8jvNyRaTvhkRLRbnL_oJ-AuK7Sp=p...@mail.gmail.com/

"sys_io_pgetevents() does not seem to have this problem as we are still
checking signal_pending() here.
sys_pselect6() seems to have a similar problem. The changes to
sys_pselect6() also impact sys_select() as the changes are in the
common code path."

This was the code replaced for io_pgetevents by 854a6ed56839a40f6b is as below.
No matter what events completed, there was signal_pending() check
after the return from do_io_getevents().

--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2110,18 +2110,9 @@ SYSCALL_DEFINE6(io_pgetevents,
return ret;

ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : NULL);
-   if (signal_pending(current)) {
-   if (ksig.sigmask) {
-   current->saved_sigmask = sigsaved;
-   set_restore_sigmask();
-   }
-
-   if (!ret)
-   ret = -ERESTARTNOHAND;
-   } else {
-   if (ksig.sigmask)
-   sigprocmask(SIG_SETMASK, , NULL);
-   }
+   restore_user_sigmask(ksig.sigmask, );
+   if (signal_pending(current) && !ret)
+   ret = -ERESTARTNOHAND;

Can I ask a simple question for my understanding?

man page for epoll_pwait says

EINTR
The call was interrupted by a signal handler before either any of the
requested events occurred or the timeout expired; see signal(7).

But it is not clear to me if we can figure out(without race) the
chronological order if one of the requested events are completed or a
signal came first.
Is this a correct exectation?

Also like pointed out above, this behavior is not consistent for all
such syscalls(io_pgetevents). Was this also by design?

-Deepa


RE: pselect/etc semantics

2019-05-30 Thread David Laight
From: Eric W. Biederman
> Sent: 30 May 2019 16:38
> ebied...@xmission.com (Eric W. Biederman) writes:
> 
> > Which means I believe we have a semantically valid change in behavior
> > that is causing a regression.
> 
> I haven't made a survey of all of the functions yet but
> fucntions return -ENORESTARTNOHAND will never return -EINTR and are
> immune from this problem.

Eh?
ERESTARTNOHAND just makes the system call restart if there is no
signal handler, EINTR should still be returned if there is a handler.

All the functions that have a temporary signal mask are likely
to be expected to work the same way and thus have the same bugs.

http://pubs.opengroup.org/onlinepubs/9699919799/functions/pselect.html#
isn't overly helpful.
But I think it should return EINTR even if there is no handler unless
SA_RESTART is set:

[EINTR]
The function was interrupted while blocked waiting for any of the selected 
descriptors to become ready and before the timeout interval expired.

If SA_RESTART has been set for the interrupting signal, it is 
implementation-defined whether the function restarts or returns with [EINTR].

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: pselect/etc semantics

2019-05-30 Thread Oleg Nesterov
On 05/30, David Laight wrote:
>
> 4) As an optimisation a signal that arrives after the timer
>expires, but before the mask is restored can be 'deemed'
>to have happened before the timeout expired and EINTR
>returned.

This is what pselect/ppoll already does.

Oleg.



Re: pselect/etc semantics

2019-05-30 Thread Oleg Nesterov
On 05/30, Eric W. Biederman wrote:
>
> ebied...@xmission.com (Eric W. Biederman) writes:
>
> > Which means I believe we have a semantically valid change in behavior
> > that is causing a regression.
>
> I haven't made a survey of all of the functions yet but
> fucntions return -ENORESTARTNOHAND will never return -EINTR and are
> immune from this problem.

Hmm. handle_signal:

case -ERESTARTNOHAND:
regs->ax = -EINTR;
break;

but I am not sure I understand which problem do you mean..

Oleg.



Re: pselect/etc semantics

2019-05-30 Thread Oleg Nesterov
On 05/30, Eric W. Biederman wrote:
>
> Oleg Nesterov  writes:
>
> > Al, Linus, Eric, please help.
> >
> > The previous discussion was very confusing, we simply can not understand 
> > each
> > other.
> >
> > To me everything looks very simple and clear, but perhaps I missed something
> > obvious? Please correct me.
>
> If I have read this thread correctly the core issue is that ther is a
> program that used to work and that fails now.  The question is why.

I didn't even try to investigate, I wasn't cc'ed initially and I then I had
enough confusion when I started to look at the patch.

However, 854a6ed56839a40f6 obviously introduced the user-visible change so
I am not surprised it breaks something. And yes, personally I think this
change is not right.

> Which means I believe we have a semantically valid change in behavior
> that is causing a regression.

See below,

> void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
> {
>
>   if (!usigmask)
>   return;
>   /*
>* When signals are pending, do not restore them here.
>* Restoring sigmask here can lead to delivering signals that the above
>* syscalls are intended to block because of the sigmask passed in.
>*/
>   if (signal_pending(current)) {
>   current->saved_sigmask = *sigsaved;
>   set_restore_sigmask();
>   return;
>   }
>
>   /*
>* This is needed because the fast syscall return path does not restore
>* saved_sigmask when signals are not pending.
>*/
>   set_current_blocked(sigsaved);
> }
>
> Which has been reported results in a return value of 0,

0 or success

> and a signal
> delivered, where previously that did not happen.

Yes.

And to me this doesn't look right. OK, OK, probably this is because I never
read the docs, only the source code in fs/select.c. But to me pselect() should
either return success/timeout or deliver a signal. Not both. Even if the signal
was already pending before pselect() was called.

To clarify, "a signal" means a signal which was blocked before pselect(sigmask)
and temporary unblocked in this syscall.

And even if this doesn't violate posix, I see no reason to change the historic
behaviour. And this regression probably means we can't ;)

Oleg.



Re: pselect/etc semantics

2019-05-30 Thread Deepa Dinamani
> On May 30, 2019, at 8:38 AM, Eric W. Biederman  wrote:
>
> ebied...@xmission.com (Eric W. Biederman) writes:
>
>> Which means I believe we have a semantically valid change in behavior
>> that is causing a regression.
>
> I haven't made a survey of all of the functions yet but
> fucntions return -ENORESTARTNOHAND will never return -EINTR and are
> immune from this problem.
>
> AKA pselect is fine.  While epoll_pwait can be affected.

This was my understanding as well.

> Has anyone contacted Omar Kilani to see if that is his issue?
> https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=pa2hjus3js+g9vswfhnqqynpmh...@mail.gmail.com/

Omar was cc-ed when this regression was reported. I did cc him on fix
and asked if he could try it. We have not heard from him.

> So far the only regression report I am seeing is from Eric Wong.
> AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/
> Are there any others?  How did we get to be talking about more
> than just epoll_pwait?

This is the only report that I know of. I’m not sure why people
started talking about pselect. I was also confused why instead of
reviewing the patch and discussing the fix, we ended up talking about
how to simplify the code. We have deviated much from what should have
been a code review.

-Deepa


Re: pselect/etc semantics

2019-05-30 Thread Eric W. Biederman
ebied...@xmission.com (Eric W. Biederman) writes:

> Which means I believe we have a semantically valid change in behavior
> that is causing a regression.

I haven't made a survey of all of the functions yet but
fucntions return -ENORESTARTNOHAND will never return -EINTR and are
immune from this problem.

AKA pselect is fine.  While epoll_pwait can be affected.

Has anyone contacted Omar Kilani to see if that is his issue?
https://lore.kernel.org/lkml/CA+8F9hicnF=kvjXPZFQy=pa2hjus3js+g9vswfhnqqynpmh...@mail.gmail.com/

So far the only regression report I am seeing is from Eric Wong.
AKA https://lore.kernel.org/lkml/20190501021405.hfvd7ps623liu25i@dcvr/
Are there any others?  How did we get to be talking about more
than just epoll_pwait?

Eric


RE: pselect/etc semantics

2019-05-30 Thread David Laight
From: Eric W. Biederman
> Sent: 30 May 2019 14:01
> Oleg Nesterov  writes:
> 
> > Al, Linus, Eric, please help.
> >
> > The previous discussion was very confusing, we simply can not understand 
> > each
> > other.
> >
> > To me everything looks very simple and clear, but perhaps I missed something
> > obvious? Please correct me.
> 
> If I have read this thread correctly the core issue is that ther is a
> program that used to work and that fails now.  The question is why.
> 
> There are two ways the semantics for a sigmask changing system call
> can be defined.  The naive way and by reading posix.  I will pick
> on pselect.
> 
> The naive way:
> int pselect(int nfds, fd_set *readfds, fd_set *writefds,
> fd_set *exceptfds, const struct timespec *timeout,
> const sigset_t *sigmask)
> {
> sigset_t oldmask;
>   int ret;
> 
> if (sigmask)
>   sigprocmask(SIG_SETMASK, sigmask, );
> 
>   ret = select(nfds, readfds, writefds, exceptfds, timeout);
> 
>   if (sigmask)
>   sigprocmask(SIG_SETMASK, , NULL);
> 
> return ret;
> }
> 
> The standard for pselect behavior at
> https://pubs.opengroup.org/onlinepubs/009695399/functions/pselect.html
> says:
> > If sigmask is not a null pointer, then the pselect() function shall
> > replace the signal mask of the caller by the set of signals pointed to
> > by sigmask before examining the descriptors, and shall restore the
> > signal mask of the calling thread before returning.

Right, but that bit says nothing about when signals are 'delivered'.
Section 2.4 of 
http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html 
isn't very enlightening either - well, pretty impenetrable really. 
But since the ToG specs don't require a user-kernel boundary I believe
that the signal handler is expected to be called as soon as it is unmasked.

Now, for async signals, the kernel can always pretend that they happened
slightly later than they actually did.
So signal delivery is delayed until 'return to user'.
In some cases system calls are restarted to make the whole thing transparent.

For pselect() etc I think this means:

1) Signal handlers can only be called if EINTR is returned.
2) If a signal is deliverable after the mask is changed
   then the signal hander must be called.
   This means EINTR must be returned.
   ie this must be detected before checking anything else.
3) A signal that is raised after the wait completes can be
   'deemed' to have happened after the mask is restored
   and left masked until the applications next pselect() call.
4) As an optimisation a signal that arrives after the timer
   expires, but before the mask is restored can be 'deemed'
   to have happened before the timeout expired and EINTR
   returned.

Now not all system calls that use a temporary mask may want to
work that way.
In particular they may want to return 'success' and have the
signal handlers called.

So maybe the set_xxx() function that installs to user's mask
should return:
   -ve:   errno value
 0:   no signal pending
 1:   signal pending.

And the update_xxx() function that restores the saved mask
should take a parameter to indicate whether signal handlers
should be called maybe 'bool call_handlers' and return 0/1
depending on whether signals were pending.

pselect() then contains:
rval = set_xxx();
if (rval) {
if (rval < 0)
return rval;
update_xxx(true);
return -EINTR:
}

rval = wait();

#if 0  // don't report late signals
update_xxx(rval == -EINTR);
#else  // report signals after timeout
if (update_xxx(!rval || rval == -EINTR))
rval == -EINTR;
#endif

return rval;
}

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)



Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-30 Thread Oleg Nesterov
On 05/30, Arnd Bergmann wrote:
>
> I think this is a nice simplification, but it would help not to mix up the
> minimal regression fix with the rewrite of those functions.

Yes, yes, agreed.

Plus every file touched by this patch asks for more cleanups. Say, do_poll()
should return -ERESTARTNOHAND, not -EINTR, after that we can remove the ugly
EINTR->ERESTARTNOHAND in its callers. And more.

> For the stable
> kernels, I think we want just the addition of the 'bool interrupted' argument
> to restore_user_sigmask()

or simply revert this patch. I will check if this is possible today... At first
glance 854a6ed56839a40f6 fixed another bug by accident, do_pselect() did
"ret == -ERESTARTNOHAND" after "ret = poll_select_copy_remaining()" which can
turn ERESTARTNOHAND into EINTR, but this is simple. I'll check tomorrow.


> > -   ret = set_user_sigmask(ksig.sigmask, , , 
> > ksig.sigsetsize);
> > +   ret = set_xxx(ksig.sigmask, ksig.sigsetsize);
> > if (ret)
> > return ret;
> >
> > ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : 
> > NULL);
> > -   restore_user_sigmask(ksig.sigmask, );
> > -   if (signal_pending(current) && !ret)
> > +
> > +   interrupted = signal_pending(current);
> > +   update_xxx(interrupted);
>
> Maybe name this
>
>restore_saved_sigmask_if(!interrupted);

Yes, I thought about restore_if(), but to me

restore_saved_sigmask_if(ret != -EINTR);

doesn't look readable... May be

restore_saved_sigmask_unless(ret == -EINTR);

? but actually I agree with any naming.

> and make restore_saved_sigmask_if() an inline function
> next to restore_saved_sigmask()?

agreed,

> With some of the recent discussions about compat syscall handling,
> I now think that we want to just fold set_compat_user_sigmask()
> into set_user_sigmask()

agreed, and I thought about this too. But again, I'd prefer to do this
and other cleanups later, on top of this patch.

Oleg.



Re: pselect/etc semantics

2019-05-30 Thread Eric W. Biederman
Eric Wong  writes:

> Agreed...  I believe cmogstored has always had a bug in the way
> it uses epoll_pwait because it failed to check interrupts if:
>
> a) an FD is ready + interrupt
> b) epoll_pwait returns 0 on interrupt
>
> The bug remains in userspace for a), which I will fix by adding
> an interrupt check when an FD is ready.  The window is very
> small for a) and difficult to trigger, and also in a rare code
> path.
>
> The b) case is the kernel bug introduced in 854a6ed56839a40f
> ("signal: Add restore_user_sigmask()").
>
> I don't think there's any disagreement that b) is a kernel bug.

See my reply to Oleg.  I think (b) is a regression that needs to be
fixed.  I do not think that (b) is a kernel bug.  Both versions of the
of what sigmask means posix and naive will allow (b).

Because fundamentally the sigmask is restored after the rest of the
system call happens.

Eric


Re: pselect/etc semantics

2019-05-30 Thread Eric W. Biederman
Oleg Nesterov  writes:

> Al, Linus, Eric, please help.
>
> The previous discussion was very confusing, we simply can not understand each
> other.
>
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.

If I have read this thread correctly the core issue is that ther is a
program that used to work and that fails now.  The question is why.

There are two ways the semantics for a sigmask changing system call
can be defined.  The naive way and by reading posix.  I will pick
on pselect.

The naive way:
int pselect(int nfds, fd_set *readfds, fd_set *writefds,
fd_set *exceptfds, const struct timespec *timeout,
const sigset_t *sigmask)
{
sigset_t oldmask;
int ret;

if (sigmask)
sigprocmask(SIG_SETMASK, sigmask, );

ret = select(nfds, readfds, writefds, exceptfds, timeout);

if (sigmask)
sigprocmask(SIG_SETMASK, , NULL);

return ret;
}

The standard for pselect behavior at
https://pubs.opengroup.org/onlinepubs/009695399/functions/pselect.html
says:
> If sigmask is not a null pointer, then the pselect() function shall
> replace the signal mask of the caller by the set of signals pointed to
> by sigmask before examining the descriptors, and shall restore the
> signal mask of the calling thread before returning.

...

> On failure, the objects pointed to by the readfds, writefds, and
> errorfds arguments shall not be modified. If the timeout interval
> expires without the specified condition being true for any of the
> specified file descriptors, the objects pointed to by the readfds,
> writefds, and errorfds arguments shall have all bits set to 0.


So the standard specified behavior is if the return value is -EINTR you
know you were interrupted by a signal, for any other return value you
don't know.

An error value just indicates that the file descriptor sets have not
been updated.


That is what the standard and reasonable people will expect from the
system call.  Looking at set_user_sigmask and restore_user_sigmask
I believe we are don't violate those semantics today.

Which means I believe we have a semantically valid change in behavior
that is causing a regression.

>From my inspection of the code the change in behavior is from these
two pieces of code:

>From the v4.18 epoll_pwait.

/*
 * If we changed the signal mask, we need to restore the original one.
 * In case we've got a signal while waiting, we do not restore the
 * signal mask yet, and we allow do_signal() to deliver the signal on
 * the way back to userspace, before the signal mask is restored.
 */
if (sigmask) {
if (error == -EINTR) {
memcpy(>saved_sigmask, ,
   sizeof(sigsaved));
set_restore_sigmask();
} else
set_current_blocked();
}

/*
 * restore_user_sigmask:
 * usigmask: sigmask passed in from userland.
 * sigsaved: saved sigmask when the syscall started and changed the sigmask to
 *   usigmask.
 *
 * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
 * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
 */
void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
{

if (!usigmask)
return;
/*
 * When signals are pending, do not restore them here.
 * Restoring sigmask here can lead to delivering signals that the above
 * syscalls are intended to block because of the sigmask passed in.
 */
if (signal_pending(current)) {
current->saved_sigmask = *sigsaved;
set_restore_sigmask();
return;
}

/*
 * This is needed because the fast syscall return path does not restore
 * saved_sigmask when signals are not pending.
 */
set_current_blocked(sigsaved);
}

Which has been reported results in a return value of 0, and a signal
delivered, where previously that did not happen.

They only way I can see that happening is that set_current_blocked in
__set_task_blocked clears pending signals (that will be blocked) and
calls retarget_shared_pending.

Frankly the only reason this appears to be worth touching is that we
have a userspace regression.  Otherwise I would call the current
behavior more correct and desirable than ignoring the signal longer.

If I am reading sitaution properly I suggest we go back to resoring the
sigmask by hand in epoll_pwait, and put in a big fat comment about a
silly userspace program depending on that behavior.

That will be easier to backport and it will just fix the regression and
not overfix the problem for reasonable applications.

Oleg your cleanup also seems reasonable.  But I would lik

RE: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-30 Thread David Laight
From: Eric Wong
> Sent: 29 May 2019 19:50
...
> > Personally I think that is wrong.
> > Given code like the above that has:
> > while (!interrupted) {
> > pselect(..., );
> > // process available data
> > }
> >
> > You want the signal handler to be executed even if one of the fds
> > always has available data.
> > Otherwise you can't interrupt a process that is always busy.

FWIW in the past I've seen a process that loops select-accept-fork-exec
get its priority reduced to the point where it never blocked
in select. The client side retries made it go badly wrong!
If it had limited when SIG_INT was detected it would have been
a little more difficult to kill.

> Agreed...  I believe cmogstored has always had a bug in the way
> it uses epoll_pwait because it failed to check interrupts if:
> 
> a) an FD is ready + interrupt
> b) epoll_pwait returns 0 on interrupt

But the kernel code seems to only call the signal handler
(for signals that are enabled during pselect() (etc)) if
the underlying wait is interrupted.

> The bug remains in userspace for a), which I will fix by adding
> an interrupt check when an FD is ready.  The window is very
> small for a) and difficult to trigger, and also in a rare code
> path.
> 
> The b) case is the kernel bug introduced in 854a6ed56839a40f
> ("signal: Add restore_user_sigmask()").
> 
> I don't think there's any disagreement that b) is a kernel bug.

If the signal is raised after the wait has timed out but before
the signal mask is restored.

> So the confusion is for a), and POSIX is not clear w.r.t. how
> pselect/poll works when there's both FD readiness and an
> interrupt.
> 
> Thus I'm inclined to believe *select/*poll/epoll_*wait should
> follow POSIX read() semantics:
> 
>If a read() is interrupted by a signal before it reads any data, it 
> shall
>return −1 with errno set to [EINTR].
> 
>If  a  read()  is  interrupted by a signal after it has successfully 
> read
>some data, it shall return the number of bytes read.

Except that above you want different semantics :-)
For read() any signal handler is always called.
And the return value of a non-blocking read that returns no data
is not affected by any pending signals.

For pselect() that would mean that if the wait timed out (result 0)
and then a signal was raised (before the mask got changed back) then
the return value would be zero and the signal handler would be called.
So your (b) above is not a bug.
Even select() can return 'timeout' and have a signal handler called.

There are really two separate issues:
1) Signals that are pending when the new mask is applied.
2) Signals that are raised after the wait terminates (success or timeout).
If the signal handlers for (2) are not called then they become (1)
next time around the application loop.

Maybe the 'restore sigmask' function should be passed an indication
of whether it is allowed to let signal handler be called and return
whether they would be called (ie whether it restored the signal mask
or left it for the syscall exit code to do after calling the signal
handlers).

That would allow epoll() to convert timeout+pending signal to EINTR,
or to allow all handlers be called regardless of the return value.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: pselect/etc semantics

2019-05-29 Thread Eric W. Biederman
Arnd Bergmann  writes:

> On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov  wrote:
>>
>> Al, Linus, Eric, please help.
>>
>> The previous discussion was very confusing, we simply can not understand each
>> other.
>>
>> To me everything looks very simple and clear, but perhaps I missed something
>> obvious? Please correct me.
>
> Thanks for the elaborate explanation in this patch, it all starts making sense
> to me now. I also looked at your patch in detail and thought I had found
> a few mistakes at first but those all turned out to be mistakes in my reading.
>
>> See the compile-tested patch at the end. Of course, the new _xxx() helpers
>> should be renamed somehow. fs/aio.c doesn't look right with or without this
>> patch, but iiuc this is what it did before 854a6ed56839a.
>
> I think this is a nice simplification, but it would help not to mix up the
> minimal regression fix with the rewrite of those functions. For the stable
> kernels, I think we want just the addition of the 'bool interrupted' argument
> to restore_user_sigmask() to close the race that was introduced
> 854a6ed56839a. Following up on that for future kernels, your patch
> improves the readability, but we can probably take it even further.
>
>> -   ret = set_user_sigmask(ksig.sigmask, , , 
>> ksig.sigsetsize);
>> +   ret = set_xxx(ksig.sigmask, ksig.sigsetsize);
>> if (ret)
>> return ret;
>>
>> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : 
>> NULL);
>> -   restore_user_sigmask(ksig.sigmask, );
>> -   if (signal_pending(current) && !ret)
>> +
>> +   interrupted = signal_pending(current);
>> +   update_xxx(interrupted);
>
> Maybe name this
>
>restore_saved_sigmask_if(!interrupted);
>
> and make restore_saved_sigmask_if() an inline function
> next to restore_saved_sigmask()?
>
>> @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
>> if (usig && copy_from_user(, usig, sizeof(ksig)))
>> return -EFAULT;
>>
>> -   ret = set_compat_user_sigmask(ksig.sigmask, , , 
>> ksig.sigsetsize);
>> +   ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize);
>> if (ret)
>> return ret;
>
> With some of the recent discussions about compat syscall handling,
> I now think that we want to just fold set_compat_user_sigmask()
> into set_user_sigmask() (whatever they get called in the end)
> with an in_compat_syscall() conditional inside it, and completely get
> rid of the COMPAT_SYSCALL_DEFINEx() definitions for those
> system calls for which this is the only difference.
>
> Unfortunately we still need the time32/time64 distinction, but removing
> syscall handlers is a significant cleanup here already, and we can
> move most of the function body of sys_io_pgetevents() into
> do_io_getevents() in the process. Same for some of the other calls.
>
> Not sure about the order of the cleanups, but probably something like
> this would work:
>
> 1. fix the race (to be backported)
> 2. unify set_compat_user_sigmask/set_user_sigmask
> 3. remove unneeded compat handlers
> 4. replace restore_user_sigmask with restore_saved_sigmask_if()
> 5. also unify compat_get_fd_set()/get_fd_set() and kill off
> compat select() variants.

Are new system calls added preventing a revert of the patch in question
for stable kernels?

Eric


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Arnd Bergmann
On Wed, May 29, 2019 at 6:12 PM Oleg Nesterov  wrote:
>
> Al, Linus, Eric, please help.
>
> The previous discussion was very confusing, we simply can not understand each
> other.
>
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.

Thanks for the elaborate explanation in this patch, it all starts making sense
to me now. I also looked at your patch in detail and thought I had found
a few mistakes at first but those all turned out to be mistakes in my reading.

> See the compile-tested patch at the end. Of course, the new _xxx() helpers
> should be renamed somehow. fs/aio.c doesn't look right with or without this
> patch, but iiuc this is what it did before 854a6ed56839a.

I think this is a nice simplification, but it would help not to mix up the
minimal regression fix with the rewrite of those functions. For the stable
kernels, I think we want just the addition of the 'bool interrupted' argument
to restore_user_sigmask() to close the race that was introduced
854a6ed56839a. Following up on that for future kernels, your patch
improves the readability, but we can probably take it even further.

> -   ret = set_user_sigmask(ksig.sigmask, , , 
> ksig.sigsetsize);
> +   ret = set_xxx(ksig.sigmask, ksig.sigsetsize);
> if (ret)
> return ret;
>
> ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ?  : 
> NULL);
> -   restore_user_sigmask(ksig.sigmask, );
> -   if (signal_pending(current) && !ret)
> +
> +   interrupted = signal_pending(current);
> +   update_xxx(interrupted);

Maybe name this

   restore_saved_sigmask_if(!interrupted);

and make restore_saved_sigmask_if() an inline function
next to restore_saved_sigmask()?

> @@ -2201,13 +2205,15 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
> if (usig && copy_from_user(, usig, sizeof(ksig)))
> return -EFAULT;
>
> -   ret = set_compat_user_sigmask(ksig.sigmask, , , 
> ksig.sigsetsize);
> +   ret = set_compat_xxx(ksig.sigmask, ksig.sigsetsize);
> if (ret)
> return ret;

With some of the recent discussions about compat syscall handling,
I now think that we want to just fold set_compat_user_sigmask()
into set_user_sigmask() (whatever they get called in the end)
with an in_compat_syscall() conditional inside it, and completely get
rid of the COMPAT_SYSCALL_DEFINEx() definitions for those
system calls for which this is the only difference.

Unfortunately we still need the time32/time64 distinction, but removing
syscall handlers is a significant cleanup here already, and we can
move most of the function body of sys_io_pgetevents() into
do_io_getevents() in the process. Same for some of the other calls.

Not sure about the order of the cleanups, but probably something like
this would work:

1. fix the race (to be backported)
2. unify set_compat_user_sigmask/set_user_sigmask
3. remove unneeded compat handlers
4. replace restore_user_sigmask with restore_saved_sigmask_if()
5. also unify compat_get_fd_set()/get_fd_set() and kill off
compat select() variants.

   Arnd


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Eric Wong
David Laight  wrote:
> From: Oleg Nesterov
> > Sent: 29 May 2019 17:12
> > Al, Linus, Eric, please help.
> > 
> > The previous discussion was very confusing, we simply can not understand 
> > each
> > other.
> > 
> > To me everything looks very simple and clear, but perhaps I missed something
> > obvious? Please correct me.
> > 
> > I think that the following code is correct
> > 
> > int interrupted = 0;
> > 
> > void sigint_handler(int sig)
> > {
> > interrupted = 1;
> > }
> > 
> > int main(void)
> > {
> > sigset_t sigint, empty;
> > 
> > sigemptyset();
> > sigaddset(, SIGINT);
> > sigprocmask(SIG_BLOCK, , NULL);
> > 
> > signal(SIGINT, sigint_handler);
> > 
> > sigemptyset();// so pselect() unblocks SIGINT
> > 
> > ret = pselect(..., );
> ^ sigint
> > 
> >     if (ret >= 0)   // sucess or timeout
> > assert(!interrupted);
> > 
> > if (interrupted)
> > assert(ret == -EINTR);
> > }
> > 
> > IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, 
> > this
> > signal should not be delivered if a ready fd was found or timeout. The 
> > signal
> > handle should only run if ret == -EINTR.
> 
> Personally I think that is wrong.
> Given code like the above that has:
>   while (!interrupted) {
>   pselect(..., );
>   // process available data
>   }
> 
> You want the signal handler to be executed even if one of the fds
> always has available data.
> Otherwise you can't interrupt a process that is always busy.

Agreed...  I believe cmogstored has always had a bug in the way
it uses epoll_pwait because it failed to check interrupts if:

a) an FD is ready + interrupt
b) epoll_pwait returns 0 on interrupt

The bug remains in userspace for a), which I will fix by adding
an interrupt check when an FD is ready.  The window is very
small for a) and difficult to trigger, and also in a rare code
path.

The b) case is the kernel bug introduced in 854a6ed56839a40f
("signal: Add restore_user_sigmask()").

I don't think there's any disagreement that b) is a kernel bug.

So the confusion is for a), and POSIX is not clear w.r.t. how
pselect/poll works when there's both FD readiness and an
interrupt.

Thus I'm inclined to believe *select/*poll/epoll_*wait should
follow POSIX read() semantics:

   If a read() is interrupted by a signal before it reads any data, it shall
   return −1 with errno set to [EINTR].

   If  a  read()  is  interrupted by a signal after it has successfully read
   some data, it shall return the number of bytes read.

> One option is to return -EINTR if a signal is pending when the mask
> is updated - before even looking at anything else.
>
> Signals that happen later on (eg after a timeout) need not be reported
> (until the next time around the loop).

I'm not sure that's necessary and it would cause delays in
signal handling.


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Deepa Dinamani
On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov  wrote:
>
> Al, Linus, Eric, please help.
>
> The previous discussion was very confusing, we simply can not understand each
> other.
>
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.
>
> I think that the following code is correct
>
> int interrupted = 0;
>
> void sigint_handler(int sig)
> {
> interrupted = 1;
> }
>
> int main(void)
> {
> sigset_t sigint, empty;
>
> sigemptyset();
> sigaddset(, SIGINT);
> sigprocmask(SIG_BLOCK, , NULL);
>
> signal(SIGINT, sigint_handler);
>
>     sigemptyset();// so pselect() unblocks SIGINT
>
> ret = pselect(..., );
>
> if (ret >= 0)   // sucess or timeout
> assert(!interrupted);
>
>     if (interrupted)
> assert(ret == -EINTR);
> }
>
> IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this
> signal should not be delivered if a ready fd was found or timeout. The signal
> handle should only run if ret == -EINTR.

I do not think we discussed this part earlier. But, if this is true
then this is what is wrong as part of 854a6ed56839a. I missed that
before.

> (pselect() can be interrupted by any other signal which has a handler. In this
>  case the handler can be called even if ret >= 0. This is correct, I fail to
>  understand why some people think this is wrong, and in any case we simply 
> can't
>  avoid this).

This patch is wrong because I did not know that it was ok to deliver a
signal and not set the errno before. I also admitted to this. And
proposed another way to revert the patch.:
https://lore.kernel.org/lkml/CABeXuvouBzZuNarmNcd9JgZgvonL1N_p21gat=O_x0-1hMx=6...@mail.gmail.com/

-Deepa


Re: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Deepa Dinamani
Resending due to inadvertent conversion of prior message to html.

On Wed, May 29, 2019 at 9:12 AM Oleg Nesterov  wrote:
> Al, Linus, Eric, please help.
>
> The previous discussion was very confusing, we simply can not understand each
> other.
>
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.
>
> I think that the following code is correct
>
> int interrupted = 0;
>
> void sigint_handler(int sig)
> {
> interrupted = 1;
> }
>
> int main(void)
> {
> sigset_t sigint, empty;
>
> sigemptyset();
> sigaddset(, SIGINT);
> sigprocmask(SIG_BLOCK, , NULL);
>
>     signal(SIGINT, sigint_handler);
>
>     sigemptyset();// so pselect() unblocks SIGINT
>
> ret = pselect(..., );
>
> if (ret >= 0)   // sucess or timeout
> assert(!interrupted);
>
>     if (interrupted)
> assert(ret == -EINTR);
> }
>
> IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this
> signal should not be delivered if a ready fd was found or timeout. The signal
> handle should only run if ret == -EINTR.
>
> (pselect() can be interrupted by any other signal which has a handler. In this
>  case the handler can be called even if ret >= 0. This is correct, I fail to
>  understand why some people think this is wrong, and in any case we simply 
> can't
>  avoid this).
>
> This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"),
> now this is broken by the signal_pending() check in restore_user_sigmask().
>
> This patch 
> https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/
> turns 0 into -EINTR if signal_pending(), but I think we should simply restore
> the old behaviour and simplify the code.
>
> See the compile-tested patch at the end. Of course, the new _xxx() helpers
> should be renamed somehow. fs/aio.c doesn't look right with or without this
> patch, but iiuc this is what it did before 854a6ed56839a.
>
> Let me show the code with the patch applied. I am using epoll_pwait() as an
> example because it looks very simple.
>
>
> static inline void set_restore_sigmask(void)
> {
> // WARN_ON(!TIF_SIGPENDING) was removed by this patch
> current->restore_sigmask = true;
> }
>
> int set_xxx(const sigset_t __user *umask, size_t sigsetsize)
> {
> sigset_t *kmask;
>
> if (!umask)
> return 0;
> if (sigsetsize != sizeof(sigset_t))
> return -EINVAL;
> if (copy_from_user(kmask, umask, sizeof(sigset_t)))
> return -EFAULT;
>
> // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning
> // until the syscall returns.
> set_restore_sigmask();
> current->saved_sigmask = current->blocked;
> set_current_blocked(kmask);
>
> return 0;
> }
>
>
> void update_xxx(bool interrupted)
> {
> // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was 
> "moved"
> // from set_restore_sigmask() above.
> if (interrupted)
> WARN_ON(!test_thread_flag(TIF_SIGPENDING));
> else
> restore_saved_sigmask();
> }
>
> SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, 
> events,
> int, maxevents, int, timeout, const sigset_t __user 
> *, sigmask,
> size_t, sigsetsize)
> {
> int error;
>
> error = set_xxx(sigmask, sigsetsize);
> if (error)
> return error;
>
> error = do_epoll_wait(epfd, events, maxevents, timeout);
> update_xxx(error == -EINTR);
>
> return error;
> }
>
> Oleg.
> ---
>
>  fs/aio.c | 40 ++-
>  fs/eventpoll.c   | 12 +++
>  fs/io_uring.c| 12 +++
>  fs/select.c  | 40 +++
>  include/linux/compat.h   |  4 +--
>  include/linux/sched/signal.h |  2 --
>  include/linux/signal.h   |  6 ++--
>  kernel/signal.c  | 77 
> -

RE: pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread David Laight
From: Oleg Nesterov
> Sent: 29 May 2019 17:12
> Al, Linus, Eric, please help.
> 
> The previous discussion was very confusing, we simply can not understand each
> other.
> 
> To me everything looks very simple and clear, but perhaps I missed something
> obvious? Please correct me.
> 
> I think that the following code is correct
> 
>   int interrupted = 0;
> 
>   void sigint_handler(int sig)
>   {
>   interrupted = 1;
>   }
> 
>   int main(void)
>   {
>   sigset_t sigint, empty;
> 
>   sigemptyset();
>   sigaddset(, SIGINT);
>   sigprocmask(SIG_BLOCK, , NULL);
> 
>   signal(SIGINT, sigint_handler);
> 
>       sigemptyset();// so pselect() unblocks SIGINT
> 
>   ret = pselect(..., );
^ sigint
> 
>   if (ret >= 0)   // sucess or timeout
>   assert(!interrupted);
> 
>   if (interrupted)
>   assert(ret == -EINTR);
>   }
> 
> IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this
> signal should not be delivered if a ready fd was found or timeout. The signal
> handle should only run if ret == -EINTR.

Personally I think that is wrong.
Given code like the above that has:
while (!interrupted) {
pselect(..., );
// process available data
}

You want the signal handler to be executed even if one of the fds
always has available data.
Otherwise you can't interrupt a process that is always busy.

One option is to return -EINTR if a signal is pending when the mask
is updated - before even looking at anything else.
Signals that happen later on (eg after a timeout) need not be reported
(until the next time around the loop).

> (pselect() can be interrupted by any other signal which has a handler. In this
>  case the handler can be called even if ret >= 0. This is correct, I fail to
>  understand why some people think this is wrong, and in any case we simply 
> can't
>  avoid this).

You mean any signal that isn't blocked when pselect() is called

> This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"),
> now this is broken by the signal_pending() check in restore_user_sigmask().
> 
> This patch 
> https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/
> turns 0 into -EINTR if signal_pending(), but I think we should simply restore
> the old behaviour and simplify the code.
> 
> See the compile-tested patch at the end. Of course, the new _xxx() helpers
> should be renamed somehow. fs/aio.c doesn't look right with or without this
> patch, but iiuc this is what it did before 854a6ed56839a.
> 
> Let me show the code with the patch applied. I am using epoll_pwait() as an
> example because it looks very simple.
> 
> 
>   static inline void set_restore_sigmask(void)
>   {
> // WARN_ON(!TIF_SIGPENDING) was removed by this patch
>   current->restore_sigmask = true;
>   }
> 
>   int set_xxx(const sigset_t __user *umask, size_t sigsetsize)
>   {
>   sigset_t *kmask;
 ^ no '*' here, add & before uses.
> 
>   if (!umask)
>   return 0;
>   if (sigsetsize != sizeof(sigset_t))
>   return -EINVAL;
>   if (copy_from_user(kmask, umask, sizeof(sigset_t)))
>   return -EFAULT;
> 
> // we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning
> // until the syscall returns.
>   set_restore_sigmask();
>   current->saved_sigmask = current->blocked;
>   set_current_blocked(kmask);
> 
>   return 0;
>   }
> 
> 
>   void update_xxx(bool interrupted)
>   {
> // the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was 
> "moved"
> // from set_restore_sigmask() above.
>   if (interrupted)
>   WARN_ON(!test_thread_flag(TIF_SIGPENDING));
>   else
>   restore_saved_sigmask();
>   }

I looked at the code earlier, but failed to find the code that actually
delivers the signals.
It may be 'racy' with update_xxx() regardless of whether that is
looking for -EINTR or just a pending signal.

I assume that TIF_SIGPENGING is used to (not) short-circuit the
system call return path so that signals get delivered.
So that it is important that update_xxx() calls restore_saved_sigmask()
if there is no signal pending.
(Although a signal can happe

pselect/etc semantics (Was: [PATCH v2] signal: Adjust error codes according to restore_user_sigmask())

2019-05-29 Thread Oleg Nesterov
Al, Linus, Eric, please help.

The previous discussion was very confusing, we simply can not understand each
other.

To me everything looks very simple and clear, but perhaps I missed something
obvious? Please correct me.

I think that the following code is correct

int interrupted = 0;

void sigint_handler(int sig)
{
interrupted = 1;
}

int main(void)
{
sigset_t sigint, empty;

sigemptyset();
sigaddset(, SIGINT);
sigprocmask(SIG_BLOCK, , NULL);

signal(SIGINT, sigint_handler);

sigemptyset();// so pselect() unblocks SIGINT

ret = pselect(..., );

if (ret >= 0)   // sucess or timeout
assert(!interrupted);

if (interrupted)
assert(ret == -EINTR);
}

IOW, if pselect(sigmask) temporary unblocks SIGINT according to sigmask, this
signal should not be delivered if a ready fd was found or timeout. The signal
handle should only run if ret == -EINTR.

(pselect() can be interrupted by any other signal which has a handler. In this
 case the handler can be called even if ret >= 0. This is correct, I fail to
 understand why some people think this is wrong, and in any case we simply can't
 avoid this).

This was true until 854a6ed56839a ("signal: Add restore_user_sigmask()"),
now this is broken by the signal_pending() check in restore_user_sigmask().

This patch 
https://lore.kernel.org/lkml/20190522032144.10995-1-deepa.ker...@gmail.com/
turns 0 into -EINTR if signal_pending(), but I think we should simply restore
the old behaviour and simplify the code.

See the compile-tested patch at the end. Of course, the new _xxx() helpers
should be renamed somehow. fs/aio.c doesn't look right with or without this
patch, but iiuc this is what it did before 854a6ed56839a.

Let me show the code with the patch applied. I am using epoll_pwait() as an
example because it looks very simple.


static inline void set_restore_sigmask(void)
{
// WARN_ON(!TIF_SIGPENDING) was removed by this patch
current->restore_sigmask = true;
}

int set_xxx(const sigset_t __user *umask, size_t sigsetsize)
{
sigset_t *kmask;

if (!umask)
return 0;
if (sigsetsize != sizeof(sigset_t))
return -EINVAL;
if (copy_from_user(kmask, umask, sizeof(sigset_t)))
return -EFAULT;

// we can safely modify ->saved_sigmask/restore_sigmask, they has no meaning
// until the syscall returns.
set_restore_sigmask();
current->saved_sigmask = current->blocked;
set_current_blocked(kmask);

return 0;
}


void update_xxx(bool interrupted)
{
// the main reason for this helper is WARN_ON(!TIF_SIGPENDING) which was "moved"
// from set_restore_sigmask() above.
if (interrupted)
WARN_ON(!test_thread_flag(TIF_SIGPENDING));
else
restore_saved_sigmask();
}

SYSCALL_DEFINE6(epoll_pwait, int, epfd, struct epoll_event __user *, 
events,
int, maxevents, int, timeout, const sigset_t __user *, 
sigmask,
size_t, sigsetsize)
{
int error;

error = set_xxx(sigmask, sigsetsize);
if (error)
return error;

error = do_epoll_wait(epfd, events, maxevents, timeout);
update_xxx(error == -EINTR);

return error;
}

Oleg.
---

 fs/aio.c | 40 ++-
 fs/eventpoll.c   | 12 +++
 fs/io_uring.c| 12 +++
 fs/select.c  | 40 +++
 include/linux/compat.h   |  4 +--
 include/linux/sched/signal.h |  2 --
 include/linux/signal.h   |  6 ++--
 kernel/signal.c  | 77 
 8 files changed, 74 insertions(+), 119 deletions(-)


diff --git a/fs/aio.c b/fs/aio.c
index 3490d1f..8315bd2 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2093,8 +2093,8 @@ SYSCALL_DEFINE6(io_pgetevents,
const struct __aio_sigset __user *, usig)
 {
struct __aio_sigset ksig = { NULL, };
-   sigset_tksigmask, sigsaved;
struct timespec64   ts;
+   bool interrupted;
int ret;
 
if (timeout && unlikely(get_timespec64(, timeout)))
@@ -2103,13 +2103,15 @@ SYSCALL_DEFINE6(io_pgetevents,
if (usig && copy_from_user(, usig, sizeof(ksig)))
return -EFAULT;
 
-   ret = set_user_sigmask(ksig.sigmask, , , 
ksig.sigsetsize

Re: [PATCH v4 0/5] y2038: Make ppoll, io_pgetevents and pselect y2038 safe

2018-12-06 Thread Arnd Bergmann
On 9/20/18, Deepa Dinamani  wrote:
> The series transitions the ppoll, io_getevents, and pselect syscalls
> to be y2038 safe.
>
> This is part of the work proceeding for syscalls for y2038.
> It is based on the series [1] from Arnd Bergmann.
>
> The overview of the series is as below:
> 1. Refactor sigmask handling logic for the above syscalls.
> 2. Provide y2038 safe versions of the syscalls for all ABIs.
>
> [1] https://lkml.org/lkml/2018/8/27/651

I'm sorry for dropping the ball on this earlier, so it already
missed the 4.20 merge window, and is getting almost too
late for 4.21 as well now.

I've applied your five patches to my y2038 tree now, so
it should at least be in linux-next as of tomorrow and
make it into 4.21. I'll see what other patches I have in
my backlog that we should get into that release before
I send him a pull request.

  Arnd


[RFC PATCH 2/3] use hrtimer in select and pselect

2007-03-04 Thread Arnd Bergmann
This changes the select and pselect system calls to use the
new schedule_timeout_hr function. Since many applications
use the select function instead of nanosleep, this provides
a higher resolution sleep to them.

BUG: the same needs to be done for the compat syscalls, the
current patch breaks building on 64 bit machines.

Signed-off-by: Arnd Bergmann <[EMAIL PROTECTED]>

Index: linux-cg/fs/select.c
===
--- linux-cg.orig/fs/select.c
+++ linux-cg/fs/select.c
@@ -189,7 +189,7 @@ get_max:
 #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
 #define POLLEX_SET (POLLPRI)
 
-int do_select(int n, fd_set_bits *fds, s64 *timeout)
+int do_select(int n, fd_set_bits *fds, ktime_t *timeout)
 {
struct poll_wqueues table;
poll_table *wait;
@@ -205,12 +205,11 @@ int do_select(int n, fd_set_bits *fds, s
 
poll_initwait();
wait = 
-   if (!*timeout)
+   if (timeout && !timeout->tv64)
wait = NULL;
retval = 0;
for (;;) {
unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
-   long __timeout;
 
set_current_state(TASK_INTERRUPTIBLE);
 
@@ -266,27 +265,19 @@ int do_select(int n, fd_set_bits *fds, s
*rexp = res_ex;
}
wait = NULL;
-   if (retval || !*timeout || signal_pending(current))
+   if (retval || (timeout && !timeout->tv64)
+   || signal_pending(current))
break;
if(table.error) {
retval = table.error;
break;
}
 
-   if (*timeout < 0) {
+   if (!timeout || timeout->tv64 < 0)
/* Wait indefinitely */
-   __timeout = MAX_SCHEDULE_TIMEOUT;
-   } else if (unlikely(*timeout >= (s64)MAX_SCHEDULE_TIMEOUT - 1)) 
{
-   /* Wait for longer than MAX_SCHEDULE_TIMEOUT. Do it in 
a loop */
-   __timeout = MAX_SCHEDULE_TIMEOUT - 1;
-   *timeout -= __timeout;
-   } else {
-   __timeout = *timeout;
-   *timeout = 0;
-   }
-   __timeout = schedule_timeout(__timeout);
-   if (*timeout >= 0)
-   *timeout += __timeout;
+   schedule();
+   else
+   *timeout = schedule_timeout_hr(*timeout);
}
__set_current_state(TASK_RUNNING);
 
@@ -307,7 +298,7 @@ int do_select(int n, fd_set_bits *fds, s
((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1)
 
 static int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
-  fd_set __user *exp, s64 *timeout)
+  fd_set __user *exp, ktime_t *timeout)
 {
fd_set_bits fds;
void *bits;
@@ -384,7 +375,7 @@ out_nofds:
 asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp)
 {
-   s64 timeout = -1;
+   ktime_t timeout, *timeoutp = NULL;
struct timeval tv;
int ret;
 
@@ -395,24 +386,20 @@ asmlinkage long sys_select(int n, fd_set
if (tv.tv_sec < 0 || tv.tv_usec < 0)
return -EINVAL;
 
+   timeout = timeval_to_ktime(tv);
/* Cast to u64 to make GCC stop complaining */
-   if ((u64)tv.tv_sec >= (u64)MAX_INT64_SECONDS)
-   timeout = -1;   /* infinite */
-   else {
-   timeout = ROUND_UP(tv.tv_usec, USEC_PER_SEC/HZ);
-   timeout += tv.tv_sec * HZ;
-   }
+   if ((u64)tv.tv_sec < (u64)MAX_INT64_SECONDS)
+   timeoutp = 
}
 
-   ret = core_sys_select(n, inp, outp, exp, );
+   ret = core_sys_select(n, inp, outp, exp, timeoutp);
 
if (tvp) {
struct timeval rtv;
 
if (current->personality & STICKY_TIMEOUTS)
goto sticky;
-   rtv.tv_usec = jiffies_to_usecs(do_div((*(u64*)), HZ));
-   rtv.tv_sec = timeout;
+   rtv = ktime_to_timeval(timeout);
if (timeval_compare(, ) >= 0)
rtv = tv;
if (copy_to_user(tvp, , sizeof(rtv))) {
@@ -438,7 +425,7 @@ asmlinkage long sys_pselect7(int n, fd_s
fd_set __user *exp, struct timespec __user *tsp,
const sigset_t __user *sigmask, size_t sigsetsize)
 {
-   s64 timeout = MAX_SCHEDULE_TIMEOUT;
+   ktime_t timeout, *timeoutp = NULL;
sigset_t ksigmask, sigsaved;
struct timespec ts;
int ret;
@@ -450,13 +437,11 @@ asmlinkag

[RFC PATCH 2/3] use hrtimer in select and pselect

2007-03-04 Thread Arnd Bergmann
This changes the select and pselect system calls to use the
new schedule_timeout_hr function. Since many applications
use the select function instead of nanosleep, this provides
a higher resolution sleep to them.

BUG: the same needs to be done for the compat syscalls, the
current patch breaks building on 64 bit machines.

Signed-off-by: Arnd Bergmann [EMAIL PROTECTED]

Index: linux-cg/fs/select.c
===
--- linux-cg.orig/fs/select.c
+++ linux-cg/fs/select.c
@@ -189,7 +189,7 @@ get_max:
 #define POLLOUT_SET (POLLWRBAND | POLLWRNORM | POLLOUT | POLLERR)
 #define POLLEX_SET (POLLPRI)
 
-int do_select(int n, fd_set_bits *fds, s64 *timeout)
+int do_select(int n, fd_set_bits *fds, ktime_t *timeout)
 {
struct poll_wqueues table;
poll_table *wait;
@@ -205,12 +205,11 @@ int do_select(int n, fd_set_bits *fds, s
 
poll_initwait(table);
wait = table.pt;
-   if (!*timeout)
+   if (timeout  !timeout-tv64)
wait = NULL;
retval = 0;
for (;;) {
unsigned long *rinp, *routp, *rexp, *inp, *outp, *exp;
-   long __timeout;
 
set_current_state(TASK_INTERRUPTIBLE);
 
@@ -266,27 +265,19 @@ int do_select(int n, fd_set_bits *fds, s
*rexp = res_ex;
}
wait = NULL;
-   if (retval || !*timeout || signal_pending(current))
+   if (retval || (timeout  !timeout-tv64)
+   || signal_pending(current))
break;
if(table.error) {
retval = table.error;
break;
}
 
-   if (*timeout  0) {
+   if (!timeout || timeout-tv64  0)
/* Wait indefinitely */
-   __timeout = MAX_SCHEDULE_TIMEOUT;
-   } else if (unlikely(*timeout = (s64)MAX_SCHEDULE_TIMEOUT - 1)) 
{
-   /* Wait for longer than MAX_SCHEDULE_TIMEOUT. Do it in 
a loop */
-   __timeout = MAX_SCHEDULE_TIMEOUT - 1;
-   *timeout -= __timeout;
-   } else {
-   __timeout = *timeout;
-   *timeout = 0;
-   }
-   __timeout = schedule_timeout(__timeout);
-   if (*timeout = 0)
-   *timeout += __timeout;
+   schedule();
+   else
+   *timeout = schedule_timeout_hr(*timeout);
}
__set_current_state(TASK_RUNNING);
 
@@ -307,7 +298,7 @@ int do_select(int n, fd_set_bits *fds, s
((unsigned long) (MAX_SCHEDULE_TIMEOUT / HZ)-1)
 
 static int core_sys_select(int n, fd_set __user *inp, fd_set __user *outp,
-  fd_set __user *exp, s64 *timeout)
+  fd_set __user *exp, ktime_t *timeout)
 {
fd_set_bits fds;
void *bits;
@@ -384,7 +375,7 @@ out_nofds:
 asmlinkage long sys_select(int n, fd_set __user *inp, fd_set __user *outp,
fd_set __user *exp, struct timeval __user *tvp)
 {
-   s64 timeout = -1;
+   ktime_t timeout, *timeoutp = NULL;
struct timeval tv;
int ret;
 
@@ -395,24 +386,20 @@ asmlinkage long sys_select(int n, fd_set
if (tv.tv_sec  0 || tv.tv_usec  0)
return -EINVAL;
 
+   timeout = timeval_to_ktime(tv);
/* Cast to u64 to make GCC stop complaining */
-   if ((u64)tv.tv_sec = (u64)MAX_INT64_SECONDS)
-   timeout = -1;   /* infinite */
-   else {
-   timeout = ROUND_UP(tv.tv_usec, USEC_PER_SEC/HZ);
-   timeout += tv.tv_sec * HZ;
-   }
+   if ((u64)tv.tv_sec  (u64)MAX_INT64_SECONDS)
+   timeoutp = timeout;
}
 
-   ret = core_sys_select(n, inp, outp, exp, timeout);
+   ret = core_sys_select(n, inp, outp, exp, timeoutp);
 
if (tvp) {
struct timeval rtv;
 
if (current-personality  STICKY_TIMEOUTS)
goto sticky;
-   rtv.tv_usec = jiffies_to_usecs(do_div((*(u64*)timeout), HZ));
-   rtv.tv_sec = timeout;
+   rtv = ktime_to_timeval(timeout);
if (timeval_compare(rtv, tv) = 0)
rtv = tv;
if (copy_to_user(tvp, rtv, sizeof(rtv))) {
@@ -438,7 +425,7 @@ asmlinkage long sys_pselect7(int n, fd_s
fd_set __user *exp, struct timespec __user *tsp,
const sigset_t __user *sigmask, size_t sigsetsize)
 {
-   s64 timeout = MAX_SCHEDULE_TIMEOUT;
+   ktime_t timeout, *timeoutp = NULL;
sigset_t ksigmask, sigsaved;
struct timespec ts;
int ret;
@@ -450,13 +437,11 @@ asmlinkage long sys_pselect7(int n, fd_s

Re: Add pselect, ppoll system calls.

2005-08-26 Thread Michael Kerrisk
David,

> On Fri, 2005-08-05 at 14:49 +0200, Michael Kerrisk wrote:
> > If I change your program to do something like the above, I also
> > do not see a message from the handler -- i.e., it is not being
> > called, and I'm pretty sure it should be.
> 
> Hm, yes. What happens is we come back out of the select() immediately
> because of the pending signal, but on the way back to userspace we put the
> old signal mask back... so by the time we check for it, there _is_ no
> (unblocked) signal pending. 
> 
> If it's mandatory that we actually call the signal handler, 

I'm just about to go off on holiday, and don't have a chance to pull up 
all the relevant standards details at them moment.  However, I'm fairly 
sure that the signal handler should be called.  (Try running a modified 
version of my program on Solaris 10 or the Unix-03 conversion of AIX 
(5.3?).)

> then we need to
> play tricks like sigsuspend() does to leave the old signal mask on the
> stack frame. That's a bit painful atm because do_signal is different
> between architectures. 

Yes, I'd say the behaviour should in fact be like what sigsuspend() does.

Cheers,

Michael
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Add pselect, ppoll system calls.

2005-08-26 Thread Michael Kerrisk
David,

 On Fri, 2005-08-05 at 14:49 +0200, Michael Kerrisk wrote:
  If I change your program to do something like the above, I also
  do not see a message from the handler -- i.e., it is not being
  called, and I'm pretty sure it should be.
 
 Hm, yes. What happens is we come back out of the select() immediately
 because of the pending signal, but on the way back to userspace we put the
 old signal mask back... so by the time we check for it, there _is_ no
 (unblocked) signal pending. 
 
 If it's mandatory that we actually call the signal handler, 

I'm just about to go off on holiday, and don't have a chance to pull up 
all the relevant standards details at them moment.  However, I'm fairly 
sure that the signal handler should be called.  (Try running a modified 
version of my program on Solaris 10 or the Unix-03 conversion of AIX 
(5.3?).)

 then we need to
 play tricks like sigsuspend() does to leave the old signal mask on the
 stack frame. That's a bit painful atm because do_signal is different
 between architectures. 

Yes, I'd say the behaviour should in fact be like what sigsuspend() does.

Cheers,

Michael
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Add pselect, ppoll system calls.

2005-08-24 Thread Ulrich Drepper
David Woodhouse wrote:
> If it's mandatory that we actually call the signal handler, then we need
> to play tricks like sigsuspend() does to leave the old signal mask on
> the stack frame. That's a bit painful atm because do_signal is different
> between architectures. 

It is necessary that the handler is called.  This is the purpose of
these interfaces.  If this means more complexity is needed then this is
how the cookie crumbles.  One use case for pselect would be something
like this:


int got_signal;
void sigint_handler(int sig) {
  got_signal = 1;
}

{
  ...
  while (1) {
if (!got_signal)
  pselect()

if (got_signal) {
  handle signal
  got_signal = 0;
}
  }
  ...
}

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



signature.asc
Description: OpenPGP digital signature


Re: Add pselect, ppoll system calls.

2005-08-24 Thread Ulrich Drepper
David Woodhouse wrote:
 If it's mandatory that we actually call the signal handler, then we need
 to play tricks like sigsuspend() does to leave the old signal mask on
 the stack frame. That's a bit painful atm because do_signal is different
 between architectures. 

It is necessary that the handler is called.  This is the purpose of
these interfaces.  If this means more complexity is needed then this is
how the cookie crumbles.  One use case for pselect would be something
like this:


int got_signal;
void sigint_handler(int sig) {
  got_signal = 1;
}

{
  ...
  while (1) {
if (!got_signal)
  pselect()

if (got_signal) {
  handle signal
  got_signal = 0;
}
  }
  ...
}

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



signature.asc
Description: OpenPGP digital signature


Re: pselect() modifying timeout

2005-08-08 Thread Alan Cox
On Gwe, 2005-08-05 at 12:42 +0200, Michael Kerrisk wrote:
> 1. POSIX made the behaviour of pselect() explicit -- the 
>timeout must not be modified.  The idea was to avoid the 
>vagueness of the select() specification; it had to be vague 
>because of existing implementations. By contrast, there were 

Unfortunately it made the wrong choice with pselect, as Linux select
experience has shown the modified timeout is *very* useful data to some
applications. So the patch is better than the POSUX behaviour. The
library can wrap it to provide the poorer standards compliant API while
not stopping people using the better one for Linux specific apps.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pselect() modifying timeout

2005-08-08 Thread Alan Cox
On Gwe, 2005-08-05 at 12:42 +0200, Michael Kerrisk wrote:
 1. POSIX made the behaviour of pselect() explicit -- the 
timeout must not be modified.  The idea was to avoid the 
vagueness of the select() specification; it had to be vague 
because of existing implementations. By contrast, there were 

Unfortunately it made the wrong choice with pselect, as Linux select
experience has shown the modified timeout is *very* useful data to some
applications. So the patch is better than the POSUX behaviour. The
library can wrap it to provide the poorer standards compliant API while
not stopping people using the better one for Linux specific apps.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pselect() modifying timeout

2005-08-05 Thread Michael Kerrisk
> Michael Kerrisk wrote:
> > Please consider making Linux pselect() conform to POSIX on this 
> > point.
> 
> There is no question the implementation will conform.  But this not
> dependent on changing the syscall interface.  We deliberately decided to
> not require the kernel interface to be different from select.  The
> userlevel code will take care of the difference.  The kernel code is good
> as proposed.

Okay -- thanks for the info.

Cheers,

Michael



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pselect() modifying timeout

2005-08-05 Thread Ulrich Drepper
Michael Kerrisk wrote:
> Please consider making Linux pselect() conform to POSIX on this 
> point.

There is no question the implementation will conform.  But this not
dependent on changing the syscall interface.  We deliberately decided to
not require the kernel interface to be different from select.  The
userlevel code will take care of the difference.  The kernel code is
good as proposed.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



signature.asc
Description: OpenPGP digital signature


Re: Add pselect, ppoll system calls.

2005-08-05 Thread Michael Kerrisk
> From: David Woodhouse <[EMAIL PROTECTED]>
> Subject: Re: Add pselect, ppoll system calls.
> Date: Wed, 15 Jun 2005 12:36:54 +0100
> 
> On Mon, 2005-06-13 at 08:38 -0700, Ulrich Drepper wrote:
> > > Eep -- I hadn't noticed that difference. Will update patch
> > accordingly. 
> > 
> > And change [the poll timeout] to expect a 64bit value I hope...
> 
> I believe that 64-bit int types in syscalls are a pain on some
> architectures because of restrictions on precisely which register pairs
> may be used. I think I'd rather just use a struct timespec, which also
> makes it consistent with pselect().
> 
> > > The other documented difference (other than the signal mask bit) is
> > > that
> > > pselect() is documented not to modify the timeout parameter. I'm not
> > > sure whether I should preserve that difference, or not.
> > 
> > As long as there is a configuration where the timeout value is not
> > modified, it doesn't matter.  That is the case for select() using a
> > personality switch.
> 
> I've made pselect() consistent with select() by using the same
> personality switch to control its behaviour.
> 
> I've also fixed select() so that timeouts of greater than LONG_MAX
> jiffies are implemented accurately, instead of being infinite.

Hi David,

I applied the 15 Jun version of your patch agains 2.6.12 and 
tried a test program.  I see some behaviour that puzzles me.
When I run the program below, and type control-C, the program
tells me that the SIGINT hander was *not* called:

./t_pselect -
Making syscall(SYS_pselect6) call
[Type ^C]
pselect: Unknown error 514   <-- ERESTNOHAND
ready = -1
stdin IS NOT readable
signal handler WAS NOT called

I'm not sure whether this is possibly a mistake in the way 
I've constructed my test program (I don't think so, but I'm 
not !00% confident), or some bug in the implementation
(I did try making a syscall(SYS__newselect), and that *did* 
show the signal handler being called.  Also, I now have a test 
result on Solaris 10 for pselect(), and it shows the signal 
handler is called in this case.)

Cheers,

Michael

/* t_pselect.c

   Michael Kerrisk, Aug 2005
*/
#if defined(__linux__) && !defined(NO_SYSCALL)
#define USE_PSELECT_SYSCALL 1
#endif

#ifdef USE_PSELECT_SYSCALL
#define _GNU_SOURCE
#include 
#endif
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#define errMsg(msg) do { perror(msg); } while (0)

#define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#ifdef USE_PSELECT_SYSCALL
/* Following are for x86 */
#define SYS_pselect6   289
#define SYS_ppoll  290
#endif

sig_atomic_t gotsig = 0;

static void
handler(int sig)
{
gotsig = 1;
printf("Caught signal %d\n", sig);  /* UNSAFE (see Section $$$) */
} /* handler */

int
main(int argc, char *argv[])
{
fd_set readfds;
int ready, nfds;
struct timespec timeout;
struct timespec *pto;
struct sigaction sa;
#ifdef USE_PSELECT_SYSCALL
struct {
sigset_t *sp;
size_t size;
} pselarg6;
#endif
sigset_t empty, all;

if (argc != 2) {
fprintf(stderr, "Usage: %s {nsecs|-}\n", argv[1]);
exit(EXIT_FAILURE);
}

setbuf(stdout, NULL);

   /* Block all sigs */

sigfillset();
if (sigprocmask(SIG_BLOCK, , NULL) == -1) errExit("sigprocmask");

/* Establish hander for SIGINT */

sa.sa_handler = handler;
sigemptyset(_mask);
sa.sa_flags = 0;
if (sigaction(SIGINT, , NULL) == -1) errExit("sigaction");

/* Timeout is specified in argv[1] */

if (strcmp(argv[1], "-") == 0) {
pto = NULL; /* Infinite timeout */
} else {
pto = 
timeout.tv_sec = atoi(argv[1]);
timeout.tv_nsec = 0;
}

/* Initialize descriptor set */

nfds = 1;
FD_ZERO();
FD_SET(STDIN_FILENO, );

sigemptyset();
/* sigaddset(, SIGINT); */

/* Make the call */

#ifdef USE_PSELECT_SYSCALL
#if 1
printf("Making syscall(SYS_pselect6) call\n");
pselarg6.sp = 
pselarg6.size = 8; /* sizeof(sigset_t) */
ready = syscall(SYS_pselect6, nfds, , NULL, NULL,
pto, );
#else
/* The following is just some testing cruft */
{
struct timeval tv;
struct timeval *ptv;

if (strcmp(argv[1], "-") == 0) {
ptv = NULL; /* Infinite timeout */
} else {
ptv = 
tv.tv_sec = atoi(argv[1]);
tv.tv_usec = 0;
}

if (sigprocmask(SIG_SETMASK, , NULL) == -1)
errExit("sigprocmask");

printf("Making syscall(SYS__newselect) call\n");
ready = syscall(SYS__newselect, nfds, , NULL, NULL, ptv);
    printf(" Ignore timeout informati

pselect() modifying timeout

2005-08-05 Thread Michael Kerrisk
Hello David,

By the way, looking at the comments to the last version of
the pselect()/ppoll()patch, I see that the treatment of 
the timeout argument is made dependent on the personality.  

http://marc.theaimsgroup.com/?l=linux-kernel=111883591220436=2

I'm not sure that this is a good idea; my reasons as follows:

1. POSIX made the behaviour of pselect() explicit -- the 
   timeout must not be modified.  The idea was to avoid the 
   vagueness of the select() specification; it had to be vague 
   because of existing implementations. By contrast, there were 
   no pre-existing implementations when pselect() was specified.  
   (By the way, although one or two posts in the earlier thread 
   implied that pselect() has long/widely been present on 
   some systems, this is almost certainly not true.  The only 
   systems where I believe it is currently implemented are two that 
   were recently Unix 03 certified: Solaris 10 and AIX (5.3?).  I 
   know from doing quite a bit of checking that it is not present 
   as a kernel implementation on most (all?) other systems (even
   though it was already described by POSIX.1g and Richard 
   Stevens 7 years ago))  

   I haven't tested Solaris 10 and AIX, but I think one can be 
   reasonably sure that they would conform to the letter of 
   POSIX law.  Lacking any strong reason to the contrary, 
   Linux should (IMO) too (why gratuitously introduce 
   differences across implementations?).

2. The existing (non-atomic) glibc pselect() implementation 
   does not change the timeout argument.

Please consider making Linux pselect() conform to POSIX on this 
point.

Cheers,

Michael

-- 
5 GB Mailbox, 50 FreeSMS http://www.gmx.net/de/go/promail
+++ GMX - die erste Adresse für Mail, Message, More +++
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


pselect() modifying timeout

2005-08-05 Thread Michael Kerrisk
Hello David,

By the way, looking at the comments to the last version of
the pselect()/ppoll()patch, I see that the treatment of 
the timeout argument is made dependent on the personality.  

http://marc.theaimsgroup.com/?l=linux-kernelm=111883591220436w=2

I'm not sure that this is a good idea; my reasons as follows:

1. POSIX made the behaviour of pselect() explicit -- the 
   timeout must not be modified.  The idea was to avoid the 
   vagueness of the select() specification; it had to be vague 
   because of existing implementations. By contrast, there were 
   no pre-existing implementations when pselect() was specified.  
   (By the way, although one or two posts in the earlier thread 
   implied that pselect() has long/widely been present on 
   some systems, this is almost certainly not true.  The only 
   systems where I believe it is currently implemented are two that 
   were recently Unix 03 certified: Solaris 10 and AIX (5.3?).  I 
   know from doing quite a bit of checking that it is not present 
   as a kernel implementation on most (all?) other systems (even
   though it was already described by POSIX.1g and Richard 
   Stevens 7 years ago))  

   I haven't tested Solaris 10 and AIX, but I think one can be 
   reasonably sure that they would conform to the letter of 
   POSIX law.  Lacking any strong reason to the contrary, 
   Linux should (IMO) too (why gratuitously introduce 
   differences across implementations?).

2. The existing (non-atomic) glibc pselect() implementation 
   does not change the timeout argument.

Please consider making Linux pselect() conform to POSIX on this 
point.

Cheers,

Michael

-- 
5 GB Mailbox, 50 FreeSMS http://www.gmx.net/de/go/promail
+++ GMX - die erste Adresse für Mail, Message, More +++
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Add pselect, ppoll system calls.

2005-08-05 Thread Michael Kerrisk
 From: David Woodhouse [EMAIL PROTECTED]
 Subject: Re: Add pselect, ppoll system calls.
 Date: Wed, 15 Jun 2005 12:36:54 +0100
 
 On Mon, 2005-06-13 at 08:38 -0700, Ulrich Drepper wrote:
   Eep -- I hadn't noticed that difference. Will update patch
  accordingly. 
  
  And change [the poll timeout] to expect a 64bit value I hope...
 
 I believe that 64-bit int types in syscalls are a pain on some
 architectures because of restrictions on precisely which register pairs
 may be used. I think I'd rather just use a struct timespec, which also
 makes it consistent with pselect().
 
   The other documented difference (other than the signal mask bit) is
   that
   pselect() is documented not to modify the timeout parameter. I'm not
   sure whether I should preserve that difference, or not.
  
  As long as there is a configuration where the timeout value is not
  modified, it doesn't matter.  That is the case for select() using a
  personality switch.
 
 I've made pselect() consistent with select() by using the same
 personality switch to control its behaviour.
 
 I've also fixed select() so that timeouts of greater than LONG_MAX
 jiffies are implemented accurately, instead of being infinite.

Hi David,

I applied the 15 Jun version of your patch agains 2.6.12 and 
tried a test program.  I see some behaviour that puzzles me.
When I run the program below, and type control-C, the program
tells me that the SIGINT hander was *not* called:

./t_pselect -
Making syscall(SYS_pselect6) call
[Type ^C]
pselect: Unknown error 514   -- ERESTNOHAND
ready = -1
stdin IS NOT readable
signal handler WAS NOT called

I'm not sure whether this is possibly a mistake in the way 
I've constructed my test program (I don't think so, but I'm 
not !00% confident), or some bug in the implementation
(I did try making a syscall(SYS__newselect), and that *did* 
show the signal handler being called.  Also, I now have a test 
result on Solaris 10 for pselect(), and it shows the signal 
handler is called in this case.)

Cheers,

Michael

/* t_pselect.c

   Michael Kerrisk, Aug 2005
*/
#if defined(__linux__)  !defined(NO_SYSCALL)
#define USE_PSELECT_SYSCALL 1
#endif

#ifdef USE_PSELECT_SYSCALL
#define _GNU_SOURCE
#include syscall.h
#endif
#include sys/time.h
#include sys/select.h
#include signal.h
#include sys/types.h
#include stdio.h
#include stdlib.h
#include unistd.h
#include string.h
#include errno.h

#define errMsg(msg) do { perror(msg); } while (0)

#define errExit(msg)do { perror(msg); exit(EXIT_FAILURE); \
} while (0)

#ifdef USE_PSELECT_SYSCALL
/* Following are for x86 */
#define SYS_pselect6   289
#define SYS_ppoll  290
#endif

sig_atomic_t gotsig = 0;

static void
handler(int sig)
{
gotsig = 1;
printf(Caught signal %d\n, sig);  /* UNSAFE (see Section $$$) */
} /* handler */

int
main(int argc, char *argv[])
{
fd_set readfds;
int ready, nfds;
struct timespec timeout;
struct timespec *pto;
struct sigaction sa;
#ifdef USE_PSELECT_SYSCALL
struct {
sigset_t *sp;
size_t size;
} pselarg6;
#endif
sigset_t empty, all;

if (argc != 2) {
fprintf(stderr, Usage: %s {nsecs|-}\n, argv[1]);
exit(EXIT_FAILURE);
}

setbuf(stdout, NULL);

   /* Block all sigs */

sigfillset(all);
if (sigprocmask(SIG_BLOCK, all, NULL) == -1) errExit(sigprocmask);

/* Establish hander for SIGINT */

sa.sa_handler = handler;
sigemptyset(sa.sa_mask);
sa.sa_flags = 0;
if (sigaction(SIGINT, sa, NULL) == -1) errExit(sigaction);

/* Timeout is specified in argv[1] */

if (strcmp(argv[1], -) == 0) {
pto = NULL; /* Infinite timeout */
} else {
pto = timeout;
timeout.tv_sec = atoi(argv[1]);
timeout.tv_nsec = 0;
}

/* Initialize descriptor set */

nfds = 1;
FD_ZERO(readfds);
FD_SET(STDIN_FILENO, readfds);

sigemptyset(empty);
/* sigaddset(empty, SIGINT); */

/* Make the call */

#ifdef USE_PSELECT_SYSCALL
#if 1
printf(Making syscall(SYS_pselect6) call\n);
pselarg6.sp = empty;
pselarg6.size = 8; /* sizeof(sigset_t) */
ready = syscall(SYS_pselect6, nfds, readfds, NULL, NULL,
pto, pselarg6);
#else
/* The following is just some testing cruft */
{
struct timeval tv;
struct timeval *ptv;

if (strcmp(argv[1], -) == 0) {
ptv = NULL; /* Infinite timeout */
} else {
ptv = tv;
tv.tv_sec = atoi(argv[1]);
tv.tv_usec = 0;
}

if (sigprocmask(SIG_SETMASK, empty, NULL) == -1)
errExit(sigprocmask);

printf(Making syscall(SYS__newselect) call\n);
ready = syscall(SYS__newselect, nfds, readfds, NULL, NULL, ptv);
printf( Ignore timeout information printed below \n);
}
#endif


#else
/* This is how a proper pselect() call looks on other
   implementations, or when calling

Re: pselect() modifying timeout

2005-08-05 Thread Ulrich Drepper
Michael Kerrisk wrote:
 Please consider making Linux pselect() conform to POSIX on this 
 point.

There is no question the implementation will conform.  But this not
dependent on changing the syscall interface.  We deliberately decided to
not require the kernel interface to be different from select.  The
userlevel code will take care of the difference.  The kernel code is
good as proposed.

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖



signature.asc
Description: OpenPGP digital signature


Re: pselect() modifying timeout

2005-08-05 Thread Michael Kerrisk
 Michael Kerrisk wrote:
  Please consider making Linux pselect() conform to POSIX on this 
  point.
 
 There is no question the implementation will conform.  But this not
 dependent on changing the syscall interface.  We deliberately decided to
 not require the kernel interface to be different from select.  The
 userlevel code will take care of the difference.  The kernel code is good
 as proposed.

Okay -- thanks for the info.

Cheers,

Michael



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: pselect

2001-03-19 Thread Jens-Uwe Mager

>For people who prefer programming above documenting,
>here is a simple small thing to do:
>
>POSIX.1g and Austin document a pselect() call intended to
>remove the race condition that is present when one wants
>to wait on either a signal or some file descriptor.
>(See also Stevens, Unix Network Programming, Volume 1, 2nd Ed.,
>1998, p. 168 and the pselect.2 man page released today.)
>Glibc 2.0 has a bad version (wrong number of parameters)
>and glibc 2.1 a better version, but the whole purpose
>of pselect is to avoid the race, and glibc cannot do that,
>one needs kernel support.
>So, probably someone should make a system call pselect
>almost identical to the present select, adding a sigmask
>parameter. (Or something more general.)

Well, pselect is not that interesting I suppose, ppoll would be my
favorite call to make use of.

-- 
Jens-Uwe Mager  mailto:62CFDB25>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



Re: pselect

2001-03-19 Thread Jens-Uwe Mager

For people who prefer programming above documenting,
here is a simple small thing to do:

POSIX.1g and Austin document a pselect() call intended to
remove the race condition that is present when one wants
to wait on either a signal or some file descriptor.
(See also Stevens, Unix Network Programming, Volume 1, 2nd Ed.,
1998, p. 168 and the pselect.2 man page released today.)
Glibc 2.0 has a bad version (wrong number of parameters)
and glibc 2.1 a better version, but the whole purpose
of pselect is to avoid the race, and glibc cannot do that,
one needs kernel support.
So, probably someone should make a system call pselect
almost identical to the present select, adding a sigmask
parameter. (Or something more general.)

Well, pselect is not that interesting I suppose, ppoll would be my
favorite call to make use of.

-- 
Jens-Uwe Mager  pgp-mailto:62CFDB25
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



pselect

2001-03-18 Thread Andries . Brouwer

For people who prefer programming above documenting,
here is a simple small thing to do:

POSIX.1g and Austin document a pselect() call intended to
remove the race condition that is present when one wants
to wait on either a signal or some file descriptor.
(See also Stevens, Unix Network Programming, Volume 1, 2nd Ed.,
1998, p. 168 and the pselect.2 man page released today.)
Glibc 2.0 has a bad version (wrong number of parameters)
and glibc 2.1 a better version, but the whole purpose
of pselect is to avoid the race, and glibc cannot do that,
one needs kernel support.
So, probably someone should make a system call pselect
almost identical to the present select, adding a sigmask
parameter. (Or something more general.)

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



pselect

2001-03-18 Thread Andries . Brouwer

For people who prefer programming above documenting,
here is a simple small thing to do:

POSIX.1g and Austin document a pselect() call intended to
remove the race condition that is present when one wants
to wait on either a signal or some file descriptor.
(See also Stevens, Unix Network Programming, Volume 1, 2nd Ed.,
1998, p. 168 and the pselect.2 man page released today.)
Glibc 2.0 has a bad version (wrong number of parameters)
and glibc 2.1 a better version, but the whole purpose
of pselect is to avoid the race, and glibc cannot do that,
one needs kernel support.
So, probably someone should make a system call pselect
almost identical to the present select, adding a sigmask
parameter. (Or something more general.)

Andries
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/