Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Jeff King
On Mon, Apr 01, 2013 at 11:22:36PM -0600, Felipe Contreras wrote:

> On Mon, Apr 1, 2013 at 11:14 PM, Jeff King  wrote:
> > On Mon, Apr 01, 2013 at 11:11:20PM -0600, Felipe Contreras wrote:
> >
> >> > But if we know from reading waitpid(3) that waitpid should only fail due
> >> > to EINTR, or due to bogus arguments (e.g., a pid that does not exist or
> >> > has already been reaped), then maybe something like this makes sense:
> >> >
> >> >   while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
> >> >   ; /* nothing */
> >>
> >> But we don't want to wait synchronously here, we just want to ping.
> >
> > Yeah, sorry, I forgot the WNOHANG there.
> 
> It still can potentially stay in a loop for some cycles.

That should be OK; it's the same loop we use in wait_or_whine (and that
is in fact how I managed to get the WNOHANG wrong, as I copied the loop
from there but forgot to update the flag variable).  A few cycles is OK,
as it is really about handling a simultaneous signal; it should be rare
that we loop at all, and even rarer to loop more than a single time. On
Linux, I don't think we will ever get EINTR at all, according to the
manpage; however, POSIX seems to allow EINTR even with WNOHANG.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Felipe Contreras
On Mon, Apr 1, 2013 at 11:14 PM, Jeff King  wrote:
> On Mon, Apr 01, 2013 at 11:11:20PM -0600, Felipe Contreras wrote:
>
>> > But if we know from reading waitpid(3) that waitpid should only fail due
>> > to EINTR, or due to bogus arguments (e.g., a pid that does not exist or
>> > has already been reaped), then maybe something like this makes sense:
>> >
>> >   while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
>> >   ; /* nothing */
>>
>> But we don't want to wait synchronously here, we just want to ping.
>
> Yeah, sorry, I forgot the WNOHANG there.

It still can potentially stay in a loop for some cycles.

>> > After the fix above, yes; in the original we would always have exited
>> > already.
>>
>> No:
>>
>> +   if (waiting != cmd->pid)
>> +   return 1;
>>
>> If waiting < 0, waiting != cmd->pid, and therefore this return is not
>> triggered, and there's only one more return at the end of the
>> function.
>
> Are my eyes not working? If waiting < 0, then waiting != cmd->pid, and
> therefore this return _is_ triggered.

Oh, right, it's only after the modification that the code works.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Jeff King
On Mon, Apr 01, 2013 at 11:11:20PM -0600, Felipe Contreras wrote:

> > But if we know from reading waitpid(3) that waitpid should only fail due
> > to EINTR, or due to bogus arguments (e.g., a pid that does not exist or
> > has already been reaped), then maybe something like this makes sense:
> >
> >   while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
> >   ; /* nothing */
> 
> But we don't want to wait synchronously here, we just want to ping.

Yeah, sorry, I forgot the WNOHANG there.

> > After the fix above, yes; in the original we would always have exited
> > already.
> 
> No:
> 
> +   if (waiting != cmd->pid)
> +   return 1;
> 
> If waiting < 0, waiting != cmd->pid, and therefore this return is not
> triggered, and there's only one more return at the end of the
> function.

Are my eyes not working? If waiting < 0, then waiting != cmd->pid, and
therefore this return _is_ triggered.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Felipe Contreras
On Mon, Apr 1, 2013 at 8:22 PM, Jeff King  wrote:
> On Mon, Apr 01, 2013 at 05:58:55PM -0600, Felipe Contreras wrote:

>> Are you saying that even if we have stored the result of a waitpid
>> command, if errno is EINTR, then we should still loop waitpid()? If
>> so, I guess this would do the trick:
>
> Yes, I think that would work. Though I wonder if it is even worth
> storing EINTR at all in the first place. It tells us nothing. In fact,
> does storing any error condition really tell us anything?

Probably not, I just tried to minimize the potential behavior changes.

> The two states
> we are interested in at this point are:
>
>   1. We have reaped the child via waitpid; here is its status.
>
>   2. We have not (either we did not try, it was not dead yet, or we were
>  not able to due to an error). We should now try it again.
>
> If we got EINTR the first time around, we would likely get the "real"
> answer this time. If we get anything else (like EINVAL or ECHILD), then
> we would get the same thing again calling waitpid() later.
>
>> > We now take argv0 into wait_or_whine. But I don't see it being used.
>> > What's it for?
>>
>> It was there before:
>> -static int wait_or_whine(pid_t pid, const char *argv0)
>> +static int wait_or_whine(struct child_process *cmd, pid_t pid, const
>> char *argv0)
>
> Ah, sorry, I misread the diff. We are adding "cmd", not "argv0".

Yeah, which in fact was already there before.

> That would trigger the rest of your code in the error case, which I
> think was your original intent. But then we return "0" from
> check_command. Is that right?
>
> There are three states we can be in from calling waitpid:
>
>   1. The process is dead.
>
>   2. The process is not dead.
>
>   3. We could not determine which because waitpid returned an error.
>
> It is clear that check_command is trying to tell its caller (1) or (2);
> but what should it say in case of (3)?
>
> Naively, given how patch 2 uses it, I think it would actually make sense
> for it to return 1. That is, the semantics are "return 0 if and only if
> the pid is verified to be dead; otherwise return 1".

I thought if there was an error that constituted a failure.

> But if we know from reading waitpid(3) that waitpid should only fail due
> to EINTR, or due to bogus arguments (e.g., a pid that does not exist or
> has already been reaped), then maybe something like this makes sense:
>
>   while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
>   ; /* nothing */

But we don't want to wait synchronously here, we just want to ping.

>   /* pid definitely still going */
>   if (!waiting)
>   return 1;
>
>   /* pid definitely died */
>   if (waiting == cmd->pid) {
>   cmd->last_status.valid = 1;
>   cmd->last_status.status = status;
>   return 0;
>   }
>
>   /*
>* this should never happen, since we handed waitpid() a single
>* pid, so it should either return that pid, 0, or an error.
>*/
>   if (waiting > 0)
>   die("BUG: waitpid reported a random pid?");
>
>   /*
>* otherwise, we have an error. Assume the pid is gone, since that
>* is the only reason for waitpid to report a problem besides EINTR.
>* We don't bother recording errno, since we can just repeat
>* the waitpid again later.
>*/
>return 0;

The rest makes sense.

>> >> + cmd->last_wait.code = -1;
>> >> + cmd->last_wait.failed_errno = failed_errno;
>> >> + cmd->last_wait.status = status;
>> >
>> > Since we can only get here when waiting == cmd->pid,
>>
>> No, also when waiting < 0.
>
> After the fix above, yes; in the original we would always have exited
> already.

No:

+   if (waiting != cmd->pid)
+   return 1;

If waiting < 0, waiting != cmd->pid, and therefore this return is not
triggered, and there's only one more return at the end of the
function.

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Jeff King
On Mon, Apr 01, 2013 at 05:58:55PM -0600, Felipe Contreras wrote:

> On Mon, Apr 1, 2013 at 5:23 PM, Jeff King  wrote:
> > On Mon, Apr 01, 2013 at 03:46:41PM -0600, Felipe Contreras wrote:
> 
> >> -static int wait_or_whine(pid_t pid, const char *argv0)
> >> +static pid_t persistent_waitpid(struct child_process *cmd, pid_t pid, int 
> >> *stat_loc)
> >> +{
> >> + if (cmd->last_wait.code) {
> >> + errno = cmd->last_wait.failed_errno;
> >> + *stat_loc = cmd->last_wait.status;
> >> + return errno ? -1 : pid;
> >> + } else {
> >> + pid_t waiting;
> >> + while ((waiting = waitpid(pid, stat_loc, 0)) < 0 && errno == 
> >> EINTR)
> >> + ;   /* nothing */
> >> + return waiting;
> >> + }
> >> +}
> >
> > So it looks we are trying to save the waitpid state from a previous run
> > and use the saved value. Otherwise, waitpid as normal.
> >
> > We loop on EINTR when we actually call waitpid(). But we don't check
> > whether the saved errno is waitpid. What happens if we EINTR during the
> > saved call to waitpid?
> 
> Are you saying that even if we have stored the result of a waitpid
> command, if errno is EINTR, then we should still loop waitpid()? If
> so, I guess this would do the trick:

Yes, I think that would work. Though I wonder if it is even worth
storing EINTR at all in the first place. It tells us nothing. In fact,
does storing any error condition really tell us anything? The two states
we are interested in at this point are:

  1. We have reaped the child via waitpid; here is its status.

  2. We have not (either we did not try, it was not dead yet, or we were
 not able to due to an error). We should now try it again.

If we got EINTR the first time around, we would likely get the "real"
answer this time. If we get anything else (like EINVAL or ECHILD), then
we would get the same thing again calling waitpid() later.

> > We now take argv0 into wait_or_whine. But I don't see it being used.
> > What's it for?
> 
> It was there before:
> -static int wait_or_whine(pid_t pid, const char *argv0)
> +static int wait_or_whine(struct child_process *cmd, pid_t pid, const
> char *argv0)

Ah, sorry, I misread the diff. We are adding "cmd", not "argv0".

> >> + if (waiting != cmd->pid)
> >> + return 1;
> >> +
> >> + if (waiting < 0)
> >> + failed_errno = errno;
> >
> > How would we ever trigger this second conditional?
> [...]
> How about this?
> 
> if (waiting >= 0 && waiting != cmd->pid)
>   return 1;

That would trigger the rest of your code in the error case, which I
think was your original intent. But then we return "0" from
check_command. Is that right?

There are three states we can be in from calling waitpid:

  1. The process is dead.

  2. The process is not dead.

  3. We could not determine which because waitpid returned an error.

It is clear that check_command is trying to tell its caller (1) or (2);
but what should it say in case of (3)?

Naively, given how patch 2 uses it, I think it would actually make sense
for it to return 1. That is, the semantics are "return 0 if and only if
the pid is verified to be dead; otherwise return 1".

But if we know from reading waitpid(3) that waitpid should only fail due
to EINTR, or due to bogus arguments (e.g., a pid that does not exist or
has already been reaped), then maybe something like this makes sense:

  while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
  ; /* nothing */

  /* pid definitely still going */
  if (!waiting)
  return 1;

  /* pid definitely died */
  if (waiting == cmd->pid) {
  cmd->last_status.valid = 1;
  cmd->last_status.status = status;
  return 0;
  }

  /*
   * this should never happen, since we handed waitpid() a single
   * pid, so it should either return that pid, 0, or an error.
   */
  if (waiting > 0)
  die("BUG: waitpid reported a random pid?");

  /*
   * otherwise, we have an error. Assume the pid is gone, since that
   * is the only reason for waitpid to report a problem besides EINTR.
   * We don't bother recording errno, since we can just repeat
   * the waitpid again later.
   */
   return 0;

> >> + cmd->last_wait.code = -1;
> >> + cmd->last_wait.failed_errno = failed_errno;
> >> + cmd->last_wait.status = status;
> >
> > Since we can only get here when waiting == cmd->pid,
> 
> No, also when waiting < 0.

After the fix above, yes; in the original we would always have exited
already.

As an aside, should check_command be able to be called twice? That is,
should it first check for cmd->last_status.valid and return early if
somebody has already reaped the child? It doesn't matter for the code
you add in patch 2, but it seems like it would give the least surprise
to somebody trying to use it later.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...

Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Felipe Contreras
On Mon, Apr 1, 2013 at 5:23 PM, Jeff King  wrote:
> On Mon, Apr 01, 2013 at 03:46:41PM -0600, Felipe Contreras wrote:

>> -static int wait_or_whine(pid_t pid, const char *argv0)
>> +static pid_t persistent_waitpid(struct child_process *cmd, pid_t pid, int 
>> *stat_loc)
>> +{
>> + if (cmd->last_wait.code) {
>> + errno = cmd->last_wait.failed_errno;
>> + *stat_loc = cmd->last_wait.status;
>> + return errno ? -1 : pid;
>> + } else {
>> + pid_t waiting;
>> + while ((waiting = waitpid(pid, stat_loc, 0)) < 0 && errno == 
>> EINTR)
>> + ;   /* nothing */
>> + return waiting;
>> + }
>> +}
>
> So it looks we are trying to save the waitpid state from a previous run
> and use the saved value. Otherwise, waitpid as normal.
>
> We loop on EINTR when we actually call waitpid(). But we don't check
> whether the saved errno is waitpid. What happens if we EINTR during the
> saved call to waitpid?

Are you saying that even if we have stored the result of a waitpid
command, if errno is EINTR, then we should still loop waitpid()? If
so, I guess this would do the trick:

static pid_t persistent_waitpid(struct child_process *cmd, pid_t pid,
int *stat_loc)
{
pid_t waiting;

if (cmd->last_wait.code) {
errno = cmd->last_wait.failed_errno;
*stat_loc = cmd->last_wait.status;
if (errno != EINTR)
return errno ? -1 : pid;
}

while ((waiting = waitpid(pid, stat_loc, 0)) < 0 && errno == EINTR)
;   /* nothing */
return waiting;
}

>> +static int wait_or_whine(struct child_process *cmd, pid_t pid, const char 
>> *argv0)
>>  {
>>   int status, code = -1;
>>   pid_t waiting;
>>   int failed_errno = 0;
>>
>> - while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
>> - ;   /* nothing */
>> + waiting = persistent_waitpid(cmd, pid, &status);
>>
>>   if (waiting < 0) {
>>   failed_errno = errno;
>
> We now take argv0 into wait_or_whine. But I don't see it being used.
> What's it for?

It was there before:
-static int wait_or_whine(pid_t pid, const char *argv0)
+static int wait_or_whine(struct child_process *cmd, pid_t pid, const
char *argv0)

>> +int check_command(struct child_process *cmd)
>> +{
>> + int status;
>> + pid_t waiting;
>> + int failed_errno = 0;
>> +
>> + waiting = waitpid(cmd->pid, &status, WNOHANG);
>
> This might return the pid if it has died, -1 if there was an error, or 0
> if the process still exists but hasn't died. So...
>
>> + if (waiting != cmd->pid)
>> + return 1;
>> +
>> + if (waiting < 0)
>> + failed_errno = errno;
>
> How would we ever trigger this second conditional? It makes sense to
> return 1 when "waiting == 0", as that is saying "yes, your process is
> still running" (though documenting the return either at the top of the
> function or in the commit message would be helpful)
>
> But if we get an error from waitpid, we would also return 1, which
> doesn't make sense (especially if it is something like EINTR -- I don't
> know offhand if we can get EINTR during WNOHANG. It should not block,
> but I don't know if it can race with a signal).

How about this?

if (waiting >= 0 && waiting != cmd->pid)
return 1;

>> + cmd->last_wait.code = -1;
>> + cmd->last_wait.failed_errno = failed_errno;
>> + cmd->last_wait.status = status;
>
> Since we can only get here when waiting == cmd->pid,

No, also when waiting < 0.

> Why is code set to -1?  It
> seems to be used as a flag to say "this structure is valid". Should it
> be defined as "unsigned valid:1;" instead?

Yeap. I was using it for other purposes before, 'valid' would be better now.

Cheers.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] run-command: add new check_command helper

2013-04-01 Thread Jeff King
On Mon, Apr 01, 2013 at 03:46:41PM -0600, Felipe Contreras wrote:

> And persistent_waitpid() to recover the information from the last run.
> 
> Signed-off-by: Felipe Contreras 

A little background would be nice here...what problem are we solving?

> -static int wait_or_whine(pid_t pid, const char *argv0)
> +static pid_t persistent_waitpid(struct child_process *cmd, pid_t pid, int 
> *stat_loc)
> +{
> + if (cmd->last_wait.code) {
> + errno = cmd->last_wait.failed_errno;
> + *stat_loc = cmd->last_wait.status;
> + return errno ? -1 : pid;
> + } else {
> + pid_t waiting;
> + while ((waiting = waitpid(pid, stat_loc, 0)) < 0 && errno == 
> EINTR)
> + ;   /* nothing */
> + return waiting;
> + }
> +}

So it looks we are trying to save the waitpid state from a previous run
and use the saved value. Otherwise, waitpid as normal.

We loop on EINTR when we actually call waitpid(). But we don't check
whether the saved errno is waitpid. What happens if we EINTR during the
saved call to waitpid?

> +static int wait_or_whine(struct child_process *cmd, pid_t pid, const char 
> *argv0)
>  {
>   int status, code = -1;
>   pid_t waiting;
>   int failed_errno = 0;
>  
> - while ((waiting = waitpid(pid, &status, 0)) < 0 && errno == EINTR)
> - ;   /* nothing */
> + waiting = persistent_waitpid(cmd, pid, &status);
>  
>   if (waiting < 0) {
>   failed_errno = errno;

We now take argv0 into wait_or_whine. But I don't see it being used.
What's it for?

> +int check_command(struct child_process *cmd)
> +{
> + int status;
> + pid_t waiting;
> + int failed_errno = 0;
> +
> + waiting = waitpid(cmd->pid, &status, WNOHANG);

This might return the pid if it has died, -1 if there was an error, or 0
if the process still exists but hasn't died. So...

> + if (waiting != cmd->pid)
> + return 1;
> +
> + if (waiting < 0)
> + failed_errno = errno;

How would we ever trigger this second conditional? It makes sense to
return 1 when "waiting == 0", as that is saying "yes, your process is
still running" (though documenting the return either at the top of the
function or in the commit message would be helpful)

But if we get an error from waitpid, we would also return 1, which
doesn't make sense (especially if it is something like EINTR -- I don't
know offhand if we can get EINTR during WNOHANG. It should not block,
but I don't know if it can race with a signal).

> + cmd->last_wait.code = -1;
> + cmd->last_wait.failed_errno = failed_errno;
> + cmd->last_wait.status = status;

Since we can only get here when waiting == cmd->pid, failed_errno is
always 0. We do correctly record the status. Why is code set to -1?  It
seems to be used as a flag to say "this structure is valid". Should it
be defined as "unsigned valid:1;" instead?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html