Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-16 Thread Johannes Sixt
Am 4/16/2013 15:01, schrieb Jeff King:
> On Tue, Apr 16, 2013 at 09:18:46AM +0200, Johannes Sixt wrote:
> 
>>> Yeah, that seems sane; my biggest worry was that it would create
>>> headaches for Windows folks, who would have to emulate pthread_key. But
>>> it seems like we already added support in 9ba604a.
>>
>> pthread_key is not a problem, but pthread_once is. It's certainly
>> solvable, but do we really have to?
> 
> I'm not clear on what you are suggesting. That we protect only the main
> thread from recursion, or that we drop the check entirely? Or that we
> implement thread-local storage for this case without using pthread_once?

Anything(*) that does not require pthread_once. A pthread_once
implementation on Windows would be tricky and voluminous and and on top of
it very likely to be done differently for gcc and MSVC. I don't like to go
there if we can avoid it.

(*) That includes doing nothing, but does not include ripping out the
recursion check, as it protects us from crashes.

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-16 Thread Jeff King
On Tue, Apr 16, 2013 at 09:18:46AM +0200, Johannes Sixt wrote:

> > Yeah, that seems sane; my biggest worry was that it would create
> > headaches for Windows folks, who would have to emulate pthread_key. But
> > it seems like we already added support in 9ba604a.
> 
> pthread_key is not a problem, but pthread_once is. It's certainly
> solvable, but do we really have to?

I'm not clear on what you are suggesting. That we protect only the main
thread from recursion, or that we drop the check entirely? Or that we
implement thread-local storage for this case without using pthread_once?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-16 Thread Johannes Sixt
Am 4/16/2013 4:50, schrieb Jeff King:
> On Mon, Apr 15, 2013 at 07:34:07PM -0700, Brandon Casey wrote:
> 
>>> Right. My assumption was that we are primarily interested in protecting
>>> against the die_routine. Compat functions should never be calling die.
>>
>> I think the rule we have been enforcing is less strict than that.  We
>> have only said that any compat function in a die handler path should
>> never call die.  But maybe that's what you meant.
> 
> No, I assumed we were following the stronger rule. If you are a compat
> function for a C library function, then you should never need to die.
> You should be conforming to the existing interface, which will have some
> mechanism for passing back an error.

This rule has been violated LNG ago, and not only in compat/mingw.c
(see xmalloc in compat/qsort.c, for example).

>> The primary motivation was that Hannes Sixt had to step in and point
>> out yet again that the high-level memory allocators should not be
>> called in anything that could be in a die handler code path.  I was on
>> the thread, but I don't remember the topic (ah, Jonathan has stepped
>> in with the answer).  I do remember that I was not the only one who
>> had forgotten about that rule though.
> 
> Yeah, it is subtle enough that it may be worth protecting against.

Too late.

>> To implement this check correctly/completely (i.e. detect recursion in
>> the main thread as well as in any child threads), I think you really
>> do need to use thread-local storage as you mentioned in 3/3 which
>> could look something like:
>>
>>static pthread_key_t dying;
>>static pthread_once_t dying_once = PTHREAD_ONCE_INIT;
>>
>>void setup_die_counter(void)
>>{
>>pthread_key_create(&dying, NULL);
>>}
>>
>>check_die_recursion(void)
>>{
>>pthread_once(&dying_once, setup_die_counter);
>>if (pthread(getspecific(dying)) {
>>puts("BUG: recursion...");
>>exit(128);
>>}
>>
>>pthread_setspecific(dying, &dying);
>>}
> 
> Yeah, that seems sane; my biggest worry was that it would create
> headaches for Windows folks, who would have to emulate pthread_key. But
> it seems like we already added support in 9ba604a.

pthread_key is not a problem, but pthread_once is. It's certainly
solvable, but do we really have to?

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Jeff King
On Mon, Apr 15, 2013 at 07:34:07PM -0700, Brandon Casey wrote:

> > Right. My assumption was that we are primarily interested in protecting
> > against the die_routine. Compat functions should never be calling die.
> 
> I think the rule we have been enforcing is less strict than that.  We
> have only said that any compat function in a die handler path should
> never call die.  But maybe that's what you meant.

No, I assumed we were following the stronger rule. If you are a compat
function for a C library function, then you should never need to die.
You should be conforming to the existing interface, which will have some
mechanism for passing back an error.

> The primary motivation was that Hannes Sixt had to step in and point
> out yet again that the high-level memory allocators should not be
> called in anything that could be in a die handler code path.  I was on
> the thread, but I don't remember the topic (ah, Jonathan has stepped
> in with the answer).  I do remember that I was not the only one who
> had forgotten about that rule though.

Yeah, it is subtle enough that it may be worth protecting against.

> To implement this check correctly/completely (i.e. detect recursion in
> the main thread as well as in any child threads), I think you really
> do need to use thread-local storage as you mentioned in 3/3 which
> could look something like:
> 
>static pthread_key_t dying;
>static pthread_once_t dying_once = PTHREAD_ONCE_INIT;
> 
>void setup_die_counter(void)
>{
>pthread_key_create(&dying, NULL);
>}
> 
>check_die_recursion(void)
>{
>pthread_once(&dying_once, setup_die_counter);
>if (pthread(getspecific(dying)) {
>puts("BUG: recursion...");
>exit(128);
>}
> 
>pthread_setspecific(dying, &dying);
>}

Yeah, that seems sane; my biggest worry was that it would create
headaches for Windows folks, who would have to emulate pthread_key. But
it seems like we already added support in 9ba604a.

I'll try to re-work the series with thread-local storage, and I'll leave
off the extra printing. This _should_ never happen, so if we are going
to put in the check, it is probably better to be more thorough than to
worry about what the error message looks like.

Thanks for looking it over.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Brandon Casey
On Mon, Apr 15, 2013 at 5:42 PM, Jeff King  wrote:
> On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote:
>
>> > +static void check_die_recursion(const char *fmt, va_list ap)
>> > +{
>> > +   static int dying;
>> > +
>> > +   if (!dying++)
>> > +   return;
>> > +
>> > +   vreportf("fatal: ", fmt, ap);
>>
>> How do you know it's safe to call vreportf() ?
>
> Because I hand-audited it.

:)

> But I think the more important question you
> are getting at is: how do I know that it will remain safe to call
> vreportf?

Right.

>> If the bug is in the vreportf code path, we will recurse infinitely
>> (at least until the stack is used up). An implementation of vsnprintf
>> exists in compat/snprintf.c for example.
>
> Right. My assumption was that we are primarily interested in protecting
> against the die_routine. Compat functions should never be calling die.

I think the rule we have been enforcing is less strict than that.  We
have only said that any compat function in a die handler path should
never call die.  But maybe that's what you meant.

> Of course that is assuming nobody violates that rule, which is part of
> the point of the check.
>
>> It's nice to print out the error message here, but I think doing so
>> defeats the purpose of this "dying" check.  Better to get the stack
>> trace from a core dump.
>
> Easier said than done, sometimes, if you are debugging somebody else's
> problem from a dump of a terminal session. :)
>
> But I can live with dropping this patch; my primary goal is the bug-fix
> in patch three.
>
> I was also tempted to suggest just dropping the recursion check
> altogether. While it is neat to detect such things, it's a "should never
> happen" bug situation, and an infinite loop of printing out the same
> message is pretty easy to notice. Do you remember what spurred the
> original check? The message in cd163d4 doesn't say.

That's a valid option.

The primary motivation was that Hannes Sixt had to step in and point
out yet again that the high-level memory allocators should not be
called in anything that could be in a die handler code path.  I was on
the thread, but I don't remember the topic (ah, Jonathan has stepped
in with the answer).  I do remember that I was not the only one who
had forgotten about that rule though.

We didn't actually have someone report that they encountered infinite
recursion, but it seemed easy enough to add a check for it, so...

Unfortunately, I didn't realize that the async functions installed
their own die handler which may not exit the process, allowing die to
be called multiple times.

To implement this check correctly/completely (i.e. detect recursion in
the main thread as well as in any child threads), I think you really
do need to use thread-local storage as you mentioned in 3/3 which
could look something like:

   static pthread_key_t dying;
   static pthread_once_t dying_once = PTHREAD_ONCE_INIT;

   void setup_die_counter(void)
   {
   pthread_key_create(&dying, NULL);
   }

   check_die_recursion(void)
   {
   pthread_once(&dying_once, setup_die_counter);
   if (pthread(getspecific(dying)) {
   puts("BUG: recursion...");
   exit(128);
   }

   pthread_setspecific(dying, &dying);
   }

or maybe the setup could be performed in set_die_routine(), but it
does kinda seem like overkill for a "nicety" like this.  So maybe
checking for recursion in just the main thread as this series does is
better than nothing.

-Brandon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Jonathan Nieder
Jeff King wrote:

> I was also tempted to suggest just dropping the recursion check
> altogether. While it is neat to detect such things, it's a "should never
> happen" bug situation, and an infinite loop of printing out the same
> message is pretty easy to notice.

On servers it might be useful to avoid accidentally filling up logs
quickly.

IIUC the context is in the following two threads:

 http://thread.gmane.org/gmane.comp.version-control.git/182982
 http://thread.gmane.org/gmane.comp.version-control.git/181421/focus=181443

Thanks,
Jonathan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Jeff King
On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote:

> > +static void check_die_recursion(const char *fmt, va_list ap)
> > +{
> > +   static int dying;
> > +
> > +   if (!dying++)
> > +   return;
> > +
> > +   vreportf("fatal: ", fmt, ap);
> 
> How do you know it's safe to call vreportf() ?

Because I hand-audited it. But I think the more important question you
are getting at is: how do I know that it will remain safe to call
vreportf?

> If the bug is in the vreportf code path, we will recurse infinitely
> (at least until the stack is used up). An implementation of vsnprintf
> exists in compat/snprintf.c for example.

Right. My assumption was that we are primarily interested in protecting
against the die_routine. Compat functions should never be calling die.
Of course that is assuming nobody violates that rule, which is part of
the point of the check.

> It's nice to print out the error message here, but I think doing so
> defeats the purpose of this "dying" check.  Better to get the stack
> trace from a core dump.

Easier said than done, sometimes, if you are debugging somebody else's
problem from a dump of a terminal session. :)

But I can live with dropping this patch; my primary goal is the bug-fix
in patch three.

I was also tempted to suggest just dropping the recursion check
altogether. While it is neat to detect such things, it's a "should never
happen" bug situation, and an infinite loop of printing out the same
message is pretty easy to notice. Do you remember what spurred the
original check? The message in cd163d4 doesn't say.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Brandon Casey
On Mon, Apr 15, 2013 at 4:08 PM, Jeff King  wrote:
> When any git code calls die(), we chain to a custom
> die_routine, which we expect to print a message and exit the
> program. To avoid infinite loops, we detect a recursive call
> to die() with a simple counter, and break out of the loop by
> printing a message and exiting ourselves, without chaining
> to the die_routine.
>
> But the user does not get to see the message that would have
> been fed to the die_routine, which makes debugging harder.
> The user does not know if it was a true infinite loop, or
> simply a single re-entrant call, since they cannot compare
> the messages. Furthermore, if we are wrong about detecting
> the recursion, we have blocked the user from seeing the
> original message, which is probably the more useful one.
>
> This patch teaches die() to print the original die message
> to stderr before reporting the recursion. The custom
> die_routine may or may not have put it the message to
> stderr, but this is the best we can do (it is what most
> handlers will do anyway, and it is where our recursion error
> will go).
>
> While we're at it, let's mark the "recursion detected"
> message as a "BUG:", since it should never happen in
> practice. And let's factor out the repeated code in die and
> die_errno. This loses the information of which function was
> called to cause the recursion, but it's important; knowing
> the actual message fed to the function (which we now do) is
> much more useful, as it can generally pin-point the actual
> call-site that caused the recursion.
>
> Signed-off-by: Jeff King 
> ---
> This helped me debug the current problem. And factoring it out helps
> with patch 3. :)
>
>  usage.c | 29 ++---
>  1 file changed, 14 insertions(+), 15 deletions(-)
>
> diff --git a/usage.c b/usage.c
> index 40b3de5..c6b7ac5 100644
> --- a/usage.c
> +++ b/usage.c

> @@ -80,17 +78,24 @@ void NORETURN die(const char *err, ...)
> usagef("%s", err);
>  }
>
> +static void check_die_recursion(const char *fmt, va_list ap)
> +{
> +   static int dying;
> +
> +   if (!dying++)
> +   return;
> +
> +   vreportf("fatal: ", fmt, ap);

How do you know it's safe to call vreportf() ?

If the bug is in the vreportf code path, we will recurse infinitely
(at least until the stack is used up). An implementation of vsnprintf
exists in compat/snprintf.c for example.

It's nice to print out the error message here, but I think doing so
defeats the purpose of this "dying" check.  Better to get the stack
trace from a core dump.

> +   fputs("BUG: recursion detected in die handler\n", stderr);
> +   exit(128);
> +}
> +

-Brandon
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Jeff King
On Mon, Apr 15, 2013 at 07:45:03PM -0400, Eric Sunshine wrote:

> On Mon, Apr 15, 2013 at 7:08 PM, Jeff King  wrote:
> > This patch teaches die() to print the original die message
> > to stderr before reporting the recursion. The custom
> > die_routine may or may not have put it the message to
> 
> s/put it the/emitted/ perhaps?

I meant s/ it//, but I think "sent the message to..." is probably more
clear.

> > stderr, but this is the best we can do (it is what most
> > handlers will do anyway, and it is where our recursion error
> > will go).
> >
> > While we're at it, let's mark the "recursion detected"
> > message as a "BUG:", since it should never happen in
> > practice. And let's factor out the repeated code in die and
> > die_errno. This loses the information of which function was
> > called to cause the recursion, but it's important; knowing
> 
> Was this supposed to be s/important/unimportant/?

Urgh, yes, it was originally "not important" but I lost the "not" while
trying to clarify the wording.

Thanks.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Eric Sunshine
On Mon, Apr 15, 2013 at 7:08 PM, Jeff King  wrote:
> This patch teaches die() to print the original die message
> to stderr before reporting the recursion. The custom
> die_routine may or may not have put it the message to

s/put it the/emitted/ perhaps?

> stderr, but this is the best we can do (it is what most
> handlers will do anyway, and it is where our recursion error
> will go).
>
> While we're at it, let's mark the "recursion detected"
> message as a "BUG:", since it should never happen in
> practice. And let's factor out the repeated code in die and
> die_errno. This loses the information of which function was
> called to cause the recursion, but it's important; knowing

Was this supposed to be s/important/unimportant/?

> the actual message fed to the function (which we now do) is
> much more useful, as it can generally pin-point the actual
> call-site that caused the recursion.
>
> Signed-off-by: Jeff King 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/3] usage: refactor die-recursion checks

2013-04-15 Thread Jeff King
When any git code calls die(), we chain to a custom
die_routine, which we expect to print a message and exit the
program. To avoid infinite loops, we detect a recursive call
to die() with a simple counter, and break out of the loop by
printing a message and exiting ourselves, without chaining
to the die_routine.

But the user does not get to see the message that would have
been fed to the die_routine, which makes debugging harder.
The user does not know if it was a true infinite loop, or
simply a single re-entrant call, since they cannot compare
the messages. Furthermore, if we are wrong about detecting
the recursion, we have blocked the user from seeing the
original message, which is probably the more useful one.

This patch teaches die() to print the original die message
to stderr before reporting the recursion. The custom
die_routine may or may not have put it the message to
stderr, but this is the best we can do (it is what most
handlers will do anyway, and it is where our recursion error
will go).

While we're at it, let's mark the "recursion detected"
message as a "BUG:", since it should never happen in
practice. And let's factor out the repeated code in die and
die_errno. This loses the information of which function was
called to cause the recursion, but it's important; knowing
the actual message fed to the function (which we now do) is
much more useful, as it can generally pin-point the actual
call-site that caused the recursion.

Signed-off-by: Jeff King 
---
This helped me debug the current problem. And factoring it out helps
with patch 3. :)

 usage.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/usage.c b/usage.c
index 40b3de5..c6b7ac5 100644
--- a/usage.c
+++ b/usage.c
@@ -6,8 +6,6 @@
 #include "git-compat-util.h"
 #include "cache.h"
 
-static int dying;
-
 void vreportf(const char *prefix, const char *err, va_list params)
 {
char msg[4096];
@@ -80,17 +78,24 @@ void NORETURN die(const char *err, ...)
usagef("%s", err);
 }
 
+static void check_die_recursion(const char *fmt, va_list ap)
+{
+   static int dying;
+
+   if (!dying++)
+   return;
+
+   vreportf("fatal: ", fmt, ap);
+   fputs("BUG: recursion detected in die handler\n", stderr);
+   exit(128);
+}
+
 void NORETURN die(const char *err, ...)
 {
va_list params;
 
-   if (dying) {
-   fputs("fatal: recursion detected in die handler\n", stderr);
-   exit(128);
-   }
-   dying = 1;
-
va_start(params, err);
+   check_die_recursion(err, params);
die_routine(err, params);
va_end(params);
 }
@@ -102,13 +107,6 @@ void NORETURN die_errno(const char *fmt, ...)
char str_error[256], *err;
int i, j;
 
-   if (dying) {
-   fputs("fatal: recursion detected in die_errno handler\n",
-   stderr);
-   exit(128);
-   }
-   dying = 1;
-
err = strerror(errno);
for (i = j = 0; err[i] && j < sizeof(str_error) - 1; ) {
if ((str_error[j++] = err[i++]) != '%')
@@ -126,6 +124,7 @@ void NORETURN die_errno(const char *fmt, ...)
snprintf(fmt_with_err, sizeof(fmt_with_err), "%s: %s", fmt, str_error);
 
va_start(params, fmt);
+   check_die_recursion(fmt_with_err, params);
die_routine(fmt_with_err, params);
va_end(params);
 }
-- 
1.8.2.8.g44e4c28

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html