Re: [PATCH 1/3] usage: refactor die-recursion checks
Am 4/16/2013 15:01, schrieb Jeff King: > On Tue, Apr 16, 2013 at 09:18:46AM +0200, Johannes Sixt wrote: > >>> Yeah, that seems sane; my biggest worry was that it would create >>> headaches for Windows folks, who would have to emulate pthread_key. But >>> it seems like we already added support in 9ba604a. >> >> pthread_key is not a problem, but pthread_once is. It's certainly >> solvable, but do we really have to? > > I'm not clear on what you are suggesting. That we protect only the main > thread from recursion, or that we drop the check entirely? Or that we > implement thread-local storage for this case without using pthread_once? Anything(*) that does not require pthread_once. A pthread_once implementation on Windows would be tricky and voluminous and and on top of it very likely to be done differently for gcc and MSVC. I don't like to go there if we can avoid it. (*) That includes doing nothing, but does not include ripping out the recursion check, as it protects us from crashes. -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Tue, Apr 16, 2013 at 09:18:46AM +0200, Johannes Sixt wrote: > > Yeah, that seems sane; my biggest worry was that it would create > > headaches for Windows folks, who would have to emulate pthread_key. But > > it seems like we already added support in 9ba604a. > > pthread_key is not a problem, but pthread_once is. It's certainly > solvable, but do we really have to? I'm not clear on what you are suggesting. That we protect only the main thread from recursion, or that we drop the check entirely? Or that we implement thread-local storage for this case without using pthread_once? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
Am 4/16/2013 4:50, schrieb Jeff King: > On Mon, Apr 15, 2013 at 07:34:07PM -0700, Brandon Casey wrote: > >>> Right. My assumption was that we are primarily interested in protecting >>> against the die_routine. Compat functions should never be calling die. >> >> I think the rule we have been enforcing is less strict than that. We >> have only said that any compat function in a die handler path should >> never call die. But maybe that's what you meant. > > No, I assumed we were following the stronger rule. If you are a compat > function for a C library function, then you should never need to die. > You should be conforming to the existing interface, which will have some > mechanism for passing back an error. This rule has been violated LNG ago, and not only in compat/mingw.c (see xmalloc in compat/qsort.c, for example). >> The primary motivation was that Hannes Sixt had to step in and point >> out yet again that the high-level memory allocators should not be >> called in anything that could be in a die handler code path. I was on >> the thread, but I don't remember the topic (ah, Jonathan has stepped >> in with the answer). I do remember that I was not the only one who >> had forgotten about that rule though. > > Yeah, it is subtle enough that it may be worth protecting against. Too late. >> To implement this check correctly/completely (i.e. detect recursion in >> the main thread as well as in any child threads), I think you really >> do need to use thread-local storage as you mentioned in 3/3 which >> could look something like: >> >>static pthread_key_t dying; >>static pthread_once_t dying_once = PTHREAD_ONCE_INIT; >> >>void setup_die_counter(void) >>{ >>pthread_key_create(&dying, NULL); >>} >> >>check_die_recursion(void) >>{ >>pthread_once(&dying_once, setup_die_counter); >>if (pthread(getspecific(dying)) { >>puts("BUG: recursion..."); >>exit(128); >>} >> >>pthread_setspecific(dying, &dying); >>} > > Yeah, that seems sane; my biggest worry was that it would create > headaches for Windows folks, who would have to emulate pthread_key. But > it seems like we already added support in 9ba604a. pthread_key is not a problem, but pthread_once is. It's certainly solvable, but do we really have to? -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Mon, Apr 15, 2013 at 07:34:07PM -0700, Brandon Casey wrote: > > Right. My assumption was that we are primarily interested in protecting > > against the die_routine. Compat functions should never be calling die. > > I think the rule we have been enforcing is less strict than that. We > have only said that any compat function in a die handler path should > never call die. But maybe that's what you meant. No, I assumed we were following the stronger rule. If you are a compat function for a C library function, then you should never need to die. You should be conforming to the existing interface, which will have some mechanism for passing back an error. > The primary motivation was that Hannes Sixt had to step in and point > out yet again that the high-level memory allocators should not be > called in anything that could be in a die handler code path. I was on > the thread, but I don't remember the topic (ah, Jonathan has stepped > in with the answer). I do remember that I was not the only one who > had forgotten about that rule though. Yeah, it is subtle enough that it may be worth protecting against. > To implement this check correctly/completely (i.e. detect recursion in > the main thread as well as in any child threads), I think you really > do need to use thread-local storage as you mentioned in 3/3 which > could look something like: > >static pthread_key_t dying; >static pthread_once_t dying_once = PTHREAD_ONCE_INIT; > >void setup_die_counter(void) >{ >pthread_key_create(&dying, NULL); >} > >check_die_recursion(void) >{ >pthread_once(&dying_once, setup_die_counter); >if (pthread(getspecific(dying)) { >puts("BUG: recursion..."); >exit(128); >} > >pthread_setspecific(dying, &dying); >} Yeah, that seems sane; my biggest worry was that it would create headaches for Windows folks, who would have to emulate pthread_key. But it seems like we already added support in 9ba604a. I'll try to re-work the series with thread-local storage, and I'll leave off the extra printing. This _should_ never happen, so if we are going to put in the check, it is probably better to be more thorough than to worry about what the error message looks like. Thanks for looking it over. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Mon, Apr 15, 2013 at 5:42 PM, Jeff King wrote: > On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote: > >> > +static void check_die_recursion(const char *fmt, va_list ap) >> > +{ >> > + static int dying; >> > + >> > + if (!dying++) >> > + return; >> > + >> > + vreportf("fatal: ", fmt, ap); >> >> How do you know it's safe to call vreportf() ? > > Because I hand-audited it. :) > But I think the more important question you > are getting at is: how do I know that it will remain safe to call > vreportf? Right. >> If the bug is in the vreportf code path, we will recurse infinitely >> (at least until the stack is used up). An implementation of vsnprintf >> exists in compat/snprintf.c for example. > > Right. My assumption was that we are primarily interested in protecting > against the die_routine. Compat functions should never be calling die. I think the rule we have been enforcing is less strict than that. We have only said that any compat function in a die handler path should never call die. But maybe that's what you meant. > Of course that is assuming nobody violates that rule, which is part of > the point of the check. > >> It's nice to print out the error message here, but I think doing so >> defeats the purpose of this "dying" check. Better to get the stack >> trace from a core dump. > > Easier said than done, sometimes, if you are debugging somebody else's > problem from a dump of a terminal session. :) > > But I can live with dropping this patch; my primary goal is the bug-fix > in patch three. > > I was also tempted to suggest just dropping the recursion check > altogether. While it is neat to detect such things, it's a "should never > happen" bug situation, and an infinite loop of printing out the same > message is pretty easy to notice. Do you remember what spurred the > original check? The message in cd163d4 doesn't say. That's a valid option. The primary motivation was that Hannes Sixt had to step in and point out yet again that the high-level memory allocators should not be called in anything that could be in a die handler code path. I was on the thread, but I don't remember the topic (ah, Jonathan has stepped in with the answer). I do remember that I was not the only one who had forgotten about that rule though. We didn't actually have someone report that they encountered infinite recursion, but it seemed easy enough to add a check for it, so... Unfortunately, I didn't realize that the async functions installed their own die handler which may not exit the process, allowing die to be called multiple times. To implement this check correctly/completely (i.e. detect recursion in the main thread as well as in any child threads), I think you really do need to use thread-local storage as you mentioned in 3/3 which could look something like: static pthread_key_t dying; static pthread_once_t dying_once = PTHREAD_ONCE_INIT; void setup_die_counter(void) { pthread_key_create(&dying, NULL); } check_die_recursion(void) { pthread_once(&dying_once, setup_die_counter); if (pthread(getspecific(dying)) { puts("BUG: recursion..."); exit(128); } pthread_setspecific(dying, &dying); } or maybe the setup could be performed in set_die_routine(), but it does kinda seem like overkill for a "nicety" like this. So maybe checking for recursion in just the main thread as this series does is better than nothing. -Brandon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
Jeff King wrote: > I was also tempted to suggest just dropping the recursion check > altogether. While it is neat to detect such things, it's a "should never > happen" bug situation, and an infinite loop of printing out the same > message is pretty easy to notice. On servers it might be useful to avoid accidentally filling up logs quickly. IIUC the context is in the following two threads: http://thread.gmane.org/gmane.comp.version-control.git/182982 http://thread.gmane.org/gmane.comp.version-control.git/181421/focus=181443 Thanks, Jonathan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Mon, Apr 15, 2013 at 05:11:36PM -0700, Brandon Casey wrote: > > +static void check_die_recursion(const char *fmt, va_list ap) > > +{ > > + static int dying; > > + > > + if (!dying++) > > + return; > > + > > + vreportf("fatal: ", fmt, ap); > > How do you know it's safe to call vreportf() ? Because I hand-audited it. But I think the more important question you are getting at is: how do I know that it will remain safe to call vreportf? > If the bug is in the vreportf code path, we will recurse infinitely > (at least until the stack is used up). An implementation of vsnprintf > exists in compat/snprintf.c for example. Right. My assumption was that we are primarily interested in protecting against the die_routine. Compat functions should never be calling die. Of course that is assuming nobody violates that rule, which is part of the point of the check. > It's nice to print out the error message here, but I think doing so > defeats the purpose of this "dying" check. Better to get the stack > trace from a core dump. Easier said than done, sometimes, if you are debugging somebody else's problem from a dump of a terminal session. :) But I can live with dropping this patch; my primary goal is the bug-fix in patch three. I was also tempted to suggest just dropping the recursion check altogether. While it is neat to detect such things, it's a "should never happen" bug situation, and an infinite loop of printing out the same message is pretty easy to notice. Do you remember what spurred the original check? The message in cd163d4 doesn't say. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Mon, Apr 15, 2013 at 4:08 PM, Jeff King wrote: > When any git code calls die(), we chain to a custom > die_routine, which we expect to print a message and exit the > program. To avoid infinite loops, we detect a recursive call > to die() with a simple counter, and break out of the loop by > printing a message and exiting ourselves, without chaining > to the die_routine. > > But the user does not get to see the message that would have > been fed to the die_routine, which makes debugging harder. > The user does not know if it was a true infinite loop, or > simply a single re-entrant call, since they cannot compare > the messages. Furthermore, if we are wrong about detecting > the recursion, we have blocked the user from seeing the > original message, which is probably the more useful one. > > This patch teaches die() to print the original die message > to stderr before reporting the recursion. The custom > die_routine may or may not have put it the message to > stderr, but this is the best we can do (it is what most > handlers will do anyway, and it is where our recursion error > will go). > > While we're at it, let's mark the "recursion detected" > message as a "BUG:", since it should never happen in > practice. And let's factor out the repeated code in die and > die_errno. This loses the information of which function was > called to cause the recursion, but it's important; knowing > the actual message fed to the function (which we now do) is > much more useful, as it can generally pin-point the actual > call-site that caused the recursion. > > Signed-off-by: Jeff King > --- > This helped me debug the current problem. And factoring it out helps > with patch 3. :) > > usage.c | 29 ++--- > 1 file changed, 14 insertions(+), 15 deletions(-) > > diff --git a/usage.c b/usage.c > index 40b3de5..c6b7ac5 100644 > --- a/usage.c > +++ b/usage.c > @@ -80,17 +78,24 @@ void NORETURN die(const char *err, ...) > usagef("%s", err); > } > > +static void check_die_recursion(const char *fmt, va_list ap) > +{ > + static int dying; > + > + if (!dying++) > + return; > + > + vreportf("fatal: ", fmt, ap); How do you know it's safe to call vreportf() ? If the bug is in the vreportf code path, we will recurse infinitely (at least until the stack is used up). An implementation of vsnprintf exists in compat/snprintf.c for example. It's nice to print out the error message here, but I think doing so defeats the purpose of this "dying" check. Better to get the stack trace from a core dump. > + fputs("BUG: recursion detected in die handler\n", stderr); > + exit(128); > +} > + -Brandon -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Mon, Apr 15, 2013 at 07:45:03PM -0400, Eric Sunshine wrote: > On Mon, Apr 15, 2013 at 7:08 PM, Jeff King wrote: > > This patch teaches die() to print the original die message > > to stderr before reporting the recursion. The custom > > die_routine may or may not have put it the message to > > s/put it the/emitted/ perhaps? I meant s/ it//, but I think "sent the message to..." is probably more clear. > > stderr, but this is the best we can do (it is what most > > handlers will do anyway, and it is where our recursion error > > will go). > > > > While we're at it, let's mark the "recursion detected" > > message as a "BUG:", since it should never happen in > > practice. And let's factor out the repeated code in die and > > die_errno. This loses the information of which function was > > called to cause the recursion, but it's important; knowing > > Was this supposed to be s/important/unimportant/? Urgh, yes, it was originally "not important" but I lost the "not" while trying to clarify the wording. Thanks. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] usage: refactor die-recursion checks
On Mon, Apr 15, 2013 at 7:08 PM, Jeff King wrote: > This patch teaches die() to print the original die message > to stderr before reporting the recursion. The custom > die_routine may or may not have put it the message to s/put it the/emitted/ perhaps? > stderr, but this is the best we can do (it is what most > handlers will do anyway, and it is where our recursion error > will go). > > While we're at it, let's mark the "recursion detected" > message as a "BUG:", since it should never happen in > practice. And let's factor out the repeated code in die and > die_errno. This loses the information of which function was > called to cause the recursion, but it's important; knowing Was this supposed to be s/important/unimportant/? > the actual message fed to the function (which we now do) is > much more useful, as it can generally pin-point the actual > call-site that caused the recursion. > > Signed-off-by: Jeff King -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] usage: refactor die-recursion checks
When any git code calls die(), we chain to a custom die_routine, which we expect to print a message and exit the program. To avoid infinite loops, we detect a recursive call to die() with a simple counter, and break out of the loop by printing a message and exiting ourselves, without chaining to the die_routine. But the user does not get to see the message that would have been fed to the die_routine, which makes debugging harder. The user does not know if it was a true infinite loop, or simply a single re-entrant call, since they cannot compare the messages. Furthermore, if we are wrong about detecting the recursion, we have blocked the user from seeing the original message, which is probably the more useful one. This patch teaches die() to print the original die message to stderr before reporting the recursion. The custom die_routine may or may not have put it the message to stderr, but this is the best we can do (it is what most handlers will do anyway, and it is where our recursion error will go). While we're at it, let's mark the "recursion detected" message as a "BUG:", since it should never happen in practice. And let's factor out the repeated code in die and die_errno. This loses the information of which function was called to cause the recursion, but it's important; knowing the actual message fed to the function (which we now do) is much more useful, as it can generally pin-point the actual call-site that caused the recursion. Signed-off-by: Jeff King --- This helped me debug the current problem. And factoring it out helps with patch 3. :) usage.c | 29 ++--- 1 file changed, 14 insertions(+), 15 deletions(-) diff --git a/usage.c b/usage.c index 40b3de5..c6b7ac5 100644 --- a/usage.c +++ b/usage.c @@ -6,8 +6,6 @@ #include "git-compat-util.h" #include "cache.h" -static int dying; - void vreportf(const char *prefix, const char *err, va_list params) { char msg[4096]; @@ -80,17 +78,24 @@ void NORETURN die(const char *err, ...) usagef("%s", err); } +static void check_die_recursion(const char *fmt, va_list ap) +{ + static int dying; + + if (!dying++) + return; + + vreportf("fatal: ", fmt, ap); + fputs("BUG: recursion detected in die handler\n", stderr); + exit(128); +} + void NORETURN die(const char *err, ...) { va_list params; - if (dying) { - fputs("fatal: recursion detected in die handler\n", stderr); - exit(128); - } - dying = 1; - va_start(params, err); + check_die_recursion(err, params); die_routine(err, params); va_end(params); } @@ -102,13 +107,6 @@ void NORETURN die_errno(const char *fmt, ...) char str_error[256], *err; int i, j; - if (dying) { - fputs("fatal: recursion detected in die_errno handler\n", - stderr); - exit(128); - } - dying = 1; - err = strerror(errno); for (i = j = 0; err[i] && j < sizeof(str_error) - 1; ) { if ((str_error[j++] = err[i++]) != '%') @@ -126,6 +124,7 @@ void NORETURN die_errno(const char *fmt, ...) snprintf(fmt_with_err, sizeof(fmt_with_err), "%s: %s", fmt, str_error); va_start(params, fmt); + check_die_recursion(fmt_with_err, params); die_routine(fmt_with_err, params); va_end(params); } -- 1.8.2.8.g44e4c28 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html