Re: [PATCH] blame: handle broken commit headers gracefully
On Wed, Apr 17, 2013 at 02:55:29PM -0700, Junio C Hamano wrote: > Or you can imagine nastier input strings, like > >Name <>- 123456789 - >Name - 123456789 - >Name 56789 - > > I am afraid that at some point "we should salvage as much as we > can", which is a worthy goal, becomes a losing proposition. Good point. In the worst cases, even if you cleaned things up, you might even need to allocate a new string (like your middle one), which would make calling split_ident_line a lot more annoying. Probably not worth the effort. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blame: handle broken commit headers gracefully
Jeff King writes: > On Wed, Apr 17, 2013 at 08:33:54PM +0200, René Scharfe wrote: > >> Minimal patch, test case missing. It's a bit sad that the old commit >> parser of blame handled Ivan's specific corruption (extra "-<>" after >> email) gracefully because it used the spaces as cutting points instead >> of "<" and ">". > > That may mean there is room for improvement in split_ident_line to > be more resilient in removing cruft. With something like: > > Name -<> 123456789 - > > it would obviously be nice to find the date timestamp there, but I > wonder what the "email" field should return? The full broken string, or > just "email@host"? Or you can imagine nastier input strings, like Name <>- 123456789 - Name - 123456789 - Name 56789 - I am afraid that at some point "we should salvage as much as we can", which is a worthy goal, becomes a losing proposition. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blame: handle broken commit headers gracefully
Am 17.04.2013 23:07, schrieb Jeff King: On Wed, Apr 17, 2013 at 08:33:54PM +0200, René Scharfe wrote: Minimal patch, test case missing. It's a bit sad that the old commit parser of blame handled Ivan's specific corruption (extra "-<>" after email) gracefully because it used the spaces as cutting points instead of "<" and ">". That may mean there is room for improvement in split_ident_line to be more resilient in removing cruft. With something like: Name -<> 123456789 - it would obviously be nice to find the date timestamp there, but I wonder what the "email" field should return? The full broken string, or just "email@host"? One way is convenient for overlooking problems in broken commits, but I would worry about code paths that are using split_ident_line to verify the quality of the string (like determine_author_info). It's possible we would need a strict and a forgiving mode. You can have both; the necessary data is in the struct ident_split: Just check that *mail_end == '>' and mail_end + 1 == date_begin etc. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blame: handle broken commit headers gracefully
On Wed, Apr 17, 2013 at 08:33:54PM +0200, René Scharfe wrote: > Minimal patch, test case missing. It's a bit sad that the old commit > parser of blame handled Ivan's specific corruption (extra "-<>" after > email) gracefully because it used the spaces as cutting points instead > of "<" and ">". That may mean there is room for improvement in split_ident_line to be more resilient in removing cruft. With something like: Name -<> 123456789 - it would obviously be nice to find the date timestamp there, but I wonder what the "email" field should return? The full broken string, or just "email@host"? One way is convenient for overlooking problems in broken commits, but I would worry about code paths that are using split_ident_line to verify the quality of the string (like determine_author_info). It's possible we would need a strict and a forgiving mode. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] blame: handle broken commit headers gracefully
split_ident_line() can leave us with the pointers date_begin, date_end, tz_begin and tz_end all set to NULL. Check them before use and supply the same fallback values as in the case of a negative return code from split_ident_line(). The "(unknown)" is not actually shown in the output, though, because it will be converted to a number (zero) eventually. Signed-off-by: Rene Scharfe --- Minimal patch, test case missing. It's a bit sad that the old commit parser of blame handled Ivan's specific corruption (extra "-<>" after email) gracefully because it used the spaces as cutting points instead of "<" and ">". builtin/blame.c | 11 --- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index 86100e9..7770781 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1375,10 +1375,15 @@ static void get_ac_line(const char *inbuf, const char *what, maillen = ident.mail_end - ident.mail_begin; mailbuf = ident.mail_begin; - *time = strtoul(ident.date_begin, NULL, 10); + if (ident.date_begin && ident.date_end) + *time = strtoul(ident.date_begin, NULL, 10); + else + *time = 0; - len = ident.tz_end - ident.tz_begin; - strbuf_add(tz, ident.tz_begin, len); + if (ident.tz_begin && ident.tz_end) + strbuf_add(tz, ident.tz_begin, ident.tz_end - ident.tz_begin); + else + strbuf_addstr(tz, "(unknown)"); /* * Now, convert both name and e-mail using mailmap -- 1.8.2.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html