Re: [PATCH] blame: handle broken commit headers gracefully

2013-04-18 Thread Jeff King
On Wed, Apr 17, 2013 at 02:55:29PM -0700, Junio C Hamano wrote:

> Or you can imagine nastier input strings, like
> 
>Name <>- 123456789 -
>Name - 123456789 -
>Name 56789 -
> 
> I am afraid that at some point "we should salvage as much as we
> can", which is a worthy goal, becomes a losing proposition.

Good point. In the worst cases, even if you cleaned things up, you might
even need to allocate a new string (like your middle one), which would
make calling split_ident_line a lot more annoying. Probably not worth
the effort.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame: handle broken commit headers gracefully

2013-04-17 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Apr 17, 2013 at 08:33:54PM +0200, René Scharfe wrote:
>
>> Minimal patch, test case missing.  It's a bit sad that the old commit
>> parser of blame handled Ivan's specific corruption (extra "-<>" after
>> email) gracefully because it used the spaces as cutting points instead
>> of "<" and ">".
>
> That may mean there is room for improvement in split_ident_line to
> be more resilient in removing cruft. With something like:
>
>   Name -<> 123456789 -
>
> it would obviously be nice to find the date timestamp there, but I
> wonder what the "email" field should return? The full broken string, or
> just "email@host"?

Or you can imagine nastier input strings, like

   Name <>- 123456789 -
   Name - 123456789 -
   Name 56789 -

I am afraid that at some point "we should salvage as much as we
can", which is a worthy goal, becomes a losing proposition.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame: handle broken commit headers gracefully

2013-04-17 Thread René Scharfe

Am 17.04.2013 23:07, schrieb Jeff King:

On Wed, Apr 17, 2013 at 08:33:54PM +0200, René Scharfe wrote:


Minimal patch, test case missing.  It's a bit sad that the old commit
parser of blame handled Ivan's specific corruption (extra "-<>" after
email) gracefully because it used the spaces as cutting points instead
of "<" and ">".


That may mean there is room for improvement in split_ident_line to
be more resilient in removing cruft. With something like:

   Name -<> 123456789 -

it would obviously be nice to find the date timestamp there, but I
wonder what the "email" field should return? The full broken string, or
just "email@host"? One way is convenient for overlooking problems in
broken commits, but I would worry about code paths that are using
split_ident_line to verify the quality of the string (like
determine_author_info). It's possible we would need a strict and a
forgiving mode.


You can have both; the necessary data is in the struct ident_split: Just 
check that *mail_end == '>' and mail_end + 1 == date_begin etc.


René


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] blame: handle broken commit headers gracefully

2013-04-17 Thread Jeff King
On Wed, Apr 17, 2013 at 08:33:54PM +0200, René Scharfe wrote:

> Minimal patch, test case missing.  It's a bit sad that the old commit
> parser of blame handled Ivan's specific corruption (extra "-<>" after
> email) gracefully because it used the spaces as cutting points instead
> of "<" and ">".

That may mean there is room for improvement in split_ident_line to
be more resilient in removing cruft. With something like:

  Name -<> 123456789 -

it would obviously be nice to find the date timestamp there, but I
wonder what the "email" field should return? The full broken string, or
just "email@host"? One way is convenient for overlooking problems in
broken commits, but I would worry about code paths that are using
split_ident_line to verify the quality of the string (like
determine_author_info). It's possible we would need a strict and a
forgiving mode.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] blame: handle broken commit headers gracefully

2013-04-17 Thread René Scharfe
split_ident_line() can leave us with the pointers date_begin, date_end,
tz_begin and tz_end all set to NULL.  Check them before use and supply
the same fallback values as in the case of a negative return code from
split_ident_line().

The "(unknown)" is not actually shown in the output, though, because it
will be converted to a number (zero) eventually.

Signed-off-by: Rene Scharfe 
---
Minimal patch, test case missing.  It's a bit sad that the old commit
parser of blame handled Ivan's specific corruption (extra "-<>" after
email) gracefully because it used the spaces as cutting points instead
of "<" and ">".

 builtin/blame.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 86100e9..7770781 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1375,10 +1375,15 @@ static void get_ac_line(const char *inbuf, const char 
*what,
maillen = ident.mail_end - ident.mail_begin;
mailbuf = ident.mail_begin;
 
-   *time = strtoul(ident.date_begin, NULL, 10);
+   if (ident.date_begin && ident.date_end)
+   *time = strtoul(ident.date_begin, NULL, 10);
+   else
+   *time = 0;
 
-   len = ident.tz_end - ident.tz_begin;
-   strbuf_add(tz, ident.tz_begin, len);
+   if (ident.tz_begin && ident.tz_end)
+   strbuf_add(tz, ident.tz_begin, ident.tz_end - ident.tz_begin);
+   else
+   strbuf_addstr(tz, "(unknown)");
 
/*
 * Now, convert both name and e-mail using mailmap
-- 
1.8.2.1


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html