Re: Crashes while trying to show tag objects with bad timestamps

2013-02-25 Thread Jeff King
On Fri, Feb 22, 2013 at 03:20:10PM -0800, Junio C Hamano wrote:

> As pp_user_info() is called from very few places, I do not think it
> is unreasonable to add an output parameter (i.e. "unsigned *") to
> let the caller know that we made a best guess given malformed input
> and handle the error in the caller.  The make_cover_letter() caller
> may look like:
> 
>   pp_user_info(&pp, NULL, &sb, committer, encoding, &errors);
> if (errors & PP_CORRUPT_DATE)
>   warning("unparsable datestamp in '%s'", committer);
> 
> although it is unlikely to see this error in practice, given that
> committer is coming from git_committer_info(0) and would have the
> current timestamp.

Sadly that is not quite enough for the object-parsing cases (which are
the ones we _really_ want to add context to, because they are buried
inside other pp_* calls. Probably adding an object context field (or an
error return) to the pretty-print context would make sense. But I don't
relish the thought of annotating each pretty-print caller.

I think we're OK to be silent and just react in an appropriate way;
having looked over the other callers of split_ident_line, we already do
so in some places. See my patch 1 below for details.

Once fsck is taught to note this, then the warning is a lot less
important (my patch 3 below).

> The whole "cat-file -p" is a historical wart, aka poor-man's
> "show".  I do not even consider it a part of the plumbing.  It is a
> fair game for Porcelainisque improvement ;-)

Good, that's how I feel, too. See my patch 4. :)

Here are the patches I'd like to do:

  [1/4]: handle malformed dates in ident lines
  [2/4]: skip_prefix: return a non-const pointer
  [3/4]: fsck: check "tagger" lines
  [4/4]: cat-file: print tags raw for "cat-file -p"

The first one is solid, and should probably go to maint and/or the -rc
track, as it fixes a segfault on bogus input. It's hopefully a
no-brainer, as the existing behavior is obviously unacceptable. We may
change our mind later about exactly what to print for such bogus input,
but whatever we print in such a case is just trying to be nice to the
user, and anybody who depends on our particular handling of malformed
objects is crazy.

The rest can wait, as they are about improving output when fed bogus
input, or tightening fsck. Moreover, they have some problems which make
them not suitable for applying yet. I'll give details in each patch.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes while trying to show tag objects with bad timestamps

2013-02-25 Thread Jeff King
On Sat, Feb 23, 2013 at 01:14:40AM +0200, Mantas Mikulėnas wrote:

> > Then I think it would make sense to allow the very specific no-date tag,
> > but not allow arbitrary crud. I wonder if there's an example in the
> > kernel or in git.git.
> 
> I couldn't find any such examples. However, I did find several tags
> with no "tagger" line at all: git.git has "v0.99" and linux.git has
> many such tags starting with "v2.6.11" ending with "v2.6.13-rc3".

Yes, I think Junio was mis-remembering the exact condition. It looks
like we added tagger lines in c818566 ([PATCH] Update tags to record who
made them, 2005-07-14), which pulls the identity straight from "git var
GIT_COMMITTER_IDENT". I double-checked to be sure that we included the
date stamp at that time, and we did.

When parsing such a tag, we put a "0" in the date field of the "struct
tag", and I suspect that is what caused the memory confusion.

So I think we are fine to fsck tagger lines as we do ordinary
author/committer ident lines; the only exception is that we should not
complain if they do not exist.

> It seems that `git cat-file -p` doesn't like such tags too – if there
> is no "tagger", it doesn't display *any* header lines. More bugs?

Yeah, I think we should just rid of that parser entirely. It is very
inconsistent with the pretty-printer used by "git show", as well as the
one used by "git for-each-ref", not to mention parse_tag (ugh, how many
tag parsers do we have?).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes while trying to show tag objects with bad timestamps

2013-02-22 Thread Junio C Hamano
Jeff King  writes:

> On Fri, Feb 22, 2013 at 02:53:48PM -0800, Junio C Hamano wrote:
>
>> > I guess we should probably issue a warning, too. Also disappointingly,
>> > git-fsck does not seem to detect this breakage at all.
>> 
>> Yes for the warning, 
>
> Unfortunately, a good warning is harder than I had hoped. At the point
> where we notice the problem, pp_user_info, we have very little context.
> We can say only something like:
>
>   warning: malformed date in ident 'Jeff King  BOGUS'
>
> but we cannot say in which object, or even that it was a "tagger" line
> (and in some cases we do not even have an object, as in
> make_cover_letter).

As pp_user_info() is called from very few places, I do not think it
is unreasonable to add an output parameter (i.e. "unsigned *") to
let the caller know that we made a best guess given malformed input
and handle the error in the caller.  The make_cover_letter() caller
may look like:

pp_user_info(&pp, NULL, &sb, committer, encoding, &errors);
if (errors & PP_CORRUPT_DATE)
warning("unparsable datestamp in '%s'", committer);

although it is unlikely to see this error in practice, given that
committer is coming from git_committer_info(0) and would have the
current timestamp.

> I also took a look at parsing routine of "cat-file -p". It's totally
> hand-rolled, separate from what "git show" does, and is not build on the
> pretty-print code at all. I wonder, though, if it actually makes sense
> to munge the date there. The commit-object pretty-printer for cat-file
> just shows the object intact. It seems weirdly inconsistent that we
> would munge tags just to rewrite the date. If you want a real
> pretty-printer, you should be using porcelain like "show".

The whole "cat-file -p" is a historical wart, aka poor-man's
"show".  I do not even consider it a part of the plumbing.  It is a
fair game for Porcelainisque improvement ;-)

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes while trying to show tag objects with bad timestamps

2013-02-22 Thread Mantas Mikulėnas
On Sat, Feb 23, 2013 at 1:04 AM, Jeff King  wrote:
> On Fri, Feb 22, 2013 at 02:53:48PM -0800, Junio C Hamano wrote:
>> and no for disappointing.  IIRC, in the very early implementations
>> allowed tag object without dates.
>>
>> I _think_ we can start tightening fsck, though.
>
> Then I think it would make sense to allow the very specific no-date tag,
> but not allow arbitrary crud. I wonder if there's an example in the
> kernel or in git.git.

I couldn't find any such examples. However, I did find several tags
with no "tagger" line at all: git.git has "v0.99" and linux.git has
many such tags starting with "v2.6.11" ending with "v2.6.13-rc3".

It seems that `git cat-file -p` doesn't like such tags too – if there
is no "tagger", it doesn't display *any* header lines. More bugs?

--
Mantas Mikulėnas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes while trying to show tag objects with bad timestamps

2013-02-22 Thread Jeff King
On Fri, Feb 22, 2013 at 02:53:48PM -0800, Junio C Hamano wrote:

> > I guess we should probably issue a warning, too. Also disappointingly,
> > git-fsck does not seem to detect this breakage at all.
> 
> Yes for the warning, 

Unfortunately, a good warning is harder than I had hoped. At the point
where we notice the problem, pp_user_info, we have very little context.
We can say only something like:

  warning: malformed date in ident 'Jeff King  BOGUS'

but we cannot say in which object, or even that it was a "tagger" line
(and in some cases we do not even have an object, as in
make_cover_letter).

> and no for disappointing.  IIRC, in the very early implementations
> allowed tag object without dates.
> 
> I _think_ we can start tightening fsck, though.

Then I think it would make sense to allow the very specific no-date tag,
but not allow arbitrary crud. I wonder if there's an example in the
kernel or in git.git.

I also took a look at parsing routine of "cat-file -p". It's totally
hand-rolled, separate from what "git show" does, and is not build on the
pretty-print code at all. I wonder, though, if it actually makes sense
to munge the date there. The commit-object pretty-printer for cat-file
just shows the object intact. It seems weirdly inconsistent that we
would munge tags just to rewrite the date. If you want a real
pretty-printer, you should be using porcelain like "show".

It would be a regression, of course, for people relying on "cat-file -p"
to have consistent output. But I am very tempted to call it a bug, and
tempted to call "cat-file -p" inside a script a bad thing (you cannot,
after all, tell what object type you have; you should figure out the
type you expect and then use "cat-file  " to make sure you
get the right one).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes while trying to show tag objects with bad timestamps

2013-02-22 Thread Junio C Hamano
Jeff King  writes:

> I guess we should probably issue a warning, too. Also disappointingly,
> git-fsck does not seem to detect this breakage at all.

Yes for the warning, and no for disappointing.  IIRC, in the very
early implementations allowed tag object without dates.

I _think_ we can start tightening fsck, though.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Crashes while trying to show tag objects with bad timestamps

2013-02-22 Thread Jeff King
On Sat, Feb 23, 2013 at 12:30:28AM +0200, Mantas Mikulėnas wrote:

> When messing around with various repositories, I noticed that git 1.8
> (currently using 1.8.2.rc0.22.gb3600c3) has problems parsing tag objects
> that have invalid timestamps.
> 
> Times in tag objects appear to be kept as Unix timestamps, but I didn't
> realize this at first, and ran something roughly equivalent to:
>   git cat-file -p $tagname | git hash-object -w -t tag --stdin
> creating a tag object the "tagger" line containing formatted time
> instead of a Unix timestamp.

Thanks, that makes it easy to replicate. It looks like it is not just
tags, but rather the pp_user_info function does not realize that
split_ident may return NULL for the date field if it is unparseable.
Something like this stops the crash and just gives a bogus date:

diff --git a/pretty.c b/pretty.c
index eae57ad..9688857 100644
--- a/pretty.c
+++ b/pretty.c
@@ -428,8 +428,16 @@ void pp_user_info(const struct pretty_print_context *pp,
strbuf_add(&name, namebuf, namelen);
 
namelen = name.len + mail.len + 3; /* ' ' + '<' + '>' */
-   time = strtoul(ident.date_begin, &date, 10);
-   tz = strtol(date, NULL, 10);
+
+   if (ident.date_begin) {
+   time = strtoul(ident.date_begin, &date, 10);
+   tz = strtol(date, NULL, 10);
+   }
+   else {
+   /* ident line had malformed date */
+   time = 0;
+   tz = 0;
+   }
 
if (pp->fmt == CMIT_FMT_EMAIL) {
strbuf_addstr(sb, "From: ");

I guess we should probably issue a warning, too. Also disappointingly,
git-fsck does not seem to detect this breakage at all.

> Git doesn't handle the resulting tag objects nicely at all. For example,
> running `git cat-file -p` on the new object outputs a really odd
> timestamp "Thu Jun Thu Jan 1 00:16:09 1970 +0016" (I'm guessing it
> parses the year as Unix time), and `git show` outright crashes
> (backtrace included below.)

If "cat-file -p" is not using the usual pretty-print routines, it
probably should. I'll take a look.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html