Re: [PATCH 4/5] log: handle integer overflow in timestamps

2014-02-24 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Feb 24, 2014 at 12:21:33PM -0800, Junio C Hamano wrote:
>
>> >> > +   if (date_overflows(date))
>> >> > +   date = 0;
>> >> > +   else {
>> >> > +   if (ident->tz_begin && ident->tz_end)
>> >> > +   tz = strtol(ident->tz_begin, NULL, 10);
>> >> > +   if (tz == LONG_MAX || tz == LONG_MIN)
>> >> > +   tz = 0;
>> >> > +   }
>> >> 
>> >> ... don't we want to fix an input having a bogus timestamp and also
>> >> a bogus tz recorded in it?
>> >
>> > If there is a bogus timestamp, then we do not want to look at tz at all.
>> > We leave it at "0", so that we get a true sentinel:
>> 
>> Ah, OK, I missed the initialization to 0 at the beginning.
>> 
>> It might have been more clear if "int tz" declaration were left
>> uninitialized, and the variable were explicitly cleared to 0 in the
>> "date-overflows" error codepath, but it is not a big deal.
>
> It might be, but I think it would end up cumbersome.
> ...
> So I'd be in favor of keeping it as-is, but feel free to mark it up if
> you feel strongly.

I'd be in favor of keeping it as-is.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] log: handle integer overflow in timestamps

2014-02-24 Thread Jeff King
On Mon, Feb 24, 2014 at 12:21:33PM -0800, Junio C Hamano wrote:

> >> > +if (date_overflows(date))
> >> > +date = 0;
> >> > +else {
> >> > +if (ident->tz_begin && ident->tz_end)
> >> > +tz = strtol(ident->tz_begin, NULL, 10);
> >> > +if (tz == LONG_MAX || tz == LONG_MIN)
> >> > +tz = 0;
> >> > +}
> >> 
> >> ... don't we want to fix an input having a bogus timestamp and also
> >> a bogus tz recorded in it?
> >
> > If there is a bogus timestamp, then we do not want to look at tz at all.
> > We leave it at "0", so that we get a true sentinel:
> 
> Ah, OK, I missed the initialization to 0 at the beginning.
> 
> It might have been more clear if "int tz" declaration were left
> uninitialized, and the variable were explicitly cleared to 0 in the
> "date-overflows" error codepath, but it is not a big deal.

It might be, but I think it would end up cumbersome. The initialization
was already there from the previous version, which was hitting the else
for "ident->tz_begin". Without fallback initializations, you end up with:

  if (ident->date_begin && ident->date_end) {
  date = strtoul(ident->date_begin, NULL, 10);
  if (date_overflows(date)) {
  date = 0;
  tz = 0;
  } else {
  if (ident->tz_begin && ident->tz_end) {
  tz = strtol(ident->tz_begin, NULL, 10);
  if (tz == LONG_MAX || tz == LONG_MIN)
  tz = 0;
  } else
  tz = 0;
  }
  } else {
  date = 0;
  tz = 0;
  }

which I think is much more confusing (and hard to verify that the
variables are always set). Checking !date as an error condition would
make it a little more readable:

  if (ident->date_begin && ident->date_end) {
  date = strtoul(ident->date_begin, NULL, 10);
  if (date_overflows(date))
  date = 0;
  } else
  date = 0;

  if (date) {
  if (ident->tz_begin && ident->tz_end) {
  tz = strtol(ident->tz_begin, NULL, 10);
  if (tz == LONG_MAX || tz == LONG_MIN)
  tz = 0;
  } else
  tz = 0;
  } else
  tz = 0;

but then we treat date==0 as a sentinel, and can never correctly parse
dates on Jan 1, 1970.

So I'd be in favor of keeping it as-is, but feel free to mark it up if
you feel strongly.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] log: handle integer overflow in timestamps

2014-02-24 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Feb 24, 2014 at 11:50:00AM -0800, Junio C Hamano wrote:
>
>> > We also recognize overflow in the timezone field, which
>> > could produce nonsensical results. In this case we show the
>> > parsed date, but in UTC.
>> 
>> Both are good measures to fallback to sanity, but why is that
>> if/else?  In other words...
>> 
>> > +  if (date_overflows(date))
>> > +  date = 0;
>> > +  else {
>> > +  if (ident->tz_begin && ident->tz_end)
>> > +  tz = strtol(ident->tz_begin, NULL, 10);
>> > +  if (tz == LONG_MAX || tz == LONG_MIN)
>> > +  tz = 0;
>> > +  }
>> 
>> ... don't we want to fix an input having a bogus timestamp and also
>> a bogus tz recorded in it?
>
> If there is a bogus timestamp, then we do not want to look at tz at all.
> We leave it at "0", so that we get a true sentinel:

Ah, OK, I missed the initialization to 0 at the beginning.

It might have been more clear if "int tz" declaration were left
uninitialized, and the variable were explicitly cleared to 0 in the
"date-overflows" error codepath, but it is not a big deal.

Thanks for clarification.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] log: handle integer overflow in timestamps

2014-02-24 Thread Jeff King
On Mon, Feb 24, 2014 at 11:50:00AM -0800, Junio C Hamano wrote:

> > We also recognize overflow in the timezone field, which
> > could produce nonsensical results. In this case we show the
> > parsed date, but in UTC.
> 
> Both are good measures to fallback to sanity, but why is that
> if/else?  In other words...
> 
> > +   if (date_overflows(date))
> > +   date = 0;
> > +   else {
> > +   if (ident->tz_begin && ident->tz_end)
> > +   tz = strtol(ident->tz_begin, NULL, 10);
> > +   if (tz == LONG_MAX || tz == LONG_MIN)
> > +   tz = 0;
> > +   }
> 
> ... don't we want to fix an input having a bogus timestamp and also
> a bogus tz recorded in it?

If there is a bogus timestamp, then we do not want to look at tz at all.
We leave it at "0", so that we get a true sentinel:

  Thu Jan 1 00:00:00 1970 +

and not:

  Wed Dec 31 19:00:00 1969 -0500

If the timestamp is valid, then we bother to parse the zone, and handle
overflow there.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] log: handle integer overflow in timestamps

2014-02-24 Thread Junio C Hamano
Jeff King  writes:

> If an ident line has a ridiculous date value like (2^64)+1,
> we currently just pass ULONG_MAX along to the date code,
> which can produce nonsensical dates.
>
> On systems with a signed long time_t (e.g., 64-bit glibc
> systems), this actually doesn't end up too bad. The
> ULONG_MAX is converted to -1, we apply the timezone field to
> that, and the result ends up somewhere between Dec 31, 1969
> and Jan 1, 1970.
> ...
> We also recognize overflow in the timezone field, which
> could produce nonsensical results. In this case we show the
> parsed date, but in UTC.

Both are good measures to fallback to sanity, but why is that
if/else?  In other words...

> + if (date_overflows(date))
> + date = 0;
> + else {
> + if (ident->tz_begin && ident->tz_end)
> + tz = strtol(ident->tz_begin, NULL, 10);
> + if (tz == LONG_MAX || tz == LONG_MIN)
> + tz = 0;
> + }

... don't we want to fix an input having a bogus timestamp and also
a bogus tz recorded in it?

>   return show_date(date, tz, mode);
>  }
>  
> diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
> index 83de981..ba25a2e 100755
> --- a/t/t4212-log-corrupt.sh
> +++ b/t/t4212-log-corrupt.sh
> @@ -65,4 +65,20 @@ test_expect_success 'unparsable dates produce sentinel 
> value (%ad)' '
>   test_cmp expect actual
>  '
>  
> +# date is 2^64 + 1
> +test_expect_success 'date parser recognizes integer overflow' '
> + commit=$(munge_author_date HEAD 18446744073709551617) &&
> + echo "Thu Jan 1 00:00:00 1970 +" >expect &&
> + git log -1 --format=%ad $commit >actual &&
> + test_cmp expect actual
> +'
> +
> +# date is 2^64 - 2
> +test_expect_success 'date parser recognizes time_t overflow' '
> + commit=$(munge_author_date HEAD 18446744073709551614) &&
> + echo "Thu Jan 1 00:00:00 1970 +" >expect &&
> + git log -1 --format=%ad $commit >actual &&
> + test_cmp expect actual
> +'
> +
>  test_done
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] log: handle integer overflow in timestamps

2014-02-23 Thread Jeff King
If an ident line has a ridiculous date value like (2^64)+1,
we currently just pass ULONG_MAX along to the date code,
which can produce nonsensical dates.

On systems with a signed long time_t (e.g., 64-bit glibc
systems), this actually doesn't end up too bad. The
ULONG_MAX is converted to -1, we apply the timezone field to
that, and the result ends up somewhere between Dec 31, 1969
and Jan 1, 1970.

However, there is still a few good reasons to detect the
overflow explicitly:

  1. On systems where "unsigned long" is smaller than
 time_t, we get a nonsensical date in the future.

  2. Even where it would produce "Dec 31, 1969", it's easier
 to recognize "midnight Jan 1" as a consistent sentinel
 value for "we could not parse this".

  3.  Values which do not overflow strtoul but do overflow a
  signed time_t produce nonsensical values in the past.
  For example, on a 64-bit system with a signed long
  time_t, a timestamp of 184467440730 produces a
  date in 1947.

We also recognize overflow in the timezone field, which
could produce nonsensical results. In this case we show the
parsed date, but in UTC.

Signed-off-by: Jeff King 
---
A note on these tests. They are designed for 64-bit systems, but should
run fine on 32-bit systems (they are both just overflow, and the second
test is not doing anything novel that the first is not).

However, the second test relies on finding a value that fits into an
"unsigned long" but does not fit into a time_t. For systems with a
64-bit signed time_t, that's fine. But if a system has an unsigned
64-bit time_t, the test will fail (it will actually produce some value
300 billion years from now).

I'm inclined to include it as-is, as I do not know of any such system
(and it would be kind of lame, since it could not represent dates before
1970). If somebody comes up with such a system, we can allow either
output (we could do it preemptively, but somebody would have to
calculate the exact date/time billions of years in the future; we would
be just as likely to disagree with the system's gmtime about something
silly like leapseconds).

 pretty.c   | 10 --
 t/t4212-log-corrupt.sh | 16 
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/pretty.c b/pretty.c
index 87db08b..3b811ed 100644
--- a/pretty.c
+++ b/pretty.c
@@ -401,8 +401,14 @@ static const char *show_ident_date(const struct 
ident_split *ident,
 
if (ident->date_begin && ident->date_end)
date = strtoul(ident->date_begin, NULL, 10);
-   if (ident->tz_begin && ident->tz_end)
-   tz = strtol(ident->tz_begin, NULL, 10);
+   if (date_overflows(date))
+   date = 0;
+   else {
+   if (ident->tz_begin && ident->tz_end)
+   tz = strtol(ident->tz_begin, NULL, 10);
+   if (tz == LONG_MAX || tz == LONG_MIN)
+   tz = 0;
+   }
return show_date(date, tz, mode);
 }
 
diff --git a/t/t4212-log-corrupt.sh b/t/t4212-log-corrupt.sh
index 83de981..ba25a2e 100755
--- a/t/t4212-log-corrupt.sh
+++ b/t/t4212-log-corrupt.sh
@@ -65,4 +65,20 @@ test_expect_success 'unparsable dates produce sentinel value 
(%ad)' '
test_cmp expect actual
 '
 
+# date is 2^64 + 1
+test_expect_success 'date parser recognizes integer overflow' '
+   commit=$(munge_author_date HEAD 18446744073709551617) &&
+   echo "Thu Jan 1 00:00:00 1970 +" >expect &&
+   git log -1 --format=%ad $commit >actual &&
+   test_cmp expect actual
+'
+
+# date is 2^64 - 2
+test_expect_success 'date parser recognizes time_t overflow' '
+   commit=$(munge_author_date HEAD 18446744073709551614) &&
+   echo "Thu Jan 1 00:00:00 1970 +" >expect &&
+   git log -1 --format=%ad $commit >actual &&
+   test_cmp expect actual
+'
+
 test_done
-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html