Re: [PATCH] Fix delta integer overflows

2017-08-08 Thread Johannes Schindelin
Hi Martin,

On Tue, 8 Aug 2017, Martin Koegler wrote:

> On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote:
> > If you want to work on data in memory, then size_t is the appropriate data
> > type. We already use it elsewhere. Let's use it here, too, without the
> > intermediate bump from the incorrect `int` to the equally incorrect
> > `long`.
> 
> I disagree with "We already use it elsewhere.".

By "it" I meant "size_t".

> The whole delta code uses "unsigned long" - look at delta.h. Look at
> unpack-objects.c. Or cache.h. Or pack-objects.c. Or index-pack.c.

I know that. It is a major bug in the source code.

> Other possible cases:
> git grep "unsigned long" |grep size

Yes, even more bugs.

> So the codebase still suggests, that "unsigned long" is the data type
> for storing object sizes.

And it would be wrong, and we know it already. Most importantly, Junio
knows it already.

> I would be fine with resubmitting a patch using size_t/off_t for the
> touched parts - changing the whole core code is a too invasive change
> for a bug fix.

Sorry, my mistake: I never meant to burden you with the invasive change.
I only wanted you to change the `int` to `size_t` right away.

Thanks,
Johannes


Re: [PATCH] Fix delta integer overflows

2017-08-08 Thread Martin Koegler
On Mon, Aug 07, 2017 at 06:44:16PM -0700, Junio C Hamano wrote:
> Having said that, I am a bit curious how you came to this patch.
> Was the issue found by code inspection, or did you actually have a
> real life use case to raise the core.bigFileThreshold configuration
> to a value above 4GB?

Real life use - tracking changes in larger files.

Raising the limit above 4GB suddenly resulted in a broken pack files in the 
repository and
aborts of various git commands.

Data is still recoverable with all size sanity checks disabled.

Regards,
Martin


Re: [PATCH] Fix delta integer overflows

2017-08-08 Thread Martin Koegler
On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote:
> If you want to work on data in memory, then size_t is the appropriate data
> type. We already use it elsewhere. Let's use it here, too, without the
> intermediate bump from the incorrect `int` to the equally incorrect
> `long`.

I disagree with "We already use it elsewhere.". The whole delta code uses 
"unsigned long" -
look at delta.h. Look at unpack-objects.c. Or cache.h. Or pack-objects.c. Or 
index-pack.c.

Other possible cases:
git grep "unsigned long" |grep size

So the codebase still suggests, that "unsigned long" is the data type for 
storing object sizes.

I would be fine with resubmitting a patch using size_t/off_t for the touched 
parts - changing the whole
core code is a too invasive change for a bug fix.

Regards,
Martin
>From d97a7810ff679dd939972c151bcf23c122cdc570 Mon Sep 17 00:00:00 2001
From: Martin Koegler 
Date: Mon, 7 Aug 2017 20:00:30 +0200
Subject: [PATCH] Fix delta integer overflows

The current delta code produces incorrect pack objects for files > 4GB.

Signed-off-by: Martin Koegler 
---
 diff-delta.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/diff-delta.c b/diff-delta.c
index 3797ce6..cd238c8 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
 	 const void *trg_buf, unsigned long trg_size,
 	 unsigned long *delta_size, unsigned long max_size)
 {
-	unsigned int i, outpos, outsize, moff, msize, val;
+	unsigned int i, val;
+	off_t outpos, moff;
+	size_t l, outsize, msize;
 	int inscnt;
 	const unsigned char *ref_data, *ref_top, *data, *top;
 	unsigned char *out;
@@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
 		return NULL;
 
 	/* store reference buffer size */
-	i = index->src_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = index->src_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	/* store target buffer size */
-	i = trg_size;
-	while (i >= 0x80) {
-		out[outpos++] = i | 0x80;
-		i >>= 7;
+	l = trg_size;
+	while (l >= 0x80) {
+		out[outpos++] = l | 0x80;
+		l >>= 7;
 	}
-	out[outpos++] = i;
+	out[outpos++] = l;
 
 	ref_data = index->src_buf;
 	ref_top = ref_data + index->src_size;
-- 
2.1.4



Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Junio C Hamano
Junio C Hamano  writes:

> Martin Koegler  writes:
>
>> From: Martin Koegler 
>>
>> The current delta code produces incorrect pack objects for files > 4GB.
>>
>> Signed-off-by: Martin Koegler 
>> ---
>>  diff-delta.c | 23 ---
>>  1 file changed, 12 insertions(+), 11 deletions(-)
>>
>> Just pass any file > 4 GB to the delta-compression [by increasing the delta 
>> limits].
>> As file size, a truncated 32bit value will be encoded, leading to broken 
>> pack files.
>
> The patch obviously makes the code better and self consistent in
> that "struct delta_index" has src_size as ulong, and this function
> takes trg_size as ulong, and it was plain wrong for the code to
> assume that "i", which is uint, can receive it safely.
>
> In the longer term we might want to move to size_t or even
> uintmax_t, as the ulong on a platform may not be long enough in
> order to express the largest file size the platform can have, but
> this patch (1) is good even without such a change, and (2) gives a
> good foundation to build on if we want such a change on top.
>
> Thanks.  Will queue.

Having said that, I am a bit curious how you came to this patch.
Was the issue found by code inspection, or did you actually have a
real life use case to raise the core.bigFileThreshold configuration
to a value above 4GB?

Thanks.


Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Junio C Hamano
Junio C Hamano  writes:

> This is not about where the bar is set.  It is about expectation

After having thought about this a bit more, I think in the message I
am responding to I mischaracterised the aspect of a patch that
influences the "expectation".  It is much less about who the
contributor is but more about what the patch does.  

If the patch in question were from a more experienced contributor
(like you or Peff), my internal reaction would have been "gee, the
submitter should have known better that a more complete fix should
involve a larger integral type, not stopping at matching the largest
type that happens to be used in the interface without updating the
interface".

But I still would have said that the patch is an improvement--as it
indeed is; it does not make things worse anywhere and brings in a
more consistency.  And I still would have mentioned the same "in the
longer term, we would want to use size_t or uintmax_t here, not just
ulong".

The only thing I would have done differently if the submission were
by a more experienced contributor is that I probably would have
added "yes this may be an improvement, but I expected you should
know better to at least mention the longer term direction to use
size_t or uintmax_t in the log message, even if you didn't
immediately extend this patch into a more complete series".

That one is a difference of expectation between an occasional
contributor and an experienced one.


Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Junio C Hamano
Johannes Schindelin  writes:

> So are you saying that starting with v2.14.0, you accept patches into `pu`
> for which you would previously have required multiple iterations before
> even considering it for `pu`?
>
> Frankly, I am a bit surprised that this obvious change from `unsigned
> long` to `size_t` is not required in this case before queuing, but if the
> rules have changed to lower the bar for patch submissions, I am all for
> it. I always felt that we are wasting contributors' time a little too
> freely and too deliberately.

This is not about where the bar is set.  It is about expectation.  I
do not expect much from occasional (or new) contributors and I try
not to demand too much from them.  The consequence is that as long
as a small patch makes things better without making anything worse,
I'd want to be inclusive to encourage them to build obvious
improvements on top.  Maybe they just want a single patch landed to
fix their immediate needs (which may be generic enough and expected
to be shared with many other people) without going further, so I may
end up queuing something that only helps 40% of people until follow
up patches are done to cover the remaining 60% of people, but that
is fine as long as the patch does not make things worse (it is not
like a patch that helps 40% while hurting the remaining 60% until
a follow-up happens).

I would expect a lot more from experienced contributors, when I know
they are capable of helping the remaining 60% inside the same series
and without requiring too much hand-holding from me.  The same thing
I cannot say to a occasional (or new) contributor---they may not be
coorperative, or they may be willing to do more but may require more
help polishing their work than the bandwidth I can afford.

So if you are volunteering to help by either guiding Martin to aim
higher and make this a series with a larger and more complete scope,
I'd very much appreciate it.  Or you can do a follow-up series on
top of what Martin posted.  Either is fine by me.  Just do not step
on each others' toes ;-)

I avoided saying all of the above because I didn't want my word
taken out of context later to make it sound as if I were belittling
the competence of Martin, but you seem to want to force me to say
this, so here it is.



Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Johannes Schindelin
Hi Junio,

On Mon, 7 Aug 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> The patch obviously makes the code better and self consistent in
> >> that "struct delta_index" has src_size as ulong, and this function
> >> takes trg_size as ulong, and it was plain wrong for the code to
> >> assume that "i", which is uint, can receive it safely.
> >> 
> >> In the longer term we might want to move to size_t or even
> >> uintmax_t, as the ulong on a platform may not be long enough in
> >> order to express the largest file size the platform can have, but
> >> this patch (1) is good even without such a change, and (2) gives a
> >> good foundation to build on if we want such a change on top.
> >> 
> >> Thanks.  Will queue.
> >
> > This is sad. There is no "may not be long enough". We already know a
> > platform where unsigned long is not long enough, don't we? Why leave this
> > patch in this intermediate state?
> 
> This is a good foundation to build on, and I never said no further
> update on top of this patch is desired.  Look for "(2)" in what you
> quoted.

So are you saying that starting with v2.14.0, you accept patches into `pu`
for which you would previously have required multiple iterations before
even considering it for `pu`?

Frankly, I am a bit surprised that this obvious change from `unsigned
long` to `size_t` is not required in this case before queuing, but if the
rules have changed to lower the bar for patch submissions, I am all for
it. I always felt that we are wasting contributors' time a little too
freely and too deliberately.

Ciao,
Dscho


Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Junio C Hamano
Johannes Schindelin  writes:

>> The patch obviously makes the code better and self consistent in
>> that "struct delta_index" has src_size as ulong, and this function
>> takes trg_size as ulong, and it was plain wrong for the code to
>> assume that "i", which is uint, can receive it safely.
>> 
>> In the longer term we might want to move to size_t or even
>> uintmax_t, as the ulong on a platform may not be long enough in
>> order to express the largest file size the platform can have, but
>> this patch (1) is good even without such a change, and (2) gives a
>> good foundation to build on if we want such a change on top.
>> 
>> Thanks.  Will queue.
>
> This is sad. There is no "may not be long enough". We already know a
> platform where unsigned long is not long enough, don't we? Why leave this
> patch in this intermediate state?

This is a good foundation to build on, and I never said no further
update on top of this patch is desired.  Look for "(2)" in what you
quoted.


Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Johannes Schindelin
Hi,

On Mon, 7 Aug 2017, Junio C Hamano wrote:

> Martin Koegler  writes:
> 
> > From: Martin Koegler 
> >
> > The current delta code produces incorrect pack objects for files > 4GB.
> >
> > Signed-off-by: Martin Koegler 
> > ---
> >  diff-delta.c | 23 ---
> >  1 file changed, 12 insertions(+), 11 deletions(-)
> >
> > Just pass any file > 4 GB to the delta-compression [by increasing the delta 
> > limits].
> > As file size, a truncated 32bit value will be encoded, leading to broken 
> > pack files.
> 
> The patch obviously makes the code better and self consistent in
> that "struct delta_index" has src_size as ulong, and this function
> takes trg_size as ulong, and it was plain wrong for the code to
> assume that "i", which is uint, can receive it safely.
> 
> In the longer term we might want to move to size_t or even
> uintmax_t, as the ulong on a platform may not be long enough in
> order to express the largest file size the platform can have, but
> this patch (1) is good even without such a change, and (2) gives a
> good foundation to build on if we want such a change on top.
> 
> Thanks.  Will queue.

This is sad. There is no "may not be long enough". We already know a
platform where unsigned long is not long enough, don't we? Why leave this
patch in this intermediate state?

If you want to work on data in memory, then size_t is the appropriate data
type. We already use it elsewhere. Let's use it here, too, without the
intermediate bump from the incorrect `int` to the equally incorrect
`long`.

Ciao,
Johannes


Re: [PATCH] Fix delta integer overflows

2017-08-07 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 
>
> The current delta code produces incorrect pack objects for files > 4GB.
>
> Signed-off-by: Martin Koegler 
> ---
>  diff-delta.c | 23 ---
>  1 file changed, 12 insertions(+), 11 deletions(-)
>
> Just pass any file > 4 GB to the delta-compression [by increasing the delta 
> limits].
> As file size, a truncated 32bit value will be encoded, leading to broken pack 
> files.

The patch obviously makes the code better and self consistent in
that "struct delta_index" has src_size as ulong, and this function
takes trg_size as ulong, and it was plain wrong for the code to
assume that "i", which is uint, can receive it safely.

In the longer term we might want to move to size_t or even
uintmax_t, as the ulong on a platform may not be long enough in
order to express the largest file size the platform can have, but
this patch (1) is good even without such a change, and (2) gives a
good foundation to build on if we want such a change on top.

Thanks.  Will queue.

>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3797ce6..13e5a01 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -319,7 +319,8 @@ create_delta(const struct delta_index *index,
>const void *trg_buf, unsigned long trg_size,
>unsigned long *delta_size, unsigned long max_size)
>  {
> - unsigned int i, outpos, outsize, moff, msize, val;
> + unsigned int i, val;
> + unsigned long l, outpos, outsize, moff, msize;
>   int inscnt;
>   const unsigned char *ref_data, *ref_top, *data, *top;
>   unsigned char *out;
> @@ -336,20 +337,20 @@ create_delta(const struct delta_index *index,
>   return NULL;
>  
>   /* store reference buffer size */
> - i = index->src_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = index->src_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   /* store target buffer size */
> - i = trg_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = trg_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   ref_data = index->src_buf;
>   ref_top = ref_data + index->src_size;