Re: [PATCH] Fix delta integer overflows
Hi Martin, On Tue, 8 Aug 2017, Martin Koegler wrote: > On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote: > > If you want to work on data in memory, then size_t is the appropriate data > > type. We already use it elsewhere. Let's use it here, too, without the > > intermediate bump from the incorrect `int` to the equally incorrect > > `long`. > > I disagree with "We already use it elsewhere.". By "it" I meant "size_t". > The whole delta code uses "unsigned long" - look at delta.h. Look at > unpack-objects.c. Or cache.h. Or pack-objects.c. Or index-pack.c. I know that. It is a major bug in the source code. > Other possible cases: > git grep "unsigned long" |grep size Yes, even more bugs. > So the codebase still suggests, that "unsigned long" is the data type > for storing object sizes. And it would be wrong, and we know it already. Most importantly, Junio knows it already. > I would be fine with resubmitting a patch using size_t/off_t for the > touched parts - changing the whole core code is a too invasive change > for a bug fix. Sorry, my mistake: I never meant to burden you with the invasive change. I only wanted you to change the `int` to `size_t` right away. Thanks, Johannes
Re: [PATCH] Fix delta integer overflows
On Mon, Aug 07, 2017 at 06:44:16PM -0700, Junio C Hamano wrote: > Having said that, I am a bit curious how you came to this patch. > Was the issue found by code inspection, or did you actually have a > real life use case to raise the core.bigFileThreshold configuration > to a value above 4GB? Real life use - tracking changes in larger files. Raising the limit above 4GB suddenly resulted in a broken pack files in the repository and aborts of various git commands. Data is still recoverable with all size sanity checks disabled. Regards, Martin
Re: [PATCH] Fix delta integer overflows
On Mon, Aug 07, 2017 at 09:39:12PM +0200, Johannes Schindelin wrote: > If you want to work on data in memory, then size_t is the appropriate data > type. We already use it elsewhere. Let's use it here, too, without the > intermediate bump from the incorrect `int` to the equally incorrect > `long`. I disagree with "We already use it elsewhere.". The whole delta code uses "unsigned long" - look at delta.h. Look at unpack-objects.c. Or cache.h. Or pack-objects.c. Or index-pack.c. Other possible cases: git grep "unsigned long" |grep size So the codebase still suggests, that "unsigned long" is the data type for storing object sizes. I would be fine with resubmitting a patch using size_t/off_t for the touched parts - changing the whole core code is a too invasive change for a bug fix. Regards, Martin >From d97a7810ff679dd939972c151bcf23c122cdc570 Mon Sep 17 00:00:00 2001 From: Martin Koegler Date: Mon, 7 Aug 2017 20:00:30 +0200 Subject: [PATCH] Fix delta integer overflows The current delta code produces incorrect pack objects for files > 4GB. Signed-off-by: Martin Koegler --- diff-delta.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/diff-delta.c b/diff-delta.c index 3797ce6..cd238c8 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -319,7 +319,9 @@ create_delta(const struct delta_index *index, const void *trg_buf, unsigned long trg_size, unsigned long *delta_size, unsigned long max_size) { - unsigned int i, outpos, outsize, moff, msize, val; + unsigned int i, val; + off_t outpos, moff; + size_t l, outsize, msize; int inscnt; const unsigned char *ref_data, *ref_top, *data, *top; unsigned char *out; @@ -336,20 +338,20 @@ create_delta(const struct delta_index *index, return NULL; /* store reference buffer size */ - i = index->src_size; - while (i >= 0x80) { - out[outpos++] = i | 0x80; - i >>= 7; + l = index->src_size; + while (l >= 0x80) { + out[outpos++] = l | 0x80; + l >>= 7; } - out[outpos++] = i; + out[outpos++] = l; /* store target buffer size */ - i = trg_size; - while (i >= 0x80) { - out[outpos++] = i | 0x80; - i >>= 7; + l = trg_size; + while (l >= 0x80) { + out[outpos++] = l | 0x80; + l >>= 7; } - out[outpos++] = i; + out[outpos++] = l; ref_data = index->src_buf; ref_top = ref_data + index->src_size; -- 2.1.4
Re: [PATCH] Fix delta integer overflows
Junio C Hamano writes: > Martin Koegler writes: > >> From: Martin Koegler >> >> The current delta code produces incorrect pack objects for files > 4GB. >> >> Signed-off-by: Martin Koegler >> --- >> diff-delta.c | 23 --- >> 1 file changed, 12 insertions(+), 11 deletions(-) >> >> Just pass any file > 4 GB to the delta-compression [by increasing the delta >> limits]. >> As file size, a truncated 32bit value will be encoded, leading to broken >> pack files. > > The patch obviously makes the code better and self consistent in > that "struct delta_index" has src_size as ulong, and this function > takes trg_size as ulong, and it was plain wrong for the code to > assume that "i", which is uint, can receive it safely. > > In the longer term we might want to move to size_t or even > uintmax_t, as the ulong on a platform may not be long enough in > order to express the largest file size the platform can have, but > this patch (1) is good even without such a change, and (2) gives a > good foundation to build on if we want such a change on top. > > Thanks. Will queue. Having said that, I am a bit curious how you came to this patch. Was the issue found by code inspection, or did you actually have a real life use case to raise the core.bigFileThreshold configuration to a value above 4GB? Thanks.
Re: [PATCH] Fix delta integer overflows
Junio C Hamano writes: > This is not about where the bar is set. It is about expectation After having thought about this a bit more, I think in the message I am responding to I mischaracterised the aspect of a patch that influences the "expectation". It is much less about who the contributor is but more about what the patch does. If the patch in question were from a more experienced contributor (like you or Peff), my internal reaction would have been "gee, the submitter should have known better that a more complete fix should involve a larger integral type, not stopping at matching the largest type that happens to be used in the interface without updating the interface". But I still would have said that the patch is an improvement--as it indeed is; it does not make things worse anywhere and brings in a more consistency. And I still would have mentioned the same "in the longer term, we would want to use size_t or uintmax_t here, not just ulong". The only thing I would have done differently if the submission were by a more experienced contributor is that I probably would have added "yes this may be an improvement, but I expected you should know better to at least mention the longer term direction to use size_t or uintmax_t in the log message, even if you didn't immediately extend this patch into a more complete series". That one is a difference of expectation between an occasional contributor and an experienced one.
Re: [PATCH] Fix delta integer overflows
Johannes Schindelin writes: > So are you saying that starting with v2.14.0, you accept patches into `pu` > for which you would previously have required multiple iterations before > even considering it for `pu`? > > Frankly, I am a bit surprised that this obvious change from `unsigned > long` to `size_t` is not required in this case before queuing, but if the > rules have changed to lower the bar for patch submissions, I am all for > it. I always felt that we are wasting contributors' time a little too > freely and too deliberately. This is not about where the bar is set. It is about expectation. I do not expect much from occasional (or new) contributors and I try not to demand too much from them. The consequence is that as long as a small patch makes things better without making anything worse, I'd want to be inclusive to encourage them to build obvious improvements on top. Maybe they just want a single patch landed to fix their immediate needs (which may be generic enough and expected to be shared with many other people) without going further, so I may end up queuing something that only helps 40% of people until follow up patches are done to cover the remaining 60% of people, but that is fine as long as the patch does not make things worse (it is not like a patch that helps 40% while hurting the remaining 60% until a follow-up happens). I would expect a lot more from experienced contributors, when I know they are capable of helping the remaining 60% inside the same series and without requiring too much hand-holding from me. The same thing I cannot say to a occasional (or new) contributor---they may not be coorperative, or they may be willing to do more but may require more help polishing their work than the bandwidth I can afford. So if you are volunteering to help by either guiding Martin to aim higher and make this a series with a larger and more complete scope, I'd very much appreciate it. Or you can do a follow-up series on top of what Martin posted. Either is fine by me. Just do not step on each others' toes ;-) I avoided saying all of the above because I didn't want my word taken out of context later to make it sound as if I were belittling the competence of Martin, but you seem to want to force me to say this, so here it is.
Re: [PATCH] Fix delta integer overflows
Hi Junio, On Mon, 7 Aug 2017, Junio C Hamano wrote: > Johannes Schindelin writes: > > >> The patch obviously makes the code better and self consistent in > >> that "struct delta_index" has src_size as ulong, and this function > >> takes trg_size as ulong, and it was plain wrong for the code to > >> assume that "i", which is uint, can receive it safely. > >> > >> In the longer term we might want to move to size_t or even > >> uintmax_t, as the ulong on a platform may not be long enough in > >> order to express the largest file size the platform can have, but > >> this patch (1) is good even without such a change, and (2) gives a > >> good foundation to build on if we want such a change on top. > >> > >> Thanks. Will queue. > > > > This is sad. There is no "may not be long enough". We already know a > > platform where unsigned long is not long enough, don't we? Why leave this > > patch in this intermediate state? > > This is a good foundation to build on, and I never said no further > update on top of this patch is desired. Look for "(2)" in what you > quoted. So are you saying that starting with v2.14.0, you accept patches into `pu` for which you would previously have required multiple iterations before even considering it for `pu`? Frankly, I am a bit surprised that this obvious change from `unsigned long` to `size_t` is not required in this case before queuing, but if the rules have changed to lower the bar for patch submissions, I am all for it. I always felt that we are wasting contributors' time a little too freely and too deliberately. Ciao, Dscho
Re: [PATCH] Fix delta integer overflows
Johannes Schindelin writes: >> The patch obviously makes the code better and self consistent in >> that "struct delta_index" has src_size as ulong, and this function >> takes trg_size as ulong, and it was plain wrong for the code to >> assume that "i", which is uint, can receive it safely. >> >> In the longer term we might want to move to size_t or even >> uintmax_t, as the ulong on a platform may not be long enough in >> order to express the largest file size the platform can have, but >> this patch (1) is good even without such a change, and (2) gives a >> good foundation to build on if we want such a change on top. >> >> Thanks. Will queue. > > This is sad. There is no "may not be long enough". We already know a > platform where unsigned long is not long enough, don't we? Why leave this > patch in this intermediate state? This is a good foundation to build on, and I never said no further update on top of this patch is desired. Look for "(2)" in what you quoted.
Re: [PATCH] Fix delta integer overflows
Hi, On Mon, 7 Aug 2017, Junio C Hamano wrote: > Martin Koegler writes: > > > From: Martin Koegler > > > > The current delta code produces incorrect pack objects for files > 4GB. > > > > Signed-off-by: Martin Koegler > > --- > > diff-delta.c | 23 --- > > 1 file changed, 12 insertions(+), 11 deletions(-) > > > > Just pass any file > 4 GB to the delta-compression [by increasing the delta > > limits]. > > As file size, a truncated 32bit value will be encoded, leading to broken > > pack files. > > The patch obviously makes the code better and self consistent in > that "struct delta_index" has src_size as ulong, and this function > takes trg_size as ulong, and it was plain wrong for the code to > assume that "i", which is uint, can receive it safely. > > In the longer term we might want to move to size_t or even > uintmax_t, as the ulong on a platform may not be long enough in > order to express the largest file size the platform can have, but > this patch (1) is good even without such a change, and (2) gives a > good foundation to build on if we want such a change on top. > > Thanks. Will queue. This is sad. There is no "may not be long enough". We already know a platform where unsigned long is not long enough, don't we? Why leave this patch in this intermediate state? If you want to work on data in memory, then size_t is the appropriate data type. We already use it elsewhere. Let's use it here, too, without the intermediate bump from the incorrect `int` to the equally incorrect `long`. Ciao, Johannes
Re: [PATCH] Fix delta integer overflows
Martin Koegler writes: > From: Martin Koegler > > The current delta code produces incorrect pack objects for files > 4GB. > > Signed-off-by: Martin Koegler > --- > diff-delta.c | 23 --- > 1 file changed, 12 insertions(+), 11 deletions(-) > > Just pass any file > 4 GB to the delta-compression [by increasing the delta > limits]. > As file size, a truncated 32bit value will be encoded, leading to broken pack > files. The patch obviously makes the code better and self consistent in that "struct delta_index" has src_size as ulong, and this function takes trg_size as ulong, and it was plain wrong for the code to assume that "i", which is uint, can receive it safely. In the longer term we might want to move to size_t or even uintmax_t, as the ulong on a platform may not be long enough in order to express the largest file size the platform can have, but this patch (1) is good even without such a change, and (2) gives a good foundation to build on if we want such a change on top. Thanks. Will queue. > > diff --git a/diff-delta.c b/diff-delta.c > index 3797ce6..13e5a01 100644 > --- a/diff-delta.c > +++ b/diff-delta.c > @@ -319,7 +319,8 @@ create_delta(const struct delta_index *index, >const void *trg_buf, unsigned long trg_size, >unsigned long *delta_size, unsigned long max_size) > { > - unsigned int i, outpos, outsize, moff, msize, val; > + unsigned int i, val; > + unsigned long l, outpos, outsize, moff, msize; > int inscnt; > const unsigned char *ref_data, *ref_top, *data, *top; > unsigned char *out; > @@ -336,20 +337,20 @@ create_delta(const struct delta_index *index, > return NULL; > > /* store reference buffer size */ > - i = index->src_size; > - while (i >= 0x80) { > - out[outpos++] = i | 0x80; > - i >>= 7; > + l = index->src_size; > + while (l >= 0x80) { > + out[outpos++] = l | 0x80; > + l >>= 7; > } > - out[outpos++] = i; > + out[outpos++] = l; > > /* store target buffer size */ > - i = trg_size; > - while (i >= 0x80) { > - out[outpos++] = i | 0x80; > - i >>= 7; > + l = trg_size; > + while (l >= 0x80) { > + out[outpos++] = l | 0x80; > + l >>= 7; > } > - out[outpos++] = i; > + out[outpos++] = l; > > ref_data = index->src_buf; > ref_top = ref_data + index->src_size;