Re: [PATCH V2 1/2] Fix delta integer overflows
Jeff Kingwrites: > On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote: > >> Perhaps we should teach the receiving end to notice that the varint >> data it reads encodes a size that is too large for it to grok and >> die. With that, we can safely move forward with whatever size_t >> each platform uses. > > Yes, this is very important even for "unsigned long". I'd worry that > malicious input could cause us to wrap to 0, and we'd potentially write > into a too-small buffer[1]. > > There's some prior art with checking this against bitsizeof() in > unpack_object_header_buffer() but get_delta_hdr_size() does not seem to > have a check. > > -Peff > > [1] In most cases it's _probably_ not a vulnerability to wrap here, > because we'd just read less data than we ought to. But it makes me > nervous nonetheless. As I said in my other message in the thread, as long as the callers of get_delta_hdr_size() are written correctly, it should be OK. And patch_delta() should be OK, even for "unsigned long" when it is too small. It just will not produce correct result and instead abort, and the patch under discussion fixes that.
Re: [PATCH V2 1/2] Fix delta integer overflows
Martin Koeglerwrites: > On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote: >> > The current delta code produces incorrect pack objects for files > 4GB. >> > >> > Signed-off-by: Martin Koegler >> >> I am a bit torn on this change. >> >> Given that this is not merely a local storage format but it also is >> an interchange format, we would probably want to make sure that the >> receiving end (e.g. get_delta_hdr_size() that is used at the >> beginning of patch_delta()) on a platform whose size_t is smaller >> than that of a platform that produced the delta stream with this >> code behaves "sensibly". > > Overflows would already be detected during unpack: > * Assuming size_t = uint32, the system should just be able to handle up to > 4GB of process memory. > So loading any source blob larger than 4GB should already fail. After re-reading patch_delta(), I agree. The loading of the base object (i.e. the procedure to prepare src_buf & src_size fed to the function) would have failed already. > * Assuming size_t = uint32 and a source blob size < 4 GB, the > target blob size would be readed truncated. apply_delta checks, > that the generated result matches the encoded size - this check > would fail. When target size is truncated, we allocate much less than the real target size, and start reading the input. But patch_delta() makes sure that what is copied into the buffer, either by copying bytes literally from the pack data or by copying from src_buf, will never cause the resulting dst_buf overflow (especially after your change updates "size" to size_t), so we should be safe on this side, too. > >> If we replaced ulong we use in create/patch delta codepaths with >> uint32_t, that would be safer, just because the encoder would not be >> able to emit varint that is larger than the receivers to handle. >> But that defeats the whole point of using varint() to encode the >> sizes in the first place. It was partly done for space saving, but >> more for allowing larger sizes and larger ulong in the future >> without having to change the file format. > > The ondisk-format is able to handle larger sizes [using a slightly worse > compression]. > The current implementation is just buggy. > > I would not move to uint32_t. The remaing part of git uses "unsigned long", > so the > delta code could still be called with larger files. Oh, absolutely. I was merely commenting on the lack of any error checking in get_delta_hdr_size() helper function, and dismissing a naive move to limiting the file format by insisting on uint32_t as an unworkable workaround.
Re: [PATCH V2 1/2] Fix delta integer overflows
On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote: > > The current delta code produces incorrect pack objects for files > 4GB. > > > > Signed-off-by: Martin Koegler> > I am a bit torn on this change. > > Given that this is not merely a local storage format but it also is > an interchange format, we would probably want to make sure that the > receiving end (e.g. get_delta_hdr_size() that is used at the > beginning of patch_delta()) on a platform whose size_t is smaller > than that of a platform that produced the delta stream with this > code behaves "sensibly". Overflows would already be detected during unpack: * Assuming size_t = uint32, the system should just be able to handle up to 4GB of process memory. So loading any source blob larger than 4GB should already fail. * Assuming size_t = uint32 and a source blob size < 4 GB, the target blob size would be readed truncated. apply_delta checks, that the generated result matches the encoded size - this check would fail. > If we replaced ulong we use in create/patch delta codepaths with > uint32_t, that would be safer, just because the encoder would not be > able to emit varint that is larger than the receivers to handle. > But that defeats the whole point of using varint() to encode the > sizes in the first place. It was partly done for space saving, but > more for allowing larger sizes and larger ulong in the future > without having to change the file format. The ondisk-format is able to handle larger sizes [using a slightly worse compression]. The current implementation is just buggy. I would not move to uint32_t. The remaing part of git uses "unsigned long", so the delta code could still be called with larger files. We will also see more RAM as well as CPU power - reducing the limits just because of older plattforms, which can't even handle such large blobs, is the wrong way. Regards, Martin
Re: [PATCH V2 1/2] Fix delta integer overflows
On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote: > Perhaps we should teach the receiving end to notice that the varint > data it reads encodes a size that is too large for it to grok and > die. With that, we can safely move forward with whatever size_t > each platform uses. Yes, this is very important even for "unsigned long". I'd worry that malicious input could cause us to wrap to 0, and we'd potentially write into a too-small buffer[1]. There's some prior art with checking this against bitsizeof() in unpack_object_header_buffer() but get_delta_hdr_size() does not seem to have a check. -Peff [1] In most cases it's _probably_ not a vulnerability to wrap here, because we'd just read less data than we ought to. But it makes me nervous nonetheless.
Re: [PATCH V2 1/2] Fix delta integer overflows
Martin Koeglerwrites: > From: Martin Koegler Just a nitpick on the patch title. As "git shortlog --no-merges" output would tell you, we try to prefix the title with a short name of the area of codebase we are touching, followed by a colon and a space and then remainder without extra capitalization. Perhaps Subject: delta: fix enconding size larger than an "uint" can hold > The current delta code produces incorrect pack objects for files > 4GB. > > Signed-off-by: Martin Koegler I am a bit torn on this change. The original is indeed bad in that the code does not guarantee that an intermediate variable like 'l' is not large enough to hold the true size we know in index->src_size, and in that sense this change is an improvement. Given that this is not merely a local storage format but it also is an interchange format, we would probably want to make sure that the receiving end (e.g. get_delta_hdr_size() that is used at the beginning of patch_delta()) on a platform whose size_t is smaller than that of a platform that produced the delta stream with this code behaves "sensibly". If we replaced ulong we use in create/patch delta codepaths with uint32_t, that would be safer, just because the encoder would not be able to emit varint that is larger than the receivers to handle. But that defeats the whole point of using varint() to encode the sizes in the first place. It was partly done for space saving, but more for allowing larger sizes and larger ulong in the future without having to change the file format. Perhaps we should teach the receiving end to notice that the varint data it reads encodes a size that is too large for it to grok and die. With that, we can safely move forward with whatever size_t each platform uses. Thanks. > --- > For next. > > diff-delta.c | 24 +--- > 1 file changed, 13 insertions(+), 11 deletions(-) > > diff --git a/diff-delta.c b/diff-delta.c > index 3797ce6..cd238c8 100644 > --- a/diff-delta.c > +++ b/diff-delta.c > @@ -319,7 +319,9 @@ create_delta(const struct delta_index *index, >const void *trg_buf, unsigned long trg_size, >unsigned long *delta_size, unsigned long max_size) > { > - unsigned int i, outpos, outsize, moff, msize, val; > + unsigned int i, val; > + off_t outpos, moff; > + size_t l, outsize, msize; > int inscnt; > const unsigned char *ref_data, *ref_top, *data, *top; > unsigned char *out; > @@ -336,20 +338,20 @@ create_delta(const struct delta_index *index, > return NULL; > > /* store reference buffer size */ > - i = index->src_size; > - while (i >= 0x80) { > - out[outpos++] = i | 0x80; > - i >>= 7; > + l = index->src_size; > + while (l >= 0x80) { > + out[outpos++] = l | 0x80; > + l >>= 7; > } > - out[outpos++] = i; > + out[outpos++] = l; > > /* store target buffer size */ > - i = trg_size; > - while (i >= 0x80) { > - out[outpos++] = i | 0x80; > - i >>= 7; > + l = trg_size; > + while (l >= 0x80) { > + out[outpos++] = l | 0x80; > + l >>= 7; > } > - out[outpos++] = i; > + out[outpos++] = l; > > ref_data = index->src_buf; > ref_top = ref_data + index->src_size;
[PATCH V2 1/2] Fix delta integer overflows
From: Martin KoeglerThe current delta code produces incorrect pack objects for files > 4GB. Signed-off-by: Martin Koegler --- For next. diff-delta.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/diff-delta.c b/diff-delta.c index 3797ce6..cd238c8 100644 --- a/diff-delta.c +++ b/diff-delta.c @@ -319,7 +319,9 @@ create_delta(const struct delta_index *index, const void *trg_buf, unsigned long trg_size, unsigned long *delta_size, unsigned long max_size) { - unsigned int i, outpos, outsize, moff, msize, val; + unsigned int i, val; + off_t outpos, moff; + size_t l, outsize, msize; int inscnt; const unsigned char *ref_data, *ref_top, *data, *top; unsigned char *out; @@ -336,20 +338,20 @@ create_delta(const struct delta_index *index, return NULL; /* store reference buffer size */ - i = index->src_size; - while (i >= 0x80) { - out[outpos++] = i | 0x80; - i >>= 7; + l = index->src_size; + while (l >= 0x80) { + out[outpos++] = l | 0x80; + l >>= 7; } - out[outpos++] = i; + out[outpos++] = l; /* store target buffer size */ - i = trg_size; - while (i >= 0x80) { - out[outpos++] = i | 0x80; - i >>= 7; + l = trg_size; + while (l >= 0x80) { + out[outpos++] = l | 0x80; + l >>= 7; } - out[outpos++] = i; + out[outpos++] = l; ref_data = index->src_buf; ref_top = ref_data + index->src_size; -- 2.1.4