Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-11 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:
>
>> Perhaps we should teach the receiving end to notice that the varint
>> data it reads encodes a size that is too large for it to grok and
>> die.  With that, we can safely move forward with whatever size_t
>> each platform uses.
>
> Yes, this is very important even for "unsigned long". I'd worry that
> malicious input could cause us to wrap to 0, and we'd potentially write
> into a too-small buffer[1].
>
> There's some prior art with checking this against bitsizeof() in
> unpack_object_header_buffer() but get_delta_hdr_size() does not seem to
> have a check.
>
> -Peff
>
> [1] In most cases it's _probably_ not a vulnerability to wrap here,
> because we'd just read less data than we ought to. But it makes me
> nervous nonetheless.

As I said in my other message in the thread, as long as the callers
of get_delta_hdr_size() are written correctly, it should be OK.  And
patch_delta() should be OK, even for "unsigned long" when it is too
small.  It just will not produce correct result and instead abort,
and the patch under discussion fixes that.




Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-11 Thread Junio C Hamano
Martin Koegler  writes:

> On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:
>> > The current delta code produces incorrect pack objects for files > 4GB.
>> >
>> > Signed-off-by: Martin Koegler 
>> 
>> I am a bit torn on this change.
>> 
>> Given that this is not merely a local storage format but it also is
>> an interchange format, we would probably want to make sure that the
>> receiving end (e.g. get_delta_hdr_size() that is used at the
>> beginning of patch_delta()) on a platform whose size_t is smaller
>> than that of a platform that produced the delta stream with this
>> code behaves "sensibly".
>
> Overflows would already be detected during unpack:
> * Assuming size_t = uint32, the system should just be able to handle up to 
> 4GB of process memory.
> So loading any source blob larger than 4GB should already fail.

After re-reading patch_delta(), I agree.  The loading of the base
object (i.e. the procedure to prepare src_buf & src_size fed to the
function) would have failed already.

> * Assuming size_t = uint32 and a source blob size < 4 GB, the
> target blob size would be readed truncated. apply_delta checks,
> that the generated result matches the encoded size - this check
> would fail.

When target size is truncated, we allocate much less than the real
target size, and start reading the input.  But patch_delta() makes
sure that what is copied into the buffer, either by copying bytes
literally from the pack data or by copying from src_buf, will never
cause the resulting dst_buf overflow (especially after your change
updates "size" to size_t), so we should be safe on this side, too.
>  
>> If we replaced ulong we use in create/patch delta codepaths with
>> uint32_t, that would be safer, just because the encoder would not be
>> able to emit varint that is larger than the receivers to handle.
>> But that defeats the whole point of using varint() to encode the
>> sizes in the first place.  It was partly done for space saving, but
>> more for allowing larger sizes and larger ulong in the future
>> without having to change the file format.
>
> The ondisk-format is able to handle larger sizes [using a slightly worse 
> compression].
> The current implementation is just buggy.
>
> I would not move to uint32_t. The remaing part of git uses "unsigned long", 
> so the 
> delta code could still be called with larger files.

Oh, absolutely.  I was merely commenting on the lack of any error
checking in get_delta_hdr_size() helper function, and dismissing a
naive move to limiting the file format by insisting on uint32_t as
an unworkable workaround.


Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-11 Thread Martin Koegler
On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:
> > The current delta code produces incorrect pack objects for files > 4GB.
> >
> > Signed-off-by: Martin Koegler 
> 
> I am a bit torn on this change.
> 
> Given that this is not merely a local storage format but it also is
> an interchange format, we would probably want to make sure that the
> receiving end (e.g. get_delta_hdr_size() that is used at the
> beginning of patch_delta()) on a platform whose size_t is smaller
> than that of a platform that produced the delta stream with this
> code behaves "sensibly".

Overflows would already be detected during unpack:
* Assuming size_t = uint32, the system should just be able to handle up to 4GB 
of process memory.
So loading any source blob larger than 4GB should already fail.
* Assuming size_t = uint32 and a source blob size < 4 GB, the target blob size 
would be readed
truncated. apply_delta checks, that the generated result matches the encoded 
size - this check would
fail.
 
> If we replaced ulong we use in create/patch delta codepaths with
> uint32_t, that would be safer, just because the encoder would not be
> able to emit varint that is larger than the receivers to handle.
> But that defeats the whole point of using varint() to encode the
> sizes in the first place.  It was partly done for space saving, but
> more for allowing larger sizes and larger ulong in the future
> without having to change the file format.

The ondisk-format is able to handle larger sizes [using a slightly worse 
compression].
The current implementation is just buggy.

I would not move to uint32_t. The remaing part of git uses "unsigned long", so 
the 
delta code could still be called with larger files.

We will also see more RAM as well as CPU power - reducing the limits just 
because of older plattforms,
which can't even handle such large blobs, is the wrong way.

Regards,
Martin


Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 01:07:07PM -0700, Junio C Hamano wrote:

> Perhaps we should teach the receiving end to notice that the varint
> data it reads encodes a size that is too large for it to grok and
> die.  With that, we can safely move forward with whatever size_t
> each platform uses.

Yes, this is very important even for "unsigned long". I'd worry that
malicious input could cause us to wrap to 0, and we'd potentially write
into a too-small buffer[1].

There's some prior art with checking this against bitsizeof() in
unpack_object_header_buffer() but get_delta_hdr_size() does not seem to
have a check.

-Peff

[1] In most cases it's _probably_ not a vulnerability to wrap here,
because we'd just read less data than we ought to. But it makes me
nervous nonetheless.


Re: [PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Junio C Hamano
Martin Koegler  writes:

> From: Martin Koegler 

Just a nitpick on the patch title.  As "git shortlog --no-merges"
output would tell you, we try to prefix the title with a short name
of the area of codebase we are touching, followed by a colon and a
space and then remainder without extra capitalization.  Perhaps

Subject: delta: fix enconding size larger than an "uint" can hold

> The current delta code produces incorrect pack objects for files > 4GB.
>
> Signed-off-by: Martin Koegler 

I am a bit torn on this change.

The original is indeed bad in that the code does not guarantee that
an intermediate variable like 'l' is not large enough to hold the
true size we know in index->src_size, and in that sense this change
is an improvement.

Given that this is not merely a local storage format but it also is
an interchange format, we would probably want to make sure that the
receiving end (e.g. get_delta_hdr_size() that is used at the
beginning of patch_delta()) on a platform whose size_t is smaller
than that of a platform that produced the delta stream with this
code behaves "sensibly".

If we replaced ulong we use in create/patch delta codepaths with
uint32_t, that would be safer, just because the encoder would not be
able to emit varint that is larger than the receivers to handle.
But that defeats the whole point of using varint() to encode the
sizes in the first place.  It was partly done for space saving, but
more for allowing larger sizes and larger ulong in the future
without having to change the file format.

Perhaps we should teach the receiving end to notice that the varint
data it reads encodes a size that is too large for it to grok and
die.  With that, we can safely move forward with whatever size_t
each platform uses.

Thanks.

> ---
> For next.
>
>  diff-delta.c | 24 +---
>  1 file changed, 13 insertions(+), 11 deletions(-)
>
> diff --git a/diff-delta.c b/diff-delta.c
> index 3797ce6..cd238c8 100644
> --- a/diff-delta.c
> +++ b/diff-delta.c
> @@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
>const void *trg_buf, unsigned long trg_size,
>unsigned long *delta_size, unsigned long max_size)
>  {
> - unsigned int i, outpos, outsize, moff, msize, val;
> + unsigned int i, val;
> + off_t outpos, moff;
> + size_t l, outsize, msize;
>   int inscnt;
>   const unsigned char *ref_data, *ref_top, *data, *top;
>   unsigned char *out;
> @@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
>   return NULL;
>  
>   /* store reference buffer size */
> - i = index->src_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = index->src_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   /* store target buffer size */
> - i = trg_size;
> - while (i >= 0x80) {
> - out[outpos++] = i | 0x80;
> - i >>= 7;
> + l = trg_size;
> + while (l >= 0x80) {
> + out[outpos++] = l | 0x80;
> + l >>= 7;
>   }
> - out[outpos++] = i;
> + out[outpos++] = l;
>  
>   ref_data = index->src_buf;
>   ref_top = ref_data + index->src_size;


[PATCH V2 1/2] Fix delta integer overflows

2017-08-10 Thread Martin Koegler
From: Martin Koegler 

The current delta code produces incorrect pack objects for files > 4GB.

Signed-off-by: Martin Koegler 
---
For next.

 diff-delta.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/diff-delta.c b/diff-delta.c
index 3797ce6..cd238c8 100644
--- a/diff-delta.c
+++ b/diff-delta.c
@@ -319,7 +319,9 @@ create_delta(const struct delta_index *index,
 const void *trg_buf, unsigned long trg_size,
 unsigned long *delta_size, unsigned long max_size)
 {
-   unsigned int i, outpos, outsize, moff, msize, val;
+   unsigned int i, val;
+   off_t outpos, moff;
+   size_t l, outsize, msize;
int inscnt;
const unsigned char *ref_data, *ref_top, *data, *top;
unsigned char *out;
@@ -336,20 +338,20 @@ create_delta(const struct delta_index *index,
return NULL;
 
/* store reference buffer size */
-   i = index->src_size;
-   while (i >= 0x80) {
-   out[outpos++] = i | 0x80;
-   i >>= 7;
+   l = index->src_size;
+   while (l >= 0x80) {
+   out[outpos++] = l | 0x80;
+   l >>= 7;
}
-   out[outpos++] = i;
+   out[outpos++] = l;
 
/* store target buffer size */
-   i = trg_size;
-   while (i >= 0x80) {
-   out[outpos++] = i | 0x80;
-   i >>= 7;
+   l = trg_size;
+   while (l >= 0x80) {
+   out[outpos++] = l | 0x80;
+   l >>= 7;
}
-   out[outpos++] = i;
+   out[outpos++] = l;
 
ref_data = index->src_buf;
ref_top = ref_data + index->src_size;
-- 
2.1.4