Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Jeff King
On Thu, Jan 30, 2014 at 02:55:41PM -0500, Brian Gernhardt wrote:

 a201c20 (ewah: support platforms that require aligned reads) added a
 reliance on the existence of __BYTE_ORDER and __BIG_ENDIAN.  However,
 these macros are spelled without the leading __ on some platforms (OS
 X at least).  In this case, the endian-swapping code was added even
 when unnecessary, which caused assertion failures in
 t5310-pack-bitmaps.sh as the code that used the bitmap would read past
 the end.
 
 We already had code to handle this case in compat/bswap.h, but it was
 only used if we couldn't already find a reasonable version of bswap64.
 Move the macro-defining and checking code out of a conditional so that
 either __BYTE_ORDER is defined or we get a compilation error instead
 of a runtime error in the bitmap code.

Thanks, this makes sense, and matches the assumption that a201c20 made.

I do find the failure mode interesting. The endian-swapping code kicked
in when it did not, meaning your are on a big-endian system. Is this on
an ancient PPC Mac? Or is the problem that the code did not kick in when
it should?

Either way, we should perhaps be more careful in the bitmap code, too,
that the values we get are sensible. It's better to die(your bitmap is
broken) than to read off the end of the array. I can't seem to trigger
the same failure mode, though. On my x86 system, turning off the
endian-swap (i.e., the opposite of what should happen) makes t5310 fail,
but it is because we end up trying to set the bit very far into a
dynamic bitfield, and die allocating memory.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Jeff King
On Thu, Jan 30, 2014 at 03:45:38PM -0500, Jeff King wrote:

 Either way, we should perhaps be more careful in the bitmap code, too,
 that the values we get are sensible. It's better to die(your bitmap is
 broken) than to read off the end of the array. I can't seem to trigger
 the same failure mode, though. On my x86 system, turning off the
 endian-swap (i.e., the opposite of what should happen) makes t5310 fail,
 but it is because we end up trying to set the bit very far into a
 dynamic bitfield, and die allocating memory.

I think we could do this with something like the patch below, which
checks two things:

  1. When we expand the ewah, it has the same number of bits we claimed
 in the on-disk header.

  2. The ewah header matches the number of objects in the packfile.

The first catches a corruption in the ewah data itself, and the latter
when the header is corrupted. You can test either by breaking the
endian-swapping. :)

Vicent, can you confirm my assumptions about the round-to-nearest-64 in
the patch below? I assume that the bit_size on-disk may be rounded in
some cases (and it is -- if you take out the rounding, this breaks
things). Is that sane? Or should the on-disk ewah bit_size header always
match the number of objects in the pack, and our writer is just being
sloppy?

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index 9ced2da..a8f77cf 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -343,6 +343,18 @@ int ewah_iterator_next(eword_t *next, struct ewah_iterator 
*it)
if (it-pointer = it-buffer_size)
return 0;
 
+   /*
+* If we return more bits than the ewah advertised, then either
+* our data bits or the bit_size field was corrupted, and we
+* risk a caller overwriting their own buffer (if they used
+* bit_size to size their buffer in the first place).
+*
+* We don't have a good way of returning an error here, so let's
+* just die.
+*/
+   if (!it-words_remaining--)
+   die(ewah bitmap contains more bits than it claims);
+
if (it-compressed  it-rl) {
it-compressed++;
*next = it-b ? (eword_t)(~0) : 0;
@@ -371,6 +383,8 @@ void ewah_iterator_init(struct ewah_iterator *it, struct 
ewah_bitmap *parent)
it-buffer_size = parent-buffer_size;
it-pointer = 0;
 
+   it-words_remaining = (parent-bit_size + 63) / 64;
+
it-lw = 0;
it-rl = 0;
it-compressed = 0;
diff --git a/ewah/ewok.h b/ewah/ewok.h
index 43adeb5..a3f49de 100644
--- a/ewah/ewok.h
+++ b/ewah/ewok.h
@@ -144,6 +144,7 @@ struct ewah_iterator {
size_t buffer_size;
 
size_t pointer;
+   size_t words_remaining;
eword_t compressed, literals;
eword_t rl, lw;
int b;
diff --git a/pack-bitmap.c b/pack-bitmap.c
index ae0b57b..a31e529 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -118,6 +118,7 @@ static struct ewah_bitmap *lookup_stored_bitmap(struct 
stored_bitmap *st)
  */
 static struct ewah_bitmap *read_bitmap_1(struct bitmap_index *index)
 {
+   size_t expected_bits;
struct ewah_bitmap *b = ewah_pool_new();
 
int bitmap_size = ewah_read_mmap(b,
@@ -130,6 +131,31 @@ static struct ewah_bitmap *read_bitmap_1(struct 
bitmap_index *index)
return NULL;
}
 
+   /*
+* It's OK for us to have too fewer bits than objects, as the EWAH
+* writer may have simply left off an ending that is all-zeroes.
+*
+* However it's not OK for us to have too many bits, as that would
+* entail touching objects that we don't have. We are careful
+* enough to avoid doing so in later code, but in the case of
+* nonsensical values, we would want to avoid even allocating
+* memory to hold the expanded bitmap.
+*
+* There is one exception: we may go over to round up to the next
+* 64-bit ewah word, since the storage comes in chunks of that size.
+*/
+   expected_bits = index-pack-num_objects;
+   if (expected_bits  63) {
+   expected_bits = ~63;
+   expected_bits += 64;
+   }
+   if (b-bit_size  expected_bits) {
+   error(unexpected number of bits in bitmap: %PRIuMAX  
%PRIuMAX,
+ (uintmax_t)b-bit_size, (uintmax_t)expected_bits);
+   ewah_pool_free(b);
+   return NULL;
+   }
+
index-map_pos += bitmap_size;
return b;
 }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Jonathan Nieder
Hi,

Brian Gernhardt wrote:

 a201c20 (ewah: support platforms that require aligned reads) added a
 reliance on the existence of __BYTE_ORDER and __BIG_ENDIAN.  However,
 these macros are spelled without the leading __ on some platforms (OS
 X at least).  In this case, the endian-swapping code was added even
 when unnecessary, which caused assertion failures in
 t5310-pack-bitmaps.sh as the code that used the bitmap would read past
 the end.

 We already had code to handle this case in compat/bswap.h, but it was
 only used if we couldn't already find a reasonable version of bswap64.

Makes sense.  Sorry I missed this.

In an ideal world I would prefer to just rely on ntohll when it's
decent (meaning that the '#if __BYTE_ORDER != __BIG_ENDIAN' block
could be written as

if (ntohll(1) != 1) {
...
}

or

if (ntohll(1) == 1)
; /* Big endian.  Nothing to do.
else {
...
}

).  But compat/bswap.h already relies on knowing the endianness at
preprocessing time so that wouldn't buy anything.

Another in an ideal world option: make the loop unconditional after
checking that optimizers on big-endian systems realize it's a noop.
In any event, in the real world your patch looks like the right thing
to do.

Reviewed-by: Jonathan Nieder jrnie...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Jonathan Nieder
Hi,

Jeff King wrote:

 I do find the failure mode interesting. The endian-swapping code kicked
 in when it did not

Odd --- wouldn't the #if condition expand to '0 != 0'?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Jeff King
On Thu, Jan 30, 2014 at 02:02:33PM -0800, Jonathan Nieder wrote:

 In an ideal world I would prefer to just rely on ntohll when it's
 decent (meaning that the '#if __BYTE_ORDER != __BIG_ENDIAN' block
 could be written as
 
   if (ntohll(1) != 1) {
   ...
   }
 
 or
 
   if (ntohll(1) == 1)
   ; /* Big endian.  Nothing to do.
   else {
   ...
   }
 
 ).  But compat/bswap.h already relies on knowing the endianness at
 preprocessing time so that wouldn't buy anything.

Yes, though it would simplify things because we are depending on ntohll
being defined, rather than some obscure macros.

 Another in an ideal world option: make the loop unconditional after
 checking that optimizers on big-endian systems realize it's a noop.
 In any event, in the real world your patch looks like the right thing
 to do.

I had the same thought when reading the original patch. The loop after
pre-processing on a big-endian system should look like:

  {
  size_t i;
  for (i = 0; i  self-buffer_size; ++i)
  self-buffer[i] = self-buffer[i];
  }

It really seems like the sort of thing that any halfway decent compiler
should be able to turn into a noop. I'm OK to go that route, and if you
don't have a halfway decent compiler, tough cookies; git will waste your
precious nanoseconds doing a relatively small loop. If this loop
actually mattered, we would probably do better still to leave it in disk
order, and fix it up as-needed only when we look at a particular bitmap
(we do not typically need to look at all of them on disk).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Jeff King
On Thu, Jan 30, 2014 at 02:12:05PM -0800, Jonathan Nieder wrote:

 Jeff King wrote:
 
  I do find the failure mode interesting. The endian-swapping code kicked
  in when it did not
 
 Odd --- wouldn't the #if condition expand to '0 != 0'?

I had the same thought. The kicked in when it did not is Brian's claim
from the original, which is why I was confused. He replied to me
off-list, and I think he may simply have had it backwards. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Brian Gernhardt
[Re-send to include the list. Meant to hit reply all, not just reply.]

 On Jan 30, 2014, at 3:45 PM, Jeff King p...@peff.net wrote:
 
 I do find the failure mode interesting. The endian-swapping code kicked
 in when it did not, meaning your are on a big-endian system. Is this on
 an ancient PPC Mac? Or is the problem that the code did not kick in when
 it should?

Erm.  I was perhaps writing my analysis too quickly.  I was running on a x86_64 
Mac, so it wasn't included when it was supposed to be.  Or whichever you said 
that I didn't.  ;-)

 Either way, we should perhaps be more careful in the bitmap code, too,
 that the values we get are sensible. It's better to die(your bitmap is
 broken) than to read off the end of the array. I can't seem to trigger
 the same failure mode, though. On my x86 system, turning off the
 endian-swap (i.e., the opposite of what should happen) makes t5310 fail,
 but it is because we end up trying to set the bit very far into a
 dynamic bitfield, and die allocating memory.

To be more specific, I hit an assertion failure at in ewah_iterator_next() 
(ewah/ewah_bitmap.c:355) when running `git rev-list --test-bitmap HEAD` (and 
others if I don't have it die immediately).  That seems to me that there is a 
check to ensure it doesn't run off the end.  Perhaps you have assertions 
disabled so hit an error somewhere else?

~~ Brian Gernhardt

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Ensure __BYTE_ORDER is always set

2014-01-30 Thread Eric Sunshine
On Thu, Jan 30, 2014 at 4:50 PM, Jeff King p...@peff.net wrote:
 I think we could do this with something like the patch below, which
 checks two things:

   1. When we expand the ewah, it has the same number of bits we claimed
  in the on-disk header.

   2. The ewah header matches the number of objects in the packfile.

 The first catches a corruption in the ewah data itself, and the latter
 when the header is corrupted. You can test either by breaking the
 endian-swapping. :)

 diff --git a/pack-bitmap.c b/pack-bitmap.c
 index ae0b57b..a31e529 100644
 --- a/pack-bitmap.c
 +++ b/pack-bitmap.c
 @@ -130,6 +131,31 @@ static struct ewah_bitmap *read_bitmap_1(struct 
 bitmap_index *index)
 return NULL;
 }

 +   /*
 +* It's OK for us to have too fewer bits than objects, as the EWAH

s/fewer/few/

 +* writer may have simply left off an ending that is all-zeroes.
 +*
 +* However it's not OK for us to have too many bits, as that would
 +* entail touching objects that we don't have. We are careful
 +* enough to avoid doing so in later code, but in the case of
 +* nonsensical values, we would want to avoid even allocating
 +* memory to hold the expanded bitmap.
 +*
 +* There is one exception: we may go over to round up to the next
 +* 64-bit ewah word, since the storage comes in chunks of that size.
 +*/
 +   expected_bits = index-pack-num_objects;
 +   if (expected_bits  63) {
 +   expected_bits = ~63;
 +   expected_bits += 64;
 +   }
 +   if (b-bit_size  expected_bits) {
 +   error(unexpected number of bits in bitmap: %PRIuMAX  
 %PRIuMAX,
 + (uintmax_t)b-bit_size, (uintmax_t)expected_bits);
 +   ewah_pool_free(b);
 +   return NULL;
 +   }
 +
 index-map_pos += bitmap_size;
 return b;
  }
 --
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html