[PATCH] pack-bitmap: do not use gcc packed attribute

2014-11-26 Thread Jeff King
On Wed, Nov 26, 2014 at 03:09:45PM -0800, Junio C Hamano wrote:

 * jk/pack-bitmap (2014-08-04) 1 commit
  - pack-bitmap: do not use gcc packed attribute
 
  Hold, waiting for Karsten's replacement.

I got tired of waiting, so here it is, I hope good enough for inclusion.

-- 8 --
From: Karsten Blees bl...@dcon.de
Subject: pack-bitmap: do not use gcc packed attribute

The __attribute__ flag may be a noop on some compilers.
That's OK as long as the code is correct without the
attribute, but in this case it is not. We would typically
end up with a struct that is 2 bytes too long due to struct
padding, breaking both reading and writing of bitmaps.

Instead of marshalling the data in a struct, let's just
provide helpers for reading and writing the appropriate
types. Besides being correct on all platforms, the result is
more efficient and simpler to read.

Signed-off-by: Karsten Blees bl...@dcon.de
Signed-off-by: Jeff King p...@peff.net
---
From Karsten's original, the three changes I made (aside from the commit
message) were:

  1. The _u32 helpers are now _be32, to make it clear that they are
 dealing with big-endian integers (and it matches get/put_be32;
 dropping the u is OK as it is implied by dealing with byte
 ordering). I left the _u8 variants as-is; I do not think there is
 precedent for a similar name for single bytes (and the u is
 meaningful there). Technically you can accomplish the same thing
 with a single call to sha1write, but I think the helper makes the
 calling code flow better.

  2. I moved the sha1write_* helpers into csum-file.h. It's possible
 we will find other callers. I left the read_* variants as local
 to pack-bitmap.c. In theory we could use them elsewhere, but I
 could not find any other location that used the same mmap base +
 pos pattern. Some similar code uses a simple pointer which is
 updated, which would yield something like:

   uint32_t read_be32(unsigned char **data)
   {
   uint32_t ret = get_be32(*data);
   (*data) += sizeof(ret);
   return ret;
   }

 In theory we could adapt the bitmap code here to use a similar
 system, but it would involve a bit of surgery (we push the pos
 pointer forward in a lot of places, not just here, and they would
 all need to be converted). I don't think it's worth the trouble.

 The original discussion also raised the question of whether we
 could do a straight open/read on the bitmap file rather than
 mmap-ing it. The answer is yes, though it would similarly involve a
 lot of surgery. Moreover, it's possible that future versions of the
 bitmap format would benefit from being mmap'd (this one does not).
 So unless there is a compelling reason to switch away from mmap,
 I think it makes sense to keep the code as-is.

  3. I dropped casts from uint8_t to int in the assignment of
 xor_offset, etc.  These aren't doing anything (the compiler knows
 both types and handles the conversion fine, and we know that a
 uint8_t will always fit into an int on any sane platform).

 csum-file.h | 11 +++
 pack-bitmap-write.c |  8 +++-
 pack-bitmap.c   | 22 +++---
 pack-bitmap.h   |  6 --
 4 files changed, 29 insertions(+), 18 deletions(-)

diff --git a/csum-file.h b/csum-file.h
index bb543d5..7530927 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -39,4 +39,15 @@ extern void sha1flush(struct sha1file *f);
 extern void crc32_begin(struct sha1file *);
 extern uint32_t crc32_end(struct sha1file *);
 
+static inline void sha1write_u8(struct sha1file *f, uint8_t data)
+{
+   sha1write(f, data, sizeof(data));
+}
+
+static inline void sha1write_be32(struct sha1file *f, uint32_t data)
+{
+   data = htonl(data);
+   sha1write(f, data, sizeof(data));
+}
+
 #endif
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 8029ae3..c05d138 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -472,7 +472,6 @@ static void write_selected_commits_v1(struct sha1file *f,
 
for (i = 0; i  writer.selected_nr; ++i) {
struct bitmapped_commit *stored = writer.selected[i];
-   struct bitmap_disk_entry on_disk;
 
int commit_pos =
sha1_pos(stored-commit-object.sha1, index, index_nr, 
sha1_access);
@@ -480,11 +479,10 @@ static void write_selected_commits_v1(struct sha1file *f,
if (commit_pos  0)
die(BUG: trying to write commit not in index);
 
-   on_disk.object_pos = htonl(commit_pos);
-   on_disk.xor_offset = stored-xor_offset;
-   on_disk.flags = stored-flags;
+   sha1write_be32(f, commit_pos);
+   sha1write_u8(f, stored-xor_offset);
+   sha1write_u8(f, stored-flags);
 
-   sha1write(f, on_disk, sizeof(on_disk));
dump_bitmap(f, stored-write_as);
 

Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-06 Thread Karsten Blees
Am 05.08.2014 20:47, schrieb Jeff King:
 On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:
 
 Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
 structure directly, without copying to a uchar[6] first.
 
 Probably. My initial attempt was to keep together the read/write logic
 about which sizes each item is, but I think the result ended up
 unnecessarily verbose and harder to follow.
 

Yeah, having read / write logic in different files is confusing, esp. when
not using structs to define the file format.

 Here's what I came up with (just a sketch, commit message is lacky and the
 helper functions deserve a better place / name):
 
 I like it. Want to clean it up and submit in place of mine?
 

Will do, but it will have to wait till next week.

 -Peff
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-06 Thread Junio C Hamano
Karsten Blees karsten.bl...@gmail.com writes:

 Am 05.08.2014 20:47, schrieb Jeff King:
 On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:
 
 Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
 structure directly, without copying to a uchar[6] first.
 
 Probably. My initial attempt was to keep together the read/write logic
 about which sizes each item is, but I think the result ended up
 unnecessarily verbose and harder to follow.
 

 Yeah, having read / write logic in different files is confusing, esp. when
 not using structs to define the file format.

 Here's what I came up with (just a sketch, commit message is lacky and the
 helper functions deserve a better place / name):
 
 I like it. Want to clean it up and submit in place of mine?
 

 Will do, but it will have to wait till next week.

Thanks, both.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-05 Thread Vicent Martí
On Mon, Aug 4, 2014 at 9:19 PM, Karsten Blees karsten.bl...@gmail.com wrote:
 This raises the question why we read via mmap at all

The first version of the pack bitmap format I wrote for GitHub was 50%
faster to load than this one because it was designed to be mmapable.
Eventually we moved to the JGit-compatible bitmap format (because I
get paid a lot of money to do as I'm told -- not because of some
inherent benefit of the JGit format), which needs to be read
sequentially, but I never bothered to change the mmap reading code.

I believe your patch makes a lot of sense -- at this point we could as
well remove the mmaping altogether and read the file sequentially.

Cheers,
vmg
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-05 Thread Jeff King
On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote:

 Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
 structure directly, without copying to a uchar[6] first.

Probably. My initial attempt was to keep together the read/write logic
about which sizes each item is, but I think the result ended up
unnecessarily verbose and harder to follow.

 Here's what I came up with (just a sketch, commit message is lacky and the
 helper functions deserve a better place / name):

I like it. Want to clean it up and submit in place of mine?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-04 Thread Karsten Blees
Am 02.08.2014 01:10, schrieb Jeff King:
 On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote:
 
 Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
 binary format of .bitmap files is incompatible between GCC and other
 builds, correct?

 The on-disk format is defined by JGit; if there are differences between
 the builds, that's a bug (and I would not be too surprised if there is
 one, as bitmaps have gotten very extensive testing on 32- and 64-bit
 gcc, but probably not much elsewhere).

 We do use structs to represent disk structures in other bits of the
 packfile code (e.g., struct pack_idx_header), but the struct is vanilla
 enough that we assume every compiler gives us two tightly-packed 32-bit
 integers without having to bother with the packed attribute (and it
 seems to have worked in practice).

 We should probably be more careful with that bitmap code. It looks like
 it wouldn't be too bad to drop it. I'll see if I can come up with a
 patch.
 
 I confirmed that this does break horribly without the packed attribute
 (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
 _if_ you have jgit installed to check against.
 
 Here's a fix.
 
 Subject: pack-bitmap: do not use gcc packed attribute
 
 The __attribute__ flag may be a noop on some compilers.
 That's OK as long as the code is correct without the
 attribute, but in this case it is not. We would typically
 end up with a struct that is 2 bytes too long due to struct
 padding, breaking both reading and writing of bitmaps.
 
 We can work around this by using an array of unsigned char
 to represent the data, and relying on get/put_be32 to handle
 alignment issues as we interact with the array.
 
 Signed-off-by: Jeff King p...@peff.net
 ---
 The accessors may be overkill; each function is called only a single
 time in the whole codebase. But doing it this way rather than accessing
 entry[4] inline at least puts the magic constants all in one place.
 
[...]

Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk
structure directly, without copying to a uchar[6] first.

When writing, sha1write already buffers the data, so calling this with 4/1/1
bytes of payload shouldn't affect performance.

Similarly for reading - we already have a function to read a bitmap and
advance the 'file' position, why not have similar functions to read uint8/32
in a stream-based fashion?

This raises the question why we read via mmap at all (without munmap()ing the
file when done...). We copy all data into internal data structures anyway. Is
an fopen/fread-based solution (with fread_u8/_u32 helpers) that much slower?


Here's what I came up with (just a sketch, commit message is lacky and the
helper functions deserve a better place / name):

8-
Subject: [PATCH] pack-bitmap: do not use packed structs to read / write bitmap
 files

Signed-off-by: Karsten Blees bl...@dcon.de
---
 pack-bitmap-write.c | 18 +-
 pack-bitmap.c   | 21 ++---
 pack-bitmap.h   |  6 --
 3 files changed, 27 insertions(+), 18 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5f1791a..01995cb 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -465,6 +465,16 @@ static const unsigned char *sha1_access(size_t pos, void 
*table)
return index[pos]-sha1;
 }
 
+static inline void sha1write_u8(struct sha1file *f, uint8_t data)
+{
+   sha1write(f, data, sizeof(data));
+}
+static inline void sha1write_u32(struct sha1file *f, uint32_t data)
+{
+   data = htonl(data);
+   sha1write(f, data, sizeof(data));
+}
+
 static void write_selected_commits_v1(struct sha1file *f,
  struct pack_idx_entry **index,
  uint32_t index_nr)
@@ -473,7 +483,6 @@ static void write_selected_commits_v1(struct sha1file *f,
 
for (i = 0; i  writer.selected_nr; ++i) {
struct bitmapped_commit *stored = writer.selected[i];
-   struct bitmap_disk_entry on_disk;
 
int commit_pos =
sha1_pos(stored-commit-object.sha1, index, index_nr, 
sha1_access);
@@ -481,11 +490,10 @@ static void write_selected_commits_v1(struct sha1file *f,
if (commit_pos  0)
die(BUG: trying to write commit not in index);
 
-   on_disk.object_pos = htonl(commit_pos);
-   on_disk.xor_offset = stored-xor_offset;
-   on_disk.flags = stored-flags;
+   sha1write_u32(f, commit_pos);
+   sha1write_u8(f, stored-xor_offset);
+   sha1write_u8(f, stored-flags);
 
-   sha1write(f, on_disk, sizeof(on_disk));
dump_bitmap(f, stored-write_as);
}
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 91e4101..cb1b2dd 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -197,13 +197,23 @@ static struct stored_bitmap *store_bitmap(struct 

[PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-01 Thread Jeff King
On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote:

  Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the
  binary format of .bitmap files is incompatible between GCC and other
  builds, correct?
 
 The on-disk format is defined by JGit; if there are differences between
 the builds, that's a bug (and I would not be too surprised if there is
 one, as bitmaps have gotten very extensive testing on 32- and 64-bit
 gcc, but probably not much elsewhere).
 
 We do use structs to represent disk structures in other bits of the
 packfile code (e.g., struct pack_idx_header), but the struct is vanilla
 enough that we assume every compiler gives us two tightly-packed 32-bit
 integers without having to bother with the packed attribute (and it
 seems to have worked in practice).
 
 We should probably be more careful with that bitmap code. It looks like
 it wouldn't be too bad to drop it. I'll see if I can come up with a
 patch.

I confirmed that this does break horribly without the packed attribute
(as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
_if_ you have jgit installed to check against.

Here's a fix.

-- 8 --
Subject: pack-bitmap: do not use gcc packed attribute

The __attribute__ flag may be a noop on some compilers.
That's OK as long as the code is correct without the
attribute, but in this case it is not. We would typically
end up with a struct that is 2 bytes too long due to struct
padding, breaking both reading and writing of bitmaps.

We can work around this by using an array of unsigned char
to represent the data, and relying on get/put_be32 to handle
alignment issues as we interact with the array.

Signed-off-by: Jeff King p...@peff.net
---
The accessors may be overkill; each function is called only a single
time in the whole codebase. But doing it this way rather than accessing
entry[4] inline at least puts the magic constants all in one place.

 pack-bitmap-write.c | 10 --
 pack-bitmap.c   | 12 ++--
 pack-bitmap.h   | 42 +-
 3 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 5f1791a..f885a7a 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -473,7 +473,7 @@ static void write_selected_commits_v1(struct sha1file *f,
 
for (i = 0; i  writer.selected_nr; ++i) {
struct bitmapped_commit *stored = writer.selected[i];
-   struct bitmap_disk_entry on_disk;
+   unsigned char on_disk[BITMAP_DISK_ENTRY_LEN];
 
int commit_pos =
sha1_pos(stored-commit-object.sha1, index, index_nr, 
sha1_access);
@@ -481,11 +481,9 @@ static void write_selected_commits_v1(struct sha1file *f,
if (commit_pos  0)
die(BUG: trying to write commit not in index);
 
-   on_disk.object_pos = htonl(commit_pos);
-   on_disk.xor_offset = stored-xor_offset;
-   on_disk.flags = stored-flags;
-
-   sha1write(f, on_disk, sizeof(on_disk));
+   bitmap_disk_entry_create(on_disk, commit_pos,
+stored-xor_offset, stored-flags);
+   sha1write(f, on_disk, sizeof(on_disk));
dump_bitmap(f, stored-write_as);
}
 }
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 91e4101..1b2a473 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -203,7 +203,7 @@ static int load_bitmap_entries_v1(struct bitmap_index 
*index)
 
uint32_t i;
struct stored_bitmap **recent_bitmaps;
-   struct bitmap_disk_entry *entry;
+   unsigned char *entry;
 
recent_bitmaps = xcalloc(MAX_XOR_OFFSET, sizeof(struct stored_bitmap));
 
@@ -214,14 +214,14 @@ static int load_bitmap_entries_v1(struct bitmap_index 
*index)
uint32_t commit_idx_pos;
const unsigned char *sha1;
 
-   entry = (struct bitmap_disk_entry *)(index-map + 
index-map_pos);
-   index-map_pos += sizeof(struct bitmap_disk_entry);
+   entry = index-map + index-map_pos;
+   index-map_pos += BITMAP_DISK_ENTRY_LEN;
 
-   commit_idx_pos = ntohl(entry-object_pos);
+   commit_idx_pos = bitmap_disk_entry_object_pos(entry);
sha1 = nth_packed_object_sha1(index-pack, commit_idx_pos);
 
-   xor_offset = (int)entry-xor_offset;
-   flags = (int)entry-flags;
+   xor_offset = (int)bitmap_disk_entry_xor_offset(entry);
+   flags = (int)bitmap_disk_entry_flags(entry);
 
bitmap = read_bitmap_1(index);
if (!bitmap)
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 8b7f4e9..0d57706 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -5,11 +5,43 @@
 #include khash.h
 #include pack-objects.h
 
-struct bitmap_disk_entry {
-   uint32_t object_pos;
-   uint8_t xor_offset;
-   uint8_t 

Re: [PATCH] pack-bitmap: do not use gcc packed attribute

2014-08-01 Thread Jeff King
On Fri, Aug 01, 2014 at 07:10:44PM -0400, Jeff King wrote:

 I confirmed that this does break horribly without the packed attribute
 (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it,
 _if_ you have jgit installed to check against.

Er, that should be t5310, of course. p5310 is the perf test, which does
not notice the problem. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html