[PATCH] pack-bitmap: do not use gcc packed attribute
On Wed, Nov 26, 2014 at 03:09:45PM -0800, Junio C Hamano wrote: * jk/pack-bitmap (2014-08-04) 1 commit - pack-bitmap: do not use gcc packed attribute Hold, waiting for Karsten's replacement. I got tired of waiting, so here it is, I hope good enough for inclusion. -- 8 -- From: Karsten Blees bl...@dcon.de Subject: pack-bitmap: do not use gcc packed attribute The __attribute__ flag may be a noop on some compilers. That's OK as long as the code is correct without the attribute, but in this case it is not. We would typically end up with a struct that is 2 bytes too long due to struct padding, breaking both reading and writing of bitmaps. Instead of marshalling the data in a struct, let's just provide helpers for reading and writing the appropriate types. Besides being correct on all platforms, the result is more efficient and simpler to read. Signed-off-by: Karsten Blees bl...@dcon.de Signed-off-by: Jeff King p...@peff.net --- From Karsten's original, the three changes I made (aside from the commit message) were: 1. The _u32 helpers are now _be32, to make it clear that they are dealing with big-endian integers (and it matches get/put_be32; dropping the u is OK as it is implied by dealing with byte ordering). I left the _u8 variants as-is; I do not think there is precedent for a similar name for single bytes (and the u is meaningful there). Technically you can accomplish the same thing with a single call to sha1write, but I think the helper makes the calling code flow better. 2. I moved the sha1write_* helpers into csum-file.h. It's possible we will find other callers. I left the read_* variants as local to pack-bitmap.c. In theory we could use them elsewhere, but I could not find any other location that used the same mmap base + pos pattern. Some similar code uses a simple pointer which is updated, which would yield something like: uint32_t read_be32(unsigned char **data) { uint32_t ret = get_be32(*data); (*data) += sizeof(ret); return ret; } In theory we could adapt the bitmap code here to use a similar system, but it would involve a bit of surgery (we push the pos pointer forward in a lot of places, not just here, and they would all need to be converted). I don't think it's worth the trouble. The original discussion also raised the question of whether we could do a straight open/read on the bitmap file rather than mmap-ing it. The answer is yes, though it would similarly involve a lot of surgery. Moreover, it's possible that future versions of the bitmap format would benefit from being mmap'd (this one does not). So unless there is a compelling reason to switch away from mmap, I think it makes sense to keep the code as-is. 3. I dropped casts from uint8_t to int in the assignment of xor_offset, etc. These aren't doing anything (the compiler knows both types and handles the conversion fine, and we know that a uint8_t will always fit into an int on any sane platform). csum-file.h | 11 +++ pack-bitmap-write.c | 8 +++- pack-bitmap.c | 22 +++--- pack-bitmap.h | 6 -- 4 files changed, 29 insertions(+), 18 deletions(-) diff --git a/csum-file.h b/csum-file.h index bb543d5..7530927 100644 --- a/csum-file.h +++ b/csum-file.h @@ -39,4 +39,15 @@ extern void sha1flush(struct sha1file *f); extern void crc32_begin(struct sha1file *); extern uint32_t crc32_end(struct sha1file *); +static inline void sha1write_u8(struct sha1file *f, uint8_t data) +{ + sha1write(f, data, sizeof(data)); +} + +static inline void sha1write_be32(struct sha1file *f, uint32_t data) +{ + data = htonl(data); + sha1write(f, data, sizeof(data)); +} + #endif diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 8029ae3..c05d138 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -472,7 +472,6 @@ static void write_selected_commits_v1(struct sha1file *f, for (i = 0; i writer.selected_nr; ++i) { struct bitmapped_commit *stored = writer.selected[i]; - struct bitmap_disk_entry on_disk; int commit_pos = sha1_pos(stored-commit-object.sha1, index, index_nr, sha1_access); @@ -480,11 +479,10 @@ static void write_selected_commits_v1(struct sha1file *f, if (commit_pos 0) die(BUG: trying to write commit not in index); - on_disk.object_pos = htonl(commit_pos); - on_disk.xor_offset = stored-xor_offset; - on_disk.flags = stored-flags; + sha1write_be32(f, commit_pos); + sha1write_u8(f, stored-xor_offset); + sha1write_u8(f, stored-flags); - sha1write(f, on_disk, sizeof(on_disk)); dump_bitmap(f, stored-write_as);
Re: [PATCH] pack-bitmap: do not use gcc packed attribute
Am 05.08.2014 20:47, schrieb Jeff King: On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote: Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk structure directly, without copying to a uchar[6] first. Probably. My initial attempt was to keep together the read/write logic about which sizes each item is, but I think the result ended up unnecessarily verbose and harder to follow. Yeah, having read / write logic in different files is confusing, esp. when not using structs to define the file format. Here's what I came up with (just a sketch, commit message is lacky and the helper functions deserve a better place / name): I like it. Want to clean it up and submit in place of mine? Will do, but it will have to wait till next week. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-bitmap: do not use gcc packed attribute
Karsten Blees karsten.bl...@gmail.com writes: Am 05.08.2014 20:47, schrieb Jeff King: On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote: Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk structure directly, without copying to a uchar[6] first. Probably. My initial attempt was to keep together the read/write logic about which sizes each item is, but I think the result ended up unnecessarily verbose and harder to follow. Yeah, having read / write logic in different files is confusing, esp. when not using structs to define the file format. Here's what I came up with (just a sketch, commit message is lacky and the helper functions deserve a better place / name): I like it. Want to clean it up and submit in place of mine? Will do, but it will have to wait till next week. Thanks, both. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-bitmap: do not use gcc packed attribute
On Mon, Aug 4, 2014 at 9:19 PM, Karsten Blees karsten.bl...@gmail.com wrote: This raises the question why we read via mmap at all The first version of the pack bitmap format I wrote for GitHub was 50% faster to load than this one because it was designed to be mmapable. Eventually we moved to the JGit-compatible bitmap format (because I get paid a lot of money to do as I'm told -- not because of some inherent benefit of the JGit format), which needs to be read sequentially, but I never bothered to change the mmap reading code. I believe your patch makes a lot of sense -- at this point we could as well remove the mmaping altogether and read the file sequentially. Cheers, vmg -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-bitmap: do not use gcc packed attribute
On Mon, Aug 04, 2014 at 09:19:46PM +0200, Karsten Blees wrote: Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk structure directly, without copying to a uchar[6] first. Probably. My initial attempt was to keep together the read/write logic about which sizes each item is, but I think the result ended up unnecessarily verbose and harder to follow. Here's what I came up with (just a sketch, commit message is lacky and the helper functions deserve a better place / name): I like it. Want to clean it up and submit in place of mine? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] pack-bitmap: do not use gcc packed attribute
Am 02.08.2014 01:10, schrieb Jeff King: On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote: Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the binary format of .bitmap files is incompatible between GCC and other builds, correct? The on-disk format is defined by JGit; if there are differences between the builds, that's a bug (and I would not be too surprised if there is one, as bitmaps have gotten very extensive testing on 32- and 64-bit gcc, but probably not much elsewhere). We do use structs to represent disk structures in other bits of the packfile code (e.g., struct pack_idx_header), but the struct is vanilla enough that we assume every compiler gives us two tightly-packed 32-bit integers without having to bother with the packed attribute (and it seems to have worked in practice). We should probably be more careful with that bitmap code. It looks like it wouldn't be too bad to drop it. I'll see if I can come up with a patch. I confirmed that this does break horribly without the packed attribute (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it, _if_ you have jgit installed to check against. Here's a fix. Subject: pack-bitmap: do not use gcc packed attribute The __attribute__ flag may be a noop on some compilers. That's OK as long as the code is correct without the attribute, but in this case it is not. We would typically end up with a struct that is 2 bytes too long due to struct padding, breaking both reading and writing of bitmaps. We can work around this by using an array of unsigned char to represent the data, and relying on get/put_be32 to handle alignment issues as we interact with the array. Signed-off-by: Jeff King p...@peff.net --- The accessors may be overkill; each function is called only a single time in the whole codebase. But doing it this way rather than accessing entry[4] inline at least puts the magic constants all in one place. [...] Hmm, I wonder if it wouldn't be simpler to read / write the desired on-disk structure directly, without copying to a uchar[6] first. When writing, sha1write already buffers the data, so calling this with 4/1/1 bytes of payload shouldn't affect performance. Similarly for reading - we already have a function to read a bitmap and advance the 'file' position, why not have similar functions to read uint8/32 in a stream-based fashion? This raises the question why we read via mmap at all (without munmap()ing the file when done...). We copy all data into internal data structures anyway. Is an fopen/fread-based solution (with fread_u8/_u32 helpers) that much slower? Here's what I came up with (just a sketch, commit message is lacky and the helper functions deserve a better place / name): 8- Subject: [PATCH] pack-bitmap: do not use packed structs to read / write bitmap files Signed-off-by: Karsten Blees bl...@dcon.de --- pack-bitmap-write.c | 18 +- pack-bitmap.c | 21 ++--- pack-bitmap.h | 6 -- 3 files changed, 27 insertions(+), 18 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 5f1791a..01995cb 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -465,6 +465,16 @@ static const unsigned char *sha1_access(size_t pos, void *table) return index[pos]-sha1; } +static inline void sha1write_u8(struct sha1file *f, uint8_t data) +{ + sha1write(f, data, sizeof(data)); +} +static inline void sha1write_u32(struct sha1file *f, uint32_t data) +{ + data = htonl(data); + sha1write(f, data, sizeof(data)); +} + static void write_selected_commits_v1(struct sha1file *f, struct pack_idx_entry **index, uint32_t index_nr) @@ -473,7 +483,6 @@ static void write_selected_commits_v1(struct sha1file *f, for (i = 0; i writer.selected_nr; ++i) { struct bitmapped_commit *stored = writer.selected[i]; - struct bitmap_disk_entry on_disk; int commit_pos = sha1_pos(stored-commit-object.sha1, index, index_nr, sha1_access); @@ -481,11 +490,10 @@ static void write_selected_commits_v1(struct sha1file *f, if (commit_pos 0) die(BUG: trying to write commit not in index); - on_disk.object_pos = htonl(commit_pos); - on_disk.xor_offset = stored-xor_offset; - on_disk.flags = stored-flags; + sha1write_u32(f, commit_pos); + sha1write_u8(f, stored-xor_offset); + sha1write_u8(f, stored-flags); - sha1write(f, on_disk, sizeof(on_disk)); dump_bitmap(f, stored-write_as); } } diff --git a/pack-bitmap.c b/pack-bitmap.c index 91e4101..cb1b2dd 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -197,13 +197,23 @@ static struct stored_bitmap *store_bitmap(struct
[PATCH] pack-bitmap: do not use gcc packed attribute
On Fri, Aug 01, 2014 at 06:37:39PM -0400, Jeff King wrote: Btw.: Using struct-packing on 'struct bitmap_disk_entry' means that the binary format of .bitmap files is incompatible between GCC and other builds, correct? The on-disk format is defined by JGit; if there are differences between the builds, that's a bug (and I would not be too surprised if there is one, as bitmaps have gotten very extensive testing on 32- and 64-bit gcc, but probably not much elsewhere). We do use structs to represent disk structures in other bits of the packfile code (e.g., struct pack_idx_header), but the struct is vanilla enough that we assume every compiler gives us two tightly-packed 32-bit integers without having to bother with the packed attribute (and it seems to have worked in practice). We should probably be more careful with that bitmap code. It looks like it wouldn't be too bad to drop it. I'll see if I can come up with a patch. I confirmed that this does break horribly without the packed attribute (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it, _if_ you have jgit installed to check against. Here's a fix. -- 8 -- Subject: pack-bitmap: do not use gcc packed attribute The __attribute__ flag may be a noop on some compilers. That's OK as long as the code is correct without the attribute, but in this case it is not. We would typically end up with a struct that is 2 bytes too long due to struct padding, breaking both reading and writing of bitmaps. We can work around this by using an array of unsigned char to represent the data, and relying on get/put_be32 to handle alignment issues as we interact with the array. Signed-off-by: Jeff King p...@peff.net --- The accessors may be overkill; each function is called only a single time in the whole codebase. But doing it this way rather than accessing entry[4] inline at least puts the magic constants all in one place. pack-bitmap-write.c | 10 -- pack-bitmap.c | 12 ++-- pack-bitmap.h | 42 +- 3 files changed, 47 insertions(+), 17 deletions(-) diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 5f1791a..f885a7a 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -473,7 +473,7 @@ static void write_selected_commits_v1(struct sha1file *f, for (i = 0; i writer.selected_nr; ++i) { struct bitmapped_commit *stored = writer.selected[i]; - struct bitmap_disk_entry on_disk; + unsigned char on_disk[BITMAP_DISK_ENTRY_LEN]; int commit_pos = sha1_pos(stored-commit-object.sha1, index, index_nr, sha1_access); @@ -481,11 +481,9 @@ static void write_selected_commits_v1(struct sha1file *f, if (commit_pos 0) die(BUG: trying to write commit not in index); - on_disk.object_pos = htonl(commit_pos); - on_disk.xor_offset = stored-xor_offset; - on_disk.flags = stored-flags; - - sha1write(f, on_disk, sizeof(on_disk)); + bitmap_disk_entry_create(on_disk, commit_pos, +stored-xor_offset, stored-flags); + sha1write(f, on_disk, sizeof(on_disk)); dump_bitmap(f, stored-write_as); } } diff --git a/pack-bitmap.c b/pack-bitmap.c index 91e4101..1b2a473 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -203,7 +203,7 @@ static int load_bitmap_entries_v1(struct bitmap_index *index) uint32_t i; struct stored_bitmap **recent_bitmaps; - struct bitmap_disk_entry *entry; + unsigned char *entry; recent_bitmaps = xcalloc(MAX_XOR_OFFSET, sizeof(struct stored_bitmap)); @@ -214,14 +214,14 @@ static int load_bitmap_entries_v1(struct bitmap_index *index) uint32_t commit_idx_pos; const unsigned char *sha1; - entry = (struct bitmap_disk_entry *)(index-map + index-map_pos); - index-map_pos += sizeof(struct bitmap_disk_entry); + entry = index-map + index-map_pos; + index-map_pos += BITMAP_DISK_ENTRY_LEN; - commit_idx_pos = ntohl(entry-object_pos); + commit_idx_pos = bitmap_disk_entry_object_pos(entry); sha1 = nth_packed_object_sha1(index-pack, commit_idx_pos); - xor_offset = (int)entry-xor_offset; - flags = (int)entry-flags; + xor_offset = (int)bitmap_disk_entry_xor_offset(entry); + flags = (int)bitmap_disk_entry_flags(entry); bitmap = read_bitmap_1(index); if (!bitmap) diff --git a/pack-bitmap.h b/pack-bitmap.h index 8b7f4e9..0d57706 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -5,11 +5,43 @@ #include khash.h #include pack-objects.h -struct bitmap_disk_entry { - uint32_t object_pos; - uint8_t xor_offset; - uint8_t
Re: [PATCH] pack-bitmap: do not use gcc packed attribute
On Fri, Aug 01, 2014 at 07:10:44PM -0400, Jeff King wrote: I confirmed that this does break horribly without the packed attribute (as you'd expect; it's asking for 48-bit alignment!). p5310 notices it, _if_ you have jgit installed to check against. Er, that should be t5310, of course. p5310 is the perf test, which does not notice the problem. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html