Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Sat, Sep 14, 2013 at 11:22 AM, Nicolas Pitre <n...@fluxnic.net> wrote:
> The cache is currently updated by the caller.  The caller may ask for a
> copy of 2 entries from a base object, but that base object may itself
> copy those objects from somewhere else in a larger chunk.  Let's
> consider this example:
>
> tree A --
>   0 (0) copy 2 entries from tree B starting at entry 0
>   1 (2) copy 1 entry from tree B starting at entry 3
>
> tree B --
>   0 (0) copy 6 entries from tree C starting at entry 0
>   1 (6) entry foo.txt
>   2 (7) entry bar.txt
>
> Right now, the code calls decode_entries() to decode 2 entries from
> tree B but those entries are part of a copy from tree C.  When that
> call returns, the cache is updated as if tree B entry #2 would start at
> offset 1, but this is wrong because offset 0 in tree B covers 6 entries
> and therefore offset 1 is for entry #6.
>
> So this needs a rethink.

I've given it some thought and see no simple/efficient way to do it when
a copy depth of 2 or more is involved.  Ideally tree A should refer to
tree C directly for the first two entries, but in general we can't
enforce that a copy sequence must refer to non-copy sequences only.
Caching flattened tree B up until the 6th entry may help, but then
there's no need to cache offsets anymore because we could just cache
tree A..
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Sun, 15 Sep 2013, Duy Nguyen wrote:
> On Sat, Sep 14, 2013 at 11:22 AM, Nicolas Pitre <n...@fluxnic.net> wrote:
> > The cache is currently updated by the caller.  The caller may ask
> > for a copy of 2 entries from a base object, but that base object may
> > itself copy those objects from somewhere else in a larger chunk.
> > Let's consider this example:
> >
> > tree A --
> >   0 (0) copy 2 entries from tree B starting at entry 0
> >   1 (2) copy 1 entry from tree B starting at entry 3
> >
> > tree B --
> >   0 (0) copy 6 entries from tree C starting at entry 0
> >   1 (6) entry foo.txt
> >   2 (7) entry bar.txt
> >
> > Right now, the code calls decode_entries() to decode 2 entries from
> > tree B but those entries are part of a copy from tree C.  When that
> > call returns, the cache is updated as if tree B entry #2 would start
> > at offset 1, but this is wrong because offset 0 in tree B covers 6
> > entries and therefore offset 1 is for entry #6.
> >
> > So this needs a rethink.
>
> I've given it some thought and see no simple/efficient way to do it
> when a copy depth of 2 or more is involved.  Ideally tree A should
> refer to tree C directly for the first two entries, but in general we
> can't enforce that a copy sequence must refer to non-copy sequences
> only.  Caching flattened tree B up until the 6th entry may help, but
> then there's no need to cache offsets anymore because we could just
> cache tree A..

Well... for the case where tree A should refer to tree C directly, that
should be optimized by packv4-create/pack-objects.

But as far as my offset cache is concerned, I came around to rethink it.
I managed to write decent and logical code this time.  The speed-up is
significant even without any tuning.
Here it is:

commit aa43ec18956a2c835112f086a0a59d7fbc35a021
Author: Nicolas Pitre <n...@fluxnic.net>
Date:   Fri Sep 13 20:56:31 2013 -0400

    packv4-parse.c: add tree offset caching
    
    It is a common pattern to see a tree object copy a few entries from
    another tree object, possibly from a non zero offset, then provide a
    few entries of its own just to go back to the previous tree object
    to copy some more entries.  Each time this happens, the tree object
    being copied has to be parsed from the beginning over again up to
    the desired offset which is rather wasteful.
    
    Let's introduce a tree offset cache to record the entry position and
    offset when tree parsing stops so a subsequent copy from the same
    tree object can be resumed without having to start from the
    beginning all the time.
    
    Without doing further tuning on this cache, performing
    "git rev-list --all --objects | wc -l" on my copy of git.git went
    from 14.5s down to 6.6s.
    
    Signed-off-by: Nicolas Pitre <n...@fluxnic.net>

diff --git a/packv4-parse.c b/packv4-parse.c
index 38c22b0..b8af702 100644
--- a/packv4-parse.c
+++ b/packv4-parse.c
@@ -378,6 +378,40 @@ static int copy_canonical_tree_entries(struct packed_git *p, off_t offset,
 	return 0;
 }
 
+/* ordering is so that member alignment takes the least amount of space */
+struct pv4_tree_cache {
+	off_t base_offset;
+	off_t offset;
+	off_t last_copy_base;
+	struct packed_git *p;
+	unsigned int pos;
+	unsigned int nb_entries;
+};
+
+#define CACHE_SIZE 256
+static struct pv4_tree_cache pv4_tree_cache[CACHE_SIZE];
+
+static struct pv4_tree_cache *get_tree_offset_cache(struct packed_git *p, off_t base_offset)
+{
+	struct pv4_tree_cache *c;
+	unsigned long hash;
+
+	hash = (unsigned long)p + (unsigned long)base_offset;
+	hash += (hash >> 8) + (hash >> 16);
+	hash %= CACHE_SIZE;
+
+	c = &pv4_tree_cache[hash];
+	if (c->p != p || c->base_offset != base_offset) {
+		c->p = p;
+		c->base_offset = base_offset;
+		c->offset = 0;
+		c->last_copy_base = 0;
+		c->pos = 0;
+		c->nb_entries = 0;
+	}
+	return c;
+}
+
 static
 int tree_entry_prefix(unsigned char *buf, unsigned long size,
 		      const unsigned char *path, int path_len,
 		      unsigned mode)
@@ -405,39 +439,72 @@ static int tree_entry_prefix(unsigned char *buf, unsigned long size,
 }
 
 static int decode_entries(struct packed_git *p, struct pack_window **w_curs,
-			  off_t offset, unsigned int start, unsigned int count,
-			  unsigned char **dstp, unsigned long *sizep,
-			  int parse_hdr)
+			  off_t obj_offset, unsigned int start, unsigned int count,
+			  unsigned char **dstp, unsigned long *sizep)
 {
 	unsigned long avail;
-	unsigned int nb_entries;
 	const unsigned char *src, *scp;
-	off_t copy_objoffset = 0;
+	unsigned int curpos;
+	off_t offset, copy_objoffset;
+	struct pv4_tree_cache *c;
+
+	c = get_tree_offset_cache(p, obj_offset);
+	if (count && start < c->nb_entries && start >= c->pos +
Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Mon, Sep 16, 2013 at 11:42 AM, Nicolas Pitre <n...@fluxnic.net> wrote:
> On Sun, 15 Sep 2013, Duy Nguyen wrote:
> > On Sat, Sep 14, 2013 at 11:22 AM, Nicolas Pitre <n...@fluxnic.net> wrote:
> > > The cache is currently updated by the caller.  The caller may ask
> > > for a copy of 2 entries from a base object, but that base object
> > > may itself copy those objects from somewhere else in a larger
> > > chunk.  Let's consider this example:
> > >
> > > tree A --
> > >   0 (0) copy 2 entries from tree B starting at entry 0
> > >   1 (2) copy 1 entry from tree B starting at entry 3
> > >
> > > tree B --
> > >   0 (0) copy 6 entries from tree C starting at entry 0
> > >   1 (6) entry foo.txt
> > >   2 (7) entry bar.txt
> > >
> > > Right now, the code calls decode_entries() to decode 2 entries
> > > from tree B but those entries are part of a copy from tree C.
> > > When that call returns, the cache is updated as if tree B entry #2
> > > would start at offset 1, but this is wrong because offset 0 in
> > > tree B covers 6 entries and therefore offset 1 is for entry #6.
> > >
> > > So this needs a rethink.
> >
> > I've given it some thought and see no simple/efficient way to do it
> > when a copy depth of 2 or more is involved.  Ideally tree A should
> > refer to tree C directly for the first two entries, but in general
> > we can't enforce that a copy sequence must refer to non-copy
> > sequences only.  Caching flattened tree B up until the 6th entry may
> > help, but then there's no need to cache offsets anymore because we
> > could just cache tree A..
>
> Well... for the case where tree A should refer to tree C directly,
> that should be optimized by packv4-create/pack-objects.
>
> But as far as my offset cache is concerned, I came around to rethink
> it.  I managed to write decent and logical code this time.

OK, if I read your patch correctly, in the example above, after
processing tree A entry #0, the position of tree B entry #0 is cached
(instead of #6).  When tree A entry #2 is processed, we check out the
cache and parse tree B's copy sequence again, which leads to a cached
position of tree C, correct?
We still have to jump through tree B unnecessarily, but that can be
dealt with later.  Good thing that it works, and the perf improvement is
impressive too.  I think we now have the core of struct pv4_tree_desc :)
-- 
Duy
Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Thu, 12 Sep 2013, Nguyễn Thái Ngọc Duy wrote:
> The intention is to store flat v4 trees in delta base cache to avoid
> repeatedly expanding copy sequences in v4 trees.  When the user needs
> to unpack a v4 tree and the tree is found in the cache, the tree will
> be converted back to canonical format.  Future tree_desc interface may
> skip canonical format and read v4 trees directly.  For that to work we
> need to keep track of v4 tree size after all copy sequences are
> expanded, which is the purpose of this new field.

Hmmm... I think this is going in a wrong direction.  If we want a cache
for pv4 objects, it would be best to keep it separate from the current
delta cache in sha1_file.c for code maintainability reasons.  I'd
suggest creating another cache in packv4-parse.c instead.

Yet, packv4 tree walking shouldn't need a cache since there is nothing
to expand in the end.  Entries should be advanced one by one as they are
needed.  Granted, when converting back to a canonical object we need all
of them, but eventually this shouldn't be the main mode of operation.

However I can see that, as you say, the same base object is repeatedly
referenced.  This means that we need to parse it over and over again
just to find the right offset where the needed entries start.  We
probably end up skipping over the first entries in a tree object
multiple times.  And that would be true even when the core code learns
to walk pv4 trees directly.

So here's the beginning of a tree offset cache to mitigate this problem.
It is incomplete as the cache function is not implemented, etc.  But
that should give you the idea.
diff --git a/packv4-parse.c b/packv4-parse.c index ae3e6a5..39b4f31 100644 --- a/packv4-parse.c +++ b/packv4-parse.c @@ -339,9 +339,9 @@ void *pv4_get_commit(struct packed_git *p, struct pack_window **w_curs, return dst; } -static int copy_canonical_tree_entries(struct packed_git *p, off_t offset, - unsigned int start, unsigned int count, - unsigned char **dstp, unsigned long *sizep) +static off_t copy_canonical_tree_entries(struct packed_git *p, off_t offset, + unsigned int start, unsigned int count, + unsigned char **dstp, unsigned long *sizep) { void *data; const unsigned char *from, *end; @@ -369,13 +369,19 @@ static int copy_canonical_tree_entries(struct packed_git *p, off_t offset, if (end - from *sizep) { free(data); - return -1; + return 0; } memcpy(*dstp, from, end - from); *dstp += end - from; *sizep -= end - from; free(data); - return 0; + + + /* +* We won't cache this tree's current offset so let's just +* indicate success with a non-zero return value. +*/ + return 1; } static int tree_entry_prefix(unsigned char *buf, unsigned long size, @@ -404,39 +410,56 @@ static int tree_entry_prefix(unsigned char *buf, unsigned long size, return len; } -static int decode_entries(struct packed_git *p, struct pack_window **w_curs, - off_t offset, unsigned int start, unsigned int count, - unsigned char **dstp, unsigned long *sizep, int hdr) +static off_t decode_entries(struct packed_git *p, struct pack_window **w_curs, + off_t offset, unsigned int start, unsigned int count, + unsigned char **dstp, unsigned long *sizep, int hdr) { - unsigned long avail; - unsigned int nb_entries; + unsigned long avail = 0; const unsigned char *src, *scp; off_t copy_objoffset = 0; - src = use_pack(p, w_curs, offset, avail); - scp = src; - if (hdr) { + unsigned int nb_entries; + + src = use_pack(p, w_curs, offset, avail); + scp = src; + /* we need to skip over the object header */ while (*scp 128) if (++scp - src = avail - 20) - return -1; + return 0; + /* is this a canonical tree 
object? */ if ((*scp 0xf) == OBJ_TREE) return copy_canonical_tree_entries(p, offset, start, count, dstp, sizep); + /* let's still make sure this is actually a pv4 tree */ if ((*scp++ 0xf) != OBJ_PV4_TREE) - return -1; - } + return 0; + + nb_entries = decode_varint(scp); + if (scp == src || start nb_entries || + count nb_entries - start) + return 0; + + /* +* If this is a partial
Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Fri, Sep 13, 2013 at 8:27 PM, Nicolas Pitre <n...@fluxnic.net> wrote:
> On Thu, 12 Sep 2013, Nguyễn Thái Ngọc Duy wrote:
> > The intention is to store flat v4 trees in delta base cache to avoid
> > repeatedly expanding copy sequences in v4 trees.  When the user
> > needs to unpack a v4 tree and the tree is found in the cache, the
> > tree will be converted back to canonical format.  Future tree_desc
> > interface may skip canonical format and read v4 trees directly.  For
> > that to work we need to keep track of v4 tree size after all copy
> > sequences are expanded, which is the purpose of this new field.
>
> Hmmm... I think this is going in a wrong direction.

Good thing you caught me early.  I was planning to implement a better
version of this on the weekend.  And you are not wrong about code
maintainability; unpack_entry() so far looks very close to a real mess.

> Yet, packv4 tree walking shouldn't need a cache since there is nothing
> to expand in the end.  Entries should be advanced one by one as they
> are needed.  Granted, when converting back to a canonical object we
> need all of them, but eventually this shouldn't be the main mode of
> operation.

There's another case where one of the base trees is not v4 (the packer
is inefficient, like my index-pack --fix-thin).  For trees with leading
zeros in entry mode, we can just do a lossy conversion to v4, but I
wonder if there is a case where we can't even convert to v4 and the v4
treewalk interface has to fall back to canonical format..  I guess that
can't happen.

> However I can see that, as you say, the same base object is repeatedly
> referenced.  This means that we need to parse it over and over again
> just to find the right offset where the needed entries start.  We
> probably end up skipping over the first entries in a tree object
> multiple times.  And that would be true even when the core code learns
> to walk pv4 trees directly.
>
> So here's the beginning of a tree offset cache to mitigate this
> problem.  It is incomplete as the cache function is not implemented,
> etc.  But that should give you the idea.

Thanks.  I'll have a closer look and maybe complete your patch.
-- 
Duy
Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Fri, 13 Sep 2013, Duy Nguyen wrote:
> On Fri, Sep 13, 2013 at 8:27 PM, Nicolas Pitre <n...@fluxnic.net> wrote:
> > On Thu, 12 Sep 2013, Nguyễn Thái Ngọc Duy wrote:
> > > The intention is to store flat v4 trees in delta base cache to
> > > avoid repeatedly expanding copy sequences in v4 trees.  When the
> > > user needs to unpack a v4 tree and the tree is found in the cache,
> > > the tree will be converted back to canonical format.  Future
> > > tree_desc interface may skip canonical format and read v4 trees
> > > directly.  For that to work we need to keep track of v4 tree size
> > > after all copy sequences are expanded, which is the purpose of
> > > this new field.
> >
> > Hmmm... I think this is going in a wrong direction.
>
> Good thing you caught me early.  I was planning to implement a better
> version of this on the weekend.  And you are not wrong about code
> maintainability; unpack_entry() so far looks very close to a real
> mess.

Yes.  We might have to break it into sub-functions.  However this has
the potential to recurse rather deeply so it is necessary to limit its
stack footprint as well.

> > Yet, packv4 tree walking shouldn't need a cache since there is
> > nothing to expand in the end.  Entries should be advanced one by one
> > as they are needed.  Granted, when converting back to a canonical
> > object we need all of them, but eventually this shouldn't be the
> > main mode of operation.
>
> There's another case where one of the base trees is not v4 (the packer
> is inefficient, like my index-pack --fix-thin).  For trees with
> leading zeros in entry mode, we can just do a lossy conversion to v4,
> but I wonder if there is a case where we can't even convert to v4 and
> the v4 treewalk interface has to fall back to canonical format..  I
> guess that can't happen.

Well...  There is little gain in converting loose tree objects into pv4
format so there will always be a place for the canonical tree parser.
Eventually the base trees added by index-pack should be pv4 trees.  The
only case where this might not work is for zero-padded mode trees, and
we don't have to optimize for that, i.e. a fallback to the canonical
tree format will be good enough.

> > However I can see that, as you say, the same base object is
> > repeatedly referenced.  This means that we need to parse it over and
> > over again just to find the right offset where the needed entries
> > start.  We probably end up skipping over the first entries in a tree
> > object multiple times.  And that would be true even when the core
> > code learns to walk pv4 trees directly.
> >
> > So here's the beginning of a tree offset cache to mitigate this
> > problem.  It is incomplete as the cache function is not implemented,
> > etc.  But that should give you the idea.
>
> Thanks.  I'll have a closer look and maybe complete your patch.

I've committed an almost final patch and pushed it out.  It is disabled
right now due to some bug I've not found yet.  But you can have a look.


Nicolas
Re: [PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
On Fri, 13 Sep 2013, Nicolas Pitre wrote:
> On Fri, 13 Sep 2013, Duy Nguyen wrote:
> > On Fri, Sep 13, 2013 at 8:27 PM, Nicolas Pitre <n...@fluxnic.net> wrote:
> > > However I can see that, as you say, the same base object is
> > > repeatedly referenced.  This means that we need to parse it over
> > > and over again just to find the right offset where the needed
> > > entries start.  We probably end up skipping over the first entries
> > > in a tree object multiple times.  And that would be true even when
> > > the core code learns to walk pv4 trees directly.
> > >
> > > So here's the beginning of a tree offset cache to mitigate this
> > > problem.  It is incomplete as the cache function is not
> > > implemented, etc.  But that should give you the idea.
> >
> > Thanks.  I'll have a closer look and maybe complete your patch.
>
> I've committed an almost final patch and pushed it out.  It is
> disabled right now due to some bug I've not found yet.  But you can
> have a look.

Found the bug.

The cache is currently updated by the caller.  The caller may ask for a
copy of 2 entries from a base object, but that base object may itself
copy those objects from somewhere else in a larger chunk.  Let's
consider this example:

tree A --
  0 (0) copy 2 entries from tree B starting at entry 0
  1 (2) copy 1 entry from tree B starting at entry 3

tree B --
  0 (0) copy 6 entries from tree C starting at entry 0
  1 (6) entry foo.txt
  2 (7) entry bar.txt

Right now, the code calls decode_entries() to decode 2 entries from
tree B but those entries are part of a copy from tree C.  When that call
returns, the cache is updated as if tree B entry #2 would start at
offset 1, but this is wrong because offset 0 in tree B covers 6 entries
and therefore offset 1 is for entry #6.

So this needs a rethink.  I won't be able to work on this for a few
days.


Nicolas
[PATCH 2/4] pack v4: add v4_size to struct delta_base_cache_entry
The intention is to store flat v4 trees in delta base cache to avoid
repeatedly expanding copy sequences in v4 trees.  When the user needs to
unpack a v4 tree and the tree is found in the cache, the tree will be
converted back to canonical format.  Future tree_desc interface may skip
canonical format and read v4 trees directly.  For that to work we need
to keep track of v4 tree size after all copy sequences are expanded,
which is the purpose of this new field.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclo...@gmail.com>
---
 sha1_file.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 038e22e..03c66bb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1934,6 +1934,7 @@ static struct delta_base_cache_entry {
 	struct packed_git *p;
 	off_t base_offset;
 	unsigned long size;
+	unsigned long v4_size;
 	enum object_type type;
 } delta_base_cache[MAX_DELTA_CACHE];
 
@@ -2015,7 +2016,8 @@ void clear_delta_base_cache(void)
 }
 
 static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
-	void *base, unsigned long base_size, enum object_type type)
+	void *base, unsigned long base_size, unsigned long v4_size,
+	enum object_type type)
 {
 	unsigned long hash = pack_entry_hash(p, base_offset);
 	struct delta_base_cache_entry *ent = delta_base_cache + hash;
@@ -2045,6 +2047,7 @@ static void add_delta_base_cache(struct packed_git *p, off_t base_offset,
 	ent->type = type;
 	ent->data = base;
 	ent->size = base_size;
+	ent->v4_size = v4_size;
 	ent->lru.next = &delta_base_cache_lru;
 	ent->lru.prev = delta_base_cache_lru.prev;
 	delta_base_cache_lru.prev->next = &ent->lru;
@@ -2208,7 +2211,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
 		data = NULL;
 
 	if (base)
-		add_delta_base_cache(p, obj_offset, base, base_size, type);
+		add_delta_base_cache(p, obj_offset, base, base_size, 0, type);
 
 	if (!base) {
 		/*
-- 
1.8.2.83.gc99314b