Re: [PATCH 0/9] introducing oideq()
On Tue, Aug 28, 2018 at 05:21:27PM -0400, Jeff King wrote: > On Sun, Aug 26, 2018 at 08:56:21PM +, brian m. carlson wrote: > > I would quite like to see this series picked up for v2.20. If we want > > to minimize performance regressions with the SHA-256 work, I think it's > > important. > > Thanks. One of the things I was worried about was causing unnecessary > conflicts with existing topics, including your work. But if everybody is > on board, I'd be happy to see this go in early in the next release cycle > (the longer we wait, the more annoying conflicts Junio has to resolve). I can send out work that doesn't conflict with this while it makes its way to master. There are a lot of test fixes that can go in in the mean time. I will be sending out a series that actually implements SHA-256 as an RFC soon, but I don't think there will be any conflicts (and it will just be an RFC anyway). -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
Re: [PATCH 0/9] introducing oideq()
On 8/28/2018 5:21 PM, Jeff King wrote: On Sun, Aug 26, 2018 at 08:56:21PM +, brian m. carlson wrote: Due to the simplicity of the current code and our inlining, the compiler can usually figure this out for now. So I wouldn't expect this patch to actually improve performance right away. But as that discussion shows, we are likely to take a performance hit as we move to more runtime determination of the_hash_algo parameters. Having these callers use the more strict form will potentially help us recover that. So in that sense we _could_ simply punt on this series until then (and it's certainly post-v2.19 material). But I think it's worth doing now, simply from a readability/annotation standpoint. IMHO the resulting code is more clear (though I've long since taught myself to read !foocmp() as equality). I would quite like to see this series picked up for v2.20. If we want to minimize performance regressions with the SHA-256 work, I think it's important. Thanks. One of the things I was worried about was causing unnecessary conflicts with existing topics, including your work. But if everybody is on board, I'd be happy to see this go in early in the next release cycle (the longer we wait, the more annoying conflicts Junio has to resolve). I'm happy to take this change whenever. In my opinion, right after v2.19.0 is cut would be a great time to merge it into master. This v2 is good. Reviewed-by: Derrick Stolee
Re: [PATCH 0/9] introducing oideq()
On Sun, Aug 26, 2018 at 08:56:21PM +, brian m. carlson wrote: > > Due to the simplicity of the current code and our inlining, the compiler > > can usually figure this out for now. So I wouldn't expect this patch to > > actually improve performance right away. But as that discussion shows, > > we are likely to take a performance hit as we move to more runtime > > determination of the_hash_algo parameters. Having these callers use the > > more strict form will potentially help us recover that. > > > > So in that sense we _could_ simply punt on this series until then (and > > it's certainly post-v2.19 material). But I think it's worth doing now, > > simply from a readability/annotation standpoint. IMHO the resulting code > > is more clear (though I've long since taught myself to read !foocmp() as > > equality). > > I would quite like to see this series picked up for v2.20. If we want > to minimize performance regressions with the SHA-256 work, I think it's > important. Thanks. One of the things I was worried about was causing unnecessary conflicts with existing topics, including your work. But if everybody is on board, I'd be happy to see this go in early in the next release cycle (the longer we wait, the more annoying conflicts Junio has to resolve). > Applying the following patch on top of this series causes gcc to inline > both branches, which is pretty much the best we can expect. I haven't > benchmarked it, though, so I can't say what the actual performance > consequence is. > [...] > diff --git a/cache.h b/cache.h > index 3bb88ac6d0..1c182c6ef6 100644 > --- a/cache.h > +++ b/cache.h > @@ -1033,7 +1033,10 @@ static inline int oidcmp(const struct object_id *oid1, > const struct object_id *o > > static inline int hasheq(const unsigned char *sha1, const unsigned char > *sha2) > { > - return !hashcmp(sha1, sha2); > + if (the_hash_algo->rawsz == 32) { > + return !memcmp(sha1, sha2, 32); > + } > + return !memcmp(sha1, sha2, 20); > } Yeah, I can reproduce that, too. I'm actually surprised that's enough to convince the compiler to inline, based on my earlier tests. I wonder if it is because it is more able to see the boolean nature of the memcmp() (i.e., in the earlier versions it was not willing to carry that information across the conditional boundary for some reason). Or perhaps it is that this is written slightly differently from my earlier case, where the fallback did not say "32" explicitly, but rather: return memcmp(sha1, sha2, the_hash_algo->rawsz); For your form, we'd probably want to add a third case arm with a BUG(), just in case. > As for this series itself, other than the typos people have pointed out, > it looks good to me. Thanks. Here it is again with those typos fixed, but no changes to the patches themselves. 1: 7bcf611959 = 1: 7bcf611959 coccinelle: use <...> for function exclusion 2: 6025ebe5d1 ! 2: 0e90944f5a introduce hasheq() and oideq() @@ -5,11 +5,10 @@ The main comparison functions we provide for comparing object ids are hashcmp() and oidcmp(). These are more flexible than a strict equality check, since they also -express ordering. That makes them them useful for sorting -and binary searching. However, it also makes them -potentially slower than a strict equality check. Consider -this C code, which is traditionally what our hashcmp has -looked like: +express ordering. That makes them useful for sorting and +binary searching. However, it also makes them potentially +slower than a strict equality check. Consider this C code, +which is traditionally what our hashcmp has looked like: #include int hashcmp(const unsigned char *a, const unsigned char *b) @@ -51,7 +50,7 @@ though the vast majority don't. We can solve that by introducing a hasheq() function (and -matching oideq() wrapper), which callers can use to make +matching oideq() wrapper), which callers can use to make it clear that they only care about equality. For now, the implementation will literally be "!hashcmp()", but it frees us up later to introduce code optimized specifically for the 3: 35a05a7e5c = 3: 4f1ea17e79 convert "oidcmp() == 0" to oideq() 4: b18fc3994d = 4: 6614cc71e4 convert "hashcmp() == 0" to hasheq() 5: 0fbcfcf7cc = 5: c3dd2b2c80 convert "oidcmp() != 0" to "!oideq()" 6: 32e169b886 = 6: c94914f1ef convert "hashcmp() != 0" to "!hasheq()" 7: b90051c38b = 7: c773fa9ca4 convert hashmap comparison functions to oideq() 8: b1e6ad80f0 = 8: ada5809fab read-cache: use oideq() in ce_compare functions 9: 18be86e078 ! 9: 62f24d63a4 show_dirstat: simplify same-content check @@ -2,7 +2,7 @@ show_dirstat: simplify same-content check -We two nested conditionals to store a content_changed +We use two
Re: [PATCH 0/9] introducing oideq()
On 8/26/2018 4:56 PM, brian m. carlson wrote: On Sat, Aug 25, 2018 at 04:00:31AM -0400, Jeff King wrote: This is a follow-up to the discussion in: https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/ The general idea is that the majority of callers don't care about actual plus/minus ordering from oidcmp() and hashcmp(); they just want to know if two oids are equal. The stricter equality check can be optimized better by the compiler. Due to the simplicity of the current code and our inlining, the compiler can usually figure this out for now. So I wouldn't expect this patch to actually improve performance right away. But as that discussion shows, we are likely to take a performance hit as we move to more runtime determination of the_hash_algo parameters. Having these callers use the more strict form will potentially help us recover that. So in that sense we _could_ simply punt on this series until then (and it's certainly post-v2.19 material). But I think it's worth doing now, simply from a readability/annotation standpoint. IMHO the resulting code is more clear (though I've long since taught myself to read !foocmp() as equality). I would quite like to see this series picked up for v2.20. If we want to minimize performance regressions with the SHA-256 work, I think it's important. Seconded. Applying the following patch on top of this series causes gcc to inline both branches, which is pretty much the best we can expect. I haven't benchmarked it, though, so I can't say what the actual performance consequence is. We should definitely measure the cost here, but when we have the option to move to newhash, that cost should be acceptable. Thanks, -Stolee
Re: [PATCH 0/9] introducing oideq()
On Sat, Aug 25, 2018 at 04:00:31AM -0400, Jeff King wrote: > This is a follow-up to the discussion in: > > https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/ > > The general idea is that the majority of callers don't care about actual > plus/minus ordering from oidcmp() and hashcmp(); they just want to know > if two oids are equal. The stricter equality check can be optimized > better by the compiler. > > Due to the simplicity of the current code and our inlining, the compiler > can usually figure this out for now. So I wouldn't expect this patch to > actually improve performance right away. But as that discussion shows, > we are likely to take a performance hit as we move to more runtime > determination of the_hash_algo parameters. Having these callers use the > more strict form will potentially help us recover that. > > So in that sense we _could_ simply punt on this series until then (and > it's certainly post-v2.19 material). But I think it's worth doing now, > simply from a readability/annotation standpoint. IMHO the resulting code > is more clear (though I've long since taught myself to read !foocmp() as > equality). I would quite like to see this series picked up for v2.20. If we want to minimize performance regressions with the SHA-256 work, I think it's important. Applying the following patch on top of this series causes gcc to inline both branches, which is pretty much the best we can expect. I haven't benchmarked it, though, so I can't say what the actual performance consequence is. As for this series itself, other than the typos people have pointed out, it looks good to me. diff --git a/cache.h b/cache.h index 3bb88ac6d0..1c182c6ef6 100644 --- a/cache.h +++ b/cache.h @@ -1033,7 +1033,10 @@ static inline int oidcmp(const struct object_id *oid1, const struct object_id *o static inline int hasheq(const unsigned char *sha1, const unsigned char *sha2) { - return !hashcmp(sha1, sha2); + if (the_hash_algo->rawsz == 32) { + return !memcmp(sha1, sha2, 32); + } + return !memcmp(sha1, sha2, 20); } static inline int oideq(const struct object_id *oid1, const struct object_id *oid2) -- brian m. carlson: Houston, Texas, US OpenPGP: https://keybase.io/bk2204 signature.asc Description: PGP signature
[PATCH 0/9] introducing oideq()
This is a follow-up to the discussion in: https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/ The general idea is that the majority of callers don't care about actual plus/minus ordering from oidcmp() and hashcmp(); they just want to know if two oids are equal. The stricter equality check can be optimized better by the compiler. Due to the simplicity of the current code and our inlining, the compiler can usually figure this out for now. So I wouldn't expect this patch to actually improve performance right away. But as that discussion shows, we are likely to take a performance hit as we move to more runtime determination of the_hash_algo parameters. Having these callers use the more strict form will potentially help us recover that. So in that sense we _could_ simply punt on this series until then (and it's certainly post-v2.19 material). But I think it's worth doing now, simply from a readability/annotation standpoint. IMHO the resulting code is more clear (though I've long since taught myself to read !foocmp() as equality). If we do punt, patch 1 could still be picked up now, as it's a related cleanup that stands on its own. The diffstat is scary, but the vast majority of that is purely mechanical via coccinelle. There's some discussion in the individual patches. We also don't _have_ to convert all sites now. I strongly suspect that only a few calls are actual measurable bottlenecks (the one in lookup_object() is the one I was primarily interested in). But it's just as easy to do it all at once with coccinelle, rather than try to measure each one (and once we add the coccinelle patches, we have to convert everything, or "make coccicheck" will nag people to do so). I didn't bother to hold back changes for any topics in flight. There are a handful of conflicts with "pu", but they're mostly trivial. The only big one is due to some code movement in ds/reachable. But though the conflict is big, the resolution is still easy (you can even just take the other side and "make coccicheck" to do it automatically). [1/9]: coccinelle: use <...> for function exclusion [2/9]: introduce hasheq() and oideq() [3/9]: convert "oidcmp() == 0" to oideq() [4/9]: convert "hashcmp() == 0" to hasheq() [5/9]: convert "oidcmp() != 0" to "!oideq()" [6/9]: convert "hashcmp() != 0" to "!hasheq()" [7/9]: convert hashmap comparison functions to oideq() [8/9]: read-cache: use oideq() in ce_compare functions [9/9]: show_dirstat: simplify same-content check bisect.c | 6 ++-- blame.c| 8 ++--- builtin/am.c | 2 +- builtin/checkout.c | 2 +- builtin/describe.c | 10 +++--- builtin/diff.c | 2 +- builtin/difftool.c | 4 +-- builtin/fast-export.c | 2 +- builtin/fetch.c| 6 ++-- builtin/fmt-merge-msg.c| 6 ++-- builtin/index-pack.c | 8 ++--- builtin/log.c | 6 ++-- builtin/merge-tree.c | 2 +- builtin/merge.c| 6 ++-- builtin/pack-objects.c | 4 +-- builtin/pull.c | 4 +-- builtin/receive-pack.c | 4 +-- builtin/remote.c | 2 +- builtin/replace.c | 6 ++-- builtin/rm.c | 2 +- builtin/show-branch.c | 6 ++-- builtin/tag.c | 2 +- builtin/unpack-objects.c | 4 +-- builtin/update-index.c | 4 +-- bulk-checkin.c | 2 +- bundle.c | 2 +- cache-tree.c | 2 +- cache.h| 22 + combine-diff.c | 4 +-- commit-graph.c | 10 +++--- commit.c | 2 +- connect.c | 2 +- contrib/coccinelle/commit.cocci| 4 +-- contrib/coccinelle/object_id.cocci | 50 -- diff-lib.c | 4 +-- diff.c | 23 ++ diffcore-break.c | 2 +- diffcore-rename.c | 2 +- dir.c | 6 ++-- fast-import.c | 10 +++--- fetch-pack.c | 2 +- http-push.c| 2 +- http-walker.c | 4 +-- http.c | 2 +- log-tree.c | 6 ++-- match-trees.c | 2 +- merge-recursive.c | 4 +-- notes-merge.c | 24 +++--- notes.c| 8 ++--- object.c | 2 +- oidmap.c | 12 +++ pack-check.c | 6 ++-- pack-objects.c | 2 +- pack-write.c