Re: USE_SHA1DC is broken in pu
Johannes Schindelin wrote: > On Wed, 22 Mar 2017, Jonathan Nieder wrote: > > Johannes Schindelin wrote: >>> As to the default of seriously slowing down all SHA-1 computations: >>> since you made that the default, at compile time, with no way to turn >>> on the faster computation, this will have a major, negative impact. >>> Are you really, really sure you want to do that? >>> >>> I thought that it was obvious that we would have at least a runtime >>> option to lessen the load. >> >> It's not obvious to me. I agree that the DC_SHA1 case can be sped up, >> e.g. by turning off the collision detection for sha1 calculations that >> are not part of fetching, receiving a push, or running fsck. > > And in those cases, using OpenSSL instead is *even* faster. [...] > The index is 300MB. If you have to experience a sudden drop in performance > of `git add`, even by "only" 30%, relative to OpenSSL, it is very > noticeable. It is painful. [...] > It gets even worse when it comes to fetching, let alone cloning. [...] > And by "switching collision detection off", I of course refer to *not* > using SHA1DC's routines at all, but what would have been used originally, > in Git for Windows' case: (hardware-accelerated) OpenSSL. > > Did I manage to clarify the problem? Yes. Thank you for explaining. Sincerely, Jonathan
Re: USE_SHA1DC is broken in pu
Jeff King writes: > Side note: I also have a feeling that any operation that cares about > non-object sha1 performance is probably ripe for other, bigger > optimizations. If you update 300MB worth of index entries, then the > cost of computing a checksum over it isn't a big deal. But if you have > a 300MB index file and you update one entry (or you just want to read > one entry), maybe we ought to consider solutions that don't involve > the whole 300MB in the first place. I know that's a much harder change > because it may involve new on-disk formats. But it seems like that's > the right long-term path forward. Yes ;-)
Re: USE_SHA1DC is broken in pu
On Thu, Mar 23, 2017 at 10:16:23AM -0700, Linus Torvalds wrote: > > If I write out an index, I should not suffer the slowdown from detecting > > collisions. > > The index case is a complete red herring. > > As already noted, the proper fix for the index case is to simply do it > asynchronously on read. On write, it's harder to do asynchronously, > but for a 300MB index file you're likely going to be doing IO in the > middle, so it's probably not even noticeable. I think there were some earlier timings that show OpenSSL had a small but measurable improvement over block-sha1 in this massive case (and sha1dc is about 1.75x slower than block-sha1, so it will be a little worse). So I am mildly sympathetic. BUT. I mostly agree with: > But the fact is, if you don't want SHA1DC, and you have crazy special > cases, you just continue to build with openssl support instead. Nobody > else should ever have to worry about *your* crazy cases. If somebody who has such a crazy special case wants to tweak the build to link in a second sha1 implementation and appropriately call it from non-security spots, I don't have a problem with that. But IMHO that's the itch of the crazy-case person to scratch. Side note: I also have a feeling that any operation that cares about non-object sha1 performance is probably ripe for other, bigger optimizations. If you update 300MB worth of index entries, then the cost of computing a checksum over it isn't a big deal. But if you have a 300MB index file and you update one entry (or you just want to read one entry), maybe we ought to consider solutions that don't involve the whole 300MB in the first place. I know that's a much harder change because it may involve new on-disk formats. But it seems like that's the right long-term path forward. -Peff
Re: USE_SHA1DC is broken in pu
On Thu, Mar 23, 2017 at 9:43 AM, Johannes Schindelin wrote: > > What I am saying is that this should be a more fine-grained, runtime knob. No it really shouldn't. > If I write out an index, I should not suffer the slowdown from detecting > collisions. The index case is a complete red herring. As already noted, the proper fix for the index case is to simply do it asynchronously on read. On write, it's harder to do asynchronously, but for a 300MB index file you're likely going to be doing IO in the middle, so it's probably not even noticeable. But even *if* you want to worry about the index file, it shouldn't be a runtime knob. The index file is completely different from the other uses of SHA1DC. But the fact is, if you don't want SHA1DC, and you have crazy special cases, you just continue to build with openssl support instead. Nobody else should ever have to worry about *your* crazy cases. Linus
Re: USE_SHA1DC is broken in pu
Hi Jonathan, On Wed, 22 Mar 2017, Jonathan Nieder wrote: > Johannes Schindelin wrote: > > > As to the default of seriously slowing down all SHA-1 computations: > > since you made that the default, at compile time, with no way to turn > > on the faster computation, this will have a major, negative impact. > > Are you really, really sure you want to do that? > > > > I thought that it was obvious that we would have at least a runtime > > option to lessen the load. > > It's not obvious to me. I agree that the DC_SHA1 case can be sped up, > e.g. by turning off the collision detection for sha1 calculations that > are not part of fetching, receiving a push, or running fsck. And in those cases, using OpenSSL instead is *even* faster. > To be clear, are you saying that this is a bad compile-time default > because distributors are going to leave it and end-users will end up > with a bad experience? Or are you saying distributors have no good > alternative to choose at compile time? Or something else? What I am saying is that this should be a more fine-grained, runtime knob. If I write out an index, I should not suffer the slowdown from detecting collisions. Because I implicitly trust myself and everything that I added (and everything that was checked before already). This may not matter with small projects. But we know a couple of real-world scenarios where this matters. Imagine for example the insane repository described by my colleague Saeed Noursalehi at GitMerge. It is *ginormous*. The index is 300MB. If you have to experience a sudden drop in performance of `git add`, even by "only" 30%, relative to OpenSSL, it is very noticeable. It is painful. That is the reason why we spent considerable time trying to enhance performance of SHA-1 hashing even by as little as a couple of percentage points here and there. The accumulated wins are noticeable, and I assume that those wins are completely annihilated by the heavy-handed switch to detect collisions always. It gets even worse when it comes to fetching, let alone cloning. And please note that the gigantic repository I mentioned above is a company-internal one, i.e. the servers/repository are implicitly trusted. Having to pay the price of a full clone going from 12+ hours to even only 15+ hours *hurts*. Particularly when that price is paid for no value in return at all: the server *already* will have checked for crafted objects. I could imagine that this problem could be addressed to everybody's satisfaction by introducing a tristate config setting where the collision detection can be switched on & off, and then also to, say, "external" i.e. collision detection would be switched on whenever objects are retrieved from somewhere else than the local repository (e.g. git-receive-pack). If fetching or cloning from a trusted source, this config setting could be switched off on the command-line, otherwise left at "external". And by "switching collision detection off", I of course refer to *not* using SHA1DC's routines at all, but what would have been used originally, in Git for Windows' case: (hardware-accelerated) OpenSSL. Did I manage to clarify the problem? Johannes
Re: USE_SHA1DC is broken in pu
Hi, Johannes Schindelin wrote: > As to the default of seriously slowing down all SHA-1 computations: since > you made that the default, at compile time, with no way to turn on the > faster computation, this will have a major, negative impact. Are you > really, really sure you want to do that? > > I thought that it was obvious that we would have at least a runtime option > to lessen the load. It's not obvious to me. I agree that the DC_SHA1 case can be sped up, e.g. by turning off the collision detection for sha1 calculations that are not part of fetching, receiving a push, or running fsck. To be clear, are you saying that this is a bad compile-time default because distributors are going to leave it and end-users will end up with a bad experience? Or are you saying distributors have no good alternative to choose at compile time? Or something else? Is there some particular downstream that this breaks? Thanks and hope that helps, Jonathan
Re: USE_SHA1DC is broken in pu
Hi Junio, On Tue, 21 Mar 2017, Junio C Hamano wrote: > Johannes Schindelin writes: > > > On Fri, 17 Mar 2017, Lars Schneider wrote: > > > >> > On 17 Mar 2017, at 11:18, Lars Schneider > >> > wrote: > >> > > >> > Would it make sense/have value to add a job to our TravisCI build > >> > [1] that compiles Git in a few variations with some high profile > >> > switches such as USE_SHA1DC? Running all the tests for these > >> > variations would probably take to long but just compiling would be > >> > less than 2min per variation. > >> > >> ... or just run individual tests instead of the entire test suite for > >> these variations (e.g. only t0013 for the USE_SHA1DC variation). > > > > The best solution may be to open a PR with .travis.yml patched to > > enable this flag. And then report back to he mailing list because the > > gentle people here are not that used to paying attention to Continuous > > Testing :-D > > Actually, the best solution may be to do nothing ;-) With the current > incarnation parked in 'pu' (or I may have already merged it to 'next'), > without any explicit VARIANT_SHA1 request to $(MAKE), we default to use > the DC_SHA1 variant. > > Those who are paying attention to Travis would have noticed this by now, > I thought ;-). I pay attention mainly to breakages, and that is already a lot to pay attention to. Thanks. As to the default of seriously slowing down all SHA-1 computations: since you made that the default, at compile time, with no way to turn on the faster computation, this will have a major, negative impact. Are you really, really sure you want to do that? I thought that it was obvious that we would have at least a runtime option to lessen the load. Surprised, Johannes
Re: USE_SHA1DC is broken in pu
Johannes Schindelin writes: > On Fri, 17 Mar 2017, Lars Schneider wrote: > >> > On 17 Mar 2017, at 11:18, Lars Schneider >> > wrote: >> > >> > Would it make sense/have value to add a job to our TravisCI build [1] >> > that compiles Git in a few variations with some high profile switches >> > such as USE_SHA1DC? Running all the tests for these variations would >> > probably take to long but just compiling would be less than 2min per >> > variation. >> >> ... or just run individual tests instead of the entire test suite for >> these variations (e.g. only t0013 for the USE_SHA1DC variation). > > The best solution may be to open a PR with .travis.yml patched to enable > this flag. And then report back to he mailing list because the gentle > people here are not that used to paying attention to Continuous Testing > :-D Actually, the best solution may be to do nothing ;-) With the current incarnation parked in 'pu' (or I may have already merged it to 'next'), without any explicit VARIANT_SHA1 request to $(MAKE), we default to use the DC_SHA1 variant. Those who are paying attention to Travis would have noticed this by now, I thought ;-).
Re: USE_SHA1DC is broken in pu
Hi Lars, On Fri, 17 Mar 2017, Lars Schneider wrote: > > On 17 Mar 2017, at 11:18, Lars Schneider > > wrote: > > > > Would it make sense/have value to add a job to our TravisCI build [1] > > that compiles Git in a few variations with some high profile switches > > such as USE_SHA1DC? Running all the tests for these variations would > > probably take to long but just compiling would be less than 2min per > > variation. > > ... or just run individual tests instead of the entire test suite for > these variations (e.g. only t0013 for the USE_SHA1DC variation). The best solution may be to open a PR with .travis.yml patched to enable this flag. And then report back to he mailing list because the gentle people here are not that used to paying attention to Continuous Testing :-D Ciao, Dscho
Re: USE_SHA1DC is broken in pu
> On 17 Mar 2017, at 11:18, Lars Schneider wrote: > > >> On 17 Mar 2017, at 03:22, Linus Torvalds >> wrote: >> >> I think there's a semantic merge error and it clashes with >> f18f816cb158 ("hash.h: move SHA-1 implementation selection into a >> header file"). >> >> Suggested possible merge resolution attached. >> >> Linus >> > > Would it make sense/have value to add a job to our TravisCI build [1] > that compiles Git in a few variations with some high profile switches > such as USE_SHA1DC? Running all the tests for these variations would > probably take to long but just compiling would be less than 2min per > variation. ... or just run individual tests instead of the entire test suite for these variations (e.g. only t0013 for the USE_SHA1DC variation). - Lars
Re: USE_SHA1DC is broken in pu
> On 17 Mar 2017, at 03:22, Linus Torvalds > wrote: > > I think there's a semantic merge error and it clashes with > f18f816cb158 ("hash.h: move SHA-1 implementation selection into a > header file"). > > Suggested possible merge resolution attached. > > Linus > Would it make sense/have value to add a job to our TravisCI build [1] that compiles Git in a few variations with some high profile switches such as USE_SHA1DC? Running all the tests for these variations would probably take to long but just compiling would be less than 2min per variation. - Lars [1] https://travis-ci.org/git/git/branches
Re: USE_SHA1DC is broken in pu
On Thu, Mar 16, 2017 at 12:51 PM, Linus Torvalds wrote: > > I'll send a patch on top of 'next', which already has the header file changes. Patches sent. It all looked fairly straightforward to me, but maybe I missed something. Linus
Re: USE_SHA1DC is broken in pu
On Thu, Mar 16, 2017 at 12:46 PM, Junio C Hamano wrote: > > That's easy to answer. What we have on 'pu' is a fair game for > wholesale replacement. That is the whole point of not merging > topics in flux to 'next' and declaring that 'pu' will constantly > rewind. Ok. I'll send a patch on top of 'next', which already has the header file changes. Linus
Re: USE_SHA1DC is broken in pu
Linus Torvalds writes: > I think there's a semantic merge error and it clashes with > f18f816cb158 ("hash.h: move SHA-1 implementation selection into a > header file"). > > Suggested possible merge resolution attached. > >Linus Obviously I have not been paying much attention to the current shape of that topic, awaiting the real thing X-<. I am hoping that we can make the "hash.h" thing graduate soon to 'master' and then queue the real thing on top, but in the meantime I should have made sure the current one still does the right thing. Thanks.
Re: USE_SHA1DC is broken in pu
Linus Torvalds writes: > On Thu, Mar 16, 2017 at 12:41 PM, Jeff King wrote: >> >> Potentially we should just eject sha1dc from "pu" for the moment. It >> needs re-rolled with the most recent version of the collision library >> (and I see Marc just posted that they hit a stable point, which is >> perhaps why you're looking at it). > > I looked at it, and even created a patch, and then decided that you'd > probably do it. > > But yes, re-integrating entirely (rather than creating a patch against > the previous SHA1DC) might be simpler. > > Junio, which way do you want to go? That's easy to answer. What we have on 'pu' is a fair game for wholesale replacement. That is the whole point of not merging topics in flux to 'next' and declaring that 'pu' will constantly rewind.
Re: USE_SHA1DC is broken in pu
On Thu, Mar 16, 2017 at 12:41 PM, Jeff King wrote: > > Potentially we should just eject sha1dc from "pu" for the moment. It > needs re-rolled with the most recent version of the collision library > (and I see Marc just posted that they hit a stable point, which is > perhaps why you're looking at it). I looked at it, and even created a patch, and then decided that you'd probably do it. But yes, re-integrating entirely (rather than creating a patch against the previous SHA1DC) might be simpler. Junio, which way do you want to go? Linus
Re: USE_SHA1DC is broken in pu
On Thu, Mar 16, 2017 at 12:22:00PM -0700, Linus Torvalds wrote: > I think there's a semantic merge error and it clashes with > f18f816cb158 ("hash.h: move SHA-1 implementation selection into a > header file"). > > Suggested possible merge resolution attached. Yeah, your resolution makes sense. Potentially we should just eject sha1dc from "pu" for the moment. It needs re-rolled with the most recent version of the collision library (and I see Marc just posted that they hit a stable point, which is perhaps why you're looking at it). I'd planned to work on that in the next few days, but if you want to do so, be my guest. -Peff
USE_SHA1DC is broken in pu
I think there's a semantic merge error and it clashes with f18f816cb158 ("hash.h: move SHA-1 implementation selection into a header file"). Suggested possible merge resolution attached. Linus Makefile | 2 +- hash.h | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 9d1d958bd..186ce17f2 100644 --- a/Makefile +++ b/Makefile @@ -1388,9 +1388,9 @@ ifdef APPLE_COMMON_CRYPTO endif ifdef USE_SHA1DC - SHA1_HEADER = "sha1dc/sha1.h" LIB_OBJS += sha1dc/sha1.o LIB_OBJS += sha1dc/ubc_check.o + BASIC_CFLAGS += -DSHA1DC else ifdef BLK_SHA1 LIB_OBJS += block-sha1/sha1.o diff --git a/hash.h b/hash.h index f0d9ddd0c..b7f4f1fd8 100644 --- a/hash.h +++ b/hash.h @@ -7,6 +7,8 @@ #include #elif defined(SHA1_OPENSSL) #include +#elif defined(SHA1DC) +#include "sha1dc/sha1.h" #else /* SHA1_BLK */ #include "block-sha1/sha1.h" #endif