Re: USE_SHA1DC is broken in pu

2017-03-23 Thread Jonathan Nieder
Johannes Schindelin wrote:
> On Wed, 22 Mar 2017, Jonathan Nieder wrote:
> > Johannes Schindelin wrote:

>>> As to the default of seriously slowing down all SHA-1 computations:
>>> since you made that the default, at compile time, with no way to turn
>>> on the faster computation, this will have a major, negative impact.
>>> Are you really, really sure you want to do that?
>>>
>>> I thought that it was obvious that we would have at least a runtime
>>> option to lessen the load.
>>
>> It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
>> e.g. by turning off the collision detection for sha1 calculations that
>> are not part of fetching, receiving a push, or running fsck.
>
> And in those cases, using OpenSSL instead is *even* faster.
[...]
> The index is 300MB. If you have to experience a sudden drop in performance
> of `git add`, even by "only" 30%, relative to OpenSSL, it is very
> noticeable. It is painful.
[...]
> It gets even worse when it comes to fetching, let alone cloning.
[...]
> And by "switching collision detection off", I of course refer to *not*
> using SHA1DC's routines at all, but what would have been used originally,
> in Git for Windows' case: (hardware-accelerated) OpenSSL.
>
> Did I manage to clarify the problem?

Yes.  Thank you for explaining.

Sincerely,
Jonathan


Re: USE_SHA1DC is broken in pu

2017-03-23 Thread Junio C Hamano
Jeff King  writes:

>   Side note: I also have a feeling that any operation that cares about
>   non-object sha1 performance is probably ripe for other, bigger
>   optimizations. If you update 300MB worth of index entries, then the
>   cost of computing a checksum over it isn't a big deal. But if you have
>   a 300MB index file and you update one entry (or you just want to read
>   one entry), maybe we ought to consider solutions that don't involve
>   the whole 300MB in the first place. I know that's a much harder change
>   because it may involve new on-disk formats. But it seems like that's
>   the right long-term path forward.

Yes ;-)


Re: USE_SHA1DC is broken in pu

2017-03-23 Thread Jeff King
On Thu, Mar 23, 2017 at 10:16:23AM -0700, Linus Torvalds wrote:

> > If I write out an index, I should not suffer the slowdown from detecting
> > collisions.
> 
> The index case is a complete red herring.
> 
> As already noted, the proper fix for the index case is to simply do it
> asynchronously on read. On write, it's harder to do asynchronously,
> but for a 300MB index file you're likely going to be doing IO in the
> middle, so it's probably not even noticeable.

I think there were some earlier timings that show OpenSSL had a small
but measurable improvement over block-sha1 in this massive case (and
sha1dc is about 1.75x slower than block-sha1, so it will be a little
worse).

So I am mildly sympathetic. BUT. I mostly agree with:

> But the fact is, if you don't want SHA1DC, and you have crazy special
> cases, you just continue to build with openssl support instead. Nobody
> else should ever have to worry about *your* crazy cases.

If somebody who has such a crazy special case wants to tweak the build
to link in a second sha1 implementation and appropriately call it from
non-security spots, I don't have a problem with that. But IMHO that's
the itch of the crazy-case person to scratch.

  Side note: I also have a feeling that any operation that cares about
  non-object sha1 performance is probably ripe for other, bigger
  optimizations. If you update 300MB worth of index entries, then the
  cost of computing a checksum over it isn't a big deal. But if you have
  a 300MB index file and you update one entry (or you just want to read
  one entry), maybe we ought to consider solutions that don't involve
  the whole 300MB in the first place. I know that's a much harder change
  because it may involve new on-disk formats. But it seems like that's
  the right long-term path forward.

-Peff


Re: USE_SHA1DC is broken in pu

2017-03-23 Thread Linus Torvalds
On Thu, Mar 23, 2017 at 9:43 AM, Johannes Schindelin
 wrote:
>
> What I am saying is that this should be a more fine-grained, runtime knob.

No it really shouldn't.

> If I write out an index, I should not suffer the slowdown from detecting
> collisions.

The index case is a complete red herring.

As already noted, the proper fix for the index case is to simply do it
asynchronously on read. On write, it's harder to do asynchronously,
but for a 300MB index file you're likely going to be doing IO in the
middle, so it's probably not even noticeable.

But even *if* you want to worry about the index file, it shouldn't be
a runtime knob. The index file is completely different from the other
uses of SHA1DC.

But the fact is, if you don't want SHA1DC, and you have crazy special
cases, you just continue to build with openssl support instead. Nobody
else should ever have to worry about *your* crazy cases.

Linus


Re: USE_SHA1DC is broken in pu

2017-03-23 Thread Johannes Schindelin
Hi Jonathan,

On Wed, 22 Mar 2017, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> 
> > As to the default of seriously slowing down all SHA-1 computations:
> > since you made that the default, at compile time, with no way to turn
> > on the faster computation, this will have a major, negative impact.
> > Are you really, really sure you want to do that?
> >
> > I thought that it was obvious that we would have at least a runtime
> > option to lessen the load.
> 
> It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
> e.g. by turning off the collision detection for sha1 calculations that
> are not part of fetching, receiving a push, or running fsck.

And in those cases, using OpenSSL instead is *even* faster.

> To be clear, are you saying that this is a bad compile-time default
> because distributors are going to leave it and end-users will end up
> with a bad experience?  Or are you saying distributors have no good
> alternative to choose at compile time?  Or something else?

What I am saying is that this should be a more fine-grained, runtime knob.

If I write out an index, I should not suffer the slowdown from detecting
collisions. Because I implicitly trust myself and everything that I added
(and everything that was checked before already). This may not matter with
small projects. But we know a couple of real-world scenarios where this
matters.

Imagine for example the insane repository described by my colleague Saeed
Noursalehi at GitMerge. It is *ginormous*.

The index is 300MB. If you have to experience a sudden drop in performance
of `git add`, even by "only" 30%, relative to OpenSSL, it is very
noticeable. It is painful.

That is the reason why we spent considerable time trying to enhance
performance of SHA-1 hashing even by as little as a couple of percentage
points here and there. The accumulated wins are noticeable, and
I assume that those wins are completely annihilated by the heavy-handed
switch to detect collisions always.

It gets even worse when it comes to fetching, let alone cloning.

And please note that the gigantic repository I mentioned above is a
company-internal one, i.e. the servers/repository are implicitly trusted.
Having to pay the price of a full clone going from 12+ hours to even only
15+ hours *hurts*. Particularly when that price is paid for no value in
return at all: the server *already* will have checked for crafted objects.

I could imagine that this problem could be addressed to everybody's
satisfaction by introducing a tristate config setting where the collision
detection can be switched on & off, and then also to, say, "external" i.e.
collision detection would be switched on whenever objects are retrieved
from somewhere else than the local repository (e.g. git-receive-pack).

If fetching or cloning from a trusted source, this config setting could be
switched off on the command-line, otherwise left at "external".

And by "switching collision detection off", I of course refer to *not*
using SHA1DC's routines at all, but what would have been used originally,
in Git for Windows' case: (hardware-accelerated) OpenSSL.

Did I manage to clarify the problem?
Johannes


Re: USE_SHA1DC is broken in pu

2017-03-22 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> As to the default of seriously slowing down all SHA-1 computations: since
> you made that the default, at compile time, with no way to turn on the
> faster computation, this will have a major, negative impact. Are you
> really, really sure you want to do that?
>
> I thought that it was obvious that we would have at least a runtime option
> to lessen the load.

It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
e.g. by turning off the collision detection for sha1 calculations that
are not part of fetching, receiving a push, or running fsck.

To be clear, are you saying that this is a bad compile-time default
because distributors are going to leave it and end-users will end up
with a bad experience?  Or are you saying distributors have no good
alternative to choose at compile time?  Or something else?

Is there some particular downstream that this breaks?

Thanks and hope that helps,
Jonathan


Re: USE_SHA1DC is broken in pu

2017-03-22 Thread Johannes Schindelin
Hi Junio,

On Tue, 21 Mar 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Fri, 17 Mar 2017, Lars Schneider wrote:
> >
> >> > On 17 Mar 2017, at 11:18, Lars Schneider 
> >> > wrote:
> >> > 
> >> > Would it make sense/have value to add a job to our TravisCI build
> >> > [1] that compiles Git in a few variations with some high profile
> >> > switches such as USE_SHA1DC? Running all the tests for these
> >> > variations would probably take to long but just compiling would be
> >> > less than 2min per variation.
> >> 
> >> ... or just run individual tests instead of the entire test suite for
> >> these variations (e.g. only t0013 for the USE_SHA1DC variation).
> >
> > The best solution may be to open a PR with .travis.yml patched to
> > enable this flag. And then report back to he mailing list because the
> > gentle people here are not that used to paying attention to Continuous
> > Testing :-D
> 
> Actually, the best solution may be to do nothing ;-)  With the current
> incarnation parked in 'pu' (or I may have already merged it to 'next'),
> without any explicit VARIANT_SHA1 request to $(MAKE), we default to use
> the DC_SHA1 variant.
> 
> Those who are paying attention to Travis would have noticed this by now,
> I thought ;-).

I pay attention mainly to breakages, and that is already a lot to pay
attention to. Thanks.

As to the default of seriously slowing down all SHA-1 computations: since
you made that the default, at compile time, with no way to turn on the
faster computation, this will have a major, negative impact. Are you
really, really sure you want to do that?

I thought that it was obvious that we would have at least a runtime option
to lessen the load.

Surprised,
Johannes


Re: USE_SHA1DC is broken in pu

2017-03-21 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Fri, 17 Mar 2017, Lars Schneider wrote:
>
>> > On 17 Mar 2017, at 11:18, Lars Schneider 
>> > wrote:
>> > 
>> > Would it make sense/have value to add a job to our TravisCI build [1]
>> > that compiles Git in a few variations with some high profile switches
>> > such as USE_SHA1DC? Running all the tests for these variations would
>> > probably take to long but just compiling would be less than 2min per
>> > variation.
>> 
>> ... or just run individual tests instead of the entire test suite for
>> these variations (e.g. only t0013 for the USE_SHA1DC variation).
>
> The best solution may be to open a PR with .travis.yml patched to enable
> this flag. And then report back to he mailing list because the gentle
> people here are not that used to paying attention to Continuous Testing
> :-D

Actually, the best solution may be to do nothing ;-)  With the
current incarnation parked in 'pu' (or I may have already merged it
to 'next'), without any explicit VARIANT_SHA1 request to $(MAKE), we
default to use the DC_SHA1 variant.

Those who are paying attention to Travis would have noticed this by
now, I thought ;-).


Re: USE_SHA1DC is broken in pu

2017-03-21 Thread Johannes Schindelin
Hi Lars,

On Fri, 17 Mar 2017, Lars Schneider wrote:

> > On 17 Mar 2017, at 11:18, Lars Schneider 
> > wrote:
> > 
> > Would it make sense/have value to add a job to our TravisCI build [1]
> > that compiles Git in a few variations with some high profile switches
> > such as USE_SHA1DC? Running all the tests for these variations would
> > probably take to long but just compiling would be less than 2min per
> > variation.
> 
> ... or just run individual tests instead of the entire test suite for
> these variations (e.g. only t0013 for the USE_SHA1DC variation).

The best solution may be to open a PR with .travis.yml patched to enable
this flag. And then report back to he mailing list because the gentle
people here are not that used to paying attention to Continuous Testing
:-D

Ciao,
Dscho


Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Lars Schneider

> On 17 Mar 2017, at 11:18, Lars Schneider  wrote:
> 
> 
>> On 17 Mar 2017, at 03:22, Linus Torvalds  
>> wrote:
>> 
>> I think there's a semantic merge error and it clashes with
>> f18f816cb158 ("hash.h: move SHA-1 implementation selection into a
>> header file").
>> 
>> Suggested possible merge resolution attached.
>> 
>>  Linus
>> 
> 
> Would it make sense/have value to add a job to our TravisCI build [1] 
> that compiles Git in a few variations with some high profile switches 
> such as USE_SHA1DC? Running all the tests for these variations would 
> probably take to long but just compiling would be less than 2min per 
> variation.

... or just run individual tests instead of the entire test suite for 
these variations (e.g. only t0013 for the USE_SHA1DC variation).

- Lars



Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Lars Schneider

> On 17 Mar 2017, at 03:22, Linus Torvalds  
> wrote:
> 
> I think there's a semantic merge error and it clashes with
> f18f816cb158 ("hash.h: move SHA-1 implementation selection into a
> header file").
> 
> Suggested possible merge resolution attached.
> 
>   Linus
> 

Would it make sense/have value to add a job to our TravisCI build [1] 
that compiles Git in a few variations with some high profile switches 
such as USE_SHA1DC? Running all the tests for these variations would 
probably take to long but just compiling would be less than 2min per 
variation.

- Lars

[1] https://travis-ci.org/git/git/branches

Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Linus Torvalds
On Thu, Mar 16, 2017 at 12:51 PM, Linus Torvalds
 wrote:
>
> I'll send a patch on top of 'next', which already has the header file changes.

Patches sent. It all looked fairly straightforward to me, but maybe I
missed something.

 Linus


Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Linus Torvalds
On Thu, Mar 16, 2017 at 12:46 PM, Junio C Hamano  wrote:
>
> That's easy to answer.  What we have on 'pu' is a fair game for
> wholesale replacement.  That is the whole point of not merging
> topics in flux to 'next' and declaring that 'pu' will constantly
> rewind.

Ok.

I'll send a patch on top of 'next', which already has the header file changes.

 Linus


Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Junio C Hamano
Linus Torvalds  writes:

> I think there's a semantic merge error and it clashes with
> f18f816cb158 ("hash.h: move SHA-1 implementation selection into a
> header file").
>
> Suggested possible merge resolution attached.
>
>Linus

Obviously I have not been paying much attention to the current shape
of that topic, awaiting the real thing X-<.

I am hoping that we can make the "hash.h" thing graduate soon to
'master' and then queue the real thing on top, but in the meantime I
should have made sure the current one still does the right thing.

Thanks.


Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Junio C Hamano
Linus Torvalds  writes:

> On Thu, Mar 16, 2017 at 12:41 PM, Jeff King  wrote:
>>
>> Potentially we should just eject sha1dc from "pu" for the moment. It
>> needs re-rolled with the most recent version of the collision library
>> (and I see Marc just posted that they hit a stable point, which is
>> perhaps why you're looking at it).
>
> I looked at it, and even created a patch, and then decided that you'd
> probably do it.
>
> But yes, re-integrating entirely (rather than creating a patch against
> the previous SHA1DC) might be simpler.
>
> Junio, which way do you want to go?

That's easy to answer.  What we have on 'pu' is a fair game for
wholesale replacement.  That is the whole point of not merging
topics in flux to 'next' and declaring that 'pu' will constantly
rewind.


Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Linus Torvalds
On Thu, Mar 16, 2017 at 12:41 PM, Jeff King  wrote:
>
> Potentially we should just eject sha1dc from "pu" for the moment. It
> needs re-rolled with the most recent version of the collision library
> (and I see Marc just posted that they hit a stable point, which is
> perhaps why you're looking at it).

I looked at it, and even created a patch, and then decided that you'd
probably do it.

But yes, re-integrating entirely (rather than creating a patch against
the previous SHA1DC) might be simpler.

Junio, which way do you want to go?

 Linus


Re: USE_SHA1DC is broken in pu

2017-03-16 Thread Jeff King
On Thu, Mar 16, 2017 at 12:22:00PM -0700, Linus Torvalds wrote:

> I think there's a semantic merge error and it clashes with
> f18f816cb158 ("hash.h: move SHA-1 implementation selection into a
> header file").
> 
> Suggested possible merge resolution attached.

Yeah, your resolution makes sense.

Potentially we should just eject sha1dc from "pu" for the moment. It
needs re-rolled with the most recent version of the collision library
(and I see Marc just posted that they hit a stable point, which is
perhaps why you're looking at it).

I'd planned to work on that in the next few days, but if you want to do
so, be my guest.

-Peff


USE_SHA1DC is broken in pu

2017-03-16 Thread Linus Torvalds
I think there's a semantic merge error and it clashes with
f18f816cb158 ("hash.h: move SHA-1 implementation selection into a
header file").

Suggested possible merge resolution attached.

   Linus
 Makefile | 2 +-
 hash.h   | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 9d1d958bd..186ce17f2 100644
--- a/Makefile
+++ b/Makefile
@@ -1388,9 +1388,9 @@ ifdef APPLE_COMMON_CRYPTO
 endif
 
 ifdef USE_SHA1DC
-   SHA1_HEADER = "sha1dc/sha1.h"
LIB_OBJS += sha1dc/sha1.o
LIB_OBJS += sha1dc/ubc_check.o
+   BASIC_CFLAGS += -DSHA1DC
 else
 ifdef BLK_SHA1
LIB_OBJS += block-sha1/sha1.o
diff --git a/hash.h b/hash.h
index f0d9ddd0c..b7f4f1fd8 100644
--- a/hash.h
+++ b/hash.h
@@ -7,6 +7,8 @@
 #include 
 #elif defined(SHA1_OPENSSL)
 #include 
+#elif defined(SHA1DC)
+#include "sha1dc/sha1.h"
 #else /* SHA1_BLK */
 #include "block-sha1/sha1.h"
 #endif