Re: Hash algorithm analysis

2018-07-23 Thread demerphq
On Mon, 23 Jul 2018 at 14:48, Sitaram Chamarty  wrote:
> On 07/23/2018 06:10 PM, demerphq wrote:
> > On Sun, 22 Jul 2018 at 01:59, brian m. carlson
> >  wrote:
> >> I will admit that I don't love making this decision by myself, because
> >> right now, whatever I pick, somebody is going to be unhappy.  I want to
> >> state, unambiguously, that I'm trying to make a decision that is in the
> >> interests of the Git Project, the community, and our users.
> >>
> >> I'm happy to wait a few more days to see if a consensus develops; if so,
> >> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> >> a decision and write my patches accordingly.  The community is free, as
> >> always, to reject my patches if taking them is not in the interest of
> >> the project.
> >
> > Hi Brian.
> >
> > I do not envy you this decision.
> >
> > Personally I would aim towards pushing this decision out to the git
> > user base and facilitating things so we can choose whatever hash
> > function (and config) we wish, including ones not invented yet.
> >
> > Failing that I would aim towards a hashing strategy which has the most
> > flexibility. Keccak for instance has the interesting property that its
> > security level is tunable, and that it can produce arbitrarily long
> > hashes.  Leaving aside other concerns raised elsewhere in this thread,
> > these two features alone seem to make it a superior choice for an
> > initial implementation. You can find bugs by selecting unusual hash
> > sizes, including very long ones, and you can provide ways to tune the
> > function to people's security and speed preferences.  Someone really
> > paranoid can specify an unusually large round count and a very long
> > hash.
> >
> > Also frankly I keep thinking that the ability to arbitrarily extend
> > the hash size has to be useful /somewhere/ in git.
>
> I would not suggest arbitrarily long hashes.  Not only would it
> complicate a lot of code, it is not clear that it has any real benefit.

It has the benefit of armoring the code for the *next* hash change,
and making it clear that such decisions are arbitrary and should not
be depended on.

> Plus, the code contortions required to support arbitrarily long hashes
> would be more susceptible to potential bugs and exploits, simply by
> being more complex code.  Why take chances?

I think the benefits would outweigh the risks.

> I would suggest (a) hash size of 256 bits and (b) choice of any hash
> function that can produce such a hash.  If people feel strongly that 256
> bits may also turn out to be too small (really?) then a choice of 256 or
> 512, but not arbitrary sizes.

I am aware of too many systems that cannot change their hash size and
are locked into woefully bad decisions that were made long ago to buy
this argument.

Making it a per-repo option would eliminate assumptions and make for
a more secure and flexible tool.

Anyway, I am not going to do the work, so my opinion is worth the price
of the paper I sent it on. :-)

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Hash algorithm analysis

2018-07-23 Thread demerphq
On Sun, 22 Jul 2018 at 01:59, brian m. carlson
 wrote:
> I will admit that I don't love making this decision by myself, because
> right now, whatever I pick, somebody is going to be unhappy.  I want to
> state, unambiguously, that I'm trying to make a decision that is in the
> interests of the Git Project, the community, and our users.
>
> I'm happy to wait a few more days to see if a consensus develops; if so,
> I'll follow it.  If we haven't come to one by, say, Wednesday, I'll make
> a decision and write my patches accordingly.  The community is free, as
> always, to reject my patches if taking them is not in the interest of
> the project.

Hi Brian.

I do not envy you this decision.

Personally I would aim towards pushing this decision out to the git
user base and facilitating things so we can choose whatever hash
function (and config) we wish, including ones not invented yet.

Failing that I would aim towards a hashing strategy which has the most
flexibility. Keccak for instance has the interesting property that its
security level is tunable, and that it can produce arbitrarily long
hashes.  Leaving aside other concerns raised elsewhere in this thread,
these two features alone seem to make it a superior choice for an
initial implementation. You can find bugs by selecting unusual hash
sizes, including very long ones, and you can provide ways to tune the
function to people's security and speed preferences.  Someone really
paranoid can specify an unusually large round count and a very long
hash.
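
To illustrate the idea concretely, here is a toy sketch of my own (a
counter construction over SHA-256 via the core Digest::SHA module; it
is not Keccak and makes no security claim) of deriving a digest of any
requested length, the way an extendable-output function lets you:

use strict;
use warnings;
use Digest::SHA qw(sha256);

# Toy XOF-style expansion: derive N bytes by hashing a counter
# alongside the data.  Illustrative only -- a real sponge like
# Keccak does this natively, with actual security guarantees.
sub xof_like {
    my ($data, $bytes) = @_;
    my ($out, $ctr) = ('', 0);
    $out .= sha256(pack("N", $ctr++) . $data) while length($out) < $bytes;
    return substr($out, 0, $bytes);
}

# exercise code paths with unusual digest sizes
printf "%2d: %s\n", $_, unpack("H*", xof_like("abc", $_)) for 1, 20, 47, 64;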

Also frankly I keep thinking that the ability to arbitrarily extend
the hash size has to be useful /somewhere/ in git.

cheers,
Yves
I am not a cryptographer.
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: could `git merge --no-ff origin/master` be made more useful?

2018-05-15 Thread demerphq
On 15 May 2018 at 00:58, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote:
>
> On Mon, May 14 2018, demerphq wrote:
>
>> The first time I tried to use --no-ff I tried to do something like this:
>>
>>   git checkout master
>>   git commit -a -m'whatever'
>>   git commit -a -m'whatever2'
>>   git merge --no-ff origin/master
>>
>> and was disappointed when "it didn't work" and git told me there was
>> nothing to do as the branch was up to date. (Which I found a bit
>> confusing.)
>>
>> I realize now my expectations were incorrect, and that the argument to
>> merge needs to resolve to a commit that is ahead of the current
>> commit, and in the above sequence it is the other way around. So to do
>> what I want I can do:
>>
>>   git checkout master
>>   git checkout -b topic
>>   git commit -a -m'whatever'
>>   git commit -a -m'whatever2'
>>   git checkout master
>>   git merge --no-ff topic
>>
>> and, if I understand it right, this works because 'master' would be behind 'topic' in this case.
>>
>> But I have a few questions: 1) is there an argument to feed to git
>> merge to make the first recipe work like the second? And 2) is this
>> asymmetry necessary with --no-ff?
>
> I've been bitten by this myself, but found that it's documented as the
> very first thing in git-merge:
>
> Incorporates changes from the named commits (since the time their
> histories diverged from the current branch) into the current
> branch[...].
>
> Since origin/master hasn't diverged from your current branch (unlike the
> other way around), the merge with --no-ff is a noop.

Yeah, I got it, but only after rereading it several times.

>
>> More specifically, would something horrible break if --no-ff
>> origin/trunk detected that the current branch was ahead of the named
>> branch and "swapped" the implicit order of the two so that the first
>> recipe could behave like the second?
>
> If it worked like that then the user who sets merge.ff=false in his
> config and issues a "git pull" after making a commit on his local master
> would create a merge commit.
>
> This old E-Mail of Junio's discusses that edge case & others in detail:
> https://public-inbox.org/git/7vty1zfwmd@alter.siamese.dyndns.org/

Thanks, I skimmed it, but it is long so I will review it properly later.

I see the point about the config option for no-ff.

But what about an option like --reverse? Assuming we are on a local
branch master then

  git merge --no-ff --reverse origin/master

would treat origin/master as the "current" branch, and "master" as the
merged-in branch, and create the appropriate merge commit, which as
far as I can tell is tree-wise identical to creating a topic branch
instead of hacking on the local master.

>> Anyway, even if the above makes no sense, would it be hard to make the
>> message provided by git merge in the first recipe a bit more
>> suggestive of what is going on? For instance if it had said "Cannot
>> --no-ff merge, origin/master is behind master" it would have been much
>> more clear what was going on.
>
> I can't spot any reason for why we couldn't have something like this POC
> (would be properly done through advice.c):
>
> diff --git a/builtin/merge.c b/builtin/merge.c
> index 9db5a2cf16..920f67d9f8 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -1407,6 +1407,8 @@ int cmd_merge(int argc, const char **argv, const char *prefix)
>  * but first the most common case of merging one remote.
>  */
> finish_up_to_date(_("Already up to date."));
> +   if (fast_forward == FF_NO)
> +   fprintf(stderr, "did you mean this the other way around?\n");
> goto done;
> } else if (fast_forward != FF_NO && !remoteheads->next &&
> !common->next &&
>
> But that should probably be reworked to be smart about whether --no-ff
> or merge.ff=false was specified, i.e. do we want to yell this at the
> user who's just set that at his config default, or the user who's
> specified --no-ff explicitly, or both? I don't know.

Yes, all those points make sense.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


could `git merge --no-ff origin/master` be made more useful?

2018-05-14 Thread demerphq
The first time I tried to use --no-ff I tried to do something like this:

  git checkout master
  git commit -a -m'whatever'
  git commit -a -m'whatever2'
  git merge --no-ff origin/master

and was disappointed when "it didn't work" and git told me there was
nothing to do as the branch was up to date. (Which I found a bit
confusing.)

I realize now my expectations were incorrect, and that the argument to
merge needs to resolve to a commit that is ahead of the current
commit, and in the above sequence it is the other way around. So to do
what I want I can do:

  git checkout master
  git checkout -b topic
  git commit -a -m'whatever'
  git commit -a -m'whatever2'
  git checkout master
  git merge --no-ff topic

and, if I understand it right, this works because 'master' would be behind 'topic' in this case.

But I have a few questions: 1) is there an argument to feed to git
merge to make the first recipe work like the second? And 2) is this
asymmetry necessary with --no-ff?

More specifically, would something horrible break if --no-ff
origin/trunk detected that the current branch was ahead of the named
branch and "swapped" the implicit order of the two so that the first
recipe could behave like the second?

Anyway, even if the above makes no sense, would it be hard to make the
message provided by git merge in the first recipe a bit more
suggestive of what is going on? For instance if it had said "Cannot
--no-ff merge, origin/master is behind master" it would have been much
more clear what was going on.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-03-01 Thread demerphq
On 1 March 2018 at 16:08, Jeff King  wrote:
> On Thu, Mar 01, 2018 at 09:28:31AM -0500, Randall S. Becker wrote:
>
>> > It's not clear to me though if we just want to tweak the programs run in 
>> > the
>> > test scripts in order to get test_must_fail to stop complaining, or if we
>> > consider the unusual exit codes from our perl-based Git programs to be an
>> > error that should be fixed for real use, too.
>>
>> I'm living unusual exit code IRL all the time. So "fixed for real" is
>> what I'm looking for. So if we were to do that, where is the best
>> place to insert a fix - my original question - that would be permanent
>> in the main git test code. Or perhaps this needs to be in the main
>> code itself.
>
> If it's fixed in the real world, then it needs to be in the main code
> itself. It looks like git-svn already does this to some degree itself
> (most of the work happens in an eval, and it calls the "fatal" function
> if that throws an exception via 'die').
>
> So I think git-send-email.perl (and maybe others) needs to learn the
> same trick (by pushing the main bits of the script into an eval). Or it
> needs to include the SIG{__DIE__} trickery at the beginning of the
> script.
>
> I think the SIG{__DIE__} stuff could go into Git/PredictableDie.pm or
> something, and then any scripts that need it could just "use
> Git::PredictableDie".
>
> Does that make sense?

To me yes. By putting it in a module and 'use'ing it early you
guarantee it will be set up before any code following it is even
compiled.

If there is an existing module that the git perl code always uses then
it could go in there in a BEGIN{} block instead of adding a new
module.
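
As a minimal sketch of that variant (assuming Git.pm is such an
always-loaded module; the handler body is the one from earlier in the
thread):

# near the top of an always-loaded module such as Git.pm
package Git;

BEGIN {
    $SIG{__DIE__} = sub {
        # re-throw inside eval or during compile
        CORE::die @_ if $^S || !defined($^S);
        print STDERR "fatal: @_";
        exit 128;
    };
}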

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-03-01 Thread demerphq
On 1 March 2018 at 08:36, Jeff King <p...@peff.net> wrote:
> On Wed, Feb 28, 2018 at 05:51:14PM +0100, demerphq wrote:
>
>> I would look into putting it into a module and then using the PERL5OPT
>> environment var to have it loaded automagically in any of your perl
>> scripts.
>>
>> For instance if you put that code into a module called Git/DieTrap.pm
>>
>> then you could do:
>>
>> PERL5OPT=-MGit::DieTrap
>>
>> In your test setup code, assuming you have some. Then you don't need to
>> change any of your scripts, just the test runner framework.
>
> That's a clever trick.
>
> It's not clear to me though if we just want to tweak the programs run in
> the test scripts in order to get test_must_fail to stop complaining, or
> if we consider the unusual exit codes from our perl-based Git programs
> to be an error that should be fixed for real use, too.

Yeah, that is a decision you guys need to make; I am not familiar
enough with the issues to make any useful comment.

But I wanted to say that I will bring this subject up on perl5porters:
the exit code triggered by a die is a regular cause of trouble for
more than just you guys, and maybe we can get it changed for the
future. Nevertheless, even if there were consensus that it can be
changed, it would take years before the change is widely distributed
enough to be useful to you. :-(
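
For anyone following along, the trouble in question is that an
uncaught die does not exit with a small, git-friendly status: it
typically exits 255 (or propagates $! or $? when those are set), e.g.

$ perl -e 'die "oops\n"'; echo $?
oops
255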

I'll be bold and say sorry on behalf of the perl committers for
this. Perl is so old that sometimes things which used to make sense
don't make sense anymore.

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread demerphq
On 28 February 2018 at 18:19, demerphq <demer...@gmail.com> wrote:
> On 28 February 2018 at 18:10, Randall S. Becker <rsbec...@nexbridge.com> 
> wrote:
>> On February 28, 2018 11:46 AM, demerphq wrote:
>>> On 28 February 2018 at 08:49, Jeff King <p...@peff.net> wrote:
>>> > On Wed, Feb 28, 2018 at 07:42:51AM +, Eric Wong wrote:
>>> >
>>> >> > > >  a) We could override the meaning of die() in Git.pm.  This feels
>>> >> > > > ugly but if it works, it would be a very small patch.
>>> >> > >
>>> >> > > Unlikely to work since I think we use eval {} to trap exceptions
>>> >> > > from die.
>>> >> > >
>>> >> > > >  b) We could forbid use of die() and use some git_die() instead 
>>> >> > > > (but
>>> >> > > > with a better name) for our own error handling.
>>> >> > >
>>> >> > > Call sites may be dual-use: "die" can either be caught by an eval
>>> >> > > or used to show an error message to the user.
>>> >>
>>> >> 
>>> >>
>>> >> > > >  d) We could wrap each command in an eval {...} block to convert 
>>> >> > > > the
>>> >> > > > result from die() to exit 128.
>>> >> > >
>>> >> > > I prefer option d)
>>> >> >
>>> >> > FWIW, I agree with all of that. You can do (d) without an enclosing
>>> >> > eval block by just hooking the __DIE__ handler, like:
>>> >> >
>>> >> > $SIG{__DIE__} = sub {
>>> >> >   print STDERR "fatal: @_\n";
>>> >> >   exit 128;
>>> >> > };
>>> >>
>>> >> Looks like it has the same problems I pointed out with a) and b).
>>> >
>>> > You're right. I cut down my example too much and dropped the necessary
>>> > eval magic. Try this:
>>> >
>>> > -- >8 --
>>> > SIG{__DIE__} = sub {
>>> >   CORE::die @_ if $^S || !defined($^S);
>>> >   print STDERR "fatal: @_";
>>> >   exit 128;
>>> > };
>>>
>>> FWIW, this doesn't need to use CORE::die like that unless you have code that
>>> overrides die() or CORE::GLOBAL::die, which would be pretty unusual.
>>>
>>> die() within $SIG{__DIE__} is special cased not to trigger $SIG{__DIE__}
>>> again.
>>>
>>> Of course it doesn't hurt, but it might make a perl hacker do a double take
>>> over why you are doing it. Maybe add a comment like
>>>
>>> # using CORE::die to armor against overridden die()
>>
>> The problem is actually in git code in its test suite that uses perl inline,
>> not in my test code itself. The difficulty I'm having is placing this
>> appropriately so that the signal handler gets used throughout the test suite,
>> including in the perl -e invocations. This is more a lack of my own
>> understanding of the plumbing of the git test framework than of using or
>> coding perl.
>
> Did you reply to the wrong mail?
>
> Create a file like:
>
> .../Git/DieTrap.pm
>
> which would look like  this:
>
> package Git::DieTrap;
> use strict;
> use warnings;
>
> SIG{__DIE__} = sub {

OOPs, that should read

$SIG{__DIE__} = sub {

sorry about that.

>CORE::die @_ if $^S || !defined($^S);
>print STDERR "fatal: @_";
>exit 128;
> };
>
> 1;
> __END__
>
> and then you would do:
>
> export PERL5OPT=-MGit::DieTrap
>
> before executing any tests. ANY use of perl from that point on will
> behave as though it has:
>
> use Git::DieTrap;
>
> at the top of the script, be it a -e, or any other way that Perl code
> is executed.
>
> cheers,
> Yves
>
> --
> perl -Mre=debug -e "/just|another|perl|hacker/"



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread demerphq
On 28 February 2018 at 18:10, Randall S. Becker <rsbec...@nexbridge.com> wrote:
> On February 28, 2018 11:46 AM, demerphq wrote:
>> On 28 February 2018 at 08:49, Jeff King <p...@peff.net> wrote:
>> > On Wed, Feb 28, 2018 at 07:42:51AM +, Eric Wong wrote:
>> >
>> >> > > >  a) We could override the meaning of die() in Git.pm.  This feels
>> >> > > > ugly but if it works, it would be a very small patch.
>> >> > >
>> >> > > Unlikely to work since I think we use eval {} to trap exceptions
>> >> > > from die.
>> >> > >
>> >> > > >  b) We could forbid use of die() and use some git_die() instead (but
>> >> > > > with a better name) for our own error handling.
>> >> > >
>> >> > > Call sites may be dual-use: "die" can either be caught by an eval
>> >> > > or used to show an error message to the user.
>> >>
>> >> 
>> >>
>> >> > > >  d) We could wrap each command in an eval {...} block to convert the
>> >> > > > result from die() to exit 128.
>> >> > >
>> >> > > I prefer option d)
>> >> >
>> >> > FWIW, I agree with all of that. You can do (d) without an enclosing
>> >> > eval block by just hooking the __DIE__ handler, like:
>> >> >
>> >> > $SIG{__DIE__} = sub {
>> >> >   print STDERR "fatal: @_\n";
>> >> >   exit 128;
>> >> > };
>> >>
>> >> Looks like it has the same problems I pointed out with a) and b).
>> >
>> > You're right. I cut down my example too much and dropped the necessary
>> > eval magic. Try this:
>> >
>> > -- >8 --
>> > SIG{__DIE__} = sub {
>> >   CORE::die @_ if $^S || !defined($^S);
>> >   print STDERR "fatal: @_";
>> >   exit 128;
>> > };
>>
>> FWIW, this doesn't need to use CORE::die like that unless you have code that
>> overrides die() or CORE::GLOBAL::die, which would be pretty unusual.
>>
>> die() within $SIG{__DIE__} is special cased not to trigger $SIG{__DIE__}
>> again.
>>
>> Of course it doesn't hurt, but it might make a perl hacker do a double take
>> over why you are doing it. Maybe add a comment like
>>
>> # using CORE::die to armor against overridden die()
>
> The problem is actually in git code in its test suite that uses perl inline,
> not in my test code itself. The difficulty I'm having is placing this
> appropriately so that the signal handler gets used throughout the test suite,
> including in the perl -e invocations. This is more a lack of my own
> understanding of the plumbing of the git test framework than of using or
> coding perl.

Did you reply to the wrong mail?

Create a file like:

.../Git/DieTrap.pm

which would look like this:

package Git::DieTrap;
use strict;
use warnings;

SIG{__DIE__} = sub {
   CORE::die @_ if $^S || !defined($^S);
   print STDERR "fatal: @_";
   exit 128;
};

1;
__END__

and then you would do:

export PERL5OPT=-MGit::DieTrap

before executing any tests. ANY use of perl from that point on will
behave as though it has:

use Git::DieTrap;

at the top of the script, be it a -e, or any other way that Perl code
is executed.
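
For example, with the directory containing Git/DieTrap.pm on @INC (say
via PERL5LIB; the path below is hypothetical) and the $SIG typo
corrected as per my follow-up:

$ export PERL5LIB=/path/to/lib PERL5OPT=-MGit::DieTrap
$ perl -e 'die "oops\n"'; echo $?
fatal: oops
128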

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread demerphq
On 28 February 2018 at 15:55, Randall S. Becker  wrote:
> On February 28, 2018 2:49 AM, Peff wrote:
>> On Wed, Feb 28, 2018 at 07:42:51AM +, Eric Wong wrote:
>>
>> > > > >  a) We could override the meaning of die() in Git.pm.  This feels
>> > > > > ugly but if it works, it would be a very small patch.
>> > > >
>> > > > Unlikely to work since I think we use eval {} to trap exceptions
>> > > > from die.
>> > > >
>> > > > >  b) We could forbid use of die() and use some git_die() instead (but
>> > > > > with a better name) for our own error handling.
>> > > >
>> > > > Call sites may be dual-use: "die" can either be caught by an eval
>> > > > or used to show an error message to the user.
>> >
>> > 
>> >
>> > > > >  d) We could wrap each command in an eval {...} block to convert the
>> > > > > result from die() to exit 128.
>> > > >
>> > > > I prefer option d)
>> > >
>> > > FWIW, I agree with all of that. You can do (d) without an enclosing
>> > > eval block by just hooking the __DIE__ handler, like:
>> > >
>> > > $SIG{__DIE__} = sub {
>> > >   print STDERR "fatal: @_\n";
>> > >   exit 128;
>> > > };
>> >
>> > Looks like it has the same problems I pointed out with a) and b).
>>
>> You're right. I cut down my example too much and dropped the necessary
>> eval magic. Try this:
>>
>> -- >8 --
>> SIG{__DIE__} = sub {
>>   CORE::die @_ if $^S || !defined($^S);
>>   print STDERR "fatal: @_";
>>   exit 128;
>> };
>>
>> eval {
>>   die "inside eval";
>> };
>> print "eval status: $@" if $@;
>>
>> die "outside eval";
>> -- 8< --
>>
>> Running that should produce:
>>
>> $ perl foo.pl; echo $?
>> eval status: inside eval at foo.pl line 8.
>> fatal: outside eval at foo.pl line 12.
>> 128
>>
>> It may be getting a little too black-magic, though. Embedding in an eval is 
>> at
>> least straightforward, if a bit more invasive.
>
> I like this solution. The $64K question for me is how (a.k.a. where) to 
> instrument this broadly instead of in each perl fragment in the test suite.  
> The code:
>
> $SIG{__DIE__} = sub {
>   CORE::die @_ if $^S || !defined($^S);
>   print STDERR "fatal: @_";
>   exit 128;
> };
>
> eval {
>   die "inside eval";
> };
>
> print "eval status: $@" if $@;
>
> die "outside eval";
>
> as tested above, in NonStop results in an exit code of 128 whether run from a 
> script or from stdin (a good thing). I'm happy to do the heavy lifting on 
> this, but  a bit more direction as to the implementation would help.

I would look into putting it into a module and then using the PERL5OPT
environment var to have it loaded automagically in any of your perl
scripts.

For instance if you put that code into a module called Git/DieTrap.pm

then you could do:

PERL5OPT=-MGit::DieTrap

In your test setup code, assuming you have some. Then you don't need to
change any of your scripts, just the test runner framework.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread demerphq
On 28 February 2018 at 08:49, Jeff King  wrote:
> On Wed, Feb 28, 2018 at 07:42:51AM +, Eric Wong wrote:
>
>> > > >  a) We could override the meaning of die() in Git.pm.  This feels
>> > > > ugly but if it works, it would be a very small patch.
>> > >
>> > > Unlikely to work since I think we use eval {} to trap exceptions
>> > > from die.
>> > >
>> > > >  b) We could forbid use of die() and use some git_die() instead (but
>> > > > with a better name) for our own error handling.
>> > >
>> > > Call sites may be dual-use: "die" can either be caught by an
>> > > eval or used to show an error message to the user.
>>
>> 
>>
>> > > >  d) We could wrap each command in an eval {...} block to convert the
>> > > > result from die() to exit 128.
>> > >
>> > > I prefer option d)
>> >
>> > FWIW, I agree with all of that. You can do (d) without an enclosing eval
>> > block by just hooking the __DIE__ handler, like:
>> >
>> > $SIG{__DIE__} = sub {
>> >   print STDERR "fatal: @_\n";
>> >   exit 128;
>> > };
>>
>> Looks like it has the same problems I pointed out with a) and b).
>
> You're right. I cut down my example too much and dropped the necessary
> eval magic. Try this:
>
> -- >8 --
> SIG{__DIE__} = sub {
>   CORE::die @_ if $^S || !defined($^S);
>   print STDERR "fatal: @_";
>   exit 128;
> };

FWIW, this doesn't need to use CORE::die like that unless you have
code that overrides die() or CORE::GLOBAL::die, which would be pretty
unusual.

die() within $SIG{__DIE__} is special cased not to trigger $SIG{__DIE__} again.

Of course it doesn't hurt, but it might make a perl hacker do a double
take over why you are doing it. Maybe add a comment like

# using CORE::die to armor against overridden die()
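
A quick demonstration of that special-casing, i.e. that die() inside
the handler re-throws instead of recursing:

$SIG{__DIE__} = sub { print STDERR "handler saw: @_"; die @_ };
eval { die "boom\n" };        # handler runs once, then the eval catches it
print "caught: $@";
die "fatal\n";                # handler runs once, then perl dies as usual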

cheers,
Yves


Re: RFC v3: Another proposed hash function transition plan

2017-09-14 Thread demerphq
On 14 September 2017 at 17:23, Johannes Schindelin
 wrote:
> Hi Junio,
>
> On Thu, 14 Sep 2017, Junio C Hamano wrote:
>
>> Jonathan Nieder  writes:
>>
>> > In other words, a long lifetime for the hash absolutely is a design
>> > goal.  Coping well with an unexpectedly short lifetime for the hash is
>> > also a design goal.
>> >
>> > If the hash function lasts 10 years then I am happy.
>>
>> Absolutely.  When two functions have similar expected remaining life
>> and are equally widely supported, then faster is better than slower.
>> Otherwise our primary goal when picking the function from candidates
>> should be to optimize for its remaining life and wider availability.
>
> SHA-256 has been hammered on a lot more than SHA3-256.

Last year that was even more true of SHA1 than it is of SHA-256 today.

Anyway,
Yves
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Bug: git branch --unset-upstream command can nuke config when disk is full.

2017-09-13 Thread demerphq
On 13 September 2017 at 17:22, Jeff King <p...@peff.net> wrote:
> On Wed, Sep 13, 2017 at 05:18:56PM +0200, demerphq wrote:
>
>> > Hmph. That is very disturbing. But with that information I should be
>> > able to track down the culprit. Thanks for digging.
>>
>> FWIW, I see that git_config_set_multivar_in_file_gently() uses
>> write_in_full() which in turn uses xwrite(), but the latter has the
>> following comment on it:
>>
>> /*
>>  * xwrite() is the same a write(), but it automatically restarts write()
>>  * operations with a recoverable error (EAGAIN and EINTR). xwrite() DOES NOT
>>  * GUARANTEE that "len" bytes is written even if the operation is successful.
>>  */
>>
>> I suspect that at this point I am not adding much value here, so I
>> will leave it at this.
>
> No, the problem is in this line:
>
>  if (write_in_full(fd, contents + copy_begin,
>copy_end - copy_begin) <
>  copy_end - copy_begin)
>   goto write_err_out;
>
> write_in_full() returns -1 on error (_not_ how many bytes were actually
> written). So its return is a signed ssize_t. But the result of the
> pointer subtraction "copy_end - copy_begin" is an unsigned ptrdiff_t.
> The compiler promotes the signed to an unsigned, so the condition can
> never be true (the "-1" becomes the highest possible value).

Bah. Good eye. I missed that entirely.

> I have the fix, but I'm searching the code base for other instances of
> the same error.

Yeah, I think there are a few just in that file.

Thanks for fixing this!

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Bug: git branch --unset-upstream command can nuke config when disk is full.

2017-09-13 Thread demerphq
On 13 September 2017 at 16:51, Jeff King <p...@peff.net> wrote:
> On Wed, Sep 13, 2017 at 04:49:45PM +0200, demerphq wrote:
>
>> On 13 September 2017 at 16:17, Jeff King <p...@peff.net> wrote:
>> > You're welcome to read over the function to double-check, but I just
>> > looked it over and couldn't find any unchecked writes.
>>
>> I can look, but I doubt I would notice something you did not.
>>
>> On the other hand the strace output does show that this is a case
>> where the writes failed, but we still renamed the empty config.lock
>> file into place:
>>
>>
>> write(3, "[core]\n\tsharedRepository = true\n"..., 288) = -1 ENOSPC
>> (No space left on device)
>> write(3, "merge = refs/heads/yves/"..., 51) = -1 ENOSPC (No
>> space left on device)
>> munmap(0x7f48d9b8c000, 363) = 0
>> close(3)= 0
>> rename("/usr/local/git_tree/main/.git/config.lock",
>> "/usr/local/git_tree/main/.git/config") = 0
>
> Hmph. That is very disturbing. But with that information I should be
> able to track down the culprit. Thanks for digging.

FWIW, I see that git_config_set_multivar_in_file_gently() uses
write_in_full() which in turn uses xwrite(), but the latter has the
following comment on it:

/*
 * xwrite() is the same a write(), but it automatically restarts write()
 * operations with a recoverable error (EAGAIN and EINTR). xwrite() DOES NOT
 * GUARANTEE that "len" bytes is written even if the operation is successful.
 */

I suspect that at this point I am not adding much value here, so I
will leave it at this.

>> I freed up space and things worked, so I somehow doubt the filesystem
>> is at fault. When I then filled up the disk and retried, the error was
>> repeatable.
>
> Yeah, agreed. This really does look like a bug.

FWIW, where it bit me turned out to be harmless. So while no doubt
this could be a real PITA for someone, it wasn't for me.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Bug: git branch --unset-upstream command can nuke config when disk is full.

2017-09-13 Thread demerphq
On 13 September 2017 at 16:17, Jeff King  wrote:
> You're welcome to read over the function to double-check, but I just
> looked it over and couldn't find any unchecked writes.

I can look, but I doubt I would notice something you did not.

On the other hand the strace output does show that this is a case
where the writes failed, but we still renamed the empty config.lock
file into place:


write(3, "[core]\n\tsharedRepository = true\n"..., 288) = -1 ENOSPC
(No space left on device)
write(3, "merge = refs/heads/yves/"..., 51) = -1 ENOSPC (No
space left on device)
munmap(0x7f48d9b8c000, 363) = 0
close(3)= 0
rename("/usr/local/git_tree/main/.git/config.lock",
"/usr/local/git_tree/main/.git/config") = 0

Full strace is below:

>> > Given that your output is consistent with it failing to find the key,
>> > and that the result is an empty file, it sounds like somehow the mmap'd
>> > input appeared empty (but neither open nor fstat nor mmap returned an
>> > error). You're not on any kind of exotic filesystem, are you?
>>
>> I don't think so, but I don't know. Is there a command I can run to check?

I freed up space and things worked, so I somehow doubt the filesystem
is at fault. When I then filled up the disk and retried, the error was
repeatable.

>> BTW, with a bit of faffing I can probably recreate this problem.
>> Should I try? Is there something I could do during recreation that
>> would help?
>
> If you think you can reproduce, the output of "strace" on a failing
> invocation would be very interesting.

I can reproduce it; see below, preceded and suffixed by an ls of the .git/config file.

I have munged the branch name for privacy reasons; I hope that doesn't
invalidate the usefulness of the strace.

cheers,
yves

$ ls -la .git/config
-rw-rw-r-- 1 root users 363 Sep 13 16:36 .git/config
$ strace git branch --unset-upstream
execve("/usr/bin/git", ["git", "branch", "--unset-upstream"], [/* 39
vars */]) = 0
brk(0)  = 0x222c000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f48d9b94000
access("/etc/ld.so.preload", R_OK)  = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)  = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=50300, ...}) = 0
mmap(NULL, 50300, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f48d9b87000
close(3)= 0
open("/lib64/libpcre.so.0", O_RDONLY)   = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\25\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=183080, ...}) = 0
mmap(NULL, 2278264, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f48d9748000
mprotect(0x7f48d9774000, 2097152, PROT_NONE) = 0
mmap(0x7f48d9974000, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x2c000) = 0x7f48d9974000
close(3)= 0
open("/lib64/libz.so.1", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0
!\0\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=88600, ...}) = 0
mmap(NULL, 2183696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f48d9532000
mprotect(0x7f48d9547000, 2093056, PROT_NONE) = 0
mmap(0x7f48d9746000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x7f48d9746000
close(3)= 0
open("/lib64/libpthread.so.0", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\^\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=143280, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1,
0) = 0x7f48d9b86000
mmap(NULL, 2212848, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f48d9315000
mprotect(0x7f48d932c000, 2097152, PROT_NONE) = 0
mmap(0x7f48d952c000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x17000) = 0x7f48d952c000
mmap(0x7f48d952e000, 13296, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7f48d952e000
close(3)= 0
open("/lib64/librt.so.1", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\240!\0\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=44472, ...}) = 0
mmap(NULL, 2128816, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f48d910d000
mprotect(0x7f48d9114000, 2093056, PROT_NONE) = 0
mmap(0x7f48d9313000, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x6000) = 0x7f48d9313000
close(3)= 0
open("/lib64/libc.so.6", O_RDONLY)  = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\\356\1\0\0\0\0\0"...,
832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1924768, ...}) = 0
mmap(NULL, 3750184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3,
0) = 0x7f48d8d79000
mprotect(0x7f48d8f03000, 2097152, PROT_NONE) = 0
mmap(0x7f48d9103000, 24576, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 

Re: RFC v3: Another proposed hash function transition plan

2017-09-13 Thread demerphq
On 13 September 2017 at 14:05, Johannes Schindelin
 wrote:
> For example, I am still in favor of SHA-256 over SHA3-256, after learning
> some background details from in-house cryptographers: it provides
> essentially the same level of security, according to my sources, while
> hardware support seems to be coming to SHA-256 a lot sooner than to
> SHA3-256.

FWIW, and I know it is not worth much, as far as I can tell there is
at least some security/math basis to prefer SHA3-256 to SHA-256.

The SHA1 and SHA-256 hash functions (iirc along with their older
cousins MD5 and MD2) all have a common design feature where they mix a
relatively large block into a much smaller state *each block*. So
for instance SHA-256 mixes a 512 bit block into a 256 bit state with a
2:1 "leverage" between the block being read and the state. In SHA1
this was worse, mixing a 512 bit block into a 160 bit state, closer to
3:1 leverage.

SHA3 however uses a completely different design where it mixes a 1088
bit block into a 1600 bit state, for a leverage of 2:3, and the excess
is *preserved between each block*.
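
Putting the numbers from the last two paragraphs side by side (block =
bits absorbed per iteration, state = internal bits carried forward):

             block      state      leverage
  SHA1       512 bit    160 bit    3.2:1
  SHA-256    512 bit    256 bit    2:1
  SHA3-256   1088 bit   1600 bit   ~2:3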

Assuming everything else is equal between SHA-256 and SHA3, this
difference alone would seem to justify choosing SHA3 over SHA-256. We
know that there MUST be collisions when compressing a 512 bit block
into a 256 bit space; however, one cannot say the same about mixing
1088 bits into a 1600 bit state. The excess state which is not
directly modified by the input block makes a big difference when
reading the next block.

Of course in both cases we end up compressing the entire source
document down to the same number of bits; however, SHA3 does that
*once*, in finalization only, whereas SHA-256 does it on *every* block
read. So it seems to me that the opportunity for collisions is *much*
higher in SHA-256 than it is in SHA3-256. (Even if they should be
vanishingly rare regardless.)

For this reason, if I had a vote I would definitely vote for SHA3-256, or
even for SHA3-512. The latter has an impressive 1:2 leverage between
block and state, and much better theoretical security levels.

cheers,
Yves
Note: I am not a cryptographer, although I am probably pretty well
informed as far as hobby hash-function enthusiasts go.
-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Bug: git branch --unset-upstream command can nuke config when disk is full.

2017-09-13 Thread demerphq
On 13 September 2017 at 14:34, Jeff King <p...@peff.net> wrote:
> On Wed, Sep 13, 2017 at 01:59:17PM +0200, demerphq wrote:
>
>> After being away for a while I saw the following message in one of my git 
>> repos:
>>
>> $ git status
>> On branch yves/xxx
>> Your branch is based on 'origin/yves/xxx', but the upstream is gone.
>>   (use "git branch --unset-upstream" to fixup)
>>
>> nothing to commit, working tree clean
>> $ git branch --unset-upstream
>> fatal: could not unset 'branch.yves/simple_projection.merge'
>
> Hrm. I wonder what caused this failure. The error would be in
> git_config_set_multivar_in_file_gently(). Most errors there produce
> another error message before hitting the die(). In fact, the only case I
> see where it would not produce another message is if it found nothing to
> unset (but in that case, "branch" would never have called the function
> in the first place).

I just double checked the terminal history and this is all I saw:

$ git status
On branch yves/xxx
Your branch is based on 'origin/yves/xxx', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)

nothing to commit, working tree clean
$ git branch --unset-upstream
fatal: could not unset 'branch.yves/xxx.merge'
$ git status
On branch yves/xxx
nothing to commit, working tree clean
$ git fetch
fatal: No remote repository specified.  Please, specify either a URL or a
remote name from which new revisions should be fetched.

>> At this point my .git/config file was empty, and all of my config was lost.
>>
>> I assume that things that rewrite .git/config do not check for a
>> successful write before deleting the old version of the file.
>
> No, it writes the new content to "config.lock" and then renames it into
> place.
> All of the write() calls to the temporary file are checked.

I was going to say that perhaps the write was not checked... But if
you are confident they are then...

>The
> old data is copied over after having been ready by mmap (which is also
> error-checked).
>
> Given that your output is consistent with it failing to find the key,
> and that the result is an empty file, it sounds like somehow the mmap'd
> input appeared empty (but neither open nor fstat nor mmap returned an
> error). You're not on any kind of exotic filesystem, are you?

I don't think so, but I don't know. Is there a command I can run to check?

BTW, with a bit of faffing I can probably recreate this problem.
Should I try? Is there something I could do during recreation that
would help?

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Bug: git branch --unset-upstream command can nuke config when disk is full.

2017-09-13 Thread demerphq
After being away for a while I saw the following message in one of my git repos:

$ git status
On branch yves/xxx
Your branch is based on 'origin/yves/xxx', but the upstream is gone.
  (use "git branch --unset-upstream" to fixup)

nothing to commit, working tree clean
$ git branch --unset-upstream
fatal: could not unset 'branch.yves/simple_projection.merge'

At this point my .git/config file was empty, and all of my config was lost.

I assume that things that rewrite .git/config do not check for a
successful write before deleting the old version of the file.

This was git version 2.14.1

Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: your mail

2017-06-23 Thread demerphq
On 22 June 2017 at 23:58, Ævar Arnfjörð Bjarmason  wrote:

> +You don't need to be subscribed to the list to send mail to it, and
> +others on-list will generally CC you when replying (although some
> +forget this). It's adviced to subscribe to the list if you want to be

FWIW: "adviced" is misspelled; it should be "advised", and IMO it
feels like poor style to begin a sentence with a contraction. Not
strictly wrong, but sufficiently informal that I think it is out of
place in docs like this. Better to just say "It is", or even just "You
are", especially as you use "you" later in the sentence.

I actually think simplifying that sentence considerably is preferable:

"To be sure you receive all follow-up mails you should subscribe to the list."

flows better and is more succinct than

"It's advised to subscribe to the list if you want to be sure you're
not missing follow-up discussion".

> +sure you're not missing follow-up discussion, or if your interest in
> +the project is wider than a one-off bug report, question or patch.

cheers,
yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Unaligned accesses in sha1dc

2017-06-02 Thread demerphq
On 2 June 2017 at 22:14, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 10:11 PM, Martin Ågren  wrote:
>> On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
>>> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  
>>> wrote:
 On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
>> Martin Ågren  writes:
>>
>>> I looked into this some more. It turns out it is possible to trigger
>>> undefined behavior on "next". Here's what I did:
>>> ...
>>>
>>> This "fixes" the problem:
>>> ...
>>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>>> index 3dff80a..d6f4c44 100644
>>> --- a/sha1dc/sha1.c
>>> +++ b/sha1dc/sha1.c
>>> @@ -66,9 +66,9 @@
>>> ...
>>> With this diff, various tests which seem relevant for SHA-1 pass,
>>> including t0013, and the UBSan-error is gone. The second diff is just
>>> a monkey-patch. I have no reason to believe I will be able to come up
>>> with a proper and complete patch for sha1dc. And I guess such a thing
>>> would not really be Git's patch to carry, either. But at least Git
>>> could consider whether to keep relying on undefined behavior or not.
>>>
>>> There's a fair chance I've mangled the whitespace. I'm using gmail's
>>> web interface... Sorry about that.
>>
>> Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
>> that the final "fix" would come from his sha1collisiondetection
>> repository via Ævar.
>>
>> In the meantime, I am wondering if it makes sense to merge the
>> earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
>> SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
>> would at least unblock those on platforms v2.13.0 did not work
>> correctly at all.
>>
>> Ævar, thoughts?
>
> I think we're mixing up several things here, which need to be untangled:
>
> 1) The sha1dc works just fine on most platforms even with undefined
> behavior, as evidenced by 2.13.0 working.

 Right, with "platform" meaning "combination of hardware-architecture
 and compiler". Nothing can be said about how the current code behaves
 on "x86". Such statements can only be made with regard to "x86 and
 this or that compiler". Even then, short of studying the compiler
 implementation/documentation in detail, one cannot be certain that
 seemingly unrelated changes in Git don't make the code do something
 else entirely.
>>>
>>> I think you're veering into a theoretical discussion here that has
>>> little to no bearing on the practicalities involved here.
>>>
>>> Yes if something is undefined behavior in C the compiler &
>>> architecture is free to do anything they want with it. In practice
>>> lots of undefined behavior is de-facto standardized across various
>>> platforms.
>>>
>>> As far as I can tell unaligned access is one of those things. I don't
>>> think there's ever been an x86 chip / compiler that would run this
>>> code with any semantic differences when it comes to unaligned access,
>>> and such a chip / compiler is unlikely to ever exist.
>>>
>>> I'm not advocating that we rely on undefined behavior willy-nilly,
>>> just that we should consider the real situation is (i.e. what actual
>>> architectures / compilers are doing or are likely to do) as opposed to
>>> the purely theoretical (if you gave a bunch of aliens who'd never
>>> heard of our technology the ANSI C standard to implement from
>>> scratch).
>>
>> Yeah, that's an argument. I just thought I'd provide whatever input I
>> could, albeit in text form. The only thing that matters in the end is
>> that you (the Git project) feel that you make the correct decision,
>> possibly going beyond "theoretical" reasoning into engineering-land.
>
> I forgot to note, I think it would be very useful if you could submit
> that patch of yours in cleaned up form to the upstream sha1dc project:
> https://github.com/cr-marcstevens/sha1collisiondetection
>
> They might be interested in taking it, even if it's guarded by some
> macro "don't do unaligned access even on archs that seem OK with it".
>
> My comments are just focusing on this in the context of whether we
> should be hotfixing our copy due to an issue in the wild, like e.g.
> the SPARC issue.

A good way to get the sha1dc project properly tested on all platforms
would be to wrap it in a CPAN distribution and let CPAN Testers
test it for you on all the platforms under the sun.

In the Sereal project we found and fixed many portability issues with
the csnappy code simply because there are people testing modules in
the cpan world on every platform you can think of, and a few you might
be surprised to find out people still use.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Re: Unaligned accesses in sha1dc

2017-06-02 Thread demerphq
On 2 June 2017 at 21:32, Ævar Arnfjörð Bjarmason  wrote:
> On Fri, Jun 2, 2017 at 11:49 AM, Martin Ågren  wrote:
>> On 2 June 2017 at 10:51, Ævar Arnfjörð Bjarmason  wrote:
>>> On Fri, Jun 2, 2017 at 2:15 AM, Junio C Hamano  wrote:
 Martin Ågren  writes:

> I looked into this some more. It turns out it is possible to trigger
> undefined behavior on "next". Here's what I did:
> ...
>
> This "fixes" the problem:
> ...
> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
> index 3dff80a..d6f4c44 100644
> --- a/sha1dc/sha1.c
> +++ b/sha1dc/sha1.c
> @@ -66,9 +66,9 @@
> ...
> With this diff, various tests which seem relevant for SHA-1 pass,
> including t0013, and the UBSan-error is gone. The second diff is just
> a monkey-patch. I have no reason to believe I will be able to come up
> with a proper and complete patch for sha1dc. And I guess such a thing
> would not really be Git's patch to carry, either. But at least Git
> could consider whether to keep relying on undefined behavior or not.
>
> There's a fair chance I've mangled the whitespace. I'm using gmail's
> web interface... Sorry about that.

 Thanks.  I see Marc Stevens is CC'ed in the thread, so I'd expect
 that the final "fix" would come from his sha1collisiondetection
 repository via Ævar.

 In the meantime, I am wondering if it makes sense to merge the
 earlier update with #ifdef ALLOW_UNALIGNED_ACCESS and #ifdef
 SHA1DC_FORCE_LITTLEENDIAN for the v2.13.x maintenance track, which
 would at least unblock those on platforms v2.13.0 did not work
 correctly at all.

 Ævar, thoughts?
>>>
>>> I think we're mixing up several things here, which need to be untangled:
>>>
>>> 1) The sha1dc works just fine on most platforms even with undefined
>>> behavior, as evidenced by 2.13.0 working.
>>
>> Right, with "platform" meaning "combination of hardware-architecture
>> and compiler". Nothing can be said about how the current code behaves
>> on "x86". Such statements can only be made with regard to "x86 and
>> this or that compiler". Even then, short of studying the compiler
>> implementation/documentation in detail, one cannot be certain that
>> seemingly unrelated changes in Git don't make the code do something
>> else entirely.
>
> I think you're veering into a theoretical discussion here that has
> little to no bearing on the practicalities involved here.
>
> Yes if something is undefined behavior in C the compiler &
> architecture is free to do anything they want with it. In practice
> lots of undefined behavior is de-facto standardized across various
> platforms.
>
> As far as I can tell unaligned access is one of those things. I don't
> think there's ever been an x86 chip / compiler that would run this
> code with any semantic differences when it comes to unaligned access,
> and such a chip / compiler is unlikely to ever exist.
>
> I'm not advocating that we rely on undefined behavior willy-nilly,
> just that we should consider the real situation is (i.e. what actual
> architectures / compilers are doing or are likely to do) as opposed to
> the purely theoretical (if you gave a bunch of aliens who'd never
> heard of our technology the ANSI C standard to implement from
> scratch).
>
> Here's a performance test of your patch above against p3400-rebase.sh.
> I don't know how much these error bars from t/perf can be trusted.
> This is over 30 runs with -O3:
>
> - 3400.2: rebase on top of a lot of unrelated changes
>   v2.12.0 : 1.25(1.10+0.06)
>   v2.13.0 : 1.21(1.06+0.06) -3.2%
>   origin/next : 1.22(1.04+0.07) -2.4%
>   martin: 1.23(1.06+0.07) -1.6%
> - 3400.4: rebase a lot of unrelated changes without split-index
>   v2.12.0 : 6.49(3.60+0.52)
>   v2.13.0 : 8.21(4.18+0.55) +26.5%
>   origin/next : 8.27(4.34+0.64) +27.4%
>   martin: 8.80(4.36+0.62) +35.6%
> - 3400.6: rebase a lot of unrelated changes with split-index
>   v2.12.0 : 6.77(3.56+0.51)
>   v2.13.0 : 4.09(2.67+0.38) -39.6%
>   origin/next : 4.13(2.70+0.36) -39.0%
>   martin: 4.30(2.80+0.32) -36.5%
>
> And just your patch v.s. next:
>
> - 3400.2: rebase on top of a lot of unrelated changes
>   origin/next : 1.22(1.06+0.06)
>   martin  : 1.22(1.06+0.05) +0.0%
> - 3400.4: rebase a lot of unrelated changes without split-index
>   origin/next : 7.54(4.13+0.60)
>   martin  : 7.75(4.34+0.67) +2.8%
> - 3400.6: rebase a lot of unrelated changes with split-index
>   origin/next : 4.19(2.92+0.31)
>   martin  : 4.14(2.84+0.39) -1.2%
>
> It seems to be a bit slower, is that speedup worth the use of
> unaligned access? I genuinely don't know. I'm just interested to find
> what if anything we need to urgently fix in a release version of git.
>
> One data point there is that the fallback blk-sha1 implementation
> we've shipped 

Re: Git 2.13.0 segfaults on Solaris SPARC due to DC_SHA1=YesPlease being on by default

2017-06-01 Thread demerphq
On 16 May 2017 at 00:09, Jeff King  wrote:
> On Mon, May 15, 2017 at 04:13:58PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Mon, May 15, 2017 at 3:58 PM, Marc Stevens  wrote:
>> > Hi Aevar,
>> >
>> > Thank you for notifying us of this issue.
>> > Big endianness is a tricky issue, also since I don't have access or 
>> > accurate knowledge about all big endian systems.
>> > Our github repo does check correct functioning, including an endianness 
>> > mistake, with 'make test'.
>> > But I guess this is not included for SHA1DC in Git.
>> >
>> > Anyway, we can easily add the _BIG_ENDIAN macrotest to the git repo and 
>> > will do so soon.
>> >
>> > I don't think the segfault is caused by buffer overflow, inproper access, 
>> > or the endianness issue.
>> > But I did notice an unexpected issue: the message block pointer m=0x398ad5 
>> > is odd.
>> > Can you confirm whether loading an uint32_t from an odd address triggers a 
>> > hardware interrupt on your platform?
>> > This is not problem for x86, but maybe for your platform it is?
>> > If it is then we should always copy buffer contents to the sha1context to 
>> > avoid this issue.
>>
>> I don't have access to the box in question, Michael was testing this
>> code for me. But unaligned access is probably the cause, although
>> according to some info I found online that should give a SIGBUS not a
>> SIGSEGV, but that may have changed:
>
> Yeah, I would have expected SIGBUS there. If we have alignment issues,
> though, I'd expect that ARM systems will experience problems.
>
> Block-sha1 uses a macro which allows unaligned loads on platforms that
> support it, and otherwise does the endian conversion on the fly as we
> load the bytes into a local variable (which presumably happens all
> in-register). That may be faster than doing a mass copy of the buffer.

I agree. It is fairly normal to use that kind of macro and not do a
memcpy with hash functions.

In fact many hash functions ONLY use that kind of macro, as decent
compilers will automagically convert the macro into an unaligned load
on platforms that support fast unaligned loads.

The only reason to expose the direct unaligned load is to make sure
that the hashing code is fast on such platforms even when compiled
under -g.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: PCRE v2 compile error, was Re: What's cooking in git.git (May 2017, #01; Mon, 1)

2017-05-09 Thread demerphq
On 9 May 2017 at 13:12, Ævar Arnfjörð Bjarmason  wrote:
> On Tue, May 9, 2017 at 2:37 AM, brian m. carlson
>  wrote:
>> On Tue, May 09, 2017 at 02:00:18AM +0200, Ævar Arnfjörð Bjarmason wrote:
> * gitweb is vulnerable to CPU DoS now in its default configuration.
> It's easy to provide an ERE that ends up slurping up 100% CPU for
> several seconds on any non-trivial sized repo, do that in parallel &
> you have a DoS vector.

Does one need an ERE? Can't one do that now to many parts of git just
with a glob?

Yves


Re: Feature request: --format=json

2017-04-18 Thread demerphq
On 18 April 2017 at 10:44, Fred .Flintstone  wrote:
> Well the easiest way to work with that would be JSON.
> So the best would be if Git could output the data I want in JSON format.
> Then it would be easy for me to work with data.
>
> With git rev-list and git-cat file, its not so easy to reliably parse
> that output.

Doesn't seem too hard to work with rev-list to me. As far as I can
tell the following produces what you want. You need perl installed,
obviously, and the JSON::PP module is required, but that should come
bundled with recent perls.

git rev-list master --pretty=raw | perl -MJSON::PP=encode_json
-ane'if(/^(\w+) (.*)/) { if ($1 eq "commit") { push @objs, $o if $o;
$o={}; } $o->{$1} = $2; } else { $o->{text} .= $_; } END{ push @objs,
$o if $o; for $o (@objs) { s/^    //mg, s/^\n// for $o->{text};
($o->{message},$o->{long_message})= split /\n\n/, delete $o->{text}; }
print JSON::PP->new->pretty->encode(\@objs);}'
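
For readability, here is the same logic unpacked into a script (a
sketch under the same assumptions: a perl with the bundled JSON::PP,
fed the output of "git rev-list master --pretty=raw" on stdin; the
split limit of 2 additionally keeps message paragraphs beyond the
second, which the one-liner drops):

#!/usr/bin/perl
use strict;
use warnings;
use JSON::PP;

my (@objs, $o);
while (<STDIN>) {
    if (/^(\w+) (.*)/) {
        if ($1 eq "commit") {      # header line starting a new record
            push @objs, $o if $o;
            $o = {};
        }
        # commit/tree/parent/author/committer; note that for merge
        # commits only the last "parent" line is kept, as in the
        # one-liner above
        $o->{$1} = $2;
    } else {
        $o->{text} .= $_;          # 4-space-indented message body
    }
}
push @objs, $o if $o;
for $o (@objs) {
    $o->{text} //= '';
    s/^    //mg, s/^\n// for $o->{text};   # strip indent, leading blank
    ($o->{message}, $o->{long_message}) =
        split /\n\n/, delete $o->{text}, 2;
}
print JSON::PP->new->pretty->encode(\@objs);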

You might consider a different line of argument than stating that
working with JSON is "the easiest", especially to people who clearly
are making do without it. :-)

A better argument might be that exposing data through a well defined
and widely used and simple data format would trivially expand the
range of projects that might interoperate with git or enhance the git
ecosystem. For instance you could argue that having clean JSON output
would make it easier to integrate into search engines and other
indexing tools that already know how to speak JSON. Maybe a regular
contributor on this list might agree with your arguments and make it
happen.

Until then you can parse rev-list like the rest of us. :-)

cheers,
Yves


Re: Will OpenSSL's license change impact us?

2017-03-25 Thread demerphq
On 25 March 2017 at 17:35, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote:
> On Sat, Mar 25, 2017 at 10:43 AM, demerphq <demer...@gmail.com> wrote:
>>
>>
>> On 25 Mar 2017 10:18 a.m., "Ævar Arnfjörð Bjarmason" <ava...@gmail.com>
>> wrote:
>>
>> On Sat, Mar 25, 2017 at 9:40 AM, demerphq <demer...@gmail.com> wrote:
>>> On 25 March 2017 at 00:51, Ævar Arnfjörð Bjarmason <ava...@gmail.com>
>>> wrote:
>>>> They're changing their license[1] to Apache 2 which unlike the current
>>>> fuzzy compatibility with the current license[2] is explicitly
>>>> incompatible with GPLv2[3].
>>>
>>> Are you sure there is an issue? From the Apache page on this:
>>>
>>> Apache 2 software can therefore be included in GPLv3 projects, because
>>> the GPLv3 license accepts our software into GPLv3 works. However,
>>> GPLv3 software cannot be included in Apache projects. The licenses are
>>> incompatible in one direction only, and it is a result of ASF's
>>> licensing philosophy and the GPLv3 authors' interpretation of
>>> copyright law.
>>>
>>> Which seems to be the opposite of the concern you are expressing.
>>
>> The Apache 2 license is indeed compatible with the GPLv3, but the Git
>> project explicitly uses GPLv2 with no "or later" clause
>>
>>
>> Read the paragraph immediately (I think) after the one I quoted where they
>> state the situation is the same with GPL v2.
>
> My understanding of that paragraph is that it's still laying out
> caveats about exactly how GPLv3 is compatible with Apache 2, when it
> is, when it isn't etc. But then it goes on to say:
>
> """
> Despite our best efforts, the FSF has never considered the Apache
> License to be compatible with GPL version 2, citing the patent
> termination and indemnification provisions as restrictions not present
> in the older GPL license. The Apache Software Foundation believes that
> you should always try to obey the constraints expressed by the
> copyright holder when redistributing their work.
> """
>
> So they're just deferring to the FSF saying it's incompatible, the
> FSF's statement:
> https://www.gnu.org/licenses/license-list.html#apache2 "this license
> is not compatible with GPL version 2".
>
> Anyway, I'm not a lawyer. Just thought I'd send some E-Mail about this
> since I noticed it, if it's an issue (and we could e.g. get the SFC to
> comment, Jeff?) we might need to add e.g. some checks / macros to
> ensure we're not compiling against an incompatible OpenSSL.

Just for the record, this is what Apache says, with the part I was
referring to earlier in slash-style italics, and a couple of key
points in star-style bold:

quote
Apache 2 software *can therefore be included in GPLv3 projects*,
because the GPLv3 license accepts our software into GPLv3 works.
However, GPLv3 software cannot be included in Apache projects. *The
licenses are incompatible in one direction only*, and it is a result
of ASF's licensing philosophy and the GPLv3 authors' interpretation of
copyright law.

This licensing incompatibility applies only when some Apache project
software becomes a derivative work of some GPLv3 software, because
then the Apache software would have to be distributed under GPLv3.
This would be incompatible with ASF's requirement that all Apache
software must be distributed under the Apache License 2.0.

We avoid GPLv3 software because merely linking to it is considered by
the GPLv3 authors to create a derivative work. We want to honor their
license. Unless GPLv3 licensors relax this interpretation of their own
license regarding linking, our licensing philosophies are
fundamentally incompatible. /This is an identical issue for both GPLv2
and GPLv3./
quote

I read that as saying that you can use Apache 2 code in GPL projects,
but you can't use GPL code in Apache projects. Which makes sense as
Apache 2 is more liberal than GPL.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: Will OpenSSL's license change impact us?

2017-03-25 Thread demerphq
On 25 March 2017 at 00:51, Ævar Arnfjörð Bjarmason  wrote:
> They're changing their license[1] to Apache 2 which unlike the current
> fuzzy compatibility with the current license[2] is explicitly
> incompatible with GPLv2[3].

Are you sure there is an issue? From the Apache page on this:

Apache 2 software can therefore be included in GPLv3 projects, because
the GPLv3 license accepts our software into GPLv3 works. However,
GPLv3 software cannot be included in Apache projects. The licenses are
incompatible in one direction only, and it is a result of ASF's
licensing philosophy and the GPLv3 authors' interpretation of
copyright law.

Which seems to be the opposite of the concern you are expressing.

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: send-email garbled header with trailing doublequote in email

2016-11-03 Thread demerphq
On 3 November 2016 at 15:18, Jeff King  wrote:
> On Wed, Nov 02, 2016 at 11:29:01PM +0100, Andrea Arcangeli wrote:
>
>> So this must be postfix then that out of the blue decided to garble it
>> in a strange way while parsing the input... The removal of all
>> whitespaces s/what ever/whatever/ especially I've no idea how it
>> decided to do so.
>>
>> Can you reproduce with postfix as sendmail at least? If you can
>> reproduce also see what happens if you add another --to.
>
> Yes, I can easily reproduce without using git at all by installing
> postfix in local-delivery mode and running:
>
> sendmail p...@sigill.intra.peff.net <<\EOF
> From: Jeff King 
> To: "what ever" " 
> Subject: patch
>
> This is the body
> EOF
>
> Many MTAs do this kind of header-rewriting. I don't necessarily agree
> with it as a general concept, but the real problem is the syntactically
> bogus header. The munging that postfix does makes things worse, but I
> can see why it is confused and does what it does (the whole email is
> inside a double-quoted portion that is never closed, so it probably
> thinks there is no hostname portion at all).
>
> So git is possibly at fault for passing along a bogus address. OTOH, the
> user is perhaps at fault for providing the bogus address to git in the
> first place. GIGO. :)
>
> I think if any change were to be made, it would be to recognize this
> bogosity and either clean it up or abort. That ideally would happen via
> Mail::Address so git does not have to add a bunch of ad-hoc "is this
> valid rfc822" checks. Reading the manpage for that module, though, it
> says:
>
>   [we do not handle all of rfc2822]
>   Often requests are made to the maintainers of this code improve this
>   situation, but this is not a good idea, where it will break zillions
>   of existing applications.  If you wish for a fully RFC2822 compliant
>   implementation you may take a look at Mail::Message::Field::Full, part
>   of MailBox.
>
> So it's possible that switching to a more robust module would improve
> things.

There is an RFC2822 compliant email address validator in the perl test
suite if you guys want to use it.  We use it to test recursive
patterns.

http://perl5.git.perl.org/perl.git/blob/HEAD:/t/re/reg_email.t
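
To give a flavour of what that test exercises, here is a minimal
sketch of the recursive-pattern technique (this is NOT the full RFC
2822 grammar from that file, just the nesting idea; RFC 2822 comments
may nest, so the pattern must call itself):

use strict;
use warnings;

# (?(DEFINE)...) declares a named pattern; (?&comment) recurses into
# it, which is what lets nested parens match to arbitrary depth.
my $re = qr{
    (?(DEFINE)
        (?<comment> \( (?: [^()]+ | (?&comment) )* \) )
    )
    ^ [\w.]+ (?&comment)? \@ [\w.]+ $
}x;

for my $addr ('user@example.com', 'user(a(nested)comment)@example.com') {
    print "$addr: ", ($addr =~ $re ? "matches" : "no match"), "\n";
}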

Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: git add without whitespace

2016-05-31 Thread demerphq
On 30 May 2016 at 21:06, Junio C Hamano  wrote:
> Robert Dailey  writes:
>
>> $ git diff -U0 -w --no-color | git apply --cached --ignore-whitespace
>> --unidiff-zero
>>
>> This command explicitly leaves out context because it can sometimes
>> cause the patch to fail to apply, I think due to whitespace being in
>> it, but I'm not completely sure myself.
>
> I have had this in my ~/.gitconfig for a long time.
>
> [alias]
> wsadd = "!sh -c 'git diff -- \"$@\" | git apply --cached 
> --whitespace=fix;\
> git co -- ${1-.} \"$@\"' -"
>
> That is, "take what's different from the _index_ and the working
> tree, apply that difference while correcting whitespace errors to
> the index, and check the result out to the working tree".  This
> would _not_ touch existing whitespace-damaged lines that you are not
> touching, and honours the customized definition of what is
> considered whitespace breakage for each paths (which you set up with
> the attributes system).


That is very very cool. I have a perl script that does the same thing
from git-blame output. This is MUCH nicer.

cheers,
Yves


-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: t6044 broken on pu

2016-05-09 Thread demerphq
On 8 May 2016 at 20:20, Junio C Hamano  wrote:
> Torsten Bögershausen  writes:
>
>> May a  simple
>>  printf "1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n"
>>
>> be an option ?
>
> If you were to do that, at least have the decency to make it more
> readable by doing something like:
>
> printf "%s\n" 1 2 3 4 5 6 7 8 9 10
>
> ;-)
>
> But as I said, as a response to "t6044 broken on pu" bug report,
> s/seq/test_seq/ is the only sensible change.
>
> Improving "test_seq, the alternative to seq" is a separate topic.
>
> If you have aversion to $PERL, perhaps do them without using
> anything what is not expected to be built-in in modern shells,
> perhaps like this?
>
>  t/test-lib-functions.sh | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
> index 8d99eb3..4edddac 100644
> --- a/t/test-lib-functions.sh
> +++ b/t/test-lib-functions.sh
> @@ -739,7 +739,12 @@ test_seq () {
> 2)  ;;
> *)  error "bug in the test script: not 1 or 2 parameters to 
> test_seq" ;;
> esac
> -   perl -le 'print for $ARGV[0]..$ARGV[1]' -- "$@"
> +   test_seq_counter__=$1
> +   while test "$test_seq_counter__" -le $2
> +   do
> +   echo "$test_seq_counter__"
> +   test_seq_counter__=$((test_seq_counter__ + 1))
> +   done
>  }

Is that perl snippet ever called with non-numeric arguments?

perl -le 'print for $ARGV[0]..$ARGV[1]' -- A E
A
B
C
D
E

Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: best practices against long git rebase times?

2015-12-04 Thread demerphq
On 4 December 2015 at 18:28, John Keeping <j...@keeping.me.uk> wrote:
> On Fri, Dec 04, 2015 at 06:09:33PM +0100, demerphq wrote:
>> On 4 December 2015 at 16:05, Andreas Krey <a.k...@gmx.de> wrote:
>> > Hi all,
>> >
>> > our workflow is pretty rebase-free for diverse reasons yet.
>> >
>> > One obstacle now appearing is that rebases simply take
>> > very long - once you might want to do a rebase there are
>> > several hundred commits on the remote branch, and our tree
>> > isn't small either.
>> >
>> > This produces rebase times in the minute range.
>> > I suppose this is because rebase tries to see
>> > if there are new commits in the destination
>> > branch that are identical to one of the local
>> > commits, to be able to skip them. (I didn't
>> > try to verify this hypothesis.)
>> >
>> > What can we do to make this faster?
>>
>> I bet you have a lot of refs: tags or branches.
>>
>> git rebase performance along with many operations seems to scale
>> proportionately to the number of tags.
>>
>> At $work we create a tag every time we "roll out" a "server type".
>>
>> This produces many tags a day.
>>
>> Over time rebase, and many operations actually, start slowing down to
>> the point of painfulness.
>>
>> The workaround we ended up using was to set up a cron job and related
>> infra that removed old tags.
>>
>> Once we got rid of most of our old tags git became nice to use again.
>
> This is quite surprising.  Were you using packed or loose tags?

It didn't matter.

> It would be interesting to run git-rebase with GIT_TRACE_PERFORMANCE to
> see which subcommand is slow in this particular scenario.

These days it isn't that slow :-)

But I cc'ed Avar, he did the work on that, all I know is when he
finished the tag remover I stopped cursing every time I rebased.

I believe I remember him saying that you can reproduce it using a
public repo by taking the linux repo and creating a tag every 10
commits or so. Once you are done, many git operations will be nice
and slow!
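
Something along these lines should reproduce it (an untested sketch;
run it only inside a throwaway clone):

use strict;
use warnings;

# Tag roughly every 10th commit of the current branch's history.
chomp(my @commits = `git rev-list HEAD`);
my $made = 0;
for my $i (0 .. $#commits) {
    next if $i % 10;
    system("git", "tag", "synthetic-$i", $commits[$i]) == 0
        or die "git tag failed: $?";
    $made++;
}
print "created $made tags\n";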

In all fairness however, I do believe that some of the recent changes
to git helped, but I don't know how much or which. What I do know is we
still have the cron sweeper process cleaning refs. (It broke one of my
repos that I set up with --reference just the other day).

Yves




-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: best practices against long git rebase times?

2015-12-04 Thread demerphq
On 4 December 2015 at 16:05, Andreas Krey  wrote:
> Hi all,
>
> our workflow is pretty rebase-free for diverse reasons yet.
>
> One obstacle now appearing is that rebases simply take
> very long - once you might want to do a rebase there are
> several hundred commits on the remote branch, and our tree
> isn't small either.
>
> This produces rebase times in the minute range.
> I suppose this is because rebase tries to see
> if there are new commits in the destination
> branch that are identical to one of the local
> commits, to be able to skip them. (I didn't
> try to verify this hypothesis.)
>
> What can we do to make this faster?

I bet you have a lot of refs: tags or branches.

git rebase performance along with many operations seems to scale
proportionately to the number of tags.

At $work we create a tag every time we "roll out" a "server type".

This produces many tags a day.

Over time rebase, and many operations actually, start slowing down to
the point of painfulness.

The workaround we ended up using was to set up a cron job and related
infra that removed old tags.

Once we got rid of most of our old tags git became nice to use again.

Try making a clone, nuking all the refs in it, and then time rebase and friends.

Yves





-- 
perl -Mre=debug -e "/just|another|perl|hacker/"


Re: XDL_FAST_HASH can be very slow

2014-12-22 Thread demerphq
(Sorry for the repost; I use gmail and it sends HTML mail by default.)
On 22 December 2014 at 11:48, Thomas Rast t...@thomasrast.ch wrote:

 1. hash function throughput
 2. quality of the hash values
 3. avoiding collision attacks

 XDL_FAST_HASH was strictly an attempt to improve throughput, and fairly
 successful at that (6942efc (xdiff: load full words in the inner loop of
 xdl_hash_record, 2012-04-06) quotes an 8% improvement on 'git log -p').

 You are now addressing quality.

 I have no idea how you ran into this, but if you are reworking things
 already, I think it would be good to also randomize whatever hash you
 put in so as to give some measure of protection against collision
 attacks.

I assume you mean DJB2 when you say DJB, and if so I will just note
that it is a pretty terrible hash function for arbitrary data. (I
understand it does better with raw text.) It passes neither the
strict-avalanche-criteria[1] nor the
bit-independence-criteria[2]. I have images which show how badly DJB2
fails these tests if anyone is interested.

Murmur3 is better, in that it does pass SAC and BIC, but before you
decide to use Murmur3 you should review https://131002.net/siphash/ and
related resources, which demonstrate multi-collision attacks on Murmur3
that are independent of the seed chosen. The paper also introduces a
new hash function with good performance properties, and claims that it
has cryptographic strength. (I say "claims" because I am not qualified
to judge whether it does.) E.g.:
https://131002.net/siphash/murmur3collisions-20120827.tar.gz

I think if you want performance and robustness against collision
attacks, SipHash is a good candidate, as is perhaps the AES-derived
hash used by the Go folks, but the performance of that algorithm is
strongly dependent on the CPU supporting AES primitives.

Anyway, the point is that simply adding a random seed to a hash
function like DJB2 or Murmur3 is not sufficient to prevent collision
attacks.
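
For concreteness, here is DJB2 itself, rendered in Perl with a seed
parameter bolted on; the h = h*33 + c loop is the entire algorithm,
which is why seeding it shifts the starting state but leaves the
structure (and hence the weaknesses above) intact:

use strict;
use warnings;

sub djb2 {
    my ($data, $seed) = @_;
    my $h = $seed // 5381;                  # 5381 is the classic start value
    for my $c (unpack 'C*', $data) {
        $h = ($h * 33 + $c) & 0xFFFF_FFFF;  # h = h*33 + c, kept to 32 bits
    }
    return $h;
}

printf "%08x\n", djb2("just another perl hacker");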

Yves
[1] A change in a single bit of the seed or the key should result in
50% of the output bits of the hash changing.
[2] output bits j and k should change independently when any single
input bit i is inverted, for all i, j and k.


Re: [GSoC14][RFC] Is there any interest in adding a port of checkpatch.pl to contrib/?

2014-03-19 Thread demerphq
On 18 March 2014 02:38, Jacopo Notarstefano
jacopo.notarstef...@gmail.com wrote:
 3. As far as I can tell, checkpatch needs to be run from the root
 folder of a linux repository clone. Cloning several hundred MBs for a
 single perl script looks a little foolish to me.

If that is your worry maybe you should upload the script to CPAN.

Yves


-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: question about: Facebook makes Mercurial faster than Git

2014-03-10 Thread demerphq
On 10 March 2014 11:07, Dennis Luehring dl.so...@gmx.net wrote:
 according to these blog posts

 http://www.infoq.com/news/2014/01/facebook-scaling-hg
 https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/

 mercurial can be faster then git

 but i don't found any reply from the git community if it is a real problem
 or if there a ongoing (maybe git 2.0) changes to compete better in this case

They mailed the list about performance issues in git. From what I saw
there was relatively little feedback.

I had the impression, and I would not be surprised if they had it
too, that the git development community is relatively
unconcerned about performance issues on larger repositories.

There have been other reports, which are difficult to keep track of
without a bug tracking system, but the ones I know of are:

Poor performance of git status with large number of excluded files and
large repositories.
Poor performance, and breakage, on repositories with very large
numbers of files in them. (Rebase for instance will break if you
rebase a commit that contains a *lot* of files.)
Poor performance in protocol layer (and other places) with repos with
large numbers of refs. (Maybe this is fixed, not sure.)

cheers,
Yves




-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: Confusing git log --- First time bug submission please advise on best practices

2014-02-07 Thread demerphq
On 7 February 2014 18:26, Duy Nguyen pclo...@gmail.com wrote:
 On Fri, Feb 07, 2014 at 09:43:46AM +, Francis Stephens wrote:
 Thanks for your clear response. I can see where I went wrong now.

 Maybe something like this would help avoid confusion a bit in the
 future? This toy patch puts a horizontal line as a break between two
 commits if they are not related, so we can clearly see linear commit
 segments.

FWIW, this would have saved a lot of head scratching at work over the years.

I'd love to see this in place.

Yves


-- 
perl -Mre=debug -e /just|another|perl|hacker/


Rebase triggers git diff header lacks filename information on very large patch with binary files

2014-01-14 Thread demerphq
Hi,

I just did a rebase, and it throws an error like this:

Applying: comment1
Applying: comment2
Applying: comment3
Applying: comment4
Applying: patch_with_binary_files
fatal: git diff header lacks filename information when removing 1
leading pathname component (line 7330213)
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0005 patch_with_binary_files
The copy of the patch that failed is found in:
   /usr/local/git_tree/affiliate_data/.git/rebase-apply/patch

When you have resolved this problem, run git rebase --continue.
If you prefer to skip this patch, run git rebase --skip instead.
To check out the original branch and stop rebasing, run git rebase --abort.

The patch is very large, 882453899 bytes.

The patch also includes many binary files.

Extracting the content around and before line 7330213 and up to the
next diff header in the patch I see this:

perl -lne'print "$.\t$_" if 7330169 .. 7330213' .git/rebase-apply/patch
7330169 diff --git a/dir1/dir2/file.png b/dir1/dir2/file.png
7330170 new file mode 100644
7330171 index 
..8a3219cb6545f23e3f7c61f058d82fc2c1bd9aac
7330172 GIT binary patch
7330173 literal 11301
7330174 zcmXYX1ymeO)Ai!+PH-nk@Zb{MHE3{$;O=gVh2Rd06MPp4?hxD|K!5jEd=*(p7;Ov
[more lines of binary removed]
7330213 zznckDs-GVJg-A0uD|ONvCQWVX;j!JNnkQI9^=+zJ^SvLe1p-~c7bmY5wu4C=(8F0
[more lines of binary removed]
7330393 literal 0
7330394 HcmV?d1
7330395
7330396 diff --git a/dir1/dir2/file.css b/dir1/dir2/file.css
7330397 new file mode 100644
7330398 index 
..75c8afc558424ea185c62b5a1c61ad6c32cddc21

I have munged the filenames.

It looks to me like git can't apply patches over a certain size.

Any suggestions on how to proceed here?

cheers,
Yves




-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: Rebase triggers git diff header lacks filename information on very large patch with binary files

2014-01-14 Thread demerphq
On 14 January 2014 12:48, demerphq demer...@gmail.com wrote:
 Hi,

 I just did a rebase, and it throws an error like this:

 Applying: comment1
 Applying: comment2
 Applying: comment3
 Applying: comment4
 Applying: patch_with_binary_files
 fatal: git diff header lacks filename information when removing 1
 leading pathname component (line 7330213)
 Repository lacks necessary blobs to fall back on 3-way merge.
 Cannot fall back to three-way merge.
 Patch failed at 0005 patch_with_binary_files
 The copy of the patch that failed is found in:
/usr/local/git_tree/affiliate_data/.git/rebase-apply/patch

 When you have resolved this problem, run git rebase --continue.
 If you prefer to skip this patch, run git rebase --skip instead.
 To check out the original branch and stop rebasing, run git rebase --abort.

 The patch is very large, 882453899 bytes.

 The patch also includes many binary files.

 Extracting the content around and before line 7330213 and up to the
 next diff header in the patch I see this:

 perl -lne'print "$.\t$_" if 7330169 .. 7330213' .git/rebase-apply/patch
 7330169 diff --git a/dir1/dir2/file.png b/dir1/dir2/file.png
 7330170 new file mode 100644
 7330171 index 
 ..8a3219cb6545f23e3f7c61f058d82fc2c1bd9aac
 7330172 GIT binary patch
 7330173 literal 11301
 7330174 zcmXYX1ymeO)Ai!+PH-nk@Zb{MHE3{$;O=gVh2Rd06MPp4?hxD|K!5jEd=*(p7;Ov
 [more lines of binary removed]
 7330213 zznckDs-GVJg-A0uD|ONvCQWVX;j!JNnkQI9^=+zJ^SvLe1p-~c7bmY5wu4C=(8F0
 [more lines of binary removed]
 7330393 literal 0
 7330394 HcmV?d1
 7330395
 7330396 diff --git a/dir1/dir2/file.css b/dir1/dir2/file.css
 7330397 new file mode 100644
 7330398 index 
 ..75c8afc558424ea185c62b5a1c61ad6c32cddc21

 I have munged the filenames.

 It looks to me like git can't apply patches over a certain size.

 Any suggestions on how to proceed here?

I aborted and then did a merge instead and it seemed to work out.

Still, seems like something git should detect BEFORE it tries to do the rebase.

cheers,
Yves



-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: [Administrivia] On ruby and contrib/

2013-06-06 Thread demerphq
On 5 June 2013 16:45, Felipe Contreras felipe.contre...@gmail.com wrote:
 On Tue, Jun 4, 2013 at 7:04 PM, Junio C Hamano gits...@pobox.com wrote:
 That might make sense for the shorter term, but in longer term I see
 Perl as declining in favor of other languages. It's only a matter of
 time before Ruby surpasses Perl in popularity, and soon enough new
 contributors to the Git project will have problems trying to improve
 Git because parts of it are written in a language they are not
 familiar with, and have trouble learning (isn't that already
 happening?).

 The Ruby vs. Python is another question altogether, I could go into
 detail about why I think Ruby is a better choice, but my point right
 now is that Perl is not a good choice for the future.

Good thing you are being objective and leaving out the Python 3.0
mess, the long legacy of backwards compatibility in the Perl
community, the active community behind it, its extensive portability
support, and failing to mention the lack of an equivalent to CPAN. We
wouldn't want facts to get in the way of a personal bias, would we?

Just thought I'd push back on the FUD. People have been saying Perl is
going away for decades...

Yves


Re: inotify to minimize stat() calls

2013-02-10 Thread demerphq
On 10 February 2013 12:17, Duy Nguyen pclo...@gmail.com wrote:
 Bear in mind though this is Linux, where lstat is fast. On systems
 with slow lstat, these timings could look very different due to the
 large number of lstat calls compared to open+getdents. I really like
 to see similar numbers on Windows.

Is windows stat really so slow? I encountered this perception in
windows Perl in the past, and I know that on windows Perl stat
*appears* slow compared to *nix, because in order to satisfy the full
*nix stat interface, specifically the nlink field, it must open and
close the file*. As of 5.10 this can be disabled by setting a magic
var ${^WIN32_SLOPPY_STAT} to a true value, which makes a significant
improvement to the performance of the Perl level stat implementation.
I would not be surprised if the cygwin implementation of stat() has
the same issue as Perl did, and that stat appears much slower than it
actually need be if you don't care about the nlink field.
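
For anyone who wants to try it, the switch looks like this (a sketch;
the variable exists from perl 5.10 on and only changes behaviour on
Windows):

use strict;
use warnings;

# With sloppy stat enabled, Windows perls skip the open/close that is
# otherwise needed just to fill in the nlink field of the result.
${^WIN32_SLOPPY_STAT} = 1;        # harmless no-op on other platforms

my @st = stat("some_file.txt");   # hypothetical path
printf "size=%s nlink=%s\n", $st[7] // "?", $st[3] // "?";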

Yves
* http://perl5.git.perl.org/perl.git/blob/HEAD:/win32/win32.c#l1492

-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: CodingGuidelines Perl amendment

2013-02-06 Thread demerphq
On 6 February 2013 17:29, Junio C Hamano gits...@pobox.com wrote:
 Ted Zlatanov t...@lifelogs.com writes:

  - As in C (see above), we avoid using braces unnecessarily (but Perl
forces braces around if/unless/else/foreach blocks, so this is not
always possible).

 Is it ever (as opposed to not always) possible to omit braces?

Only in a statement modifier.

 It sounds as if we encourage the use of statement modifiers, which
 certainly is not what I want to see.

As you mention below statement modifiers have their place. For instance

  next if $whatever;

Is considered preferable to

if ($whatever) {
  next;
}

Similarly

open my $fh, "<", $filename
   or die "Failed to open '$filename': $!";

Is considered preferable by most Perl programmers to:

my $fh;
if ( not open $fh, "<", $filename ) {
  die "Failed to open '$filename': $!";
}

 You probably would want to mention that opening braces for
 if/else/elsif do not sit on their own line,
 and closing braces for
 them will be followed the next else/elseif on the same line
 instead, but that is part of most of the C guidelines above apply
 so it may be redundant.

  - Don't abuse statement modifiers (unless $youmust).

 It does not make a useful guidance to leave $youmust part
 unspecified.

 Incidentally, your sentence is a good example of where use of
 statement modifiers is appropriate: $youmust is rarely true.

unless often leads to maintenance errors as the expression gets more
complicated over time, more branches need to be added to the
statement, etc. Basically people are bad at doing De Morgan's law in
their head.
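
A typical maintenance sequence showing the slip:

unless ($ready) { retry(); }              # fine while it stays this simple

# then a second condition gets bolted on...
unless ($ready && !$failed) { retry(); }  # quick: when does this retry?

# ...where the equivalent if is easy to verify at a glance:
if (!$ready || $failed) { retry(); }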

 In general:

 ... do something ...
 do_this() unless (condition);
 ... do something else ...

 is easier to follow the flow of the logic than

 ... do something ...
 unless (condition) {
 do_this();
 }
 ... do something else ...

 *only* when condition is extremely rare, iow, when do_this() is
 expected to be almost always called.

if (not $condition) {
  do_this();
}

Is much less error prone in terms of maintenance than

unless ($condition) {
  do_this();
}

Similarly

do_this() if not $condition;

leads to fewer maintenance errors than

do_this() unless $condition;

So if your objective is maintainability I would just ban unless outright.

Cheers,
Yves

-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: CodingGuidelines Perl amendment

2013-02-06 Thread demerphq
On 6 February 2013 19:14, Junio C Hamano gits...@pobox.com wrote:
 demerphq demer...@gmail.com writes:

 As you mention below statement modifiers have their place. For instance

   next if $whatever;

 Is considered preferable to

 if ($whatever) {
   next;
 }

 Similarly

 open my $fh, "<", $filename
or die "Failed to open '$filename': $!";

 Is considered preferable by most Perl programmers to:

 my $fh;
 if ( not open $fh, "<", $filename ) {
   die "Failed to open '$filename': $!";
 }

 Yeah, and that is for the same reason.  When you are trying to get a
 birds-eye view of the codeflow, the former makes it clear that we
 do something, and then we open, and then we ..., without letting
 the error handling (which also is rare case) distract us.

perldoc perlstyle has language which explains this well if you want to
crib a description from somewhere.

 unless often leads to maintenance errors as the expression gets more
 complicated over time,...

 That might also be true, but my comment was not an endorsement for
 (or suggestion against) use of unless.  I was commenting on
 statement modifiers, which some people tend to overuse (or abuse)
 and make the resulting code harder to follow.

That's also my point about unless. They tend to get abused and then
lead to maint devs making errors, and people misunderstanding the
code. The only time that unless IMO is ok (ish) is when it really is
a very simple statement. As soon as it mentions more than one var it
should be converted to an if. This applies even more so to the
modifier form.

Yves

-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: CodingGuidelines Perl amendment

2013-02-06 Thread demerphq
On 6 February 2013 19:35, Ted Zlatanov t...@lifelogs.com wrote:
 On Wed, 6 Feb 2013 19:25:43 +0100 demerphq demer...@gmail.com wrote:

 d On 6 February 2013 19:05, Ted Zlatanov t...@lifelogs.com wrote:
 On Wed, 06 Feb 2013 08:29:30 -0800 Junio C Hamano gits...@pobox.com wrote:

 JCH Is it ever (as opposed to not always) possible to omit braces?

 Oh yes!  Not that I recommend it, and I'm not even going to touch on
 Perl Golf :)

 d I think you are wrong. Can you provide an example?

 d Larry specifically wanted to avoid the dangling else problem that C
 d suffers from, and made it so that blocks are mandatory. The only
 d exception is statement modifiers, which are not only allowed to omit
 d the braces but also the parens on the condition.

 Oh, perhaps I didn't state it correctly.  You can avoid braces, but not
 if you want to use if/elsif/else/unless/etc. which require them:

 condition && do_this();
 condition || do_this();
 condition ? do_this() : do_that();

 (and others I can't recall right now)

 But my point was only that it's always possible to get around these
 artificial restrictions; it's more important to ask for legible sensible
 code.  Sorry if that was unclear!

Ah ok. Right, at a low level:

if (condition) { do_this() }

is identical to

condition && do_this();

IOW, Perl allows logical operators to act as control flow statements.

I hope your document includes something that says that using logical
operators as control flow statements should be done sparingly, and
generally should be restricted to low precedence operators and should
never involve more than one operator.

Yves




-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: push race

2012-10-15 Thread demerphq
On 15 October 2012 16:09, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:
 On Mon, Oct 15, 2012 at 11:14 AM, Angelo Borsotti
 angelo.borso...@gmail.com wrote:
 Hello,

 FWIW we have a lot of lemmings pushing to the same ref all the time at
 $work, and while I've seen cases where:

  1. Two clients try to push
  2. They both get the initial lock
  3. One of them fails to get the secondary lock (I think updating the ref)

 I've never seen cases where they clobber each other in #3 (and I would
 have known from "dude, where's my commit that I just pushed" reports).

Except that the error message is really cryptic. It definitely doesn't
shout out "maybe you collided with someone else's push".

Yves

-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: Ignore on commit

2012-10-04 Thread demerphq
On 5 October 2012 03:00, Andrew Ardill andrew.ard...@gmail.com wrote:
 On 5 October 2012 07:20, Marco Craveiro marco.crave...@gmail.com wrote:
 ...
 Similar but not quite; the idea is that you know that there is some
 code (I'm just talking about files here, so lets ignore hunks for the
 moment) which is normally checked in but for a period of time you want
 it ignored. So you don't want it git ignored but at the same time you
 don't want to see these files in the list of modified files.

 What is the reason git ignore is no good in this case? Is it simply
 that you can't see the ignored files in git status, or is it that
 adding and removing entries to .gitignore is too cumbersome? If it's
 the latter you could probably put together a simple shell wrapper to
 automate the task, as otherwise it seems like git ignore does what you
 need.

Git ignore doesn't ignore tracked files.

Yves


-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?

2012-09-11 Thread demerphq
On 11 September 2012 18:53, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 On Sat, Sep 1, 2012 at 6:12 PM, Andreas Schwab sch...@linux-m68k.org wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I don't get what you mean, what committer info?

 GIT_COMMITTER_{NAME,EMAIL}.  A tagger isn't really an author.

 Ah, am I the only one that finds that a bit counterintuitive to the
 point of wanting to submit a patch to change it?

 If you've created a tag you're the *author* of that tag, the
 author/committer distinction for commit objects is there for e.g.
 rebases and applying commits via e.g. git-am.

 We don't have a similar facility for tags (you have to push them
 around directly), but we *could* and in that case having a
 Tag-Committer as well well as a Tagger would make sense.

 Junio, what do you think?

 Unless your name is Linus Torvalds and it is early in year 2005, I
 wouldn't even think about it.

 When we introduced "tagger name can be overridden with environment",
 we could have added GIT_TAGGER_{NAME,EMAIL}, but we didn't.  Given
 that tagging happens far far less often than committing, I think it
 was a sensible thing to do.

 It is a perfectly normal thing in Git for you to commit a patch
 authored by other people on behalf of them (and that is why AUTHOR
 exists as a separate name from the committer), but you still stand
 behind the commits you create by setting COMMITTER of them to you.
 The fact that it was _you_ who create the tag has similar weight
 that you have your name as the committer in commit objects, so in
 that sense, I think the semantics used for the name in tag is far
 closer to COMMITTER than AUTHOR.

 I guess I wouldn't mind too much if git tag learned a --tagger
 option, and honored GIT_TAGGER_{NAME,EMAIL} if set (and otherwise,
 fall back to GIT_COMMITTER_{NAME,EMAIL}), but I do not know if it is
 worth it.  How often would you want to _lie_ about your identity
 when you are tagging, and what legitimate reason do you have for
 doing so?

Interestingly this came up because of the opposite problem. We wanted
to *prevent* users from telling lies about who they are.

IOW, when we do a rollout with git-deploy we want to automatically set
their username from a secondary authenticated source before we create
a rollout tag in their name.
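
Concretely, the relevant step amounts to this (a sketch;
authenticated_user() and the tag naming are stand-ins for our
internal bits):

use strict;
use warnings;

# Stand-in for the real authenticated lookup (hypothetical).
sub authenticated_user { return ("Real Name", "real.name\@example.com") }

my ($name, $email) = authenticated_user();
$ENV{GIT_COMMITTER_NAME}  = $name;   # the tagger identity is taken
$ENV{GIT_COMMITTER_EMAIL} = $email;  # from the committer environment
my $tag_name = "rollout-" . time();  # hypothetical naming scheme

system("git", "tag", "-a", "-m", "rollout by $name", $tag_name) == 0
    or die "git tag failed: $?";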

cheers,
Yves




-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: Sync production with Git

2012-08-08 Thread demerphq
On 8 August 2012 15:11, kiranpyati kiran.py...@infobeans.com wrote:
 I am new to github,

 Earlier we used to manually upload files on the production through FTP
 although git was present on the production. Due to this now git status shows
 many modified and untrack files.

 To sync that with git we have downloaded all files from production and
 committed to git. Now git has all files same as production.

 We have not pulled on production since last 6 months and because of this it
 shows modified and untracked files.

 Now if we pull on the production there any 100% chances of the conflict
 happened on all modified files. As there are hundreds of modified files
 since last since month. Git pull will show conflict to all those files. In
 that case site will get down and we can not afford this.

 We want a way to seamlessly sync production and Git.

 Can anybody please help me on this?

 Thanks in advance..!!

Try git-deploy.

https://github.com/git-deploy

It contains full workflow management for handling rollouts from git.

Yves



-- 
perl -Mre=debug -e /just|another|perl|hacker/


Re: Sync production with Git

2012-08-08 Thread demerphq
On 9 August 2012 06:21, demerphq demer...@gmail.com wrote:
 On 8 August 2012 15:11, kiranpyati kiran.py...@infobeans.com wrote:
 I am new to github,

 Earlier we used to manually upload files on the production through FTP
 although git was present on the production. Due to this now git status shows
 many modified and untrack files.

 To sync that with git we have downloaded all files from production and
 committed to git. Now git has all files same as production.

 We have not pulled on production since last 6 months and because of this it
 shows modified and untracked files.

 Now if we pull on the production there any 100% chances of the conflict
 happened on all modified files. As there are hundreds of modified files
 since last since month. Git pull will show conflict to all those files. In
 that case site will get down and we can not afford this.

 We want a way to seamlessly sync production and Git.

 Can anybody please help me on this?

 Thanks in advance..!!

 Try git-deploy.

 https://github.com/git-deploy

 It contains full workflow management for handling rollouts from git.

Better link:

https://github.com/git-deploy/git-deploy

Yves


-- 
perl -Mre=debug -e /just|another|perl|hacker/