Re: [PATCH] branch.c: simplify chain of if statements

2014-03-17 Thread Ævar Arnfjörð Bjarmason
On Mon, Mar 17, 2014 at 12:46 PM, Dragos Foianu dragos.foi...@gmail.com wrote:
 The reason I did not go with this is because I would still need the four ifs
 in order to keep the bug check part of the code. I might be able to find a
 work-around for it on the second attempt.

 I have seen N_() used in other code but I wasn't sure what its purpose was.

Aside from the other comments here, more generally, if you see code that
looks odd it helps to find out why it was introduced in the first place.

In this case, if you'd run e.g.:

git log --reverse -p -G'Branch %s set up to track remote branch %s from %s by rebasing' -- branch.c

or otherwise searched for the first occurrence of that odd-looking
code you'd have gotten:

commit d53a3503
Author: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Date:   Thu Jun 7 19:05:10 2012 +0700

Remove i18n legos in notifying new branch tracking setup

Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Signed-off-by: Junio C Hamano gits...@pobox.com

And searching for that commit has plenty of context for why that was
done: 
https://www.google.com/search?q=%22Remove%20i18n%20legos%20in%20notifying%20new%20branch%20tracking%20setup%22
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Git push race condition?

2014-03-24 Thread Ævar Arnfjörð Bjarmason
On Mon, Mar 24, 2014 at 8:18 PM, Scott Sandler
scott.m.sand...@gmail.com wrote:
 I run a private Git repository (using Gitlab) with about 200 users
 doing about 100 pushes per day.

Ditto but about 2x those numbers.

 error: Ref refs/heads/master is at
 4584c1f34e07cea2df6abc8e0d407fe016017130 but expected
 61b79b6d35b066d054fb3deab550f1c51598cf5f
 remote: error: failed to lock refs/heads/master

I also see this error once in a while. I read the code a while back,
and it's basically because there are two levels of locks that
receive-pack tries to get, and it's possible for two pushers to get
the first lock due to a race condition.

I've never seen data loss due to this, though, because the inner lock is atomic.
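A minimal sketch of the compare-and-swap idea behind the "is at X but expected Y" error, assuming a scratch repository with at least one commit. It does not reproduce receive-pack's two-level locking. It only uses git update-ref's optional <oldvalue> argument, which refuses to move a ref that no longer points at the expected commit. The race_demo function is hypothetical.

```shell
#!/bin/sh
# Simulate two pushers that both read <ref> at the same old value and
# then try to update it. "git update-ref <ref> <new> <old>" refuses to
# move the ref unless it still points at <old>, so the second update
# fails instead of silently discarding the first one.
# Usage: race_demo <ref>   (e.g. refs/heads/master)
race_demo() {
    ref=$1
    old=$(git rev-parse "$ref")
    tree=$(git rev-parse "$old^{tree}")
    # Two competing children of the same parent, one per "pusher".
    a=$(git commit-tree -p "$old" -m "pusher A" "$tree")
    b=$(git commit-tree -p "$old" -m "pusher B" "$tree")
    if git update-ref "$ref" "$a" "$old"; then
        echo "A: updated"
    fi
    # B still expects $old, but the ref is now at $a.
    if ! git update-ref "$ref" "$b" "$old" 2>/dev/null; then
        echo "B: rejected"
    fi
}
```

Running the updates one after the other is enough to show the check; with real concurrent pushes the same verification is what turns a lost race into an error rather than lost data.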


Re: Borrowing objects from nearby repositories

2014-03-24 Thread Ævar Arnfjörð Bjarmason
On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller and...@kellerfarm.com wrote:
 Hi all,

 I am considering developing a new feature, and I'd like to poll the group for 
 opinions.

 Background: A couple years ago, I wrote a set of scripts that speed up 
 cloning of frequently used repositories.  The scripts utilize a bare Git 
 repository located at a known location, and automate providing a --reference 
 parameter to `git clone` and `git submodule update`.  Recently, some 
 coworkers of mine expressed an interest in using the scripts, so I published 
 the current version of my scripts, called `git repocache`, described at the 
 bottom of https://github.com/andrewkeller/ak-git-tools.

 Slowly, it has occurred to me that this feature, or something similar to it, 
 may be worth adding to Git, so I've been thinking about the best approach.  
 Here's my best idea so far:

 1)  Introduce '--borrow' to `git-fetch`.  This would behave similarly to 
 '--reference', except that it operates on a temporary basis, and does not 
 assume that the reference repository will exist after the operation 
 completes, so any used objects are copied into the local objects database.  
 In theory, this mechanism would be distinct from '--reference', so if both 
 are used, some objects would be copied, and some objects would be accessible 
 via a reference repository referenced by the alternates file.

Isn't this the same as git clone --reference path --no-hardlinks url ?

Also, even without --no-hardlinks we're not assuming that the other repo
won't go away (you could rm -rf it), just that its files won't be
*modified*. Git never does that, though you could do it manually with
other tools, so the default is to hardlink.

 2)  Teach `git fetch` to read 'repocache.path' (or a better-named 
 configuration), and use it to automatically activate borrowing.

So a default path for --reference path --no-hardlinks ?

 3)  For consistency, `git clone`, `git pull`, and `git submodule update` 
 should probably all learn '--borrow', and forward it to `git fetch`.

 4)  In some scenarios, it may be necessary to temporarily not automatically 
 borrow, so `git fetch`, and everything that calls it may need an argument to 
 do that.

 Intended outcome: With 'repocache.path' set, and the cached repository 
 properly updated, one could run `git clone url`, and the operation would 
 complete much faster than it does now due to less load on the network.

 Things I haven't figured out yet:

 *  What's the best approach to copying the needed objects?  It's probably 
 inefficient to copy individual objects out of pack files one at a time, but 
 it could be wasteful to copy entire pack files just because you need one 
 object.  Hard-linking could help, but that won't always be available.  One of 
 my previous ideas was to add a '--auto-repack' option to `git-clone`, which 
 solves this problem better, but introduces some other front-end usability 
 problems.
 *  To maintain optimal effectiveness, users would have to regularly run a 
 fetch in the cache repository.  Not all users know how to set up a scheduled 
 task on their computer, so this might become a maintenance problem for the 
 user.  This kind of problem I think brings into question the viability of the 
 underlying design here, assuming that the ultimate goal is to clone faster, 
 with very little or no change in the use of git.


 Thoughts?

 Thanks,
 Andrew Keller



Re: Get all tips quickly

2014-04-13 Thread Ævar Arnfjörð Bjarmason
On Sun, Apr 13, 2014 at 4:19 PM, Kirill Likhodedov
kirill.likhode...@jetbrains.com wrote:
 Hi,

 What is fastest possible way to get all “tips” (leafs of the Git log graph) 
 in a Git repository with hashes of commits they point to?

Have you tried git for-each-ref and the various options it has?

Doing this for 35k tags is still going to be non-trivial.
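A sketch of listing every ref tip together with the commit it points at, using a single for-each-ref call. The list_tips name is hypothetical. %(*objectname) is the peeled object for annotated tags and empty for everything else, so the awk step picks the commit in both cases. Note that this lists ref tips, which is not quite the same as leaves of the commit graph: a branch that is an ancestor of another branch still shows up.

```shell
#!/bin/sh
# Print "<commit-sha1> <refname>" for every ref matching the optional
# patterns (e.g. refs/heads refs/tags). For annotated tags the peeled
# commit (%(*objectname)) is printed instead of the tag object itself.
list_tips() {
    git for-each-ref --format='%(objectname) %(*objectname) %(refname)' "$@" |
        awk 'NF == 3 { print $2, $3; next }   # annotated tag: peeled sha1
             { print $1, $2 }                 # branch or lightweight tag'
}
```

Refnames cannot contain spaces, so splitting on whitespace is safe here.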


Re: general question about git

2014-04-21 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 21, 2014 at 3:17 PM, Miller, Hugh hughmil...@chevron.com wrote:
 I am interested in exploring the possibility of using versioning for data, 
 that is versioning non-text, non-code file sets. Typical examples are the 
 data files or project files used by some application. These file sets 
 typically contain binary files; these files can be somewhat large, 1GB to 
 10GB is not unusual.

 Would git be a suitable tool for this purpose ?

 Ideally, even if the data files can be versioned this way, one would probably 
 prefer to build the versioning tools into the application.

 Would the git libraries be suitable for this further aim ?

Stock Git is still unsuitable for this purpose, but I recommend you
check out git-annex.


Re: Big Java repositories to play with?

2014-05-07 Thread Ævar Arnfjörð Bjarmason
On Wed, May 7, 2014 at 3:23 PM, Duy Nguyen pclo...@gmail.com wrote:
 I need some big Java repos (over 100k files) to test git status.
 Actually any repos with long path names and deep/wide directory
 structure are fine, not only Java ones. Right now I'm aware of
 gentoo-x86 and webkit. Let me know if you know some others. I'm afraid
 my Google-foo is not strong enough to search these repos.

1. Take a small repo with a small src directory
2. for i in {1..100}; do cp -Rvp src src-$i; done
3. git add src-*; git commit -mbigger

For some value of 100 you'll end up with a big repo to test git status on.

You just need lots of files to stat(); git status doesn't care about
history, so there's no reason to track down an existing large
repository.
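The {1..100} in step 2 is a bash-ism. A portable POSIX sh sketch of steps 1-2 could look like this; the clone_tree helper is hypothetical and, unlike the original command, drops cp's -v flag.

```shell
#!/bin/sh
# Make <count> copies of <dir> named <dir>-1 ... <dir>-<count>,
# multiplying the number of files git status has to stat().
# Usage: clone_tree <dir> <count>
clone_tree() {
    dir=$1
    count=$2
    i=1
    while [ "$i" -le "$count" ]; do
        cp -Rp "$dir" "$dir-$i"
        i=$((i + 1))
    done
}
```

After that, step 3 (git add src-*; git commit) is unchanged.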


[PATCH] git-add--interactive: Preserve diff heading when splitting hunks

2014-05-11 Thread Ævar Arnfjörð Bjarmason
Change the display of hunks in hunk splitting mode to preserve the diff
heading, which hasn't been done ever since the hunk splitting was
initially added in v1.4.4.2-270-g835b2ae.

Splitting the first hunk of this patch will now result in:

Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
Split into 2 hunks.
@@ -792,7 +792,7 @@ sub hunk_splittable {
[...]

Instead of:

Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
Split into 2 hunks.
@@ -792,7 +792,7 @@
[...]

This makes it easier to use the tool when you're splitting some giant
hunk and can't remember which function you're in anymore.

The diff is somewhat larger than I initially expected. In order to
display the headings in the same color scheme as the output from
git-diff(1) itself, I had to split up the code that colors the hunk
header: that line previously consisted entirely of the fraginfo, but
now consists of the fraginfo and the diff heading (the latter of
which isn't colored).
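To make the fraginfo/heading split concrete, here is a minimal POSIX sh sketch of the parsing the patch does in Perl. It separates a hunk header into its fraginfo and its heading, and rebuilds a fraginfo from offsets and counts, omitting a count of 1 the way split_hunk does. The helper names are hypothetical.

```shell
#!/bin/sh
# BRE for "@@ -<ofs>[,<cnt>] +<ofs>[,<cnt>] @@<heading>".
# Group 1 is the fraginfo, group 4 is everything after it.
HUNK_RE='^\(@@ -[0-9][0-9]*\(,[0-9][0-9]*\)\{0,1\} +[0-9][0-9]*\(,[0-9][0-9]*\)\{0,1\} @@\)\(.*\)$'

# "@@ -792,7 +792,7 @@ sub hunk_splittable {" -> "@@ -792,7 +792,7 @@"
hunk_fraginfo() {
    printf '%s\n' "$1" | sed -n "s/$HUNK_RE/\\1/p"
}

# "@@ -792,7 +792,7 @@ sub hunk_splittable {" -> " sub hunk_splittable {"
hunk_heading() {
    printf '%s\n' "$1" | sed -n "s/$HUNK_RE/\\4/p"
}

# Rebuild a fraginfo; a count of 1 is left out, e.g. "@@ -5 +5,2 @@".
# Usage: make_fraginfo <o_ofs> <o_cnt> <n_ofs> <n_cnt>
make_fraginfo() {
    old=$1
    new=$3
    if [ "$2" -ne 1 ]; then old="$old,$2"; fi
    if [ "$4" -ne 1 ]; then new="$new,$4"; fi
    printf '@@ -%s +%s @@' "$old" "$new"
}
```

Each split hunk's header is then the rebuilt fraginfo followed by the heading captured from the original hunk.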

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 git-add--interactive.perl | 40 
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 1fadd69..ed1e564 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -792,11 +792,11 @@ sub hunk_splittable {
 
 sub parse_hunk_header {
my ($line) = @_;
-   my ($o_ofs, $o_cnt, $n_ofs, $n_cnt) =
-   $line =~ /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;
+   my ($o_ofs, $o_cnt, $n_ofs, $n_cnt, $heading) =
+   $line =~ /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)/;
$o_cnt = 1 unless defined $o_cnt;
$n_cnt = 1 unless defined $n_cnt;
-   return ($o_ofs, $o_cnt, $n_ofs, $n_cnt);
+   return ($o_ofs, $o_cnt, $n_ofs, $n_cnt, $heading);
 }
 
 sub split_hunk {
@@ -808,8 +808,7 @@ sub split_hunk {
# If there are context lines in the middle of a hunk,
# it can be split, but we would need to take care of
# overlaps later.
-
-   my ($o_ofs, undef, $n_ofs) = parse_hunk_header($text->[0]);
+   my ($o_ofs, undef, $n_ofs, undef, $heading) = parse_hunk_header($text->[0]);
my $hunk_start = 1;
 
   OUTER:
@@ -886,17 +885,26 @@ sub split_hunk {
    my $o_cnt = $hunk->{OCNT};
    my $n_cnt = $hunk->{NCNT};
 
-   my $head = ("@@ -$o_ofs" .
-       (($o_cnt != 1) ? ",$o_cnt" : '') .
-       " +$n_ofs" .
-       (($n_cnt != 1) ? ",$n_cnt" : '') .
-       " @@\n");
-   my $display_head = $head;
-   unshift @{$hunk->{TEXT}}, $head;
-   if ($diff_use_color) {
-       $display_head = colored($fraginfo_color, $head);
-   }
-   unshift @{$hunk->{DISPLAY}}, $display_head;
+   my $fraginfo = join(
+       "",
+       "@@ -$o_ofs",
+       (($o_cnt != 1) ? ",$o_cnt" : ''),
+       " +$n_ofs",
+       (($n_cnt != 1) ? ",$n_cnt" : ''),
+       " @@"
+   );
+   unshift @{$hunk->{TEXT}}, join(
+       "",
+       $fraginfo,
+       $heading,
+       "\n"
+   );
+   unshift @{$hunk->{DISPLAY}}, join(
+       "",
+       $diff_use_color ? colored($fraginfo_color, $fraginfo) : $fraginfo,
+       $heading,
+       "\n"
+   );
}
return @split;
 }
-- 
2.0.0.rc0



Re: [PATCH] git-add--interactive: Preserve diff heading when splitting hunks

2014-05-12 Thread Ævar Arnfjörð Bjarmason
On Mon, May 12, 2014 at 8:39 PM, Jeff King p...@peff.net wrote:
 On Sun, May 11, 2014 at 04:09:56PM +, Ævar Arnfjörð Bjarmason wrote:

 Change the display of hunks in hunk splitting mode to preserve the diff
 heading, which hasn't been done ever since the hunk splitting was
 initially added in v1.4.4.2-270-g835b2ae.

 Splitting the first hunk of this patch will now result in:

 Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
 Split into 2 hunks.
 @@ -792,7 +792,7 @@ sub hunk_splittable {
 [...]

 Instead of:

 Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
 Split into 2 hunks.
 @@ -792,7 +792,7 @@
 [...]

 This makes it easier to use the tool when you're splitting some giant
 hunk and can't remember in which function you are anymore.

 This makes a lot of sense to me. I did notice two interesting quirks,
 one of which might be worth addressing.

 One, there is a slightly funny artifact in that the hunk header comes
 from the top of the context line, and that top is a different position
 for each of the split hunks. So in a file like:

   header_A
   content
   header_B
   one
   two
   three
   four

 you might have a diff like:

   @@ ... @@ header_A
    header_B
    one
    two
   +new line 1
    three
   +new line 2
    four

 The hunk header for new line 1 is A, because B itself is part of
 the context. But the hunk header for new line 2, if it were an
 independent hunk, would be B. We print A because we copy it from the
 original hunk.

 It probably won't matter much in practice (and I can even see an
 argument that A is the right answer). And figuring out B here
 would be prohibitively difficult, I would think, as it would require
 applying the funcname rules internal to git-diff to a hunk that git-diff
 itself never actually sees.

 Since the output from your patch is strictly better than what we saw
 before, I think there is no reason we cannot leave such an improvement
 to later (or never).

Good suggestion, but tricky as you point out. Another thing I've
wanted many times is to make it smart enough that when you edit code
like:

  A();
  B();

And change it to:

  X();

  Y();

The changes from A->X and B->Y may be completely unrelated, just
made in code where the author didn't add whitespace between unrelated
statements.

But because you changed all the lines, the tool can't split them up. It
could try harder and split hunks like that if you add a whitespace
boundary, or go all the way down to adding/removing individual
lines, so you wouldn't have to drop down to edit mode and do so
manually.


 The diff is somewhat larger than I initially expected because in order
 to display the headings in the same color scheme as the output from
 git-diff(1) itself I had to split up the code that would previously
 color diff output that previously consisted entirely of the fraginfo,
 but now consists of the fraginfo and the diff heading (the latter of
 which isn't colored).

 The func heading is not colored by default, but you can configure it to
 be so with color.diff.func. I double-checked the behavior with your
 patch: you end up with the uncolored header in the split hunks, because
 it is parsed from the uncolored line. Which is not bad, but I think we
 can trivially do better, just by adding back in the color as we do with
 the fraginfo.

 Like:

 diff --git a/git-add--interactive.perl b/git-add--interactive.perl
 index ed1e564..ac5763d 100755
 --- a/git-add--interactive.perl
 +++ b/git-add--interactive.perl
 @@ -29,6 +29,10 @@ my ($fraginfo_color) =
      $diff_use_color ? (
          $repo->get_color('color.diff.frag', 'cyan'),
      ) : ();
 +my ($funcname_color) =
 +    $diff_use_color ? (
 +        $repo->get_color('color.diff.func', ''),
 +    ) : ();
  my ($diff_plain_color) =
      $diff_use_color ? (
          $repo->get_color('color.diff.plain', ''),
 @@ -902,7 +906,7 @@ sub split_hunk {
          unshift @{$hunk->{DISPLAY}}, join(
              "",
              $diff_use_color ? colored($fraginfo_color, $fraginfo) : $fraginfo,
 -            $heading,
 +            $diff_use_color ? colored($funcname_color, $heading) : $heading,
              "\n"
          );
      }

 I didn't prepare a commit message because I think it should probably
 just be squashed in.

Well spotted, indeed, that should be squashed in.

On a related note, I thought that setting color.ui=auto turned on all
the colors. It would be nice if there were a built-in color scheme
that added more coloring to items like these across our tools; it's
useful to have the hunk headers colored differently so they stand out
more.


Re: [ANNOUNCE] git related v0.3

2014-05-19 Thread Ævar Arnfjörð Bjarmason
On Mon, May 19, 2014 at 2:36 AM, Felipe Contreras
felipe.contre...@gmail.com wrote:
 This tool finds people that might be interested in a patch, by going
 back through the history for each single hunk modified, and finding
 people that reviewed, acknowledged, signed, or authored the code the
 patch is modifying.

 It does this by running `git blame` incrementally on each hunk, and
 finding the relevant commit message. After gathering all the relevant
 people, it groups them to show what exactly was their role when they
 participated in the development of the relevant commit, and on how many
 relevant commits they participated. They are only displayed if they pass
 a minimum threshold of participation.

 It is similar to the `git contacts` tool in the contrib area, which is a
 rewrite of this tool, except that `git contacts` does the absolute minimum;
 `git related` is way superior in every way.

The general heuristic I use, which I've found to be much better than
git-blame, is:

 1. Find substrings of code I'm directly removing/altering, and
functions I'm removing/altering
 2. Do git log --reverse -p -S'substr' (maybe with -- file) for a
list of substrings

I've generally found that to be a better heuristic to start with in
both git.git and non-git.git code. Blame tends to bias the view
towards giving you people who've just moved the code around or made
minor changes (are you at least using blame -w?).
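A rough sketch of that heuristic, assuming you already have the list of substrings: for each one, take the oldest commit whose diff changes the number of occurrences (git log --reverse -S) and print its author. The original_authors name is hypothetical, and unlike the workflow above it only collects authors rather than showing the patches.

```shell
#!/bin/sh
# For each substring, print the author of the oldest commit that
# changed its number of occurrences in <file>, usually the commit
# that introduced it. Duplicate authors are printed once.
# Usage: original_authors <file> <substring>...
original_authors() {
    file=$1
    shift
    for needle in "$@"; do
        git log --reverse --format='%an <%ae>' -S"$needle" -- "$file" |
            head -n 1
    done | sort -u
}
```

Because -S only fires when the occurrence count changes, commits that merely touch neighbouring lines are ignored, which is what keeps people who only moved or tweaked code out of the result.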

We recently discussed having a tool like this at work to aid in our
review process, but I pointed out there that you had to be careful
with how it was written, e.g. if you rank importance as a function of
the number of commits you're now going to bother people more with
review requests if they make granular commits, whereas what you
actually want is to contact the significant authors, which generally
speaking can be defined as the original authors of the code you're
altering or replacing.


[ANNOUNCE] A pre-receive hook to intelligently block binary data

2014-10-31 Thread Ævar Arnfjörð Bjarmason
After searching around a bit I couldn't find a stand-alone Git hook
that would intelligently block binary data pushes so I wrote my own:
https://github.com/avar/pre-receive-reject-binaries

Main features:

 * Quota per-commit for how much binary data is OK
 * Ability to optionally allow users to override binary pushes by
including a notice in their commit messages
 * Doesn't disallow removing existing binary data, or renaming
existing binary files
 * Will block commits that include references to existing binary blobs though
 * Spots cases where a push is pushing commits that add and then
remove binary blobs (i.e. counts net additions)
 * Has hookable support for logging by piping its output to external
commands when it runs or when it rejects/unblocks a binary push. I'm
using this for logging its output to a log file, and to send e-mails
when it blocks/is unblocked.
 * Only requires a stock perl install, and should run on any *nix-like OS
out of the box
 * Should be relatively fast compared to some other similar solutions
I've seen, since it parses the output of one git-log --stat command
for the entire push, and doesn't e.g. do a git show for each commit
being pushed.

One general note about git-log output: I was disappointed to see that
there was no easily parsable git log output showing how much binary
files increased in size. --numstat just shows "-" for binary files,
and the --stat output is non-trivial to parse: it's meant for human
consumption and will sometimes vary in how much whitespace is
inserted.
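A minimal sketch of what --numstat does give you, assuming its usual tab-separated "added deleted path" layout: it can tell you which paths are binary, but not by how much they grew. The numstat_binaries name is hypothetical.

```shell
#!/bin/sh
# Read --numstat lines ("added<TAB>deleted<TAB>path") on stdin and
# print the paths reported with "-" in both count columns, which is
# how --numstat marks binary files. Sizes are not available here.
numstat_binaries() {
    awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}

# Possible use on a pushed range:
#   git log --numstat --format= "$old..$new" | numstat_binaries
```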


The gitweb author initials feature from a36817b doesn't work with i18n names

2013-08-29 Thread Ævar Arnfjörð Bjarmason
The @author_initials feature Jakub added in a36817b claims to use an
i18n regexp (/\b([[:upper:]])\B/g), but in Perl this doesn't actually
match non-ASCII upper-case letters unless the string being matched
against has the UTF8 flag.

As a result it abbreviates me to AB, not ÆAB. Here's something
that demonstrates the issue:

$ cat author-initials.pl
#!/usr/bin/env perl
use strict;
use warnings;

#binmode STDOUT, ':utf8';
open my $fd, "-|", "git", "blame", "--incremental", "--", "Makefile"
    or die "Can't open: $!";
#binmode $fd, ":utf8";
while (my $line = <$fd>) {
    next unless my ($author) = $line =~ /^author (.*)/;
    my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
    printf "%s (%s)\n", join("", @author_initials), $author;
}

With those two binmode commands commented out:

$ perl author-initials.pl |sort|uniq -c|sort -nr|head -n 5
 99 JH (Junio C Hamano)
 35 JN (Jonathan Nieder)
 35 JK (Jeff King)
 20 JS (Johannes Schindelin)
 16 AB (Ævar Arnfjörð Bjarmason)

And uncommented:

$ perl author-initials.pl |sort|uniq -c|sort -nr|head -n 5
 99 JH (Junio C Hamano)
 35 JN (Jonathan Nieder)
 35 JK (Jeff King)
 20 JS (Johannes Schindelin)
 16 ÆAB (Ævar Arnfjörð Bjarmason)

Jakub, do you see a reason not to just apply this:

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f429f75..29b3fb5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6631,6 +6631,7 @@ sub git_blame_common {
$hash_base, '--', $file_name
or die_error(500, "Open git-blame --porcelain failed");
}
+   binmode $fd, ":utf8";

# incremental blame data returns early
if ($format eq 'data') {

I don't have an environment where I can test gitweb, but that
looks like it should work to me.


[PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-30 Thread Ævar Arnfjörð Bjarmason
Change the @author_initials feature Jakub added in
v1.6.4-rc2-14-ga36817b to match non-ASCII author initials as intended.

The regexp Jakub added was intended to match
non-ASCII (/\b([[:upper:]])\B/g). But in Perl this doesn't actually
match non-ASCII upper-case characters unless the string being matched
against has the UTF8 flag.

So when we open a pipe to git blame we need to mark the file
descriptor we're opening as utf8 explicitly.

As a result it abbreviates me to AB, not ÆAB, entirely because Æ
isn't matched by /[[:upper:]]/ unless the string being matched against
has the UTF8 flag.

Here's something that demonstrates the issue:

#!/usr/bin/env perl
use strict;
use warnings;

binmode STDOUT, ':utf8' if $ENV{UTF8};
open my $fd, "-|", "git", "blame", "--incremental", "--", "Makefile"
    or die "Can't open: $!";
binmode $fd, ":utf8" if $ENV{UTF8};
while (my $line = <$fd>) {
    next unless my ($author) = $line =~ /^author (.*)/;
    my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
    printf "%s (%s)\n", join("", @author_initials), $author;
}

When that's run with and without UTF8 being true in the environment it
gives, on git.git:

$ UTF8=0 perl author-initials.pl | sort | uniq -c |
sort -nr | head -n 5
 99 JH (Junio C Hamano)
 35 JN (Jonathan Nieder)
 35 JK (Jeff King)
 20 JS (Johannes Schindelin)
 16 AB (Ævar Arnfjörð Bjarmason)
$ UTF8=1 perl author-initials.pl | sort | uniq -c |
sort -nr | head -n 5
 99 JH (Junio C Hamano)
 35 JN (Jonathan Nieder)
 35 JK (Jeff King)
 20 JS (Johannes Schindelin)
 16 ÆAB (Ævar Arnfjörð Bjarmason)

Acked-by: Jakub Narębski jna...@gmail.com
Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
Tested-by: Simon Ruderich si...@ruderich.org
---
 gitweb/gitweb.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f429f75..ad48a5a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6631,6 +6631,7 @@ sub git_blame_common {
$hash_base, '--', $file_name
or die_error(500, "Open git-blame --porcelain failed");
}
+   binmode $fh, ':utf8';
 
# incremental blame data returns early
if ($format eq 'data') {
-- 
1.8.4.rc2



Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names

2013-08-31 Thread Ævar Arnfjörð Bjarmason
I did, I just clumsily sent out the wrong patch: I tested it manually
on another system, and then fat-fingered $fh instead of $fd.

Should I send another patch or do you want to just fix this one up?

On Fri, Aug 30, 2013 at 8:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Junio C Hamano gits...@pobox.com writes:

 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 Acked-by: Jakub Narębski jna...@gmail.com
 Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
 Tested-by: Simon Ruderich si...@ruderich.org
 ---
 +++ b/gitweb/gitweb.perl
 @@ -6631,6 +6631,7 @@ sub git_blame_common {
 ...
 +binmode $fh, ':utf8';


 [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires
 explicit package name at 
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
 [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of 
 /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to 
 compilation errors.

 I think in this function the filehandle is called $fd, not $fh.  Has
 any of you really tested this???


Re: Existing utility to track compiled files in another sister repository, for rollouts

2012-08-23 Thread Ævar Arnfjörð Bjarmason
On Thu, Aug 23, 2012 at 6:28 PM, Ævar Arnfjörð Bjarmason
ava...@gmail.com wrote:
 I'm planning on using Git for a deployment process where the steps are
 basically:

  1. You log into a deployment host, cd into software.git, do git pull
  2. A tool runs make for you, creates a deployment-MMDD-HHMMSS tag
  3. That make step will create a bunch of generated (text) files
  4. Get a list of these with: git clean -dxfn
  5. Copy those to software-generated.git, removing any that we
 didn't just create, adding any that are new
  6. Commit that, tag it with generated-deployment-MMDD-HHMMSS
  7. Push out both our generated software.git and
 software-generated.git tag to our servers
  8. git reset --hard both of those to our newly pushed out tags
  9. Do git clean -dxf on software.git to remove old generated files
  10. Copy new generated files from software-generated.git to software.git
  11. Restart our application to pick up the new software

 For this I'll need to write some git snapshot-commit tool for #5 and
 #6 to commit whatever the current state of the directory is (with
 removed/added files), and hack up something to do #9-#10.

 This should all be relatively easy, I was just wondering if there was
 any prior art on this that I could use instead of hacking it up
 myself.

Here's a quick hack that does #4-6 but not #9-10 yet, although that
would be easy: https://gist.github.com/3440792

Suggestions for improvements are welcome, particularly on whether there's
a simpler way to nuke the existing files in a repo and replace them
with new files, all staged for commit:

# Go to the target repository, nuke anything already there
chdir $to_repository;
system "git reset --hard";
system "git clean -dxf";
system "git ls-tree --name-only HEAD -z | xargs -0 rm -rf";
system "git add --update"; # stage any removals

Followed by:

system "tar xvf incoming.tar";
system "rm incoming.tar";
# Might die if we empty the repo. TODO: make this use status -> add each file
system "git add * .??* || :";
# We might have nothing to change!
system "git commit -m'Bump copy from $from_repository to $to_repository' || :";


Re: [PATCH 0/6] Gettext poison rework

2012-08-24 Thread Ævar Arnfjörð Bjarmason
On Fri, Aug 24, 2012 at 7:43 AM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote:
 Still WIP but I'm getting closer. I dropped test-poisongen and started
 to use podebug [2] instead. Less code in git. podebug does not preserve
 shell variables yet. I'll follow that up at upstream [1].

 With this series, if you have translation toolkit installed, you could
 do

 make pseudo-locale L=your language code
 make GETTEXT_POISON=$LANG test

 podebug supports a few ways of rewriting translations. Currently
 unicode is used but you can change it via PODEBUG_OPTS

 t9001 is not happy with $LANG != C though. May need to add some
 prereq there.

 [1] http://bugs.locamotion.org/show_bug.cgi?id=2450
 [2] http://translate.sourceforge.net/wiki/toolkit/podebug

The reason I didn't do something like this to begin with is that
gettext/glibc doesn't have support for fake locales, so you'd have to
appropriate a real one for tests. It's good to see you poking the
gettext mailing list about adding support for that.

But something like podebug gets around that quite nicely, so we can
still have the testing the poison stuff was intended for, without the
complexity of supporting it throughout all our i18n code.


Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?

2012-09-01 Thread Ævar Arnfjörð Bjarmason
Maybe this is documented in some place I didn't spot, but I expected
that when I set GIT_AUTHOR_{NAME,EMAIL} it would affect the operation
of git-tag, but it doesn't seem to. When I create tags it seems to
completely ignore those variables.

Should it be doing that? Here's a test script demonstrating the issue:

#!/bin/sh -e
# Set defaults
git config --global user.name "Ævar Arnfjörð Bjarmason"
git config --global user.email ava...@gmail.com

rm -rf /tmp/test-git
git init /tmp/test-git
cd /tmp/test-git

make_commit() {
    file=$1
    content=$2
    echo "$content" > "$file"
    git add "$file"
    git commit -m"$file: $content" "$file"
    git --no-pager log -1 HEAD | grep ^Author
}

make_commit README "testing content"
git config user.name "Test User"
git config user.email t...@example.com
make_commit README "testing content again"
git tag -a -m"annotated tag" tag-name-1
git --no-pager show tag-name-1 | grep ^Author

GIT_AUTHOR_NAME="Tag Test User" \
GIT_AUTHOR_EMAIL=tagt...@example.com git tag -a -m"another annotated tag" tag-name-2
git --no-pager show tag-name-2 | grep ^Author

Which outputs:

$ sh /tmp/test-tag.sh
Initialized empty Git repository in /tmp/test-git/.git/
[master (root-commit) 9816756] README: testing content
 1 file changed, 1 insertion(+)
 create mode 100644 README
Author: Ævar Arnfjörð Bjarmason ava...@gmail.com
[master 304b71e] README: testing content again
 1 file changed, 1 insertion(+), 1 deletion(-)
Author: Test User t...@example.com
Author: Test User t...@example.com
Author: Test User t...@example.com

I'd expect references to Tag Test User tagt...@example.com for the
second tag I created.


Re: Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?

2012-09-01 Thread Ævar Arnfjörð Bjarmason
On Sat, Sep 1, 2012 at 5:57 PM, Andreas Schwab sch...@linux-m68k.org wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 git --no-pager show tag-name-1 | grep ^Author

 A tag doesn't have an author, it has a tagger.  This shows the author of
 the *commit*.

I got the grep wrong, I meant that I expected the tagger to be set
according to GIT_AUTHOR_{NAME,EMAIL}, but it isn't either:

$ sh /tmp/test-tag.sh
Initialized empty Git repository in /tmp/test-git/.git/
[master (root-commit) f83fc11] README: testing content
 1 file changed, 1 insertion(+)
 create mode 100644 README
Author: Ævar Arnfjörð Bjarmason ava...@gmail.com
[master ef65731] README: testing content again
 1 file changed, 1 insertion(+), 1 deletion(-)
Author: Test User t...@example.com
Tagger: Test User t...@example.com
Author: Test User t...@example.com
Tagger: Test User t...@example.com
Author: Test User t...@example.com

 GIT_AUTHOR_NAME="Tag Test User" \
 GIT_AUTHOR_EMAIL="tagt...@example.com" git tag -a -m"another annotated tag" tag-name-2

 The tagger is controlled by the committer info.

I don't get what you mean, what committer info?


Re: Does or could git handle file licensing information?

2012-09-05 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 5, 2012 at 12:51 PM, Yohann Ferreira
yohann.ferre...@orange.fr wrote:
 As a day-to-day hard git user ;), I also have to manage files with different
 licenses I need to track.
 As git handles all those files in a very smart way, I wondered whether git
 could also handle that information, at least somehow.

Say you have files like:

main.c
imglib.c
config.c

Why not just have these files:

license-info/GPL
license-info/SOME-OTHER-LICENSE

Which would contain, respectively:

main.c
config.c

And:

imglib.c

Then just have a script, maybe add it as a hook on your server before
it accepts a push which ensures that all files currently in the tree
are listed in those license-info/* files.
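A minimal sketch of the check such a hook could run, written in C to match git itself. It assumes the hook has already collected the tracked paths (e.g. from git ls-files) and every file name listed under license-info/ into arrays; count_unlisted() is a hypothetical helper, not something git ships.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical hook helper: return how many tracked paths are not named
 * in any license-info list, printing each one. The list files themselves
 * (anything under license-info/) are exempt.
 */
static size_t count_unlisted(const char *const *tracked, size_t ntracked,
                             const char *const *listed, size_t nlisted)
{
    size_t missing = 0;

    for (size_t i = 0; i < ntracked; i++) {
        int found = 0;

        if (strncmp(tracked[i], "license-info/", 13) == 0)
            continue;
        /* Linear scan: fine for a sketch; sort + bsearch() for big trees. */
        for (size_t j = 0; j < nlisted && !found; j++)
            found = strcmp(tracked[i], listed[j]) == 0;
        if (!found) {
            fprintf(stderr, "no license listed for %s\n", tracked[i]);
            missing++;
        }
    }
    return missing;
}
```

A hook built around this would reject the push whenever the count is non-zero.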

You could also just add a license header to each of these files, and
have a script that ensures that everything has such a header. I think
the Debian project has such a script that you could adapt.

Git just tracks files, so just do this in some file-based manner and
you'll be fine.


Re: Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?

2012-09-11 Thread Ævar Arnfjörð Bjarmason
On Sat, Sep 1, 2012 at 6:12 PM, Andreas Schwab sch...@linux-m68k.org wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I don't get what you mean, what committer info?

 GIT_COMMITTER_{NAME,EMAIL}.  A tagger isn't really an author.

Ah, am I the only one that finds that a bit counterintuitive to the
point of wanting to submit a patch to change it?

If you've created a tag you're the *author* of that tag. The
author/committer distinction for commit objects is there for e.g.
rebases and applying commits via git-am.

We don't have a similar facility for tags (you have to push them
around directly), but we *could*, and in that case having a
Tag-Committer as well as a Tagger would make sense.

Junio, what do you think?


Re: [PATCH 1/5] Import wildmatch from rsync

2012-10-02 Thread Ævar Arnfjörð Bjarmason
On Wed, Sep 26, 2012 at 1:25 PM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote:
 These files are from rsync.git commit
 f92f5b166e3019db42bc7fe1aa2f1a9178cd215d, which was the last commit
 before rsync turned GPL-3. All files are imported as-is and
 no-op. Adaptation is done in a separate patch.

Perhaps Wayne Davison (added to CC) wouldn't mind giving us permission
to use the subsequent changes to these files under the GPLv2?

rsync.git $ git --no-pager log --pretty="%h %an %s\n" --reverse f92f5b166e3019db42bc7fe1aa2f1a9178cd215d.. -- '*wild*'
4fd842f Wayne Davison Switching to GPL 3.\n
8e41b68 Wayne Davison Tweaking the license text a bit more.\n
d3d07a5 Wayne Davison Include 2008 in the copyright years.\n
adc2476 Wayne Davison Output numbers in 3-digit groups by default
(e.g. 1,234,567). Also improved the human-readable output functions,
including adding the ability to output negative numbers.\n
b3bf9b9 Wayne Davison Update the copyright year.\n
fd91c3b Wayne Davison Fix two unused-variable compiler warnings.\n



 rsync.git           ->  git.git
 lib/wildmatch.[ch]      wildmatch.[ch]
 wildtest.c              test-wildmatch.c
 wildtest.txt            t/t3070/wildtest.txt

 Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
 Signed-off-by: Junio C Hamano gits...@pobox.com
 ---
  t/t3070/wildtest.txt | 165 +++
  test-wildmatch.c     | 222 +++
  wildmatch.c          | 368 +++
  wildmatch.h          |   6 +
  4 files changed, 761 insertions(+)
  create mode 100644 t/t3070/wildtest.txt
  create mode 100644 test-wildmatch.c
  create mode 100644 wildmatch.c
  create mode 100644 wildmatch.h

 diff --git a/t/t3070/wildtest.txt b/t/t3070/wildtest.txt
 new file mode 100644
 index 000..42c1678
 --- /dev/null
 +++ b/t/t3070/wildtest.txt
 @@ -0,0 +1,165 @@
 +# Input is in the following format (all items white-space separated):
 +#
 +# The first two items are 1 or 0 indicating if the wildmat call is expected to
 +# succeed and if fnmatch works the same way as wildmat, respectively.  After
 +# that is a text string for the match, and a pattern string.  Strings can be
 +# quoted (if desired) in either double or single quotes, as well as backticks.
 +#
 +# MATCH FNMATCH_SAME text to match 'pattern to use'
 +
 +# Basic wildmat features
 +1 1 foo    foo
 +0 1 foo    bar
 +1 1 '' 
 +1 1 foo    ???
 +0 1 foo    ??
 +1 1 foo    *
 +1 1 foo    f*
 +0 1 foo    *f
 +1 1 foo    *foo*
 +1 1 foobar *ob*a*r*
 +1 1 aaabababab *ab
 +1 1 foo*   foo\*
 +0 1 foobar foo\*bar
 +1 1 f\oo   f\\oo
 +1 1 ball   *[al]?
 +0 1 ten    [ten]
 +1 1 ten    **[!te]
 +0 1 ten    **[!ten]
 +1 1 ten    t[a-g]n
 +0 1 ten    t[!a-g]n
 +1 1 ton    t[!a-g]n
 +1 1 ton    t[^a-g]n
 +1 1 a]b    a[]]b
 +1 1 a-b    a[]-]b
 +1 1 a]b    a[]-]b
 +0 1 aab    a[]-]b
 +1 1 aab    a[]a-]b
 +1 1 ]  ]
 +
 +# Extended slash-matching features
 +0 1 foo/baz/bar    foo*bar
 +1 1 foo/baz/bar    foo**bar
 +0 1 foo/bar    foo?bar
 +0 1 foo/bar    foo[/]bar
 +0 1 foo/bar    f[^eiu][^eiu][^eiu][^eiu][^eiu]r
 +1 1 foo-bar    f[^eiu][^eiu][^eiu][^eiu][^eiu]r
 +0 1 foo    **/foo
 +1 1 /foo   **/foo
 +1 1 bar/baz/foo    **/foo
 +0 1 bar/baz/foo    */foo
 +0 0 foo/bar/baz    **/bar*
 +1 1 deep/foo/bar/baz   **/bar/*
 +0 1 deep/foo/bar/baz/  **/bar/*
 +1 1 deep/foo/bar/baz/  **/bar/**
 +0 1 deep/foo/bar   **/bar/*
 +1 1 deep/foo/bar/  **/bar/**
 +1 1 foo/bar/baz    **/bar**
 +1 1 foo/bar/baz/x  */bar/**
 +0 0 deep/foo/bar/baz/x */bar/**
 +1 1 deep/foo/bar/baz/x **/bar/*/*
 +
 +# Various additional tests
 +0 1 acrt   a[c-c]st
 +1 1 acrt   a[c-c]rt
 +0 1 ]  [!]-]
 +1 1 a  [!]-]
 +0 1 '' \
 +0 1 \  \
 +0 1 /\ */\
 +1 1 /\ */\\
 +1 1 foo    foo
 +1 1 @foo   @foo
 +0 1 foo    @foo
 +1 1 [ab]   \[ab]
 +1 1 [ab]   [[]ab]
 +1 1 [ab]   [[:]ab]
 +0 1 [ab]   [[::]ab]
 +1 1 [ab]   [[:digit]ab]
 +1 1 [ab]   [\[:]ab]
 +1 1 ?a?b   \??\?b
 +1 1 abc    \a\b\c
 +0 1 foo    ''
 +1 1 foo/bar/baz/to **/t[o]
 +
 +# Character class tests
 +1 1 a1B

Re: upload-pack is slow with lots of refs

2012-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote:
 On Wed, Oct 03, 2012 at 02:36:00PM +0200, Ævar Arnfjörð Bjarmason wrote:

 I'm creating a system where a lot of remotes constantly fetch from a
 central repository for deployment purposes, but I've noticed that even
 with a remote.$name.fetch configuration to only get certain refs a
 git fetch will still call git-upload-pack, which will provide a list
 of all references.

 This is being done against a repository with tens of thousands of refs
 (it has a tag for each deployment), so it ends up burning a lot of CPU
 time on the uploader/receiver side.

 Where is the CPU being burned? Are your refs packed (that's a huge
 savings)? What are the refs like? Are they .have refs from an alternates
 repository, or real refs? Are they pointing to commits or tag objects?

 What version of git are you using?  In the past year or so, I've made
 several tweaks to speed up large numbers of refs, including:

   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note
 that this only helps if they are being pulled in by an alternates
 repo. And even then, it only helps if they are mostly duplicates;
 distinct ones are still O(n^2).

   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
 a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
 Both in v1.7.11. I think there is still a potential quadratic loop
 in mark_complete()

   - 90108a2 (upload-pack: avoid parsing tag destinations)
 926f1dd (upload-pack: avoid parsing objects during ref advertisement)
 Both in v1.7.10. Note that tag objects are more expensive to
 advertise than commits, because we have to load and peel them.

 Even with those patches, though, I found that it was something like ~2s
 to advertise 100,000 refs.

I can't provide all the details right now (I don't have access to that
machine at the moment), but briefly:

 * The git client/server version is 1.7.8

 * The repository has around 50k refs, they're real refs, almost all
   of them (say all but 0.5k-1k) are annotated tags, the rest are
   branches.

 * 99% of them are packed; there's a weekly cronjob that packs them
   all up, and there were a few newly pushed branches and tags outside
   of the packed-refs file.

 * I tried echo -n | git upload-pack repo on both the 50k-ref
   repository and a repository with 100 refs; the former took ~1-2s
   to run on a 24-core box and the latter ~500ms.

 * When I ran git-upload-pack with GNU parallel I managed around 20
   runs/s on the 24-core box against the 50k-ref repository, and 40/s
   against the 100-ref one.

 * A co-worker who was working on this today tried it on 1.7.12 and
   claimed that it had the same performance characteristics.

 * I tried to profile it under gcc -pg && echo -n | ./git-upload-pack
   repo, but that doesn't produce a profile, presumably because the
   process exits unsuccessfully.

   Maybe someone here knows offhand what mock data I could feed
   git-upload-pack to make it happy to just list the refs, or better
   yet do a bit more work which it would do if it were actually doing
   the fetch (I suppose I could just do a fetch, but I wanted to do
   this from a locally compiled checkout).

 Has there been any work on extending the protocol so that the client
 tells the server what refs it's interested in?

 I don't think so. It would be hard to do in a backwards-compatible way,
 because the advertisement is the first thing the server says, before it
 has negotiated any capabilities with the client at all.

I suppose at least for the ssh protocol we could just do:

ssh server "(git upload-pack repo --refs=* || git upload-pack repo)"

And something similar with HTTP headers, but that of course leaves the
git:// protocol.


Re: upload-pack is slow with lots of refs

2012-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 3, 2012 at 11:20 PM, Jeff King p...@peff.net wrote:

Thanks for all that info, it's really useful.

  * A co-worker who was working on this today tried it on 1.7.12 and
claimed that it had the same performance characteristics.

 That's surprising to me. Can you try to verify those numbers?

I think he was wrong. I tested this on git.git by first creating a lot
of tags:

 parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD)

Then doing:

git pack-refs --all
git repack -A -d

And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack
on 1.7.8 and 2.59/s on the master branch.

  * I tried to profile it under gcc -pg && echo -n | ./git-upload-pack
repo but it doesn't produce a profile like that, presumably
because the process exits unsuccessfully.

 If it's a recent version of Linux, you'll get much nicer results with
 perf. Here's what my 400K-ref case looks like:

   $ time echo  | perf record git-upload-pack . >/dev/null
   real    0m0.808s
   user    0m0.660s
   sys     0m0.136s

   $ perf report | grep -v ^# | head
   11.40%  git-upload-pack  libc-2.13.so     [.] vfprintf
    9.70%  git-upload-pack  git-upload-pack  [.] find_pack_entry_one
    7.64%  git-upload-pack  git-upload-pack  [.] check_refname_format
    6.81%  git-upload-pack  libc-2.13.so     [.] __memcmp_sse4_1
    5.79%  git-upload-pack  libc-2.13.so     [.] getenv
    4.20%  git-upload-pack  libc-2.13.so     [.] __strlen_sse42
    3.72%  git-upload-pack  git-upload-pack  [.] ref_entry_cmp_sslice
    3.15%  git-upload-pack  git-upload-pack  [.] read_packed_refs
    2.65%  git-upload-pack  git-upload-pack  [.] sha1_to_hex
    2.44%  git-upload-pack  libc-2.13.so     [.] _IO_default_xsputn

FWIW here are my results on the above pathological git.git

$ uname -r; perf --version; echo  | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
3.2.0-2-amd64
perf version 3.2.17
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ]
29.08%  git-upload-pack  libz.so.1.2.7       [.] inflate
17.99%  git-upload-pack  libz.so.1.2.7       [.] 0xaec1
 6.21%  git-upload-pack  libc-2.13.so        [.] 0x117503
 5.69%  git-upload-pack  libcrypto.so.1.0.0  [.] 0x82c3d
 4.87%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
 3.18%  git-upload-pack  ld-2.13.so          [.] 0x886e
 2.96%  git-upload-pack  libc-2.13.so        [.] vfprintf
 2.83%  git-upload-pack  git-upload-pack     [.] search_for_subdir
 1.56%  git-upload-pack  [kernel.kallsyms]   [k] do_raw_spin_lock
 1.36%  git-upload-pack  libc-2.13.so        [.] vsnprintf

I wonder why your report doesn't note any time in libz. This is on
Debian testing; maybe your OS uses different strip settings so it
doesn't show up?

$ ldd -r ./git-upload-pack
linux-vdso.so.1 =>  (0x7fff621ff000)
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f768feee000)
libcrypto.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x7f768fb0a000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f768f8ed000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f768f566000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f768f362000)
/lib64/ld-linux-x86-64.so.2 (0x7f7690117000)


Re: upload-pack is slow with lots of refs

2012-10-03 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote:
 What version of git are you using?  In the past year or so, I've made
 several tweaks to speed up large numbers of refs, including:

   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note
 that this only helps if they are being pulled in by an alternates
 repo. And even then, it only helps if they are mostly duplicates;
 distinct ones are still O(n^2).

   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
 a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
 Both in v1.7.11. I think there is still a potential quadratic loop
 in mark_complete()

   - 90108a2 (upload-pack: avoid parsing tag destinations)
 926f1dd (upload-pack: avoid parsing objects during ref advertisement)
 Both in v1.7.10. Note that tag objects are more expensive to
 advertise than commits, because we have to load and peel them.

 Even with those patches, though, I found that it was something like ~2s
 to advertise 100,000 refs.

FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at
which it went from 1.5/s to 2.5/s upload-pack runs on the pathological
git.git repository was none of those, but:

ccdc6037fe - parse_object: try internal cache before reading object db


Re: upload-pack is slow with lots of refs

2012-10-03 Thread Ævar Arnfjörð Bjarmason
On Thu, Oct 4, 2012 at 1:21 AM, Jeff King p...@peff.net wrote:
 On Thu, Oct 04, 2012 at 12:32:35AM +0200, Ævar Arnfjörð Bjarmason wrote:

 On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote:
  What version of git are you using?  In the past year or so, I've made
  several tweaks to speed up large numbers of refs, including:
 
- cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note
  that this only helps if they are being pulled in by an alternates
  repo. And even then, it only helps if they are mostly duplicates;
  distinct ones are still O(n^2).
 
- 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
  a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
  Both in v1.7.11. I think there is still a potential quadratic loop
  in mark_complete()
 
- 90108a2 (upload-pack: avoid parsing tag destinations)
  926f1dd (upload-pack: avoid parsing objects during ref advertisement)
  Both in v1.7.10. Note that tag objects are more expensive to
  advertise than commits, because we have to load and peel them.
 
  Even with those patches, though, I found that it was something like ~2s
  to advertise 100,000 refs.

 FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at
 which it went from 1.5/s to 2.5/s upload-pack runs on the pathological
 git.git repository was none of those, but:

 ccdc6037fe - parse_object: try internal cache before reading object db

 Ah, yeah, I forgot about that one. That implies that you have a lot of
 refs pointing to the same objects (since the benefit of that commit is
 to avoid reading from disk when we have already seen it).

 Out of curiosity, what does your repo contain? I saw a lot of speedup
 with that commit because my repos are big object stores, where we have
 the same duplicated tag refs for every fork of the repo.

Things are much faster with your monkeypatch: I got up to around 10
runs/s.

The repository mainly contains a lot of git-deploy[1] generated tags
which are added for every rollout to several subsystems.

Of the ~50k references in the repo 75% point to a commit that no other
reference points to. Around 98% of the references are annotated tags,
the rest are branches.

1. https://github.com/git-deploy/git-deploy


Re: upload-pack is slow with lots of refs

2012-10-03 Thread Ævar Arnfjörð Bjarmason
On Thu, Oct 4, 2012 at 1:15 AM, Jeff King p...@peff.net wrote:
 On Thu, Oct 04, 2012 at 12:15:47AM +0200, Ævar Arnfjörð Bjarmason wrote:

 I think he was wrong, I tested this on git.git by first creating a lot
 of tags:

  parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD)

 Then doing:

 git pack-refs --all
 git repack -A -d

 And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack
 on 1.7.8 and 2.59/s on the master branch.

 Thanks for the update, that's more like what I expected.

 FWIW here are my results on the above pathological git.git

 $ uname -r; perf --version; echo  | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
 3.2.0-2-amd64
 perf version 3.2.17
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ]
 29.08%  git-upload-pack  libz.so.1.2.7       [.] inflate
 17.99%  git-upload-pack  libz.so.1.2.7       [.] 0xaec1
  6.21%  git-upload-pack  libc-2.13.so        [.] 0x117503
  5.69%  git-upload-pack  libcrypto.so.1.0.0  [.] 0x82c3d
  4.87%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
  3.18%  git-upload-pack  ld-2.13.so          [.] 0x886e
  2.96%  git-upload-pack  libc-2.13.so        [.] vfprintf
  2.83%  git-upload-pack  git-upload-pack     [.] search_for_subdir
  1.56%  git-upload-pack  [kernel.kallsyms]   [k] do_raw_spin_lock
  1.36%  git-upload-pack  libc-2.13.so        [.] vsnprintf

 I wonder why your report doesn't note any time in libz. This is on
 Debian testing, maybe your OS uses different strip settings so it
 doesn't show up?

 Mine was on Debian unstable. The difference is probably that I have 400K
 refs, but only 12K unique ones (this is the master alternates repo
 containing every ref from every fork of rails/rails on GitHub). So I
 spend proportionally more time fiddling with refs and outputting than
 I do actually inflating tag objects.

An updated profile with your patch:

$ uname -r; perf --version; echo  | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
3.2.0-2-amd64
perf version 3.2.17
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data (~662 samples) ]
14.45%  git-upload-pack  libc-2.13.so        [.] 0x78140
12.13%  git-upload-pack  [kernel.kallsyms]   [k] walk_component
11.01%  git-upload-pack  libc-2.13.so        [.] _IO_getline_info
10.74%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
 8.96%  git-upload-pack  [kernel.kallsyms]   [k] __mmdrop
 8.64%  git-upload-pack  git-upload-pack     [.] sha1_to_hex
 6.73%  git-upload-pack  libc-2.13.so        [.] vfprintf
 4.07%  git-upload-pack  libc-2.13.so        [.] strchrnul
 4.00%  git-upload-pack  libc-2.13.so        [.] getenv
 3.37%  git-upload-pack  git-upload-pack     [.] packet_write

 Hmm. It seems like we should not need to open the tags at all. The main
 reason is to produce the peeled advertisement just after it. But for a
 packed ref with a modern version of git that supports the peeled
 extension, we should already have that information.
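For reference, the peeled extension works roughly like this: when the header line of packed-refs ("# pack-refs with: ...") lists the "peeled" trait, a ref that points at an annotated tag is followed by a "^<sha1>" line naming the object the tag peels to, so the advertisement can be produced without opening the tag. A rough sketch of that lookup over an in-memory copy of the file; find_peeled() and its return values are hypothetical, and this is not git's peel_ref() code.

```c
#include <string.h>

/* True if the first line is a "# pack-refs with:" header listing "peeled". */
static int header_has_peeled(const char *packed)
{
    char header[256];
    size_t len = strcspn(packed, "\n");

    if (len >= sizeof(header))
        len = sizeof(header) - 1;
    memcpy(header, packed, len);
    header[len] = '\0';
    return strncmp(header, "# pack-refs with:", 17) == 0 &&
           strstr(header, " peeled") != NULL;
}

/*
 * Hypothetical lookup: copy the peeled object name of `refname` into
 * `out`. Returns 1 if a "^<sha1>" line follows the ref, 0 if the ref has
 * no such line, and -1 if the ref is absent or the file doesn't
 * advertise the "peeled" trait.
 */
static int find_peeled(const char *packed, const char *refname, char out[41])
{
    size_t reflen = strlen(refname);
    const char *line = packed;

    if (!header_has_peeled(packed))
        return -1;

    while (*line) {
        size_t len = strcspn(line, "\n");
        const char *next = line + len + (line[len] == '\n');

        /* Ref lines look like "<40 hex digits> <refname>". */
        if (len == 41 + reflen && line[40] == ' ' &&
            strncmp(line + 41, refname, reflen) == 0) {
            /* The peeled value, if any, is on the very next line. */
            if (next[0] == '^' && strcspn(next + 1, "\n") == 40) {
                memcpy(out, next + 1, 40);
                out[40] = '\0';
                return 1;
            }
            return 0;
        }
        line = next;
    }
    return -1;
}
```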

BTW, do you plan to submit this as a non-hack? I'd like to have it in
git.git, so if you're not going to, I could pick it up and clean it up
a bit. But I think it would be better coming from you.


Re: [PATCH 0/4] optimizing upload-pack ref peeling

2012-10-04 Thread Ævar Arnfjörð Bjarmason
On Thu, Oct 4, 2012 at 10:04 AM, Jeff King p...@peff.net wrote:
 On Thu, Oct 04, 2012 at 03:56:09AM -0400, Jeff King wrote:

   [1/4]: peel_ref: use faster deref_tag_noverify
   [2/4]: peel_ref: do not return a null sha1
   [3/4]: peel_ref: check object type before loading
   [4/4]: upload-pack: use peel_ref for ref advertisements

 I included my own timings in the final one, but my pathological case
 at the end is a somewhat made-up attempt to emulate what you described.
 Can you double-check that this series still has a nice impact on your
 real-world repository?

It does. Here's the best of five runs for each, all compiled with -g -O3:

v1.7.8:

$ time (echo  | ~/g/git/git-upload-pack . | pv >/dev/null)
3.49MB 0:00:00 [ 5.3MB/s]

real    0m0.660s
user    0m0.604s
sys     0m0.248s

master without your patches:

$ time (echo  | ~/g/git/git-upload-pack . | pv >/dev/null)
3.49MB 0:00:00 [10.2MB/s]

real    0m0.344s
user    0m0.300s
sys     0m0.172s

master with your patches:

$ time (echo  | ~/g/git/git-upload-pack . | pv >/dev/null)
3.49MB 0:00:00 [31.8MB/s]

real    0m0.113s
user    0m0.088s
sys     0m0.088s


Is anyone working on a next-gen Git protocol?

2012-10-07 Thread Ævar Arnfjörð Bjarmason
On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I'm creating a system where a lot of remotes constantly fetch from a
 central repository for deployment purposes, but I've noticed that even
 with a remote.$name.fetch configuration to only get certain refs a
 git fetch will still call git-upload-pack, which will provide a list
 of all references.

 It has been observed that the sender has to advertise megabytes of
 refs because it has to speak first before knowing what the receiver
 wants, even when the receiver is interested in getting updates from
 only one of them, or worse yet, when the receiver is only trying to
 peek the ref it is interested has been updated.

Has anyone started working on a next-gen Git protocol as a result of
this discussion? If not I thought I'd give it a shot if/when I have
time.

The current protocol is basically (S = Server, C = Client):

 S: Spew out first ref
 S: Advertisement of capabilities
 S: Dump of all our refs
 C/S: Declare wanted refs, negotiate with server
 S: Send pack to client, if needed
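Concretely, in the current protocol each line of that ref dump is framed as a pkt-line: four hex digits giving the line's total length (counting those four digits), then the payload. The first ref line carries the capability list after a NUL byte, and a "0000" flush packet ends the advertisement. A minimal framing sketch; frame_pkt_line() is a hypothetical name (git's own code lives in pkt-line.c), and the 65520-byte cap mirrors git's LARGE_PACKET_MAX.

```c
#include <stdio.h>
#include <string.h>

/* git caps a single pkt-line at this many bytes (LARGE_PACKET_MAX). */
#define MAX_PKT_LEN 65520

/*
 * Hypothetical helper: write `len` bytes of `payload` into `buf` as a
 * pkt-line, i.e. a 4-digit hex length that includes the 4 header bytes,
 * followed by the payload. Returns the framed length or -1 if it doesn't fit.
 */
static int frame_pkt_line(char *buf, size_t bufsz,
                          const char *payload, size_t len)
{
    size_t total = len + 4;
    char hdr[5];

    if (total > MAX_PKT_LEN || total > bufsz)
        return -1;
    snprintf(hdr, sizeof(hdr), "%04x", (unsigned)total);
    memcpy(buf, hdr, 4);
    memcpy(buf + 4, payload, len);
    return (int)total;
}
```

Every one of the tens of thousands of refs goes through this framing before the client has said anything about what it wants.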

And I thought I'd basically turn it into:

 C: Connect to server, declare what protocol we understand
 C: Advertisement of capabilities
 S: Advertisement of capabilities
 C/S: Negotiate what we want
 C/S: Same as v1, without the advertisement of capabilities, and maybe
don't dump refs at all

Basically future-proofing it by having the client say what it supports
to begin with along with what it can handle (like in HTTP).
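A minimal sketch of the capability step, assuming both sides send space-separated lists (the same shape as the capability list in today's first ref line) and the session then uses only what both sides listed. common_caps() and the want_refs capability name are hypothetical; nothing like this exists in git.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if `cap` appears as a whole word in the space-separated `list`. */
static int has_cap(const char *list, const char *cap)
{
    size_t len = strlen(cap);
    const char *p = list;

    while (*p) {
        size_t word = strcspn(p, " ");

        if (word == len && strncmp(p, cap, len) == 0)
            return 1;
        p += word;
        while (*p == ' ')
            p++;
    }
    return 0;
}

/*
 * Hypothetical negotiation step: keep the client's capabilities (in the
 * client's order) that the server also advertised. Names of 64+ bytes
 * are skipped. Returns a malloc'd string the caller must free.
 */
static char *common_caps(const char *client, const char *server)
{
    char *out = calloc(strlen(client) + 1, 1);
    const char *p = client;

    if (!out)
        return NULL;
    while (*p) {
        size_t word = strcspn(p, " ");
        char cap[64];

        if (word > 0 && word < sizeof(cap)) {
            memcpy(cap, p, word);
            cap[word] = '\0';
            if (has_cap(server, cap)) {
                if (*out)
                    strcat(out, " ");
                strcat(out, cap);
            }
        }
        p += word;
        while (*p == ' ')
            p++;
    }
    return out;
}
```

The result can never be longer than the client's list, which is why a buffer of that size is enough.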

Then in the negotiation phase the client and server would go back and
forth about what they want and how they want it. I'd planned to
implement something like:

C: want_refs refs/heads/*
S: OK to that
C: want_refs refs/tags/*
S: OK to that

Or:

C: want_refs refs/heads/master
S: OK to that
C: want_refs refs/tags/v*
S: OK to that

As a proof of concept (and also something that'll solve the issue I
had), but by adding an initial negotiation phase the protocol should
be open to any future extensions without making assumptions about the
client wanting to know about all of the server's refs, unlike the
current protocol.
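On the server side, honouring want_refs lines could be as simple as matching each ref against the client's patterns before advertising it. A sketch, assuming POSIX fnmatch(3) semantics for the patterns (the proposal doesn't pin down matching rules, and git has its own wildmatch); ref_is_wanted() and filter_advertised_refs() are hypothetical names.

```c
#include <fnmatch.h>
#include <stddef.h>

/* Return 1 if `refname` matches any of the client's want_refs patterns. */
static int ref_is_wanted(const char *refname,
                         const char *const *patterns, size_t npatterns)
{
    for (size_t i = 0; i < npatterns; i++)
        if (fnmatch(patterns[i], refname, 0) == 0)
            return 1;
    return 0;
}

/*
 * Keep only the wanted refs, compacting them to the front of `refs`
 * in their original order. Returns how many were kept.
 */
static size_t filter_advertised_refs(const char **refs, size_t nrefs,
                                     const char *const *patterns,
                                     size_t npatterns)
{
    size_t kept = 0;

    for (size_t i = 0; i < nrefs; i++)
        if (ref_is_wanted(refs[i], patterns, npatterns))
            refs[kept++] = refs[i];
    return kept;
}
```

With flags 0, '*' also matches '/', so refs/heads/* covers nested branch names too; passing FNM_PATHNAME would make it stop at slashes.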


[minor] two tests broken when run with a --root directory that's a symlink

2012-10-11 Thread Ævar Arnfjörð Bjarmason
These issues are minor. I noticed them because I test with /dev/shm/git
as the --root, which on Debian is symlinked to /run/..

$ rm -rf /tmp/{foo,bar}
$ mkdir /tmp/target; ln -s /tmp/target /tmp/link
$ prove ./t4035-diff-quiet.sh ./t9903-bash-prompt.sh :: --root=/tmp/target
./t4035-diff-quiet.sh ... ok
./t9903-bash-prompt.sh .. ok
All tests successful.
Files=2, Tests=64,  1 wallclock secs ( 0.04 usr  0.00 sys +  0.07 cusr  0.06 csys =  0.17 CPU)
Result: PASS
$ prove ./t4035-diff-quiet.sh ./t9903-bash-prompt.sh :: --root=/tmp/link
./t4035-diff-quiet.sh ... Dubious, test returned 1 (wstat 256, 0x100)
Failed 3/20 subtests
./t9903-bash-prompt.sh .. Dubious, test returned 1 (wstat 256, 0x100)
Failed 6/44 subtests

Everything else in the test suite passes with a --root that's a symlink.


Re: What's cooking in git.git (Oct 2012, #04; Thu, 11)

2012-10-12 Thread Ævar Arnfjörð Bjarmason
On Fri, Oct 12, 2012 at 1:12 AM, Junio C Hamano gits...@pobox.com wrote:

 * jk/peel-ref (2012-10-04) 4 commits
   (merged to 'next' on 2012-10-08 at 4adfa2f)
  + upload-pack: use peel_ref for ref advertisements
  + peel_ref: check object type before loading
  + peel_ref: do not return a null sha1
  + peel_ref: use faster deref_tag_noverify

  Speeds up git upload-pack (what is invoked by git fetch on the
  other side of the connection) by reducing the cost to advertise the
  branches and tags that are available in the repository.

FWIW I have this deployed at work for a user base of a few hundred
users. None of them have had any issues with it, and it does speed
things up a lot.


Re: push race

2012-10-15 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 15, 2012 at 11:14 AM, Angelo Borsotti
angelo.borso...@gmail.com wrote:
 Hello,

FWIW we have a lot of lemmings pushing to the same ref all the time at
$work, and while I've seen cases where:

 1. Two clients try to push
 2. They both get the initial lock
 3. One of them fails to get the secondary lock (I think updating the ref)

I've never seen cases where they clobber each other in #3 (and I would
have known from "dude, where's my commit that I just pushed?" reports).

So while we could fix git to make sure there's no race condition that
lets two clients both get the #2 lock, I haven't seen it cause actual
data issues from two clients both getting the #3 lock.

It might still happen in some cases. I recommend testing it with e.g.
lots of pushes in parallel with GNU Parallel.
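If, as guessed in step 3, the second lock is the one taken while updating the ref, the pattern is a compare-and-swap on the ref file. A simplified sketch, assuming loose refs stored as one-line files: create <ref>.lock with O_CREAT|O_EXCL, re-check that the ref still holds the expected old value, write the new value to the lock file and rename() it into place. cas_update_ref() is hypothetical and skips things git's real ref code handles (packed refs, reflogs, symrefs).

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Simplified compare-and-swap update of a loose ref file. Returns 0 on
 * success, -1 if someone else holds the lock or the ref no longer holds
 * `old_hex` (the "is at X but expected Y" case).
 */
static int cas_update_ref(const char *path, const char *old_hex,
                          const char *new_hex)
{
    char lock[4096], cur[128] = "", buf[128];
    FILE *f;
    int fd, len, ok;

    if (snprintf(lock, sizeof(lock), "%s.lock", path) >= (int)sizeof(lock))
        return -1;

    /* O_EXCL makes taking the lock atomic: only one writer can create it. */
    fd = open(lock, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1; /* lock held by another pusher */

    /* With the lock held, check the ref still has the value we expect. */
    f = fopen(path, "r");
    if (f) {
        if (!fgets(cur, sizeof(cur), f))
            cur[0] = '\0';
        fclose(f);
    }
    cur[strcspn(cur, "\n")] = '\0';
    if (strcmp(cur, old_hex) != 0) {
        close(fd);
        unlink(lock);
        return -1;
    }

    /* Write the new value to the lock file, then rename() it into place. */
    len = snprintf(buf, sizeof(buf), "%s\n", new_hex);
    ok = len > 0 && len < (int)sizeof(buf) &&
         write(fd, buf, (size_t)len) == (ssize_t)len;
    if (close(fd) < 0)
        ok = 0;
    if (!ok || rename(lock, path) < 0) {
        unlink(lock);
        return -1;
    }
    return 0;
}
```

Whichever pusher loses either can't create the lock or sees an unexpected old value, so it gets an error instead of silently overwriting the other update.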


Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)

2012-10-15 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto gitter.spi...@gmail.com wrote:
 Very clear analysis. Well written. Perhaps is it the time to update
 http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?

 Hope useful

 http://www.schneier.com/crypto-gram-1210.html

This would be concerning if the Git security model broke down as soon
as someone found a SHA1 collision, but it really wouldn't.

It's one thing to find *a* collision, it's quite another to:

 1. Find a collision for the sha1 of harmless.c which I know you use,
and replace it with evil.c.

 2. Somehow make evil.c compile so that it actually does something
useful and nefarious, and doesn't just make the C compiler puke.

If finding one arbitrary collision costs $43K in 2021 dollars,
getting past this point is going to take quite a large multiple of
$43K.

 3. Somehow inject the new evil object into your repository, or
convince you to re-clone it / clone it from somewhere you usually
wouldn't.

At some point in the early days of Git Linus went on a rant to this
effect either on this list or on the LKML.

Maybe it would be useful to include some of that instead?

It would be very interesting to see an analysis that deals with some
actual Git-related security scenarios, instead of something that just
assumes that if someone finds *any* SHA1 collision the sky is going to
fall.


Re: diff support for the Eiffel language?

2012-10-22 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 22, 2012 at 1:58 PM, Ulrich Windl
ulrich.wi...@rz.uni-regensburg.de wrote:
 However there's one little thing I noticed with git diff:
 The context lines (starting with @@) show the current function (in Perl and
 C), but they show the current feature clause in Eiffel (as opposed to the 
 expected current feature). I wonder how hard it is to fix it (Observed in git 
 1.7.7 of openSUSE 12.1).

See git.git's e90d065 for an example of adding a new diff pattern.

You could easily come up with a patch and send it to the list, though
it would probably be good to CC an Eiffel language list in case
there are syntax oddities you've missed.
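As background for such a patch: a funcname pattern is just a regex, and git roughly labels each hunk with the nearest line above it that matches. This C sketch uses POSIX <regex.h> to show that backwards lookup with a guessed Eiffel pattern that only matches declarations indented by exactly one tab, so "feature -- Access" clause lines are skipped. The pattern, the one-tab indentation convention and the function names are assumptions for illustration, not what git.git ships. To experiment without patching git, a pattern like this can go in diff.<driver>.xfuncname with *.e files mapped to that driver in .gitattributes.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical Eiffel funcname pattern: an attribute or routine declared
 * at exactly one tab of indentation, e.g. "<TAB>increment (n: INTEGER)" or
 * "<TAB>count: INTEGER". Unindented "feature -- ..." lines and routine
 * bodies indented two or more tabs do not match.
 */
static const char EIFFEL_FUNCNAME[] =
	"^\t[a-z][A-Za-z0-9_]*[ \t]*(\\(|:|$)";

/* Compile the pattern as a POSIX extended regex; returns 0 on success. */
int compile_eiffel_funcname(regex_t *re)
{
	return regcomp(re, EIFFEL_FUNCNAME, REG_EXTENDED | REG_NOSUB);
}

/*
 * Walk backwards from the line just above the hunk and return the index
 * of the first line matching the funcname regex, or -1 if none does.
 * That line is what would appear after the "@@ ... @@" marker.
 */
int find_funcname_line(const char *const *lines, int hunk_start,
		       const regex_t *funcname)
{
	for (int i = hunk_start - 1; i >= 0; i--)
		if (regexec(funcname, lines[i], 0, NULL, 0) == 0)
			return i;
	return -1;
}
```

With a hunk starting inside the body of increment, the lookup lands on the increment declaration rather than on the enclosing "feature -- Element change" clause, which is the behaviour Ulrich expected.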


Re: The config include mechanism doesn't allow for overwriting

2012-10-23 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 22, 2012 at 11:15 PM, Jeff King p...@peff.net wrote:
 On Mon, Oct 22, 2012 at 05:55:00PM +0200, Ævar Arnfjörð Bjarmason wrote:

 I was hoping to write something like this:

 [user]
 name = Luser
 email = some-defa...@example.com
 [include]
 path = ~/.gitconfig.d/user-email

 Where that file would contain:

 [user]
 email = local-em...@example.com

 The intent is that it would work as you expect, and produce
 local-em...@example.com.

 But when you do that git prints:

 $ git config --get user.email
  some-defa...@example.com
  error: More than one value for the key user.email: local-em...@example.com

 Ugh. The config code just feeds all the values sequentially to the
 callback. The normal callbacks within git will overwrite old values,
 whether from earlier in the file, from a file with lower priority (e.g.,
 /etc/gitconfig versus ~/.gitconfig), or from an earlier included. Which
 you can check with:

   $ git var GIT_AUTHOR_IDENT
   Luser local-em...@example.com 1350936694 -0400

 But git-config takes it upon itself to detect duplicates in its
 callback. Which is just silly, since it is not something that regular
 git would do. git-config should behave as much like the internal git
 parser as possible.

 I think config inclusion is much less useful when you can't clobber
 previously assigned values.

 Agreed. But I think the bug is in git-config, not in the include
 mechanism. I think I'd like to do something like the patch below, which
 just reuses the regular config code for git-config, collects the values,
 and then reports them. It does mean we use a little more memory (for the
 sake of simplicity, we store values instead of streaming them out), but
 the code is much shorter, less confusing, and automatically matches what
 regular git_config() does.

 It fails a few tests in t1300, but it looks like those tests are testing
 for the behavior we have identified as wrong, and should be fixed.

I think this patch looks good.

One other thing I think is worth clarifying (and I think should be
broken) is if you write a configuration like:

[foo]
bar = one
[foo]
bar = two
[foo]
bar = three

git-{config,var} -l will both give you:

foo.bar=one
foo.bar=two
foo.bar=three

And git config --get foo.bar will give you:

$ git config -f /tmp/test --get foo.bar
one
error: More than one value for the key foo.bar: two
error: More than one value for the key foo.bar: three

I think it would be better if the config mechanism just silently let
later keys overwrite earlier ones, like your patch does.

But in addition can we simplify things for the consumers of
git-{config,var} -l by only printing:

foo.bar=three

Or are there too many variables like include.path that can
legitimately appear more than once?
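The "last one wins" behaviour Jeff describes, together with the question of which keys are legitimately multi-valued, can be sketched as a toy config callback. This is not git's config.c: the struct, collect_config() and the choice of include.path as the only multi-valued key are assumptions made for illustration.

```c
#include <string.h>

#define MAX_INCLUDES 8

/* Toy stand-in for the state a config callback fills in. */
struct user_config {
	const char *email;                        /* single-valued */
	const char *include_paths[MAX_INCLUDES];  /* multi-valued */
	int nr_includes;
};

/*
 * Called once per key/value pair, in the order the pairs are read
 * (lower-priority files and earlier lines first). Pointers are borrowed,
 * so the caller must keep the strings alive.
 */
int collect_config(const char *key, const char *value,
		   struct user_config *cfg)
{
	if (!strcmp(key, "user.email")) {
		cfg->email = value;  /* last one wins: clobber silently */
		return 0;
	}
	if (!strcmp(key, "include.path")) {
		if (cfg->nr_includes >= MAX_INCLUDES)
			return -1;
		cfg->include_paths[cfg->nr_includes++] = value;  /* keep all */
		return 0;
	}
	return 0;  /* ignore keys we do not care about */
}
```

The catch for a collapsing "-l" is visible here: the callback has to know per key whether to clobber or accumulate, and the generic listing code has no such table.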


Re: [PATCH 8/8] git-config: use git_config_with_options

2012-10-24 Thread Ævar Arnfjörð Bjarmason
Yeah same here. Thanks for tackling this bug. Looking forward to using
the include mechanism for overriding user.email in future versions.


Has anyone tried to implement git grep --blame?

2013-05-28 Thread Ævar Arnfjörð Bjarmason
This would be so much more convenient if git-grep supported it natively:

$ git grep -n 'if \(0\)' | perl -pe's/([^:]+):([^:]+).*/`git blame -L $2,$2 $1`/se'
d18f76dc (Ævar Arnfjörð Bjarmason 2010-08-17 09:24:38 + 2278)   if (0)
65648283 (David Brown 2007-12-25 19:56:29 -0800 433) if (0) {

I.e. with all the coloring/pager interaction. Some Googling around
reveals people piping things to git-blame like that, but has anyone
made a stab at a smarter implementation (that would know to blame the
whole file if it had lots of hits, etc.).
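Until git-grep grows such an option, the per-hit half of the Perl one-liner above can be sketched in C: parse each "path:lineno:text" line from git grep -n and turn it into a single-line git blame -L invocation. The function names are hypothetical, the parser assumes paths contain no ':', and the command string is not shell-quoted, so this is for illustration only.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Split one line of `git grep -n` output ("path:lineno:text") into its
 * path and line number. Returns 0 on success, -1 on malformed input.
 * Assumes the path itself contains no ':'.
 */
int parse_grep_hit(const char *hit, char *path, size_t pathsz, long *lineno)
{
	const char *colon = strchr(hit, ':');
	char *end;
	size_t len;

	if (!colon)
		return -1;
	len = (size_t)(colon - hit);
	if (len == 0 || len >= pathsz)
		return -1;
	memcpy(path, hit, len);
	path[len] = '\0';

	*lineno = strtol(colon + 1, &end, 10);
	if (end == colon + 1 || *end != ':' || *lineno <= 0)
		return -1;
	return 0;
}

/* Format the blame command for one hit; the path is NOT shell-quoted. */
int format_blame_cmd(char *buf, size_t bufsz, const char *path, long lineno)
{
	int n = snprintf(buf, bufsz, "git blame -L %ld,%ld -- %s",
			 lineno, lineno, path);

	return (n < 0 || (size_t)n >= bufsz) ? -1 : 0;
}
```

The smarter implementation suggested above would instead group hits by path and blame each file once, which matters when a file has many matches.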

Don't know if I have time myself, but I'd be very pleased if someone
hacked that up.


Re: [RFC] Add a new email notification script to contrib

2012-11-08 Thread Ævar Arnfjörð Bjarmason
On Thu, Nov 8, 2012 at 1:17 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 11/08/2012 12:39 PM, Ævar Arnfjörð Bjarmason wrote:
 [...]

 I'm glad it's getting some use.  Thanks for the feedback.

 I'll test it out some more, the issues I've had with it so far in
 migrating from the existing script + some custom hacks we have to it
 have been:

  * Overly verbose default templates, easy to overwrite now. Might send
patches for some of them.

 The templating is currently not super flexible nor very well documented,
 but simple changes should be easy enough.  I mostly carried over the
 text explanations from the old post-receive-email script; it is true
 that they are quite verbose.

  * No ability to link to a custom gitweb, probably easy now.

 What do you mean by a custom gitweb?  What are the commitmail issues
 involved?

Just for the E-Mail to include a link to
http://gitweb.example.com/git/?h=$our_hash etc.

  * If someone only pushes one commit I'd like to only have one e-mail
with the diff, but if they push multiple commits I'd like to have a
summary e-mail and replies to that which have the patches.

It only seemed to support the latter mode, so you send out two
e-mails for pushing one commit.

 That's correct, and I've also thought about the feature that you
 described.  I think it would be pretty easy to implement; it is only not
 quite obvious to which mailing list(s?) such emails should be sent.

  * Ability to limit the number of lines, but not line length, that's
handy for some template repositories. Should be easy to add

 Should too-long lines be folded or truncated?  Either way, it should be
 pretty straightforward (Python even has a textwrap module that could be
 used).

 But in addition to that we have our own custom E-Mail notification
 scripts for:

  * People can subscribe to changes to certain files. I.e. if you
modify very_important.c we'll send an E-Mail to a more widely seen
review list.

  * Individuals can also edit a config file to watch individual files /
glob patterns of files, e.g. src/main.c or src/crypto*

 I implemented something like this back when we were using Subversion,
 but it didn't get much use and seemed like more configuration hassle
 than it was worth.

 If this were implemented and I asked for notifications about a
 particular file, and a particular reference change affects the file,
 what should I see?

 * The summary email for the reference change (yes/no)?

 * Detail emails for all commits within the reference change, or only for
 the individual commits that modify the file?

 * Should the detail emails include the full patch for the corresponding
 commit, or only the diffs affecting the file(s) of interest?  (The
 latter would start to get expensive, because the script would have to
 generate individual emails per subscriber instead of letting sendmail
 fan the emails out across all subscribers.)

I think it makes sense to send each individual patch e-mail to
everyone whose watchlist matches a path changed by that patch.
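The path-watching rule described here can be sketched with POSIX fnmatch(3): a patch e-mail goes to a subscriber if any path the patch touches matches any of their watch patterns. This is not git-multimail code (which is Python); the function name is hypothetical, and passing flags 0, so that '*' also matches '/' and "src/crypto*" covers "src/crypto/aes.c", is an assumption about the desired glob semantics.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Return 1 if any path touched by a commit matches any of one
 * subscriber's watch patterns, 0 otherwise.
 */
int commit_matches_watchlist(const char *const *paths, size_t nr_paths,
			     const char *const *patterns, size_t nr_patterns)
{
	for (size_t i = 0; i < nr_paths; i++)
		for (size_t j = 0; j < nr_patterns; j++)
			/* flags 0: '*' may match '/' as well */
			if (fnmatch(patterns[j], paths[i], 0) == 0)
				return 1;
	return 0;
}
```

Because the check is per commit and per subscriber but the e-mail body stays the same, the recipients can still be collected into one message per patch, avoiding the per-subscriber generation problem discussed below.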

That's how the internal e-mail script I'm hoping to replace with this
works.

That script *also* supports sending the whole batch of patches in a
push to anyone watching a file modified by any one of them, in case
you also want to see the other changes that were pushed together with
the files you're interested in.

But it doesn't generate individual E-Mails per recipient. I think that
way lies madness because as you rightly point out you have to start
worrying about the combinatorial nightmare of generating the E-mails
per subscriber.

 I think a good way to support that would be to have either a path to a
 config file with those watch specs, or a command so you could run git
 show ... on some repo users can push to.

 *How* this feature would be configured depends strongly on how the repo
 is hosted.  For example, gitolite has a well-developed scheme for how
 the server should be configured, and it would make sense to work
 together with that.  Other people might configure user access via LDAP
 or Apache.

 But overall it's very nice. I'll make some time to test it in my
 organization (with lots of commits and people reading commit e-mails).


Re: [RFC] Add a new email notification script to contrib

2012-11-08 Thread Ævar Arnfjörð Bjarmason
On Thu, Nov 8, 2012 at 5:24 PM, Marc Branchaud mbranch...@xiplink.com wrote:
 I'd like there to be one list that always gets everything, and the other
 lists should get subsets of the everything list.

Since it supports multiple mailing lists per category you can always
do (I can't remember the specific config keys, but it's not
important):

commits = all-git-activ...@example.com,git-comm...@example.com
tags    = all-git-activ...@example.com,git-t...@example.com

etc.


Re: [RFC v2] git-multimail: a replacement for post-receive-email

2013-01-29 Thread Ævar Arnfjörð Bjarmason
On Sun, Jan 27, 2013 at 9:37 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 A while ago, I submitted an RFC for adding a new email notification
 script to contrib [1].  The reaction seemed favorable and it was
 suggested that the new script should replace post-receive-email rather
 than be added separately, ideally with some kind of migration support.

I just want to say since I think this thread hasn't been getting the
attention it deserves: I'm all for this. I've used git-multimail and
it's a joy to configure and extend compared to the existing hacky
shellscript.

I'm not running it at $work yet because I still need to write some
extensions to port over some of our local hacks to the old
shellscript.

I fully support replacing the existing mailing script with
git-multimail, it's better in every way, and unlike the current script
has an active maintainer.


Re: [PATCH 2/7] Undocument deprecated alias 'push.default=tracking'

2013-01-31 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 23, 2012 at 10:37 AM, Matthieu Moy matthieu@imag.fr wrote:
 It's been deprecated since 53c4031 (Johan Herland, Wed Feb 16 2011,
 push.default: Rename 'tracking' to 'upstream'), so it's OK to remove it
 from documentation (even though it's still supported) to make the
 explanations more readable.

I don't think this was a good move for the documentation. Now every
time I find an old repo with push.default=tracking I end up
wondering what it was a synonym for again, and other users who don't
know what it does will just assume it's an invalid value or something.

We can't treat existing config values we still support like any other
deprecated feature. They still exist in files we have no control over,
and in people's brains who are reading man git-config trying to
remember what it meant.

 Signed-off-by: Matthieu Moy matthieu@imag.fr
 ---
 Feel free to squash into previous one if needed.

  Documentation/config.txt |1 -
  1 file changed, 1 deletion(-)

 diff --git a/Documentation/config.txt b/Documentation/config.txt
 index e38fab1..ddf6043 100644
 --- a/Documentation/config.txt
 +++ b/Documentation/config.txt
 @@ -1693,7 +1693,6 @@ push.default::
makes `git push` and `git pull` symmetrical in the sense that `push`
will update the same remote ref as the one which is merged by
`git pull`.
 -* `tracking` - deprecated synonym for `upstream`.
  * `current` - push the current branch to a branch of the same name.
+
The `current` and `upstream` modes are for those who want to
 --
 1.7.10.234.ge65dd.dirty



Re: [RFC] Should log --cc imply log --cc -p?

2013-02-05 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 4, 2013 at 5:36 PM, Junio C Hamano gits...@pobox.com wrote:
 git log/diff-files -U8 do not need -p to enable textual patches,
 for example.  It is I already told you that I want 8-line context.
 For what else, other than showing textual diff, do you think I told
 you that for? and replacing 8-line context with various other
 options that affect patch generation will give us a variety of end
 user complaints that would tell us that C) is more intuitive to
 them.

On a related note I think --full-diff should imply -p too.


Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)

2013-02-05 Thread Ævar Arnfjörð Bjarmason
On Wed, Jan 30, 2013 at 7:45 PM, Junio C Hamano gits...@pobox.com wrote:
 The third round.

  - Multi-valued variable transfer.hiderefs lists prefixes of ref
hierarchies to be hidden from the requests coming over the
network.

  - A configuration optionally allows uploadpack to accept fetch
requests for an object at the tip of a hidden ref.

 Elsewhere, we discussed delaying ref advertisement (aka expand
 refs), but it is an orthogonal feature and this hiding refs
 completely from advertisement series does not attempt to address.
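For readers unfamiliar with the series, the prefix rule behind transfer.hiderefs can be sketched like this. The function name is hypothetical, and the sketch assumes a prefix only matches at a path-component boundary (so "refs/pull" hides "refs/pull/1/head" but not "refs/pullreq"); check the series itself for the exact semantics.

```c
#include <string.h>

/*
 * Return 1 if `refname` falls under one of the hidden prefixes, matching
 * only at a '/' boundary or on the whole name, 0 otherwise.
 */
int ref_is_hidden_sketch(const char *refname, const char *const *prefixes,
			 size_t nr)
{
	for (size_t i = 0; i < nr; i++) {
		size_t len = strlen(prefixes[i]);

		if (!strncmp(refname, prefixes[i], len) &&
		    (refname[len] == '\0' || refname[len] == '/'))
			return 1;
	}
	return 0;
}
```

Note that this decision is made entirely on the server from its own configuration, which is the contrast drawn below with a client saying up front which refs it wants.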

I'm a bit late to this so sorry if this has been covered before.

In the initial draft of this series the rationale for it was reducing
the network cost while talking with a repository with tons of
refs[1]. But later you seem to have changed your mind, saying that
network bandwidth reduction of advertisement is a side effect of
clutter reduction, and not necessarily the primary goal[2].

Do you have any plans for something that *does* have the reduction of
network bandwidth as a primary goal?

In October I asked if anyone was working on a next-gen Git protocol[3]
that would provide clients with the ability to specify what refs they
wanted. You replied to me off-list saying "Yes".

Is this what you've been working on? If so, I misunderstood you: I
thought you were going to work on something that gave clients the
ability to specify what they wanted before the initial ref advertisement.

I'm still very keen to have that ability, so if you're not working on
it I just might give it a go.

1. http://article.gmane.org/gmane.comp.version-control.git/213951
2. http://article.gmane.org/gmane.comp.version-control.git/213984
3. http://article.gmane.org/gmane.comp.version-control.git/214025
4. http://thread.gmane.org/gmane.comp.version-control.git/207190


Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)

2013-02-05 Thread Ævar Arnfjörð Bjarmason
On Tue, Feb 5, 2013 at 5:03 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 Do you have any plans for something that *does* have the reduction of
 network bandwidth as a primary goal?

 Uncluttering gives reduction of bandwidth anyway, so I do not see
 much point in the distinction you seem to be making.

Doing this work wouldn't only give us a way to specify which refs we
want, but if done correctly would future-proof the protocol in case we
want to add any other extensions down the line in a
backwards-compatible fashion without having the server first spew all
his refs at us.

Anyway, an implementation that allows a client to say "I want X" is
simpler than one where a server has to anticipate in advance which X
the clients will ask for.

 Is this what you've been working on? Because if so I misunderstood you
 thinking you were going to work on something that gave clients the
 ability specify what they wanted before the initial ref advertisement.
 ...
 4. http://thread.gmane.org/gmane.comp.version-control.git/207190

 Who speaks first mentioned in 4. above, was primarily about
 delaying ref advertisement, which would be a larger protocol
 change.  Nobody seems to have attacked it since it was discussed,
 and I was tired of hearing nothing but complaints and whines.  This
 hiding refs series was done as a cheaper way to solve a related
 issue, without having to wait for the solution of delaying
 advertisement, which is an orthogonal issue.

Oh sure. I just wanted to know if you were working on delaying ref
advertisement, to avoid duplicating efforts. I had the impression you
were, given your earlier e-mail, but evidently we had a
misunderstanding.


Re: [PATCH v3 0/8] Hiding refs

2013-02-06 Thread Ævar Arnfjörð Bjarmason
On Wed, Feb 6, 2013 at 8:17 PM, Junio C Hamano gits...@pobox.com wrote:

Maybe this should be split up into a different thread, but:

 The upload-pack-2 service sits on a port different from today's
 [...].

I think there's a simpler way to do this, which is that:

 * New clients supporting v2 of the protocol send some piece of data
   that would break old servers.

 * If that fails, the new client goes "oh jeez, I guess it's an old
   server" and tries again with the old protocol.

 * The client then saves a date (or the version the server gave us)
   indicating that it tried the new protocol on that remote, and tries
   again some time later.
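The try-then-fall-back scheme in the three points above could look roughly like this on the client side. Everything here is hypothetical (the struct, the function names, and how a rejected v2 request is detected); it only illustrates caching the outcome per remote and re-probing after a configurable interval.

```c
#include <time.h>

/* Hypothetical per-remote memory of how the last v2 attempt went. */
struct remote_proto_state {
	int server_is_old;  /* 1 if the server rejected our v2 request */
	time_t checked_at;  /* when that was recorded */
};

enum proto_version { PROTO_V1 = 1, PROTO_V2 = 2 };

/* Returns 0 on success, non-zero if the server rejected the request. */
typedef int (*connect_fn)(void *ctx);

/* Skip the v2 probe only while a recorded "old server" result is fresh. */
enum proto_version pick_protocol(const struct remote_proto_state *st,
				 time_t now, time_t retry_after)
{
	if (st->server_is_old && now - st->checked_at < retry_after)
		return PROTO_V1;
	return PROTO_V2;
}

void record_v2_result(struct remote_proto_state *st, int v2_ok, time_t now)
{
	st->server_is_old = !v2_ok;
	st->checked_at = now;
}

/*
 * Try v2 first (unless we recently learned the server is old) and fall
 * back to v1 if it is rejected. Returns the version used, or -1.
 */
int connect_with_fallback(struct remote_proto_state *st, time_t now,
			  time_t retry_after, connect_fn try_v2,
			  connect_fn try_v1, void *ctx)
{
	if (pick_protocol(st, now, retry_after) == PROTO_V2) {
		int ok = try_v2(ctx) == 0;

		record_v2_result(st, ok, now);
		if (ok)
			return PROTO_V2;
	}
	return try_v1(ctx) == 0 ? PROTO_V1 : -1;
}
```

One weakness of this sketch is that any v2 failure, including a transient network error, gets recorded as "old server"; a real implementation would want to distinguish a protocol rejection from other errors.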

We already covered in previous discussions how this would be simpler
with the HTTP protocol, since you could just send an extra header
inviting the server to speak the new protocol.

But for the other transports we can just try the new protocol and
retry with the old one as a fallback if it doesn't work. That'll allow
us to gracefully migrate without needing to change the git:// port.

Besides, I think the vast majority of users are using Git via http://
or ssh://, where we can't just change the port, but even so making
people change the port when we could handle this more gracefully would
be a big PITA. Adding new firewall holes is often a big bureaucratic
nightmare in some organizations.


Re: [PATCH v3 0/8] Hiding refs

2013-02-07 Thread Ævar Arnfjörð Bjarmason
On Thu, Feb 7, 2013 at 1:16 AM, Jeff King p...@peff.net wrote:
 On Wed, Feb 06, 2013 at 04:12:10PM -0800, Junio C Hamano wrote:

 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

  I think there's a simpler way to do this, which is that:
 
   * New clients supporting v2 of the protocol send some piece of data
 that would break old servers.
 
   * If that fails the new client goes oh jeeze, I guess it's an old
 server, and try again with the old protocol.
 
   * The client then saves a date (or the version the server gave us)
 indicating that it tried the new protocol on that remote, tries
 again sometime later.

 For that to work, the new server needs to wait for the client to
 speak first.  How would that server handle old clients who expect to
 be spoken first?  Wait with a read timeout (no timeout is the right
 timeout for everybody)?

 If the new client can handle the old-style server's response, then the
 server can start blasting out refs (optionally after a timeout) and stop
 when the client interrupts with hey, wait, I can speak the new
 protocol. The server just has to include you can interrupt me in its
 capability advertisement (obviously it would have to send out at least
 the first ref with the capabilities before the timeout).

Can't this also be handled by passing an extra argument to
upload-pack? Whether you're talking http, ssh + normal shell, ssh +
git-shell or git://, you pass some argument that older servers would
reject, but that would cause newer servers that know about it to
wait for you to speak before blasting refs at you.

It would mean that older servers (e.g. an older git-shell) would reject
your initial connection, but you could just try again, and save away
info about that remote's version.


Re: inotify to minimize stat() calls

2013-02-14 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 8, 2013 at 10:10 PM, Ramkumar Ramachandra
artag...@gmail.com wrote:
 For large repositories, many simple git commands like `git status`
 take a while to respond.  I understand that this is because of large
 number of stat() calls to figure out which files were changed.  I
 overheard that Mercurial wants to solve this problem using inotify,
 but the idea bothers me because it's not portable.  Will Git ever
 consider using inotify on Linux?  What is the downside?

There's one relatively easy sub-task of this that I haven't seen
mentioned: Improving the speed of interactive rebase on large (as in
lots of checked out files) repositories.

That's the single biggest thing that bothers me when I use Git with
large repos, not the speed of git status. When you run git rebase -i
HEAD~100, rearrange some patches and save the TODO list, it takes, say,
0.5-1s for each patch to be applied, but at least 10x less than that
on a small repository. E.g. try this on linux-2.6.git vs. some small
project with a few dozen files.

I looked into this a long while ago and remember that rebase was
doing something like a git status for every commit that it made to
check the dirtiness.

This could be vastly improved by having an unsafe option to git-rebase
where it just assumes that the starting state plus whatever it wrote
out is the current state. That would break if someone changed a file
in your checkout behind your back during an interactive rebase, but the
common case of the user having exclusive access to the repo and
waiting for the rebase would be much faster.


Re: Can git restrict source files ?

2013-02-19 Thread Ævar Arnfjörð Bjarmason
On Tue, Feb 19, 2013 at 5:06 PM, Juan Pablo juanpablo8...@gmail.com wrote:
 I have a question, can i control the access to specific files or folders ?? I 
 need that some developers can't see some source files, thank you very much 
 for your time

No, but what you can do is split these up into different
repositories. E.g. where I work we have a puppet.git and a
secrets.git; the latter contains passwords and other secret data,
while the former just uses macros to include it and is accessible to
everyone.


[PATCH] help: show manpage for aliased command on git alias --help

2013-03-05 Thread Ævar Arnfjörð Bjarmason
Change the semantics of git alias --help to show the help for the
command alias is aliased to, instead of just saying:

`git alias' is aliased to `whatever'

E.g. if you have checkout aliased to co you won't get:

$ git co --help
`git co' is aliased to `checkout'

But will instead get the manpage for git-checkout. The behavior this
is replacing was originally added by Jeff King in 2156435. I'm
changing it because of this off-the-cuff comment on IRC:

14:27:43 @Tux git can be very unhelpful, literally:
14:27:46 @Tux $ git co --help
14:27:46 @Tux `git co' is aliased to `checkout'
14:28:08 @Tux I know!, gimme the help for checkout, please

And because I also think it makes more sense than showing you what the
thing is aliased to.

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 builtin/help.c |   12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/help.c b/builtin/help.c
index d1d7181..fdb3312 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -417,6 +417,7 @@ int cmd_help(int argc, const char **argv, const char *prefix)
 {
int nongit;
const char *alias;
+   const char *show_help_for;
enum help_format parsed_help_format;
load_command_list("git-", &main_cmds, &other_cmds);
 
@@ -449,20 +450,21 @@ int cmd_help(int argc, const char **argv, const char *prefix)
 
alias = alias_lookup(argv[0]);
if (alias && !is_git_command(argv[0])) {
-   printf_ln(_("`git %s' is aliased to `%s'"), argv[0], alias);
-   return 0;
+   show_help_for = alias;
+   } else {
+   show_help_for = argv[0];
}
 
switch (help_format) {
case HELP_FORMAT_NONE:
case HELP_FORMAT_MAN:
-   show_man_page(argv[0]);
+   show_man_page(show_help_for);
break;
case HELP_FORMAT_INFO:
-   show_info_page(argv[0]);
+   show_info_page(show_help_for);
break;
case HELP_FORMAT_WEB:
-   show_html_page(argv[0]);
+   show_html_page(show_help_for);
break;
}
 
-- 
1.7.10.4



Re: [PATCH] help: show manpage for aliased command on git alias --help

2013-03-05 Thread Ævar Arnfjörð Bjarmason
On Tue, Mar 5, 2013 at 5:16 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 Change the semantics of git alias --help to show the help for the
 command alias is aliased to, instead of just saying:

 `git alias' is aliased to `whatever'

 E.g. if you have checkout aliased to co you won't get:

 $ git co --help
 `git co' is aliased to `checkout'

 If you had lg aliased to log --oneline and you made

 $ git lg --help

 to give anything but

 'git lg' is aliased to `log --oneline'

 I would say that is a grave regression.

Good point. I'll fix that up.

No objection to the patch in principle though? I.e. not showing you
what the alias points to.
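One way to address Junio's objection, sketched as a hypothetical helper for builtin/help.c: only jump to another command's manpage when the alias is a single word. For an alias with arguments such as "log --oneline", or a "!" shell alias, cmd_help() would keep printing the existing "`git %s' is aliased to `%s'" message. The helper name and the exact rule are assumptions, not the final patch.

```c
#include <ctype.h>

/*
 * Hypothetical helper: return 1 if the alias is a single word such as
 * "checkout", so showing that command's manpage is unambiguous. Return 0
 * for aliases with arguments ("log --oneline"), shell aliases ("!..."),
 * or empty values, where the caller should keep printing what the alias
 * expands to.
 */
int alias_is_single_command(const char *alias)
{
	if (!alias || !*alias || *alias == '!')
		return 0;
	for (const char *p = alias; *p; p++)
		if (isspace((unsigned char)*p))
			return 0;
	return 1;
}
```

In cmd_help() this would guard the new branch: set show_help_for = alias only when the helper returns 1, and otherwise fall back to the old printf_ln() and return 0.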


Re: propagating repo corruption across clone

2013-03-24 Thread Ævar Arnfjörð Bjarmason
On Sun, Mar 24, 2013 at 7:31 PM, Jeff King p...@peff.net wrote:

 I don't have details on the KDE corruption, or why it wasn't detected
 (if it was one of the cases I mentioned above, or a more subtle issue).

One thing worth mentioning is this part of the article:

Originally, mirrored clones were in fact not used, but non-mirrored
clones on the anongits come with their own set of issues, and are more
prone to getting stopped up by legitimate, authenticated force pushes,
ref deletions, and so on – and if we set the refspec such that those
are allowed through silently, we don’t gain much. 

So the only reason they were even using --mirror was because they were
running into those problems with fetching.

So aside from the problems with --mirror I think we should have
something that updates your local refs to be exactly like they are on
the other end, i.e. deletes some, non-fast-forwards others etc.
(obviously behind several --force options and so on). But such an
option *wouldn't* accept corrupted objects.

That would give KDE and other parties a safe way to do exact repo
mirroring like this. It wouldn't protect them from someone maliciously
deleting all the refs in all the repos, but it would prevent FS
corruption from propagating.
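The exact-mirror ref update proposed here amounts to diffing two sorted ref lists. A minimal C sketch, with hypothetical types and names, that counts the creates, updates (including non-fast-forward ones) and deletes such a mode would perform. Rejecting corrupt objects is a separate step this sketch does not cover; git's existing fetch.fsckObjects and transfer.fsckObjects settings are the natural place to look for that half.

```c
#include <string.h>

struct ref_entry {
	const char *name;  /* e.g. "refs/heads/master" */
	const char *oid;   /* hex object id */
};

struct mirror_plan {
	int creates, updates, deletes;
};

/*
 * Merge-walk the local and remote ref lists (both sorted by name) and
 * count what an exact mirror would do: create refs missing locally,
 * update refs pointing elsewhere (possibly non-fast-forward), and delete
 * refs the remote no longer has.
 */
struct mirror_plan plan_exact_mirror(const struct ref_entry *local, size_t nl,
				     const struct ref_entry *remote, size_t nr)
{
	struct mirror_plan plan = {0, 0, 0};
	size_t i = 0, j = 0;

	while (i < nl || j < nr) {
		int cmp;

		if (i == nl)
			cmp = 1;   /* only remote refs left */
		else if (j == nr)
			cmp = -1;  /* only local refs left */
		else
			cmp = strcmp(local[i].name, remote[j].name);

		if (cmp < 0) {
			plan.deletes++;  /* local-only: gone on the remote */
			i++;
		} else if (cmp > 0) {
			plan.creates++;  /* remote-only: new ref */
			j++;
		} else {
			if (strcmp(local[i].oid, remote[j].oid))
				plan.updates++;
			i++;
			j++;
		}
	}
	return plan;
}
```

Keeping the plan separate from the object transfer makes it easy to put the deletes and non-fast-forward updates behind the --force style options mentioned above.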


Re: [ANNOUNCE] Sharness - Test library derived from Git

2012-07-17 Thread Ævar Arnfjörð Bjarmason
On Tue, Jul 17, 2012 at 10:06 AM, Mathias Lafeldt
mathias.lafe...@gmail.com wrote:
 I've been wanting to announce Sharness [1] on this list for quite some
 time now, but never managed to do so. With the release of version
 0.2.4, I think it's about time to change that.

 Sharness is a shell-based test harness library. It was derived from
 the Git project and is basically a generalized and stripped-down
 version of t/test-lib.sh (I basically removed all things specific to
 Git). So when you know how to write tests for Git, it should be very
 familiar.

Nice, I thought about doing something like this myself but never had the time.

Perhaps, to avoid duplication, we could switch Git's test suite to this
and keep the Git-specific functions in some other file.

Do you think that would be sensible, and would you be willing to
submit patches for that?


Re: Centralized git

2012-07-31 Thread Ævar Arnfjörð Bjarmason
On Tue, Jul 31, 2012 at 3:08 PM, Javier Domingo javier...@gmail.com wrote:
 Network, in this case is cheaper. The thing is that If I commit
 frequently, will have plenty of GBs of history, that nearly for sure I
 won't use. I just need to have other people's work to merge. But I
 want to think in Git style, I am pretty accustomed to that way of
 doing things. That is why I sent this mail here.

 The idea is that if I modify 700MBs of video, with 20 commits I would
 get in 21GB. And making a pull would be... just even more horrible
 than anything. That is why I need to have also last checkouts filter.
 Just download branch's HEADs.

You're obviously aware of git-annex; is there any reason you can't
just use that?

That would give you what you want: you'd have a moving window of
current files, and then you'd delete old files as they become
unneeded.


Why doesn't git-fetch obey -c remote.origin.url on the command-line?

2014-06-13 Thread Ævar Arnfjörð Bjarmason
On a git built from the master branch just now:

$ ./git config remote.origin.url
https://code.google.com/p/git-core/
$ ./git -c remote.origin.url=git://git.sourceforge.jp/gitroot/git-core/git.git config remote.origin.url
git://git.sourceforge.jp/gitroot/git-core/git.git
$ GIT_TRACE=1 ./git -c remote.origin.url=git://git.sourceforge.jp/gitroot/git-core/git.git fetch 2>&1 | head -n 2
trace: built-in: git 'fetch'
trace: run_command: 'git-remote-https' 'origin' 'https://code.google.com/p/git-core/'

I'd expect this to try to fetch from the remote.origin.url I specified
on the command-line, but for some reason fetch doesn't pick that up.
Isn't this a bug?

The use case for this is a cron script that pulls repositories over
http, while developers who occasionally use those repositories as
work directories should be able to pull from and push to them
transparently.

I know about remote.origin.pushurl, but I'd prefer pulls to also be
over ssh in those cases, because then you don't have to worry about
proxy settings (which differ between the devs and that automated script).

I could fix this, but I thought I'd first ask whether this should be
considered a bug. I haven't dug into this yet, but I think that
configuration passed via the -c option should *always* override any
other config Git may get from elsewhere.
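The behaviour being asked for can be modelled with a short sketch. Git reads config from the system, global and repository files and applies `-c` values last, so for an ordinary single-valued key the last value seen wins. Some keys are read as lists instead, and a caller that uses the first entry of such a list would ignore a `-c` value. Whether that is what happens with `remote.origin.url` in `git fetch` is an assumption to verify, not something established above; the Python function names are hypothetical.

```python
from typing import Dict, List, Tuple

# One config layer: (key, value) pairs in the order they appear.
Layer = List[Tuple[str, str]]


def collect(layers: List[Layer]) -> Dict[str, List[str]]:
    """Gather every value for each key, lowest-precedence layer first.

    Assumed layer order: system, global, repository, then `-c` overrides.
    """
    values: Dict[str, List[str]] = {}
    for layer in layers:
        for key, value in layer:
            values.setdefault(key, []).append(value)
    return values


def last_wins(values: Dict[str, List[str]], key: str) -> str:
    """Single-valued read: the last value seen wins, so `-c` overrides."""
    return values[key][-1]


def first_of_list(values: Dict[str, List[str]], key: str) -> str:
    """Multi-valued read where the caller uses the first entry: `-c` only appends."""
    return values[key][0]


repo_layer = [("remote.origin.url", "https://code.google.com/p/git-core/")]
cli_layer = [("remote.origin.url", "git://git.sourceforge.jp/gitroot/git-core/git.git")]
values = collect([repo_layer, cli_layer])
print(last_wins(values, "remote.origin.url"))      # the -c URL
print(first_of_list(values, "remote.origin.url"))  # the repository URL
```

Under the first semantics `-c` behaves as the mail expects; under the second it silently has no effect, which matches the symptom in the trace.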


Re: Enhancement Request: locale git option

2014-12-04 Thread Ævar Arnfjörð Bjarmason
On Thu, Dec 4, 2014 at 10:55 AM, Jeff King p...@peff.net wrote:
 On Thu, Dec 04, 2014 at 09:29:04AM +0100, Torsten Bögershausen wrote:

 How about
 alias git='LANGUAGE=de_DE.UTF-8 git'
 in your ~/.profile ?
 (Of course you need to change de to the language you want )

 Besides being awkward in scripts (which will not respect the alias and
 use a different language!), that variable will also be inherited by
 programs git spawns. So the editor, for example, may end up in the wrong
 language.

 I think respecting core.locale would make sense (probably the change
 would go into git_setup_gettext(), but you may have to fight with the
 setup code over looking at config so early in the process).

I think we should just stick to the standard *nix way of doing this:
Tell people to set their locale in their environment.

If someone's having this issue, it's also happening for all the
binutils and any other command-line and GUI programs they use, unless
they override it in the standard way, by setting the relevant LC_*
environment variables.

If you want Git in English, create an alias that overrides its locale
to be C. If you want the editor it spawns to be in some other language,
alias that to something that explicitly sets LC_* for that editor.
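The per-program approach works because environment variables are copied into each child process. The Python sketch below runs a single command with `LC_ALL=C` without touching the parent environment; the helper names are hypothetical, and anything that command spawns (such as an editor) inherits the same value unless it is overridden again, which is exactly why the editor needs its own explicit LC_* setting.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


def locale_env(locale: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with LC_ALL forced to `locale`."""
    env = dict(os.environ if base is None else base)
    env["LC_ALL"] = locale  # LC_ALL takes precedence over LANG and the other LC_* variables
    return env


def run_with_locale(cmd: List[str], locale: str) -> subprocess.CompletedProcess:
    """Run one command under `locale`; the caller's own environment is unchanged."""
    return subprocess.run(cmd, env=locale_env(locale), capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child process that reports the LC_ALL it inherited.
    child = [sys.executable, "-c", "import os; print(os.environ['LC_ALL'])"]
    print(run_with_locale(child, "C").stdout.strip())  # C
```

With Git installed, `run_with_locale(["git", "status"], "C")` would be the equivalent of the alias.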

Maybe I'm being overzealous about this (especially with the "I
implemented this" blinders on), but let's not have Git set the
precedent for other *nix programs that they should all come up with
some custom way to override locales. That's something to be done at
the OS locale library level, which we use.

 However, I think the original question is not one of localizing git, but
 rather of having it _not_ localized (avoiding the German translations).
 There is a hack you can use for that, which is to set
 GIT_TEXTDOMAINDIR to something nonsensical (like /), which will mean
 git cannot find the .po files, and just uses the builtin messages.

You can, but the fact that that works is an internal implementation
detail we shouldn't document or support.


Re: Enhancement Request: locale git option

2014-12-04 Thread Ævar Arnfjörð Bjarmason
On Thu, Dec 4, 2014 at 5:12 PM, Michael J Gruber
g...@drmicha.warpmail.net wrote:
 Ævar Arnfjörð Bjarmason schrieb am 04.12.2014 um 16:49:
 On Thu, Dec 4, 2014 at 10:55 AM, Jeff King p...@peff.net wrote:
 On Thu, Dec 04, 2014 at 09:29:04AM +0100, Torsten Bögershausen wrote:

 How about
 alias git='LANGUAGE=de_DE.UTF-8 git'
 in your ~/.profile ?
 (Of course you need to change de to the language you want )

 Besides being awkward in scripts (which will not respect the alias and
 use a different language!), that variable will also be inherited by
 programs git spawns. So the editor, for example, may end up in the wrong
 language.

 I think respecting core.locale would make sense (probably the change
 would go into git_setup_gettext(), but you may have to fight with the
 setup code over looking at config so early in the process).

 I think we should just stick to the standard *nix way of doing this:
 Tell people to set their locale in their environment.

 If someone's having this issue it's also happening for all the
 binutils, and any other command-line and GUI program they use, unless
 they override using the standard way of doing so, by setting the
 relevant LC_* environment variables.

 If you want Git in English then create an alias to override its locale
 to be C, if you want the editor it spawns to be in some other language
 alias that to something that explicitly sets LC_* for that editor.

 Maybe I'm being overzealous about this (especially with the "I
 implemented this" blinders on), but let's not have Git set the
 precedent for other *nix programs that they all should come up with
 some custom way to override locales, that's something to be done at
 the OS locale library level, which we use.

 However, I think the original question is not one of localizing git, but
 rather of having it _not_ localized (avoiding the German translations).
 There is a hack you can use for that, which is to set
 GIT_TEXTDOMAINDIR to something nonsensical (like /), which will mean
 git cannot find the .po files, and just uses the builtin messages.

 You can, but the fact that that works is an internal implementation
 detail we shouldn't document or support.


 The main issue at hand is really that we have localised git but not its
 man pages. Even if you understand English, the man pages don't help you
 at all if you can't connect the technical terms used there to their
 localised counterparts in git's messages. (NO_GETTEXT=y is my solution.)

 That is one of the many reasons why I proposed to have a dictionary of
 the main technical terms for each language before we even localise git
 in that language. In an ideal world, we would provide a simple solution
 for looking these terms up both ways. I don't think we're going to have
 localised man pages any time soon, are we?

I think that's a great idea, and one that's only blocked on someone
(hint hint) sending patches for it.

It would be neat-o to have something to make translating the docs
easier, i.e. PO files for sections of the man pages. There are tools to
help with that which we could use.

But there's no reason for us not to have translated glossaries in the meantime.


Re: Git's Perl scripts can fail if user is configured for perlbrew

2014-12-29 Thread Ævar Arnfjörð Bjarmason
On Sun, Dec 28, 2014 at 11:36 PM, Randy J. Ray rj...@blackperl.com wrote:
 I use git on MacOS via homebrew (http://brew.sh/), and a custom Perl
 installation built and managed via perlbrew (http://perlbrew.pl/). At some
 point, commands like "git add -i" broke. I say "at some point" because I'm
 not a git power-user and I only just noticed it this week.

 I am running Git 2.2.1 with a perlbrew'd Perl 5.20.1. When I would run git
 add -i (or git add -p), it would immediately die with a signal 11. Some
 poking around showed that those git commands that are implemented as Perl
 scripts run under /usr/bin/perl, and also prefix some directories to the
 module search-path. The problem stems from the fact that, when you are using
 perlbrew, you also have the PERL5LIB environment variable set. The contents
 of it lie between the git-provided paths and the default contents of @INC.
 When the Git module is loaded, it (eventually) triggers a load of
 List::Util, whose C-level code fails to load because of a version mismatch;
 you got List::Util from the paths in PERL5LIB, but it doesn't match the
 version of perl from /usr/bin/perl.

 After poking around and trying a few different things, I have found that
 using the following line in place of #!/usr/bin/perl solves this problem:

 #!/usr/bin/env perl

 This can be done by defaulting PERL_PATH to /usr/bin/env perl in Makefile.

 I don't know enough about the overall git ecosystem to know if this would
 have an adverse effect on anything else (in particular, Windows
 compatibility, but then Windows probably isn't having this issue in the
 first place).

 I could just create and mail in the one-line patch for this, but I thought
 it might be better to open it up for some discussion first?

[CC'd the perlbrew author]

This is a bit of a tricky issue.

Using whatever perl is defined in the environment is just as likely to
break. In general, the build process tries to pick these assets at
compile-time. Imagine you're experimenting with some custom perl
version and now Git inexplicably breaks.

It's better if Git detects a working perl when you compile it and
sticks with that, which is why we use /usr/bin/perl by default.

When you're setting PERL5LIB you're indicating to whatever perl
interpreter you're going to run that that's where it should pick
up its modules. IMO the way perlbrew does this is broken: instead of
setting PATH + PERL5LIB globally for your login shell, it should set
the PATH, and then the perl in that path should be a pointer to some
small shell script that sets PERL5LIB for *its* perl.

I don't know what the right tradeoff here is, but I think it would be
just as sensible to unset PERL5LIB in our own perl scripts + modules.
It would make live monkeypatching harder when you wanted to do it, but
we could always add a GITPERL5LIB or something...
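The "unset PERL5LIB" option could look roughly like the sketch below, written in Python for illustration rather than as the Perl or Makefile change Git itself would need. `GITPERL5LIB` is only the name floated above and is not an existing Git variable; `perl_env` is a hypothetical helper whose result would be passed as the environment when starting the perl that Git was built with.

```python
import os
from typing import Dict, Mapping, Optional


def perl_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build the environment for one of Git's Perl scripts.

    Drops PERL5LIB (which may point at modules compiled for a different perl,
    e.g. one managed by perlbrew) and, if the hypothetical GITPERL5LIB is set,
    uses that instead so deliberate overrides remain possible.
    """
    env = dict(os.environ if base is None else base)
    env.pop("PERL5LIB", None)
    git_lib = env.get("GITPERL5LIB")
    if git_lib:
        env["PERL5LIB"] = git_lib
    return env
```

A caller would then do something like `subprocess.run(["/usr/bin/perl", script], env=perl_env())`, where `script` stands in for whichever Perl-based command is being launched.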


Re: Git's Perl scripts can fail if user is configured for perlbrew

2014-12-29 Thread Ævar Arnfjörð Bjarmason
On Mon, Dec 29, 2014 at 10:57 PM, Randy J. Ray rj...@blackperl.com wrote:
 On 12/29/14, 7:40 AM, Torsten Bögershausen wrote:

 Having problems with different perl installations is not an unknown
 problem
 in Git, I would say.

 And Git itself is prepared to handle this situation:

 In Makefile I can read:
 # Define PERL_PATH to the path of your Perl binary (usually
 /usr/bin/perl).

 (What Git can not decide is which perl it should use, the one pointed out
 by $PATH or /usr/bin/perl.)

 What does
 type perl say ?

 And what happens when you build and install Git like this:
 PERL_PATH=/XX/YY/perl make install

 ---
 Are you thinking about changing
 ifndef PERL_PATH
 PERL_PATH = /usr/bin/perl
 endif
 -- into --
 ifndef PERL_PATH
 PERL_PATH = $(shell which perl)
 endif
 ---

 At first glance that could make sense, at least to me.


 The problem in this case is the Perl being used at run-time, not build-time.
 The building of git is done by the homebrew project in this case, so I don't
 have direct control over it.

Correct, but we don't change /usr/bin/perl at runtime; we hardcode
that at compile-time.

Similarly, we could hardcode PERL5LIB at compile-time, but we don't. If
we did, you wouldn't have this problem.

I.e. the problem is that we're using the system-provided perl with a
custom PERL5LIB set for the benefit of a non-system provided perl
installed after you built Git (or built in a different environment...)


Re: [ANNOUNCE] Git v2.3.0

2015-02-06 Thread Ævar Arnfjörð Bjarmason
On Thu, Feb 5, 2015 at 11:53 PM, Junio C Hamano gits...@pobox.com wrote:
 The latest feature release Git v2.3.0 is now available at the
 usual places.

 [...]
  * Git 2.0 was supposed to make the simple mode for the default of
git push, but it didn't.
(merge 00a6fa0 jk/push-simple later to maint).

Maybe I'm misunderstanding what this does, but changing the push
default was *the* backwards compatibility breakage we advertised for
v2.0.0[1].

A lot of users (including myself) upgraded to v2.0.0 very carefully,
making sure that the common "git push" patterns our users relied on
weren't broken.

But apparently that change is only taking effect now. If so, I
think this needs to be advertised a lot more prominently than being
buried among other miscellaneous fixes in the changelog.

1. https://git.kernel.org/cgit/git/git.git/tree/Documentation/RelNotes/2.0.0.txt


Re: Git messes up 'ø' character

2015-01-20 Thread Ævar Arnfjörð Bjarmason
On Tue, Jan 20, 2015 at 10:23 PM, Noralf Trønnes no...@tronnes.org wrote:
 Den 20.01.2015 21:45, skrev Ævar Arnfjörð Bjarmason:

 On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes no...@tronnes.org wrote:

 Den 20.01.2015 21:07, skrev Torsten Bögershausen:

 On 2015-01-20 20.46, Noralf Trønnes wrote:
 could it be that your ø is not encoded as UTF-8,
 but in ISO-8859-15 (or so)

 $ git log -1
 commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
 Author: Noralf TrF8nnes no...@tronnes.org

 What does
 git config -l | grep Noralf | xxd
 say ?

 $ git config -l | grep Noralf | xxd
 000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
 010: 2054 72f8 6e6e 6573 0a                   Tr.nnes.

 $ file ~/.gitconfig
 /home/pi/.gitconfig: ISO-8859 text

 What's happened here is that:

   1. You've authored your commit in ISO-8859-1
   2. Git itself has no place for the encoding of the author name in the
 commit object format
   3. git-format-patch has a --compose-encoding which I think would sort
 this out if you set it to ISO-8859-1, but it defaults to UTF-8
   4. Your patch is actually a ISO-8859-1 byte sequence, but is
 advertised as UTF-8
   5. You end up with a screwed-up commit

 You could work around this, but I suggest just joining the 21st
 century and working exclusively in UTF-8, it makes things much easier,
 speaking as someone with 3x more non-ASCII characters in his name
 than you :)


 Ok, then the question is: How do I switch to UTF-8?

 To me it seems I'm already using it:
 $ locale charmap
 UTF-8

Your .gitconfig has an ISO-8859-1 string, from an earlier mail of yours:

 $ git config -l | grep Noralf | xxd
 000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
 010: 2054 72f8 6e6e 6573 0a                   Tr.nnes.

On a system configured for UTF-8 this would be:

$ echo Noralf Trønnes | xxd
000: 4e6f 7261 6c66 2054 72c3 b86e 6e65 730a  Noralf Tr..nnes.

Note the f8 v.s. c3 b8.
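The same comparison can be reproduced without xxd. This Python sketch (Python 3.8+ for `bytes.hex` with a separator) encodes the name both ways so the f8 versus c3 b8 difference is visible side by side:

```python
name = "Noralf Trønnes"

utf8 = name.encode("utf-8")      # what a UTF-8 configured system writes
latin1 = name.encode("latin-1")  # what an ISO-8859-1 configured editor or terminal writes

print(utf8.hex(" "))    # 4e 6f 72 61 6c 66 20 54 72 c3 b8 6e 6e 65 73
print(latin1.hex(" "))  # 4e 6f 72 61 6c 66 20 54 72 f8 6e 6e 65 73
```

Only the "ø" differs: one byte in Latin-1, two bytes in UTF-8.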


Re: Git messes up 'ø' character

2015-01-20 Thread Ævar Arnfjörð Bjarmason
On Tue, Jan 20, 2015 at 10:38 PM, Noralf Trønnes no...@tronnes.org wrote:
 Den 20.01.2015 22:26, skrev Ævar Arnfjörð Bjarmason:

 On Tue, Jan 20, 2015 at 10:23 PM, Noralf Trønnes no...@tronnes.org
 wrote:

 Den 20.01.2015 21:45, skrev Ævar Arnfjörð Bjarmason:

 On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes no...@tronnes.org
 wrote:

 Den 20.01.2015 21:07, skrev Torsten Bögershausen:

 On 2015-01-20 20.46, Noralf Trønnes wrote:
 could it be that your ø is not encoded as UTF-8,
 but in ISO-8859-15 (or so)

 $ git log -1
 commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
 Author: Noralf TrF8nnes no...@tronnes.org

 What does
 git config -l | grep Noralf | xxd
 say ?

 $ git config -l | grep Noralf | xxd
 000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
 010: 2054 72f8 6e6e 6573 0a                   Tr.nnes.

 $ file ~/.gitconfig
 /home/pi/.gitconfig: ISO-8859 text

 What's happened here is that:

1. You've authored your commit in ISO-8859-1
2. Git itself has no place for the encoding of the author name in the
 commit object format
3. git-format-patch has a --compose-encoding which I think would sort
 this out if you set it to ISO-8859-1, but it defaults to UTF-8
4. Your patch is actually a ISO-8859-1 byte sequence, but is
 advertised as UTF-8
5. You end up with a screwed-up commit

 You could work around this, but I suggest just joining the 21st
 century and working exclusively in UTF-8, it makes things much easier,
 speaking as someone with 3x more non-ASCII characters in his name
 than you :)

 Ok, then the question is: How do I switch to UTF-8?

 To me it seems I'm already using it:
 $ locale charmap
 UTF-8

 Your .gitconfig has an ISO-8859-1 string, from an earlier mail of yours:

 $ git config -l | grep Noralf | xxd
 000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
 010: 2054 72f8 6e6e 6573 0a                   Tr.nnes.

 On a system configured for UTF-8 this would be:

 $ echo Noralf Trønnes | xxd
 000: 4e6f 7261 6c66 2054 72c3 b86e 6e65 730a  Noralf Tr..nnes.

 Note the f8 v.s. c3 b8.


 Yes:
 $ echo Noralf Trønnes | xxd
 000: 4e6f 7261 6c66 2054 72f8 6e6e 6573 0a    Noralf Tr.nnes.

 Is there a command I can run that shows that I'm using ISO-8859-1 ?
 I need something to google with, my previous search only gave locale stuff,
 which seems fine.

What does this give you? This is UTF-8:

$ echo "git commit --author=Noralf Trønnes <no...@tronnes.org>" | xxd
000: 6769 7420 636f 6d6d 6974 202d 2d61 7574  git commit --aut
010: 686f 723d 4e6f 7261 6c66 2054 72c3 b86e  hor=Noralf Tr..n
020: 6e65 7320 3c6e 6f74 726f 4074 726f 6e6e  nes <notro@tronn
030: 6573 2e6f 7267 3e0a                      es.org>.

To see if you're using UTF-8, just look at the bytes for the
non-ASCII characters you're using and check if they're valid UTF-8.
E.g. you can check this out:
http://en.wikipedia.org/wiki/%C3%98#Computers

It shows that the UTF-8 hex encoding is C3 B8, while the Latin-1 one
is F8. You're emitting F8; I'm emitting C3 B8.
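Checking the bytes rather than the locale settings can be scripted. The Python sketch below reports the first byte of a file (by default `~/.gitconfig`) that is not valid UTF-8; the helper names are hypothetical, and a failed check only suggests Latin-1, since invalid UTF-8 does not by itself identify which legacy encoding was used.

```python
from pathlib import Path
from typing import Optional


def first_invalid_utf8(data: bytes) -> Optional[int]:
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


def check_file(path: Path) -> str:
    data = path.read_bytes()
    offset = first_invalid_utf8(data)
    if offset is None:
        return f"{path}: valid UTF-8"
    return f"{path}: not UTF-8 (byte {data[offset]:#04x} at offset {offset})"


if __name__ == "__main__":
    config = Path.home() / ".gitconfig"
    if config.exists():
        print(check_file(config))
```

On the config shown earlier in the thread, this would flag the 0xf8 byte in the user.name line.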


Re: Git messes up 'ø' character

2015-01-20 Thread Ævar Arnfjörð Bjarmason
On Tue, Jan 20, 2015 at 10:20 PM, Jeff King p...@peff.net wrote:
 On Tue, Jan 20, 2015 at 09:45:46PM +0100, Ævar Arnfjörð Bjarmason wrote:

 What's happened here is that:

  1. You've authored your commit in ISO-8859-1
  2. Git itself has no place for the encoding of the author name in the
 commit object format

 Is (2) right? The encoding header in a commit object should apply not
 just to the commit message, but also to the author (and committer) name.

 I think the real problem is simply that it defaults to UTF-8, but he is
 giving it iso-8859-1 characters. Setting i18n.commitEncoding should fix
 it.

True, I forgot about that setting.

 -Peff

 PS If you try experimenting with this, you may fall afoul of 08a94a1
(commit/commit-tree: correct latin1 to utf-8, 2012-06-28), which will
silently correct Latin1 characters into UTF-8 (when the commit
message is expected to be in UTF-8, of course). So it actually
_should_ just work under modern gits, but only for Latin1.
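The correction mentioned in that PS can be approximated as "decode as UTF-8, and reinterpret any byte that is not part of a valid sequence as a Latin-1 character". The Python sketch below illustrates the idea with a custom codec error handler; it is not Git's C implementation, and the handler name `latin1-fallback` is made up. It only helps for Latin-1 because every Latin-1 byte value equals its Unicode code point; bytes from other legacy encodings would be turned into the wrong characters.

```python
import codecs


def _latin1_fallback(err: UnicodeError):
    """Replace each undecodable byte with the Latin-1 character of the same value."""
    if not isinstance(err, UnicodeDecodeError):
        raise err
    bad = err.object[err.start:err.end]
    return bad.decode("latin-1"), err.end


codecs.register_error("latin1-fallback", _latin1_fallback)


def to_utf8(data: bytes) -> bytes:
    """Keep valid UTF-8 as-is and upgrade stray Latin-1 bytes to UTF-8."""
    return data.decode("utf-8", errors="latin1-fallback").encode("utf-8")


print(to_utf8(b"Noralf Tr\xf8nnes"))              # b'Noralf Tr\xc3\xb8nnes'
print(to_utf8("Noralf Trønnes".encode("utf-8")))  # unchanged
```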


Re: [ANNOUNCE] Git v2.3.0-rc0

2015-01-20 Thread Ævar Arnfjörð Bjarmason
On Tue, Jan 13, 2015 at 12:57 AM, Junio C Hamano gits...@pobox.com wrote:
 An early preview release Git v2.3.0-rc0 is now available for
 testing at the usual places.
[...]
 Jeff King (38):
[...]
   parse_color: refactor color storage
[...]

I've had this in my .gitconfig since 2010, and it was broken by Jeff's
v2.1.3-24-g695d95d:

;; Don't be so invasive about coloring ^M when I'm editing files that
;; are supposed to have \r\n.
[color diff]
   whitespace = 0

To test this, replace \n with \r\n in a file. Before this patch you could do:

git -c color.diff.whitespace=0 show

And just get:

[red]-[/red]
[green]+[/green]

As opposed to:

git -c color.diff.whitespace=1 show

Which gives you:

[red]-
[green]+[/green][red]^M[/red]

Now that just produces:

error: invalid color value: 0
fatal: bad config variable 'color.diff.whitespace' in file
'/home/avar/.gitconfig' at line 16

Maybe breaking this is OK (but I can't find what the replacement is),
but neither the config documentation nor the changelog mentions
breaking existing config settings.


Re: Git messes up 'ø' character

2015-01-20 Thread Ævar Arnfjörð Bjarmason
On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes no...@tronnes.org wrote:
 Den 20.01.2015 21:07, skrev Torsten Bögershausen:

 On 2015-01-20 20.46, Noralf Trønnes wrote:
 could it be that your ø is not encoded as UTF-8,
 but in ISO-8859-15 (or so)

 $ git log -1
 commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
 Author: Noralf TrF8nnes no...@tronnes.org

 What does
 git config -l | grep Noralf | xxd
 say ?

 $ git config -l | grep Noralf | xxd
 000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
 010: 2054 72f8 6e6e 6573 0a                   Tr.nnes.

 $ file ~/.gitconfig
 /home/pi/.gitconfig: ISO-8859 text

What's happened here is that:

 1. You've authored your commit in ISO-8859-1
 2. Git itself has no place for the encoding of the author name in the
commit object format
 3. git-format-patch has a --compose-encoding which I think would sort
this out if you set it to ISO-8859-1, but it defaults to UTF-8
 4. Your patch is actually a ISO-8859-1 byte sequence, but is
advertised as UTF-8
 5. You end up with a screwed-up commit

You could work around this, but I suggest just joining the 21st
century and working exclusively in UTF-8, it makes things much easier,
speaking as someone with 3x more non-ASCII characters in his name
than you :)


Re: [ANNOUNCE] Git Merge, April 8-9, Paris

2015-02-17 Thread Ævar Arnfjörð Bjarmason
On Sat, Jan 24, 2015 at 12:37 AM, Jeff King p...@peff.net wrote:
 GitHub is organizing a Git-related conference to be held April 8-9,
 2015, in Paris.  Details here:

   http://git-merge.com/

 The exact schedule is still being worked out, but there is going to be
 some dedicated time/space for Git (and libgit2 and JGit) developers to
 meet and talk to each other.

 If you have patches in Git, I'd encourage you to consider attending. If
 travel finances are a problem, please talk to me. GitHub may be able to
 defray the cost of travel.

 I hope to see people there!

I'll be there, and I'm excited to meet you all.

I'm even more excited, in a way, to be traveling from The Netherlands to
Paris to attend a conference claiming to be governed by California
law[1] :)

1. Small print at https://ti.to/github-events/git-merge-2015


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
 ava...@gmail.com wrote:
 Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's:

  * Around 500k commits
  * Around 100k tags
  * Around 5k branches
  * Around 500 commits/day, almost entirely to the same branch
  * 1.5 GB .git checkout.
  * Mostly text source, but some binaries (we're trying to cut down[1] on 
 those)

 Would be nice if you could make an anonymized version of this repo
 public. Working on a real large repo is better than an artificial
 one.

Yeah, I'll try to do that.

 But actually most of git fetch is spent in the reachability check
 subsequently done by git-rev-list which takes several seconds. I

 I wonder if reachability bitmap could help here..

I could have sworn I had that enabled already but evidently not. I did
test it and it cut down on clone times a bit. Now our daily repacking
is:

git --git-dir={} gc &&
git --git-dir={} pack-refs --all --prune &&
git --git-dir={} repack -Ad --window=250 --depth=100 --write-bitmap-index --pack-kept-objects

It's not clear to me from the documentation whether this should just
be enabled on the server, or the clients too. In any case I've enabled
it on both.
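The daily sequence could be driven from Python as sketched here. The `{}` placeholder in the mail suggests the commands run once per repository (for example via find or xargs); the helper names and the repository list are hypothetical, while the flags are copied from the commands in the mail. A failing step skips the remaining steps for that repository and the run moves on to the next one.

```python
import subprocess
from typing import List, Sequence


def maintenance_commands(git_dir: str) -> List[List[str]]:
    """The gc, pack-refs and repack steps from the mail, for one repository."""
    base = ["git", f"--git-dir={git_dir}"]
    return [
        base + ["gc"],
        base + ["pack-refs", "--all", "--prune"],
        base + ["repack", "-Ad", "--window=250", "--depth=100",
                "--write-bitmap-index", "--pack-kept-objects"],
    ]


def maintain(git_dirs: Sequence[str]) -> List[str]:
    """Run the steps for each repository; return the repositories where a step failed."""
    failed: List[str] = []
    for git_dir in git_dirs:
        try:
            for cmd in maintenance_commands(git_dir):
                # check=True raises on a non-zero exit, skipping this repo's remaining steps
                subprocess.run(cmd, check=True)
        except subprocess.CalledProcessError:
            failed.append(git_dir)
    return failed
```

A cron job could call `maintain(["/gitrepos/core.git"])`, using a server-side path like the one visible in the trace.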

Even with it enabled on both, a git pull that pulls down just
one commit on one branch takes 13s. The trace is attached at the end of
the mail.

 haven't looked into it but there's got to be room for optimization
 there, surely it only has to do reachability checks for new refs, or
 could run in some I trust this remote not to send me corrupt data
 completely mode (which would make sense within a company where you can
 trust your main Git box).

 No, it's not just about trusting the server side, it's about catching
 data corruption on the wire as well. We have a trick to avoid
 reachability check in clone case, which is much more expensive than a
 fetch. Maybe we could do something further to help the fetch case _if_
 reachability bitmaps don't help.

Still, if that's indeed a big bottleneck, what's the worst-case
scenario here? That the local repository gets hosed? The server will
still recursively validate the objects it gets sent, right?

I wonder if a better trade-off in that case would be to skip this in
some situations and instead put something like git fsck in a
cronjob.
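The check under discussion is the `rev-list --objects --stdin --not --all` step visible in the trace that follows: walk from the newly received tips and stop at anything already reachable from an existing ref. The Python sketch models that idea on a toy object graph. It is a deliberate simplification (Git's revision walk is more incremental than precomputing a full "have" set), but in this model the cost of the "--not --all" side grows with every ref that has to be walked.

```python
from collections import deque
from typing import Dict, Iterable, List, Set

# Toy object graph: object id -> ids it references (commit -> parents + tree, tree -> blobs).
Graph = Dict[str, List[str]]


def reachable(graph: Graph, tips: Iterable[str]) -> Set[str]:
    """Every object reachable from `tips` (the "--not --all" side in this model)."""
    seen: Set[str] = set()
    queue = deque(tips)
    while queue:
        oid = queue.popleft()
        if oid in seen or oid not in graph:
            continue
        seen.add(oid)
        queue.extend(graph[oid])
    return seen


def missing_objects(graph: Graph, new_tips: Iterable[str], existing_refs: Iterable[str]) -> Set[str]:
    """Objects referenced from the new tips that are neither known-reachable nor present."""
    have = reachable(graph, existing_refs)  # cost grows with the refs walked here
    missing: Set[str] = set()
    seen: Set[str] = set()
    queue = deque(new_tips)
    while queue:
        oid = queue.popleft()
        if oid in have or oid in seen:
            continue  # already known to be complete, stop walking here
        seen.add(oid)
        if oid not in graph:
            missing.add(oid)
            continue
        queue.extend(graph[oid])
    return missing


objects: Graph = {
    "c1": ["t1"], "t1": ["b1"], "b1": [],
    "c2": ["c1", "t2"], "t2": ["b1", "b2"], "b2": [],
}
print(missing_objects(objects, new_tips=["c2"], existing_refs=["c1"]))  # set()
```

Skipping this walk, as suggested above, would mean a missing object like `b2` is only noticed later, e.g. by a periodic fsck.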

Here's a git pull trace mentioned above:

$ time GIT_TRACE=1 git pull
13:06:13.603781 git.c:555   trace: exec: 'git-pull'
13:06:13.603936 run-command.c:351   trace: run_command: 'git-pull'
13:06:13.620615 git.c:349   trace: built-in: git
'rev-parse' '--git-dir'
13:06:13.631602 git.c:349   trace: built-in: git
'rev-parse' '--is-bare-repository'
13:06:13.636103 git.c:349   trace: built-in: git
'rev-parse' '--show-toplevel'
13:06:13.641491 git.c:349   trace: built-in: git 'ls-files' '-u'
13:06:13.719923 git.c:349   trace: built-in: git
'symbolic-ref' '-q' 'HEAD'
13:06:13.728085 git.c:349   trace: built-in: git 'config'
'branch.trunk.rebase'
13:06:13.738160 git.c:349   trace: built-in: git 'config' 'pull.ff'
13:06:13.743286 git.c:349   trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:13.972091 git.c:349   trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:14.149420 git.c:349   trace: built-in: git
'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:14.294098 git.c:349   trace: built-in: git
'diff-files' '--quiet' '--ignore-submodules'
13:06:14.467711 git.c:349   trace: built-in: git
'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:14.683419 git.c:349   trace: built-in: git
'rev-parse' '-q' '--git-dir'
13:06:15.189707 git.c:349   trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:15.335948 git.c:349   trace: built-in: git 'fetch'
'--update-head-ok'
13:06:15.691303 run-command.c:351   trace: run_command: 'ssh'
'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
13:06:17.095662 run-command.c:351   trace: run_command: 'rev-list'
'--objects' '--stdin' '--not' '--all' '--quiet'
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
13:06:20.426346 run-command.c:351   trace: run_command:
'unpack-objects' '--pack_header=2,6'
13:06:20.431806 exec_cmd.c:130  trace: exec: 'git'
'unpack-objects' '--pack_header=2,6'
13:06:20.437343 git.c:349   trace: built-in: git
'unpack-objects' '--pack_header=2,6'
remote: Total 6 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
13:06:20.444196 run-command.c:351   trace: run_command: 'rev-list'
'--objects' '--stdin' '--not' '--all'
13:06:20.447135 exec_cmd.c:130  trace: exec: 'git' 'rev-list'
'--objects' '--stdin' '--not' '--all'
13:06:20.451283 git.c:349   trace: built

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
ava...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
 ava...@gmail.com wrote:
 Anecdotally I work on a repo at work (where I'm mostly the Git guy) 
 that's:

  * Around 500k commits
  * Around 100k tags
  * Around 5k branches
  * Around 500 commits/day, almost entirely to the same branch
  * 1.5 GB .git checkout.
  * Mostly text source, but some binaries (we're trying to cut down[1] on 
 those)

 Would be nice if you could make an anonymized version of this repo
 public. Working on a real large repo is better than an artificial
 one.

 Yeah, I'll try to do that.

 But actually most of git fetch is spent in the reachability check
 subsequently done by git-rev-list which takes several seconds. I

 I wonder if reachability bitmap could help here..

 I could have sworn I had that enabled already but evidently not. I did
 test it and it cut down on clone times a bit. Now our daily repacking
 is:

 git --git-dir={} gc 
 git --git-dir={} pack-refs --all --prune 
 git --git-dir={} repack -Ad --window=250 --depth=100
 --write-bitmap-index --pack-kept-objects 

 It's not clear to me from the documentation whether this should just
 be enabled on the server, or the clients too. In any case I've enabled
 it on both.

 Even then with it enabled on both a git pull that pulls down just
 one commit on one branch is 13s. Trace attached at the end of the
 mail.

 haven't looked into it but there's got to be room for optimization
 there, surely it only has to do reachability checks for new refs, or
 could run in some I trust this remote not to send me corrupt data
 completely mode (which would make sense within a company where you can
 trust your main Git box).

 No, it's not just about trusting the server side, it's about catching
 data corruption on the wire as well. We have a trick to avoid
 reachability check in clone case, which is much more expensive than a
 fetch. Maybe we could do something further to help the fetch case _if_
 reachability bitmaps don't help.

 Still, if that's indeed a big bottleneck what's the worst-case
 scenario here? That the local repository gets hosed? The server will
 still recursively validate the objects it gets sent, right?

 I wonder if a better trade-off in that case would be to skip this in
 some situations and instead put something like git fsck in a
 cronjob.

 Here's a git pull trace mentioned above:

 $ time GIT_TRACE=1 git pull
 13:06:13.603781 git.c:555   trace: exec: 'git-pull'
 13:06:13.603936 run-command.c:351   trace: run_command: 'git-pull'
 13:06:13.620615 git.c:349   trace: built-in: git
 'rev-parse' '--git-dir'
 13:06:13.631602 git.c:349   trace: built-in: git
 'rev-parse' '--is-bare-repository'
 13:06:13.636103 git.c:349   trace: built-in: git
 'rev-parse' '--show-toplevel'
 13:06:13.641491 git.c:349   trace: built-in: git 'ls-files' '-u'
 13:06:13.719923 git.c:349   trace: built-in: git
 'symbolic-ref' '-q' 'HEAD'
 13:06:13.728085 git.c:349   trace: built-in: git 'config'
 'branch.trunk.rebase'
 13:06:13.738160 git.c:349   trace: built-in: git 'config' 
 'pull.ff'
 13:06:13.743286 git.c:349   trace: built-in: git
 'rev-parse' '-q' '--verify' 'HEAD'
 13:06:13.972091 git.c:349   trace: built-in: git
 'rev-parse' '--verify' 'HEAD'
 13:06:14.149420 git.c:349   trace: built-in: git
 'update-index' '-q' '--ignore-submodules' '--refresh'
 13:06:14.294098 git.c:349   trace: built-in: git
 'diff-files' '--quiet' '--ignore-submodules'
 13:06:14.467711 git.c:349   trace: built-in: git
 'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
 13:06:14.683419 git.c:349   trace: built-in: git
 'rev-parse' '-q' '--git-dir'
 13:06:15.189707 git.c:349   trace: built-in: git
 'rev-parse' '-q' '--verify' 'HEAD'
 13:06:15.335948 git.c:349   trace: built-in: git 'fetch'
 '--update-head-ok'
 13:06:15.691303 run-command.c:351   trace: run_command: 'ssh'
 'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
 13:06:17.095662 run-command.c:351   trace: run_command: 'rev-list'
 '--objects' '--stdin' '--not' '--all' '--quiet'
 remote: Counting objects: 6, done.
 remote: Compressing objects: 100% (6/6), done.
 13:06:20.426346 run-command.c:351   trace: run_command:
 'unpack-objects' '--pack_header=2,6'
 13:06:20.431806 exec_cmd.c:130  trace: exec: 'git'
 'unpack-objects' '--pack_header=2,6'
 13:06:20.437343 git.c:349   trace: built-in: git
 'unpack-objects' '--pack_header=2,6'
 remote: Total 6 (delta 0), reused 0 (delta 0)
 Unpacking objects: 100% (6/6), done.
 13:06:20.444196 run-command.c:351   trace: run_command: 'rev-list'
 '--objects' '--stdin' '--not' '--all'
 13:06

Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-20 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
ava...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
 ava...@gmail.com wrote:
 Anecdotally I work on a repo at work (where I'm mostly the Git guy) 
 that's:

  * Around 500k commits
  * Around 100k tags
  * Around 5k branches
  * Around 500 commits/day, almost entirely to the same branch
  * 1.5 GB .git checkout.
  * Mostly text source, but some binaries (we're trying to cut down[1] on 
 those)

 Would be nice if you could make an anonymized version of this repo
 public. Working on a real large repo is better than an artificial
 one.

 Yeah, I'll try to do that.

tl;dr: After some more testing it turns out the performance issues we
have are almost entirely due to the number of refs. Some of these I
knew about and were obvious (e.g. git pull), but some aren't so
obvious (why does git log without --all slow down as a function of
the overall number of refs?).

Rather than getting an anonymized version of the repo we have, a
simpler isolated test case is just doing this on linux.git:

$ git rev-list --all | perl -ne 'my $cnt; while (<>) {
s<([a-f0-9]+)><git tag -a -mTest TAG $1>gm; next unless int rand 10
== 1; $cnt++; s/TAG/tagnr-$cnt/; print }' | sh -x

That'll create a tag for every 10th commit or so, which is around 50k
tags for linux.git.

I actually ran this a few times while testing it, so this is a before
and after on a hot cache of linux.git with 406 tags vs. ~140k. I ran
the gc + repack + bitmaps for both repos noted in an earlier reply of
mine, and took the fastest run out of 3:

$ time (git log master -100 >/dev/null)
Before: real 0m0.021s
After: real 0m2.929s
$ time (git status >/dev/null)
# Around 150ms, no noticeable difference
$ time git fetch
# I'm fetching from g...@github.com:torvalds/linux.git here, the
# cache is hot but upstream has *no* changes
Before: real 0m1.826s
After: real 0m8.458s

Details on why git fetch is slow in this situation:

$ time GIT_TRACE=1 git fetch
15:15:00.435420 git.c:349   trace: built-in: git 'fetch'
15:15:00.654428 run-command.c:341   trace: run_command: 'ssh'
'g...@github.com' 'git-upload-pack '\''torvalds/linux.git'\'''
15:15:02.426121 run-command.c:341   trace: run_command:
'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
15:15:05.507327 run-command.c:341   trace: run_command:
'rev-list' '--objects' '--stdin' '--not' '--all'
15:15:05.508329 exec_cmd.c:134  trace: exec: 'git'
'rev-list' '--objects' '--stdin' '--not' '--all'
15:15:05.510490 git.c:349   trace: built-in: git
'rev-list' '--objects' '--stdin' '--not' '--all'
15:15:08.874116 run-command.c:341   trace: run_command: 'gc' '--auto'
15:15:08.879570 exec_cmd.c:134  trace: exec: 'git' 'gc' '--auto'
15:15:08.882495 git.c:349   trace: built-in: git 'gc' '--auto'
real 0m8.458s
user 0m6.548s
sys  0m0.204s

Even things you'd expect to not be impacted are, like a reverse log
search on the master branch:

$ time (git log --reverse -p --grep=arm64 origin/master >/dev/null)
Before: real 0m4.473s
After: real 0m6.194s

Or doing 10 commits and rebasing on the upstream:

$ time (git checkout origin/master~ && for i in {1..10}; do
echo $i > file && git add file && git commit -mmoo $file; done &&
git rebase origin/master)
Before: real 0m6.798s
After: real 0m12.340s

The remaining slowdown comes from the size of the tree, which we can
deal with by either reducing it in size (we have some copied JS
libraries and whatnot) or trying the inotify-powered git-status.

In our case there's no good reason for why we have this many refs in
the repository everyone uses. We basically just have a bunch of dated
rollout tags that have been accumulating for years, and a bunch of
mostly unused branches people just haven't cleaned up.

So I'm going to:

 1. Write a hook that rejects tags that aren't new (i.e. forbid
re-pushes of old tags)
 2. Create an archive repository that contains all the old tags (i.e.
just run git fetch on the main one from cron)
 3. Run a script to regularly delete tags from the main repo
 4. Run the same script on the clients that clone the repo
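Step 1 could look roughly like the pre-receive hook below. This is a sketch under stated assumptions, not a hook we have deployed. It assumes that "not new" means either an update or deletion of an existing tag ref, or an annotated tag whose tagger date is older than a hypothetical MAX_AGE_DAYS threshold (that is, an old tag being re-pushed from a client after step 3 pruned it). Lightweight tags are not date-checked. The hook reads the `<old> <new> <ref>` lines that receive-pack feeds to pre-receive on stdin. The server-side pruning in step 3 updates refs locally rather than through a push, so this hook does not get in its way.

```shell
#!/bin/sh
# Sketch of a pre-receive hook (hypothetical policy): reject pushes that move
# or delete an existing tag, and reject "new" annotated tags whose tagger
# date is older than MAX_AGE_DAYS (old tags being re-pushed after pruning).
MAX_AGE_DAYS=${MAX_AGE_DAYS:-7}

# Print the tagger timestamp (seconds since the epoch) of annotated tag
# object $1; print nothing for lightweight tags or unknown objects.
tag_timestamp() {
    [ "$(git cat-file -t "$1" 2>/dev/null)" = tag ] || return 0
    git cat-file tag "$1" |
        sed -n 's/^tagger .* \([0-9][0-9]*\) [-+][0-9][0-9]*$/\1/p'
}

# Check a single "<old> <new> <ref>" update; return non-zero to reject it.
check_update() {
    old=$1 new=$2 ref=$3
    case $ref in
        refs/tags/*) ;;
        *) return 0 ;;                    # only tags are policed here
    esac
    case $old in
        *[!0]*)                           # old value is not all zeros
            echo "rejecting $ref: existing tags may not be moved or deleted" >&2
            return 1 ;;
    esac
    ts=$(tag_timestamp "$new")
    if [ -n "$ts" ] &&
       [ $(( $(date +%s) - ts )) -gt $(( MAX_AGE_DAYS * 86400 )) ]; then
        echo "rejecting $ref: tag is older than $MAX_AGE_DAYS days" >&2
        return 1
    fi
    return 0
}

# Only run the main loop when installed as hooks/pre-receive.
if [ "${0##*/}" = pre-receive ]; then
    status=0
    while read -r old new ref; do
        check_update "$old" "$new" "$ref" || status=1
    done
    exit "$status"
fi
```

The all-zeros test uses `*[!0]*` instead of a fixed 40-character constant, so it does not depend on the hash length.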

The branches are slightly harder. Deleting those that are fully merged
into the main branch is easy, deleting those whose contents 100%
match patch-ids already in the main branch is another thing we can
do, and we can just clean up branches unconditionally after they've
reached a certain age (they'll still be archived).
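The three branch criteria above can be sketched as a report-only helper. The following assumptions are made for illustration: the main branch name is given by a hypothetical MAIN variable, and the age cutoff is a made-up MAX_AGE_DAYS value. The helper prints candidates and never deletes anything. `git merge-base --is-ancestor` covers "fully merged". `git cherry` covers the patch-id case, because it marks with `+` only the commits whose patch-id is not already upstream.

```shell
#!/bin/sh
# Hypothetical helper: print local branches that look safe to archive and
# delete, each with a reason. It never deletes anything itself.
MAIN=${MAIN:-master}
MAX_AGE_DAYS=${MAX_AGE_DAYS:-180}

stale_branches() {
    now=$(date +%s)
    git for-each-ref --format='%(refname:short) %(committerdate:raw)' refs/heads/ |
    while read -r branch ts _tz; do
        [ "$branch" = "$MAIN" ] && continue
        if git merge-base --is-ancestor "$branch" "$MAIN"; then
            echo "$branch merged"                 # tip is reachable from MAIN
        elif ! git cherry "$MAIN" "$branch" | grep -q '^+'; then
            echo "$branch patch-id-merged"        # every commit already upstream
        elif [ $(( now - ts )) -gt $(( MAX_AGE_DAYS * 86400 )) ]; then
            echo "$branch old"                    # last commit is too old
        fi
    done
}
```

Keeping the listing separate from the deletion makes it easy to first push the candidates to the archive repository and only then remove them.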
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Geolocation support

2015-02-09 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 9, 2015 at 2:24 AM, Junio C Hamano gits...@pobox.com wrote:
 In case I was not clear, I do not think it is likely for us to accept
 a patch that mucks with object header fields with this information.
 Have them in the log text and let UI interpret them.

We've already told clients for a long time to ignore fields they don't
know about. Why would we not store what's intended to be
machine-readable key-value pair data in the commit object itself, as
opposed to sticking it in the log message, where parsing it is always
going to be a bit more tricky & distracting, since users will have to
look at this arbitrary metadata when they do git log or git show?


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 16, 2015 at 11:41 AM, Armin Ronacher
armin.ronac...@active-4.com wrote:
 Long story short: I failed big time yesterday with accidentally executing
 git reset hard in the wrong terminal window but managed to recover my
 changes from the staging area by manually examining blobs touched recently.

 After that however I figured I might want to add a precaution for myself
 that would have helped there.  git fsck is quite nice, but unfortunately it
 does not help if you do not have a commit.  So I figured it might be nice to
 create a dangling backup commit before a reset which would have helped me.
 Unfortunately there is currently no good way to hook into git reset.

 Things I noticed in the process:

 *   for recovering blobs, going through the objects itself was more
 useful because they were all recent changes and as such I could
 order by timestamp.  git fsck will not provide any timestamps
 (which generally makes sense, but made it quite useless for me)
 *   Recovering from blobs is painful, it would be nice if git reset
 --hard made a dangling dummy commit before :)
 *   There is no pre-commit hook which could be used to implement the
 previous suggestion.

 Would it make sense to introduce a `pre-commit` hook for this sort of thing
 or even create a dummy commit by default?  I did a quick googling around and
 it looks like I was not the first person who made this mistake.  Github's
 windows client even creates dangling backup commits in what appears to be
 fixed time intervals.

 I understand that ultimately this was a user error on my part, but it seems
 like a small change that could save a lot of frustration.

Something like "can we have a hook for every change in the working
tree" has come up in the past, but has been defeated by performance
concerns. git reset --hard is a low-level-ish operation, and it's
really useful to be able to quickly reset the working tree to some
state no matter what, and without creating extra commits or whatever.

We should definitely make recovery like this harder, but is there a
reason you don't use git reset --keep instead of --hard?
It'll keep any local changes to your index/staging area and reset the
files that don't conflict; if there are any conflicts, the operation
will be aborted.

If we created such hooks for git reset --hard we'd just need to
expose some other thing as that low-level operation (and break scripts
that already rely on it doing the minimal "yes I want to change the
tree no matter what" thing), and then we'd just be back to square one
in a few years when users started using git reset --really-hard (or
whatever the flag would be).
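The "dangling backup commit before a reset" idea from the quoted mail can be approximated without a new hook, using only existing commands. The sketch below is a hypothetical user-side wrapper, not a proposed change to git reset. `git stash create` records the index and tracked working-tree changes as a stash-like commit without modifying either. `git stash store` then keeps that commit reachable through the stash reflog, so no `git fsck` spelunking is needed later. Untracked files are not saved.

```shell
#!/bin/sh
# Hypothetical wrapper around "git reset --hard": first snapshot the index and
# tracked working-tree changes, then reset. Untracked files are NOT saved.
reset_hard_with_backup() {
    # Prints nothing (and creates nothing) when there are no local changes.
    backup=$(git stash create "backup before reset --hard") || return
    if [ -n "$backup" ]; then
        # Keep the snapshot reachable; recover later with "git stash apply".
        git stash store -m "backup before reset --hard" "$backup" || return
        echo "saved local changes as $backup (see git stash list)" >&2
    fi
    git reset --hard "$@"
}
```

Because the wrapper lives in the user's shell (or an alias), scripts that call `git reset --hard` directly keep the fast, minimal behaviour described above.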


Re: Experience with Recovering From User Error (And suggestions for improvements)

2015-02-16 Thread Ævar Arnfjörð Bjarmason
On Mon, Feb 16, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
ava...@gmail.com wrote:
 On Mon, Feb 16, 2015 at 11:41 AM, Armin Ronacher
 armin.ronac...@active-4.com wrote:
 Long story short: I failed big time yesterday with accidentally executing
 git reset hard in the wrong terminal window but managed to recover my
 changes from the staging area by manually examining blobs touched recently.

 After that however I figured I might want to add a precaution for myself
 that would have helped there.  git fsck is quite nice, but unfortunately it
 does not help if you do not have a commit.  So I figured it might be nice to
 create a dangling backup commit before a reset which would have helped me.
 Unfortunately there is currently no good way to hook into git reset.

 Things I noticed in the process:

 *   for recovering blobs, going through the objects itself was more
 useful because they were all recent changes and as such I could
 order by timestamp.  git fsck will not provide any timestamps
 (which generally makes sense, but made it quite useless for me)
 *   Recovering from blobs is painful, it would be nice if git reset
 --hard made a dangling dummy commit before :)
 *   There is no pre-commit hook which could be used to implement the
 previous suggestion.

 Would it make sense to introduce a `pre-commit` hook for this sort of thing
 or even create a dummy commit by default?  I did a quick googling around and
 it looks like I was not the first person who made this mistake.  Github's
 windows client even creates dangling backup commits in what appears to be
 fixed time intervals.

 I understand that ultimately this was a user error on my part, but it seems
 like a small change that could save a lot of frustration.

 Something like can we have a hook for every change in the working
 tree has come up in the past, but has been defeated by performance
 concerns. git reset --hard is a low-level-ish operation, and it's
 really useful to be able to quickly reset the working tree to some
 state no matter what, and without creating extra commits or whatever.

 We should definitely make recovery like this harder, but is there a
 reason for why you don't use git reset --keep instead of --hard?
 It'll keep any local changes to your index/staging area, and reset the
 files that don't conflict, if there's any conflicts the operation will
 be aborted.

I meant make recovery like this easier, i.e. make it easier to get back
previously staged commits / blobs.

 If we created such hooks for git reset --hard we'd just need to
 expose some other thing as that low-level operation (and break scripts
 that already rely on it doing the minimal yes I want to change the
 tree no matter what thing), and then we'd just be back to square one
 in a few years when users started using git reset --really-hard (or
 whatever the flag would be).


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-02-19 Thread Ævar Arnfjörð Bjarmason
On Thu, Feb 19, 2015 at 10:26 PM, Stephen Morton
stephen.c.mor...@gmail.com wrote:
 I posted this to comp.version-control.git.user and didn't get any response. I
 think the question is plumbing-related enough that I can ask it here.

 I'm evaluating the feasibility of moving my team from SVN to git. We have a 
 very
 large repo. [1] We will have a central repo using GitLab (or similar) that
 everybody works with. Forks, code sharing, pull requests etc. will be done
 through this central server.

 By 'performance', I guess I mean speed of day to day operations for devs.

* (Obviously, trivially, a (non-local) clone will be slow with a large 
 repo.)
* Will a few simultaneous clones from the central server also slow down
  other concurrent operations for other users?
* Will 'git pull' be slow?
* 'git push'?
* 'git commit'? (It is listed as slow in reference [3].)
 * 'git status'? (Slow again in reference 3 though I don't see it.)
* Some operations might not seem to be day-to-day but if they are called
  frequently by the web front-end to GitLab/Stash/GitHub etc then
  they can become bottlenecks. (e.g. 'git branch --contains' seems terribly
  adversely affected by large numbers of branches.)
* Others?


 Assuming I can put lots of resources into a central server with lots of CPU,
 RAM, fast SSD, fast networking, what aspects of the repo are most likely to
 affect devs' experience?
* Number of commits
* Sheer disk space occupied by the repo
* Number of tags.
* Number of branches.
* Binary objects in the repo that cause it to bloat in size [1]
* Other factors?

 Of the various HW items listed above --CPU speed, number of cores, RAM, SSD,
 networking-- which is most critical here?

 (Stash recommends 1.5 x repo_size x number of concurrent clones of
 available RAM.
 I assume that is good advice in general.)

 Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB 
 repo,
 50k tags, 1,000 branches. (Due to historical code fixups, another 5,000 
 fix-up
 branches which are just one little dangling commit required to change the 
 code
 a little bit between a commit a tag that was not quite made from it.)

 While there's lots of information online, much of it is old [3] and with git
 constantly evolving I don't know how valid it still is. Then there's anecdotal
 evidence that is of questionable value.[2]
 Are many/all of the issues Facebook identified [3] resolved? (Yes, I
 understand Facebook went with Mercurial. But I imagine the git team 
 nevertheless
 took their analysis to heart.)

Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's:

 * Around 500k commits
 * Around 100k tags
 * Around 5k branches
 * Around 500 commits/day, almost entirely to the same branch
 * 1.5 GB .git checkout.
 * Mostly text source, but some binaries (we're trying to cut down[1] on those)

The main scaling issues we have with Git are:

 * git pull takes around 10 seconds or so
 * Operations like git status are much slower because they scale
with the size of the work tree
 * Similarly git rebase takes a much longer time for each applied
commit, I think because it does the equivalent of git status for
every applied commit. Each commit applied takes around 1-2 seconds.
 * We have a lot of contention on pushes because we're mostly pushing
to one branch.
 * History spelunking (e.g. git log --reverse -p -Gstr) is taking
longer by the day

The obvious reason for why git pull is slow is because
git-upload-pack spews the complete set of refs at you each time. The
output from that command is around 10MB in size for us now. It takes
around 300 ms to run that locally from hot cache, a bit more to send
it over the network.

But actually most of git fetch is spent in the reachability check
subsequently done by git-rev-list, which takes several seconds. I
haven't looked into it, but there's got to be room for optimization
there. Surely it only has to do reachability checks for new refs, or
could run in some "I trust this remote not to send me corrupt data"
mode (which would make sense within a company where you can
completely trust your main Git box).

The git status operations could be made faster by having something
like watchman; there's been some effort on getting that done in Git,
but I haven't tried it. This seems to have been the main focus of
Facebook's Mercurial optimization effort.

Some of this you can mostly solve by doing e.g. git status -uno.
Having support for such unsafe operations (e.g. teaching rebase and
pals to use it) would be nice at the cost of some safety, but having
something that feeds off inotify would be even better.
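The `git status -uno` workaround can be made the per-repository default through the `status.showUntrackedFiles` setting. The sketch below shows this; the helper name is hypothetical. The trade-off is the safety cost mentioned above: `git status` then silently omits new files you forgot to add.

```shell
#!/bin/sh
# Hypothetical helper: make "git status" in the current repository behave as
# if -uno were always given, i.e. skip scanning the tree for untracked files.
skip_untracked_scan() {
    git config status.showUntrackedFiles no
}
# A one-off full check is still available with "git status -unormal".
```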

It takes around 3 minutes to reclone our repo, we really don't care
(we rarely re-clone). But I thought I'd mention it because for some
reason this is important to Facebook and along with inotify were the
two major things they focused on.

As far as I know every day Git operations don't scale all 

Re: [PATCH] clone: Warn if clone lacks LICENSE or COPYING file

2015-03-22 Thread Ævar Arnfjörð Bjarmason
On Sat, Mar 21, 2015 at 7:06 PM, David A. Wheeler dwhee...@dwheeler.com wrote:
 Warn cloners if there is no LICENSE* or COPYING* file that makes
 the license clear.  This is a useful warning, because if there is
 no license somewhere, then local copyright laws (which forbid many uses)
 and terms of service apply - and the cloner may not be expecting that.
 Many projects accidentally omit a license, so this is common enough to note.
 For more info on the issue, feel free to see:
 http://choosealicense.com/no-license/
 http://www.wired.com/2013/07/github-licenses/
 https://twitter.com/stephenrwalli/status/247597785069789184

As others have indicated here this feature is really specific to a
single lint-like use-case and doesn't belong in clone as a built-in
feature.

However, perhaps an interesting generalization of this would be
something like a post-clone hook. Obviously you couldn't store that in
.git/hooks/ like other githooks(5) since there's no repo yet, but
having it configured via the user/system config might be an
interesting feature.

If you're still interested in getting this functionality perhaps a
patch to have some general post-clone hook mechanism would be
accepted, then you could check license files or anything else you
cared about.

You could also just have a shell alias that wrapped git-clone...
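A sketch of the shell-wrapper route follows; the function names are made up for illustration. Like the proposed patch, it only looks for top-level LICENSE* or COPYING* entries, and it needs nothing from Git beyond a plain `git clone`.

```shell
#!/bin/sh
# Succeeds if directory $1 has a top-level LICENSE* or COPYING* entry.
has_license_file() {
    for f in "$1"/LICENSE* "$1"/COPYING*; do
        [ -e "$f" ] && return 0    # an unmatched glob stays literal and fails -e
    done
    return 1
}

# Hypothetical wrapper: clone as usual, then warn if no license file is found.
# Usage: clone_and_check <url> <directory>  (the directory must be given)
clone_and_check() {
    git clone "$1" "$2" || return
    has_license_file "$2" ||
        echo "warning: $2 has no top-level LICENSE* or COPYING* file" >&2
}
```

Users who want the warning can call `clone_and_check` (or alias it) instead of `git clone`. Everyone else is unaffected.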


Why is git fetch --prune so much slower than git remote prune?

2015-03-06 Thread Ævar Arnfjörð Bjarmason
The --prune option to fetch added in v1.6.5-8-gf360d84 seems to be
around 20-30x slower than the equivalent operation with git remote
prune. I'm wondering if I'm missing something and fetch does something
more, but it doesn't seem so.

To test this, clone git.git, create 1000 branches in it, create two
local clones of that clone and then delete the 1000 branches in the
original. I have a script to do this at
https://gist.github.com/avar/497c8c8fbd641fb756ef
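In case that link goes away, here is a rough sketch of the same setup. It is a reconstruction from the description above, not the contents of the gist. The `setup_prune_test` name and the `prune-test-*` branch names are hypothetical. The sketch uses `git update-ref --stdin` to create and delete the branches in bulk.

```shell
#!/bin/sh
# Hypothetical reconstruction of the test setup described above (not the gist).
# Usage: setup_prune_test <path-to-existing-repo> <number-of-branches>
setup_prune_test() {
    src=$1 n=$2
    head=$(git -C "$src" rev-parse HEAD) || return
    # Create n branches in the source repository in one update-ref run.
    seq 1 "$n" | sed "s|.*|create refs/heads/prune-test-& $head|" |
        git -C "$src" update-ref --stdin || return
    # Two clones that both see the n branches as remote-tracking refs.
    git clone -q "$src" clone-fetch || return
    git clone -q "$src" clone-remote || return
    # Delete the branches upstream again; the clones still have them.
    git -C "$src" for-each-ref --format='delete %(refname)' 'refs/heads/prune-test-*' |
        git -C "$src" update-ref --stdin
}
```

With a clone of git.git in ./git, `setup_prune_test git 1000` prepares the scenario. You can then time `git fetch --prune origin` inside clone-fetch against `git remote prune origin` inside clone-remote.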

Then in each of the clones:

$ git branch -a|wc -l; time (~/g/git/git-fetch --prune origin >/dev/null 2>&1); git branch -a | wc -l
1003
real 0m3.337s
user 0m2.996s
sys  0m0.336s
3

$ git branch -a|wc -l; time (~/g/git/git-remote prune origin >/dev/null 2>&1); git branch -a | wc -l
1003
real 0m0.067s
user 0m0.020s
sys  0m0.040s
3

Both of these end up doing a git fetch, so it's not that. I'm quite
rusty in C profiling but here's a gprof of the git-fetch command:

$ gprof ~/g/git/git-fetch|head -n 20
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 26.42      0.33     0.33  1584583     0.00     0.00  strbuf_getwholeline
 14.63      0.51     0.18 90601347     0.00     0.00  strbuf_grow
 13.82      0.68     0.17  1045676     0.00     0.00  find_pack_entry_one
  8.13      0.78     0.10  1050062     0.00     0.00  check_refname_format
  6.50      0.86     0.08  1584675     0.00     0.00  get_sha1_hex
  5.69      0.93     0.07  2100529     0.00     0.00  starts_with
  3.25      0.97     0.04  1044043     0.00     0.00  refname_is_safe
  3.25      1.01     0.04     8007     0.00     0.00  get_packed_ref_cache
  2.44      1.04     0.03  2605595     0.00     0.00  search_ref_dir
  2.44      1.07     0.03  1040500     0.00     0.00  peel_entry
  1.63      1.09     0.02  2632661     0.00     0.00  get_ref_dir
  1.63      1.11     0.02  1044043     0.00     0.00  create_ref_entry
  1.63      1.13     0.02     8024     0.00     0.00  do_for_each_entry_in_dir
  0.81      1.14     0.01  2155105     0.00     0.00  memory_limit_check
  0.81      1.15     0.01  1580503     0.00     0.00  sha1_to_hex

And of the git-remote command:

$ gprof ~/g/git/git-remote|head -n 20
Flat profile:

Each sample counts as 0.01 seconds.
 no time accumulated

  %   cumulative   self              self     total
 time   seconds   seconds    calls  Ts/call  Ts/call  name
  0.00      0.00     0.00   197475     0.00     0.00  strbuf_grow
  0.00      0.00     0.00    24214     0.00     0.00  sort_ref_dir
  0.00      0.00     0.00    24190     0.00     0.00  search_ref_dir
  0.00      0.00     0.00    21661     0.00     0.00  memory_limit_check
  0.00      0.00     0.00    20236     0.00     0.00  get_ref_dir
  0.00      0.00     0.00     9187     0.00     0.00  xrealloc
  0.00      0.00     0.00     7048     0.00     0.00  strbuf_add
  0.00      0.00     0.00     6348     0.00     0.00  do_xmalloc
  0.00      0.00     0.00     6126     0.00     0.00  xcalloc
  0.00      0.00     0.00     6056     0.00     0.00  cleanup_path
  0.00      0.00     0.00     6050     0.00     0.00  get_git_dir
  0.00      0.00     0.00     6050     0.00     0.00  vsnpath
  0.00      0.00     0.00     5554     0.00     0.00  config_file_fgetc

Aside from the slowness of git-fetch it seems git-remote can be sped
up quite a bit by more aggressively allocating a larger string buffer
from the get-go.


[PATCH] http: Include locale.h when using setlocale()

2015-03-06 Thread Ævar Arnfjörð Bjarmason
Since v2.3.0-rc1-37-gf18604b we've been using setlocale() here without
including locale.h. Oddly enough this only causes issues for me under
-O0 on GCC & Clang. I.e. if I do:

$ git clean -dxf; make -j 1 V=1 CFLAGS="-g -O0 -Wall" http.o

I'll get this on clang 3.5.0-6 & GCC 4.9.1-19 on Debian:

http.c: In function ‘get_preferred_languages’:
http.c:1021:2: warning: implicit declaration of function ‘setlocale’ 
[-Wimplicit-function-declaration]
  retval = setlocale(LC_MESSAGES, NULL);
  ^
http.c:1021:21: error: ‘LC_MESSAGES’ undeclared (first use in this function)
  retval = setlocale(LC_MESSAGES, NULL);

But changing -O0 to -O1 or another optimization level makes the issue go
away. Odd, but in any case we should be including this header if we're
going to use the function, so just do that.

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 http.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/http.c b/http.c
index 0153fb0..0606e6c 100644
--- a/http.c
+++ b/http.c
@@ -8,6 +8,9 @@
 #include "credential.h"
 #include "version.h"
 #include "pkt-line.h"
+#ifndef NO_GETTEXT
+#  include <locale.h>
+#endif
 
 int active_requests;
 int http_is_verbose;
-- 
2.1.3



Re: how to make full copy of a repo

2015-03-28 Thread Ævar Arnfjörð Bjarmason
On Sat, Mar 28, 2015 at 7:52 PM, Torsten Bögershausen tbo...@web.de wrote:
 On 2015-03-28 03.56, Christoph Anton Mitterer wrote:
 Hey.

 I was looking for an ideally simple way to make a full copy of a git
 repo. Many howtos are floating around on this on the web, with also lots
 of voodoo.


 First, it shouldn't be just a clone, i.o.w.
 - I want to have all refs (local/remote branches/tags) and of course all
 objects from the source repo copied as is.
 So it's local branches should become my local branches and not remote
 branches as well - and so on.
 Basically I want to be able to delete the source afterwards (and all
 backups ;) ) and not having anything lost.

 - It shouldn't set the source repo as origin or it's branches as remote
 tracking branches, as said it should be identical the source repo, just
 freshly copied via the Git aware transport mechanisms.

 - Whether GC or repacking happens, I don't care, as long as nothing that
 is still reachable in the source repo wouldn't get lost (or get lost
 once I run a GC in the copied repo).

 - Whether anything that other tools have added to .git (e.g. git-svn
 stuff) get's lost, I don't care.

 - It should work for both, bare and non-bare repos, but it's okay when
 it doesn't copy anything that is not committed or stashed.



 I'd have said that either:
 $ git clone --mirror URl-to-source-repo copy
 for the direction from outside the source to a copy,
 or alternatively:
 $ cd source-repo
 $ git push --mirror URl-to-copy
 for the direction from within the source to a copy with copy being an
 empty bare or non-bare repo,
 would do the job.

 But:

 a) but the git-clone(1) part for --mirror:
and sets up a refspec configuration such that all these refs are
overwritten by a git remote update in the target repository.
kinda confuses me since I wanted to get independent of the source
repo and this seems to set up a remote to it?

 b) do I need --all --tags for the push as well?

 c) When following
https://help.github.com/articles/duplicating-a-repository/
it doesn't seem as if --mirror is what I want because they seem to
advertise it rather as having the copy tracking the source repo.
Of course I read about just using git-clone --bare, but that seems to
not copy everything that --mirror does (remote-tracking branches,
notes).

So I'm a bit confused...
 This instructions have 3 repos:
 the source, old, the destination new and a temporary one.
 As you only push to new, new should have no information about
 old or temp.


 1) Is it working like I assumed above?
 2) Does that also copy things like git-config, hooks, etc.?
 3) Does it copy the configured remotes from the source?
 4) What else is not copied by that? I'd assume anything that is not
tracked by git and the stash of the source?

 You didn't write if this is a bare repository,
 if it is on a local disc, if it is reachable by rsync ?
 Linux or Windows ?

 For a full clone (in the sense of having everything, bit for bit)
 I would probably use rsync. (After stopping all activities on the repo)

This warrants more emphasis. If you rsync a repository that's
active, i.e. getting pushes, you *will* get corrupt copies. E.g. you
can easily copy something out of the objects directory that's in the
middle of being written, or copy the refs namespace after you copy
objects and end up with refs pointing to objects that aren't in the copy.

There's unfortunately no good solution to this other than doing both
git clone --mirror backups and rsync backups (for hooks etc.) and
combining the two, or putting a hook in place that rejects all updates
for the duration of the copy.
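Here is a rough sketch of combining the two, assuming the source is a local repository. Refs and objects come over the Git transport via `git clone --mirror`, which copies a consistent set of refs together with the objects they need. Only the non-ref bits, such as hooks/ and config, are copied at the file level; `cp -R` stands in for the rsync step here. The function name and the `config.source` file name are hypothetical. The source config is kept aside rather than copied over the mirror's own config, so the mirror's remote settings stay intact.

```shell
#!/bin/sh
# Hypothetical backup helper. Usage: backup_repo <local-repo> <dest.git>
backup_repo() {
    src=$1 dest=$2
    # Refs + objects through the Git transport: never a half-written object.
    git clone -q --mirror "$src" "$dest" || return
    # Locate the source's git dir ("." for bare repos, ".git" otherwise).
    gitdir=$(cd "$src" && git rev-parse --git-dir) || return
    case $gitdir in /*) ;; *) gitdir=$src/$gitdir ;; esac
    # Non-ref state that the Git transport does not carry over.
    mkdir -p "$dest/hooks" || return
    if [ -d "$gitdir/hooks" ]; then
        cp -R "$gitdir/hooks/." "$dest/hooks/" || return
    fi
    cp "$gitdir/config" "$dest/config.source"
}
```

Hooks and config change rarely, so copying them non-atomically is much less risky than copying objects and refs that way.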


Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-03-02 Thread Ævar Arnfjörð Bjarmason
On Tue, Feb 24, 2015 at 1:44 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote:
 On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason
 ava...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
 ava...@gmail.com wrote:
 Anecdotally I work on a repo at work (where I'm mostly the Git guy) 
 that's:

  * Around 500k commits
  * Around 100k tags
  * Around 5k branches
  * Around 500 commits/day, almost entirely to the same branch
  * 1.5 GB .git checkout.
  * Mostly text source, but some binaries (we're trying to cut down[1] on 
 those)

 Would be nice if you could make an anonymized version of this repo
 public. Working on a real large repo is better than an artificial
 one.

 Yeah, I'll try to do that.

 tl;dr: After some more testing it turns out the performance issues we
 have are almost entirely due to the number of refs. Some of these I
 knew about and were obvious (e.g. git pull), but some aren't so
 obvious (why does git log without --all slow down as a function of
 the overall number of refs?).

 I'm assuming that you pack your references periodically. (If not, you
 should, because reading lots of loose references is very expensive for
 the commands that need to iterate over all references!)

Yes, as mentioned in another reply of mine, like this:

git --git-dir={} gc &&
git --git-dir={} pack-refs --all --prune &&
git --git-dir={} repack -Ad --window=250 --depth=100 --write-bitmap-index --pack-kept-objects

 On the other hand, packed refs also have a downside, namely that
 whenever even a single packed reference has to be read, the whole
 packed-refs file has to be read and parsed. One way that this can bite
 you, even with innocuous-seeming commands, is if you haven't disabled
 the use of replace references (i.e., using git --no-replace-objects
 CMD or GIT_NO_REPLACE_OBJECTS). In that case, almost any Git command
 has to read the refs/replace/* namespace, which, in turn, forces the
 whole packed-refs file to be read and parsed. This can take a
 significant amount of time if you have a very large number of references.

Interesting. I tried the rough benchmarks I posted above with
GIT_NO_REPLACE_OBJECTS=1 and couldn't see any differences, although as
mentioned in another reply --no-decorate had a big effect on git-log.

 So try your experiments with replace references disabled. If that helps,
 consider disabling them on your server if you don't need them.

 Michael

 --
 Michael Haggerty
 mhag...@alum.mit.edu



Re: Git Scaling: What factors most affect Git performance for a large repo?

2015-03-02 Thread Ævar Arnfjörð Bjarmason
On Fri, Feb 20, 2015 at 10:04 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

 I actually ran this a few times while testing it, so this is a before
 and after on a hot cache of linux.git with 406 tags v.s. ~140k. I ran
 the gc + repack + bitmaps for both repos noted in an earlier reply of
 mine, and took the fastest run out of 3:

 $ time (git log master -100 >/dev/null)
 Before: real 0m0.021s
 After: real 0m2.929s

 Do you force --decorate with some config?  Or do you see similar
 performance difference with git rev-parse master, too?

Yes, I had log.decorate=short set in my config. With --no-decorate:

$ time (git log --no-decorate -100 >/dev/null)
# Before: real 0m0.010s
# After: real 0m0.065s

 $ time (git status >/dev/null)
 # Around 150ms, no noticeable difference

 This is understandable, as it will not look at any ref other than
 HEAD.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] clone: Warn if clone lacks LICENSE or COPYING file

2015-03-23 Thread Ævar Arnfjörð Bjarmason
On Mon, Mar 23, 2015 at 5:46 PM, David A. Wheeler dwhee...@dwheeler.com wrote:
 Junio C Hamano:
An approach that checks only the top-level directory for fixed
filename pattern would not be an effective way to protect the
cloners, either.

 I disagree, I think it's remarkably effective. *Many* projects
 do this, including git itself. After all, many humans need to find out the
 licensing basics too; having a simple convention for *finding* it helps
 humans and tools alike. It's not even limited to open source software;
 developers of proprietary materials (software or not) *also* typically
 want to declare licensing.

 Sure, the top-level licensing text might be incomplete, but having that
 information provides a big help, and it's what most people rely on anyway.
 Indeed, a *lack* of this is a sign of trouble, which is exactly what
 warnings are good for.

I don't think you're going to find people disagreeing with you that
it's good to have license information where appropriate, but Git is
the wrong tool to warn about this.

It's a generic content tracking tool; it shouldn't be warning on the
assumption that a) what you're tracking is an open source project and
b) you care to be notified about some arbitrary files being missing.

A lot of Git repositories don't care at all about licensing, and
having git-clone warn about this would just be useless noise most of
the time. E.g. anything I put on gist.github.com, the code hundreds of
people contribute to at work (we never distribute it anywhere, so a
license would be pointless). I even have open source projects myself
where there are no LICENSE or COPYING files, since those would be
redundant with the notices in the files themselves, but I digress.


Re: [ANNOUNCE] Git Merge Contributors Summit, April 8th, Paris

2015-04-07 Thread Ævar Arnfjörð Bjarmason
On Mon, Apr 6, 2015 at 10:28 PM, Stefan Beller sbel...@google.com wrote:
 I am interested in discussing the git pack protocol v2.
 (I have been thinking about that for a while now,
 though not sharing a lot on the mailing list, so feedback is
 somewhat limited. :( )

I'm keen to talk about the new protocol and other scaling issues I
raised in the recent "Git Scaling: What factors most affect Git
performance for a large repo?" thread. Although from my testing the
main performance problems are the local packed-refs file and
reachability checks, mostly not the protocol itself.

At the risk of using this list + the venue for soliciting, I also want
to mention that my employer is willing to pay someone on a contract
basis to work on Git scalability issues, given the right person etc.
etc. So if someone at the conference is interested in that, I'd be
keen to talk to you.

 On Mon, Apr 6, 2015 at 12:08 PM, Christian Couder
 christian.cou...@gmail.com wrote:
 On Mon, Apr 6, 2015 at 12:48 AM, Thomas Ferris Nicolaisen
 tfn...@gmail.com wrote:
 On Tue, Feb 24, 2015 at 11:09 PM, Jeff King p...@peff.net wrote:
 I wanted to make one more announcement about this, since a few more
 details have been posted at:

   http://git-merge.com/

 since my last announcement. Specifically, I wanted to call attention to
 the contributor's summit on the 8th. Basically, there will be a space
 that can hold up to 50 people, it's open only to git (and JGit and
 libgit2) devs, and there isn't a planned agenda. So I want to:

   1. Encourage developers to come. You might meet some folks in person
  you've worked with online. And you can see how beautiful we all
  are.

   2. Get people thinking about what they would like to talk about.  In
  past GitTogethers, it's been a mix of people with prepared things
  to talk about, group discussions of areas, and general kibitzing.
  We can be spontaneous on the day of the event, but if you have a
  topic you want to bring up, you may want to give it some thought
  beforehand.

 If you are a git dev and want to come, please RSVP to Chris Kelly
 amateurhu...@github.com who is organizing the event. If you would like
 to come, but finances make it hard (either for travel, or for the
 conference fee), please talk to me off-list, and we may be able to help.

 If you have questions, please feel free to ask me, and I'll try to get
 answers from the GitHub folks who are organizing the event.


 I'll be arriving around 11 am on the 8th, if anyone wants to record
 something for the GitMinutes podcast [1]. Send me an email directly,
 or just walk up to me at the conference and say hi! I'll hopefully be
 hanging around the contributor's summit area with some microphones,
 but I've been unable to get any feedback from GitHub about whether
 this is OK, so.. I guess we'll just wing it when I get there.

 [1] http://www.gitminutes.com/

 By the way as far as I know nothing has been planned for the
 Contributors Summit on the 8th.
 Maybe we could list some topics that we could discuss.

 I will probably write very short articles about some of the
 discussions for the next Git Rev News edition, but I would be happy if
 other people would like to contribute some. Please tell me and Thomas
 if you are interested.

 Also I am not sure if something is planned for the evening of the 8th
 or not. If nothing is planned maybe we could discuss having dinner
 together or something.

 And if someone needs help or arrives in Paris early or leaves late and
 is interested in meeting up, feel free to contact me.

 Best,
 Christian.


[PATCH/RFC] gitweb: Don't pass --full-history to git-log(1)

2015-08-05 Thread Ævar Arnfjörð Bjarmason
When you look at the history for a file via "git log" we don't use
--full-history by default, but the Gitweb UI does, which can be very
confusing for all the reasons discussed under "History Simplification"
in git-log(1) and in
http://thread.gmane.org/gmane.comp.version-control.git/89400/focus=90659

Gitweb has passed --full-history for file history since pretty much
forever, but I think dropping it is much more usable: on a typical
project with lots of branches being merged it makes for a much less
confusing view. We do this for "git log" by default, so why wouldn't
Gitweb follow suit?

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7a5b23a..2913896 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -7387,7 +7387,7 @@ sub git_log_generic {
}
my @commitlist =
parse_commits($commit_hash, 101, (100 * $page),
- defined $file_name ? ($file_name, "--full-history") : ());
+ defined $file_name ? $file_name : ());
 
my $ftype;
if (!defined $file_hash && defined $file_name) {
-- 
2.1.3
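A small throwaway-repository demonstration of the behaviour this patch is about. The helper make_demo_repo and the file, branch and commit names are made up for illustration: two branches make the same change to "file" and are then merged. Under default history simplification the merge is TREESAME to its first parent, so only that parent is followed; with --full-history, which gitweb passes today, both parents are walked and the side branch's commit shows up too.

```shell
# Hypothetical demo helper: build a repo in $1 where two branches make
# the identical change to "file" and are then merged.
make_demo_repo() {
  git init -q "$1" || return 1
  (
    cd "$1" || exit 1
    commit() { git -c user.name=demo -c user.email=demo@example.com commit -q "$@"; }
    echo base >file && git add file && commit -m base &&
    git checkout -q -b side &&
    echo change >file && commit -am "side: change file" &&
    git checkout -q - &&                  # back to the original branch
    echo change >file && commit -am "main: same change" &&
    git -c user.name=demo -c user.email=demo@example.com merge -q --no-edit side
  )
}
```

Usage:

  make_demo_repo /tmp/fh-demo
  git -C /tmp/fh-demo log --format=%s -- file
  git -C /tmp/fh-demo log --full-history --format=%s -- file

Only the second listing should include "side: change file".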



Re: [PATCH/RFC] gitweb: Don't pass --full-history to git-log(1)

2015-08-05 Thread Ævar Arnfjörð Bjarmason
On Wed, Aug 5, 2015 at 6:54 PM, Junio C Hamano gits...@pobox.com wrote:
 Ævar Arnfjörð Bjarmason  ava...@gmail.com writes:

 When you look at the history for a file via git log we don't show
 --full-history by default, but the Gitweb UI does so, which can be very
 confusing for all the reasons discussed in History Simplification in
 git-log(1) and in
 http://thread.gmane.org/gmane.comp.version-control.git/89400/focus=90659

 We've been doing history via --full-history since pretty much forever,
 but I think this is much more usable, and on a typical project with lots
 of branches being merged it makes for a much less confusing view. We do
 this for git log by default, why wouldn't Gitweb follow suit?

 http://thread.gmane.org/gmane.comp.version-control.git/89400/focus=90758

 seems to agree with you in principle that this would be what gitweb
 should do if it were written today.

I'm reminded of the make(1) story about not supporting spaces instead
of tabs because the guy already had a few dozen users.

We could have changed this in 2008, when Git had far fewer users, and
I think we can still change it. It makes more sense as a default,
especially on busy repos with lots of merges. At work, where lots of
merges are in flight, literally only 1 in 10 commits for any given file
is relevant.

Who'd be linking to gitweb's log output expecting its semantics to
never change, and is that use case more important than having a saner
view for the vast majority of users who are just browsing around?

But if there are strong objections to just changing it, a coworker who
encountered this made a patch that instead adds an extra "full history"
view in addition to the history view (the history view would still
change, but the permalinks would not).


Re: How to rebase when some commit hashes are in some commit messages

2015-10-18 Thread Ævar Arnfjörð Bjarmason
On Mon, Oct 12, 2015 at 9:59 PM, Francois-Xavier Le Bail
 wrote:
> Hello,
>
> [I tried some search engines without success; perhaps I have missed something.]
>
> For example, if I rebase the following commits, I would want that if
> the commit hash 222... become 777...,
> the message
> "Update test output for "
> become
> "Update test output for 777..."
>
> Is it possible currently? And if yes how?

This isn't strictly speaking an answer to your question (others have
done that), but in my workflow, if I have a patch series where I want
to refer to commits inside the series and I know I'm going to rebase
it, I work around this by just using the subject line of the commit as
an ID.

E.g. in the message I'll say something like "See my 'commit.c: Avoid
segfaults on OSX' commit for details". Then I can just find that with
git log --grep.
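That lookup can be made exact, as a sketch. commit_by_subject is a hypothetical helper, not a Git command: it matches the whole subject line instead of using "git log --grep", because --grep would also match the later commit whose message mentions that subject.

```shell
# Hypothetical helper (not a Git command): print the hash of the first
# commit listed by "git log ${2:-HEAD}" whose subject line is exactly $1.
# Note: awk -v interprets backslash escapes, so subjects containing
# backslashes need extra care.
commit_by_subject() {
  git log --format='%H%x09%s' "${2:-HEAD}" |
    awk -F '\t' -v s="$1" '$2 == s { print $1; exit }'
}
```

Usage after a rebase: commit_by_subject 'commit.c: Avoid segfaults on OSX'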


Since gc.autodetach=1 you can end up with auto-gc on every command with no user notification

2015-07-08 Thread Ævar Arnfjörð Bjarmason
Someone at work came to me with the problem that they were getting the
"Auto packing the repository in background for optimum performance"
notice on every Git command that they ran.

This problem is a combination of two things:

 * Since Nguyễn's v1.9-rc0-2-g9f673f9, where we started running "git gc"
in the background, the user hasn't seen the "There are too many
unreachable loose objects" message added back in v1.5.3.1-27-ga087cc9.

 * The checkout has a lot of loose objects. So even after "git prune
--expire=2.week.ago" the .git/objects/17 directory has 317 objects.
More than 27 objects in that directory trigger "git gc --auto".

So it's partly a UI issue. Since the repacking is happening in the
background, the user never sees the message suggesting that they run
"git prune".

Perhaps the heuristic of "are there more than 27 objects in
.git/objects/17?" could also be improved, but I don't know with what
exactly.

But having something fork a gc to the background on every fetch (and
similar object-modifying operations) is quite sub-optimal.
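For reference, a rough shell model of where the "more than 27" number comes from, based on too_many_loose_objects() in builtin/gc.c: Git counts loose objects only in the single fan-out directory objects/17 and compares that count with gc.auto/256 rounded up, i.e. 27 for the default gc.auto of 6700. The function name is hypothetical; the sketch assumes SHA-1 object names (38 hex characters inside the fan-out directory) and does not model the separate pack-count check (gc.autoPackLimit), which can also trigger auto-gc.

```shell
# Hypothetical helper modelled on too_many_loose_objects() in
# builtin/gc.c. Usage: gc_auto_would_trigger [objdir] [gc.auto value]
# Exit status 0 means the loose-object estimate would trigger
# "git gc --auto".
gc_auto_would_trigger() {
  objdir=${1:-.git/objects}
  limit=${2:-6700}                   # gc.auto; 6700 is Git's default
  [ "$limit" -gt 0 ] || return 1     # gc.auto=0 disables auto-gc
  # Object names are evenly distributed over 256 fan-out directories,
  # so one directory is compared against ceil(gc.auto / 256).
  threshold=$(( (limit + 255) / 256 ))
  # Count only names that look like loose SHA-1 objects.
  n=$(ls "$objdir/17" 2>/dev/null | grep -cE '^[0-9a-f]{38}$')
  [ "$n" -gt "$threshold" ]
}
```

Usage: gc_auto_would_trigger .git/objects "$(git config --get gc.auto || echo 6700)" && echo "auto-gc would run"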


Bug: git-upload-pack will return successfully even when it can't read all references

2015-09-07 Thread Ævar Arnfjörð Bjarmason
We have a process to back up our Git repositories at work, and it
started alerting because it wasn't getting the same refs as the
remote.

This turned out to be a pretty trivial filesystem error.
refs/heads/master wasn't readable by the backup process, but some
other stuff in refs/heads and objects/* was.

But I think it's a bug that if we ssh to the remote end and
git-upload-pack can't read certain refs in refs/heads/, we don't
return an error.

This simple shellscript reproduces the issue:

rm -rf /tmp/repo /tmp/repo-checkout
git init /tmp/repo
cd /tmp/repo
touch foo
git add foo
git commit -m"foo"
git checkout -b branch
git checkout master
git show-ref
chmod 000 .git/refs/heads/master
git show-ref
cd /tmp
git clone repo repo-checkout
echo "Status code of clone: $?"
cd repo-checkout
git show-ref

After running this you get:

$ (cd /tmp/repo-checkout && echo -n | strace /tmp/avar/bin/git-upload-pack /tmp/repo 2>&1 | grep -e EACCES)
open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)
open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)
open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)

And "git fetch" will return 0.

We fail to read refs/heads/master in head_ref_namespaced(), called
by upload_pack(). I was going to see if I could patch it to return an
error, but that code seems very far removed from any error checking.

This isn't only an issue with git-upload-pack, e.g. show-ref itself
has the same issue:

$ chmod 600 .git/refs/heads/master
$ git show-ref; echo $?
e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/branch
e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/master
0
$ chmod 000 .git/refs/heads/master
$ git show-ref; echo $?
e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/branch
0

I wanted to check if this was a regression and got as far back as
v1.4.3 with the same behavior; before that the commands wouldn't work
at all due to changes in the git config parsing code.
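Until the exit status reflects this, a backup job could check for the condition itself. A minimal sketch, where unreadable_loose_refs is a hypothetical helper: it lists loose ref files under refs/ that the current user cannot read, i.e. the ones show-ref and upload-pack skip silently above. It does not look inside packed-refs, and it cannot detect anything when run as root, since root can read mode-000 files.

```shell
# Hypothetical pre-flight check (not part of Git): print, relative to
# the git dir, every loose ref file the current user cannot read.
# Refs stored only in packed-refs are not examined.
unreadable_loose_refs() {
  gitdir=${1:-.git}
  find "$gitdir/refs" -type f | while IFS= read -r f; do
    [ -r "$f" ] || printf '%s\n' "${f#"$gitdir"/}"
  done
}
```

A backup script could refuse to run while this prints anything, e.g. [ -z "$(unreadable_loose_refs /tmp/repo/.git)" ] || exit 1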


Re: bash completion lacks options

2015-09-07 Thread Ævar Arnfjörð Bjarmason
On Mon, Sep 7, 2015 at 5:07 PM, Olaf Hering  wrote:
> "git send-email --f" lacks --find-renames and others. Is the list
> of possible options maintained manually?

Yes, see contrib/completion/git-completion.bash.

There's no code for send-email there, you (or someone) could submit a patch! :)

> Perhaps this should be
> automated by placing the long strings in an ELF section, then filling
> variables like $__git_format_patch_options from such ELF section.
> An example how this was done in libguestfs is here (see daemon/daemon.h):
> https://github.com/libguestfs/libguestfs/commit/0306c98d319d189281af3c15101c8d343e400f13

This is an interesting approach, but it wouldn't help with git-send-email
in particular: it's a Perl script, so there's no ELF section to parse.


Re: bash completion lacks options

2015-09-08 Thread Ævar Arnfjörð Bjarmason
On Mon, Sep 7, 2015 at 5:36 PM, Olaf Hering <o...@aepfle.de> wrote:
> Am 07.09.2015 um 17:34 schrieb Ævar Arnfjörð Bjarmason:
>> On Mon, Sep 7, 2015 at 5:07 PM, Olaf Hering <o...@aepfle.de> wrote:
>
>>> https://github.com/libguestfs/libguestfs/commit/0306c98d319d189281af3c15101c8d343e400f13
>>
>> This is an interesting approach, but wouldn't help with git-send-email
>> in particular, it's a Perl script, so there's no ELF section to parse.
>
> format-patch is a ELF binary, a link to git itself as I notice
> just now.

Yes, format-patch is written in C, but you mentioned send-email, which
is a Perl script.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug: git-upload-pack will return successfully even when it can't read all references

2015-09-08 Thread Ævar Arnfjörð Bjarmason
On Tue, Sep 8, 2015 at 8:53 AM, Jeff King <p...@peff.net> wrote:
> On Mon, Sep 07, 2015 at 02:11:15PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> This turned out to be a pretty trivial filesystem error.
>> refs/heads/master wasn't readable by the backup process, but some
>> other stuff in refs/heads and objects/* was.
>>
>> [...]
>>
>> I wanted to check if this was a regression and got as far back as
>> v1.4.3 with the same behavior before the commands wouldn't work
>> anymore due to changes in the git config parsing code.
>
> Right, it has basically always been this way. for_each_ref() silently
> eats oddities or errors while reading refs. Calling for_each_rawref()
> will include them, but we don't do it in most places; it would make
> non-critical operations on a corrupted repo barf.  And it is difficult
> to know what is "critical" inside the code. You might be calling
> "upload-pack" to salvage what you can from a corrupted repo, or to make
> a backup where you want to know what is corrupted and what is not.
>
> Commit 49672f2 introduced a "ref paranoia" environment variable to let
> you specify this (and robust backups was definitely one of the use cases
> I had in mind). It's a little tricky to use with upload-pack because you
> may be crossing an ssh boundary, but:
>
>   git clone -u 'GIT_REF_PARANOIA=1 git-upload-pack' ...
>
> should work.
>
> With your case:
>
>   $ git clone --no-local -u 'GIT_REF_PARANOIA=1 git-upload-pack' repo repo-checkout
>   Cloning into 'repo-checkout'...
>   fatal: git upload-pack: not our ref 
>   fatal: The remote end hung up unexpectedly
>
> Without "--no-local" it behaves weirdly, but I would not recommend local
> clones in general if you are trying to be careful. They optimize out a
> lot of the safety checks, and we do things like copy the packed-refs
> file wholesale.
>
> And certainly the error message is not the greatest. upload-pack is not
> checking for the REF_ISBROKEN flag, so it just dumps:
>
>    refs/heads/master
>
> in the advertisement, and the client happily requests that object.
> REF_PARANOIA is really just a band-aid to feed the broken refs to the
> normal code paths, which typically barf on their own. :)
>
> Something like this:
>
> diff --git a/upload-pack.c b/upload-pack.c
> index 89e832b..3c621a5 100644
> --- a/upload-pack.c
> +++ b/upload-pack.c
> @@ -731,6 +731,9 @@ static int send_ref(const char *refname, const struct object_id *oid,
>  	if (mark_our_ref(refname, oid))
>  		return 0;
>
> +	if (flag & REF_ISBROKEN)
> +		warning("remote ref '%s' is broken", refname);
> +
>  	if (capabilities) {
>  		struct strbuf symref_info = STRBUF_INIT;
>
> kind of helps, but the advertisement is too early for us to send
> sideband messages. So it makes it to the user if the transport is local
> or ssh, but not over git:// or http.
>
> That's something we could do better with protocol v2 (we'll negotiate
> capabilities before the advertisement).

Fantastic. GIT_REF_PARANOIA does exactly what I need, i.e. it stops the
fetch so the permissions can be repaired manually.

I think it makes sense to keep the default at "let's try to copy over
what we can", for salvage purposes. I think the bug is that we still
return success in that case when we should return non-zero, but as you
point out this is easier said than done due to needing to deal with
the case where the remote transport sends us the ... ref.

I wonder if --upload-pack="GIT_REF_PARANOIA=1 git-upload-pack" should
be the default when running fetch with --prune. There's a particularly
bad edge case now: if your backup is a --mirror clone that you update
with --prune, then as soon as there are permission errors on the
master repository you'll prune every ref you couldn't read from the
backup.

But yeah, a proper fix needs protocol v2, because among other things
that --upload-pack hack will only work for ssh, not http.
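A sketch of a mirror backup along these lines. backup_mirror and paranoid_upload_pack are hypothetical names; the sketch assumes the source is a local path or an ssh URL (as noted above, the --upload-pack trick does not apply over http) and that git-upload-pack is on the serving side's PATH. As Jeff describes, GIT_REF_PARANOIA=1 feeds broken refs into the normal code paths, which typically makes the transfer fail instead of silently dropping those refs, and that matters most when --prune would otherwise delete them from the backup.

```shell
# Hypothetical backup helper: clone or refresh a --mirror copy of $1
# in $2, running the serving upload-pack with GIT_REF_PARANOIA=1.
paranoid_upload_pack='GIT_REF_PARANOIA=1 git-upload-pack'

backup_mirror() {
  src=$1 dest=$2
  if [ -d "$dest" ]; then
    # Refresh an existing mirror; --prune deletes refs the source no
    # longer advertises, so a silently skipped ref would be lost here.
    git -C "$dest" fetch --prune --upload-pack="$paranoid_upload_pack" origin
  else
    # --no-local makes a local-path clone go through upload-pack too,
    # instead of copying the repository files directly.
    git clone --mirror --no-local --upload-pack="$paranoid_upload_pack" "$src" "$dest"
  fi
}
```

Usage, e.g. from cron: backup_mirror ssh://git.example.com/repo.git /backups/repo.git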


Re: [RFC/PATCH 6/8] config: add core.untrackedCache

2015-12-02 Thread Ævar Arnfjörð Bjarmason
On Wed, Dec 2, 2015 at 8:12 AM, Torsten Bögershausen  wrote:
> On 12/01/2015 09:31 PM, Christian Couder wrote:
>>
>> When we know that mtime is fully supported by the environment, we
>> might want the untracked cache to be always used by default without
>> any mtime test or kernel version check being performed.
>

[Re-arranged some of the quotes for the clarity of my reply]

[Also: Full disclosure, Christian is working on this for Booking.com,
and I'm managing that project...]

> I always want to test and verify that the untracked cache is working,
> before I rely on it.

Then with this patch you can just not use the core.untrackedCache=true
option, or with the later patches in this series use "git update-index
--test-untracked-cache && git config core.untrackedCache true".

> I'm not sure if ever "we know" ?
> How can we know without testing ?
> I personaly can not say "I know" in all the different system I am using,

Some users of Git can know that their mtime works, just like they know
they deploy it on filesystems where say symlinks work.

The current way of turning on this feature has to be run on a
per-repo basis and, without the --force-untracked-cache option,
includes mandatory tests, which a) makes it inconvenient to deploy
across all Git repos on a set of machines, and b) is needlessly
paranoid as a default way to enable it.

>> Also when we know that mtime is not supported by the environment,
>> for example because the repo is shared over a network file system,
>> then we might want 'git update-index --untracked-cache' to fail
>> immediately instead of it testing if it works (because it might
>> work on some systems using the repo over the network file system
>> but not others).
>
> Same here.
>
>> Signed-off-by: Christian Couder 
>> ---
>>   Documentation/config.txt   | 10 ++
>>   Documentation/git-update-index.txt | 11 +--
>>   builtin/update-index.c | 28 ++--
>>   cache.h|  1 +
>>   config.c   | 10 ++
>>   contrib/completion/git-completion.bash |  1 +
>>   dir.c  |  2 +-
>>   environment.c  |  1 +
>>   wt-status.c|  9 +
>>   9 files changed, 60 insertions(+), 13 deletions(-)
>>
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index b4b0194..bf176ff 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -308,6 +308,16 @@ core.trustctime::
>> crawlers and some backup systems).
>> See linkgit:git-update-index[1]. True by default.
>>   +core.untrackedCache::
>> +   If unset or set to 'default' or 'check', untracked cache will
>> +   not be enabled by default and when
>> +   'update-index --untracked-cache' is called, Git will test if
>> +   mtime is working properly before enabling it. If set to false,
>> +   Git will refuse to enable untracked cache even if
>> +   '--force-untracked-cache' is used. If set to true, Git will
>> +   blindly enabled untracked cache by default without testing if
>> +   it works. See linkgit:git-update-index[1].
>> +
>
> Please no.
> The command line option should always be able to overwrite any settings
> from a config file.

If we keep this patch and not the rest in this series (which I think
should also be applied), you'd either use the update-index way of
changing the setting, or the config option.

> Sorry, I may missing the big picture here.
> What exactly should be achieved ?
>
> A config variable that should ask Git to always try to use the untracked
> cache ?
> Or a config variable that tells Git to never use the untracked cache ?
> Or a combination ?
>
> core.untrackedCache::
>  false: Never use the untracked cache ?
>  true: Always try to use the untracked cache ?
>Try means: probe, and if the probing fails, record that if fails in
> the index,
>for this hostname/os/kernel/path (Don't remember all the details)
> unset: As today,

As discussed in the "[RFC/PATCH] config: add core.trustmtime" thread
this feature is IMO needlessly paranoid about enabling itself.

Current state of affairs:

 * Enable on a per-repo basis: git update-index --untracked-cache
 * Disable on a per-repo basis: git update-index --no-untracked-cache
 * Enable system-wide: N/A
 * Disable system-wide: N/A

With this patch:

 * Enable on a per-repo basis: git update-index --untracked-cache OR
"git config core.untrackedCache true"
 * Disable on a per-repo basis: git update-index --no-untracked-cache OR
"git config core.untrackedCache false"
 * Enable system-wide: git config --system core.untrackedCache true
 * Disable system-wide: git config --system core.untrackedCache false
 * Caveat: The core.untrackedCache config takes precedence over "git update-index"

With the rest of the patches in this series:

 * Enable system-wide & per-repo the 

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Ævar Arnfjörð Bjarmason
On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes:
> I still have a problem with the approach from "design cleanliness"
> point of view[...]
>
> In any case I think we already have agreed to disagree on this
> point, so there is no use discussing it any longer from my side.  I
> am not closing the door to this series, but I am not convinced,
> either.  At least not yet.

In general the fantastic thing about the git configuration facility is
that it provides both systems administrators and normal users with
what they want. It's possible to configure things system-wide and
override those on a user or repository basis.

Of course hindsight is 20/20, but I think that given what's been
covered in this thread it's been established that it's categorically
better that if we introduce features like these that they be
configured through the normal configuration facility rather than the
configuration being sticky to the index. It gives you everything that
the per-index configuration gives you and more.

So assuming that's the case, how do we migrate something that's
configured via the index towards being configured through git-config?

I think there's no general answer to that, but in this case the worst
case scenario with accepting this series as-is is that we downgrade
some users who've opted in to it to pre-v2.5.0 "git status"
performance.

Since the change in performance really isn't noticeable except on
really large repositories, which are more likely to have someone
involved watching the changelog on upgrades, I think that's OK.


Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Ævar Arnfjörð Bjarmason
On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano  wrote:

I'm replying to & quoting from two E-Mails of yours at once here for
clarity & less noise. I'm working with Christian on getting this
integrated, and we both thought it would be good to have some fresh
input on the matter from me.

> Christian Couder  writes:

>> If you want only some repos to use the UC, you will set
>> core.untrackedCache in the repo config. Then after cloning such a
>> repo, you will copy the config file, and this will not be enough to
>> enable the UC.
>
> Surely.  "Does this index file keeps track of the untracked files'
> states?" is a property of the index.  Cloning does not propagate the
> configuration and copying or not copying is irrelevant.  If you want
> to enable, running "update-index --untracked-cache" is a way to do
> so.  I cannot see what's so hard about it.
>
>> And if you have set core.untrackedCache in the global config when you
>> clone, UC is enabled, but if you have just set it in the repo config
>> after the clone, it is not enabled.
>
> That's fine.  In your patch series, if you set it in the global, you
> will get the cache in the new one.  With the cleaned-up semantics I
> suggested, the same thing will happen.
>
> And with the cleaned-up semantics, the configuration is *ONLY* used
> to give the *DEFAULT* before other things happen, i.e. creation of
> the index file for the first time.  Because the configuration is
> only the default, an explicit "update-index --[no-]untracked-cache"
> will defeat it, just like any other config/option interaction.

As you know Christian is working on this for Booking.com to integrate
features we find useful into git.git in such a way that we don't have
to maintain some internal fork of Git.

What we're trying to do, and what a lot of other big deployments of
Git elsewhere would also find useful, is to ship a default sensible
configuration for all users on the system in /etc/gitconfig.

I'd like to be able to easily enable some feature that aids Git
performance globally on our thousands of machines and for our hundreds
of users by just tweaking something in puppet to change
/etc/gitconfig, and more importantly if that change ends up being bad
reverting that config in /etc/gitconfig should undo the change.

It's an unacceptable level of complexity for system-level automation
to have to scour the filesystem for existing Git repositories and run
"git update-index" on each of them. That's why we're submitting
patches to make this a config option, so we can simply flip a flag in
/etc/gitconfig.

It's also unacceptable to have the config simply provide the default
which'll be frozen either at clone time or after an initial "git
status".

Let's say I ship a /etc/gitconfig that says "new clones should use the
untracked cache". Now I roll that out across our fleet of machines and
it turns out the morning after that the feature doesn't work properly
for whatever reason. If it's just a "default until clone or status"
type of thing, then even if I revert the configuration, a lot of users &
their repositories in the wild will still be broken, and will have to
be manually fixed. Which again leads to the scouring-the-filesystem
problem.

So that gives some more context for why we're pushing for this change.
I believe this feature breaks no existing use-case and just supports
new ones, and I think that your objections to it are based on a simple
misunderstanding as will become apparent if you read on below.

> The biggest issue I had with your patch series, IIRC, is that
> configuration will defeat the command line option.

I think it's a moot point to focus on configuration v.s. command-line
option. The important question is whether or not this feature can
still be configured on a repo-local basis with this series as before.
That's still the case since --local git configuration overrides
--global and --system, so users who want to enable/disable this
per-repo still can.
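The precedence described above can be checked directly. A sketch, where demo_untracked_cache_layers is a hypothetical name: it points HOME at a scratch directory so the "global" file is a throwaway one, sets core.untrackedCache=true there, then opts one repository out. "git config --get" reads the system, global and local files in that order and reports the last value found, so the repository-local setting wins. Note that current Git documents core.untrackedCache as taking true, false or keep, not exactly the values proposed in this series.

```shell
# Hypothetical demo: a repository-local core.untrackedCache setting
# overrides the value from the per-user ("global") config file.
demo_untracked_cache_layers() {
  scratch=$(mktemp -d)
  (
    # Throwaway HOME so the real ~/.gitconfig is never touched.
    HOME=$scratch; export HOME
    unset GIT_CONFIG_GLOBAL XDG_CONFIG_HOME
    repo=$scratch/repo
    git init -q "$repo" || exit 1
    git config --global core.untrackedCache true              # user-wide default
    printf 'global only: %s\n' "$(git -C "$repo" config --get core.untrackedCache)"
    git -C "$repo" config --local core.untrackedCache false   # this repo opts out
    printf 'with local: %s\n' "$(git -C "$repo" config --get core.untrackedCache)"
  )
}
```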

>> Shouldn't it be nice if they could just enable core.untrackedCache in
>> the global config files without having to also cd into every repo and
>> use "git update-index --untracked-cache" there?
>
> NO.  It is bad to change the behaviour behind users' back.

I'm not quite sure what the objection here is exactly. If you're a
normal user you can enable/disable this per-repo just like you can
now, and enable/disable it for all your repos in ~/.gitconfig.

If you mean that the user's configuration shouldn't be changed by the
global config in /etc/gitconfig I do think that's a moot point. If
you're a user on a system where I have root and I want to change your
Git configuration I'm going to be able to do that whatever the
mechanism is.

That's indeed what we're doing to enable this at Booking.com
currently: we run a job to find some limited set of common checkouts
and run "git update-index" for users as root. The problem with that is
that it's needlessly complex, hence this 

Re: [PATCH 7/8] config: add core.untrackedCache

2015-12-15 Thread Ævar Arnfjörð Bjarmason
On Wed, Dec 16, 2015 at 12:03 AM, Junio C Hamano <gits...@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes:
>
>> Of course hindsight is 20/20, but I think that given what's been
>> covered in this thread it's been established that it's categorically
>> better that if we introduce features like these that they be
>> configured through the normal configuration facility rather than the
>> configuration being sticky to the index.
>
> I doubt that any such thing has been established at all in this
> thread.  It may be true that you and perhaps Christian loudly
> repeated it, but loudly repeating something and establishing
> something as a fact are slightly different.
>
> The thing is, I do not necessarily view this as "configuration".
> The way I see the feature is that you say "--untracked" when you
> want the states of untracked paths be kept track of in the index.

You probably know this, but the --untracked-cache has no bearing on
what we actually keep track of, it's just an optimization for how
efficiently we execute "git status" commands without the "-uno"
option. We still produce the same output.

> just like you say "git add Makefile" when you want the state of
> 'Makefile' be kept track of in the index.  Either the index keeps
> track of it, or it doesn't, based solely on user's request, and the
> bit to tell us which is the case is already in the index, exactly
> because that is part of the data that is kept track of in the index.

What I mean by "[we've] established that it's categorically better [to
do this via git-config]" is that we can still do all that stuff, we
can just also do more stuff now.

>> Since the change in performance really isn't noticeable except on
>> really large repositories, which are more likely to have someone
>> involved watching the changelog on upgrades I think that's OK.
>
> Especially it is dubious to me that the trade-off you are making
> with this design is a good one.  In order to avoid paying a one-time
> cost to run "update-index --untracked-cache" at sites that _do_ want
> to use that feature (and after that, if you teach "git init" and
> "git clone" to pay attention to the "give you the default"
> configuration to run it for you, so that your users won't have to),

It's not unreasonable to avoid the cost of running "update-index
--untracked-cache", it's the difference between just adjusting
/etc/gitconfig and continually having to traverse the entire /
filesystem if you want to enable this feature on a system-wide basis.
It should be easy to enable any Git feature via the configuration
facility either on a --system, or --global or --local basis.

> you are forcing all codepaths that makes any write to the index (not
> just "init"-time) to make an extra check with the configuration all
> the time for everybody, because you made the presence of the
> untracked cache data in the index not usable as a sign that the user
> wants to use that feature.

Maybe I'm misunderstanding Christian's patches but don't we already
parse the git configuration on any commands that update the index
anyway? See git_default_core_config().
We already parse the git configuration to run "git status".

> If the feature is something only those
> with really large repositories care about, is it a good trade-off to
> make everybody pay the runtime cost and make code more complex and
> fragile?  I am not yet convinced.

I was arguing that only users with really large repositories would
notice if we turned this off because the enabling facility had changed
from per-index to config. But it doesn't follow that the expense of
checking the git configuration which we're parsing anyway for the
index-related commands makes things more complex & fragile.


Re: [RFC/PATCH] config: add core.trustmtime

2015-11-26 Thread Ævar Arnfjörð Bjarmason
On Thu, Nov 26, 2015 at 6:53 PM, Duy Nguyen  wrote:
> On Thu, Nov 26, 2015 at 6:21 AM, Christian Couder
>  wrote:
>> I am wondering why you didn't make it by default run the mtime checks
>> when a kernel change is detected. Maybe that would be better than
>> disabling itself.
>
> It takes about 10 seconds to go through the mtime check. Imagine you
> have to wait 10s for some random "git status".. Plus I didn't want to
> do anything fancy.

I browsed through the commits that added the --untracked-cache and
tried to find the original mailing list discussion, but I couldn't
find the reason why the default interface for enabling it does
these exhaustive tests.

Maybe I'm missing some really common breakage with st_mtime on some
system, but having a feature the user explicitly enables turn itself
off and doing FS-testing that takes 10 seconds when it's enabled seems
like the wrong default to me.

We don't do that for core.fileMode, core.ignorecase, core.trustctime
or core.symlinks. Do we really need to treat this one differently?

If that's a "no" then the default interface to this could be much
simpler. Rather than being a change you apply to .git/index (going
away if you nuke it etc.) it could just be a config option like the
rest.


Re: [RFC/PATCH] config: add core.trustmtime

2015-11-25 Thread Ævar Arnfjörð Bjarmason
On Wed, Nov 25, 2015 at 7:35 AM, Christian Couder
 wrote:
> At Booking.com we know that mtime works everywhere and we don't
> want the untracked cache to stop working when a kernel is upgraded
> or when the repo is copied to a machine with a different kernel.
> I will add tests later if people are ok with this.

A bit more info: I rolled Git out internally with this patch:
https://github.com/avar/git/commit/c63f7c12c2664631961add7cf3da901b0b6aa2f2

The --untracked-cache feature hardcodes the equivalent of:

pwd; uname --kernel-name --kernel-release --kernel-version

into the index. If any of those change, it prints out the "cache is
disabled" warning.
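To illustrate why a minor kernel upgrade is enough to trip this, here's a rough shell approximation of that check. It's only a sketch: Git does this in C and stores the identity inside .git/index, whereas uc_fingerprint, uc_status and the saved-state file below are made-up names.

```sh
#!/bin/sh
# Approximate the identity the untracked cache records: the current
# directory plus kernel name, release and version (POSIX uname -s -r -v).
uc_fingerprint() {
    pwd
    uname -s -r -v
}

# Compare a previously saved fingerprint (a hypothetical state file)
# with the current one. Any difference, e.g. after a kernel upgrade or
# when run from another directory, reports "disabled".
uc_status() {
    saved=$1
    if [ -f "$saved" ] && [ "$(cat "$saved")" = "$(uc_fingerprint)" ]; then
        echo enabled
    else
        echo disabled
    fi
}
```

Usage: `uc_fingerprint >state` once, then `uc_status state` prints "enabled" only as long as neither the directory nor any of the three kernel fields change.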

This patch will make it stop being so afraid of itself to the point of
disabling itself on minor kernel upgrades :)

A few other issues with this feature I've noticed:

 * There's no way to just enable it globally via the config. Makes it
a bit of a hassle to use it. I wanted to have a config option to
enable it via the config, how about "index.untracked_cache = true" for
the config variable name?

 * Doing "cd /tmp; git --git-dir=/git/somewhere/else/.git update-index
--untracked-cache" doesn't work how I'd expect. It hardcodes "/tmp" as
the directory that "works" into the index, so if you use the working
tree you'll never use the untracked cache. I spotted this because I
carry out a bunch of git maintenance commands with --git-dir instead
of cd-ing to the relevant directories. This works for most other
things in git, is it a bug that it doesn't work here?

 * If you "ctrl+c" git update-index --untracked-cache at an
inopportune time you'll end up with a mtime-test-XX directory in
your working tree. Perhaps this tempdir should be created in the .git
directory instead?

 * Maybe we should have a --test-untracked-cache option, so you can
run the tests without enabling it.
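For the Ctrl+C point above, creating the probe directory under .git and cleaning it up on interruption could look roughly like this shell sketch. with_probe_dir and the probe command are hypothetical; the real mtime test is C code inside Git.

```sh
# Run a probe command ("$2" ...) with a fresh temporary directory created
# under the given git dir ("$1"), passing the directory as the last
# argument. The directory is removed when the probe finishes, on Ctrl+C
# (SIGINT) or on SIGTERM. The subshell body keeps the traps local.
with_probe_dir() (
    gitdir=$1
    shift
    probe=$(mktemp -d "$gitdir/mtime-test-XXXXXX") || exit 1
    trap 'rm -rf "$probe"; exit 130' INT
    trap 'rm -rf "$probe"; exit 143' TERM
    "$@" "$probe"
    status=$?
    rm -rf "$probe"
    exit "$status"
)
```

Something like `with_probe_dir "$(git rev-parse --git-dir)" my_mtime_probe` would then never leave an mtime-test-* directory in the working tree, whether the probe succeeds, fails or is interrupted.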

Aside from the slight hassle of enabling this and keeping it enabled
this feature is great. It's sped up "git status" across the board by
about 40%. Slightly less than that on faster spinning disks, slightly
more than that on slower ones.


Re: Rebase performance

2016-02-25 Thread Ævar Arnfjörð Bjarmason
On Wed, Feb 24, 2016 at 11:09 PM, Christian Couder
 wrote:

[Resent because I was accidentally in GMail's HTML mode and the ML rejected it]

> If there was a config option called maybe "rebase.taskset" or
> "rebase.setcpuaffinity" that could be set to ask the OS for all the
> rebase child processes to be run on the same core, people who run many
> rebases on big repos on big servers as we do at Booking.com could
> easily benefit from a nice speed up.
>
> Technically the option may make git-rebase--am.sh call "git am" using
> "taskset" (if taskset is available on the current OS).

I think aside from issues with git-apply this would be an interesting
feature to have in git. I.e. some general facility to intercept
commands and inject a prefix command in front of them, whether that's
taskset, nice/ionice, strace etc.
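A minimal version of such a facility, as a shell sketch. CMD_PREFIX is a made-up variable name for illustration, not something Git reads.

```sh
# Run a command, optionally prefixed by the words in $CMD_PREFIX,
# e.g. "taskset -c 0", "nice -n 10" or "strace -f".
# $CMD_PREFIX is deliberately left unquoted so it splits into words;
# this simple version can't handle prefix arguments containing spaces
# or glob characters.
run_prefixed() {
    if [ -n "${CMD_PREFIX:-}" ]; then
        $CMD_PREFIX "$@"
    else
        "$@"
    fi
}
```

With that, `CMD_PREFIX="taskset -c 0" run_prefixed git am ...` would pin the child process to one core, and an empty CMD_PREFIX runs the command unchanged.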

> Another possibility would be to libify the "git apply" functionality
> and then to use the libified "git apply" in run_apply() instead of
> launching a separate "git apply" process. One benefit from this is
> that we could probably get rid of the read_cache_from() call at the
> end of run_apply() and this would likely further speed up things. Also
> avoiding to launch separate processes might be a win especially on
> Windows.

Yeah that should help in this particular case and make the taskset
redundant since the whole sequence of operations would all be on one
core, right?

At the risk of derailing this thread, a thing that would make rebase
even faster I think would be to change it so that instead of applying
a patch at a time to the working tree the whole operation takes place
on temporary trees & commits and then we'll eventually move the branch
pointer to that once it's finished.

I.e. there's no reason why applying a sequence of 1000 patches where
FOO.txt is changed to "hi1", "hi2", "hi3", ... should be noticeably
slower than applying the same changes with git-fast-import.
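For a baseline, a synthetic version of that 1000-commit workload can be generated as a git-fast-import stream. This is a rough sketch: the branch name, committer identity and timestamps are invented, and it uses fast-import's delimited `data <<EOM` form for readability.

```sh
# Emit a git-fast-import stream with $1 commits on a made-up branch,
# each setting FOO.txt to "hi1", "hi2", ... Commits after the first
# need no "from" line: fast-import continues from the tip of the
# branch it created earlier in the same stream.
gen_fast_import_stream() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        printf 'commit refs/heads/fast-import-demo\n'
        printf 'committer Demo User <demo@example.com> %d +0000\n' \
            "$((1000000000 + i))"
        printf 'data <<EOM\nSet FOO.txt to hi%d\nEOM\n' "$i"
        printf 'M 100644 inline FOO.txt\n'
        printf 'data <<EOM\nhi%d\nEOM\n\n' "$i"
        i=$((i + 1))
    done
}
```

In a scratch repository, `time sh -c '. ./gen.sh; gen_fast_import_stream 1000 | git fast-import'` (assuming the function is saved as gen.sh) gives a number to compare against rebasing the equivalent 1000 patches.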

Of course this would involve a lot of nuance, e.g. if there's a
conflict we'd need to update the working tree & index as we do now
before continuing.

Has anyone looked into some advanced refactoring of the rebase process
that would work like this, or has some feedback on why this would be
dumb or that there's a better way to do it?


Is there a --stat or --numstat like option that'll allow me to have my cake and eat it too?

2016-03-08 Thread Ævar Arnfjörð Bjarmason
I maintain a hook for Git that allows you to block binary pushes[1].
Of the implementations I've seen, it's the least stupid one out
there.

Basically on-push it parses this:

git log --pretty=format:%H -M100% --stat=9000,9001 ..

The --stat=9000,9001 is there to make sure we still get the filename
if it's long[2].

It's important that this is something like "git-log" instead of
"git-show for each" for performance (think a push with hundreds of
commits). It's also important that it's not "git diff" (think a push
that adds/removes a huge binary file within one push). I also don't
want to manually parse "git log --numstat -p" or whatever for
performance reasons since every push hangs on this.

It's somewhat of a pain to parse that --stat output, because I have
to look for /\|\s+Bin / in the output to detect binary changes.
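That Bin-line matching could look roughly like the sed sketch below (binary_stat_changes is a made-up name). It also pulls out the before/after byte counts that --stat prints for binary files, and it inherits the same ambiguity for paths that themselves contain " | " or " => ".

```sh
# Read "git log --stat" output on stdin and print "path old_size new_size"
# for every binary change, i.e. lines like " foo.png | Bin 0 -> 1234 bytes".
# In a POSIX BRE, "|" is a literal character.
binary_stat_changes() {
    sed -n 's/^ \(.*[^ ]\) *| *Bin \([0-9]*\) -> \([0-9]*\) bytes$/\1 \2 \3/p'
}
```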

You might be thinking "why don't you use --numstat?". Because while
that option does most of what I want it doesn't show the old/new size
of the binary file, so I can't have a policy to allow e.g. <=1KB files
without doing a second pass with --stat or "git show".
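For contrast, this is roughly all --numstat gives you for a binary file: "-" in both count columns and no sizes at all. A sketch with a made-up helper name:

```sh
# Read "git log --numstat" output on stdin and print the paths of binary
# changes, which --numstat reports as "-<TAB>-<TAB>path" instead of
# added/removed line counts. Note that no sizes are available here.
binary_numstat_paths() {
    awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```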

Both formats also have various parsing edge cases, e.g. with -M100% I
have to parse out renames like "foo.png => bar.png", but you can also
create a file with " => " in the filename and there's no way to
disambiguate it.

Both formats also only show lines added/deleted, but --numstat doesn't
show the size before/after for binary files, so if I want to also
prohibit huge non-binary files I can't without running both --stat and
--numstat.

What I really want is something for git-log more like
git-for-each-ref, so I could emit the following info for each file
being modified delimited by some binary marker:

- file name before
- file name after
- is rename?
- is binary?
- size in bytes before
- size in bytes after
- removed lines
- added lines

I think no combination of git-log options or any built-in machinery
comes close to giving me all of that without having to do multiple
passes with some combination of git-log and git-show, but I'd love to
be proven wrong.

1. https://github.com/avar/pre-receive-reject-binaries
2. OVER NINE THOUSAND should be enough for everyone, right?


Re: Is there a --stat or --numstat like option that'll allow me to have my cake and eat it too?

2016-03-08 Thread Ævar Arnfjörð Bjarmason
On Tue, Mar 8, 2016 at 9:51 PM, Jeff King <p...@peff.net> wrote:
> On Tue, Mar 08, 2016 at 04:08:21PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> What I really want is something for git-log more like
>> git-for-each-ref, so I could emit the following info for each file
>> being modified delimited by some binary marker:
>>
>> - file name before
>> - file name after
>> - is rename?
>> - is binary?
>> - size in bytes before
>> - size it bytes after
>> - removed lines
>> - added lines
>
> If you get the full sha1s of each object (e.g., by adding --raw), then
> you can dump them all to a single cat-file invocation to efficiently get
> the sizes.
>
> I'm not quite sure I understand why you want to know about renames and
> added/removed lines if you are just blocking binary files. If I were
> implementing this[1], I'd probably just block based on blob size, which
> you can do with:

I want to know about renames because if you're just moving an existing
binary file around that's fine, it's not adding a new big blob to the
repo.

The hook also has a facility to commit binary stuff if you add "yes I
know what I'm doing and want to commit N bytes to the repo" to the
commit message. Mostly when people do this it's an accident.

I wanted to know about added/removed lines because I was looking into
extending this to non-binary files. Today at work someone committed
300MB of text files to a branch. We could delete it in that case, but
it would also be nice to have limits on that sort of thing too.

>   git rev-list --objects $old..$new |
>   git cat-file --batch-check='%(objectsize) %(objectname) %(rest)' |
>   perl -alne 'print if $F[0] > 1_000_000; # or whatever' |
>   while read size sha1 file; do
>     echo "Whoops, $file ($sha1) is too big"
> exit 1
>   done
>
> You can also use %(objectsize:disk) to get the on-disk size (which can
> tell you about things that don't compress well, which tend to be the
> sorts of things you are trying to keep out).
>
> You can't ask about binary-ness, but I don't think it would be unreasonable
> for cat-file to have a "would git consider this content binary?"
> placeholder for --batch-check.
>
> The other things are properties of the comparison, not of individual
> objects, so you'll have to get them from "git log". But with some clever
> scripting, I think you could feed those sha1s (or $commit:$path
> specifiers) into a single cat-file invocation to get the before/after
> sizes.
>
> -Peff
>
> [1] GitHub has hard and soft limits for various blob sizes, and at one
>     point the implementation looked very similar to what I showed here.
>     The downside is that for a large push, the rev-list can actually
>     take a fair bit of time (e.g., consider pushing up all of the kernel
>     history to a brand new repo), and this is on top of the similar work
>     already done by index-pack and check_everything_connected().
>
>     These days I have a hacky patch to notice the too-big size directly
>     in index-pack, which is essentially free. It doesn't know about the
>     file path, so we pull that out later in the pre-receive hook. But we
>     only have to do so in the uncommon case that there _is_ actually a
>     too-big file, so normal pushes incur no penalty.

All good tips / insights. I'll definitely check some of this out.
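Peff's --raw suggestion could be sketched like this: pull the post-image blob IDs out of `git log --raw --no-abbrev` and feed them all to a single `git cat-file --batch-check`. new_blob_ids is a made-up helper name.

```sh
# Read "git log --raw --no-abbrev" output on stdin and print the
# post-image object ID of each changed path. Raw lines look like
# ":100644 100644 <old-id> <new-id> M<TAB>path"; deletions (all-zero
# new ID) and submodule entries (mode 160000) are skipped.
new_blob_ids() {
    awk '/^:/ && $2 != "160000" && $4 !~ /^0+$/ { print $4 }'
}
```

E.g. `git log --raw --no-abbrev --format= "$old..$new" | new_blob_ids | git cat-file --batch-check='%(objectsize) %(objectname)'` prints one size per changed blob in a single cat-file process.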


Re: [GSoC] A late proposal: a modern send-email

2016-03-29 Thread Ævar Arnfjörð Bjarmason
On Tue, Mar 29, 2016 at 6:17 AM, 惠轶群 <huiyi...@gmail.com> wrote:
> 2016-03-29 0:49 GMT+08:00 Ævar Arnfjörð Bjarmason <ava...@gmail.com>:
>> On Sat, Mar 26, 2016 at 3:13 AM, 惠轶群 <huiyi...@gmail.com> wrote:
>>> 2016-03-26 2:16 GMT+08:00 Junio C Hamano <gits...@pobox.com>:
>>>> 惠轶群 <huiyi...@gmail.com> writes:
>>>>
>>>>> # Purpose
>>>>> The current implementation of send-email is based on perl and has only
>>>>> a tui, it has two problems:
>>>>> - user must install a ton of dependencies before submit a single patch.
>>>>> - tui and parameter are both not quite friendly to new users.
>>>>
>>>> Is "a ton of dependencies" true?  "apt-cache show git-email"
>>>> suggests otherwise.  Is "a ton of dependencies" truly a problem?
>>>> "apt-get install" would resolve the dependencies for you.
>>>
>>> There are three perl packages needed to send patch through gmail:
>>> - perl-mime-tools
>>> - perl-net-smtp-ssl
>>> - perl-authen-sasl
>>>
>>> Yes, not too many, but is it better none of them?
>>>
>>> What's more, when I try to send mails, I was first disrupted by
>>> "no perl-mime-tools" then by "no perl-net-smtp-ssl or perl-authen-sasl".
>>> Then I think, why not just a mailto link?
>>
>> I think your proposal should clarify a bit who these users are that
>> find it too difficult to install these perl module dependencies. Users
>> on OSX & Windows I would assume, because in the case of Linux distros
>> getting these is the equivalent of an apt-get command away.
>
> In fact, I'm not familiar with the build for OSX or Windows.

The core of your proposal rests on the assumption that
git-send-email's implementation is problematic because it has a "ton
of dependencies", and that this must be dealt with by implementing an
alternate E-Mail transport method.

But you don't go into exactly how this is a practical issue for users,
which is what the rest of the proposal ("make it friendly for users")
is about. Let's leave aside the question of creating an E-Mail GUI
that's shipped with Git.

Correct me if I'm wrong but don't we basically have 4 kinds of users
using git-send-email:

1) Those who get it from a binary Windows package (is it even packaged there?)
2) Also a binary package, but for OSX
3) Users installing it via their Linux distribution's package system
4) Users building it from source on Windows/OSX/Linux.

I'm in group #3 myself for the purposes of using git-send-email and
have never had issues with its dependencies because my distro's
package management takes care of it for me.

I don't know what the packaging status is for #1 and #2, but that's
what my question is about: if this becomes a non-issue for those two
groups (if it isn't already), isn't this question of dependencies a
non-issue?

I.e. why does it matter if git-send-email has N dependencies if those
N are either packaged with the common Windows/OSX packages that most
users use, or installed as dependencies by their *nix distro?

Group #4 is small enough, and likely enough to consist of git.git
contributors or distro package maintainers anyway, that this issue
doesn't matter for them.

>> If installing these dependencies is hard for users perhaps a better
>> thing to focus on is altering the binary builds on Git for platforms
>> that don't have package systems to include these dependencies.
>
> Why `mailto` not a good choice? I'm confusing.

I'm not saying that the mailto: method you're proposing isn't good in
itself. I think it would be very useful to be able to magically open
git-send-email output in your favorite E-Mail client for editing
before sending it off like you usually send E-Mail.

Although I must say I'd be seriously surprised if git-formatted
patches survived contact with popular E-Mail clients when the body is
specified via the body=* parameter, given that we're sending precisely
formatted content and most mailers are very eager to wrap lines or
otherwise munge input.
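To make that concrete: a mailto: link has to carry the whole patch percent-encoded in the body= parameter (RFC 6068), newlines and all. A byte-wise shell sketch of that encoding; uri_encode is a made-up name, and it's quadratic, so only suitable for small inputs.

```sh
# Percent-encode "$1" for use in a mailto: URI, leaving only RFC 3986
# unreserved characters as-is. Runs in a subshell so that switching to
# the C locale (byte-wise processing) doesn't leak into the caller.
uri_encode() (
    LC_ALL=C
    export LC_ALL
    s=$1
    while [ -n "$s" ]; do
        rest=${s#?}          # everything but the first byte
        c=${s%"$rest"}       # the first byte
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) printf '%s' "$c" ;;
            *) printf '%%%02X' "'$c" ;;   # "'c" yields the byte value
        esac
    done
)
```

A link would then be built with something like `printf 'mailto:list@example.org?subject=%s&body=%s\n' "$(uri_encode "$subject")" "$(uri_encode "$(git format-patch -1 --stdout)")"`, where the address and $subject are placeholders; whether the receiving client leaves the decoded body alone is exactly the open question.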

I'm mainly trying to get to the bottom of this dependency issue you're
trying to solve.

>> In this case it would mean shipping a statically linked OpenSSL since
>> that's what these perl SSL packages eventually depend on.


Re: [GSoC] A late proposal: a modern send-email

2016-03-28 Thread Ævar Arnfjörð Bjarmason
On Sat, Mar 26, 2016 at 3:13 AM, 惠轶群  wrote:
> 2016-03-26 2:16 GMT+08:00 Junio C Hamano :
>> 惠轶群  writes:
>>
>>> # Purpose
>>> The current implementation of send-email is based on perl and has only
>>> a tui, it has two problems:
>>> - user must install a ton of dependencies before submit a single patch.
>>> - tui and parameter are both not quite friendly to new users.
>>
>> Is "a ton of dependencies" true?  "apt-cache show git-email"
>> suggests otherwise.  Is "a ton of dependencies" truly a problem?
>> "apt-get install" would resolve the dependencies for you.
>
> There are three perl packages needed to send patch through gmail:
> - perl-mime-tools
> - perl-net-smtp-ssl
> - perl-authen-sasl
>
> Yes, not too many, but is it better none of them?
>
> What's more, when I try to send mails, I was first disrupted by
> "no perl-mime-tools" then by "no perl-net-smtp-ssl or perl-authen-sasl".
> Then I think, why not just a mailto link?

I think your proposal should clarify a bit who these users are that
find it too difficult to install these perl module dependencies. Users
on OSX & Windows I would assume, because in the case of Linux distros
getting these is the equivalent of an apt-get command away.

If installing these dependencies is hard for users perhaps a better
thing to focus on is altering the binary builds on Git for platforms
that don't have package systems to include these dependencies.

In this case it would mean shipping a statically linked OpenSSL since
that's what these perl SSL packages eventually depend on.


[PATCH v3 2/3] githooks.txt: Amend dangerous advice about 'update' hook ACL

2016-04-25 Thread Ævar Arnfjörð Bjarmason
Any ACL you implement via an 'update' hook isn't actual access control
if the user has login access to the machine running git, because they
can trivially just build their own git version which doesn't run the
hook.

Change the documentation to take this dangerous edge case into account,
and remove the mention of the advice originating on the mailing list;
the users reading this don't care where the idea came from.

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
 Documentation/githooks.txt | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 6db515e..38bea7d 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -275,9 +275,13 @@ does not know the entire set of branches, so it would end up
 firing one e-mail per ref when used naively, though.  The
 <<post-receive,'post-receive'>> hook is more suited to that.
 
-Another use suggested on the mailing list is to use this hook to
-implement access control which is finer grained than the one
-based on filesystem group.
+Another use for this hook to implement access control which is finer
+grained than the one based on filesystem group. Note that if the user
+pushing has a normal login shell on the machine receiving the push
+implementing access control like this can be trivially bypassed by
+just not executing the hook. In those cases consider using
+e.g. linkgit:git-shell[1] as the login shell to restrict the user's
+access.
 
 Both standard output and standard error output are forwarded to
 'git send-pack' on the other end, so you can simply `echo` messages
-- 
2.1.3



[PATCH v3 1/3] githooks.txt: Improve the intro section

2016-04-25 Thread Ævar Arnfjörð Bjarmason
Change the documentation so that:

 * We don't talk about "little scripts". Hooks can be as big as you
   want, and don't have to be scripts, just call them "programs".

 * We note what happens with chdir() before a hook is called. Nothing
   documented this explicitly, but the current behavior is
   predictable. It helps a lot to know what directory these hooks will
   be executed from.

 * We don't make claims about the example hooks which may not be true
   depending on the configuration of 'init.templateDir'. Clarify that
   we're talking about the default settings of git-init in those cases,
   and move some of this documentation into git-init's documentation
   about the default templates.

 * We briefly note in the intro that hooks can get their arguments in
   various different ways, and that exactly how is described below for
   each hook.

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
 Documentation/git-init.txt |  6 +++++-
 Documentation/githooks.txt | 32 ++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-init.txt b/Documentation/git-init.txt
index 8174d27..cc3be7d 100644
--- a/Documentation/git-init.txt
+++ b/Documentation/git-init.txt
@@ -130,7 +130,11 @@ The template directory will be one of the following (in order):
  - the default template directory: `/usr/share/git-core/templates`.
 
 The default template directory includes some directory structure, suggested
-"exclude patterns" (see linkgit:gitignore[5]), and sample hook files (see linkgit:githooks[5]).
+"exclude patterns" (see linkgit:gitignore[5]), and example hook files.
+
+The example hooks are all disabled by default. To enable a hook,
+rename it by removing its `.sample` suffix. See linkgit:githooks[5]
+for more info on hook execution.
 
 EXAMPLES
 
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index a2f59b1..6db515e 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -13,18 +13,26 @@ $GIT_DIR/hooks/*
 DESCRIPTION
 ---
 
-Hooks are little scripts you can place in `$GIT_DIR/hooks`
-directory to trigger action at certain points.  When
-'git init' is run, a handful of example hooks are copied into the
-`hooks` directory of the new repository, but by default they are
-all disabled.  To enable a hook, rename it by removing its `.sample`
-suffix.
-
-NOTE: It is also a requirement for a given hook to be executable.
-However - in a freshly initialized repository - the `.sample` files are
-executable by default.
-
-This document describes the currently defined hooks.
+Hooks are programs you can place in the `$GIT_DIR/hooks` directory to
+trigger action at certain points. Hooks that don't have the executable
+bit set are ignored.
+
+When a hook is called in a non-bare repository the working directory
+is guaranteed to be the root of the working tree, in a bare repository
+the working directory will be the path to the repository. I.e. hooks
+don't need to worry about the user's current working directory.
+
+Hooks can get their arguments via the environment, command-line
+arguments, and stdin. See the documentation for each below hook for
+details.
+
+When 'git init' is run it may, depending on its configuration, copy
+hooks to the new repository, see the the "TEMPLATE DIRECTORY" section
+in linkgit:git-init[1] for details. When the rest of this document
+refers to "default hooks" we're talking about the default template
+shipped with Git.
+
+The currently supported hooks are described below.
 
 HOOKS
 -
-- 
2.1.3



[PATCH v3 0/3] Improvements to githooks.txt documentation

2016-04-25 Thread Ævar Arnfjörð Bjarmason
This includes minor grammar edits pointed out by Eric Sunshine + the
one v2 patch I sent out in response to comments by Jacob Keller.

I thought it was less confusing to just send out a whole v3 series
than ask Junio to piece together v1..v3 of various patches.

Ævar Arnfjörð Bjarmason (3):
  githooks.txt: Improve the intro section
  githooks.txt: Amend dangerous advice about 'update' hook ACL
  githooks.txt: Minor improvements to the grammar & phrasing

 Documentation/git-init.txt |  6 +++-
 Documentation/githooks.txt | 72 +++---
 2 files changed, 47 insertions(+), 31 deletions(-)

-- 
2.1.3


