Re: [PATCH] branch.c: simplify chain of if statements
On Mon, Mar 17, 2014 at 12:46 PM, Dragos Foianu dragos.foi...@gmail.com wrote:
> The reason I did not go with this is because I would still need the four ifs in order to keep the bug check part of the code. I might be able to find a work-around for it on the second attempt.
>
> I have seen N_() used in other code but I wasn't sure what its purpose was.

Aside from other comments here, more generally if you see code that looks odd it helps to see why it was introduced initially. In this case if you'd run e.g.:

    git log --reverse -p -G'Branch %s set up to track remote branch %s from %s by rebasing' -- branch.c

or otherwise searched for the first occurrence of that odd-looking code you'd have gotten:

    commit d53a3503
    Author: Nguyễn Thái Ngọc Duy pclo...@gmail.com
    Date:   Thu Jun 7 19:05:10 2012 +0700

        Remove i18n legos in notifying new branch tracking setup

        Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
        Signed-off-by: Junio C Hamano gits...@pobox.com

And searching for that commit has plenty of context for why that was done: https://www.google.com/search?q=%22Remove%20i18n%20legos%20in%20notifying%20new%20branch%20tracking%20setup%22

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
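The history search suggested in this message can be tried on a throwaway repository. What follows is only a minimal sketch, assuming a POSIX shell with git installed; the file contents, commit messages and search string are made up for illustration and are not taken from git.git.

```shell
#!/bin/sh
# Sketch: find the commit that first introduced a string with git log -G.
# Everything below (repository, file contents, messages) is hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

echo 'int x;' >branch.c
git add branch.c
git commit -q -m 'Initial version'

echo 'printf("Branch %s set up to track %s");' >>branch.c
git commit -q -am 'Add tracking message'

echo 'int y;' >>branch.c
git commit -q -am 'Unrelated change'

# -G lists commits whose diff adds or removes a line matching the regex;
# --reverse prints the oldest match first, i.e. the introducing commit.
first=$(git log --reverse --format=%s -G'set up to track' -- branch.c | head -n 1)
echo "$first"
```

Adding -p, as in the message, also shows the patch, which is usually what explains why the odd-looking code exists.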
Re: Git push race condition?
On Mon, Mar 24, 2014 at 8:18 PM, Scott Sandler scott.m.sand...@gmail.com wrote:
> I run a private Git repository (using Gitlab) with about 200 users doing about 100 pushes per day.

Ditto but about 2x those numbers.

> error: Ref refs/heads/master is at 4584c1f34e07cea2df6abc8e0d407fe016017130 but expected 61b79b6d35b066d054fb3deab550f1c51598cf5f
> remote: error: failed to lock refs/heads/master

I also see this error once in a while. I read the code a while back and it's basically because there are two levels of locks that receive-pack tries to get, and it's possible for two pushers to get the first lock due to a race condition. I've never seen data loss due to this though, because the inner lock is atomic.
Re: Borrowing objects from nearby repositories
On Wed, Mar 12, 2014 at 4:37 AM, Andrew Keller and...@kellerfarm.com wrote:
> Hi all,
>
> I am considering developing a new feature, and I'd like to poll the group for opinions.
>
> Background: A couple years ago, I wrote a set of scripts that speed up cloning of frequently used repositories. The scripts utilize a bare Git repository located at a known location, and automate providing a --reference parameter to `git clone` and `git submodule update`. Recently, some coworkers of mine expressed an interest in using the scripts, so I published the current version of my scripts, called `git repocache`, described at the bottom of https://github.com/andrewkeller/ak-git-tools.
>
> Slowly, it has occurred to me that this feature, or something similar to it, may be worth adding to Git, so I've been thinking about the best approach. Here's my best idea so far:
>
> 1) Introduce '--borrow' to `git-fetch`. This would behave similarly to '--reference', except that it operates on a temporary basis, and does not assume that the reference repository will exist after the operation completes, so any used objects are copied into the local objects database. In theory, this mechanism would be distinct from '--reference', so if both are used, some objects would be copied, and some objects would be accessible via a reference repository referenced by the alternates file.

Isn't this the same as git clone --reference path --no-hardlinks url ?

Also without --no-hardlinks we're not assuming that the other repo doesn't go away (you could rm-rf it), just that the files won't be *modified*, which Git won't do, but you could manually do with other tools, so the default is to hardlink.

> 2) Teach `git fetch` to read 'repocache.path' (or a better-named configuration), and use it to automatically activate borrowing.

So a default path for --reference path --no-hardlinks ?

> 3) For consistency, `git clone`, `git pull`, and `git submodule update` should probably all learn '--borrow', and forward it to `git fetch`.
>
> 4) In some scenarios, it may be necessary to temporarily not automatically borrow, so `git fetch`, and everything that calls it may need an argument to do that.
>
> Intended outcome: With 'repocache.path' set, and the cached repository properly updated, one could run `git clone url`, and the operation would complete much faster than it does now due to less load on the network.
>
> Things I haven't figured out yet:
>
> * What's the best approach to copying the needed objects? It's probably inefficient to copy individual objects out of pack files one at a time, but it could be wasteful to copy entire pack files just because you need one object. Hard-linking could help, but that won't always be available. One of my previous ideas was to add a '--auto-repack' option to `git-clone`, which solves this problem better, but introduces some other front-end usability problems.
>
> * To maintain optimal effectiveness, users would have to regularly run a fetch in the cache repository. Not all users know how to set up a scheduled task on their computer, so this might become a maintenance problem for the user. This kind of problem I think brings into question the viability of the underlying design here, assuming that the ultimate goal is to clone faster, with very little or no change in the use of git.
>
> Thoughts?
>
> Thanks,
> Andrew Keller
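To make the '--reference' mechanism under discussion concrete, here is a small sketch, assuming a POSIX shell with git installed. The repository names (upstream, cache.git, work) are hypothetical stand-ins for the remote, the local cache and the new clone; it only shows how borrowing is recorded, not the proposed '--borrow' behaviour.

```shell
#!/bin/sh
# Sketch: clone while borrowing objects from a local cache repository.
# "upstream", "cache.git" and "work" are made-up names for illustration.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

tmp=$(mktemp -d)
cd "$tmp"

# "upstream" stands in for the remote repository.
git init -q upstream
(cd upstream && echo hello >file && git add file && git commit -q -m initial)

# "cache.git" plays the role of the bare cache repository from the proposal.
git clone -q --bare upstream cache.git

# Clone upstream, but look up already-known objects in the cache.
git clone -q --reference cache.git upstream work

# The borrowing is recorded as a path in the alternates file.
alternates=$(cat work/.git/objects/info/alternates)
echo "$alternates"
```

As long as that alternates entry exists, the clone depends on the cache repository staying around. Git releases newer than this thread (2.3 and later, as far as I know) also have `git clone --dissociate`, which copies the borrowed objects and drops the dependency afterwards.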
Re: Get all tips quickly
On Sun, Apr 13, 2014 at 4:19 PM, Kirill Likhodedov kirill.likhode...@jetbrains.com wrote:
> Hi,
>
> What is the fastest possible way to get all “tips” (leaves of the Git log graph) in a Git repository with hashes of commits they point to?

Have you tried git for-each-ref and the various options it has? Doing this for 35k tags is still going to be non-trivial.
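A minimal sketch of the for-each-ref suggestion, assuming a POSIX shell with git installed; the branch and tag names are made up. Note that it lists every ref with the object it points to, which is not necessarily the same thing as only the leaves of the commit graph.

```shell
#!/bin/sh
# Sketch: list every ref together with the hash it points to.
# The repository, branch name and tag name are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q
echo one >file
git add file
git commit -q -m one
git branch topic
git tag v1.0            # lightweight tag, points straight at the commit

# One line per ref: "<hash> <refname>".
tips=$(git for-each-ref --format='%(objectname) %(refname)')
echo "$tips"

ref_count=$(echo "$tips" | wc -l | tr -d ' ')
head_hash=$(git rev-parse HEAD)
```

For annotated tags %(objectname) is the tag object rather than the commit; the for-each-ref documentation describes a `*` prefix for dereferencing.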
Re: general question about git
On Mon, Apr 21, 2014 at 3:17 PM, Miller, Hugh hughmil...@chevron.com wrote:
> I am interested in exploring the possibility of using versioning for data, that is versioning non-text, non-code file sets. Typical examples are the data files or project files used by some application. These file sets typically contain binary files; these files can be somewhat large, 1GB to 10GB is not unusual. Would git be a suitable tool for this purpose?
>
> Ideally, even if the data files can be versioned this way, one would probably prefer to build the versioning tools into the application. Would the git libraries be suitable for this further aim?

Stock Git is still unsuitable for this purpose, but I recommend you check out git-annex.
Re: Big Java repositories to play with?
On Wed, May 7, 2014 at 3:23 PM, Duy Nguyen pclo...@gmail.com wrote:
> I need some big Java repos (over 100k files) to test git status. Actually any repos with long path names and deep/wide directory structure are fine, not only Java ones. Right now I'm aware of gentoo-x86 and webkit. Let me know if you know some others. I'm afraid my Google-fu is not strong enough to search these repos.

1. Take a small repo with a small src directory
2. for i in {1..100}; do cp -Rvp src src-$i; done
3. git add src-*; git commit -mbigger

For some value of 100 you'll end up with a big repo to test git status on. You just need lots of files to stat(), git status doesn't care about history, so there's no reason why you need to track down an existing large repository.
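The three steps above can be sketched as a self-contained script. This is only an illustration, assuming a POSIX shell with git installed: it uses a portable while loop instead of the bash-only {1..100} expansion, and the tiny src directory and the copy count of 20 are arbitrary stand-ins.

```shell
#!/bin/sh
# Sketch: inflate a small source tree into many tracked files.
# The file names and the copy count are made up; raise "copies" as needed.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir src
echo 'class A {}' >src/A.java
echo 'class B {}' >src/B.java

copies=20
i=1
while [ "$i" -le "$copies" ]; do
    cp -Rp src "src-$i"
    i=$((i + 1))
done

git add .
git commit -q -m bigger

# 2 original files plus 20 copies of 2 files each.
count=$(git ls-files | wc -l | tr -d ' ')
echo "$count tracked files"
```

Running `time git status` in the result gives a rough idea of the stat() cost for that many files.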
[PATCH] git-add--interactive: Preserve diff heading when splitting hunks
Change the display of hunks in hunk splitting mode to preserve the diff heading, which hasn't been done ever since the hunk splitting was initially added in v1.4.4.2-270-g835b2ae.

Splitting the first hunk of this patch will now result in:

    Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
    Split into 2 hunks.
    @@ -792,7 +792,7 @@ sub hunk_splittable {
    [...]

Instead of:

    Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
    Split into 2 hunks.
    @@ -792,7 +792,7 @@
    [...]

This makes it easier to use the tool when you're splitting some giant hunk and can't remember in which function you are anymore.

The diff is somewhat larger than I initially expected because in order to display the headings in the same color scheme as the output from git-diff(1) itself I had to split up the code that would previously color diff output that previously consisted entirely of the fraginfo, but now consists of the fraginfo and the diff heading (the latter of which isn't colored).

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 git-add--interactive.perl | 40 ++++++++++++++++++++++++----------------
 1 file changed, 24 insertions(+), 16 deletions(-)

diff --git a/git-add--interactive.perl b/git-add--interactive.perl
index 1fadd69..ed1e564 100755
--- a/git-add--interactive.perl
+++ b/git-add--interactive.perl
@@ -792,11 +792,11 @@ sub hunk_splittable {

 sub parse_hunk_header {
 	my ($line) = @_;
-	my ($o_ofs, $o_cnt, $n_ofs, $n_cnt) =
-	    $line =~ /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@/;
+	my ($o_ofs, $o_cnt, $n_ofs, $n_cnt, $heading) =
+	    $line =~ /^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@(.*)/;
 	$o_cnt = 1 unless defined $o_cnt;
 	$n_cnt = 1 unless defined $n_cnt;
-	return ($o_ofs, $o_cnt, $n_ofs, $n_cnt);
+	return ($o_ofs, $o_cnt, $n_ofs, $n_cnt, $heading);
 }

 sub split_hunk {
@@ -808,8 +808,7 @@ sub split_hunk {
 	# If there are context lines in the middle of a hunk,
 	# it can be split, but we would need to take care of
 	# overlaps later.
-
-	my ($o_ofs, undef, $n_ofs) = parse_hunk_header($text->[0]);
+	my ($o_ofs, undef, $n_ofs, undef, $heading) = parse_hunk_header($text->[0]);
 	my $hunk_start = 1;

       OUTER:
@@ -886,17 +885,26 @@ sub split_hunk {
 		my $o_cnt = $hunk->{OCNT};
 		my $n_cnt = $hunk->{NCNT};

-		my $head = ("@@ -$o_ofs" .
-			    (($o_cnt != 1) ? ",$o_cnt" : '') .
-			    " +$n_ofs" .
-			    (($n_cnt != 1) ? ",$n_cnt" : '') .
-			    " @@\n");
-		my $display_head = $head;
-		unshift @{$hunk->{TEXT}}, $head;
-		if ($diff_use_color) {
-			$display_head = colored($fraginfo_color, $head);
-		}
-		unshift @{$hunk->{DISPLAY}}, $display_head;
+		my $fraginfo = join(
+			"",
+			"@@ -$o_ofs",
+			(($o_cnt != 1) ? ",$o_cnt" : ''),
+			" +$n_ofs",
+			(($n_cnt != 1) ? ",$n_cnt" : ''),
+			" @@"
+		);
+		unshift @{$hunk->{TEXT}}, join(
+			"",
+			$fraginfo,
+			$heading,
+			"\n"
+		);
+		unshift @{$hunk->{DISPLAY}}, join(
+			"",
+			$diff_use_color ? colored($fraginfo_color, $fraginfo) : $fraginfo,
+			$heading,
+			"\n"
+		);
 	}
 	return @split;
 }
--
2.0.0.rc0
Re: [PATCH] git-add--interactive: Preserve diff heading when splitting hunks
On Mon, May 12, 2014 at 8:39 PM, Jeff King p...@peff.net wrote:
> On Sun, May 11, 2014 at 04:09:56PM +, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the display of hunks in hunk splitting mode to preserve the diff heading, which hasn't been done ever since the hunk splitting was initially added in v1.4.4.2-270-g835b2ae.
>>
>> Splitting the first hunk of this patch will now result in:
>>
>>     Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
>>     Split into 2 hunks.
>>     @@ -792,7 +792,7 @@ sub hunk_splittable {
>>     [...]
>>
>> Instead of:
>>
>>     Stage this hunk [y,n,q,a,d,/,j,J,g,s,e,?]? s
>>     Split into 2 hunks.
>>     @@ -792,7 +792,7 @@
>>     [...]
>>
>> This makes it easier to use the tool when you're splitting some giant hunk and can't remember in which function you are anymore.
>
> This makes a lot of sense to me. I did notice two interesting quirks, one of which might be worth addressing.
>
> One, there is a slightly funny artifact in that the hunk header comes from the top of the context line, and that top is a different position for each of the split hunks. So in a file like:
>
>     header_A
>     content
>     header_B
>     one
>     two
>     three
>     four
>
> you might have a diff like:
>
>     @@ ... @@ header_A
>      header_B
>      one
>      two
>     +new line 1
>      three
>     +new line 2
>      four
>
> The hunk header for new line 1 is A, because B itself is part of the context. But the hunk header for new line 2, if it were an independent hunk, would be B. We print A because we copy it from the original hunk.
>
> It probably won't matter much in practice (and I can even see an argument that A is the right answer). And figuring out B here would be prohibitively difficult, I would think, as it would require applying the funcname rules internal to git-diff to a hunk that git-diff itself never actually sees. Since the output from your patch is strictly better than what we saw before, I think there is no reason we cannot leave such an improvement to later (or never).

Good suggestion, but tricky as you point out.

Another thing I've wanted many times is to make it smart enough that when you edit code like:

    A();
    B();

And change it to:

    X();
    Y();

The change from A->X and B->Y may be completely unrelated and just made in code where the author didn't add whitespace between unrelated statements. But because you changed all the lines the tool can't split them up. It could try harder and split hunks like that if you add a whitespace boundary, or just go all the way down to adding/removing individual lines, so you wouldn't have to fall down to edit mode and do so manually.

>> The diff is somewhat larger than I initially expected because in order to display the headings in the same color scheme as the output from git-diff(1) itself I had to split up the code that would previously color diff output that previously consisted entirely of the fraginfo, but now consists of the fraginfo and the diff heading (the latter of which isn't colored).
>
> The func heading is not colored by default, but you can configure it to be so with color.diff.func. I double-checked the behavior with your patch: you end up with the uncolored header in the split hunks, because it is parsed from the uncolored line. Which is not bad, but I think we can trivially do better, just by adding back in the color as we do with the fraginfo.
>
> Like:
>
> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index ed1e564..ac5763d 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -29,6 +29,10 @@ my ($fraginfo_color) =
>  	$diff_use_color ? (
>  		$repo->get_color('color.diff.frag', 'cyan'),
>  	) : ();
> +my ($funcname_color) =
> +	$diff_use_color ? (
> +		$repo->get_color('color.diff.func', ''),
> +	) : ();
>  my ($diff_plain_color) =
>  	$diff_use_color ? (
>  		$repo->get_color('color.diff.plain', ''),
> @@ -902,7 +906,7 @@ sub split_hunk {
>  		unshift @{$hunk->{DISPLAY}}, join(
>  			"",
>  			$diff_use_color ? colored($fraginfo_color, $fraginfo) : $fraginfo,
> -			$heading,
> +			$diff_use_color ? colored($funcname_color, $heading) : $heading,
>  			"\n"
>  		);
>  	}
>
> I didn't prepare a commit message because I think it should probably just be squashed in.

Well spotted, indeed, that should be squashed in.

On a related note I thought that by setting color.ui=auto I was turning on all the colors. It would be nice if there was a built-in colorscheme that added more coloring to items like these across our tools; it's useful to have the hunk headers colored differently so they stand out more.
Re: [ANNOUNCE] git related v0.3
On Mon, May 19, 2014 at 2:36 AM, Felipe Contreras felipe.contre...@gmail.com wrote:
> This tool finds people that might be interested in a patch, by going back through the history for each single hunk modified, and finding people that reviewed, acknowledged, signed, or authored the code the patch is modifying.
>
> It does this by running `git blame` incrementally on each hunk, and finding the relevant commit message. After gathering all the relevant people, it groups them to show what exactly was their role when they participated in the development of the relevant commit, and on how many relevant commits they participated. They are only displayed if they pass a minimum threshold of participation.
>
> It is similar to the `git contacts` tool in the contrib area, which is a rewrite of this tool, except that `git contacts` does the absolute minimum; `git related` is way superior in every way.

The general heuristic I use, which I've found to be much better than git-blame is:

1. Find substrings of code I'm directly removing/altering, and functions I'm removing/altering
2. Do git log --reverse -p -S'substr' (maybe with -- file) for a list of substrings

I've generally found that to be a better heuristic to start with in both git.git and non-git.git code. Blame tends to bias the view towards giving you people who've just moved the code around or made minor changes (are you at least using blame -w?).

We recently discussed having a tool like this at work to aid in our review process, but I pointed out there that you had to be careful with how it was written. E.g. if you rank importance as a function of the number of commits, you're now going to bother people more with review requests if they make granular commits, whereas what you actually want is to contact the significant authors, which generally speaking can be defined as the original authors of the code you're altering or replacing.
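The difference between the blame-based and the -S-based heuristic can be seen on a toy history. The sketch below assumes a POSIX shell with git installed; the authors "Alice" and "Bob", the file name and the code line are invented for the example.

```shell
#!/bin/sh
# Sketch: plain blame credits whoever last touched a line, while
# "git log --reverse -S" finds the commit that introduced the string.
# Authors, file name and contents are hypothetical.
set -e
export GIT_AUTHOR_EMAIL="author@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

# Alice writes the line.
echo 'call_helper();' >code.c
git add code.c
GIT_AUTHOR_NAME=Alice git commit -q -m 'Add helper call'

# Bob only re-indents it.
echo '    call_helper();' >code.c
GIT_AUTHOR_NAME=Bob git commit -q -am 'Re-indent'

# Plain blame points at the re-indenting commit.
blame_author=$(git blame --line-porcelain code.c | sed -n 's/^author //p')

# blame -w ignores whitespace-only changes when tracing the line.
blame_w_author=$(git blame -w --line-porcelain code.c | sed -n 's/^author //p')

# -S lists commits that change how often the string occurs; oldest first.
origin_author=$(git log --reverse --format=%an -S'call_helper' -- code.c | head -n 1)

echo "blame: $blame_author, blame -w: $blame_w_author, log -S: $origin_author"
```

Whitespace is the easy case; for changes that move or lightly edit code, blame -w no longer helps, which is where searching for the substring is the more robust starting point.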
[ANNOUNCE] A pre-receive hook to intelligently block binary data
After searching around a bit I couldn't find a stand-alone Git hook that would intelligently block binary data pushes so I wrote my own: https://github.com/avar/pre-receive-reject-binaries

Main features:

 * Quota per-commit for how much binary data is OK

 * Ability to optionally allow users to override binary pushes by including a notice in their commit messages

 * Doesn't disallow removing existing binary data, or renaming existing binary files

 * Will block commits that include references to existing binary blobs though

 * Spots cases where a push is pushing commits that add and then remove binary blobs (i.e. counts net additions)

 * Has hookable support for logging by piping its output to external commands when it runs or when it rejects/unblocks a binary push. I'm using this for logging its output to a logfile, and to send E-Mails when it blocks/is unblocked.

 * Only requires a stock perl install, should run on any *nix-like OS out of the box

 * Should be relatively fast compared to some other similar solutions I've seen, i.e. it parses the output of one git-log --stat command for the entire push, and doesn't e.g. do a git show for each commit being pushed.

One general note about git-log output: I was disappointed to see that there was no easily parsable git log output that showed you how much binary files increased in size. --numstat will just show "-" for binary files, and it's non-trivial to parse the --stat output. It's meant for human consumption and will sometimes include variations in how much whitespace is inserted.
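The --numstat behaviour mentioned in the closing note is easy to reproduce. The sketch below assumes a POSIX shell with git installed and uses made-up file names; the `git cat-file -s` call at the end is just one possible way to get at blob sizes, not what the hook itself does.

```shell
#!/bin/sh
# Sketch: --numstat prints "-" instead of line counts for binary files.
# File names and contents are hypothetical.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

repo=$(mktemp -d)
cd "$repo"
git init -q

printf 'text\n' >notes.txt
printf '\000\001\002' >blob.bin     # NUL bytes make git treat it as binary
git add .
git commit -q -m 'Add text and binary file'

# Fields are tab-separated: added, deleted, path.
binary_line=$(git log -1 --numstat --format=%h | grep 'blob.bin')
text_line=$(git log -1 --numstat --format=%h | grep 'notes.txt')
echo "$binary_line"
echo "$text_line"

# The blob size in bytes can be read per object instead.
size=$(git cat-file -s HEAD:blob.bin)
echo "blob.bin is $size bytes"
```

Comparing `git cat-file -s` for the old and new blob of each changed path would give the size delta, at the cost of one extra lookup per file.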
The gitweb author initials feature from a36817b doesn't work with i18n names
The @author_initials feature Jakub added in a36817b claims to use an i18n regexp (/\b([[:upper:]])\B/g), but in Perl this doesn't actually do anything unless the string being matched against has the UTF8 flag.

So as a result it abbreviates me to AB not ÆAB. Here's something that demonstrates the issue:

    $ cat author-initials.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    #binmode STDOUT, ':utf8';
    open my $fd, "-|", "git", "blame", "--incremental", "--", "Makefile" or die "Can't open: $!";
    #binmode $fd, ":utf8";
    while (my $line = <$fd>) {
        next unless my ($author) = $line =~ /^author (.*)/;
        my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
        printf "%s (%s)\n", join("", @author_initials), $author;
    }

With those two binmode commands commented out:

    $ perl author-initials.pl |sort|uniq -c|sort -nr|head -n 5
         99 JH (Junio C Hamano)
         35 JN (Jonathan Nieder)
         35 JK (Jeff King)
         20 JS (Johannes Schindelin)
         16 AB (Ævar Arnfjörð Bjarmason)

And uncommented:

    $ perl author-initials.pl |sort|uniq -c|sort -nr|head -n 5
         99 JH (Junio C Hamano)
         35 JN (Jonathan Nieder)
         35 JK (Jeff King)
         20 JS (Johannes Schindelin)
         16 ÆAB (Ævar Arnfjörð Bjarmason)

Jakub, do you see a reason not to just apply this:

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f429f75..29b3fb5 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6631,6 +6631,7 @@ sub git_blame_common {
 			$hash_base, '--', $file_name
 			or die_error(500, "Open git-blame --porcelain failed");
 	}
+	binmode $fd, ":utf8";

 	# incremental blame data returns early
 	if ($format eq 'data') {

I haven't gotten an env where I can test gitweb running, but that looks like it should work to me.
[PATCH] gitweb: Fix the author initials in blame for non-ASCII names
Change the @author_initials feature Jakub added in v1.6.4-rc2-14-ga36817b to match non-ASCII author initials as intended.

The regexp Jakub added was intended to match non-ASCII (/\b([[:upper:]])\B/g). But in Perl this doesn't actually match non-ASCII upper-case characters unless the string being matched against has the UTF8 flag.

So when we open a pipe to git blame we need to mark the file descriptor we're opening as utf8 explicitly.

So as a result it abbreviates me to AB not ÆAB, entirely because Æ isn't /[[:upper:]]/ unless the string being matched against has the UTF8 flag.

Here's something that demonstrates the issue:

    #!/usr/bin/env perl
    use strict;
    use warnings;

    binmode STDOUT, ':utf8' if $ENV{UTF8};
    open my $fd, "-|", "git", "blame", "--incremental", "--", "Makefile" or die "Can't open: $!";
    binmode $fd, ":utf8" if $ENV{UTF8};
    while (my $line = <$fd>) {
        next unless my ($author) = $line =~ /^author (.*)/;
        my @author_initials = ($author =~ /\b([[:upper:]])\B/g);
        printf "%s (%s)\n", join("", @author_initials), $author;
    }

When that's run with and without UTF8 being true in the environment it gives, on git.git:

    $ UTF8=0 perl author-initials.pl | sort | uniq -c | sort -nr | head -n 5
         99 JH (Junio C Hamano)
         35 JN (Jonathan Nieder)
         35 JK (Jeff King)
         20 JS (Johannes Schindelin)
         16 AB (Ævar Arnfjörð Bjarmason)
    $ UTF8=1 perl author-initials.pl | sort | uniq -c | sort -nr | head -n 5
         99 JH (Junio C Hamano)
         35 JN (Jonathan Nieder)
         35 JK (Jeff King)
         20 JS (Johannes Schindelin)
         16 ÆAB (Ævar Arnfjörð Bjarmason)

Acked-by: Jakub Narębski jna...@gmail.com
Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
Tested-by: Simon Ruderich si...@ruderich.org
---
 gitweb/gitweb.perl | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index f429f75..ad48a5a 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -6631,6 +6631,7 @@ sub git_blame_common {
 			$hash_base, '--', $file_name
 			or die_error(500, "Open git-blame --porcelain failed");
 	}
+	binmode $fh, ':utf8';

 	# incremental blame data returns early
 	if ($format eq 'data') {
--
1.8.4.rc2
Re: [PATCH] gitweb: Fix the author initials in blame for non-ASCII names
I did. I just clumsily sent out the wrong patch. I.e. tested it manually on another system, and then fat-fingered $fh instead of $fd.

Should I send another patch or do you want to just fix this one up?

On Fri, Aug 30, 2013 at 8:13 PM, Junio C Hamano gits...@pobox.com wrote:
> Junio C Hamano gits...@pobox.com writes:
>
>> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>>
>>> Acked-by: Jakub Narębski jna...@gmail.com
>>> Tested-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
>>> Tested-by: Simon Ruderich si...@ruderich.org
>>> ---
>>> +++ b/gitweb/gitweb.perl
>>> @@ -6631,6 +6631,7 @@ sub git_blame_common {
>>> ...
>>> +binmode $fh, ':utf8';
>>
>> [Fri Aug 30 17:48:17 2013] gitweb.perl: Global symbol "$fh" requires explicit package name at /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl line 6634.
>> [Fri Aug 30 17:48:17 2013] gitweb.perl: Execution of /home/gitster/w/buildfarm/next/t/../gitweb/gitweb.perl aborted due to compilation errors.
>
> I think in this function the filehandle is called $fd, not $fh.
>
> Has any of you really tested this???
Re: Existing utility to track compiled files in another sister repository, for rollouts
On Thu, Aug 23, 2012 at 6:28 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:
> I'm planning on using Git for a deployment process where the steps are basically:
>
> 1. You log into a deployment host, cd into software.git, do git pull
> 2. A tool runs make for you, creates a deployment-MMDD-HHMMSS tag
> 3. That make step will create a bunch of generated (text) files
> 4. Get a list of these with: git clean -dxfn
> 5. Copy those to software-generated.git, removing any that we didn't just create, adding any that are new
> 6. Commit that, tag it with generated-deployment-MMDD-HHMMSS
> 7. Push out both our generated software.git and software-generated.git tag to our servers
> 8. git reset --hard both of those to our newly pushed out tags
> 9. Do git clean -dxf on software.git to remove old generated files
> 10. Copy new generated files from software-generated.git to software.git
> 11. Restart our application to pick up the new software
>
> For this I'll need to write some git snapshot-commit tool for #5 and #6 to commit whatever the current state of the directory is (with removed/added files), and hack up something to do #9-#10.
>
> This should all be relatively easy, I was just wondering if there was any prior art on this that I could use instead of hacking it up myself.

Here's a quick hack that does #4-6 but not #9-10 yet, although that would be easy: https://gist.github.com/3440792

Suggestions for improvements welcome, particularly whether there's a simpler way to nuke the existing files in a repo and replace them with new files, all staged for commit:

    # Go to the target repository, nuke anything already there
    chdir $to_repository;
    system "git reset --hard";
    system "git clean -dxf";
    system "git ls-tree --name-only HEAD -z | xargs -0 rm -rf";
    system "git add --update"; # stage any removals

Followed by:

    system "tar xvf incoming.tar";
    system "rm incoming.tar";
    system "git add * .??* || :"; # Might die if we empty the repo, TODO: make this use status -> add each file
    system "git commit -m'Bump copy from $from_repository to $to_repository' || :"; # We might have nothing to change!
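One possible answer to the "simpler way" question, offered only as a sketch and not as what the linked gist does: remove everything tracked with `git rm`, copy the new snapshot in, and let `git add -A` stage additions and removals in one go. It assumes a POSIX shell with git installed; the directory names (target, incoming) and files are made up.

```shell
#!/bin/sh
# Sketch: replace the tracked contents of a repository with a new snapshot.
# "target" and "incoming" are hypothetical names for illustration.
set -e
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.com"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.com"

tmp=$(mktemp -d)
cd "$tmp"

# Existing state of the generated-files repository.
git init -q target
cd target
echo old >old.txt
echo keep >keep.txt
git add .
git commit -q -m 'Previous snapshot'

# The freshly generated files that should replace it.
mkdir ../incoming
echo keep >../incoming/keep.txt
echo new >../incoming/new.txt

# Drop every tracked file from the index and the working tree...
git rm -r -q .
# ...copy the new snapshot in (the trailing "/." includes dotfiles)...
cp -R ../incoming/. .
# ...and stage additions, modifications and removals in one step.
git add -A
git commit -q -m 'Bump copy from incoming to target'

files=$(git ls-files | tr '\n' ' ')
echo "$files"
```

If the new snapshot is identical to the old one, the final commit fails with nothing to commit, so a real tool would still want the `|| :` guard from the original hack.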
Re: [PATCH 0/6] Gettext poison rework
On Fri, Aug 24, 2012 at 7:43 AM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote:
> Still WIP but I'm getting closer. I dropped test-poisongen and started to use podebug [2] instead. Less code in git. podebug does not preserve shell variables yet. I'll follow that up at upstream [1].
>
> With this series, if you have translation toolkit installed, you could do
>
>     make pseudo-locale L=<your language code>
>     make GETTEXT_POISON=$LANG test
>
> podebug supports a few ways of rewriting translations. Currently unicode is used but you can change it via PODEBUG_OPTS
>
> t9001 is not happy with $LANG != C though. May need to add some prereq there.
>
> [1] http://bugs.locamotion.org/show_bug.cgi?id=2450
> [2] http://translate.sourceforge.net/wiki/toolkit/podebug

The reason I didn't do something like this to begin with is that gettext/glibc doesn't have support for fake locales, so you'd have to appropriate a real one for tests.

It's good to see you poking the gettext mailing list about adding support for that. But something like podebug gets around that quite nicely, so we can still have the testing the poison stuff was intended for, without the complexity of supporting it throughout all our i18n code.
Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?
Maybe this is documented in some place I didn't spot, but I expected that when I set GIT_AUTHOR_{NAME,EMAIL} it would affect the operation of git-tag, but it doesn't seem to. When I create tags it seems to completely ignore those variables. Should it be doing that?

Here's a test script demonstrating the issue:

    #!/bin/sh -e

    # Set defaults
    git config --global user.name "Ævar Arnfjörð Bjarmason"
    git config --global user.email ava...@gmail.com

    rm -rf /tmp/test-git
    git init /tmp/test-git
    cd /tmp/test-git

    make_commit() {
        file=$1
        content=$2
        echo "$content" >$file
        git add $file
        git commit -m"$file: $content" $file
        git --no-pager log -1 HEAD | grep ^Author
    }

    make_commit README "testing content"

    git config user.name "Test User"
    git config user.email t...@example.com

    make_commit README "testing content again"

    git tag -a -m"annotated tag" tag-name-1
    git --no-pager show tag-name-1 | grep ^Author

    GIT_AUTHOR_NAME="Tag Test User" GIT_AUTHOR_EMAIL=tagt...@example.com git tag -a -m"another annotated tag" tag-name-2
    git --no-pager show tag-name-2 | grep ^Author

Which outputs:

    $ sh /tmp/test-tag.sh
    Initialized empty Git repository in /tmp/test-git/.git/
    [master (root-commit) 9816756] README: testing content
     1 file changed, 1 insertion(+)
     create mode 100644 README
    Author: Ævar Arnfjörð Bjarmason ava...@gmail.com
    [master 304b71e] README: testing content again
     1 file changed, 1 insertion(+), 1 deletion(-)
    Author: Test User t...@example.com
    Author: Test User t...@example.com
    Author: Test User t...@example.com

I'd expect references to Tag Test User tagt...@example.com for the second tag I created.
Re: Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?
On Sat, Sep 1, 2012 at 5:57 PM, Andreas Schwab sch...@linux-m68k.org wrote:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>
>> git --no-pager show tag-name-1 | grep ^Author
>
> A tag doesn't have an author, it has a tagger. This shows the author of the *commit*.

I got the grep wrong, I meant that I expected the tagger to be set according to GIT_AUTHOR_{NAME,EMAIL}, but it isn't either:

    $ sh /tmp/test-tag.sh
    Initialized empty Git repository in /tmp/test-git/.git/
    [master (root-commit) f83fc11] README: testing content
     1 file changed, 1 insertion(+)
     create mode 100644 README
    Author: Ævar Arnfjörð Bjarmason ava...@gmail.com
    [master ef65731] README: testing content again
     1 file changed, 1 insertion(+), 1 deletion(-)
    Author: Test User t...@example.com
    Tagger: Test User t...@example.com
    Author: Test User t...@example.com
    Tagger: Test User t...@example.com
    Author: Test User t...@example.com

>> GIT_AUTHOR_NAME="Tag Test User" GIT_AUTHOR_EMAIL=tagt...@example.com git tag -a -m"another annotated tag" tag-name-2
>
> The tagger is controlled by the committer info.

I don't get what you mean, what committer info?
Re: Does or could git handle file licensing information?
On Wed, Sep 5, 2012 at 12:51 PM, Yohann Ferreira yohann.ferre...@orange.fr wrote:
> As a day-to-day hard git user ;), I also have to manage files with different licenses I need to track. As git handles all those files in a very smart way, I wondered whether git could also handle that information, at least somehow.

Say you have files like:

    main.c
    imglib.c
    config.c

Why not just have these files:

    license-info/GPL
    license-info/SOME-OTHER-LICENSE

Which would contain, respectively:

    main.c
    config.c

And:

    imglib.c

Then just have a script, maybe installed as a hook on your server that runs before it accepts a push, which ensures that all files currently in the tree are listed in those license-info/* files.

You could also just add a license header to each of these files, and have a script that ensures that everything has such a header. I think the Debian project has such a script that you could adapt.

Git just tracks files, so just do this in some file-based manner and you'll be fine.
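The check suggested above could be sketched roughly as follows. This is only an illustration: the function name check_license_coverage and the in-memory inputs are assumptions made for the example, and a real hook would have to fill them in itself (for instance from `git ls-tree -r --name-only` for the pushed revision and from the contents of the license-info/* files).

```python
def check_license_coverage(tracked_files, manifests, manifest_dir="license-info/"):
    """Compare the tracked files against the license manifests.

    tracked_files: iterable of paths in the tree.
    manifests: dict mapping a license name to the list of paths it covers.
    Returns (unlisted, stale): files with no license entry, and manifest
    entries that no longer exist in the tree.
    All names here are hypothetical, chosen for this sketch.
    """
    listed = set()
    for paths in manifests.values():
        listed.update(paths)

    # The manifest files themselves are not expected to be listed.
    tracked = {f for f in tracked_files if not f.startswith(manifest_dir)}

    unlisted = sorted(tracked - listed)
    stale = sorted(listed - tracked)
    return unlisted, stale


tracked = ["main.c", "imglib.c", "config.c", "util.c",
           "license-info/GPL", "license-info/SOME-OTHER-LICENSE"]
manifests = {
    "GPL": ["main.c", "config.c"],
    "SOME-OTHER-LICENSE": ["imglib.c"],
}
unlisted, stale = check_license_coverage(tracked, manifests)
print("unlisted:", unlisted)
print("stale:", stale)
```

A hook built on this would reject the push whenever either list is non-empty.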
Re: Should GIT_AUTHOR_{NAME,EMAIL} set the tagger name/email?
On Sat, Sep 1, 2012 at 6:12 PM, Andreas Schwab sch...@linux-m68k.org wrote:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>
>> I don't get what you mean, what committer info?
>
> GIT_COMMITTER_{NAME,EMAIL}. A tagger isn't really an author.

Ah. Am I the only one that finds that a bit counterintuitive, to the point of wanting to submit a patch to change it?

If you've created a tag you're the *author* of that tag. The author/committer distinction for commit objects is there for e.g. rebases and applying commits via e.g. git-am. We don't have a similar facility for tags (you have to push them around directly), but we *could*, and in that case having a Tag-Committer as well as a Tagger would make sense.

Junio, what do you think?
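To make the behaviour described above concrete, here is a deliberately simplified model of where the tagger identity comes from. It assumes only two sources, the GIT_COMMITTER_{NAME,EMAIL} environment variables and the user.name/user.email config, and ignores any other fallbacks git may have. The function name tagger_ident and the addresses are made up for the example.

```python
def tagger_ident(env, config):
    """Return the identity a new annotated tag would record.

    Simplified model (assumption): committer environment variables win,
    otherwise the user.* config is used. GIT_AUTHOR_* is never consulted,
    which is the behaviour reported in this thread.
    """
    name = env.get("GIT_COMMITTER_NAME") or config.get("user.name")
    email = env.get("GIT_COMMITTER_EMAIL") or config.get("user.email")
    return f"{name} <{email}>"


config = {"user.name": "Test User", "user.email": "test@example.com"}

# Setting only the author variables leaves the tagger untouched.
print(tagger_ident({"GIT_AUTHOR_NAME": "Tag Test User"}, config))

# Setting the committer variables changes it.
print(tagger_ident({"GIT_COMMITTER_NAME": "Tag Test User",
                    "GIT_COMMITTER_EMAIL": "tag@example.com"}, config))
```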
Re: [PATCH 1/5] Import wildmatch from rsync
On Wed, Sep 26, 2012 at 1:25 PM, Nguyễn Thái Ngọc Duy pclo...@gmail.com wrote: These files are from rsync.git commit f92f5b166e3019db42bc7fe1aa2f1a9178cd215d, which was the last commit before rsync turned GPL-3. All files are imported as-is and no-op. Adaptation is done in a separate patch. Perhaps Wayne Davison (added to CC) wouldn't mind giving us permission to use the subsequent changes to these files under the GPLv2? rsync.git $ git --no-pager log --pretty=%h %an %s\n --reverse f92f5b166e3019db42bc7fe1aa2f1a9178cd215d.. -- '*wild*' 4fd842f Wayne Davison Switching to GPL 3.\n 8e41b68 Wayne Davison Tweaking the license text a bit more.\n d3d07a5 Wayne Davison Include 2008 in the copyright years.\n adc2476 Wayne Davison Output numbers in 3-digit groups by default (e.g. 1,234,567). Also improved the human-readable output functions, including adding the ability to output negative numbers.\n b3bf9b9 Wayne Davison Update the copyright year.\n fd91c3b Wayne Davison Fix two unused-variable compiler warnings.\n rsync.git - git.git lib/wildmatch.[ch] wildmatch.[ch] wildtest.c test-wildmatch.c wildtest.txtt/t3070/wildtest.txt Signed-off-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com Signed-off-by: Junio C Hamano gits...@pobox.com --- t/t3070/wildtest.txt | 165 +++ test-wildmatch.c | 222 +++ wildmatch.c | 368 +++ wildmatch.h | 6 + 4 files changed, 761 insertions(+) create mode 100644 t/t3070/wildtest.txt create mode 100644 test-wildmatch.c create mode 100644 wildmatch.c create mode 100644 wildmatch.h diff --git a/t/t3070/wildtest.txt b/t/t3070/wildtest.txt new file mode 100644 index 000..42c1678 --- /dev/null +++ b/t/t3070/wildtest.txt @@ -0,0 +1,165 @@ +# Input is in the following format (all items white-space separated): +# +# The first two items are 1 or 0 indicating if the wildmat call is expected to +# succeed and if fnmatch works the same way as wildmat, respectively. After +# that is a text string for the match, and a pattern string. 
Strings can be +# quoted (if desired) in either double or single quotes, as well as backticks. +# +# MATCH FNMATCH_SAME text to match 'pattern to use' + +# Basic wildmat features +1 1 foofoo +0 1 foobar +1 1 '' +1 1 foo??? +0 1 foo?? +1 1 foo* +1 1 foof* +0 1 foo*f +1 1 foo*foo* +1 1 foobar *ob*a*r* +1 1 aaabababab *ab +1 1 foo* foo\* +0 1 foobar foo\*bar +1 1 f\oo f\\oo +1 1 ball *[al]? +0 1 ten[ten] +1 1 ten**[!te] +0 1 ten**[!ten] +1 1 tent[a-g]n +0 1 tent[!a-g]n +1 1 tont[!a-g]n +1 1 tont[^a-g]n +1 1 a]ba[]]b +1 1 a-ba[]-]b +1 1 a]ba[]-]b +0 1 aaba[]-]b +1 1 aaba[]a-]b +1 1 ] ] + +# Extended slash-matching features +0 1 foo/baz/barfoo*bar +1 1 foo/baz/barfoo**bar +0 1 foo/barfoo?bar +0 1 foo/barfoo[/]bar +0 1 foo/barf[^eiu][^eiu][^eiu][^eiu][^eiu]r +1 1 foo-barf[^eiu][^eiu][^eiu][^eiu][^eiu]r +0 1 foo**/foo +1 1 /foo **/foo +1 1 bar/baz/foo**/foo +0 1 bar/baz/foo*/foo +0 0 foo/bar/baz**/bar* +1 1 deep/foo/bar/baz **/bar/* +0 1 deep/foo/bar/baz/ **/bar/* +1 1 deep/foo/bar/baz/ **/bar/** +0 1 deep/foo/bar **/bar/* +1 1 deep/foo/bar/ **/bar/** +1 1 foo/bar/baz**/bar** +1 1 foo/bar/baz/x */bar/** +0 0 deep/foo/bar/baz/x */bar/** +1 1 deep/foo/bar/baz/x **/bar/*/* + +# Various additional tests +0 1 acrt a[c-c]st +1 1 acrt a[c-c]rt +0 1 ] [!]-] +1 1 a [!]-] +0 1 '' \ +0 1 \ \ +0 1 /\ */\ +1 1 /\ */\\ +1 1 foofoo +1 1 @foo @foo +0 1 foo@foo +1 1 [ab] \[ab] +1 1 [ab] [[]ab] +1 1 [ab] [[:]ab] +0 1 [ab] [[::]ab] +1 1 [ab] [[:digit]ab] +1 1 [ab] [\[:]ab] +1 1 ?a?b \??\?b +1 1 abc\a\b\c +0 1 foo'' +1 1 foo/bar/baz/to **/t[o] + +# Character class tests +1 1 a1B
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote: On Wed, Oct 03, 2012 at 02:36:00PM +0200, Ævar Arnfjörð Bjarmason wrote: I'm creating a system where a lot of remotes constantly fetch from a central repository for deployment purposes, but I've noticed that even with a remote.$name.fetch configuration to only get certain refs a git fetch will still call git-upload pack which will provide a list of all references. This is being done against a repository with tens of thousands of refs (it has a tag for each deployment), so it ends up burning a lot of CPU time on the uploader/receiver side. Where is the CPU being burned? Are your refs packed (that's a huge savings)? What are the refs like? Are they .have refs from an alternates repository, or real refs? Are they pointing to commits or tag objects? What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including: - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2). - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates) a0de288 (fetch-pack: avoid quadratic loop in filter_refs) Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete() - 90108a2 (upload-pack: avoid parsing tag destinations) 926f1dd (upload-pack: avoid parsing objects during ref advertisement) Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them. Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs. 
I can't provide all the details now (not with access to that machine now), but briefly: * The git client/server version is 1.7.8 * The repository has around 50k refs, they're real refs, almost all of them (say all but 0.5k-1k) are annotated tags, the rest are branches. * 99% of them are packed, there's a weekly cronjob that packs them all up, there were a few newly pushed branches and tags outside of the * I tried echo -n | git upload-pack repo on both that 50k repository and a repository with 100 refs, the former took around ~1-2s to run on a 24 core box and the latter ~500ms. * When I ran git-upload-pack with GNU parallel I managed around 20/s packs on the 24 core box on the 50k ref one, 40/s on the 100 ref one. * A co-worker who was working on this today tried it on 1.7.12 and claimed that it had the same performance characteristics. * I tried to profile it under gcc -pg echo -n | ./git-upload-pack repo but it doesn't produce a profile like that, presumably because the process exits unsuccessfully. Maybe someone here knows offhand what mock data I could feed git-upload-pack to make it happy to just list the refs, or better yet do a bit more work which it would do if it were actually doing the fetch (I suppose I could just do a fetch, but I wanted to do this from a locally compiled checkout). Has there been any work on extending the protocol so that the client tells the server what refs it's interested in? I don't think so. It would be hard to do in a backwards-compatible way, because the advertisement is the first thing the server says, before it has negotiated any capabilities with the client at all. I suppose at least for the ssh protocol we could just do: ssh server (git upload-pack repo --refs=* || git upload-pack repo) And something similar with HTTP headers, but that of course leaves the git:// protocol. 
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 11:20 PM, Jeff King p...@peff.net wrote: Thanks for all that info, it's really useful. * A co-worker who was working on this today tried it on 1.7.12 and claimed that it had the same performance characteristics. That's surprising to me. Can you try to verify those numbers? I think he was wrong, I tested this on git.git by first creating a lot of tags: parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD) Then doing: git pack-refs --all git repack -A -d And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack on 1.7.8 and 2.59/s on the master branch. * I tried to profile it under gcc -pg echo -n | ./git-upload-pack repo but it doesn't produce a profile like that, presumably because the process exits unsuccessfully. If it's a recent version of Linux, you'll get much nicer results with perf. Here's what my 400K-ref case looks like: $ time echo | perf record git-upload-pack . /dev/null real0m0.808s user0m0.660s sys 0m0.136s $ perf report | grep -v ^# | head 11.40% git-upload-pack libc-2.13.so[.] vfprintf 9.70% git-upload-pack git-upload-pack [.] find_pack_entry_one 7.64% git-upload-pack git-upload-pack [.] check_refname_format 6.81% git-upload-pack libc-2.13.so[.] __memcmp_sse4_1 5.79% git-upload-pack libc-2.13.so[.] getenv 4.20% git-upload-pack libc-2.13.so[.] __strlen_sse42 3.72% git-upload-pack git-upload-pack [.] ref_entry_cmp_sslice 3.15% git-upload-pack git-upload-pack [.] read_packed_refs 2.65% git-upload-pack git-upload-pack [.] sha1_to_hex 2.44% git-upload-pack libc-2.13.so[.] _IO_default_xsputn FWIW here are my results on the above pathological git.git $ uname -r; perf --version; echo | perf record ./git-upload-pack ./dev/null; perf report | grep -v ^# | head 3.2.0-2-amd64 perf version 3.2.17 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ] 29.08% git-upload-pack libz.so.1.2.7 [.] inflate 17.99% git-upload-pack libz.so.1.2.7 [.] 
0xaec1 6.21% git-upload-pack libc-2.13.so[.] 0x117503 5.69% git-upload-pack libcrypto.so.1.0.0 [.] 0x82c3d 4.87% git-upload-pack git-upload-pack [.] find_pack_entry_one 3.18% git-upload-pack ld-2.13.so [.] 0x886e 2.96% git-upload-pack libc-2.13.so[.] vfprintf 2.83% git-upload-pack git-upload-pack [.] search_for_subdir 1.56% git-upload-pack [kernel.kallsyms] [k] do_raw_spin_lock 1.36% git-upload-pack libc-2.13.so[.] vsnprintf I wonder why your report doesn't note any time in libz. This is on Debian testing, maybe your OS uses different strip settings so it doesn't show up? $ ldd -r ./git-upload-pack linux-vdso.so.1 = (0x7fff621ff000) libz.so.1 = /lib/x86_64-linux-gnu/libz.so.1 (0x7f768feee000) libcrypto.so.1.0.0 = /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x7f768fb0a000) libpthread.so.0 = /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f768f8ed000) libc.so.6 = /lib/x86_64-linux-gnu/libc.so.6 (0x7f768f566000) libdl.so.2 = /lib/x86_64-linux-gnu/libdl.so.2 (0x7f768f362000) /lib64/ld-linux-x86-64.so.2 (0x7f7690117000 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote: What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including: - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2). - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates) a0de288 (fetch-pack: avoid quadratic loop in filter_refs) Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete() - 90108a2 (upload-pack: avoid parsing tag destinations) 926f1dd (upload-pack: avoid parsing objects during ref advertisement) Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them. Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs. FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at which it went from 1.5/s to 2.5/s upload-pack runs on the pathological git.git repository was none of those, but: ccdc6037fe - parse_object: try internal cache before reading object db -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: upload-pack is slow with lots of refs
On Thu, Oct 4, 2012 at 1:21 AM, Jeff King p...@peff.net wrote: On Thu, Oct 04, 2012 at 12:32:35AM +0200, Ævar Arnfjörð Bjarmason wrote: On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote: What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including: - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2). - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates) a0de288 (fetch-pack: avoid quadratic loop in filter_refs) Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete() - 90108a2 (upload-pack: avoid parsing tag destinations) 926f1dd (upload-pack: avoid parsing objects during ref advertisement) Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them. Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs. FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at which it went from 1.5/s to 2.5/s upload-pack runs on the pathological git.git repository was none of those, but: ccdc6037fe - parse_object: try internal cache before reading object db Ah, yeah, I forgot about that one. That implies that you have a lot of refs pointing to the same objects (since the benefit of that commit is to avoid reading from disk when we have already seen it). Out of curiosity, what does your repo contain? I saw a lot of speedup with that commit because my repos are big object stores, where we have the same duplicated tag refs for every fork of the repo. Things are much faster with your monkeypatch, got up to around 10 runs/s. The repository mainly contains a lot of git-deploy[1] generated tags which are added for every rollout to several subsystems. 
Of the ~50k references in the repo 75% point to a commit that no other reference points to. Around 98% of the references are annotated tags, the rest are branches. 1. https://github.com/git-deploy/git-deploy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: upload-pack is slow with lots of refs
On Thu, Oct 4, 2012 at 1:15 AM, Jeff King p...@peff.net wrote: On Thu, Oct 04, 2012 at 12:15:47AM +0200, Ævar Arnfjörð Bjarmason wrote: I think he was wrong, I tested this on git.git by first creating a lot of tags: parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD) Then doing: git pack-refs --all git repack -A -d And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack on 1.7.8 and 2.59/s on the master branch. Thanks for the update, that's more like what I expected. FWIW here are my results on the above pathological git.git $ uname -r; perf --version; echo | perf record ./git-upload-pack ./dev/null; perf report | grep -v ^# | head 3.2.0-2-amd64 perf version 3.2.17 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ] 29.08% git-upload-pack libz.so.1.2.7 [.] inflate 17.99% git-upload-pack libz.so.1.2.7 [.] 0xaec1 6.21% git-upload-pack libc-2.13.so[.] 0x117503 5.69% git-upload-pack libcrypto.so.1.0.0 [.] 0x82c3d 4.87% git-upload-pack git-upload-pack [.] find_pack_entry_one 3.18% git-upload-pack ld-2.13.so [.] 0x886e 2.96% git-upload-pack libc-2.13.so[.] vfprintf 2.83% git-upload-pack git-upload-pack [.] search_for_subdir 1.56% git-upload-pack [kernel.kallsyms] [k] do_raw_spin_lock 1.36% git-upload-pack libc-2.13.so[.] vsnprintf I wonder why your report doesn't note any time in libz. This is on Debian testing, maybe your OS uses different strip settings so it doesn't show up? Mine was on Debian unstable. The difference is probably that I have 400K refs, but only 12K unique ones (this is the master alternates repo containing every ref from every fork of rails/rails on GitHub). So I spend proportionally more time fiddling with refs and outputting than I do actually inflating tag objects. 
An updated profile with your patch: $ uname -r; perf --version; echo | perf record ./git-upload-pack ./dev/null; perf report | grep -v ^# | head 3.2.0-2-amd64 perf version 3.2.17 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.015 MB perf.data (~662 samples) ] 14.45% git-upload-pack libc-2.13.so[.] 0x78140 12.13% git-upload-pack [kernel.kallsyms] [k] walk_component 11.01% git-upload-pack libc-2.13.so[.] _IO_getline_info 10.74% git-upload-pack git-upload-pack [.] find_pack_entry_one 8.96% git-upload-pack [kernel.kallsyms] [k] __mmdrop 8.64% git-upload-pack git-upload-pack [.] sha1_to_hex 6.73% git-upload-pack libc-2.13.so[.] vfprintf 4.07% git-upload-pack libc-2.13.so[.] strchrnul 4.00% git-upload-pack libc-2.13.so[.] getenv 3.37% git-upload-pack git-upload-pack [.] packet_write Hmm. It seems like we should not need to open the tags at all. The main reason is to produce the peeled advertisement just after it. But for a packed ref with a modern version of git that supports the peeled extension, we should already have that information. B.t.w. do you plan to submit this as a non-hack, I'd like to have it in git.git, so if you're not going to I could pick it up and clean it up a bit. But I think it would be better coming from you. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
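For readers unfamiliar with the "peeled" information mentioned above: in a packed-refs file, a line starting with `^` directly after a tag ref records the object that tag ultimately points to. The sketch below reads that layout to show why the tag object itself need not be opened. The parser is an illustration rather than git's own code, and the object names in the sample are fake placeholders.

```python
def parse_packed_refs(text):
    """Return {refname: (sha, peeled_sha_or_None)} from packed-refs content."""
    refs = {}
    last = None
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # header such as "# pack-refs with: peeled", or blank
        if line.startswith("^"):
            # Peeled value for the ref on the previous line.
            if last is not None:
                refs[last] = (refs[last][0], line[1:])
            continue
        sha, name = line.split(" ", 1)
        refs[name] = (sha, None)
        last = name
    return refs


sample = "\n".join([
    "# pack-refs with: peeled",
    "a" * 40 + " refs/heads/master",
    "b" * 40 + " refs/tags/deploy-1",
    "^" + "c" * 40,
])
for name, (sha, peeled) in parse_packed_refs(sample).items():
    print(name, sha[:7], peeled[:7] if peeled else "-")
```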
Re: [PATCH 0/4] optimizing upload-pack ref peeling
On Thu, Oct 4, 2012 at 10:04 AM, Jeff King p...@peff.net wrote:
> On Thu, Oct 04, 2012 at 03:56:09AM -0400, Jeff King wrote:
>
>> [1/4]: peel_ref: use faster deref_tag_noverify
>> [2/4]: peel_ref: do not return a null sha1
>> [3/4]: peel_ref: check object type before loading
>> [4/4]: upload-pack: use peel_ref for ref advertisements
>
> I included my own timings in the final one, but my pathological case at the end is a somewhat made-up attempt to emulate what you described. Can you double-check that this series still has a nice impact on your real-world repository?

It does. Here's the best of five for each, all compiled with -g -O3:

v1.7.8:

    $ time (echo | ~/g/git/git-upload-pack . | pv >/dev/null)
    3.49MB 0:00:00 [ 5.3MB/s] [ = ]

    real    0m0.660s
    user    0m0.604s
    sys     0m0.248s

master without your patches:

    $ time (echo | ~/g/git/git-upload-pack . | pv >/dev/null)
    3.49MB 0:00:00 [10.2MB/s] [ = ]

    real    0m0.344s
    user    0m0.300s
    sys     0m0.172s

master with your patches:

    $ time (echo | ~/g/git/git-upload-pack . | pv >/dev/null)
    3.49MB 0:00:00 [31.8MB/s] [ = ]

    real    0m0.113s
    user    0m0.088s
    sys     0m0.088s
Is anyone working on a next-gen Git protocol?
On Wed, Oct 3, 2012 at 9:13 PM, Junio C Hamano gits...@pobox.com wrote:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>
>> I'm creating a system where a lot of remotes constantly fetch from a central repository for deployment purposes, but I've noticed that even with a remote.$name.fetch configuration to only get certain refs a git fetch will still call git-upload-pack which will provide a list of all references.
>
> It has been observed that the sender has to advertise megabytes of refs because it has to speak first before knowing what the receiver wants, even when the receiver is interested in getting updates from only one of them, or worse yet, when the receiver is only trying to peek whether the ref it is interested in has been updated.

Has anyone started working on a next-gen Git protocol as a result of this discussion? If not I thought I'd give it a shot if/when I have time.

The current protocol is basically (S = Server, C = Client):

    S: Spew out first ref
    S: Advertisement of capabilities
    S: Dump of all our refs
    C/S: Declare wanted refs, negotiate with server
    S: Send pack to client, if needed

And I thought I'd basically turn it into:

    C: Connect to server, declare what protocol we understand
    C: Advertisement of capabilities
    S: Advertisement of capabilities
    C/S: Negotiate what we want
    C/S: Same as v1, without the advertisement of capabilities, and maybe don't dump refs at all

Basically future-proofing it by having the client say what it supports to begin with, along with what it can handle (like in HTTP). Then in the negotiation phase the client and server would go back and forth about what they want and how they want it.

I'd planned to implement something like:

    C: want_refs refs/heads/*
    S: OK to that
    C: want_refs refs/tags/*
    S: OK to that

Or:

    C: want_refs refs/heads/master
    S: OK to that
    C: want_refs refs/tags/v*
    S: OK to that

As a proof of concept (and also something that'll solve the issue I had). By adding an initial negotiation phase, the protocol should be open to any future extensions without making assumptions about the client wanting to know about all of the server's refs, unlike the current protocol.
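The server side of the want_refs idea could look something like the sketch below. Everything in it is hypothetical: want_refs is only a proposal in this thread, the function name advertise is invented, and Python's fnmatch is used as a stand-in for whatever pattern matching a real implementation would choose (note that its `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def advertise(all_refs, want_patterns):
    """Return only the refs matching at least one client-supplied pattern.

    all_refs: dict mapping refname to object name.
    want_patterns: patterns sent by the client in the proposed
    negotiation phase (hypothetical protocol extension).
    """
    return {
        name: sha
        for name, sha in all_refs.items()
        if any(fnmatchcase(name, pattern) for pattern in want_patterns)
    }


# Placeholder object names, not real commits.
server_refs = {
    "refs/heads/master": "1111111",
    "refs/heads/topic": "2222222",
    "refs/tags/v1.0": "3333333",
    "refs/tags/deploy-20121003": "4444444",
}
for name in sorted(advertise(server_refs, ["refs/heads/master", "refs/tags/v*"])):
    print(name)
```

With tens of thousands of deployment tags, the client in this example would be sent two refs instead of the whole list.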
[minor] two tests broken when run with a --root directory that's a symlink
These issues are minor. I noticed them because I test with /dev/shm/git as the --root, which on Debian is symlinked to /run/..

    $ rm -rf /tmp/{foo,bar}
    $ mkdir /tmp/target; ln -s /tmp/target /tmp/link
    $ prove ./t4035-diff-quiet.sh ./t9903-bash-prompt.sh :: --root=/tmp/target
    ./t4035-diff-quiet.sh ... ok
    ./t9903-bash-prompt.sh .. ok
    All tests successful.
    Files=2, Tests=64, 1 wallclock secs ( 0.04 usr 0.00 sys + 0.07 cusr 0.06 csys = 0.17 CPU)
    Result: PASS
    $ prove ./t4035-diff-quiet.sh ./t9903-bash-prompt.sh :: --root=/tmp/link
    ./t4035-diff-quiet.sh ... Dubious, test returned 1 (wstat 256, 0x100)
    Failed 3/20 subtests
    ./t9903-bash-prompt.sh .. Dubious, test returned 1 (wstat 256, 0x100)
    Failed 6/44 subtests

Everything else in the test suite passes with a --root that's a symlink.
Re: What's cooking in git.git (Oct 2012, #04; Thu, 11)
On Fri, Oct 12, 2012 at 1:12 AM, Junio C Hamano gits...@pobox.com wrote:
> * jk/peel-ref (2012-10-04) 4 commits
>   (merged to 'next' on 2012-10-08 at 4adfa2f)
>  + upload-pack: use peel_ref for ref advertisements
>  + peel_ref: check object type before loading
>  + peel_ref: do not return a null sha1
>  + peel_ref: use faster deref_tag_noverify
>
>  Speeds up git upload-pack (what is invoked by git fetch on the other side of the connection) by reducing the cost to advertise the branches and tags that are available in the repository.

FWIW I have this deployed at work for a userbase of a few hundred users, none of whom have had any issues with it. It does speed things up a lot, though.
Re: push race
On Mon, Oct 15, 2012 at 11:14 AM, Angelo Borsotti angelo.borso...@gmail.com wrote:
> Hello,

FWIW we have a lot of lemmings pushing to the same ref all the time at $work, and while I've seen cases where:

 1. Two clients try to push
 2. They both get the initial lock
 3. One of them fails to get the secondary lock (I think updating the ref)

I've never seen cases where they clobber each other in #3 (and I would have known from "dude, where's my commit that I just pushed" reports).

So while we could fix git to make sure there's no race condition such that two clients never get the #2 lock, I haven't seen it cause actual data issues because of two clients getting the #3 lock.

It might still happen in some cases. I recommend testing it with e.g. lots of pushes in parallel with GNU Parallel.
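The reason the losing pusher fails rather than clobbering the winner can be modelled as a compare-and-swap on the ref. The toy below is not git's implementation: the RefStore class is invented for the example and a threading.Lock stands in for the inner ref lock. It only shows why an update based on a stale old value is rejected with the "is at ... but expected ..." style of error quoted earlier in this archive.

```python
import threading


class RefStore:
    """Toy model of ref updates guarded by an atomic inner lock."""

    def __init__(self, refs):
        self._refs = dict(refs)
        self._lock = threading.Lock()  # stands in for the per-ref lock

    def get(self, name):
        return self._refs.get(name)

    def update(self, name, expected_old, new):
        """Move `name` to `new` only if it still points at `expected_old`."""
        with self._lock:
            current = self._refs.get(name)
            if current != expected_old:
                return False, (f"error: Ref {name} is at {current} "
                               f"but expected {expected_old}")
            self._refs[name] = new
            return True, None


# Two pushers both saw the ref at 61b79b6 (abbreviated, illustrative ids).
store = RefStore({"refs/heads/master": "61b79b6"})
print(store.update("refs/heads/master", "61b79b6", "4584c1f"))
print(store.update("refs/heads/master", "61b79b6", "deadbee"))
print(store.get("refs/heads/master"))
```

The second pusher has to fetch, integrate and push again. The first pusher's commit is never lost.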
Re: When Will We See Collisions for SHA-1? (An interesting analysis by Bruce Schneier)
On Mon, Oct 15, 2012 at 6:42 PM, Elia Pinto gitter.spi...@gmail.com wrote:
> Very clear analysis. Well written. Perhaps is it the time to update http://git-scm.com/book/ch6-1.html (A SHORT NOTE ABOUT SHA-1) ?
>
> Hope useful
>
> http://www.schneier.com/crypto-gram-1210.html

This would be concerning if the Git security model would break down if someone found a SHA1 collision, but it really wouldn't. It's one thing to find *a* collision, it's quite another to:

 1. Find a collision for the sha1 of harmless.c which I know you use, and replace it with evil.c.

 2. Somehow make evil.c compile so that it actually does something useful and nefarious, and doesn't just make the C compiler puke. If finding one arbitrary collision costs $43K in 2021 dollars, getting past this point is going to take quite a large multiple of $43K.

 3. Somehow inject the new evil object into your repository, or convince you to re-clone it / clone it from somewhere you usually wouldn't.

At some point in the early days of Git Linus went on a rant to this effect either on this list or on the LKML. Maybe it would be useful to include some of that instead?

It would be very interesting to see an analysis that deals with some actual Git-related security scenarios, instead of something that just assumes that if someone finds *any* SHA1 collision the sky is going to fall.
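As background for step 1: a blob's object name is the SHA-1 of a short header plus the file contents, so swapping harmless.c for evil.c under the same name means hitting one specific, already fixed hash rather than finding any two colliding inputs. A small sketch of the computation follows. The file contents are invented, and the only value checked is the well-known id of the empty blob.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Compute the object name git assigns to a blob with this content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known id.
print(blob_id(b""))

harmless = b"int main(void) { return 0; }\n"
evil = b"int main(void) { return 1; }\n"
# An attacker would need evil.c to hash to exactly harmless.c's id.
print(blob_id(harmless) == blob_id(evil))
```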
Re: diff support for the Eiffel language?
On Mon, Oct 22, 2012 at 1:58 PM, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
> However there's one little thing I noticed with git diff: The context lines (starting with @@) show the current function (in Perl and C), but they show the current feature clause in Eiffel (as opposed to the expected current feature). I wonder how hard it is to fix it (Observed in git 1.7.7 of openSUSE 12.1).

See git.git's e90d065 for an example of adding a new diff pattern. You could easily come up with a patch and send it to the list. However, it would probably be good to CC some Eiffel language list in case there are some syntax oddities you've missed.
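To illustrate what a diff pattern changes: the text after @@ is taken from the nearest earlier line that matches a "function name" pattern. The sketch below mimics that lookup with two regexes. Both are assumptions for this example: DEFAULT_PATTERN only approximates the built-in fallback (a line starting at column 0 with a letter, `_` or `$`), and EIFFEL_FEATURE is a hypothetical, overly simple pattern that a real patch would need to refine with help from Eiffel users.

```python
import re

# Approximation of the fallback heuristic (assumption).
DEFAULT_PATTERN = re.compile(r"^[A-Za-z_$]")
# Hypothetical pattern: a feature name indented by exactly one tab.
EIFFEL_FEATURE = re.compile(r"^\t[a-z][a-z0-9_]*")


def hunk_header(lines, first_changed, pattern):
    """Return the nearest line above `first_changed` that matches `pattern`."""
    for i in range(first_changed - 1, -1, -1):
        if pattern.search(lines[i]):
            return lines[i].strip()
    return ""


source = [
    "class ACCOUNT",
    "feature -- Access",
    "\tbalance: INTEGER",
    "\tdeposit (sum: INTEGER)",
    "\t\tdo",
    "\t\t\tbalance := balance + sum",
    "\t\tend",
    "end",
]
changed = 5  # index of the line inside `deposit` that a diff touches
print(hunk_header(source, changed, DEFAULT_PATTERN))
print(hunk_header(source, changed, EIFFEL_FEATURE))
```

The first lookup stops at the feature clause, which is the behaviour reported above. The second stops at the feature itself.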
Re: The config include mechanism doesn't allow for overwriting
On Mon, Oct 22, 2012 at 11:15 PM, Jeff King p...@peff.net wrote: On Mon, Oct 22, 2012 at 05:55:00PM +0200, Ævar Arnfjörð Bjarmason wrote: I was hoping to write something like this: [user] name = Luser email = some-defa...@example.com [include] path = ~/.gitconfig.d/user-email Where that file would contain: [user] email = local-em...@example.com The intent is that it would work as you expect, and produce local-em...@example.com. But when you do that git prints: $ git config --get user.email some-defa...@example.com error: More than one value for the key user.email: local-em...@example.com Ugh. The config code just feeds all the values sequentially to the callback. The normal callbacks within git will overwrite old values, whether from earlier in the file, from a file with lower priority (e.g., /etc/gitconfig versus ~/.gitconfig), or from an earlier included. Which you can check with: $ git var GIT_AUTHOR_IDENT Luser local-em...@example.com 1350936694 -0400 But git-config takes it upon itself to detect duplicates in its callback. Which is just silly, since it is not something that regular git would do. git-config should behave as much like the internal git parser as possible. I think config inclusion is much less useful when you can't clobber previously assigned values. Agreed. But I think the bug is in git-config, not in the include mechanism. I think I'd like to do something like the patch below, which just reuses the regular config code for git-config, collects the values, and then reports them. It does mean we use a little more memory (for the sake of simplicity, we store values instead of streaming them out), but the code is much shorter, less confusing, and automatically matches what regular git_config() does. It fails a few tests in t1300, but it looks like those tests are testing for the behavior we have identified as wrong, and should be fixed. I think this patch looks good. 
One other thing I think is worth clarifying (and I think should be broken) is what happens if you write a configuration like:

    [foo]
        bar = one
    [foo]
        bar = two
    [foo]
        bar = three

git-{config,var} -l will both give you:

    foo.bar=one
    foo.bar=two
    foo.bar=three

And git config --get foo.bar will give you:

    $ git config -f /tmp/test --get foo.bar
    one
    error: More than one value for the key foo.bar: two
    error: More than one value for the key foo.bar: three

I think that it would be better if the config mechanism just silently overwrote keys that clobbered earlier keys, like your patch does.

But in addition, can we simplify things for the consumers of git-{config,var} -l by only printing:

    foo.bar=three

Or are there too many variables like include.path that can legitimately appear more than once?
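The "last one wins" behaviour asked for here, with an escape hatch for keys that may legitimately repeat, could be modelled like this. It is a sketch of the proposal rather than of git-config's code: the resolve function is invented, and the multi_valued default contains only include.path because that is the one example named above.

```python
def resolve(entries, multi_valued=("include.path",)):
    """Collapse (key, value) pairs read in file order.

    Ordinary keys keep only their last value. Keys listed in
    `multi_valued` (an assumption for this sketch) keep every value.
    Returns (single, multi).
    """
    single = {}
    multi = {}
    for key, value in entries:
        if key in multi_valued:
            multi.setdefault(key, []).append(value)
        else:
            single[key] = value  # later entries overwrite earlier ones
    return single, multi


entries = [
    ("foo.bar", "one"),
    ("include.path", "~/.gitconfig.d/user-email"),
    ("foo.bar", "two"),
    ("foo.bar", "three"),
]
single, multi = resolve(entries)
print(single)
print(multi)
```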
Re: [PATCH 8/8] git-config: use git_config_with_options
Yeah same here. Thanks for tackling this bug. Looking forward to using the include mechanism for overriding user.email in future versions.
Has anyone tried to implement git grep --blame?
This would be so much more convenient if git-grep supported it natively:

$ git grep -n 'if \(0\)' | perl -pe's/([^:]+):([^:]+).*/`git blame -L $2,$2 $1`/se'
d18f76dc (Ævar Arnfjörð Bjarmason 2010-08-17 09:24:38 + 2278) if (0)
65648283 (David Brown 2007-12-25 19:56:29 -0800 433) if (0) {

I.e. with all the coloring/pager interaction. Some Googling around reveals people piping things to git-blame like that, but has anyone made a stab at a smarter implementation (one that would know to blame the whole file if it had lots of hits, etc.)? Don't know if I have time myself, but I'd be very pleased if someone hacked that up.
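As a rough sketch of what a "smarter" wrapper might do, the Python below groups `git grep -n` hits per file and decides between one blame of the whole file and a blame limited to the matching lines. Everything here is an assumption for illustration: `plan_blame` and `whole_file_threshold` are made-up names, the parsing breaks on paths containing a colon, and passing several `-L` options in one invocation assumes a reasonably recent git blame.

```python
from collections import defaultdict


def parse_grep_line(line):
    """Split one `git grep -n` output line into (path, line_number)."""
    path, lineno, _ = line.split(":", 2)
    return path, int(lineno)


def plan_blame(grep_output, whole_file_threshold=20):
    """Build one `git blame` argument list per file that had hits."""
    hits = defaultdict(list)
    for line in grep_output.splitlines():
        if line:
            path, lineno = parse_grep_line(line)
            hits[path].append(lineno)

    commands = []
    for path, linenos in hits.items():
        if len(linenos) >= whole_file_threshold:
            # Many hits: blame the file once and filter afterwards.
            commands.append(["git", "blame", "--", path])
        else:
            # Few hits: restrict blame to the matching lines only.
            cmd = ["git", "blame"]
            for n in sorted(set(linenos)):
                cmd += ["-L", f"{n},{n}"]
            commands.append(cmd + ["--", path])
    return commands
```

The returned lists could be handed to `subprocess.run`; the sketch deliberately stops short of running anything or handling color and pager interaction.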
Re: [RFC] Add a new email notification script to contrib
On Thu, Nov 8, 2012 at 1:17 PM, Michael Haggerty mhag...@alum.mit.edu wrote: On 11/08/2012 12:39 PM, Ævar Arnfjörð Bjarmason wrote: [...] I'm glad it's getting some use. Thanks for the feedback. I'll test it out some more, the issues I've had with it so far in migrating from the existing script + some custom hacks we have to it have been: * Overly verbose default templates, easy to overwrite now. Might send patches for some of them. The templating is currently not super flexible nor very well documented, but simple changes should be easy enough. I mostly carried over the text explanations from the old post-receive-email script; it is true that they are quite verbose. * No ability to link to a custom gitweb, probably easy now. What do you mean by a custom gitweb? What are the commitmail issues involved? Just for the E-Mail to include a link to http://gitweb.example.com/git/?h=$our_hash etc. * If someone only pushes one commit I'd like to only have one e-mail with the diff, but if they push multiple commits I'd like to have a summary e-mail and replies to that which have the patches. It only seemed to support the latter mode, so you send out two e-mails for pushing one commit. That's correct, and I've also thought about the feature that you described. I think it would be pretty easy to implement; it is only not quite obvious to which mailing list(s?) such emails should be sent. * Ability to limit the number of lines, but not line length, that's handy for some template repositories. Should be easy to add Should too-long lines be folded or truncated? Either way, it should be pretty straightforward (Python even has a textwrap module that could be used). But in addition to that we have our own custom E-Mail notification scripts for: * People can subscribe to changes to certain files. I.e. if you modify very_important.c we'll send an E-Mail to a more widely seen review list. * Invididuals can also edit a config file to watch individual files / glob patterns of files, e.g. 
src/main.c or src/crypto*

I implemented something like this back when we were using Subversion, but it didn't get much use and seemed like more configuration hassle than it was worth. If this were implemented and I asked for notifications about a particular file, and a particular reference change affects the file, what should I see?

* The summary email for the reference change (yes/no)?
* Detail emails for all commits within the reference change, or only for the individual commits that modify the file?
* Should the detail emails include the full patch for the corresponding commit, or only the diffs affecting the file(s) of interest? (The latter would start to get expensive, because the script would have to generate individual emails per subscriber instead of letting sendmail fan the emails out across all subscribers.)

I think just sending the individual patch e-mails to all people who subscribe to paths that got changed in that patch that match their watchlist makes sense. That's how an internal e-mailing script that I'm hoping to replace with this works. That script *also* supports sending the whole batch of patches pushed in that push to someone watching any file that got modified in one of the patches, in case you also want to get other stuff pushed in pushes for files you're interested in.

But it doesn't generate individual e-mails per recipient. I think that way lies madness because, as you rightly point out, you have to start worrying about the combinatorial nightmare of generating the e-mails per subscriber.

I think a good way to support that would be to have either a path to a config file with those watch specs, or a command so you could run git show ... on some repo users can push to.

*How* this feature would be configured depends strongly on how the repo is hosted. For example, gitolite has a well-developed scheme for how the server should be configured, and it would make sense to work together with that. Other people might configure user access via LDAP or Apache.

But overall it's very nice. I'll make some time to test it in my organization (with lots of commits and people reading commit e-mails).
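The "one patch e-mail, fanned out to everyone whose watchlist matches" approach described above could look roughly like the sketch below. It is not code from the notification script under discussion: `recipients_for_commit` and the shape of `watchlists` are hypothetical, and Python's `fnmatch` is assumed as the glob matcher (its `*` also matches `/`).

```python
from fnmatch import fnmatch


def recipients_for_commit(changed_paths, watchlists):
    """Return the sorted addresses whose patterns match any changed path.

    `watchlists` maps an address to a list of glob patterns. The commit's
    patch e-mail is generated once and sent to the combined recipient list,
    so no per-subscriber e-mail has to be built.
    """
    return sorted(
        address
        for address, patterns in watchlists.items()
        if any(fnmatch(path, pattern)
               for path in changed_paths
               for pattern in patterns)
    )
```

For example, a watchlist entry of `src/crypto*` would match a commit touching `src/crypto/aes.c`, and that subscriber would simply be added to the recipients of the one e-mail for that commit.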
Re: [RFC] Add a new email notification script to contrib
On Thu, Nov 8, 2012 at 5:24 PM, Marc Branchaud mbranch...@xiplink.com wrote:

I'd like there to be one list that always gets everything, and the other lists should get subsets of the everything list.

Since it supports multiple mailing lists per category you can always do (I can't remember the specific config keys, but it's not important):

commits = all-git-activ...@example.com,git-comm...@example.com
tags = all-git-activ...@example.com,git-t...@example.com

etc.
Re: [RFC v2] git-multimail: a replacement for post-receive-email
On Sun, Jan 27, 2013 at 9:37 AM, Michael Haggerty mhag...@alum.mit.edu wrote:

A while ago, I submitted an RFC for adding a new email notification script to contrib [1]. The reaction seemed favorable and it was suggested that the new script should replace post-receive-email rather than be added separately, ideally with some kind of migration support.

I just want to say, since I think this thread hasn't been getting the attention it deserves: I'm all for this. I've used git-multimail and it's a joy to configure and extend compared to the existing hacky shellscript. I'm not running it at $work yet because I still need to write some extensions to port some of our local hacks to the old shellscript over.

I fully support replacing the existing mailing script with git-multimail. It's better in every way and, unlike the current script, has an active maintainer.
Re: [PATCH 2/7] Undocument deprecated alias 'push.default=tracking'
On Mon, Apr 23, 2012 at 10:37 AM, Matthieu Moy matthieu@imag.fr wrote: It's been deprecated since 53c4031 (Johan Herland, Wed Feb 16 2011, push.default: Rename 'tracking' to 'upstream'), so it's OK to remove it from documentation (even though it's still supported) to make the explanations more readable. I don't think this was a good move for the documentation. Now every time I find an old repo with push.default=tracking I end up wondering what it was a synonym for again, and other users who don't know what it does will just assume it's an invalid value or something. We can't treat existing config values we still support as any other deprecated feature. They still exist in files we have no control over, and in people's brains who are reading man git-config trying to remember what it meant. Signed-off-by: Matthieu Moy matthieu@imag.fr --- Feel free to squash into previous one if needed. Documentation/config.txt |1 - 1 file changed, 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index e38fab1..ddf6043 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1693,7 +1693,6 @@ push.default:: makes `git push` and `git pull` symmetrical in the sense that `push` will update the same remote ref as the one which is merged by `git pull`. -* `tracking` - deprecated synonym for `upstream`. * `current` - push the current branch to a branch of the same name. + The `current` and `upstream` modes are for those who want to -- 1.7.10.234.ge65dd.dirty -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Should log --cc imply log --cc -p?
On Mon, Feb 4, 2013 at 5:36 PM, Junio C Hamano gits...@pobox.com wrote:

git log/diff-files -U8 do not need -p to enable textual patches, for example. It is "I already told you that I want 8-line context. For what else, other than showing textual diff, do you think I told you that for?" and replacing "8-line context" with various other options that affect patch generation will give us a variety of end user complaints that would tell us that C) is more intuitive to them.

On a related note I think --full-diff should imply -p too.
Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)
On Wed, Jan 30, 2013 at 7:45 PM, Junio C Hamano gits...@pobox.com wrote:

The third round.

- Multi-valued variable transfer.hiderefs lists prefixes of ref hierarchies to be hidden from the requests coming over the network.
- A configuration optionally allows uploadpack to accept fetch requests for an object at the tip of a hidden ref.

Elsewhere, we discussed delaying ref advertisement (aka expand refs), but it is an orthogonal feature and this hiding refs completely from advertisement series does not attempt to address.

I'm a bit late to this so sorry if this has been covered before.

In the initial draft of this series the rationale for it was reducing the network cost while talking with a repository with tons of refs[1]. But later you seem to have changed your mind, and network bandwidth reduction of advertisement is a side effect of clutter reduction, and not necessarily the primary goal.

Do you have any plans for something that *does* have the reduction of network bandwidth as a primary goal?

In October I asked if anyone was working on a next-gen Git protocol[3] that would provide clients with the ability to specify what refs they wanted. You replied to me off-list saying Yes.

Is this what you've been working on? Because if so I misunderstood you, thinking you were going to work on something that gave clients the ability to specify what they wanted before the initial ref advertisement.

I'm still very keen to have that ability, so if you're not working on it I just might give it a go.

1. http://article.gmane.org/gmane.comp.version-control.git/213951
2. http://article.gmane.org/gmane.comp.version-control.git/213984
3. http://article.gmane.org/gmane.comp.version-control.git/214025
4. http://thread.gmane.org/gmane.comp.version-control.git/207190
Re: Is anyone working on a next-gen Git protocol (Re: [PATCH v3 0/8] Hiding refs)
On Tue, Feb 5, 2013 at 5:03 PM, Junio C Hamano gits...@pobox.com wrote: Ævar Arnfjörð Bjarmason ava...@gmail.com writes: Do you have any plans for something that *does* have the reduction of network bandwidth as a primary goal? Uncluttering gives reduction of bandwidth anyway, so I do not see much point in the distinction you seem to be making. Doing this work wouldn't only give us a way to specify which refs we want, but if done correctly would future-proof the protocol in case we want to add any other extensions down the line in a backwards-compatible fashion without having the server first spew all his refs at us. Anyway, an implementation that allows a client to say I want X is simpler than an implementation where a server has to anticipate in advance which X the clients will ask for. Is this what you've been working on? Because if so I misunderstood you thinking you were going to work on something that gave clients the ability specify what they wanted before the initial ref advertisement. ... 4. http://thread.gmane.org/gmane.comp.version-control.git/207190 Who speaks first mentioned in 4. above, was primarily about delaying ref advertisement, which would be a larger protocol change. Nobody seems to have attacked it since it was discussed, and I was tired of hearing nothing but complaints and whines. This hiding refs series was done as a cheaper way to solve a related issue, without having to wait for the solution of delaying advertisement, which is an orthogonal issue. Oh sure. I just wanted to know if you were working on delaying ref advertisement to avoid duplicating efforts. I had the impression you were given your earlier E-Mail, but obviously we had a misunderstanding. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 0/8] Hiding refs
On Wed, Feb 6, 2013 at 8:17 PM, Junio C Hamano gits...@pobox.com wrote: Maybe this should be split up into a different thread, but: The upload-pack-2 service sits on a port different from today's [...]. I think there's a simpler way to do this, which is that: * New clients supporting v2 of the protocol send some piece of data that would break old servers. * If that fails the new client goes oh jeeze, I guess it's an old server, and try again with the old protocol. * The client then saves a date (or the version the server gave us) indicating that it tried the new protocol on that remote, tries again sometime later. We already covered in previous discussions how this would be simpler with the HTTP protocol, since you could just send an extra header inviting the server to speak the new protocol. But for the other transports we can just try the new protocol and retry with the old one as a fallback if it doesn't work. That'll allow us to gracefully migrate without needing to change the git:// port. Besides, I think the vast majority of users are using Git via http:// or ssh://, where we can't just change the port, but even so making people change the port when we could handle this more gracefully would be a big PITA. Adding new firewall holes is often a big bureaucratic nightmare in some organizations. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
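The try-new, fall-back, remember-and-retry-later steps proposed above can be sketched like this. It is a toy model in Python rather than anything from Git's transport code: `ProtocolCache`, `connect`, and the `try_v2`/`try_v1` callbacks are hypothetical, and a failed v2 attempt is assumed to surface as a `ConnectionError`.

```python
import time


class ProtocolCache:
    """Remember when a remote last rejected the new protocol."""

    def __init__(self, retry_after=7 * 24 * 3600):
        self.retry_after = retry_after  # seconds before trying v2 again
        self.failed_at = {}             # remote -> time of last v2 failure

    def should_try_v2(self, remote, now):
        failed = self.failed_at.get(remote)
        return failed is None or now - failed >= self.retry_after


def connect(remote, try_v2, try_v1, cache, now=None):
    """Try the new protocol first, then fall back to the old one."""
    now = time.time() if now is None else now
    if cache.should_try_v2(remote, now):
        try:
            return "v2", try_v2(remote)
        except ConnectionError:
            # Old server: note the failure so we skip v2 for a while.
            cache.failed_at[remote] = now
    return "v1", try_v1(remote)
```

The cost of this design is one wasted round trip per remote per retry interval, which is the trade-off being weighed against opening a new port.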
Re: [PATCH v3 0/8] Hiding refs
On Thu, Feb 7, 2013 at 1:16 AM, Jeff King p...@peff.net wrote: On Wed, Feb 06, 2013 at 04:12:10PM -0800, Junio C Hamano wrote: Ævar Arnfjörð Bjarmason ava...@gmail.com writes: I think there's a simpler way to do this, which is that: * New clients supporting v2 of the protocol send some piece of data that would break old servers. * If that fails the new client goes oh jeeze, I guess it's an old server, and try again with the old protocol. * The client then saves a date (or the version the server gave us) indicating that it tried the new protocol on that remote, tries again sometime later. For that to work, the new server needs to wait for the client to speak first. How would that server handle old clients who expect to be spoken first? Wait with a read timeout (no timeout is the right timeout for everybody)? If the new client can handle the old-style server's response, then the server can start blasting out refs (optionally after a timeout) and stop when the client interrupts with hey, wait, I can speak the new protocol. The server just has to include you can interrupt me in its capability advertisement (obviously it would have to send out at least the first ref with the capabilities before the timeout). Can't this also be handled by passing an extra argument to upload-pack? Whether you're talking http, ssh + normal shell, ssh + git-shell or git:// you pass some argument that older clients would reject on but would cause newer clients that know about that argument to wait for you to speak before blasting refs at you. It would mean that older clients (e.g. older git-shell) would reject your initial connection, but you could just try again, and save away info about that remote's version. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: inotify to minimize stat() calls
On Fri, Feb 8, 2013 at 10:10 PM, Ramkumar Ramachandra artag...@gmail.com wrote:

For large repositories, many simple git commands like `git status` take a while to respond. I understand that this is because of large number of stat() calls to figure out which files were changed. I overheard that Mercurial wants to solve this problem using inotify, but the idea bothers me because it's not portable. Will Git ever consider using inotify on Linux? What is the downside?

There's one relatively easy sub-task of this that I haven't seen mentioned: improving the speed of interactive rebase on large (as in lots of checked out files) repositories. That's the single biggest thing that bothers me when I use Git with large repos, not the speed of git status.

When you git rebase -i HEAD~100, re-arrange some patches and save the TODO list, it takes say 0.5-1s for each patch to be applied, but at least 10x less than that on a small repository. E.g. try this on linux-2.6.git vs. some small project with a few dozen files.

I looked into this a long while ago and remember that rebase was doing something like a git status for every commit that it made, to check the dirtiness. This could be vastly improved by having an unsafe option to git-rebase where it just assumes that the starting state + whatever it wrote out is the current state, i.e. it would break if someone snuck up on your checkout during an interactive rebase and changed a file, but the common case of the user having exclusive access to the repo and waiting for the rebase would be much faster.
Re: Can git restrict source files ?
On Tue, Feb 19, 2013 at 5:06 PM, Juan Pablo juanpablo8...@gmail.com wrote:

I have a question: can I control the access to specific files or folders? I need that some developers can't see some source files. Thank you very much for your time.

No, but what you can do is to split these up into different repositories. E.g. where I work we have a puppet.git and a secrets.git; the latter contains passwords and other secret data, the former just uses macros to include that and is accessible to everyone.
[PATCH] help: show manpage for aliased command on git alias --help
Change the semantics of git alias --help to show the help for the command alias is aliased to, instead of just saying:

    `git alias' is aliased to `whatever'

E.g. if you have checkout aliased to co you won't get:

    $ git co --help
    `git co' is aliased to `checkout'

But will instead get the manpage for git-checkout.

The behavior this is replacing was originally added by Jeff King in 2156435. I'm changing it because of this off-the-cuff comment on IRC:

    14:27:43 @Tux git can be very unhelpful, literally:
    14:27:46 @Tux $ git co --help
    14:27:46 @Tux `git co' is aliased to `checkout'
    14:28:08 @Tux I know!, gimme the help for checkout, please

And because I also think it makes more sense than showing you what the thing is aliased to.

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 builtin/help.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/builtin/help.c b/builtin/help.c
index d1d7181..fdb3312 100644
--- a/builtin/help.c
+++ b/builtin/help.c
@@ -417,6 +417,7 @@ int cmd_help(int argc, const char **argv, const char *prefix)
 {
 	int nongit;
 	const char *alias;
+	const char *show_help_for;
 	enum help_format parsed_help_format;
 	load_command_list("git-", &main_cmds, &other_cmds);
@@ -449,20 +450,21 @@ int cmd_help(int argc, const char **argv, const char *prefix)
 	alias = alias_lookup(argv[0]);
 	if (alias && !is_git_command(argv[0])) {
-		printf_ln(_("`git %s' is aliased to `%s'"), argv[0], alias);
-		return 0;
+		show_help_for = alias;
+	} else {
+		show_help_for = argv[0];
 	}
 	switch (help_format) {
 	case HELP_FORMAT_NONE:
 	case HELP_FORMAT_MAN:
-		show_man_page(argv[0]);
+		show_man_page(show_help_for);
 		break;
 	case HELP_FORMAT_INFO:
-		show_info_page(argv[0]);
+		show_info_page(show_help_for);
 		break;
 	case HELP_FORMAT_WEB:
-		show_html_page(argv[0]);
+		show_html_page(show_help_for);
 		break;
 	}
-- 
1.7.10.4
Re: [PATCH] help: show manpage for aliased command on git alias --help
On Tue, Mar 5, 2013 at 5:16 PM, Junio C Hamano gits...@pobox.com wrote:

Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

Change the semantics of git alias --help to show the help for the command alias is aliased to, instead of just saying: `git alias' is aliased to `whatever' E.g. if you have checkout aliased to co you won't get: $ git co --help `git co' is aliased to `checkout'

If you had lg aliased to log --oneline and you made $ git lg --help to give anything but 'git lg' is aliased to `log --oneline' I would say that is a grave regression.

Good point. I'll fix that up. No objection to the patch in principle though? I.e. not showing you what the alias points to.
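One way to reconcile the two positions in this exchange is to show the manpage only when the alias is a bare command name, and keep the old message for anything with arguments. The Python below is merely a sketch of that decision, not the follow-up patch (which is not part of this thread): `help_target` and its arguments are hypothetical stand-ins for the alias lookup and command-list checks in builtin/help.c.

```python
import shlex


def help_target(name, aliases, builtin_commands):
    """Decide what `git <name> --help` should show.

    Returns ("man", command) to open a manpage, or ("alias", message)
    to print the expansion instead.
    """
    if name in builtin_commands or name not in aliases:
        return ("man", name)
    words = shlex.split(aliases[name])
    if len(words) == 1 and words[0] in builtin_commands:
        # Plain rename such as co -> checkout: the manpage is more useful.
        return ("man", words[0])
    # Alias carries options (lg -> log --oneline): keep showing the expansion.
    return ("alias", f"`git {name}' is aliased to `{aliases[name]}'")
```

Under this rule `co` would open the checkout manpage while `lg` would still print its expansion, which avoids the regression pointed out above.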
Re: propagating repo corruption across clone
On Sun, Mar 24, 2013 at 7:31 PM, Jeff King p...@peff.net wrote: I don't have details on the KDE corruption, or why it wasn't detected (if it was one of the cases I mentioned above, or a more subtle issue). One thing worth mentioning is this part of the article: Originally, mirrored clones were in fact not used, but non-mirrored clones on the anongits come with their own set of issues, and are more prone to getting stopped up by legitimate, authenticated force pushes, ref deletions, and so on – and if we set the refspec such that those are allowed through silently, we don’t gain much. So the only reason they were even using --mirror was because they were running into those problems with fetching. So aside from the problems with --mirror I think we should have something that updates your local refs to be exactly like they are on the other end, i.e. deletes some, non-fast-forwards others etc. (obviously behind several --force options and so on). But such an option *wouldn't* accept corrupted objects. That would give KDE and other parties a safe way to do exact repo mirroring like this, wouldn't protect them from someone maliciously deleting all the refs in all the repos, but would prevent FS corruption from propagating. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Sharness - Test library derived from Git
On Tue, Jul 17, 2012 at 10:06 AM, Mathias Lafeldt mathias.lafe...@gmail.com wrote: I've been wanting to announce Sharness [1] on this list for quite some time now, but never managed to do so. With the release of version 0.2.4, I think it's about time to change that. Sharness is a shell-based test harness library. It was derived from the Git project and is basically a generalized and stripped-down version of t/test-lib.sh (I basically removed all things specific to Git). So when you know how to write tests for Git, it should be very familiar. Nice, I thought about doing something like this myself but never had the time. Perhaps to avoid duplication we could move to this and keep Git-specific function in some other file. Do you think that would be sensible, and would you be willing to submit patches for that?
Re: Centralized git
On Tue, Jul 31, 2012 at 3:08 PM, Javier Domingo javier...@gmail.com wrote:

Network, in this case, is cheaper. The thing is that if I commit frequently, I will have plenty of GBs of history that I almost surely won't use. I just need to have other people's work to merge. But I want to think in Git style; I am pretty accustomed to that way of doing things. That is why I sent this mail here. The idea is that if I modify 700MBs of video, with 20 commits I would get in 21GB. And making a pull would be... just even more horrible than anything. That is why I also need to have a last-checkouts filter. Just download the branches' HEADs.

You're obviously aware of git-annex; is there any reason you can't just use that? That would give you what you want: you'd have a moving window of current files, and then you'd delete old files as they become un-needed.
Why doesn't git-fetch obey -c remote.origin.url on the command-line?
On a git built from the master branch just now:

$ ./git config remote.origin.url
https://code.google.com/p/git-core/
$ ./git -c remote.origin.url=git://git.sourceforge.jp/gitroot/git-core/git.git config remote.origin.url
git://git.sourceforge.jp/gitroot/git-core/git.git
$ GIT_TRACE=1 ./git -c remote.origin.url=git://git.sourceforge.jp/gitroot/git-core/git.git fetch 2>&1 | head -n 2
trace: built-in: git 'fetch'
trace: run_command: 'git-remote-https' 'origin' 'https://code.google.com/p/git-core/'

I'd expect this to try to fetch from the remote.origin.url I specified on the command-line, but for some reason fetch doesn't pick that up. Isn't this a bug?

The use case for this is to have a script in cron that does a pull of repositories via http, while the developers expecting to occasionally use those repositories as work directories should transparently be able to pull/push from them.

I know about remote.origin.pushurl, but I'd prefer pulls to also be over ssh in those cases, because then you don't have to worry about proxy settings (different for the devs than for the automated script).

I could fix this, but I thought I'd first send a question about whether this shouldn't be considered a bug. I haven't dug into this yet, but I think that configuration passed via the -c option should *always* override any other config Git may get from elsewhere.
Re: Enhancement Request: locale git option
On Thu, Dec 4, 2014 at 10:55 AM, Jeff King p...@peff.net wrote: On Thu, Dec 04, 2014 at 09:29:04AM +0100, Torsten Bögershausen wrote: How about alias git='LANGUAGE=de_DE.UTF-8 git' in your ~/.profile ? (Of course you need to change de to the language you want ) Besides being awkward in scripts (which will not respect the alias and use a different language!), that variable will also be inherited by programs git spawns. So the editor, for example, may end up in the wrong language. I think respecting core.locale would make sense (probably the change would go into git_setup_gettext(), but you may have to fight with the setup code over looking at config so early in the process). I think we should just stick to the standard *nix way of doing this: Tell people to set their locale in their environment. If someone's having this issue it's also happening for all the binutils, and any other command-line and GUI program they use, unless they override using the standard way of doing so, by setting the relevant LC_* environment variables. If you want Git in English then create an alias to override its locale to be C, if you want the editor it spawns to be in some other language alias that to something that explicitly sets LC_* for that editor. Maybe I'm being overzealous about this (especially with the I implemented this blinders on), but let's not have Git set the precedent for other *nix programs that they all should come up with some custom way to override locales, that's something to be done at the OS locale library level, which we use. However, I think the original question is not one of localizing git, but rather of having it _not_ localized (avoiding the German translations). There is a hack you can do that for that, which is to set GIT_TEXTDOMAINDIR to something nonsensical (like /), which will mean git cannot find the .po files, and just uses the builtin messages. 
You can, but the fact that that works is an internal implementation detail we shouldn't document or support.
Re: Enhancement Request: locale git option
On Thu, Dec 4, 2014 at 5:12 PM, Michael J Gruber g...@drmicha.warpmail.net wrote:

Ævar Arnfjörð Bjarmason wrote on 04.12.2014 at 16:49:

On Thu, Dec 4, 2014 at 10:55 AM, Jeff King p...@peff.net wrote:

On Thu, Dec 04, 2014 at 09:29:04AM +0100, Torsten Bögershausen wrote:

How about

    alias git='LANGUAGE=de_DE.UTF-8 git'

in your ~/.profile? (Of course you need to change de to the language you want.)

Besides being awkward in scripts (which will not respect the alias and use a different language!), that variable will also be inherited by programs git spawns. So the editor, for example, may end up in the wrong language.

I think respecting core.locale would make sense (probably the change would go into git_setup_gettext(), but you may have to fight with the setup code over looking at config so early in the process).

I think we should just stick to the standard *nix way of doing this: Tell people to set their locale in their environment.

If someone's having this issue it's also happening for all the binutils, and any other command-line and GUI program they use, unless they override using the standard way of doing so, by setting the relevant LC_* environment variables.

If you want Git in English then create an alias to override its locale to be C. If you want the editor it spawns to be in some other language, alias that to something that explicitly sets LC_* for that editor.

Maybe I'm being overzealous about this (especially with the "I implemented this" blinders on), but let's not have Git set the precedent for other *nix programs that they all should come up with some custom way to override locales. That's something to be done at the OS locale library level, which we use.

However, I think the original question is not one of localizing git, but rather of having it _not_ localized (avoiding the German translations). There is a hack you can do for that, which is to set GIT_TEXTDOMAINDIR to something nonsensical (like /), which will mean git cannot find the .po files, and just uses the builtin messages.

You can, but the fact that that works is an internal implementation detail we shouldn't document or support.

The main issue at hand is really that we have localised git but not its man pages. Even if you understand English, the man pages don't help you at all if you can't connect the technical terms used there to their localised counterparts in git's messages. (NO_GETTEXT=y is my solution.)

That is one of the many reasons why I proposed to have a dictionary of the main technical terms for each language before we even localise git in that language. In an ideal world, we would provide a simple solution for looking these terms up both ways.

I don't think we're going to have localised man pages any time soon, are we?

I think that's a great idea, and one that's only blocked on someone (hint hint) sending patches for it.

It would be neat-o to have something to make translating the docs easier, i.e. PO files for sections of the man pages. There are tools to help with that which we could use.

But there's no reason for us not to have translated glossaries in the meantime.
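The "override the locale for one program" approach discussed in this thread can be sketched as a small shell function. This is only an illustration under stated assumptions: the name in_c_locale is made up here and is not part of Git, and it assumes a gettext-enabled build where the C locale selects the untranslated built-in messages.

```shell
# Hypothetical helper (the name is an assumption, not a Git feature):
# run a single command with LC_ALL=C without changing the calling shell.
in_c_locale() {
    # env sets the variable only for the command it starts. As noted in the
    # thread, anything that command spawns (e.g. an editor) inherits it too.
    env LC_ALL=C "$@"
}

# Usage (commented out so the sketch has no side effects):
# in_c_locale git status
```

Unlike a shell alias, a function like this can also be called from scripts that source it, but it shares the alias's other drawback: the setting leaks into child processes.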
Re: Git's Perl scripts can fail if user is configured for perlbrew
On Sun, Dec 28, 2014 at 11:36 PM, Randy J. Ray rj...@blackperl.com wrote:

I use git on MacOS via homebrew (http://brew.sh/), and a custom Perl installation built and managed via perlbrew (http://perlbrew.pl/). At some point, commands like git add -i broke. I say at some point, because I'm not a git power-user and I only just noticed it this week.

I am running Git 2.2.1 with a perlbrew'd Perl 5.20.1. When I would run git add -i (or git add -p), it would immediately die with a signal 11. Some poking around showed that those git commands that are implemented as Perl scripts run under /usr/bin/perl, and also prefix some directories to the module search-path.

The problem stems from the fact that, when you are using perlbrew, you also have the PERL5LIB environment variable set. The contents of it lie between the git-provided paths and the default contents of @INC. When the Git module is loaded, it (eventually) triggers a load of List::Util, whose C-level code fails to load because of a version mismatch; you got List::Util from the paths in PERL5LIB, but it doesn't match the version of perl from /usr/bin/perl.

After poking around and trying a few different things, I have found that using the following line in place of #!/usr/bin/perl solves this problem:

    #!/usr/bin/env perl

This can be done by defaulting PERL_PATH to /usr/bin/env perl in Makefile. I don't know enough about the overall git ecosystem to know if this would have an adverse effect on anything else (in particular, Windows compatibility, but then Windows probably isn't having this issue in the first place). I could just create and mail in the one-line patch for this, but I thought it might be better to open it up for some discussion first?

[CC'd the perlbrew author]

This is a bit of a tricky issue.

Using whatever perl is defined in the environment is just as likely to break. In general the build process tries to pick these assets at compile-time. Imagine you're experimenting with some custom perl version and now Git inexplicably breaks. It's better if Git detects a working perl when you compile it and sticks with that, which is why we use /usr/bin/perl by default.

When you're setting PERL5LIB you're indicating to whatever perl interpreter you're going to run that that's where it should pick up its modules. IMO the way perlbrew does this is broken: instead of setting PATH + PERL5LIB globally for your login shell it should set the PATH, and then the perl in that path should be a pointer to some small shellscript that sets PERL5LIB for *its* perl.

I don't know what the right tradeoff here is, but I think it would be just as sensible to unset PERL5LIB in our own perl scripts + modules. It would make live monkeypatching harder when you wanted to do it, but we could always add a GITPERL5LIB or something...
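Until something changes on either side, a user-side workaround along the lines of "don't let the system perl see perlbrew's PERL5LIB" could look like the sketch below. It is not what Git or perlbrew does; the function name without_perl5lib is invented for this example.

```shell
# Hypothetical wrapper (name is an assumption): run a command with PERL5LIB
# removed from its environment, leaving the calling shell untouched.
without_perl5lib() {
    (
        # The parentheses start a subshell, so the unset is not visible
        # to the caller once the command has finished.
        unset PERL5LIB
        "$@"
    )
}

# Usage (commented out so the sketch has no side effects):
# without_perl5lib git add -i
```

The trade-off matches the one described above: the perlbrew-managed modules are no longer visible to Git's Perl scripts, which is the point here but would also defeat deliberate overrides through PERL5LIB.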
Re: Git's Perl scripts can fail if user is configured for perlbrew
On Mon, Dec 29, 2014 at 10:57 PM, Randy J. Ray rj...@blackperl.com wrote:

On 12/29/14, 7:40 AM, Torsten Bögershausen wrote:

Having problems with different perl installations is not an unknown problem in Git, I would say. And Git itself is prepared to handle this situation: In Makefile I can read:

    # Define PERL_PATH to the path of your Perl binary (usually /usr/bin/perl).

(What Git can not decide is which perl it should use, the one pointed out by $PATH or /usr/bin/perl.)

What does type perl say? And what happens when you build and install Git like this:

    PERL_PATH=/XX/YY/perl make install

Are you thinking about changing

    ifndef PERL_PATH
    PERL_PATH = /usr/bin/perl
    endif

into

    ifndef PERL_PATH
    PERL_PATH = $(shell which perl)
    endif

At first glance that could make sense, at least to me.

The problem in this case is the Perl being used at run-time, not build-time. The building of git is done by the homebrew project in this case, so I don't have direct control over it.

Correct, but we don't change /usr/bin/perl at runtime, we hardcode that at compile-time. Similarly we could hardcode PERL5LIB at compile-time, but we don't. If we did, you wouldn't have this problem.

I.e. the problem is that we're using the system-provided perl with a custom PERL5LIB set for the benefit of a non-system provided perl installed after you built Git (or built in a different environment...)
Re: [ANNOUNCE] Git v2.3.0
On Thu, Feb 5, 2015 at 11:53 PM, Junio C Hamano gits...@pobox.com wrote:

The latest feature release Git v2.3.0 is now available at the usual places.

[...]

* Git 2.0 was supposed to make the simple mode for the default of git push, but it didn't. (merge 00a6fa0 jk/push-simple later to maint).

Maybe I'm misunderstanding what this does, but changing the push default was *the* backwards compatibility breakage we advertised for v2.0.0[1]. A lot of users (including myself) upgraded to v2.0.0 very carefully making sure that the common pattern of git push our users were using wasn't broken.

But apparently that change isn't taking effect until now. If so I think this needs to be advertised a lot more prominently than buried down along with other miscellaneous fixes in the changelog.

1. https://git.kernel.org/cgit/git/git.git/tree/Documentation/RelNotes/2.0.0.txt
Re: Git messes up 'ø' character
On Tue, Jan 20, 2015 at 10:23 PM, Noralf Trønnes no...@tronnes.org wrote:

On 20.01.2015 21:45, Ævar Arnfjörð Bjarmason wrote:

On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes no...@tronnes.org wrote:

On 20.01.2015 21:07, Torsten Bögershausen wrote:

On 2015-01-20 20.46, Noralf Trønnes wrote:

could it be that your ø is not encoded as UTF-8, but in ISO-8859-15 (or so)

    $ git log -1
    commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
    Author: Noralf Tr<F8>nnes no...@tronnes.org

What does git config -l | grep Noralf | xxd say?

    $ git config -l | grep Noralf | xxd
    000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
    010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.

    $ file ~/.gitconfig
    /home/pi/.gitconfig: ISO-8859 text

What's happened here is that:

1. You've authored your commit in ISO-8859-1
2. Git itself has no place for the encoding of the author name in the commit object format
3. git-format-patch has a --compose-encoding which I think would sort this out if you set it to ISO-8859-1, but it defaults to UTF-8
4. Your patch is actually an ISO-8859-1 byte sequence, but is advertised as UTF-8
5. You end up with a screwed-up commit

You could work around this, but I suggest just joining the 21st century and working exclusively in UTF-8, it makes things much easier, speaking as someone with 3x more non-ASCII characters in his name than you :)

Ok, then the question is: How do I switch to UTF-8? To me it seems I'm already using it:

    $ locale charmap
    UTF-8

Your .gitconfig has an ISO-8859-1 string, from an earlier mail of yours:

    $ git config -l | grep Noralf | xxd
    000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
    010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.

On a system configured for UTF-8 this would be:

    $ echo Noralf Trønnes | xxd
    000: 4e6f 7261 6c66 2054 72c3 b86e 6e65 730a  Noralf Tr..nnes.

Note the f8 vs. c3 b8.
Re: Git messes up 'ø' character
On Tue, Jan 20, 2015 at 10:38 PM, Noralf Trønnes no...@tronnes.org wrote:

On 20.01.2015 22:26, Ævar Arnfjörð Bjarmason wrote:

On Tue, Jan 20, 2015 at 10:23 PM, Noralf Trønnes no...@tronnes.org wrote:

On 20.01.2015 21:45, Ævar Arnfjörð Bjarmason wrote:

On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes no...@tronnes.org wrote:

On 20.01.2015 21:07, Torsten Bögershausen wrote:

On 2015-01-20 20.46, Noralf Trønnes wrote:

could it be that your ø is not encoded as UTF-8, but in ISO-8859-15 (or so)

    $ git log -1
    commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
    Author: Noralf Tr<F8>nnes no...@tronnes.org

What does git config -l | grep Noralf | xxd say?

    $ git config -l | grep Noralf | xxd
    000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
    010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.

    $ file ~/.gitconfig
    /home/pi/.gitconfig: ISO-8859 text

What's happened here is that:

1. You've authored your commit in ISO-8859-1
2. Git itself has no place for the encoding of the author name in the commit object format
3. git-format-patch has a --compose-encoding which I think would sort this out if you set it to ISO-8859-1, but it defaults to UTF-8
4. Your patch is actually an ISO-8859-1 byte sequence, but is advertised as UTF-8
5. You end up with a screwed-up commit

You could work around this, but I suggest just joining the 21st century and working exclusively in UTF-8, it makes things much easier, speaking as someone with 3x more non-ASCII characters in his name than you :)

Ok, then the question is: How do I switch to UTF-8? To me it seems I'm already using it:

    $ locale charmap
    UTF-8

Your .gitconfig has an ISO-8859-1 string, from an earlier mail of yours:

    $ git config -l | grep Noralf | xxd
    000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
    010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.

On a system configured for UTF-8 this would be:

    $ echo Noralf Trønnes | xxd
    000: 4e6f 7261 6c66 2054 72c3 b86e 6e65 730a  Noralf Tr..nnes.

Note the f8 vs. c3 b8.

Yes:

    $ echo Noralf Trønnes | xxd
    000: 4e6f 7261 6c66 2054 72f8 6e6e 6573 0a    Noralf Tr.nnes.

Is there a command I can run that shows that I'm using ISO-8859-1? I need something to google with, my previous search only gave locale stuff, which seems fine.

What does this give you? This is UTF-8:

    $ echo git commit --author=Noralf Trønnes no...@tronnes.org | xxd
    000: 6769 7420 636f 6d6d 6974 202d 2d61 7574  git commit --aut
    010: 686f 723d 4e6f 7261 6c66 2054 72c3 b86e  hor=Noralf Tr..n
    020: 6e65 7320 3c6e 6f74 726f 4074 726f 6e6e  nes <notro@tronn
    030: 6573 2e6f 7267 3e0a                      es.org>.

To see if you're using UTF-8 just look at the bytes for the non-ASCII characters you're using and check if they're valid UTF-8. E.g. you can check this out: http://en.wikipedia.org/wiki/%C3%98#Computers

Which shows you that the UTF-8 hex version is C3 B8, but the Latin-1 is F8. You're emitting F8, I'm emitting C3 B8.
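The byte-level check described in this thread can be reproduced without depending on how a terminal or editor is configured. The sketch below is an illustration only: hex_bytes and is_valid_utf8 are names made up here, the "ø" is written as explicit octal escapes so the example itself is encoding-independent, and is_valid_utf8 assumes an iconv that exits non-zero on malformed input (GNU iconv does).

```shell
# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hex_bytes() {
    od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: succeed only if stdin decodes as UTF-8.
# Assumes iconv reports malformed input through a non-zero exit status.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "Trønnes" with ø as the single ISO-8859-1 byte f8 (octal 370):
printf 'Tr\370nnes' | hex_bytes        # 54 72 f8 6e 6e 65 73
# The same name with ø as the UTF-8 sequence c3 b8 (octal 303 270):
printf 'Tr\303\270nnes' | hex_bytes    # 54 72 c3 b8 6e 6e 65 73
```

Something like is_valid_utf8 < ~/.gitconfig would then answer the "which encoding is my file in" question more directly than locale charmap, which only reports what the environment claims to use.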
Re: Git messes up 'ø' character
On Tue, Jan 20, 2015 at 10:20 PM, Jeff King p...@peff.net wrote:

On Tue, Jan 20, 2015 at 09:45:46PM +0100, Ævar Arnfjörð Bjarmason wrote:

What's happened here is that:

1. You've authored your commit in ISO-8859-1
2. Git itself has no place for the encoding of the author name in the commit object format

Is (2) right? The encoding header in a commit object should apply not just to the commit message, but also to the author (and committer) name. I think the real problem is simply that it defaults to UTF-8, but he is giving it iso-8859-1 characters. Setting i18n.commitEncoding should fix it.

True, I forgot about that setting.

-Peff

PS If you try experimenting with this, you may fall afoul of 08a94a1 (commit/commit-tree: correct latin1 to utf-8, 2012-06-28), which will silently correct Latin1 characters into UTF-8 (when the commit message is expected to be in UTF-8, of course). So it actually _should_ just work under modern gits, but only for Latin1.
Re: [ANNOUNCE] Git v2.3.0-rc0
On Tue, Jan 13, 2015 at 12:57 AM, Junio C Hamano gits...@pobox.com wrote:

An early preview release Git v2.3.0-rc0 is now available for testing at the usual places.

[...]

Jeff King (38):
[...]
parse_color: refactor color storage
[...]

I've had this in my .gitconfig since 2010 which was broken by Jeff's v2.1.3-24-g695d95d:

    ;; Don't be so invasive about coloring ^M when I'm editing files that
    ;; are supposed to have \r\n.
    [color "diff"]
        whitespace = 0

To test this replace \n with \r\n in a file. Before this patch you could do:

    git -c color.diff.whitespace=0 show

And just get:

    [red]-[/red]
    [green]+[/green]

As opposed to:

    git -c color.diff.whitespace=1 show

Which gives you:

    [red]-
    [green]+[/green][red]^M[/red]

Now that just produces:

    error: invalid color value: 0
    fatal: bad config variable 'color.diff.whitespace' in file '/home/avar/.gitconfig' at line 16

Maybe breaking this is OK (but I can't find what the replacement is), but neither the config documentation nor the changelog mentions breaking existing config settings.
Re: Git messes up 'ø' character
On Tue, Jan 20, 2015 at 9:17 PM, Noralf Trønnes no...@tronnes.org wrote:

On 20.01.2015 21:07, Torsten Bögershausen wrote:

On 2015-01-20 20.46, Noralf Trønnes wrote:

could it be that your ø is not encoded as UTF-8, but in ISO-8859-15 (or so)

    $ git log -1
    commit b2a4f6abdb097c4dc092b56995a2af8e42fbea79
    Author: Noralf Tr<F8>nnes no...@tronnes.org

What does git config -l | grep Noralf | xxd say?

    $ git config -l | grep Noralf | xxd
    000: 7573 6572 2e6e 616d 653d 4e6f 7261 6c66  user.name=Noralf
    010: 2054 72f8 6e6e 6573 0a                    Tr.nnes.

    $ file ~/.gitconfig
    /home/pi/.gitconfig: ISO-8859 text

What's happened here is that:

1. You've authored your commit in ISO-8859-1
2. Git itself has no place for the encoding of the author name in the commit object format
3. git-format-patch has a --compose-encoding which I think would sort this out if you set it to ISO-8859-1, but it defaults to UTF-8
4. Your patch is actually an ISO-8859-1 byte sequence, but is advertised as UTF-8
5. You end up with a screwed-up commit

You could work around this, but I suggest just joining the 21st century and working exclusively in UTF-8, it makes things much easier, speaking as someone with 3x more non-ASCII characters in his name than you :)
Re: [ANNOUNCE] Git Merge, April 8-9, Paris
On Sat, Jan 24, 2015 at 12:37 AM, Jeff King p...@peff.net wrote:

GitHub is organizing a Git-related conference to be held April 8-9, 2015, in Paris. Details here: http://git-merge.com/

The exact schedule is still being worked out, but there is going to be some dedicated time/space for Git (and libgit2 and JGit) developers to meet and talk to each other.

If you have patches in Git, I'd encourage you to consider attending. If travel finances are a problem, please talk to me. GitHub may be able to defray the cost of travel.

I hope to see people there!

I'll be there, excited to be there and meet you all.

I'm even more excited in a way to be traveling from The Netherlands to Paris to attend a conference claiming to be governed by California law[1] :)

1. Small print at https://ti.to/github-events/git-merge-2015
Re: Git Scaling: What factors most affect Git performance for a large repo?
On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote:

On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:

Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's:

* Around 500k commits
* Around 100k tags
* Around 5k branches
* Around 500 commits/day, almost entirely to the same branch
* 1.5 GB .git checkout.
* Mostly text source, but some binaries (we're trying to cut down[1] on those)

Would be nice if you could make an anonymized version of this repo public. Working on a real large repo is better than an artificial one.

Yeah, I'll try to do that.

But actually most of git fetch is spent in the reachability check subsequently done by git-rev-list which takes several seconds. I

I wonder if reachability bitmap could help here..

I could have sworn I had that enabled already but evidently not. I did test it and it cut down on clone times a bit. Now our daily repacking is:

    git --git-dir={} gc
    git --git-dir={} pack-refs --all --prune
    git --git-dir={} repack -Ad --window=250 --depth=100 --write-bitmap-index --pack-kept-objects

It's not clear to me from the documentation whether this should just be enabled on the server, or the clients too. In any case I've enabled it on both.

Even then with it enabled on both a git pull that pulls down just one commit on one branch is 13s. Trace attached at the end of the mail.

haven't looked into it but there's got to be room for optimization there, surely it only has to do reachability checks for new refs, or could run in some "I trust this remote not to send me corrupt data completely" mode (which would make sense within a company where you can trust your main Git box).

No, it's not just about trusting the server side, it's about catching data corruption on the wire as well. We have a trick to avoid reachability check in clone case, which is much more expensive than a fetch. Maybe we could do something further to help the fetch case _if_ reachability bitmaps don't help.

Still, if that's indeed a big bottleneck what's the worst-case scenario here? That the local repository gets hosed? The server will still recursively validate the objects it gets sent, right?

I wonder if a better trade-off in that case would be to skip this in some situations and instead put something like git fsck in a cronjob.

Here's a git pull trace mentioned above:

    $ time GIT_TRACE=1 git pull
    13:06:13.603781 git.c:555 trace: exec: 'git-pull'
    13:06:13.603936 run-command.c:351 trace: run_command: 'git-pull'
    13:06:13.620615 git.c:349 trace: built-in: git 'rev-parse' '--git-dir'
    13:06:13.631602 git.c:349 trace: built-in: git 'rev-parse' '--is-bare-repository'
    13:06:13.636103 git.c:349 trace: built-in: git 'rev-parse' '--show-toplevel'
    13:06:13.641491 git.c:349 trace: built-in: git 'ls-files' '-u'
    13:06:13.719923 git.c:349 trace: built-in: git 'symbolic-ref' '-q' 'HEAD'
    13:06:13.728085 git.c:349 trace: built-in: git 'config' 'branch.trunk.rebase'
    13:06:13.738160 git.c:349 trace: built-in: git 'config' 'pull.ff'
    13:06:13.743286 git.c:349 trace: built-in: git 'rev-parse' '-q' '--verify' 'HEAD'
    13:06:13.972091 git.c:349 trace: built-in: git 'rev-parse' '--verify' 'HEAD'
    13:06:14.149420 git.c:349 trace: built-in: git 'update-index' '-q' '--ignore-submodules' '--refresh'
    13:06:14.294098 git.c:349 trace: built-in: git 'diff-files' '--quiet' '--ignore-submodules'
    13:06:14.467711 git.c:349 trace: built-in: git 'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
    13:06:14.683419 git.c:349 trace: built-in: git 'rev-parse' '-q' '--git-dir'
    13:06:15.189707 git.c:349 trace: built-in: git 'rev-parse' '-q' '--verify' 'HEAD'
    13:06:15.335948 git.c:349 trace: built-in: git 'fetch' '--update-head-ok'
    13:06:15.691303 run-command.c:351 trace: run_command: 'ssh' 'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
    13:06:17.095662 run-command.c:351 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
    remote: Counting objects: 6, done.
    remote: Compressing objects: 100% (6/6), done.
    13:06:20.426346 run-command.c:351 trace: run_command: 'unpack-objects' '--pack_header=2,6'
    13:06:20.431806 exec_cmd.c:130 trace: exec: 'git' 'unpack-objects' '--pack_header=2,6'
    13:06:20.437343 git.c:349 trace: built-in: git 'unpack-objects' '--pack_header=2,6'
    remote: Total 6 (delta 0), reused 0 (delta 0)
    Unpacking objects: 100% (6/6), done.
    13:06:20.444196 run-command.c:351 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all'
    13:06:20.447135 exec_cmd.c:130 trace: exec: 'git' 'rev-list' '--objects' '--stdin' '--not' '--all'
    13:06:20.451283 git.c:349 trace: built
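A trace like the one in this message is easier to read once the gaps between consecutive timestamps are computed, since each gap is roughly the time spent in the step that was logged first. The following is a rough sketch, not an existing Git tool: trace_gaps is a name invented here, and it assumes every trace line starts with an HH:MM:SS.ffffff timestamp and that the run does not cross midnight.

```shell
# Hypothetical helper: read GIT_TRACE output on stdin and print
# "<seconds> <trace line that started the interval>", largest gap first.
# Lines without a leading timestamp (e.g. "remote: ...") are ignored.
trace_gaps() {
    awk '
        /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ / {
            split($1, t, ":")
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (have_prev) printf "%.3f %s\n", now - prev, line
            prev = now; line = $0; have_prev = 1
        }
    ' | sort -rn
}

# Usage (GIT_TRACE=1 writes to stderr, hence the redirection):
# GIT_TRACE=1 git pull 2>&1 | trace_gaps | head -n 3
```

Applied to the trace above, the largest gap follows the first rev-list line (13:06:17.095662 to 13:06:20.426346, about 3.3 seconds), which fits the observation that the reachability check dominates.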
Re: Git Scaling: What factors most affect Git performance for a large repo?
On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:

On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote:

On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote:

Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's:

* Around 500k commits
* Around 100k tags
* Around 5k branches
* Around 500 commits/day, almost entirely to the same branch
* 1.5 GB .git checkout.
* Mostly text source, but some binaries (we're trying to cut down[1] on those)

Would be nice if you could make an anonymized version of this repo public. Working on a real large repo is better than an artificial one.

Yeah, I'll try to do that.

tl;dr: After some more testing it turns out the performance issues we have are almost entirely due to the number of refs. Some of these I knew about and were obvious (e.g. git pull), but some aren't so obvious (why does git log without --all slow down as a function of the overall number of refs?).

Rather than getting an anonymized version of the repo we have, a simpler isolated test case is just doing this on linux.git:

    $ git rev-list --all | perl -ne 'my $cnt; while (<>) { s<([a-f0-9]+)><git tag -a -mTest TAG $1>gm; next unless int rand 10 == 1; $cnt++; s/TAG/tagnr-$cnt/; print }' | sh -x

That'll create a tag for every 10th commit or so, which is around 50k tags for linux.git.

I actually ran this a few times while testing it, so this is a before and after on a hot cache of linux.git with 406 tags vs. ~140k. I ran the gc + repack + bitmaps for both repos noted in an earlier reply of mine, and took the fastest run out of 3:

    $ time (git log master -100 >/dev/null)
    Before: real 0m0.021s
    After:  real 0m2.929s

    $ time (git status >/dev/null)
    # Around 150ms, no noticeable difference

    $ time git fetch
    # I'm fetching from g...@github.com:torvalds/linux.git here, the
    # cache is hot but upstream has *no* changes
    Before: real 0m1.826s
    After:  real 0m8.458s

Details on why git fetch is slow in this situation:

    $ time GIT_TRACE=1 git fetch
    15:15:00.435420 git.c:349 trace: built-in: git 'fetch'
    15:15:00.654428 run-command.c:341 trace: run_command: 'ssh' 'g...@github.com' 'git-upload-pack '\''torvalds/linux.git'\'''
    15:15:02.426121 run-command.c:341 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all' '--quiet'
    15:15:05.507327 run-command.c:341 trace: run_command: 'rev-list' '--objects' '--stdin' '--not' '--all'
    15:15:05.508329 exec_cmd.c:134 trace: exec: 'git' 'rev-list' '--objects' '--stdin' '--not' '--all'
    15:15:05.510490 git.c:349 trace: built-in: git 'rev-list' '--objects' '--stdin' '--not' '--all'
    15:15:08.874116 run-command.c:341 trace: run_command: 'gc' '--auto'
    15:15:08.879570 exec_cmd.c:134 trace: exec: 'git' 'gc' '--auto'
    15:15:08.882495 git.c:349 trace: built-in: git 'gc' '--auto'

    real 0m8.458s
    user 0m6.548s
    sys  0m0.204s

Even things you'd expect to not be impacted are, like a reverse log search on the master branch:

    $ time (git log --reverse -p --grep=arm64 origin/master >/dev/null)
    Before: real 0m4.473s
    After:  real 0m6.194s

Or doing 10 commits and rebasing on the upstream:

    $ time (git checkout origin/master~ && for i in {1..10}; do echo $i > file && git add file && git commit -mmoo $file; done && git rebase origin/master)
    Before: real 0m6.798s
    After:  real 0m12.340s

The remaining slowdown comes from the size of the tree, which we can deal with by either reducing it in size (we have some copied JS libraries and whatnot) or trying the inotify-powered git-status.

In our case there's no good reason for why we have this many refs in the repository everyone uses. We basically just have a bunch of dated rollout tags that have been accumulating for years, and a bunch of mostly unused branches people just haven't cleaned up.

So I'm going to:

1. Write a hook that rejects tags that aren't new (i.e. forbid re-pushes of old tags)
2. Create an archive repository that contains all the old tags (i.e. just run git fetch on the main one from cron)
3. Run a script to regularly delete tags from the main repo
4. Run the same script on the clients that clone the repo

The branches are slightly harder. Deleting those that are fully merged into the same branch is easy, deleting those whose contents 100% match patch-ids already in the main branch is another thing we can do, and we can just clean up branches unconditionally after they've reached a certain age (they'll still be archived).
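The "regularly delete tags" step of the plan above could start from a filter like the one sketched here. This is an assumption-laden illustration, not the script the message refers to: tags_older_than is a made-up name, the cutoff date is arbitrary, and it expects input lines of the form "YYYY-MM-DD tagname", which git for-each-ref can produce with the creatordate field on Git versions that support it.

```shell
# Hypothetical helper: read "YYYY-MM-DD tagname" lines on stdin and print the
# names of tags dated strictly before the cutoff passed as $1. ISO dates sort
# correctly as plain strings, so a string comparison is enough.
tags_older_than() {
    awk -v cutoff="$1" '$1 < cutoff { print $2 }'
}

# Usage (commented out because it deletes tags; archive them elsewhere first):
# git for-each-ref --format='%(creatordate:short) %(refname:short)' refs/tags |
#     tags_older_than 2014-01-01 |
#     xargs -n 50 git tag -d
```

Keeping the selection separate from the deletion makes it possible to review the list first, and to feed the same list to whatever removes the tags on the server and on the clients.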
Re: [PATCH] Geolocation support
On Mon, Feb 9, 2015 at 2:24 AM, Junio C Hamano gits...@pobox.com wrote:

In case I was not clear, I do not think it is likely for us to accept a patch that mucks with object header fields with this information. Have them in the log text and let UI interpret them.

We've already told clients for a long time to ignore fields they don't know about, so why would we not store what's intended to be machine-readable key-value pair data in the commit object itself, as opposed to sticking it in the log message, where parsing it is always going to be a bit more tricky and distracting, since users will have to look at this arbitrary metadata when they do git log or git show?
Re: Experience with Recovering From User Error (And suggestions for improvements)
On Mon, Feb 16, 2015 at 11:41 AM, Armin Ronacher armin.ronac...@active-4.com wrote: Long story short: I failed big time yesterday with accidentally executing git reset hard in the wrong terminal window but managed to recover my changes from the staging area by manually examining blobs touched recently. After that however I figured I might want to add a precaution for myself that would have helped there. git fsck is quite nice, but unfortunately it does not help if you do not have a commit. So I figured it might be nice to create a dangling backup commit before a reset which would have helped me. Unfortunately there is currently no good way to hook into git reset. Things I noticed in the process: * for recovering blobs, going through the objects itself was more useful because they were all recent changes and as such I could order by timestamp. git fsck will not provide any timestamps (which generally makes sense, but made it quite useless for me) * Recovering from blobs is painful, it would be nice if git reset --hard made a dangling dummy commit before :) * There is no pre-commit hook which could be used to implement the previous suggestion. Would it make sense to introduce a `pre-commit` hook for this sort of thing or even create a dummy commit by default? I did a quick googling around and it looks like I was not the first person who made this mistake. Github's windows client even creates dangling backup commits in what appears to be fixed time intervals. I understand that ultimately this was a user error on my part, but it seems like a small change that could save a lot of frustration. Something like can we have a hook for every change in the working tree has come up in the past, but has been defeated by performance concerns. git reset --hard is a low-level-ish operation, and it's really useful to be able to quickly reset the working tree to some state no matter what, and without creating extra commits or whatever. 
We should definitely make recovery like this harder, but is there a reason why you don't use git reset --keep instead of --hard? It'll keep any local changes to your index/staging area and reset the files that don't conflict; if there are any conflicts the operation will be aborted.

If we created such hooks for git reset --hard we'd just need to expose some other thing as that low-level operation (and break scripts that already rely on it doing the minimal "yes, I want to change the tree no matter what" thing), and then we'd be back to square one in a few years when users started using git reset --really-hard (or whatever the flag would be).
Re: Experience with Recovering From User Error (And suggestions for improvements)
On Mon, Feb 16, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: On Mon, Feb 16, 2015 at 11:41 AM, Armin Ronacher armin.ronac...@active-4.com wrote: Long story short: I failed big time yesterday with accidentally executing git reset hard in the wrong terminal window but managed to recover my changes from the staging area by manually examining blobs touched recently. After that however I figured I might want to add a precaution for myself that would have helped there. git fsck is quite nice, but unfortunately it does not help if you do not have a commit. So I figured it might be nice to create a dangling backup commit before a reset which would have helped me. Unfortunately there is currently no good way to hook into git reset. Things I noticed in the process: * for recovering blobs, going through the objects itself was more useful because they were all recent changes and as such I could order by timestamp. git fsck will not provide any timestamps (which generally makes sense, but made it quite useless for me) * Recovering from blobs is painful, it would be nice if git reset --hard made a dangling dummy commit before :) * There is no pre-commit hook which could be used to implement the previous suggestion. Would it make sense to introduce a `pre-commit` hook for this sort of thing or even create a dummy commit by default? I did a quick googling around and it looks like I was not the first person who made this mistake. Github's windows client even creates dangling backup commits in what appears to be fixed time intervals. I understand that ultimately this was a user error on my part, but it seems like a small change that could save a lot of frustration. Something like can we have a hook for every change in the working tree has come up in the past, but has been defeated by performance concerns. 
> git reset --hard is a low-level-ish operation, and it's really useful to be able to quickly reset the working tree to some state no matter what, and without creating extra commits or whatever.
>
> We should definitely make recovery like this harder, but is there a reason for why you don't use git reset --keep instead of --hard? It'll keep any local changes to your index/staging area, and reset the files that don't conflict, if there's any conflicts the operation will be aborted.

I meant that we should make recovery like this *easier*, i.e. make it easier to get back previously staged commits / blobs.

> If we created such hooks for git reset --hard we'd just need to expose some other thing as that low-level operation (and break scripts that already rely on it doing the minimal "yes, I want to change the tree no matter what" thing), and then we'd just be back to square one in a few years when users started using git reset --really-hard (or whatever the flag would be).
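For what it's worth, the "dangling backup commit before a reset" idea from this thread can be approximated with a small wrapper, without any new hook. The sketch below is an illustration under stated assumptions, not something proposed in the thread: the function name backup_then_reset_hard is made up, it relies on git stash create (which writes a stash-style commit without attaching it to any ref), and it only covers tracked files. Untracked files are not saved, and the unreferenced commit can eventually be pruned by gc.

```shell
#!/bin/sh
# Hypothetical wrapper: save local changes as an unreferenced stash-style
# commit, then do the hard reset. Extra arguments are passed to git reset.
# Prints the backup commit's hash, or nothing if there was nothing to save.
backup_then_reset_hard() {
    # "git stash create" records index + work-tree changes to tracked files
    # as a commit object but does not store it under refs/stash.
    backup=$(git stash create "backup before reset --hard") || return
    if [ -n "$backup" ]; then
        echo "$backup"
    fi
    git reset --quiet --hard "$@"
}

# Possible usage (not run here):
#   sha=$(backup_then_reset_hard)   # work tree is now clean
#   git stash apply "$sha"          # bring the saved changes back
```

Because the backup is an ordinary commit object, git fsck can find it later even if the printed hash was lost, which is the property the original report was missing for plain blobs.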
Re: Git Scaling: What factors most affect Git performance for a large repo?
On Thu, Feb 19, 2015 at 10:26 PM, Stephen Morton stephen.c.mor...@gmail.com wrote: I posted this to comp.version-control.git.user and didn't get any response. I think the question is plumbing-related enough that I can ask it here. I'm evaluating the feasibility of moving my team from SVN to git. We have a very large repo. [1] We will have a central repo using GitLab (or similar) that everybody works with. Forks, code sharing, pull requests etc. will be done through this central server. By 'performance', I guess I mean speed of day to day operations for devs. * (Obviously, trivially, a (non-local) clone will be slow with a large repo.) * Will a few simultaneous clones from the central server also slow down other concurrent operations for other users? * Will 'git pull' be slow? * 'git push'? * 'git commit'? (It is listed as slow in reference [3].) * 'git stautus'? (Slow again in reference 3 though I don't see it.) * Some operations might not seem to be day-to-day but if they are called frequently by the web front-end to GitLab/Stash/GitHub etc then they can become bottlenecks. (e.g. 'git branch --contains' seems terribly adversely affected by large numbers of branches.) * Others? Assuming I can put lots of resources into a central server with lots of CPU, RAM, fast SSD, fast networking, what aspects of the repo are most likely to affect devs' experience? * Number of commits * Sheer disk space occupied by the repo * Number of tags. * Number of branches. * Binary objects in the repo that cause it to bloat in size [1] * Other factors? Of the various HW items listed above --CPU speed, number of cores, RAM, SSD, networking-- which is most critical here? (Stash recommends 1.5 x repo_size x number of concurrent clones of available RAM. I assume that is good advice in general.) Assume ridiculous numbers. Let me exaggerate: say 1 million commits, 15 GB repo, 50k tags, 1,000 branches. 
(Due to historical code fixups, another 5,000 fix-up branches which are just one little dangling commit required to change the code a little bit between a commit a tag that was not quite made from it.) While there's lots of information online, much of it is old [3] and with git constantly evolving I don't know how valid it still is. Then there's anecdotal evidence that is of questionable value.[2] Are many/all of the issues Facebook identified [3] resolved? (Yes, I understand Facebook went with Mercurial. But I imagine the git team nevertheless took their analysis to heart.) Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's: * Around 500k commits * Around 100k tags * Around 5k branches * Around 500 commits/day, almost entirely to the same branch * 1.5 GB .git checkout. * Mostly text source, but some binaries (we're trying to cut down[1] on those) The main scaling issues we have with Git are: * git pull takes around 10 seconds or so * Operations like git status are much slower because they scale with the size of the work tree * Similarly git rebase takes a much longer time for each applied commit, I think because it does the equivalent of git status for every applied commit. Each commit applied takes around 1-2 seconds. * We have a lot of contention on pushes because we're mostly pushing to one branch. * History spelunking (e.g. git log --reverse -p -Gstr) is taking longer by the day The obvious reason for why git pull is slow is because git-upload-pack spews the complete set of refs at you each time. The output from that command is around 10MB in size for us now. It takes around 300 ms to run that locally from hot cache, a bit more to send it over the network. But actually most of git fetch is spent in the reachability check subsequently done by git-rev-list which takes several seconds. 
I haven't looked into it but there's got to be room for optimization there. Surely it only has to do reachability checks for new refs, or the check could be skipped completely in some "I trust this remote not to send me corrupt data" mode (which would make sense within a company where you can trust your main Git box).

The git status operations could be made faster by having something like watchman. There's been some effort on getting that done in Git, but I haven't tried it. This seems to have been the main focus of Facebook's Mercurial optimization effort. Some of this you can mostly solve by doing e.g. git status -uno. Having support for such unsafe operations (e.g. teaching rebase and pals to use it) would be nice at the cost of some safety, but having something that feeds off inotify would be even better.

It takes around 3 minutes to reclone our repo, and we really don't care (we rarely re-clone). But I thought I'd mention it because for some reason this is important to Facebook, and along with inotify it was one of the two major things they focused on.

As far as I know every day Git operations don't scale all
Re: [PATCH] clone: Warn if clone lacks LICENSE or COPYING file
On Sat, Mar 21, 2015 at 7:06 PM, David A. Wheeler dwhee...@dwheeler.com wrote:
> Warn cloners if there is no LICENSE* or COPYING* file that makes the license clear. This is a useful warning, because if there is no license somewhere, then local copyright laws (which forbid many uses) and terms of service apply - and the cloner may not be expecting that. Many projects accidentally omit a license, so this is common enough to note. For more info on the issue, feel free to see:
> http://choosealicense.com/no-license/
> http://www.wired.com/2013/07/github-licenses/
> https://twitter.com/stephenrwalli/status/247597785069789184

As others have indicated here, this feature is really specific to a single lint-like use-case and doesn't belong in clone as a built-in feature.

However, perhaps an interesting generalization of this would be something like a post-clone hook. Obviously you couldn't store that in .git/hooks/ like other githooks(5) since there's no repo yet, but having it configured via the user/system config might be an interesting feature.

If you're still interested in getting this functionality, perhaps a patch to have some general post-clone hook mechanism would be accepted; then you could check license files or anything else you cared about. You could also just have a shell alias that wrapped git-clone...
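The "shell alias that wrapped git-clone" could look something like the sketch below. It is only an illustration: the function names are made up, it takes the destination directory explicitly rather than guessing it from the URL, and like the proposed patch it only looks for LICENSE* or COPYING* in the top-level directory.

```shell
#!/bin/sh
# Hypothetical helpers, not part of git.

# Succeed if DIR contains a top-level LICENSE* or COPYING* file.
has_license_file() {
    for f in "$1"/LICENSE* "$1"/COPYING*; do
        # An unmatched glob stays literal, so check that the path exists.
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# usage: clone_and_check URL DIR
# Clone as usual, then warn on stderr if no license file is found.
clone_and_check() {
    git clone "$1" "$2" || return
    if ! has_license_file "$2"; then
        echo "warning: $2 has no LICENSE* or COPYING* file" >&2
    fi
}
```

Keeping the check in a wrapper means people who care about the warning can opt in, while git-clone itself stays a generic content-tracking operation.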
Why is git fetch --prune so much slower than git remote prune?
The --prune option to fetch added in v1.6.5-8-gf360d84 seems to be around 20-30x slower than the equivalent operation with git remote prune. I'm wondering if I'm missing something and fetch does something more, but it doesn't seem so.

To test this clone git.git, create 1000 branches in it, create two local clones of that clone and then delete the 1000 branches in the original. I have a script to do this at https://gist.github.com/avar/497c8c8fbd641fb756ef

Then in each of the clones:

    $ git branch -a|wc -l; time (~/g/git/git-fetch --prune origin >/dev/null 2>&1); git branch -a | wc -l
    1003

    real 0m3.337s
    user 0m2.996s
    sys  0m0.336s
    3

    $ git branch -a|wc -l; time (~/g/git/git-remote prune origin >/dev/null 2>&1); git branch -a | wc -l
    1003

    real 0m0.067s
    user 0m0.020s
    sys  0m0.040s
    3

Both of these end up doing a git fetch, so it's not that. I'm quite rusty in C profiling but here's a gprof of the git-fetch command:

    $ gprof ~/g/git/git-fetch|head -n 20
    Flat profile:

    Each sample counts as 0.01 seconds.
      %   cumulative   self               self     total
     time   seconds   seconds     calls  s/call   s/call  name
    26.42      0.33      0.33   1584583    0.00     0.00  strbuf_getwholeline
    14.63      0.51      0.18  90601347    0.00     0.00  strbuf_grow
    13.82      0.68      0.17   1045676    0.00     0.00  find_pack_entry_one
     8.13      0.78      0.10   1050062    0.00     0.00  check_refname_format
     6.50      0.86      0.08   1584675    0.00     0.00  get_sha1_hex
     5.69      0.93      0.07   2100529    0.00     0.00  starts_with
     3.25      0.97      0.04   1044043    0.00     0.00  refname_is_safe
     3.25      1.01      0.04      8007    0.00     0.00  get_packed_ref_cache
     2.44      1.04      0.03   2605595    0.00     0.00  search_ref_dir
     2.44      1.07      0.03   1040500    0.00     0.00  peel_entry
     1.63      1.09      0.02   2632661    0.00     0.00  get_ref_dir
     1.63      1.11      0.02   1044043    0.00     0.00  create_ref_entry
     1.63      1.13      0.02      8024    0.00     0.00  do_for_each_entry_in_dir
     0.81      1.14      0.01   2155105    0.00     0.00  memory_limit_check
     0.81      1.15      0.01   1580503    0.00     0.00  sha1_to_hex

And of the git-remote command:

    $ gprof ~/g/git/git-remote|head -n 20
    Flat profile:

    Each sample counts as 0.01 seconds.
     no time accumulated

      %   cumulative   self               self     total
     time   seconds   seconds     calls Ts/call  Ts/call  name
     0.00      0.00      0.00    197475    0.00     0.00  strbuf_grow
     0.00      0.00      0.00     24214    0.00     0.00  sort_ref_dir
     0.00      0.00      0.00     24190    0.00     0.00  search_ref_dir
     0.00      0.00      0.00     21661    0.00     0.00  memory_limit_check
     0.00      0.00      0.00     20236    0.00     0.00  get_ref_dir
     0.00      0.00      0.00      9187    0.00     0.00  xrealloc
     0.00      0.00      0.00      7048    0.00     0.00  strbuf_add
     0.00      0.00      0.00      6348    0.00     0.00  do_xmalloc
     0.00      0.00      0.00      6126    0.00     0.00  xcalloc
     0.00      0.00      0.00      6056    0.00     0.00  cleanup_path
     0.00      0.00      0.00      6050    0.00     0.00  get_git_dir
     0.00      0.00      0.00      6050    0.00     0.00  vsnpath
     0.00      0.00      0.00      5554    0.00     0.00  config_file_fgetc

Aside from the slowness of git-fetch it seems git-remote can be sped up quite a bit by more aggressively allocating a larger string buffer from the get-go.
[PATCH] http: Include locale.h when using setlocale()
Since v2.3.0-rc1-37-gf18604b we've been using setlocale() here without importing locale.h. Oddly enough this only causes issues for me under -O0 on GCC & Clang. I.e. if I do:

    $ git clean -dxf; make -j 1 V=1 CFLAGS="-g -O0 -Wall" http.o

I'll get this on clang 3.5.0-6 & GCC 4.9.1-19 on Debian:

    http.c: In function ‘get_preferred_languages’:
    http.c:1021:2: warning: implicit declaration of function ‘setlocale’ [-Wimplicit-function-declaration]
      retval = setlocale(LC_MESSAGES, NULL);
      ^
    http.c:1021:21: error: ‘LC_MESSAGES’ undeclared (first use in this function)
      retval = setlocale(LC_MESSAGES, NULL);

But changing -O0 to -O1 or another optimization level makes the issue go away. Odd, but in any case we should be including this header if we're going to use the function, so just do that.

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 http.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/http.c b/http.c
index 0153fb0..0606e6c 100644
--- a/http.c
+++ b/http.c
@@ -8,6 +8,9 @@
 #include "credential.h"
 #include "version.h"
 #include "pkt-line.h"
+#ifndef NO_GETTEXT
+# include <locale.h>
+#endif
 
 int active_requests;
 int http_is_verbose;
-- 
2.1.3
Re: how to make full copy of a repo
On Sat, Mar 28, 2015 at 7:52 PM, Torsten Bögershausen tbo...@web.de wrote: On 2015-03-28 03.56, Christoph Anton Mitterer wrote: Hey. I was looking for an ideally simple way to make a full copy of a git repo. Many howtos are floating around on this on the web, with also lots of voodoo. First, it shouldn't be just a clone, i.o.w. - I want to have all refs (local/remote branches/tags) and of course all objects from the source repo copied as is. So it's local branches should become my local branches and not remote branches as well - and so on. Basically I want to be able to delete the source afterwards (and all backups ;) ) and not having anything lost. - It shouldn't set the source repo as origin or it's branches as remote tracking branches, as said it should be identical the source repo, just freshly copied via the Git aware transport mechanisms. - Whether GC or repacking happens, I don't care, as long as nothing that is still reachable in the source repo wouldn't get lost (or get lost once I run a GC in the copied repo). - Whether anything that other tools have added to .git (e.g. git-svn stuff) get's lost, I don't care. - It should work for both, bare and non-bare repos, but it's okay when it doesn't copy anything that is not committed or stashed. I'd have said that either: $ git clone --mirror URl-to-source-repo copy for the direction from outside the source to a copy, or alternatively: $ cd source-repo $ git push --mirror URl-to-copy for the direction from within the source to a copy with copy being an empty bare or non-bare repo, would do the job. But: a) but the git-clone(1) part for --mirror: and sets up a refspec configuration such that all these refs are overwritten by a git remote update in the target repository. kinda confuses me since I wanted to get independent of the source repo and this ssems to set up a remote to it? b) do I need --all --tags for the push as well? 
>> c) When following https://help.github.com/articles/duplicating-a-repository/ it doesn't seem as if --mirror is what I want because they seem to advertise it rather as having the copy tracking the source repo. Of course I read about just using git-clone --bare, but that seems to not copy everything that --mirror does (remote-tracking branches, notes). So I'm a bit confused...
>
> These instructions have 3 repos: the source "old", the destination "new" and a temporary one. As you only push to "new", "new" should have no information about "old" or temp.
>
>> 1) Is it working like I assumed above?
>> 2) Does that also copy things like git-config, hooks, etc.?
>> 3) Does it copy the configured remotes from the source?
>> 4) What else is not copied by that? I'd assume anything that is not tracked by git and the stash of the source?
>
> You didn't write if this is a bare repository, if it is on a local disc, if it is reachable by rsync? Linux or Windows?
>
> For a full clone (in the sense of having everything, bit for bit) I would probably use rsync. (After stopping all activities on the repo)

This warrants more emphasis. If you rsync a repository that's active, i.e. getting pushes, you *will* get corrupt copies. E.g. you can easily copy something out of the objects directory that's in the middle of being written, or copy the refs namespace after you copy objects and end up with an unreachable object.

There's unfortunately no good solution to this other than doing both --mirror backups via Git and rsync backups (for hooks etc.) and combining the two, or installing a hook for the duration of the copy that rejects all updates.
Re: Git Scaling: What factors most affect Git performance for a large repo?
On Tue, Feb 24, 2015 at 1:44 PM, Michael Haggerty mhag...@alum.mit.edu wrote: On 02/20/2015 03:25 PM, Ævar Arnfjörð Bjarmason wrote: On Fri, Feb 20, 2015 at 1:09 PM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen pclo...@gmail.com wrote: On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason ava...@gmail.com wrote: Anecdotally I work on a repo at work (where I'm mostly the Git guy) that's: * Around 500k commits * Around 100k tags * Around 5k branches * Around 500 commits/day, almost entirely to the same branch * 1.5 GB .git checkout. * Mostly text source, but some binaries (we're trying to cut down[1] on those) Would be nice if you could make an anonymized version of this repo public. Working on a real large repo is better than an artificial one. Yeah, I'll try to do that. tl;dr: After some more testing it turns out the performance issues we have are almost entirely due to the number of refs. Some of these I knew about and were obvious (e..g. git pull), but some aren't so obvious (why does git log without --all slow down as a function of the overall number of refs?). I'm assuming that you pack your references periodically. (If not, you should, because reading lots of loose references is very expensive for the commands that need to iterate over all references!) Yes, as mentioned in another reply of mine, like this: git --git-dir={} gc git --git-dir={} pack-refs --all --prune git --git-dir={} repack -Ad --window=250 --depth=100 --write-bitmap-index --pack-kept-objects On the other hand, packed refs also have a downside, namely that whenever even a single packed reference has to be read, the whole packed-refs file has to be read and parsed. One way that this can bite you, even with innocuous-seeming commands, is if you haven't disabled the use of replace references (i.e., using git --no-replace-objects CMD or GIT_NO_REPLACE_OBJECTS). 
> In that case, almost any Git command has to read the refs/replace/* namespace, which, in turn, forces the whole packed-refs file to be read and parsed. This can take a significant amount of time if you have a very large number of references.

Interesting. I tried the rough benchmarks I posted above with GIT_NO_REPLACE_OBJECTS=1 and couldn't see any differences, although as mentioned in another reply --no-decorate had a big effect on git-log.

> So try your experiments with replace references disabled. If that helps, consider disabling them on your server if you don't need them.
>
> Michael
>
> --
> Michael Haggerty
> mhag...@alum.mit.edu
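Since the cost discussed here comes from reading and parsing the whole packed-refs file, a quick way to see what is in it is to count refs per namespace. The helper below is a rough sketch and its name is made up. It assumes the usual packed-refs layout of "<hash> <refname>" lines, with "#" header lines and "^<hash>" peeled-tag lines that should be skipped, and it defaults to .git/packed-refs in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: print "<count> <namespace>" for each top-level ref
# namespace (refs/heads, refs/tags, refs/replace, ...), largest first.
packed_refs_summary() {
    file=${1:-.git/packed-refs}
    awk '
        # Skip the "# pack-refs with: ..." header and "^<hash>" peeled lines
        substr($0, 1, 1) == "#" || substr($0, 1, 1) == "^" { next }
        NF == 2 {
            n = split($2, parts, "/")
            ns = (n >= 2) ? parts[1] "/" parts[2] : $2
            count[ns]++
        }
        END { for (ns in count) print count[ns], ns }
    ' "$file" | sort -rn
}

# Possible usage (not run here):
#   packed_refs_summary /path/to/repo.git/packed-refs
```

A large refs/tags count next to an empty or missing refs/replace line would fit the picture in this thread: the replace namespace itself is tiny, but looking it up still means parsing everything else in the file.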
Re: Git Scaling: What factors most affect Git performance for a large repo?
On Fri, Feb 20, 2015 at 10:04 PM, Junio C Hamano gits...@pobox.com wrote:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>> I actually ran this a few times while testing it, so this is a before and after on a hot cache of linux.git with 406 tags vs. ~140k. I ran the gc + repack + bitmaps for both repos noted in an earlier reply of mine, and took the fastest run out of 3:
>>
>>     $ time (git log master -100 >/dev/null)
>>     Before: real 0m0.021s
>>     After:  real 0m2.929s
>
> Do you force --decorate with some config? Or do you see similar performance difference with git rev-parse master, too?

Yes, I had log.decorate=short set in my config. With --no-decorate:

    $ time (git log --no-decorate -100 >/dev/null)
    # Before: real 0m0.010s
    # After:  real 0m0.065s

>>     $ time (git status >/dev/null)
>>     # Around 150ms, no noticeable difference
>
> This is understandable, as it will not look at any ref other than HEAD.
Re: [PATCH] clone: Warn if clone lacks LICENSE or COPYING file
On Mon, Mar 23, 2015 at 5:46 PM, David A. Wheeler dwhee...@dwheeler.com wrote: Junio C Hamano: An approach that checks only the top-level directory for fixed filename pattern would not be an effective way to protect the cloners, either. I disagree, I think it's remarkably effective. *Many* projects do this, including git itself. After all, many humans need to find out the licensing basics too; having a simple convention for *finding* it helps humans and tools alike. It's not even limited to open source software; developers of proprietary materials (software or now) *also* typically want to declare licensing. Sure, the top-level licensing text might be incomplete, but having that information provides a big help, and it's what most people rely on anyway. Indeed, a *lack* of this is a sign of trouble, which is exactly what warnings are good for. I don't think you're going to find people disagreeing with you that it's good to have license information where appropriate, but Git is the wrong tool to warn about this. It's a generic content tracking tool, it shouldn't be warning on the assumption that what you're tracking is a) an open source project and b) that you care to be notified about some arbitrary files being missing. A lot of Git repositories don't care at all about licensing, and having git-clone warn about this would just be useless noise most of the time. E.g. anything I put on gist.github.com, the code hundreds of people contribute to at work (we never distribute it anywhere, so a license would be pointless). I even have open source projects myself where there's no LICENSE or COPYING files since that would be redundant to notices in the files themselves, but I digress. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Git Merge Contributors Summit, April 8th, Paris
On Mon, Apr 6, 2015 at 10:28 PM, Stefan Beller sbel...@google.com wrote:
> I am interested in discussing the git pack protocol v2. (I have been thinking about that for a while now, though not sharing a lot on the mailing list, so feedback is somewhat limited. :( )

I'm keen to talk about the new protocol and other scaling issues I raised in the recent "Git Scaling: What factors most affect Git performance for a large repo?" thread. From my testing, though, the main performance problems are the local pack-refs file and reachability checks, mostly not the protocol itself.

At the risk of using this list + the venue for soliciting, I also want to mention that my employer is willing to pay someone on a contract basis to work on Git scalability issues, given the right person etc. etc. So if someone at the conference is interested in that, I'd be keen to talk to you.

> On Mon, Apr 6, 2015 at 12:08 PM, Christian Couder christian.cou...@gmail.com wrote:
>> On Mon, Apr 6, 2015 at 12:48 AM, Thomas Ferris Nicolaisen tfn...@gmail.com wrote:
>>> On Tue, Feb 24, 2015 at 11:09 PM, Jeff King p...@peff.net wrote:
>>>> I wanted to make one more announcement about this, since a few more details have been posted at: http://git-merge.com/ since my last announcement. Specifically, I wanted to call attention to the contributor's summit on the 8th. Basically, there will be a space that can hold up to 50 people, it's open only to git (and JGit and libgit2) devs, and there isn't a planned agenda. So I want to:
>>>>
>>>> 1. Encourage developers to come. You might meet some folks in person you've worked with online. And you can see how beautiful we all are.
>>>>
>>>> 2. Get people thinking about what they would like to talk about. In past GitTogethers, it's been a mix of people with prepared things to talk about, group discussions of areas, and general kibitzing. We can be spontaneous on the day of the event, but if you have a topic you want to bring up, you may want to give it some thought beforehand.
>>>>
>>>> If you are a git dev and want to come, please RSVP to Chris Kelly amateurhu...@github.com who is organizing the event. If you would like to come, but finances make it hard (either for travel, or for the conference fee), please talk to me off-list, and we may be able to help. If you have questions, please feel free to ask me, and I'll try to get answers from the GitHub folks who are organizing the event.
>>>
>>> I'll be arriving around 11 am on the 8th, if anyone wants to record something for the GitMinutes podcast [1]. Send me an email directly, or just walk up to me at the conference and say hi! I'll hopefully be hanging around the contributor's summit area with some microphones, but I've been unable to get any feedback from GitHub about whether this is OK, so.. I guess we'll just wing it when I get there.
>>>
>>> [1] http://www.gitminutes.com/
>>
>> By the way as far as I know nothing has been planned for the Contributors Summit on the 8th. Maybe we could list some topics that we could discuss. I will probably write very short articles about some of the discussions for the next Git Rev News edition, but I would be happy if other people would like to contribute some. Please tell me and Thomas if you are interested.
>>
>> Also I am not sure if something is planned for the evening of the 8th or not. If nothing is planned maybe we could discuss having dinner together or something. And if someone needs help or arrives in Paris early or leaves late and is interested in meeting up, feel free to contact me.
>>
>> Best,
>> Christian.
[PATCH/RFC] gitweb: Don't pass --full-history to git-log(1)
When you look at the history for a file via git log we don't show --full-history by default, but the Gitweb UI does so, which can be very confusing for all the reasons discussed in "History Simplification" in git-log(1) and in http://thread.gmane.org/gmane.comp.version-control.git/89400/focus=90659

We've been doing history via --full-history since pretty much forever, but I think this is much more usable, and on a typical project with lots of branches being merged it makes for a much less confusing view. We do this for git log by default, why wouldn't Gitweb follow suit?

Signed-off-by: Ævar Arnfjörð Bjarmason ava...@gmail.com
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 7a5b23a..2913896 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -7387,7 +7387,7 @@ sub git_log_generic {
 	}
 	my @commitlist =
 		parse_commits($commit_hash, 101, (100 * $page),
-		              defined $file_name ? ($file_name, "--full-history") : ());
+		              defined $file_name ? $file_name : ());
 
 	my $ftype;
 	if (!defined $file_hash && defined $file_name) {
-- 
2.1.3
Re: [PATCH/RFC] gitweb: Don't pass --full-history to git-log(1)
On Wed, Aug 5, 2015 at 6:54 PM, Junio C Hamano gits...@pobox.com wrote:
> Ævar Arnfjörð Bjarmason ava...@gmail.com writes:
>> When you look at the history for a file via git log we don't show --full-history by default, but the Gitweb UI does so, which can be very confusing for all the reasons discussed in "History Simplification" in git-log(1) and in http://thread.gmane.org/gmane.comp.version-control.git/89400/focus=90659
>>
>> We've been doing history via --full-history since pretty much forever, but I think this is much more usable, and on a typical project with lots of branches being merged it makes for a much less confusing view. We do this for git log by default, why wouldn't Gitweb follow suit?
>
> http://thread.gmane.org/gmane.comp.version-control.git/89400/focus=90758 seems to agree with you in principle that this would be what gitweb should do if it were written today.

I'm reminded of the make(1) story about not supporting spaces instead of tabs because the guy already had a few dozen users. We could have changed this in 2008, when Git had far fewer users, and I think we can still change it.

It makes more sense as a default, especially on busy repos with lots of merges. At work, where lots of merges are in flight, literally 1 in 10 of the commits shown for any given file is relevant.

Who'd be linking to gitweb's log output expecting its semantics to never change, and is that use case more important than having a saner view for the vast majority of users who are just browsing around?

But if there are strong objections to just changing it, a coworker who encountered this made a patch that adds an extra "full history" view in addition to the history view (which would change, but not the permalinks).
Re: How to rebase when some commit hashes are in some commit messages
On Mon, Oct 12, 2015 at 9:59 PM, Francois-Xavier Le Bail wrote:
> Hello,
>
> [I try some search engines without success, perhaps I have missed something].
>
> For example, if I rebase the following commits, I would want that if
> the commit hash 222... become 777...,
> the message
> "Update test output for "
> become
> "Update test output for 777..."
>
> Is it possible currently? And if yes how?

This isn't strictly speaking an answer to your question (others have answered that), but in my workflow, if I have a patch series where I want to refer to commits inside the series and I know I'm going to rebase it, I work around this by using the subject line of the commit as an ID.

E.g. in the message I'll say something like "See my 'commit.c: Avoid segfaults on OSX' commit for details". Then I can find that commit with git log --grep.
Since gc.autodetach=1 you can end up with auto-gc on every command with no user notification
Someone at work came to me with the problem that they were getting the "Auto packing the repository in background for optimum performance" notice on every Git command that they ran.

This problem is a combination of two things:

 * Since Nguyễn's v1.9-rc0-2-g9f673f9, where we started running "git gc" in the background, the user hasn't seen the "There are too many unreachable loose objects" message added back in v1.5.3.1-27-ga087cc9.

 * The checkout has a lot of loose objects. Even after "git prune --expire=2.week.ago" the .git/objects/17 directory has 317 objects. More than 27 in that directory trigger "git gc --auto".

So it's partly a UI issue. Since the repacking is happening in the background the user never sees the message suggesting that they run "git prune".

Perhaps the heuristic of "are there more than 27 objects in .git/objects/17?" could also be improved, but I don't know with what exactly. Either way, having something fork a gc to the background on every fetch (and similar object-modifying operations) is quite sub-optimal.
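The sampling heuristic mentioned above can be sketched roughly as follows. This is a minimal illustration in Python, not git's actual code: the function name is made up for the example, and it assumes the default gc.auto value of 6700 and SHA-1 style loose object names (38 hex characters inside the two-character fan-out directory).

```python
import os

HEX_DIGITS = set("0123456789abcdef")


def too_many_loose_objects(objects_dir, gc_auto=6700):
    """Guess whether the repository has too many loose objects.

    Only the objects/17 fan-out directory is sampled; its count is
    compared against ceil(gc_auto / 256), which is 27 for the default.
    The function name and signature are hypothetical.
    """
    if gc_auto <= 0:
        return False  # a non-positive limit means "never auto-gc"
    threshold = (gc_auto + 255) // 256
    sample_dir = os.path.join(objects_dir, "17")
    try:
        names = os.listdir(sample_dir)
    except FileNotFoundError:
        return False
    # Count only entries that look like loose object file names.
    loose = sum(1 for n in names if len(n) == 38 and set(n) <= HEX_DIGITS)
    return loose > threshold
```

With 317 objects left in .git/objects/17 after pruning, a check like this keeps answering "yes" on every command, which matches the behaviour described in the report.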
Bug: git-upload-pack will return successfully even when it can't read all references
We have a process to back up our Git repositories at work. It started alerting because it wasn't getting the same refs as the remote.

This turned out to be a pretty trivial filesystem error. refs/heads/master wasn't readable by the backup process, but some other stuff in refs/heads and objects/* was.

But I think it's a bug that if we ssh to the remote end, and git-upload-pack can't read certain refs in refs/heads/, we don't return an error.

This simple shell script reproduces the issue:

    rm -rf /tmp/repo /tmp/repo-checkout
    git init /tmp/repo
    cd /tmp/repo
    touch foo
    git add foo
    git commit -m"foo"
    git checkout -b branch
    git checkout master
    git show-ref
    chmod 000 .git/refs/heads/master
    git show-ref
    cd /tmp
    git clone repo repo-checkout
    echo "Status code of clone: $?"
    cd repo-checkout
    git show-ref

After running this you get:

    $ (cd /tmp/repo-checkout && echo -n | strace /tmp/avar/bin/git-upload-pack /tmp/repo 2>&1 | grep -e EACCES)
    open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)
    open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)
    open("refs/heads/master", O_RDONLY) = -1 EACCES (Permission denied)

And "git fetch" will return 0. We fail to read refs/heads/master in head_ref_namespaced(), which is called by upload_pack(). I was going to see if I could patch it to return an error, but that code seems very far removed from any error checking.

This isn't only an issue with git-upload-pack, e.g. show-ref itself has the same issue:

    $ chmod 600 .git/refs/heads/master
    $ git show-ref; echo $?
    e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/branch
    e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/master
    0
    $ chmod 000 .git/refs/heads/master
    $ git show-ref; echo $?
    e7255c8fcabc6e15f57cd984f9f117870052c1a0 refs/heads/branch
    0

I wanted to check if this was a regression and got as far back as v1.4.3 with the same behavior before the commands wouldn't work anymore due to changes in the git config parsing code.
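Since the exit status cannot be relied on here, a backup job has to compare ref listings itself, which is how the problem above was noticed. A minimal sketch of such a comparison in Python follows; it assumes the "<sha> <refname>" line format shown in the show-ref output above, and both helper names are hypothetical.

```python
def parse_show_ref(output):
    """Parse "<sha> <refname>" lines, as printed by git show-ref, into a dict."""
    refs = {}
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        sha, name = line.split(" ", 1)
        refs[name] = sha
    return refs


def diff_refs(expected, actual):
    """Return (missing, changed) ref names of `actual` relative to `expected`."""
    missing = sorted(set(expected) - set(actual))
    changed = sorted(
        name for name in expected
        if name in actual and expected[name] != actual[name]
    )
    return missing, changed
```

Feeding it the two show-ref listings from the report (before and after the chmod 000) gives refs/heads/master as missing, even though both commands exited with status 0.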
Re: bash completion lacks options
On Mon, Sep 7, 2015 at 5:07 PM, Olaf Hering wrote:
> "git send-email --f" lacks --find-renames and others. Is the list
> of possible options maintained manually?

Yes, see contrib/completion/git-completion.bash. There's no code for send-email there, you (or someone) could submit a patch! :)

> Perhaps this should be
> automated by placing the long strings in an ELF section, then filling
> variables like $__git_format_patch_options from such ELF section.
> An example how this was done in libguestfs is here (see daemon/daemon.h):
> https://github.com/libguestfs/libguestfs/commit/0306c98d319d189281af3c15101c8d343e400f13

This is an interesting approach, but wouldn't help with git-send-email in particular, it's a Perl script, so there's no ELF section to parse.
Re: bash completion lacks options
On Mon, Sep 7, 2015 at 5:36 PM, Olaf Hering <o...@aepfle.de> wrote:
> On 07.09.2015 at 17:34, Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Sep 7, 2015 at 5:07 PM, Olaf Hering <o...@aepfle.de> wrote:
>
>>> https://github.com/libguestfs/libguestfs/commit/0306c98d319d189281af3c15101c8d343e400f13
>>
>> This is an interesting approach, but wouldn't help with git-send-email
>> in particular, it's a Perl script, so there's no ELF section to parse.
>
> format-patch is a ELF binary, a link to git itself as I notice
> just now.

Yes, format-patch is written in C, but you mentioned send-email, which is a Perl script.
Re: Bug: git-upload-pack will return successfully even when it can't read all references
On Tue, Sep 8, 2015 at 8:53 AM, Jeff King <p...@peff.net> wrote: > On Mon, Sep 07, 2015 at 02:11:15PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> This turned out to be a pretty trivial filesystem error. >> refs/heads/master wasn't readable by the backup process, but some >> other stuff in refs/heads and objects/* was. >> >> [...] >> >> I wanted to check if this was a regression and got as far back as >> v1.4.3 with the same behavior before the commands wouldn't work >> anymore due to changes in the git config parsing code. > > Right, it has basically always been this way. for_each_ref() silently > eats oddities or errors while reading refs. Calling for_each_rawref() > will include them, but we don't do it in most places; it would make > non-critical operations on a corrupted repo barf. And it is difficult > to know what is "critical" inside the code. You might be calling > "upload-pack" to salvage what you can from a corrupted repo, or to make > a backup where you want to know what is corrupted and what is not. > > Commit 49672f2 introduced a "ref paranoia" environment variable to let > you specify this (and robust backups was definitely one of the use cases > I had in mind). It's a little tricky to use with upload-pack because you > may be crossing an ssh boundary, but: > > git clone -u 'GIT_REF_PARANOIA=1 git-upload-pack' ... > > should work. > > With your case: > > $ git clone --no-local -u 'GIT_REF_PARANOIA=1 git-upload-pack' repo > repo-checkout > Cloning into 'repo-checkout'... > fatal: git upload-pack: not our ref > fatal: The remote end hung up unexpectedly > > Without "--no-local" it behaves weirdly, but I would not recommend local > clones in general if you are trying to be careful. They optimize out a > lot of the safety checks, and we do things like copy the packed-refs > file wholesale. > > And certainly the error message is not the greatest. 
upload-pack is not > checking for the REF_ISBROKEN flag, so it just dumps: > > refs/heads/master > > in the advertisement, and the client happily requests that object. > REF_PARANOIA is really just a band-aid to feed the broken refs to the > normal code paths, which typically barf on their own. :) > > Something like this: > > diff --git a/upload-pack.c b/upload-pack.c > index 89e832b..3c621a5 100644 > --- a/upload-pack.c > +++ b/upload-pack.c > @@ -731,6 +731,9 @@ static int send_ref(const char *refname, const struct > object_id *oid, > if (mark_our_ref(refname, oid)) > return 0; > > + if (flag & REF_ISBROKEN) > + warning("remote ref '%s' is broken", refname); > + > if (capabilities) { > struct strbuf symref_info = STRBUF_INIT; > > kind of helps, but the advertisement is too early for us to send > sideband messages. So it makes it to the user if the transport is local > or ssh, but not over git:// or http. > > That's something we could do better with protocol v2 (we'll negotiate > capabilities before the advertisement). Fantastic. REF_PARANOIA does exactly what I need, i.e. stall the fetch process so permissions can be manually repaired. I think it makes sense to keep the default at "let's try to copy over what we can", for salvage purposes. I think the bug is that we still return success in that case, and should return non-zero, but as you point out this is easier said than done due to needing to deal with the case where the remote transport sends us the ... ref. I wonder if --upload-pack="GIT_REF_PARANOIA=1 git-upload-pack" should be the default when running fetch if you have --prune enabled. There's a particularly bad edge case now where if you have permission errors on the master repository and run --prune on your backup along with a --mirror clone to mirror the refs, then when you have permission issues you'll prune everything from the backup. But yeah, a proper fix needs protocol v2. 
Because among other things that --upload-pack hack will only work for ssh, not http.
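The prune edge case described above comes down to a set difference: a mirroring fetch with --prune deletes every ref in the backup that the remote did not advertise, whether the ref was really deleted or merely unreadable. A toy model in Python (the function name is invented for illustration and does not correspond to anything in git):

```python
def refs_pruned_by_mirror(backup_refs, advertised_refs):
    """Refs a mirroring fetch with --prune would delete from the backup.

    Both arguments are iterables of ref names. A ref the remote could
    not read is indistinguishable here from a ref that was deleted.
    """
    return sorted(set(backup_refs) - set(advertised_refs))


backup = ["refs/heads/branch", "refs/heads/master"]
# refs/heads/master is unreadable on the remote, so it is not advertised:
print(refs_pruned_by_mirror(backup, ["refs/heads/branch"]))
# If nothing under refs/ is readable, the whole backup would be pruned:
print(refs_pruned_by_mirror(backup, []))
```

This is why an error from the remote (for example via GIT_REF_PARANOIA) is preferable to a silently shortened advertisement when pruning is enabled.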
Re: [RFC/PATCH 6/8] config: add core.untrackedCache
On Wed, Dec 2, 2015 at 8:12 AM, Torsten Bögershausen wrote:
> On 12/01/2015 09:31 PM, Christian Couder wrote:
>>
>> When we know that mtime is fully supported by the environment, we
>> might want the untracked cache to be always used by default without
>> any mtime test or kernel version check being performed.

[Re-arranged some of the quotes for the clarity of my reply]

[Also: Full disclosure, Christian is working on this for Booking.com, and I'm managing that project...]

> I always want to test and verify that the untracked cache is working,
> before I rely on it.

Then with this patch you can just not use the core.untrackedCache=true option, or with the later patches in this series use "git update-index --test-untracked-cache && git config core.untrackedCache true".

> I'm not sure if ever "we know" ?
> How can we know without testing ?
> I personaly can not say "I know" in all the different system I am using,

Some users of Git can know that their mtime works, just like they know they deploy it on filesystems where, say, symlinks work.

The current way of turning on this feature has to be run on a per-repo basis and, without the --force option, includes mandatory tests, which:

 a) makes it inconvenient to deploy across all Git repos on a set of machines, and
 b) is needlessly paranoid as a default way to enable it.

>> Also when we know that mtime is not supported by the environment,
>> for example because the repo is shared over a network file system,
>> then we might want 'git update-index --untracked-cache' to fail
>> immediately instead of it testing if it works (because it might
>> work on some systems using the repo over the network file system
>> but not others).
>
> Same here.
> >> Signed-off-by: Christian Couder >> --- >> Documentation/config.txt | 10 ++ >> Documentation/git-update-index.txt | 11 +-- >> builtin/update-index.c | 28 ++-- >> cache.h| 1 + >> config.c | 10 ++ >> contrib/completion/git-completion.bash | 1 + >> dir.c | 2 +- >> environment.c | 1 + >> wt-status.c| 9 + >> 9 files changed, 60 insertions(+), 13 deletions(-) >> >> diff --git a/Documentation/config.txt b/Documentation/config.txt >> index b4b0194..bf176ff 100644 >> --- a/Documentation/config.txt >> +++ b/Documentation/config.txt >> @@ -308,6 +308,16 @@ core.trustctime:: >> crawlers and some backup systems). >> See linkgit:git-update-index[1]. True by default. >> +core.untrackedCache:: >> + If unset or set to 'default' or 'check', untracked cache will >> + not be enabled by default and when >> + 'update-index --untracked-cache' is called, Git will test if >> + mtime is working properly before enabling it. If set to false, >> + Git will refuse to enable untracked cache even if >> + '--force-untracked-cache' is used. If set to true, Git will >> + blindly enabled untracked cache by default without testing if >> + it works. See linkgit:git-update-index[1]. >> + > > Please no. > The command line option should always be able to overwrite any settings > from a config file. If we keep this patch and not the rest in this series (which I think should also be applied) you'd either use the update-index way of changing the setting, or the config option. > Sorry, I may missing the big picture here. > What exactly should be achieved ? > > A config variable that should ask Git to always try to use the untracked > cache ? > Or a config variable that tells Git to never use the untracked cache ? > Or a combination ? > > core.untrackedCache:: > false: Never use the untracked cache ? > true: Always try to use the untracked cache ? 
>    Try means: probe, and if the probing fails, record that if fails in
>    the index,
>    for this hostname/os/kernel/path (Don't remember all the details)
> unset: As today,

As discussed in the "[RFC/PATCH] config: add core.trustmtime" thread, this feature is IMO needlessly paranoid about enabling itself.

Current state of affairs:

 * Enable on a per-repo basis: git update-index --untracked-cache
 * Disable on a per-repo basis: git update-index --no-untracked-cache
 * Enable system-wide: N/A
 * Disable system-wide: N/A

With this patch:

 * Enable on a per-repo basis: git update-index --untracked-cache OR "git config core.untrackedCache true"
 * Disable on a per-repo basis: git update-index --no-untracked-cache OR "git config core.untrackedCache false"
 * Enable system-wide: git config --global core.untrackedCache true
 * Disable system-wide: git config --global core.untrackedCache false
 * Caveat: The core.untrackedCache config has precedence over "git update-index"

With the rest of the patches in this series:

 * Enable system-wide & per-repo the
Re: [PATCH 7/8] config: add core.untrackedCache
On Tue, Dec 15, 2015 at 8:40 PM, Junio C Hamano <gits...@pobox.com> wrote:
> Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes:
> I still have a problem with the approach from "design cleanliness"
> point of view[...]
>
> In any case I think we already have agreed to disagree on this
> point, so there is no use discussing it any longer from my side. I
> am not closing the door to this series, but I am not convinced,
> either. At least not yet.

In general the fantastic thing about the git configuration facility is that it provides both systems administrators and normal users with what they want. It's possible to configure things system-wide and override those on a user or repository basis.

Of course hindsight is 20/20, but I think that given what's been covered in this thread it's been established that it's categorically better that if we introduce features like these that they be configured through the normal configuration facility rather than the configuration being sticky to the index. It gives you everything that the per-index configuration gives you and more.

So assuming that's the case, how do we migrate something that's configured via the index towards being configured through git-config? I think there's no general answer to that, but in this case the worst case scenario with accepting this series as-is is that we downgrade some users who've opted in to it to pre-v2.5.0 "git status" performance.

Since the change in performance really isn't noticeable except on really large repositories, which are more likely to have someone involved watching the changelog on upgrades, I think that's OK.
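The layering described above (system-wide settings overridden per user, then per repository) can be modelled in a few lines. This is a simplified sketch in Python, not git's config parser: each scope is just a dict, and the function name and the lower-cased key are assumptions made for the example.

```python
def effective_value(key, system=None, global_=None, local=None):
    """Return the value a layered lookup would yield for `key`.

    The most specific scope wins: repository (local) overrides user
    (global), which overrides system. Returns None if the key is unset.
    """
    for scope in (local, global_, system):
        if scope and key in scope:
            return scope[key]
    return None


# A site-wide default shipped in /etc/gitconfig ...
system = {"core.untrackedcache": "true"}
print(effective_value("core.untrackedcache", system=system))
# ... which one repository opts out of:
print(effective_value("core.untrackedcache", system=system,
                      local={"core.untrackedcache": "false"}))
```

Under this model, reverting the system-level entry changes the result for every repository that has not set its own value, which is the property a per-index setting lacks.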
Re: [PATCH 7/8] config: add core.untrackedCache
On Mon, Dec 14, 2015 at 8:44 PM, Junio C Hamano wrote:

I'm replying to & quoting from two E-Mails of yours at once here for clarity & less noise. I'm working with Christian on getting this integrated, and we both thought it would be good to have some fresh input on the matter from me.

> Christian Couder writes:
>> If you want only some repos to use the UC, you will set
>> core.untrackedCache in the repo config. Then after cloning such a
>> repo, you will copy the config file, and this will not be enough to
>> enable the UC.
>
> Surely. "Does this index file keeps track of the untracked files'
> states?" is a property of the index. Cloning does not propagate the
> configuration and copying or not copying is irrelevant. If you want
> to enable, running "update-index --untracked-cache" is a way to do
> so. I cannot see what's so hard about it.
>
>> And if you have set core.untrackedCache in the global config when you
>> clone, UC is enabled, but if you have just set it in the repo config
>> after the clone, it is not enabled.
>
> That's fine. In your patch series, if you set it in the global, you
> will get the cache in the new one. With the cleaned-up semantics I
> suggested, the same thing will happen.
>
> And with the cleaned-up semantics, the configuration is *ONLY* used
> to give the *DEFAULT* before other things happen, i.e. creation of
> the index file for the first time. Because the configuration is
> only the default, an explicit "update-index --[no-]untracked-cache"
> will defeat it, just like any other config/option interaction.

As you know Christian is working on this for Booking.com to integrate features we find useful into git.git in such a way that we don't have to maintain some internal fork of Git.

What we're trying to do, and what a lot of other big deployments of Git elsewhere would also find useful, is to ship a sensible default configuration for all users on the system in /etc/gitconfig.
I'd like to be able to easily enable some feature that aids Git performance globally on our thousands of machines and for our hundreds of users by just tweaking something in puppet to change /etc/gitconfig, and more importantly if that change ends up being bad reverting that config in /etc/gitconfig should undo the change. It's an unacceptable level of complexity for system-level automation to have to scour the filesystem for existing Git repositories and run "git update-index" on each of them, that's why we're submitting patches to make this a config option, so we can simply flip a flag in /etc/gitconfig. It's also unacceptable to have the config simply provide the default which'll be frozen either at clone time or after an initial "git status". Let's say I ship a /etc/gitconfig that says "new clones should use the untracked cache". Now I roll that out across our fleet of machines and it turns out the morning after that the feature doesn't work properly for whatever reason. If it's just a "default until clone or status" type of thing even if I revert the configuration a lot of users & their repositories in the wild will still be broken, and will have to be manually fixed. Which again leads to the scouring the filesystem problem. So that gives some more context for why we're pushing for this change. I believe this feature breaks no existing use-case and just supports new ones, and I think that your objections to it are based on a simple misunderstanding as will become apparent if you read on below. > The biggest issue I had with your patch series, IIRC, is that > configuration will defeat the command line option. I think it's a moot point to focus on configuration v.s. command-line option. The important question is whether or not this feature can still be configured on a repo-local basis with this series as before. That's still the case since --local git configuration overrides --global and --system, so users who want to enable/disable this per-repo still can. 
>> Shouldn't it be nice if they could just enable core.untrackedCache in
>> the global config files without having to also cd into every repo and
>> use "git update-index --untracked-cache" there?
>
> NO. It is bad to change the behaviour behind users' back.

I'm not quite sure what the objection here is exactly. If you're a normal user you can enable/disable this per-repo just like you can now, and enable/disable it for all your repos in ~/.gitconfig.

If you mean that the user's configuration shouldn't be changed by the global config in /etc/gitconfig, I do think that's a moot point. If you're a user on a system where I have root and I want to change your Git configuration, I'm going to be able to do that whatever the mechanism is.

Indeed, that's what we're doing to enable this at Booking.com currently: we run a job to find some limited set of common checkouts and run "git update-index" for users as root. The problem with that is that it's needlessly complex, hence this
Re: [PATCH 7/8] config: add core.untrackedCache
On Wed, Dec 16, 2015 at 12:03 AM, Junio C Hamano <gits...@pobox.com> wrote: > Ævar Arnfjörð Bjarmason <ava...@gmail.com> writes: > >> Of course hindsight is 20/20, but I think that given what's been >> covered in this thread it's been established that it's categorically >> better that if we introduce features like these that they be >> configured through the normal configuration facility rather than the >> configuration being sticky to the index. > > I doubt that any such thing has been established at all in this > thread. It may be true that you and perhaps Christian loudly > repeated it, but loudly repeating something and establishing > something as a fact are slightly different. > > The thing is, I do not necessarily view this as "configuration". > The way I see the feature is that you say "--untracked" when you > want the states of untracked paths be kept track of in the index. You probably know this, but the --untracked-cache has no bearing on what we actually keep track of, it's just an optimization for how efficiently we execute "git status" commands without the "-uno" option. We still produce the same output. > just like you say "git add Makefile" when you want the state of > 'Makefile' be kept track of in the index. Either the index keeps > track of it, or it doesn't, based solely on user's request, and the > bit to tell us which is the case is already in the index, exactly > because that is part of the data that is kept track of in the index. What I mean by "[we've] established that it's categorically better [to do this via git-config]" is that we can still do all that stuff, we can just also do more stuff now. >> Since the change in performance really isn't noticeable except on >> really large repositories, which are more likely to have someone >> involved watching the changelog on upgrades I think that's OK. > > Especially it is dubious to me that the trade-off you are making > with this design is a good one. 
In order to avoid paying a one-time > cost to run "update-index --untracked-cache" at sites that _do_ want > to use that feature (and after that, if you teach "git init" and > "git clone" to pay attention to the "give you the default" > configuration to run it for you, so that your users won't have to), It's not unreasonable to avoid the cost of running "update-index --untracked-cache", it's the difference between just adjusting /etc/gitconfig and continually having to traverse the entire / filesystem if you want to enable this feature on a system-wide basis. It should be easy to enable any Git feature via the configuration facility either on a --system, or --global or --local basis. > you are forcing all codepaths that makes any write to the index (not > just "init"-time) to make an extra check with the configuration all > the time for everybody, because you made the presence of the > untracked cache data in the index not usable as a sign that the user > wants to use that feature. Maybe I'm misunderstanding Christian's patches but don't we already parse the git configuration on any commands that update the index anyway? See git_default_core_config(). We already parse the git configuration to run "git status". > If the feature is something only those > with really large repositories care about, is it a good trade-off to > make everybody pay the runtime cost and make code more complex and > fragile? I am not yet convinced. I was arguing that only users with really large repositories would notice if we turned this off because the enabling facility had changed from per-index to config. But it doesn't follow that the expense of checking the git configuration which we're parsing anyway for the index-related commands makes things more complex & fragile. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] config: add core.trustmtime
On Thu, Nov 26, 2015 at 6:53 PM, Duy Nguyen wrote:
> On Thu, Nov 26, 2015 at 6:21 AM, Christian Couder
> wrote:
>> I am wondering why you didn't make it by default run the mtime checks
>> when a kernel change is detected. Maybe that would be better than
>> disabling itself.
>
> It takes about 10 seconds to go through the mtime check. Imagine you
> have to wait 10s for some random "git status".. Plus I didn't want to
> do anything fancy.

I browsed through the commits that added --untracked-cache and tried to find the original mailing list discussion, but I couldn't find the reason why the default interface for enabling it runs these exhaustive tests.

Maybe I'm missing some really common breakage with st_mtime on some system, but having a feature the user explicitly enables turn itself off, and doing FS-testing that takes 10 seconds when it's enabled, seems like the wrong default to me.

We don't do it with core.fileMode, core.ignorecase, core.trustctime or core.symlinks. Do we really need to be treating this differently?

If that's a "no" then the default interface to this could be much simpler. Rather than being a change you apply to .git/index (going away if you nuke it etc.) it could just be a config option like the rest.
Re: [RFC/PATCH] config: add core.trustmtime
On Wed, Nov 25, 2015 at 7:35 AM, Christian Couder wrote:
> At Booking.com we know that mtime works everywhere and we don't
> want the untracked cache to stop working when a kernel is upgraded
> or when the repo is copied to a machine with a different kernel.
> I will add tests later if people are ok with this.

A bit more info: I rolled Git out internally with this patch: https://github.com/avar/git/commit/c63f7c12c2664631961add7cf3da901b0b6aa2f2

The --untracked-cache feature hardcodes the equivalent of:

    pwd; uname --kernel-name --kernel-release --kernel-version

into the index. If any of those change it prints out the "cache is disabled" warning. This patch will make it stop being so afraid of itself to the point of disabling itself on minor kernel upgrades :)

A few other issues with this feature I've noticed:

 * There's no way to just enable it globally via the config, which makes it a bit of a hassle to use. I wanted to have a config option to enable it; how about "index.untracked_cache = true" for the config variable name?

 * Doing "cd /tmp; git --git-dir=/git/somewhere/else/.git update-index --untracked-cache" doesn't work how I'd expect. It hardcodes "/tmp" as the directory that "works" into the index, so if you use the working tree you'll never use the untracked cache. I spotted this because I carry out a bunch of git maintenance commands with --git-dir instead of cd-ing to the relevant directories. This works for most other things in git, is it a bug that it doesn't work here?

 * If you "ctrl+c" git update-index --untracked-cache at an inopportune time you'll end up with a mtime-test-XX directory in your working tree. Perhaps this tempdir should be created in the .git directory instead?

 * Maybe we should have a --test-untracked-cache option, so you can run the tests without enabling it.

Aside from the slight hassle of enabling this and keeping it enabled this feature is great. It's sped up "git status" across the board by about 40%.
Slightly less than that on faster spinning disks, slightly more than that on slower ones.
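The identity check described above (working directory plus uname output stored in the index, with the cache ignored when the stored value no longer matches) can be sketched like this in Python. The string format and all function names are hypothetical; the real format stored in the index is not reproduced here, and os.uname() is only available on Unix-like systems.

```python
import os


def cache_ident(workdir, sysname, release, version):
    """Build an identity string from the location and kernel details."""
    return f"Location {workdir}, system {sysname} {release} {version}"


def current_ident():
    """Identity of the running environment (Unix-like systems only)."""
    u = os.uname()
    return cache_ident(os.getcwd(), u.sysname, u.release, u.version)


def cache_usable(stored_ident, ident):
    """The cache is trusted only if the stored identity matches exactly."""
    return stored_ident == ident
```

An exact string comparison like this is what makes the cache switch itself off after a minor kernel upgrade, and also when update-index was run from a directory other than the working tree.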
Re: Rebase performance
On Wed, Feb 24, 2016 at 11:09 PM, Christian Couder wrote:

[Resent because I was accidentally in GMail's HTML mode and the ML rejected it]

> If there was a config option called maybe "rebase.taskset" or
> "rebase.setcpuaffinity" that could be set to ask the OS for all the
> rebase child processes to be run on the same core, people who run many
> rebases on big repos on big servers as we do at Booking.com could
> easily benefit from a nice speed up.
>
> Technically the option may make git-rebase--am.sh call "git am" using
> "taskset" (if taskset is available on the current OS).

I think aside from issues with git-apply this would be an interesting feature to have in git, i.e. some general facility to intercept commands and inject a prefix command in front of them, whether that's taskset, nice/ionice, strace etc.

> Another possibility would be to libify the "git apply" functionality
> and then to use the libified "git apply" in run_apply() instead of
> launching a separate "git apply" process. One benefit from this is
> that we could probably get rid of the read_cache_from() call at the
> end of run_apply() and this would likely further speed up things. Also
> avoiding to launch separate processes might be a win especially on
> Windows.

Yeah, that should help in this particular case and make the taskset redundant since the whole sequence of operations would all be on one core, right?

At the risk of derailing this thread, a thing that would make rebase even faster I think would be to change it so that instead of applying a patch at a time to the working tree, the whole operation takes place on temporary trees & commits, and then we eventually move the branch pointer to the result once it's finished.

I.e. there's no reason why a sequence of 1000 patches where a FOO.txt is changed from "hi1", "hi2", "hi3", ... would be noticeably slower than applying the same changes with git-fast-import.

Of course this would have to handle a lot of nuances, e.g.
if there's a conflict we'd need to change the working tree & index as we do now before continuing. Has anyone looked into some advanced refactoring of the rebase process that would work like this, or has some feedback on why this would be dumb or that there's a better way to do it?
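The "prefix command" idea floated above, with taskset as one possible prefix, might look roughly like this as a wrapper. It is a hypothetical sketch in Python: no such facility exists in git according to this thread, and the function name is invented. The only behaviour shown is "use the prefix if its binary is available, otherwise run the command unchanged".

```python
import shutil


def with_prefix(argv, prefix):
    """Return argv with `prefix` prepended when the prefix binary exists.

    `argv` and `prefix` are lists of strings, e.g. ["git", "am"] and
    ["taskset", "-c", "0"]. If the prefix is empty or its program
    cannot be found, the command is returned unchanged.
    """
    if not prefix or shutil.which(prefix[0]) is None:
        return list(argv)
    return list(prefix) + list(argv)


# Pin "git am" to CPU 0 where taskset exists, otherwise run it as-is.
print(with_prefix(["git", "am"], ["taskset", "-c", "0"]))
```

The same wrapper would cover nice/ionice or strace, since it only decides what goes in front of the command line.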
Is there a --stat or --numstat like option that'll allow me to have my cake and eat it too?
I maintain a hook for Git that allows you to block binary pushes[1].
Compared to the other implementations I've seen, it's the least stupid
thing out there that does that.

Basically on-push it parses this:

    git log --pretty=format:%H -M100% --stat=9000,9001 ..

The --stat=9000,9001 is there to make sure we still get the filename if
it's long[2].

It's important that this is something like "git-log" instead of
"git-show for each" for performance (think a push with hundreds of
commits). It's also important that it's not "git diff" (think a push
that adds/removes a huge binary file within one push). I also don't
want to manually parse "git log --numstat -p" or whatever for
performance reasons, since every push hangs on this.

It's somewhat of a pain to parse that --stat output, because I have to
look for /\|\s+Bin / in the output to detect binary changes.

You might be thinking "why don't you use --numstat?". Because while
that option does most of what I want, it doesn't show the old/new size
of the binary file, so I can't have a policy to allow e.g. <=1KB files
without doing a second pass with --stat or "git show".

Both formats also have various parsing edge cases, e.g. with -M100% I
have to parse out renames like "foo.png => bar.png", but you can also
create a file with " => " in the filename and there's no way to
disambiguate it.

Both formats also only show lines added/deleted, but --numstat doesn't
show the size before/after for binary files, so if I want to also
prohibit huge non-binary files I can't without running both --stat and
--numstat.

What I really want is something for git-log more like git-for-each-ref,
so I could emit the following info for each file being modified,
delimited by some binary marker:

 - file name before
 - file name after
 - is rename?
 - is binary?
 - size in bytes before
 - size in bytes after
 - removed lines
 - added lines

I think no combination of git-log options or any built-in machinery
comes close to giving me all of that without having to do multiple
passes with some combination of git-log and git-show, but I'd love to
be proven wrong.

1. https://github.com/avar/pre-receive-reject-binaries
2. OVER NINE THOUSAND should be enough for everyone, right?
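For comparison with the /\|\s+Bin / matching on --stat output, here is a minimal sketch of detecting binary changes from --numstat output instead. It relies on --numstat printing "-" in both the added and deleted columns for binary files; the function name `list_binary_paths` is made up for illustration, and renames (the " => " case mentioned above) are not handled.

```shell
#!/bin/sh
# Read `git log --numstat` style lines on stdin and print the paths that
# git reports as binary ("-" in both the added and deleted columns).
# Commit-hash and blank lines contain no tab and are ignored.
list_binary_paths () {
	awk -F '\t' '$1 == "-" && $2 == "-" { print $3 }'
}
```

A hook could feed it with something like `git log --pretty=format:%H --numstat "$old..$new" | list_binary_paths`, where `$old` and `$new` stand for the pushed range. As the message says, this still gives no before/after sizes.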
Re: Is there a --stat or --numstat like option that'll allow me to have my cake and eat it too?
On Tue, Mar 8, 2016 at 9:51 PM, Jeff King <p...@peff.net> wrote:
> On Tue, Mar 08, 2016 at 04:08:21PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> What I really want is something for git-log more like
>> git-for-each-ref, so I could emit the following info for each file
>> being modified delimited by some binary marker:
>>
>> - file name before
>> - file name after
>> - is rename?
>> - is binary?
>> - size in bytes before
>> - size in bytes after
>> - removed lines
>> - added lines
>
> If you get the full sha1s of each object (e.g., by adding --raw), then
> you can dump them all to a single cat-file invocation to efficiently get
> the sizes.
>
> I'm not quite sure I understand why you want to know about renames and
> added/removed lines if you are just blocking binary files. If I were
> implementing this[1], I'd probably just block based on blob size, which
> you can do with:

I want to know about renames because if you're just moving an existing
binary file around that's fine, it's not adding a new big blob to the
repo.

The hook also has a facility to commit binary stuff if you add "yes I
know what I'm doing and want to commit N bytes to the repo" to the
commit message. Mostly when people do this it's an accident.

I wanted to know about added/removed lines because I was looking into
extending this to non-binary files. Today at work someone committed
300MB of text files to a branch. We could delete it in that case, but
it would also be nice to have limits on that sort of thing too.

>   git rev-list --objects $old..$new |
>   git cat-file --batch-check='%(objectsize) %(objectname) %(rest)' |
>   perl -alne 'print if $F[0] > 1_000_000; # or whatever' |
>   while read size sha1 file; do
>     echo "Whoops, $file ($sha1) is too big"
>     exit 1
>   done
>
> You can also use %(objectsize:disk) to get the on-disk size (which can
> tell you about things that don't compress well, which tend to be the
> sorts of things you are trying to keep out).
>
> You can't ask about binary-ness, but I don't think it would unreasonable
> for cat-file to have a "would git consider this content binary?"
> placeholder for --batch-check.
>
> The other things are properties of the comparison, not of individual
> objects, so you'll have to get them from "git log". But with some clever
> scripting, I think you could feed those sha1s (or $commit:$path
> specifiers) into a single cat-file invocation to get the before/after
> sizes.
>
> -Peff
>
> [1] GitHub has hard and soft limits for various blob sizes, and at one
>     point the implementation looked very similar to what I showed here.
>     The downside is that for a large push, the rev-list can actually
>     take a fair bit of time (e.g., consider pushing up all of the kernel
>     history to a brand new repo), and this is on top of the similar work
>     already done by index-pack and check_everything_connected().
>
>     These days I have a hacky patch to notice the too-big size directly
>     in index-pack, which is essentially free. It doesn't know about the
>     file path, so we pull that out later in the pre-receive hook. But we
>     only have to do so in the uncommon case that there _is_ actually a
>     too-big file, so normal pushes incur no penalty.

All good tips / insights. I'll definitely check some of this out.
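The "feed those sha1s into a single cat-file invocation" suggestion above might look roughly like the following. This is a sketch under assumptions, not the hook's actual code: `raw_new_blobs` is a made-up name, it only extracts the post-change blob of each raw diff line, and it assumes the raw lines carry full object names.

```shell
#!/bin/sh
# Read raw diff lines on stdin, of the form
#   :<old mode> <new mode> <old id> <new id> <status>TAB<path>[TAB<new path>]
# and print "<new id> <path>" for every entry that still exists after the
# change. Deletions (new id all zeros) are skipped. For renames the last
# tab-separated field, i.e. the new path, is printed.
raw_new_blobs () {
	awk -F '\t' '/^:/ {
		split($1, meta, " ")
		if (meta[4] !~ /^0+$/)
			print meta[4], $NF
	}'
}
```

The output has the "object name, whitespace, rest" shape that `git cat-file --batch-check='%(objectsize) %(objectname) %(rest)'` accepts, so a pipeline such as `git log --raw --no-abbrev --format= "$old..$new" | raw_new_blobs | git cat-file --batch-check='%(objectsize) %(objectname) %(rest)'` should print one size per changed file in a single cat-file process. Printing `meta[3]` as well would give the "before" side.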
Re: [GSoC] A late proposal: a modern send-email
On Tue, Mar 29, 2016 at 6:17 AM, 惠轶群 <huiyi...@gmail.com> wrote:
> 2016-03-29 0:49 GMT+08:00 Ævar Arnfjörð Bjarmason <ava...@gmail.com>:
>> On Sat, Mar 26, 2016 at 3:13 AM, 惠轶群 <huiyi...@gmail.com> wrote:
>>> 2016-03-26 2:16 GMT+08:00 Junio C Hamano <gits...@pobox.com>:
>>>> 惠轶群 <huiyi...@gmail.com> writes:
>>>>
>>>>> # Purpose
>>>>> The current implementation of send-email is based on perl and has only
>>>>> a tui, it has two problems:
>>>>> - user must install a ton of dependencies before submit a single patch.
>>>>> - tui and parameter are both not quite friendly to new users.
>>>>
>>>> Is "a ton of dependencies" true? "apt-cache show git-email"
>>>> suggests otherwise. Is "a ton of dependencies" truly a problem?
>>>> "apt-get install" would resolve the dependencies for you.
>>>
>>> There are three perl packages needed to send patch through gmail:
>>> - perl-mime-tools
>>> - perl-net-smtp-ssl
>>> - perl-authen-sasl
>>>
>>> Yes, not too many, but is it better none of them?
>>>
>>> What's more, when I try to send mails, I was first disrupted by
>>> "no perl-mime-tools" then by "no perl-net-smtp-ssl or perl-authen-sasl".
>>> Then I think, why not just a mailto link?
>>
>> I think your proposal should clarify a bit who these users are that
>> find it too difficult to install these perl module dependencies. Users
>> on OSX & Windows I would assume, because in the case of Linux distros
>> getting these is the equivalent of an apt-get command away.
>
> In fact, I'm not familiar with the build for OSX or Windows.

The core of your proposal rests on the assumption that
git-send-email's implementation is problematic because it has a "ton
of dependencies", and that this must be dealt with by implementing an
alternate E-Mail transport method.

But you don't go into how exactly this is a practical issue for users,
even though the rest of the proposal is about that, i.e. "make it
friendly for users".

Let's leave the question of creating an E-Mail GUI that's shipped with
Git aside. Correct me if I'm wrong, but don't we basically have 4 kinds
of users using git-send-email:

 1) Those who get it from a binary Windows package (is it even
    packaged there?)
 2) Also a binary package, but for OSX
 3) Users installing it via their Linux distribution's package system
 4) Users building it from source on Windows/OSX/Linux.

I'm in group #3 myself for the purposes of using git-send-email and
have never had issues with its dependencies because my distro's
package management takes care of it for me.

I don't know what the status of packaging it is for #1 and #2, but
that's what I'm asking about in my question. If this becomes a
non-issue for those two groups (if it isn't already), isn't this
question of dependencies a non-issue?

I.e. why does it matter if git-send-email has N dependencies if those
N are either packaged with the common Windows/OSX packages that most
users use, or installed as dependencies by their *nix distro?

Group #4 is small enough, and likely enough to consist of git.git
contributors or distro package maintainers anyway, that this issue
doesn't matter for them.

>> If installing these dependencies is hard for users perhaps a better
>> thing to focus on is altering the binary builds on Git for platforms
>> that don't have package systems to include these dependencies.
>
> Why `mailto` not a good choice? I'm confusing.

I'm not saying that having the mailto: method you're proposing isn't
good in itself. I think it would be very useful to be able to
magically open git-send-email output in your favorite E-Mail client
for editing before sending it off like you usually send E-Mail.

Although I must say I'd be seriously surprised if the likes of git
formatted patches survive contact with popular E-Mail clients when the
body is specified via the body=* parameter, given that we're sending
pretty precisely formatted content and most mailers are very eager to
wrap lines or otherwise munge input.

I'm mainly trying to get to the bottom of this dependency issue you're
trying to solve.

>> In this case it would mean shipping a statically linked OpenSSL since
>> that's what these perl SSL packages eventually depend on.
Re: [GSoC] A late proposal: a modern send-email
On Sat, Mar 26, 2016 at 3:13 AM, 惠轶群 wrote:
> 2016-03-26 2:16 GMT+08:00 Junio C Hamano:
>> 惠轶群 writes:
>>
>>> # Purpose
>>> The current implementation of send-email is based on perl and has only
>>> a tui, it has two problems:
>>> - user must install a ton of dependencies before submit a single patch.
>>> - tui and parameter are both not quite friendly to new users.
>>
>> Is "a ton of dependencies" true? "apt-cache show git-email"
>> suggests otherwise. Is "a ton of dependencies" truly a problem?
>> "apt-get install" would resolve the dependencies for you.
>
> There are three perl packages needed to send patch through gmail:
> - perl-mime-tools
> - perl-net-smtp-ssl
> - perl-authen-sasl
>
> Yes, not too many, but is it better none of them?
>
> What's more, when I try to send mails, I was first disrupted by
> "no perl-mime-tools" then by "no perl-net-smtp-ssl or perl-authen-sasl".
> Then I think, why not just a mailto link?

I think your proposal should clarify a bit who these users are that
find it too difficult to install these perl module dependencies. Users
on OSX & Windows I would assume, because in the case of Linux distros
getting these is the equivalent of an apt-get command away.

If installing these dependencies is hard for users perhaps a better
thing to focus on is altering the binary builds on Git for platforms
that don't have package systems to include these dependencies.

In this case it would mean shipping a statically linked OpenSSL since
that's what these perl SSL packages eventually depend on.
[PATCH v3 2/3] githooks.txt: Amend dangerous advice about 'update' hook ACL
Any ACL you implement via an 'update' hook isn't actual access control
if the user has login access to the machine running git, because they
can trivially just build their own git version which doesn't run the
hook.

Change the documentation to take this dangerous edge case into
account, and remove the mention of the advice originating on the
mailing list; the users reading this don't care where the idea came
up.

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
 Documentation/githooks.txt | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index 6db515e..38bea7d 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -275,9 +275,13 @@ does not know the entire set of branches, so it would end up
 firing one e-mail per ref when used naively, though. The
 <<post-receive,'post-receive'>> hook is more suited to that.
 
-Another use suggested on the mailing list is to use this hook to
-implement access control which is finer grained than the one
-based on filesystem group.
+Another use for this hook to implement access control which is finer
+grained than the one based on filesystem group. Note that if the user
+pushing has a normal login shell on the machine receiving the push
+implementing access control like this can be trivially bypassed by
+just not executing the hook. In those cases consider using
+e.g. linkgit:git-shell[1] as the login shell to restrict the user's
+access.
 
 Both standard output and standard error output are forwarded to
 'git send-pack' on the other end, so you can simply `echo` messages
--
2.1.3
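To make the kind of ACL under discussion concrete, here is a minimal sketch of an 'update' hook policy check. Git calls the 'update' hook with three arguments (ref name, old object name, new object name), and a non-zero exit status rejects the update of that ref. Everything else is an assumption for illustration: the `may_update` function, the `ALLOWED_PUSHERS` list, the user names, and the choice of `refs/heads/master` as the protected ref.

```shell
#!/bin/sh
# Hypothetical policy: only users listed in ALLOWED_PUSHERS may update
# refs/heads/master; every other ref is open to everyone.
ALLOWED_PUSHERS=${ALLOWED_PUSHERS-"alice bob"}

# may_update <refname> <pusher>: succeed if the push should be allowed.
may_update () {
	refname=$1
	pusher=$2
	case "$refname" in
	refs/heads/master)
		# Space-padded match so "al" does not match "alice".
		case " $ALLOWED_PUSHERS " in
		*" $pusher "*) return 0 ;;
		*) return 1 ;;
		esac
		;;
	*)
		return 0
		;;
	esac
}
```

In an actual hook script this would be followed by something like `may_update "$1" "$USER" || { echo "denied: $1" >&2; exit 1; }`. As the patch points out, this only means something when the pusher cannot run an arbitrary git binary on the server, e.g. because their login shell is git-shell.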
[PATCH v3 1/3] githooks.txt: Improve the intro section
Change the documentation so that:

 * We don't talk about "little scripts". Hooks can be as big as you
   want, and don't have to be scripts, just call them "programs".

 * We note what happens with chdir() before a hook is called, nothing
   documented this explicitly, but the current behavior is
   predictable. It helps a lot to know what directory these hooks will
   be executed from.

 * We don't make claims about the example hooks which may not be true
   depending on the configuration of 'init.templateDir'. Clarify that
   we're talking about the default settings of git-init in those
   cases, and move some of this documentation into git-init's
   documentation about the default templates.

 * We briefly note in the intro that hooks can get their arguments in
   various different ways, and that how exactly is described below for
   each hook.

Signed-off-by: Ævar Arnfjörð Bjarmason <ava...@gmail.com>
---
 Documentation/git-init.txt |  6 +++++-
 Documentation/githooks.txt | 32 ++++++++++++++++++++------------
 2 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/Documentation/git-init.txt b/Documentation/git-init.txt
index 8174d27..cc3be7d 100644
--- a/Documentation/git-init.txt
+++ b/Documentation/git-init.txt
@@ -130,7 +130,11 @@ The template directory will be one of the following (in order):
  - the default template directory: `/usr/share/git-core/templates`.
 
 The default template directory includes some directory structure, suggested
-"exclude patterns" (see linkgit:gitignore[5]), and sample hook files (see linkgit:githooks[5]).
+"exclude patterns" (see linkgit:gitignore[5]), and example hook files.
+
+The example hooks are all disabled by default. To enable a hook,
+rename it by removing its `.sample` suffix. See linkgit:githooks[5]
+for more info on hook execution.
 
 EXAMPLES
 --------
diff --git a/Documentation/githooks.txt b/Documentation/githooks.txt
index a2f59b1..6db515e 100644
--- a/Documentation/githooks.txt
+++ b/Documentation/githooks.txt
@@ -13,18 +13,26 @@ $GIT_DIR/hooks/*
 DESCRIPTION
 -----------
 
-Hooks are little scripts you can place in `$GIT_DIR/hooks`
-directory to trigger action at certain points. When
-'git init' is run, a handful of example hooks are copied into the
-`hooks` directory of the new repository, but by default they are
-all disabled. To enable a hook, rename it by removing its `.sample`
-suffix.
-
-NOTE: It is also a requirement for a given hook to be executable.
-However - in a freshly initialized repository - the `.sample` files are
-executable by default.
-
-This document describes the currently defined hooks.
+Hooks are programs you can place in the `$GIT_DIR/hooks` directory to
+trigger action at certain points. Hooks that don't have the executable
+bit set are ignored.
+
+When a hook is called in a non-bare repository the working directory
+is guaranteed to be the root of the working tree, in a bare repository
+the working directory will be the path to the repository. I.e. hooks
+don't need to worry about the user's current working directory.
+
+Hooks can get their arguments via the environment, command-line
+arguments, and stdin. See the documentation for each below hook for
+details.
+
+When 'git init' is run it may, depending on its configuration, copy
+hooks to the new repository, see the the "TEMPLATE DIRECTORY" section
+in linkgit:git-init[1] for details. When the rest of this document
+refers to "default hooks" we're talking about the default template
+shipped with Git.
+
+The currently supported hooks are described below.
 
 HOOKS
 -----
--
2.1.3
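The three input channels the new intro text mentions (environment, command-line arguments, stdin) and the working-directory guarantee can be seen with a throwaway hook body along these lines. It is only a sketch for experimenting: `describe_hook_input` is a made-up name, and which channels carry data depends on the hook (for example, 'pre-receive' takes no arguments and reads "<old-value> <new-value> <ref-name>" lines from stdin, while 'update' gets its three values as arguments).

```shell
#!/bin/sh
# Print what a hook can see: working directory, GIT_DIR from the
# environment, command-line arguments, and any lines on standard input.
describe_hook_input () {
	echo "cwd: $(pwd)"
	echo "GIT_DIR: ${GIT_DIR-unset}"
	echo "args: $*"
	# 'pre-receive' style input: one "<old> <new> <ref>" line per ref.
	while read -r old new ref
	do
		echo "stdin: $old $new $ref"
	done
}
```

Calling `describe_hook_input "$@"` from an executable file in `$GIT_DIR/hooks` and triggering the hook from a subdirectory of the working tree is a quick way to check the "cwd" behavior described in the patch.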
[PATCH v3 0/3] Improvements to githooks.txt documentation
This includes minor grammar edits pointed out by Eric Sunshine + the
one v2 patch I sent out in response to comments by Jacob Keller.

I thought it was less confusing to just send out a whole v3 series
than ask Junio to piece together v1..v3 of various patches.

Ævar Arnfjörð Bjarmason (3):
  githooks.txt: Improve the intro section
  githooks.txt: Amend dangerous advice about 'update' hook ACL
  githooks.txt: Minor improvements to the grammar & phrasing

 Documentation/git-init.txt |  6 +++-
 Documentation/githooks.txt | 72 +++---
 2 files changed, 47 insertions(+), 31 deletions(-)

--
2.1.3