Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 10:43 PM, Jeff King wrote:

No problem. I do think you'll benefit a lot from packing everything into
a single pack, but it's also clear that Git was doing more work than it
needed to be. This was a known issue when we added the racy-check to
has_sha1_file(), and knew that we might have to field reports like this
one.


After 'git gc' (with the .5G limit) I have 23 packs and with your patch
I now get:

$ time git fetch

real    0m26.520s
user    0m3.580s
sys     0m0.636s

Thanks!


vegard


[PATCH 1/2] git-svn: reduce scope of input record separator change

2016-10-13 Thread Eric Wong
Reducing the scope of where we change the record separator ($/)
avoids bugs in calls which rely on the input record separator
further down, such as the 'chomp' usage in command_oneline.

This is necessary for a future change to git-svn, but exists in
Git.pm since it seems useful for gitweb and our other Perl
scripts, too.

Signed-off-by: Eric Wong 
---
 git-svn.perl            |  4 ++--
 perl/Git.pm             | 16 +++-
 perl/Git/SVN/Editor.pm  | 12 +---
 perl/Git/SVN/Fetcher.pm | 15 +--
 4 files changed, 27 insertions(+), 20 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 4d41d22..6d1a142 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -44,6 +44,7 @@ use Git qw(
command_close_pipe
command_bidi_pipe
command_close_bidi_pipe
+   get_record
 );
 
 BEGIN {
@@ -1880,10 +1881,9 @@ sub get_commit_entry {
{
require Encode;
# SVN requires messages to be UTF-8 when entering the repo
-   local $/;
open $log_fh, '<', $commit_msg or croak $!;
binmode $log_fh;
-   chomp($log_entry{log} = <$log_fh>);
+   chomp($log_entry{log} = get_record($log_fh, undef));
 
my $enc = Git::config('i18n.commitencoding') || 'UTF-8';
my $msg = $log_entry{log};
diff --git a/perl/Git.pm b/perl/Git.pm
index ce7e4e8..d2c5a8d 100644
--- a/perl/Git.pm
+++ b/perl/Git.pm
@@ -59,7 +59,7 @@ require Exporter;
 command_bidi_pipe command_close_bidi_pipe
 version exec_path html_path hash_object git_cmd_try
 remote_refs prompt
-get_tz_offset
+get_tz_offset get_record
 credential credential_read credential_write
 temp_acquire temp_is_locked temp_release temp_reset temp_path);
 
@@ -538,6 +538,20 @@ sub get_tz_offset {
return sprintf("%s%02d%02d", $sign, (gmtime(abs($t - $gm)))[2,1]);
 }
 
+=item get_record ( FILEHANDLE, INPUT_RECORD_SEPARATOR )
+
+Read one record from FILEHANDLE delimited by INPUT_RECORD_SEPARATOR,
+removing any trailing INPUT_RECORD_SEPARATOR.
+
+=cut
+
+sub get_record {
+   my ($fh, $rs) = @_;
+   local $/ = $rs;
+   my $rec = <$fh>;
+   chomp $rec if defined $rs;
+   $rec;
+}
 
 =item prompt ( PROMPT , ISPASSWORD  )
 
diff --git a/perl/Git/SVN/Editor.pm b/perl/Git/SVN/Editor.pm
index 4c4199a..0df16ed 100644
--- a/perl/Git/SVN/Editor.pm
+++ b/perl/Git/SVN/Editor.pm
@@ -7,7 +7,9 @@ use SVN::Delta;
 use Carp qw/croak/;
 use Git qw/command command_oneline command_noisy command_output_pipe
command_input_pipe command_close_pipe
-   command_bidi_pipe command_close_bidi_pipe/;
+   command_bidi_pipe command_close_bidi_pipe
+   get_record/;
+
 BEGIN {
@ISA = qw(SVN::Delta::Editor);
 }
@@ -57,11 +59,9 @@ sub generate_diff {
push @diff_tree, "-l$_rename_limit" if defined $_rename_limit;
push @diff_tree, $tree_a, $tree_b;
my ($diff_fh, $ctx) = command_output_pipe(@diff_tree);
-   local $/ = "\0";
my $state = 'meta';
my @mods;
-   while (<$diff_fh>) {
-   chomp $_; # this gets rid of the trailing "\0"
+   while (defined($_ = get_record($diff_fh, "\0"))) {
if ($state eq 'meta' && /^:(\d{6})\s(\d{6})\s
($::sha1)\s($::sha1)\s
([MTCRAD])\d*$/xo) {
@@ -173,9 +173,7 @@ sub rmdirs {
 
my ($fh, $ctx) = command_output_pipe(qw/ls-tree --name-only -r -z/,
 $self->{tree_b});
-   local $/ = "\0";
-   while (<$fh>) {
-   chomp;
+   while (defined($_ = get_record($fh, "\0"))) {
my @dn = split m#/#, $_;
while (pop @dn) {
delete $rm->{join '/', @dn};
diff --git a/perl/Git/SVN/Fetcher.pm b/perl/Git/SVN/Fetcher.pm
index d8c21ad..64e900a 100644
--- a/perl/Git/SVN/Fetcher.pm
+++ b/perl/Git/SVN/Fetcher.pm
@@ -9,7 +9,8 @@ use Carp qw/croak/;
 use File::Basename qw/dirname/;
 use Git qw/command command_oneline command_noisy command_output_pipe
command_input_pipe command_close_pipe
-   command_bidi_pipe command_close_bidi_pipe/;
+   command_bidi_pipe command_close_bidi_pipe
+   get_record/;
 BEGIN {
@ISA = qw(SVN::Delta::Editor);
 }
@@ -86,11 +87,9 @@ sub _mark_empty_symlinks {
my $printed_warning;
chomp(my $empty_blob = `git hash-object -t blob --stdin < /dev/null`);
my ($ls, $ctx) = command_output_pipe(qw/ls-tree -r -z/, $cmt);
-   local $/ = "\0";
my $pfx = defined($switch_path) ? $switch_path : $git_svn->path;
$pfx .= '/' if length($pfx);
-   while (<$ls>) {
-   chomp;
+   while (defined($_ = get_record($ls, "\0"))) {
s/\A100644 blob $empty_blob\t//o or next;
 

[PATCH 2/2] git-svn: "git worktree" awareness

2016-10-13 Thread Eric Wong
git-svn internals were previously not aware of repository
layout differences for users of the "git worktree" command.
Introduce this awareness by using "git rev-parse --git-path"
instead of relying on outdated uses of GIT_DIR and friends.

Thanks-to: Duy Nguyen 
Reported-by: Mathieu Arnold 
Signed-off-by: Eric Wong 
---
 git-svn.perl              |  9 +
 perl/Git/SVN.pm           | 24 +++-
 perl/Git/SVN/Migration.pm | 37 ++---
 3 files changed, 42 insertions(+), 28 deletions(-)

diff --git a/git-svn.perl b/git-svn.perl
index 6d1a142..fa42364 100755
--- a/git-svn.perl
+++ b/git-svn.perl
@@ -1700,7 +1700,7 @@ sub cmd_gc {
 "files will not be compressed.\n";
}
File::Find::find({ wanted => \&gc_directory, no_chdir => 1},
-"$ENV{GIT_DIR}/svn");
+Git::SVN::svn_dir());
 }
 
 ### utility functions #
@@ -1734,7 +1734,7 @@ sub post_fetch_checkout {
return unless verify_ref('HEAD^0');
 
return if $ENV{GIT_DIR} !~ m#^(?:.*/)?\.git$#;
-   my $index = $ENV{GIT_INDEX_FILE} || "$ENV{GIT_DIR}/index";
+   my $index = command_oneline(qw(rev-parse --git-path index));
return if -f $index;
 
	return if command_oneline(qw/rev-parse --is-inside-work-tree/) eq 'false';
@@ -1836,8 +1836,9 @@ sub get_tree_from_treeish {
 sub get_commit_entry {
my ($treeish) = shift;
my %log_entry = ( log => '', tree => get_tree_from_treeish($treeish) );
-   my $commit_editmsg = "$ENV{GIT_DIR}/COMMIT_EDITMSG";
-   my $commit_msg = "$ENV{GIT_DIR}/COMMIT_MSG";
+   my @git_path = qw(rev-parse --git-path);
+   my $commit_editmsg = command_oneline(@git_path, 'COMMIT_EDITMSG');
+   my $commit_msg = command_oneline(@git_path, 'COMMIT_MSG');
open my $log_fh, '>', $commit_editmsg or croak $!;
 
my $type = command_oneline(qw/cat-file -t/, $treeish);
diff --git a/perl/Git/SVN.pm b/perl/Git/SVN.pm
index 018beb8..499e84b 100644
--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -807,10 +807,15 @@ sub get_fetch_range {
(++$min, $max);
 }
 
+sub svn_dir {
+   command_oneline(qw(rev-parse --git-path svn));
+}
+
 sub tmp_config {
my (@args) = @_;
-   my $old_def_config = "$ENV{GIT_DIR}/svn/config";
-   my $config = "$ENV{GIT_DIR}/svn/.metadata";
+   my $svn_dir = svn_dir();
+   my $old_def_config = "$svn_dir/config";
+   my $config = "$svn_dir/.metadata";
if (! -f $config && -f $old_def_config) {
rename $old_def_config, $config or
   die "Failed rename $old_def_config => $config: $!\n";
@@ -1671,7 +1676,7 @@ sub tie_for_persistent_memoization {
return if $memoized;
$memoized = 1;
 
-   my $cache_path = "$ENV{GIT_DIR}/svn/.caches/";
+   my $cache_path = svn_dir() . '/.caches/';
mkpath([$cache_path]) unless -d $cache_path;
 
my %lookup_svn_merge_cache;
@@ -1712,7 +1717,7 @@ sub tie_for_persistent_memoization {
sub clear_memoized_mergeinfo_caches {
die "Only call this method in non-memoized context" if 
($memoized);
 
-   my $cache_path = "$ENV{GIT_DIR}/svn/.caches/";
+   my $cache_path = svn_dir() . '/.caches/';
return unless -d $cache_path;
 
for my $cache_file (("$cache_path/lookup_svn_merge",
@@ -2446,12 +2451,13 @@ sub _new {
 "refs/remotes/$prefix$default_ref_id";
}
$_[1] = $repo_id;
-   my $dir = "$ENV{GIT_DIR}/svn/$ref_id";
+   my $svn_dir = svn_dir();
+   my $dir = "$svn_dir/$ref_id";
 
-   # Older repos imported by us used $GIT_DIR/svn/foo instead of
-   # $GIT_DIR/svn/refs/remotes/foo when tracking refs/remotes/foo
+   # Older repos imported by us used $svn_dir/foo instead of
+   # $svn_dir/refs/remotes/foo when tracking refs/remotes/foo
if ($ref_id =~ m{^refs/remotes/(.+)}) {
-   my $old_dir = "$ENV{GIT_DIR}/svn/$1";
+   my $old_dir = "$svn_dir/$1";
if (-d $old_dir && ! -d $dir) {
$dir = $old_dir;
}
@@ -2461,7 +2467,7 @@ sub _new {
mkpath([$dir]);
my $obj = bless {
ref_id => $ref_id, dir => $dir, index => "$dir/index",
-   config => "$ENV{GIT_DIR}/svn/config",
+   config => "$svn_dir/config",
map_root => "$dir/.rev_map", repo_id => $repo_id }, $class;
 
# Ensure it gets canonicalized
diff --git a/perl/Git/SVN/Migration.pm b/perl/Git/SVN/Migration.pm
index cf6ffa7..dc90f6a 100644
--- a/perl/Git/SVN/Migration.pm
+++ b/perl/Git/SVN/Migration.pm
@@ -44,7 +44,9 @@ use Git qw(
command_noisy
command_output_pipe
command_close_pipe
+   command_oneline
 );
+use Git::SVN;
 
 sub migra

[PATCH 0/2] git-svn: implement "git worktree" awareness

2016-10-13 Thread Eric Wong
+Cc Jakub since gitweb could probably take advantage of get_record
from the first patch, too.  I'm not completely sure about the API
for this, though.

The following changes since commit 3cdd5d19178a54d2e51b5098d43b57571241d0ab:

  Sync with maint (2016-10-11 14:55:48 -0700)

are available in the git repository at:

  git://bogomips.org/git-svn.git svn-wt

for you to fetch changes up to 112423eb905cf28c9445781a7647ba590d597ab3:

  git-svn: "git worktree" awareness (2016-10-14 01:36:12 +)


Eric Wong (2):
  git-svn: reduce scope of input record separator change
  git-svn: "git worktree" awareness

 git-svn.perl              | 13 +++--
 perl/Git.pm               | 16 +++-
 perl/Git/SVN.pm           | 24 +++-
 perl/Git/SVN/Editor.pm    | 12 +---
 perl/Git/SVN/Fetcher.pm   | 15 +--
 perl/Git/SVN/Migration.pm | 37 ++---
 6 files changed, 69 insertions(+), 48 deletions(-)
-- 
EW



Change Default merge strategy options

2016-10-13 Thread Daniel Lopez
Hi,

How do we use 'git config --global' to set a default merge strategy such as recursive?


Example:
Currently, when we want to enforce a specific strategy, we need to specify
it on the command line:
git.exe merge --strategy=recursive --strategy-option=ignore-all-space dev-local


We would like to define it as the default strategy and be able to
simplify the command line to:

git.exe merge dev-fix1.3-local 

Using an alias is not an option, as git is being called from TortoiseGit (our
current GUI tool).


Daniel Lopez
Concept Developer

Tel: +351 289 100683 | Email: dlo...@csscorporate.com | www.csscorporate.com



Re: [GIT GUI l18n 1/2] git-gui: Mark 'All' in remote.tcl for translation

2016-10-13 Thread Pat Thoyts
Alexander Shopov  writes:

>Signed-off-by: Alexander Shopov 
>---
> lib/remote.tcl | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/lib/remote.tcl b/lib/remote.tcl
>index 4e5c784..26af8ae 100644
>--- a/lib/remote.tcl
>+++ b/lib/remote.tcl
>@@ -250,12 +250,12 @@ proc update_all_remotes_menu_entry {} {
> 
>   $fetch_m insert end separator
>   $fetch_m insert end command \
>-  -label "All" \
>+  -label [mc "All"] \
>   -command fetch_from_all
> 
>   $prune_m insert end separator
>   $prune_m insert end command \
>-  -label "All" \
>+  -label [mc "All"] \
>   -command prune_from_all
>   }
>   } else {

OK - this looks fine, except that the line just above compares the text of
this entry, so it also needs [mc] added. I've applied it as:

-- a/lib/remote.tcl
+++ b/lib/remote.tcl
@@ -246,22 +246,22 @@ proc update_all_remotes_menu_entry {} {
if {$have_remote > 1} {
make_sure_remote_submenues_exist $remote_m
if {[$fetch_m type end] eq "command" \
-   && [$fetch_m entrycget end -label] ne "All"} {
+   && [$fetch_m entrycget end -label] ne [mc "All"]} {
 
$fetch_m insert end separator
$fetch_m insert end command \
-   -label "All" \
+   -label [mc "All"] \
-command fetch_from_all
 
$prune_m insert end separator
$prune_m insert end command \
-   -label "All" \
+   -label [mc "All"] \
-command prune_from_all
}
} else {
if {[winfo exists $fetch_m]} {
if {[$fetch_m type end] eq "command" \
-   && [$fetch_m entrycget end -label] eq "All"} {
+   && [$fetch_m entrycget end -label] eq [mc "All"]} {
 
delete_from_menu $fetch_m end
delete_from_menu $fetch_m end


-- 
Pat Thoyts    http://www.patthoyts.tk/
PGP fingerprint 2C 6E 98 07 2C 59 C8 97  10 CE 11 E6 04 E0 B9 DD


Good Day

2016-10-13 Thread merlin . hutter
Good Day


This is Santander Consumer Finance; we offer loans at 3% for both
business loans and personal loans, for your project and individual
requirements.

Fill out the form if interested:

First and last name:
Country:
Required loan amount:
Duration:
Purpose of the loan:


A quick response is required.

Regards

Ana Botin


Re: [PATCHv3] attr: convert to new threadsafe API

2016-10-13 Thread Stefan Beller
> On Wed, Oct 12, 2016 at 4:33 PM, Junio C Hamano  wrote:
> so I am hoping that we won't have to do this uglier variant

---8<--- attr.h:
...
struct git_attr_result {
int check_nr;
	/* Whether it was statically allocated and cannot be resized. */
int static_alloc;
const char *value[FLEX_ARRAY];
};

/* create a pointer pointing to a git_attr_result with a given size: */
#define GIT_ATTR_RESULT_INIT_FOR(name, size) \
struct { \
int check_nr; \
int static_alloc; \
const char *value[size]; \
} local_##name; \
struct git_attr_result *name = \
(struct git_attr_result *)&(local_##name); \
local_##name.static_alloc = 1;
...
---8<--- e.g. ws.c:

static struct git_attr_check *attr_whitespace_rule;
GIT_ATTR_RESULT_INIT_FOR(result, 1);

git_attr_check_initl(&attr_whitespace_rule, "whitespace", NULL);

if (!git_check_attr(pathname, attr_whitespace_rule, result)) {
if (ATTR_TRUE(result->value[0])) {
...
} else if (ATTR_FALSE(result->value[0])) {
...


Talent Scout

2016-10-13 Thread Camilia Brunnet
Dear Concern,

I am a talent scout for BLUE SKY FILM STUDIO. Blue Sky Studio, a film
corporation located in the United States, is soliciting for the right to
use your photo/face and personality as one of the semi-major
roles/characters in our upcoming animated stereoscopic 3D movie, The
Story of Ferdinand (Ferdinand 2017). The movie is currently filming (in
production). Please note that there will be no auditions, traveling, or
any special/professional acting skills required, since the production of
this movie will be done with our state-of-the-art computer-generated
imagery equipment. We are prepared to pay the total sum of $620,000.00
USD. For more information/understanding, please write us at the e-mail
below.
CONTACT EMAIL: blueskystud...@usa.com
All replies to: blueskystud...@usa.com
Note: Only responses sent to this mail will be given prior
consideration.

Talent Scout
Camilia Brunnet


Re: problem with git worktree and git svn

2016-10-13 Thread Eric Wong
Duy Nguyen  wrote:
> On Thu, Oct 13, 2016 at 8:52 AM, Eric Wong  wrote:
> > +sub svn_dir {
> > +   my $git_dir = scalar @_ ? $_[0] : $ENV{GIT_DIR};
> > +   my $common = $ENV{GIT_COMMON_DIR} || "$git_dir/commondir";
> > +   $git_dir .= '/'.::file_to_s($common) if -e $common;
> > +   my $svn_dir = $git_dir . '/svn';
> > +   $svn_dir =~ tr!/!/!s;
> > +   $svn_dir;
> > +}
> 
> 
> If this is shell script, this function could be just
> 
> svn_dir() {
> git rev-parse --git-path svn
> }

Ah, thanks; I missed --git-path.  I will do this in Git/SVN.pm:

--- a/perl/Git/SVN.pm
+++ b/perl/Git/SVN.pm
@@ -808,12 +808,7 @@ sub get_fetch_range {
 }
 
 sub svn_dir {
-   my $git_dir = scalar @_ ? $_[0] : $ENV{GIT_DIR};
-   my $common = $ENV{GIT_COMMON_DIR} || "$git_dir/commondir";
-   $git_dir .= '/'.::file_to_s($common) if -e $common;
-   my $svn_dir = $git_dir . '/svn';
-   $svn_dir =~ tr!/!/!s;
-   $svn_dir;
+   command_oneline(qw(rev-parse --git-path svn));
 }
 
 sub tmp_config {

> which should give you correct path in either single or multi-worktree
> context and you don't need to bother with details like
> $GIT_COMMON_DIR. But I don't know how Perl bindings are implemented, I
> don't know if we have something similar (or easy to add it, like
> Git::git_path()).

I'm not sure it's necessary given the convenience of command_oneline,
and I'd rather avoid the overhead of documenting+supporting a new API
for Git.pm

> I don't know much about git-svn, but from the look of it I agree
> replacing $ENV{GIT_DIR}/svn with svn_dir() should fix it, assuming
> that you don't hard code $ENV{GIT_DIR}/blahblah somewhere else. I
> don't see any other potential problems (from multi-worktree
> perspective).

I think there was a place where $GIT_DIR/config was used, but
only for documentation purposes.


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 08:18:11PM +0200, Vegard Nossum wrote:

> > My guess is that the number is relatively high. And that would explain
> > why nobody else has really complained much; such a pattern is probably
> > uncommon.
> 
> I get ~3,700 objects "they are advertising that we don't have".
> 
> They are all indeed tags which I don't have (nor want) locally.

Thanks. That's about what I expected.

> So without your patch I get these numbers:
> 
>   %   cumulative    self               self     total
>  time    seconds   seconds       calls  s/call   s/call  name
>  37.34     29.83     29.83  1948265116    0.00     0.00  strip_suffix_mem
>  27.56     51.85     22.02       11358    0.00     0.01  prepare_packed_git_one
>  14.24     63.23     11.38  1924457702    0.00     0.00  strip_suffix
>   4.88     67.13      3.90    26045319    0.00     0.00  find_pack_entry_one
>   4.17     70.46      3.33    84828149    0.00     0.00  hashcmp
>   3.28     73.08      2.62    79760199    0.00     0.00  hashcmp
>   2.44     75.03      1.95                              const_error
> 
> the call graph data shows all the reprepare_packed_git() calls to come
> from a chain like this:
> 
> do_for_each_ref
> do_for_each_ref_iterator
> ref_iterator_advance
> files_ref_iterator_advance
> ref_resolves_to_object
> has_sha1_file
> has_sha1_file_with_flags
> reprepare_packed_git

Hrm. It seems weird that we'd hit reprepare_packed_git() via
do_for_each_ref(), because that's looking at _your_ refs. So any time
that code path hits reprepare_packed_git(), it should be complaining
about repository corruption to stderr.

And that also wouldn't make sense that my patch would improve it. Are
you sure you've read the perf output correctly (I'll admit that I am
often confused by the way it reports call graphs)?

> With your (first) patch I get this instead:
> 
>   %   cumulative    self              self     total
>  time    seconds   seconds    calls  ms/call  ms/call  name
>  29.41      0.25      0.25  4490399     0.00     0.00  hashcmp
>  16.47      0.39      0.14   843447     0.00     0.00  find_pack_entry_one

These functions appearing at the top are probably a sign that you have
too many packs (these are just object lookup, which has to linearly try
each pack).

So I'd expect things to improve after a git-gc (and especially if you
turn off the packSizeLimit).
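
(Schematically, why more packs means slower lookup — a simplified sketch
of the idea, not the exact sha1_file.c code:)

	struct packed_git *p;

	/* try each pack in turn; with N packs a missed lookup costs N
	 * index searches, so lookup time grows linearly with pack count */
	for (p = packed_git; p; p = p->next) {
		off_t offset = find_pack_entry_one(sha1, p);
		if (offset)
			return offset;
	}
	return 0; /* not in any pack; fall back to loose objects */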

> Do you have all the profile data you were interested in before I try a
> 'git gc'?

Yes, I think so. At least I'm satisfied that there's not another
HAS_SHA1_QUICK case that I'm missing.

> I really appreciate the quick response and the work you put into fixing
> this, even when it's my fault for disabling gc, thanks!

No problem. I do think you'll benefit a lot from packing everything into
a single pack, but it's also clear that Git was doing more work than it
needed to be. This was a known issue when we added the racy-check to
has_sha1_file(), and knew that we might have to field reports like this
one.

-Peff


Re: [PATCH v3 25/25] sequencer: mark all error messages for translation

2016-10-13 Thread Johannes Sixt

On 13.10.2016 at 16:56, Johannes Schindelin wrote:

On Wed, 12 Oct 2016, Junio C Hamano wrote:

You have at least two independent changes relative to Dscho's
version.

 (1) Show line breaks more prominently by avoiding "\n\n" and
 breaking the string at "\n"; this matches the way how the
 result would be displayed more closely to how the source looks
 like.

 (2) Ignore the usual indentation rule and have messages start at
 the left end of the source.

Which one are you saying "makes sense" to?  Both?

I guess both can be grouped together into one theme: match the way
the final output and the source code look like.


Yes, that summarizes it pretty well.


I agree with that motivation, but I decided to go about it in a way that
is more in line with the existing source code:

-- snipsnap --
diff --git a/sequencer.c b/sequencer.c
index 8e10bb5..1cf70f7 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -483,6 +483,20 @@ static char **read_author_script(void)
return env;
 }

+static const char staged_changes_advice[] =
+N_("you have staged changes in your working tree\n"
+"If these changes are meant to be squashed into the previous commit, run:\n"
+"\n"
+"  git commit --amend %s\n"
+"\n"
+"If they are meant to go into a new commit, run:\n"
+"\n"
+"  git commit %s\n"
+"\n"
+"In both cases, once you're done, continue with:\n"
+"\n"
+"  git rebase --continue\n");


Much better! Thank you.

-- Hannes



Re: [PATCH] fetch: use "quick" has_sha1_file for tag following

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 01:04:43PM -0400, Jeff King wrote:

> > This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
> > accuracy for speed, in cases where we might be racy with a
> > simultaneous repack. This is similar to the fix in 0eeb077
> > (index-pack: avoid excessive re-reading of pack directory,
> > 2015-06-09). As with that case, it's OK for has_sha1_file()
> > occasionally say "no I don't have it" when we do, because
> > the worst case is not a corruption, but simply that we may
> > fail to auto-follow a tag that points to it.
> 
> Failing in this direction doesn't make me feel great. I had hoped it
> would fail the _other_ way, and request an object that we might already
> have.
> 
> There are two alternatives that might be worth pursuing:
> 
>   1. The thing that makes this really painful is the quadratic
>  duplicate-search in prepare_packed_git_one(). We could speed that
>  up pretty easily by keeping a hash of the packs indexed by their
>  names, and looking up in that.
> 
>  That drops the quadratic factor, but it's still going to be
>  O(nr_tags * nr_packs), as we have to at the very least readdir()
>  all of objects/pack to see that nothing new showed up.

That hash patch would look something like what is below.

Here are numbers for the perf script ("quick" is the one that skips
reprepare entirely, "hash" is this hash table patch, and "quick+hash" is
the combination of the two):

origin              quick                    hash                     quick+hash

11.09(10.38+0.70)   0.08(0.06+0.01) -99.3%   1.41(0.65+0.75) -87.3%   0.07(0.05+0.01) -99.4%

So yes, it does improve this case, but not nearly as much as the "quick"
approach. Putting it on top of the "quick" approach is barely noticeable
(it is speeding up the initial prepare_packed_git() call, but it's just
not expensive enough in the first place to be worth it).

The "hash" patch does fix a potentially quadratic behavior, though, so I
guess in theory it is worth it. But I had to go up to quite a large
number of packs to make it worthwhile. Here it is at 7500 packs, running
"git cat-file -e $some_object_that_exists":

  [origin]
  real    0m0.417s
  user    0m0.376s
  sys     0m0.040s

  [hash]
  real    0m0.054s
  user    0m0.008s
  sys     0m0.044s

So it's certainly better. But 7500 packs is just silly, and squeezing
out ~400ms there is hardly worth it. If you repack this same case into a
single pack, the command drops to 5ms. So yes, there's close to an order
of magnitude speedup here, but you get that _and_ another order of
magnitude just by repacking.

So I think it's only worth pursuing if we wanted to scrap my original
patch, and continue to aggressively reprepare_packed_git(). I'd worry
that it's still expensive on some systems, even with the hash, because
the opendir/readdir might cost a lot (OTOH, we'll by definition have
just done a stat() on the loose version of the object, so there's a
limit to how fast we can make it).

I dunno.

---
 cache.h     |  2 ++
 sha1_file.c | 44 +++-
 2 files changed, 37 insertions(+), 9 deletions(-)

diff --git a/cache.h b/cache.h
index ec791a6..0d8b4e8 100644
--- a/cache.h
+++ b/cache.h
@@ -1411,6 +1411,7 @@ struct pack_window {
 };
 
 extern struct packed_git {
+   struct hashmap_entry name_entry;
struct packed_git *next;
struct pack_window *windows;
off_t pack_size;
@@ -1428,6 +1429,7 @@ extern struct packed_git {
 do_not_close:1;
unsigned char sha1[20];
struct revindex_entry *revindex;
+   size_t basename_len;
/* something like ".git/objects/pack/x.pack" */
char pack_name[FLEX_ARRAY]; /* more */
 } *packed_git;
diff --git a/sha1_file.c b/sha1_file.c
index c652cb6..eb6a5f3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -602,6 +602,8 @@ struct packed_git *packed_git;
 static struct mru packed_git_mru_storage;
 struct mru *packed_git_mru = &packed_git_mru_storage;
 
+static struct hashmap pack_name_index;
+
 void pack_report(void)
 {
fprintf(stderr,
@@ -1260,6 +1262,30 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path)
return p;
 }
 
+static int pack_name_hashcmp(const void *va, const void *vb, const void *name)
+{
+   const struct packed_git *a = va, *b = vb;
+
+   if (a->basename_len != b->basename_len)
+   return -1;
+   else if (name)
+   return memcmp(a->pack_name, name, a->basename_len);
+   else
+   return memcmp(a->pack_name, b->pack_name, a->basename_len);
+}
+
+static int pack_loaded(const char *name, size_t len)
+{
+   struct packed_git key;
+
+   if (!pack_name_index.tablesize)
+   return 0;
+
+   hashmap_entry_init(&key, memhash(name, len));
+   key.basename_len = len;
+   return !!hashm

Re: Uninitialized submodules as symlinks

2016-10-13 Thread Kevin Daudt
On Thu, Oct 13, 2016 at 06:10:17PM +0200, Heiko Voigt wrote:
> On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> > Presently, uninitialized submodules are materialized in the working
> > tree as empty directories.  We would like to consider having them be
> > symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> > filesystem which retrieves files on demand.
> 
> How about portability? This feature would only work on Unix like
> operating systems. You have to be careful to not break Windows since
> they do not have symlinks.
> 

NTFS does have symlinks, but you need admin rights to create them
(unless you change the policy).


[GIT GUI l18n 1/2] git-gui: Mark 'All' in remote.tcl for translation

2016-10-13 Thread Alexander Shopov
Signed-off-by: Alexander Shopov 
---
 lib/remote.tcl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/remote.tcl b/lib/remote.tcl
index 4e5c784..26af8ae 100644
--- a/lib/remote.tcl
+++ b/lib/remote.tcl
@@ -250,12 +250,12 @@ proc update_all_remotes_menu_entry {} {
 
$fetch_m insert end separator
$fetch_m insert end command \
-   -label "All" \
+   -label [mc "All"] \
-command fetch_from_all
 
$prune_m insert end separator
$prune_m insert end command \
-   -label "All" \
+   -label [mc "All"] \
-command prune_from_all
}
} else {
-- 
2.10.1



Re: [PATCHv3] attr: convert to new threadsafe API

2016-10-13 Thread Stefan Beller
On Wed, Oct 12, 2016 at 4:33 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> @@ -89,15 +114,20 @@ static void setup_check(void)
>>
>>  
>>   const char *path;
>> + struct git_attr_result *result;
>>
>>   setup_check();
>> - git_check_attr(path, check);
>> + result = git_check_attr(path, check);
>
> This looks stale by a few revisions of the other parts of the patch?

I'll update the documentation.

>
>> diff --git a/archive.c b/archive.c
>> index 11e3951..15849a8 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -107,10 +107,12 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>>   void *context)
>>  {
>>   static struct strbuf path = STRBUF_INIT;
>> + static struct git_attr_check *check;
>> +
>>   struct archiver_context *c = context;
>>   struct archiver_args *args = c->args;
>>   write_archive_entry_fn_t write_entry = c->write_entry;
>> - static struct git_attr_check *check;
>> + struct git_attr_result result = GIT_ATTR_RESULT_INIT;
>>   const char *path_without_prefix;
>>   int err;
>>
>> @@ -124,12 +126,16 @@ static int write_archive_entry(const unsigned char *sha1, const char *base,
>>   strbuf_addch(&path, '/');
>>   path_without_prefix = path.buf + args->baselen;
>>
>> - if (!check)
>> - check = git_attr_check_initl("export-ignore", "export-subst", NULL);
>> - if (!git_check_attr(path_without_prefix, check)) {
>> - if (ATTR_TRUE(check->check[0].value))
>> + git_attr_check_initl(&check, "export-ignore", "export-subst", NULL);
>> + git_attr_result_init(&result, check);
>> +
>> + if (!git_check_attr(path_without_prefix, check, &result)) {
>> + if (ATTR_TRUE(result.value[0])) {
>> + git_attr_result_clear(&result);
>>   return 0;
>> - args->convert = ATTR_TRUE(check->check[1].value);
>> + }
>> + args->convert = ATTR_TRUE(result.value[1]);
>> + git_attr_result_clear(&result);
>>   }
>
> This is exactly what I meant by "can we avoid alloc/free of result
> in leaf function when we _know_ how many attributes we are
> interested in already, which is the majority of the case?".

We can avoid that. For now I settled with an implementation that
has no "answer" type, but uses a bare "const char *result[FLEX_ARRAY];",
or rather a const char **.

>
> Starting with a simple but unoptimized internal implementation of
> the attr subsystem is one thing (which is good).  Exposing an API that
> cannot be optimally implemented later without changing the callers
> is another (which is bad).
>
> By encapsulating each element into "struct git_attr_result", we can
> extend the API without changing the API user [*1*].

Oh I see.

So instead of a raw string we want to have

struct git_attr_result {
const char *value;
};

just to have it extensible. Makes sense. I'll redo that.


>
> But I do not think of a way to allow an efficient implementation
> later unless the source of the API user somehow says "this user is
> only interested in this many attributes", like having this
>
> struct git_attr_result result[2];

const char *result[2] = {NULL, NULL};

as of now would be

struct git_attr_result result[2];

but we'd lose the ability to set them to NULL easily. Probably not needed.
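
(As an aside, a minimal illustration — not from the series: an aggregate
initializer zero-fills the remaining elements, so the NULL initialization
stays easy even with the wrapping struct:)

	struct git_attr_result { const char *value; };

	/* element 0 is explicitly NULL; element 1 is zero-filled, so NULL too */
	struct git_attr_result result[2] = { { NULL } };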

>
> (because this caller only wants "ignore" and "subst") on the API
> user's side [*2*].  Without such a clue (like the patch above, that
> only says "there is a structure called 'result'"), I do not think of
> a way to avoid dynamic allocation on the API implementation side.
>
> All the other callers in the patch (pack-objects, convert, ll-merge,
> etc.) seem to share the exact same pattern.  Each of the leaf
> functions knows a fixed set of attributes it is interested in, the
> caller iterates over many paths and makes calls to these leaf
> functions, and it is a waste to pay alloc/free overhead for each and
> every iteration when we know how many elements result needs to
> store.
>

Right.

>
> [Footnote]
>
> *1* Would we need a wrapping struct around the array of results?  If
> that is the case, we may need something ugly like this on the
> API user side:
>
> GIT_ATTR_RESULT_TYPE(2) result = {2,};
>
> with something like the following on the API implementation
> side:
>
> #define GIT_ATTR_RESULT_TYPE(n) \
> struct { \
> int num_slots; \
> const char *value[n]; \
> }
>
> struct git_attr_result {
> int num_slots;
> const char *value[FLEX_ARRAY];
> };
> git_attr_result_init(void *result_, struct git_attr_check *check)
> {
> struct git_attr_result *result = result_;
>
> assert(result->num_slots == check->num_attrs);
> ...
> }
> 

Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 05:26 PM, Jeff King wrote:

On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:


Does the patch below help?


Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
variation in time may be due to different CPU usage by other programs,
but I ran with/without the patch multiple times and the difference is
consistent).
[...]
There are some 20k refs on the remote, closer to 25k locally.


OK, that makes some sense. For whatever reason, your remote has a bunch
of tags that point to objects you do not already have. That could
happen, I think, if the remote added a lot of tags since you cloned
(because clone will grab all of the tags), but those tags do not point
to history that you are otherwise fetching (since fetch by default will
"auto-follow" such tags).

It might be interesting to see the results of:

  # all the objects we have
  git cat-file --batch-all-objects --batch-check='%(objectname)' >us

  # all the objects the remote is advertising
  git ls-remote origin | cut -f1 | sort -u >them

  # what they are advertising that we don't have
  comm -13 us them | wc -l

My guess is that the number is relatively high. And that would explain
why nobody else has really complained much; such a pattern is probably
uncommon.


I get ~3,700 objects "they are advertising that we don't have".

They are all indeed tags which I don't have (nor want) locally.


The fetch doesn't actually get anything from the remote as everything is
already up to date (that makes the 2m40s times even more frustrating in
a way :-)). Here's count-objects:


If the fetch is largely a noop, then that makes me wonder why we are
spending even a minute in the "good" case. I wonder if there is some


It was user time I reported and this is over network, so some of it
might be IO wait on the network? With the (first) patch:

$ time git fetch

real    1m9.248s
user    0m40.808s
sys     0m2.996s

Also see the profile data below.


other spot that is wasting CPU on some inefficient data structure
related to the number of refs you have. If you can do any profiling that
points to a hot spot, that would be interesting to see (and also whether
a gc improves things).

I see in find_non_local_tags() that we build up a sorted string_list via
repeated calls to string_list_insert(), which will keep the thing sorted
at each stage. That's not as efficient as just sorting at the end, but I
think it's probably OK in practice because we actually feed it via
for_each_ref(), whose output is sorted, and so we'd always just be
appending to the end.


So without your patch I get these numbers:

  %   cumulative    self               self     total
 time    seconds   seconds       calls  s/call   s/call  name
 37.34     29.83     29.83  1948265116    0.00     0.00  strip_suffix_mem
 27.56     51.85     22.02       11358    0.00     0.01  prepare_packed_git_one
 14.24     63.23     11.38  1924457702    0.00     0.00  strip_suffix
  4.88     67.13      3.90    26045319    0.00     0.00  find_pack_entry_one
  4.17     70.46      3.33    84828149    0.00     0.00  hashcmp
  3.28     73.08      2.62    79760199    0.00     0.00  hashcmp
  2.44     75.03      1.95                              const_error

the call graph data shows all the reprepare_packed_git() calls to come
from a chain like this:

do_for_each_ref
do_for_each_ref_iterator
ref_iterator_advance
files_ref_iterator_advance
ref_resolves_to_object
has_sha1_file
has_sha1_file_with_flags
reprepare_packed_git

the do_for_each_ref() calls come from prepare_replace_object(),
do_fetch(), find_common(), everything_local(),
check_for_new_submodule_commits(), and find_non_local_tags().

With your (first) patch I get this instead:

  %   cumulative    self              self     total
 time    seconds   seconds    calls  ms/call  ms/call  name
 29.41      0.25      0.25  4490399     0.00     0.00  hashcmp
 16.47      0.39      0.14   843447     0.00     0.00  find_pack_entry_one
 10.59      0.48      0.09    60490     0.00     0.00  unpack_object_header_buffer
  4.71      0.52      0.04   167441     0.00     0.00  lookup_object
  3.53      0.55      0.03   843447     0.00     0.00  fill_pack_entry
  3.53      0.58      0.03   213169     0.00     0.00  check_refname_component
  2.35      0.60      0.02   550723     0.00     0.00  strip_suffix_mem
  2.35      0.62      0.02   214607     0.00     0.00  insert_obj_hash
  2.35      0.64      0.02   158866     0.00     0.00  hashcmp
  2.35      0.66      0.02   153936     0.00     0.00  nth_packed_object_offset


Do you have all the profile data you were interested in before I try a
'git gc'?

I really appreciate the quick response and the work you put into fixing
this, even when it's my fault for disabling gc, thanks!


Vegard


Re: [PATCH] fetch: use "quick" has_sha1_file for tag following

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 12:53:44PM -0400, Jeff King wrote:

> -- >8 --
> Subject: [PATCH] fetch: use "quick" has_sha1_file for tag following

A few comments on my own patch...

> This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
> accuracy for speed, in cases where we might be racy with a
> simultaneous repack. This is similar to the fix in 0eeb077
> (index-pack: avoid excessive re-reading of pack directory,
> 2015-06-09). As with that case, it's OK for has_sha1_file()
> occasionally say "no I don't have it" when we do, because
> the worst case is not a corruption, but simply that we may
> fail to auto-follow a tag that points to it.

Failing in this direction doesn't make me feel great. I had hoped it
would fail the _other_ way, and request an object that we might already
have.

There are two alternatives that might be worth pursuing:

  1. The thing that makes this really painful is the quadratic
 duplicate-search in prepare_packed_git_one(). We could speed that
 up pretty easily by keeping a hash of the packs indexed by their
 names, and looking up in that.

 That drops the quadratic factor, but it's still going to be
 O(nr_tags * nr_packs), as we have to at the very least readdir()
 all of objects/pack to see that nothing new showed up.

 I wonder if we could trust the timestamp on the objects/pack
 directory to avoid re-reading it at all. That turns it into a
 single stat() call (a rough sketch follows this list).

  2. We could stop bothering to reprepare_packed_git() only after the
 nth time it is called. This basically turns on the "sacrifice
 accuracy for speed" behavior automatically, but it means that most
 cases would never do so, because it wouldn't be creating a
 performance issue in the first place.

 It feels weird and flaky that git might behave differently based on
 the number of unfetched tags the remote happens to have, though.
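
A rough sketch of the stat() idea from (1), with hypothetical names (this
is not actual git code, just the shape of it):

	#include <sys/stat.h>

	static time_t cached_pack_dir_mtime; /* hypothetical cache */

	static int pack_dir_maybe_changed(const char *pack_dir)
	{
		struct stat st;

		if (stat(pack_dir, &st) < 0)
			return 1; /* be conservative: rescan on error */
		if (st.st_mtime == cached_pack_dir_mtime)
			return 0; /* unchanged; skip the readdir() entirely */
		cached_pack_dir_mtime = st.st_mtime;
		return 1;
	}

reprepare_packed_git() would then rescan only when this returns 1, with
the usual caveats about mtime granularity and filesystem races.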

> Here are results from the included perf script, which sets
> up a situation similar to the one described above:
> 
> Test            HEAD^               HEAD
> -------------------------------------------------------
> 5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

The situation in this perf script is obviously designed to show off this
one specific optimization. It feels a bit overly specific, and I doubt
anybody will be all that interested in running it again once the fix is
in. OTOH, I did want to document my reproduction steps, and this seemed
like the only reasonable place to do so. And as the perf suite is
already pretty expensive, perhaps nobody minds adding to it too much.

-Peff


[PATCH] fetch: use "quick" has_sha1_file for tag following

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 11:26:32AM -0400, Jeff King wrote:

> On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:
> 
> > > Does the patch below help?
> > 
> > Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
> > variation in time may be due to different CPU usage by other programs,
> > but I ran with/without the patch multiple times and the difference is
> > consistent).
> > [...]
> > There are some 20k refs on the remote, closer to 25k locally.
> 
> OK, that makes some sense. For whatever reason, your remote has a bunch
> of tags that point to objects you do not already have. That could
> happen, I think, if the remote added a lot of tags since you cloned
> (because clone will grab all of the tags), but those tags do not point
> to history that you are otherwise fetching (since fetch by default will
> "auto-follow" such tags).

Armed with this information, I was able to reproduce the issue locally.
However, once my patch is applied, it's now quite fast. So I still don't
know where your other 1m25s is going.

So here's that same patch wrapped up with a commit message. Note that I
converted one more call site to the "quick" form; it would trigger when
the candidate tags are real tag objects, not just pointers to commits.
That might improve your runtime more, depending on what is actually in
your repository.

-- >8 --
Subject: [PATCH] fetch: use "quick" has_sha1_file for tag following

When we auto-follow tags in a fetch, we look at all of the
tags advertised by the remote and fetch ones where we don't
already have the tag, but we do have the object it peels to.
This involves a lot of calls to has_sha1_file(), some of
which we can reasonably expect to fail. Since 45e8a74
(has_sha1_file: re-check pack directory before giving up,
2013-08-30), this may cause many calls to
reprepare_packed_git(), which is potentially expensive.

This has gone unnoticed for several years because it
requires a fairly unique setup to matter:

  1. You need to have a lot of packs on the client side to
 make reprepare_packed_git() expensive (the most
 expensive part is finding duplicates in an unsorted
 list, which is currently quadratic).

  2. You need a large number of tag refs on the server side
 that are candidates for auto-following (i.e., that the
 client doesn't have). Each one triggers a re-read of
 the pack directory.

  3. Under normal circumstances, the client would
 auto-follow those tags and after one large fetch, (2)
 would no longer be true. But if those tags point to
 history which is disconnected from what the client
 otherwise fetches, then it will never auto-follow, and
 those candidates will impact it on every fetch.

So when all three are true, each fetch pays an extra
O(nr_tags * nr_packs^2) cost, mostly in string comparisons
on the pack names. This was exacerbated by 47bf4b0
(prepare_packed_git_one: refactor duplicate-pack check,
2014-06-30) which uses a slightly more expensive string
check, under the assumption that the duplicate check doesn't
happen very often (and it shouldn't; the real problem here
is how often we are calling reprepare_packed_git()).

This patch teaches fetch to use HAS_SHA1_QUICK to sacrifice
accuracy for speed, in cases where we might be racy with a
simultaneous repack. This is similar to the fix in 0eeb077
(index-pack: avoid excessive re-reading of pack directory,
2015-06-09). As with that case, it's OK for has_sha1_file()
occasionally say "no I don't have it" when we do, because
the worst case is not a corruption, but simply that we may
fail to auto-follow a tag that points to it.

Here are results from the included perf script, which sets
up a situation similar to the one described above:

Test            HEAD^               HEAD
-------------------------------------------------------
5550.4: fetch   11.21(10.42+0.78)   0.08(0.04+0.02) -99.3%

Reported-by: Vegard Nossum 
Signed-off-by: Jeff King 
---
 builtin/fetch.c            | 11 --
 cache.h                    |  1 +
 sha1_file.c                |  5 +++
 t/perf/p5550-fetch-tags.sh | 99 ++
 4 files changed, 112 insertions(+), 4 deletions(-)
 create mode 100755 t/perf/p5550-fetch-tags.sh

diff --git a/builtin/fetch.c b/builtin/fetch.c
index d5329f9..74c0546 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -241,9 +241,10 @@ static void find_non_local_tags(struct transport *transport,
 * as one to ignore by setting util to NULL.
 */
if (ends_with(ref->name, "^{}")) {
-   if (item && !has_object_file(&ref->old_oid) &&
+   if (item &&
+   !has_object_file_with_flags(&ref->old_oid, HAS_SHA1_QUICK) &&
    !will_fetch(head, ref->old_oid.hash) &&
-   !has_sha1_file(item->util) &&
+   !has_sha1_file_with_flags(item->util, HAS_SHA

Re: Huge performance bottleneck reading packs

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 09:20:07AM +0200, Vegard Nossum wrote:

> > Does the patch below help?
> 
> Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the other
> variation in time may be due to different CPU usage by other programs,
> but I ran with/without the patch multiple times and the difference is
> consistent).
> [...]
> There are some 20k refs on the remote, closer to 25k locally.

OK, that makes some sense. For whatever reason, your remote has a bunch
of tags that point to objects you do not already have. That could
happen, I think, if the remote added a lot of tags since you cloned
(because clone will grab all of the tags), but those tags do not point
to history that you are otherwise fetching (since fetch by default will
"auto-follow" such tags).

It might be interesting to see the results of:

  # all the objects we have
  git cat-file --batch-all-objects --batch-check='%(objectname)' >us

  # all the objects the remote is advertising
  git ls-remote origin | cut -f1 | sort -u >them

  # what they are advertising that we don't have
  comm -13 us them | wc -l

My guess is that the number is relatively high. And that would explain
why nobody else has really complained much; such a pattern is probably
uncommon.

> The fetch doesn't actually get anything from the remote as everything is
> already up to date (that makes the 2m40s times even more frustrating in
> a way :-)). Here's count-objects:

If the fetch is largely a noop, then that makes me wonder why we are
spending even a minute in the "good" case. I wonder if there is some
other spot that is wasting CPU on some inefficient data structure
related to the number of refs you have. If you can do any profiling that
points to a hot spot, that would be interesting to see (and also whether
a gc improves things).

I see in find_non_local_tags() that we build up a sorted string_list via
repeated calls to string_list_insert(), which will keep the thing sorted
at each stage. That's not as efficient as just sorting at the end, but I
think it's probably OK in practice because we actually feed it via
for_each_ref(), whose output is sorted, and so we'd always just be
appending to the end.
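
For reference, the two patterns look like this with git's string-list API
(sketch only; "refname" stands in for the value being collected):

	#include "string-list.h"

	struct string_list tags = STRING_LIST_INIT_DUP;

	/* insert-sorted: keeps the list ordered on every call; degenerates
	 * to a cheap append when the input already arrives sorted, as
	 * for_each_ref() output does */
	string_list_insert(&tags, refname);

	/* append-then-sort alternative: O(1) appends, one O(n log n) sort */
	string_list_append(&tags, refname);
	/* ... after the loop ... */
	string_list_sort(&tags);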

-Peff


Re: [PATCH v2 3/3] batch check whether submodule needs pushing into one call

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:37:33AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> >> If we do not even have these commits locally, then there is no point
> >> attempting to push, so returning 0 (i.e. it is not "needs pushing"
> >> situation) is correct but it is a but subtle.  It's not "we know
> >> they already have them", but it is "even if we tried to push, it
> >> won't do us or the other side any good."  A single-liner in-code
> >> comment may help.
> >
> > First the naming part. How about:
> >
> > submodule_has_commits()
> 
> Nice.

Ok will use that. And while I am at it: I will also rename all the
'hashes' variables to commits because that makes the code way clearer I
think.

> > Returning 0 here means: "No push needed" but the correct answer would
> > be: "We do not know". 
> 
> Is it?  Perhaps I am misreading the "submodule-has-commits"; I
> thought it was "the remote may or may not need updating, but we
> ourselves don't have what they may need to have commits in their
> submodule that are referenced by their superproject, so it would not
> help them even if we pushed our submodule to them".  It indeed is
> different from "No push needed" (rather, "our pushing would be
> pointless").

Yes, you could also rephrase/see it that way. But the question is: if we
do not have what the remote needs, would the user expect us to tell him
that fact and stop, or does he usually not care?

> > So how about:
> >
> >
> > if (!submodule_has_hashes(path, hashes))
> > /* NEEDSWORK: The correct answer here is "We do not
> >  * know" instead of "No". We currently proceed pushing
> >  * here as if the submodule's commits are available on a
> >  * remote, which is not always correct. */
> > return 0;
> 
> I am not sure.  
> 
> What should happen in this scenario?
> 
>  * We have two remotes, A and B, for our superproject.
> 
>  * We are not interested in one submodule at path X.  Our repository
>is primarily used to work on the superproject and possibly other
>submodules but not the one at path X.
> 
>  * We pulled from A to update ourselves.  They were actively working
>on the submodule we are not interested in, and path X in the
>superproject records a new commit that we do not have.
> 
>  * We are now trying to push to B.

I am not sure if this is a typical scenario? Well, if you are updating
your main branch from someone else and then push it to your own fork
maybe. You could specify --no-recurse-submodules for this case though.
The proper solution for this case would probably be something along the
lines of 'submodule.<name>.fetchRecurseSubmodules' but for push, so we
can mark certain submodules as uninteresting by default.

I'd like to be more protective of the user here. It's usually more
annoying for possibly many others when you push out something with
missing submodule commits than for one person not being able to push
because his submodule is not up-to-date/initialized.

> Should different things happen in these two subcases?
> 
>  - We are not interested in submodule at path X, so we haven't even
>done "submodule init" on it.
> 
>  - We are not interested in submodule at path X, so even though we
>do have a rather stale clone of it, we do not usually bother
>updating what is checked out at path X and commit our changes
>outside that area.
> 
> I tend to think that in these two cases the same thing should
> happen.  I am not sure if that same thing should be rejection
> (i.e. "you do not know for sure that the commit at path X of the
> superproject you are pushing exists in the submodule repository at
> the receiving end, so I'd refuse to push the superproject"), as it
> makes the only remedy for the situation is for you to make a full
> clone of the submodule you are not interested in and you have never
> touched yourself in either of these two subcases.

I also think in both situations the same thing should happen. A decision
that something different should happen should be made explicitly instead
of implicitly just because some submodule is not initialized. That might
be by accident or because a certain submodule is new so here the choice
should be made deliberately by the user, IMO.

Cheers Heiko


Re: Uninitialized submodules as symlinks

2016-10-13 Thread Heiko Voigt
On Fri, Oct 07, 2016 at 06:17:05PM +, David Turner wrote:
> Presently, uninitialized submodules are materialized in the working
> tree as empty directories.  We would like to consider having them be
> symlinks.  Specifically, we'd like them to be symlinks into a FUSE
> filesystem which retrieves files on demand.

How about portability? This feature would only work on Unix like
operating systems. You have to be careful to not break Windows since
they do not have symlinks.

Cheers Heiko


Re: Formatting problem send_mail in version 2.10.0

2016-10-13 Thread Matthieu Moy
Kevin Daudt  writes:

> On Wed, Oct 12, 2016 at 07:13:22PM -0400, Jeff King wrote:
>
>> I think the answer is pretty clearly no. It's just that historically we
>> have auto-munged it into something useful. I think the viable options
>> are basically:
>> 
>>   1. Tell people not to do that, and to do something RFC compliant like
>>  "Stable [4.8+]" . This is a little funny
>>  for git because we otherwise do not require things like
>>  rfc-compliant quoting for our name/email pairs. But it Just Works
>>  without anybody having to write extra code, or worry about corner
>>  cases in parsing.
>> 
>>   2. Drop everything after the trailing ">". This gives a valid rfc2822
>>  cc, and people can pick the "# 4.8" from the cc line in the body.
>
> Comments, surrounded by parentheses, are allowed after the ">" according
> to the RFC; just dropping everything coming after that would
> break that support.

Our in-house parser does consider (...) comments, and my patch does not
change how they are handled: they are still kept after the address part.

However, another piece of code does strip everything behind ">":

sub sanitize_address {
...
# remove garbage after email address
$recipient =~ s/(.*>).*$/$1/;

introduced in 831a488b76e0 (git-send-email: remove garbage after email
address, 2012-11-22).

IMHO, it's OK to continue doing this: removing comments from To: and Cc:
is not really a problem (and I think we've seen nobody complain about it
since 2012). But after my patch, these two lines can probably safely be
removed, as there can no longer be "garbage" after the email, only
comments.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: 2 directories same spelling one directory is camel cased

2016-10-13 Thread Torsten Bögershausen
On 12.10.16 18:05, David Brown wrote:
> Howdy git gurus,
> 
> I have the dubious distinction of working with a remote repo (master) that 
> has a class loader run-time error when cloned, built and executed.
> 
> The reason for the runtime issue is that a directory hierarchy has two 
> directories (folders) with the same name spelling, but one of the directories 
> is camel-cased. The package names are the same.
> 
> The compiler doesn't care, but the run-time class loader has an issue with the 
> two similarly-named classes.
> 
> How do I remove the offending directories and files in the locally cloned repo 
> but not push the 'deleted' directories and files back to origin/master in the 
> remote repo?
> 
> Please advise.
> 
> Regards.
This email did not resolve from here: David Brown 

It is somewhat unclear which issue the class loader has.
What does
git ls-files | grep -i "sameNameBitDifferent"
say?

What do you mean by
"How to remove the offending directories"?

Just run
rm -rf "offending directories" ?

Or, maybe
git mv dir1 NewDir1

If you don't want to push, you don't have to, or what do I miss?





Re: git branches & merge

2016-10-13 Thread Kevin Daudt
On Wed, Oct 12, 2016 at 04:43:07PM +0200, Anatoly Borodin wrote:
> Hi,
> 
> 
> the IP will not be overwritten, you'll still have the new IP in
> master. Nothing to worry about :)
> 
> 

To expand on that, git does a so-called 3-way merge. This means git will
look for a common base commit and compare changes from both sides to
see which side actually made a change.

In your case, the base and the release branch both show the old
IP, and the master side shows the new IP. This tells git that
master has changed and the release branch has not, so it takes the master
side of the change, resulting in the new IP showing up.
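
A minimal sketch of that per-hunk rule in C (illustrative only, not
git's actual merge code):

	#include <string.h>

	/* whichever side differs from the common base wins; NULL = conflict */
	static const char *merge_hunk(const char *base, const char *ours,
	                              const char *theirs)
	{
		if (!strcmp(ours, theirs))
			return ours;    /* both sides agree */
		if (!strcmp(base, ours))
			return theirs;  /* only their side changed it */
		if (!strcmp(base, theirs))
			return ours;    /* only our side changed it */
		return NULL;        /* both sides changed it: conflict */
	}

Here base and the release branch carry the old IP while master carries
the new one, so the master side is taken.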

Hope this helps, Kevin.


Re: Formatting problem send_mail in version 2.10.0

2016-10-13 Thread Kevin Daudt
On Wed, Oct 12, 2016 at 07:13:22PM -0400, Jeff King wrote:
> On Wed, Oct 12, 2016 at 01:53:52PM -0700, Junio C Hamano wrote:
> 
> > Matthieu Moy  writes:
> > 
> > >>> If it's not in the body of the message, then where is it?
> > >>
> > >> This point is clarified in the thread
> > >> http://marc.info/?l=linux-wireless&m=147625930203434&w=2, which is
> > >> with my upstream maintainer.
> > >
> > > Which explicitly states that the syntax is not [$number], but # $number,
> > > right?
> > 
> > But I do not think that works, either.  Let's step back.
> > 
> > People write things like these
> > 
> > Cc: Stable  # 4.8
> > Cc: Stable  [4.8+]
> > 
> > in the trailer part in the body of the message.  Are these lines
> > meant to be usable if they appear as Cc: headers of an outgoing
> > piece of e-mail as-is?
> 
> I think the answer is pretty clearly no. It's just that historically we
> have auto-munged it into something useful. I think the viable options
> are basically:
> 
>   1. Tell people not to do that, and to do something RFC compliant like
>  "Stable [4.8+]" . This is a little funny
>  for git because we otherwise do not require things like
>  rfc-compliant quoting for our name/email pairs. But it Just Works
>  without anybody having to write extra code, or worry about corner
>  cases in parsing.
> 
>   2. Drop everything after the trailing ">". This gives a valid rfc2822
>  cc, and people can pick the "# 4.8" from the cc line in the body.

Comments, surrounded by parentheses, are allowed after the ">" according
to the RFC; just dropping everything coming after that would
break that support.




Re: [PATCH v2 1/3] serialize collection of changed submodules

2016-10-13 Thread Heiko Voigt
On Wed, Oct 12, 2016 at 10:18:28AM -0700, Junio C Hamano wrote:
> Heiko Voigt  writes:
> 
> > Which seems quite extensively long for a static function so how about
> > we shorten it a bit and add a comment:
> >
> > /* lookup or create commit object list for submodule */
> > get_commit_objects_for_submodule_path(...
> 
> Or you can even lose "get_" and "path", I guess.  You are not even
> "getting" commits but the array that holds them, so the caller can
> use it to "get" one of them or it can even use it to "put" a new
> one, no?  "get-commit-objects" is a misnomer in that sense.  Either
> one of
> 
> get_submodule_commits_array()
> submodule_commits()
> 
> perhaps?  I dunno.

I like the last one. Will use 'submodule_commits()'.

Cheers Heiko


Re: [PATCH v3 25/25] sequencer: mark all error messages for translation

2016-10-13 Thread Johannes Schindelin
Hi,

On Wed, 12 Oct 2016, Junio C Hamano wrote:

> Johannes Sixt  writes:
> 
> > Can we please have the following change instead? I think it makes sense
> > to deviate from the usual conventions in a case like this.
> 
> You have at least two independent changes relative to Dscho's
> version.  
> 
>  (1) Show line breaks more prominently by avoiding "\n\n" and
>  breaking the string at "\n"; this matches the way the
>  result would be displayed more closely to how the source
>  looks.
> 
>  (2) Ignore the usual indentation rule and have messages start at
>  the left end of the source.
> 
> Which one are you saying "makes sense" to?  Both?
> 
> I guess both can be grouped together into one theme: match the way
> the final output and the source code look.
> 
> If that is the motivation behind "makes sense", I'd prefer to see
> the change explained explicitly with that rationale in the log
> message.
> 
> Thanks.  I personally agree with that motivation (if the one I
> guessed above is your motivation, that is).

I agree with that motivation, but I decided to go about it in a way that
is more in line with the existing source code:

-- snipsnap --
diff --git a/sequencer.c b/sequencer.c
index 8e10bb5..1cf70f7 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -483,6 +483,20 @@ static char **read_author_script(void)
return env;
 }
 
+static const char staged_changes_advice[] =
+N_("you have staged changes in your working tree\n"
+"If these changes are meant to be squashed into the previous commit, run:\n"
+"\n"
+"  git commit --amend %s\n"
+"\n"
+"If they are meant to go into a new commit, run:\n"
+"\n"
+"  git commit %s\n"
+"\n"
+"In both cases, once you're done, continue with:\n"
+"\n"
+"  git rebase --continue\n");
+
 /*
  * If we are cherry-pick, and if the merge did not result in
  * hand-editing, we will hit this commit and inherit the original
@@ -509,18 +523,7 @@ static int run_git_commit(const char *defmsg, struct replay_opts *opts,
if (!env) {
const char *gpg_opt = gpg_sign_opt_quoted(opts);
 
-   return error(_("you have staged changes in your "
-  "working tree. If these changes are "
-  "meant to be\n"
-  "squashed into the previous commit, "
-  "run:\n\n"
-  "  git commit --amend %s\n\n"
-  "If they are meant to go into a new "
-  "commit, run:\n\n"
-  "  git commit %s\n\n"
-  "In both cases, once you're done, "
-  "continue with:\n\n"
-  "  git rebase --continue\n"),
+   return error(_(staged_changes_advice),
 gpg_opt, gpg_opt);
}
}


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Jeff King
On Thu, Oct 13, 2016 at 09:17:34AM +0200, Vegard Nossum wrote:

> Oops. I disabled gc a while ago; one reason I did that is that it takes
> a long time to run and it has a tendency to kick in at the worst time. I
> guess I should really put it in cron then.
> 
> I'm not sure if this is related, but I also had a problem with GitPython
> and large pack files in the past (" ValueError: Couldn't obtain fanout
> table or warning: packfile ./objects/pack/pack cannot be accessed")

Sounds like they didn't correctly implement the extra index fanout that
happens for packs above 2G. The old Grit library had a similar bug.
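
For reference, the lookup they get wrong looks roughly like this (a
sketch, not git's actual code; the struct is illustrative):

  #include <stdint.h>
  #include <arpa/inet.h>

  struct idx { uint32_t *offset32; uint32_t *offset64; }; /* mmap'd tables */

  static uint64_t idx_offset(const struct idx *p, uint32_t n)
  {
          uint32_t off = ntohl(p->offset32[n]);
          if (!(off & 0x80000000))
                  return off;     /* pack < 2G: plain 31-bit offset */
          /* MSB set: low 31 bits index the 8-byte large-offset table */
          off &= 0x7fffffff;
          return ((uint64_t)ntohl(p->offset64[2 * off]) << 32)
                 | ntohl(p->offset64[2 * off + 1]);
  }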

> and I have pack.packSizeLimit set to 512m to fix that.
> Although the whole repo is 17G so I guess it shouldn't be necessary to
> have that many pack files.

Using packSizeLimit does "solve" that problem, but it comes with its own
set of issues. There is a very good chance that your repository would be
much smaller than 17G as a single packfile, because Git does not allow
deltas across packs, and it does not optimize the placement of objects
to keep delta-related objects in a single pack. So you'll quite often be
storing full copies of objects that could otherwise be stored as a tiny
delta.

You might want to compare the resulting size for a full repack with and
without pack.packSizeLimit.
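
Something like this (untested; 100g is just an arbitrarily large
stand-in for "no limit"):

  git count-objects -v | grep size-pack       # total with the 512m limit
  git -c pack.packSizeLimit=100g repack -ad   # repack without the limit
  git count-objects -v | grep size-pack       # compare the new total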

But I agree that is not the cause of your thousand packs. They are more
likely the accumulated cruft of a thousand fetches.

-Peff


RE: [External] Re: Fork Errors

2016-10-13 Thread Vacha, Brian [USA]
Thanks, Khomoutov.  It turned out that when I went back to version 2.7.0 of
Git (as mentioned in this post https://github.com/git-for-windows/git/issues/776),
I no longer received the fork errors.  However, I then received a Permission
error and also wasn't thrilled that I had to use an old Git.  After some
installing and uninstalling in different ways, I remembered that when things
were working for me in the past, I had the GitHub shell installed.  So, I
installed the GitHub shell with the newest version of Git and was able to use
its Git Shell without problems.  Now, I can build my app with npm successfully.


-Original Message-
From: Konstantin Khomoutov [mailto:kostix+...@007spb.ru] 
Sent: Thursday, October 06, 2016 10:41 AM
To: Vacha, Brian [USA] 
Cc: git@vger.kernel.org
Subject: [External] Re: Fork Errors

On Thu, 6 Oct 2016 14:02:09 +
"Vacha, Brian [USA]"  wrote:

> When starting Git Bash, I receive the following errors:
> 0 [main] bash 18088 fork: child 14072 - died waiting for dll loading, 
> errno 11 bash: fork: retry: No child processes
> 1190419 [main] bash 18088 fork: child 8744 - died waiting for dll 
> loading, errno 11 bash: fork: retry: No child processes
> 3343518 [main] bash 18088 fork: child 12324 - died waiting for dll 
> loading, errno 11 bash: fork: retry: No child processes
> 7480858 [main] bash 18088 fork: child 17008 - died waiting for dll 
> loading, errno 11 bash: fork: retry: No child processes
> 15635036 [main] bash 18088 fork: child 8108 - died waiting for dll 
> loading, errno 11 bash: fork: Resource temporarily unavailable 
> bash-4.3$
> 
> My connection is great at 72 Mbps download and 93 Mbps upload.  I 
> don't receive other errors so it appears to be a Git Bash issue to me.

Have you tried searching through the Git for Windows bug tracker [1] for your
problem?  I'm pretty sure it was recently discussed there.
The issue #776 [2] looks like the one you're experiencing.

1. https://github.com/git-for-windows/git/issues
2. https://github.com/git-for-windows/git/issues/776


Re: What's cooking in git.git (Oct 2016, #03; Tue, 11)

2016-10-13 Thread Johannes Schindelin
Hi,

On Tue, 11 Oct 2016, Junio C Hamano wrote:

> Stefan Beller  writes:
> 
> > On Tue, Oct 11, 2016 at 2:06 PM, Junio C Hamano  wrote:
> >>
> >> * sb/submodule-ignore-trailing-slash (2016-10-10) 2 commits
> >>   (merged to 'next' on 2016-10-11 at e37425ed17)
> >>  + submodule: ignore trailing slash in relative url
> >>  + submodule: ignore trailing slash on superproject URL
> >>
> >>  A minor regression fix for "git submodule".
> >>
> >>  Will merge to 'master'.
> >
> > Going by the bug report, this *may* be more than
> > minor and worth merging down to maint as well, eventually.
> 
> The topic was forked at a reasonably old commit so that it can be
> merged as far down to maint-2.9 if we wanted to.  Which means the
> regression was fairly old and the fix is not all that urgent either.

And if you merge it to `master` and `maint`, I will humbly request to do
that at the same time as whatever fix for the regression I reported we
settle on.

I would *hate* to have a `master` (let alone a `maint`) that breaks in Git
for Windows SDK.

Ciao,
Dscho


Re: [PATCH 1/2] submodule: ignore trailing slash on superproject URL

2016-10-13 Thread Johannes Schindelin
Hi Stefan,

On Wed, 12 Oct 2016, Stefan Beller wrote:

> On Wed, Oct 12, 2016 at 6:30 AM, Johannes Schindelin
>  wrote:
> >
> > On Mon, 10 Oct 2016, Stefan Beller wrote:
> >
> >> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> >> index 444ec06..a7841a5 100644
> >> --- a/builtin/submodule--helper.c
> >> +++ b/builtin/submodule--helper.c
> >> @@ -95,6 +95,8 @@ static int chop_last_dir(char **remoteurl, int is_relative)
> >>   * NEEDSWORK: This works incorrectly on the domain and protocol part.
> >>   * remote_url  url  outcome  expectation
> >>   * http://a.com/b  ../c http://a.com/c   as is
> >> + * http://a.com/b/ ../c http://a.com/c   same as previous line, but
> >> + *                                       ignore trailing slash in url
> >>   * http://a.com/b  ../../c  http://c error out
> >>   * http://a.com/b  ../../../c   http:/c  error out
> >> + * http://a.com/b  ../../../../c    http:c   error out
> >> @@ -113,8 +115,8 @@ static char *relative_url(const char *remote_url,
> >>   struct strbuf sb = STRBUF_INIT;
> >>   size_t len = strlen(remoteurl);
> >>
> >> - if (is_dir_sep(remoteurl[len]))
> >> - remoteurl[len] = '\0';
> >> + if (is_dir_sep(remoteurl[len-1]))
> >> + remoteurl[len-1] = '\0';
> >>
> >>   if (!url_is_local_not_ssh(remoteurl) || is_absolute_path(remoteurl))
> >>   is_relative = 0;
> >> diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
> >> index bf2deee..82b98f8 100755
> >> --- a/t/t0060-path-utils.sh
> >> +++ b/t/t0060-path-utils.sh
> >> @@ -319,6 +319,7 @@ test_submodule_relative_url "../" "foo/bar" "../submodule" "../foo/submodule"
> >>  test_submodule_relative_url "../" "foo" "../submodule" "../submodule"
> >>
> >>  test_submodule_relative_url "(null)" "../foo/bar" "../sub/a/b/c" "../foo/sub/a/b/c"
> >> +test_submodule_relative_url "(null)" "../foo/bar/" "../sub/a/b/c" "../foo/sub/a/b/c"
> >>  test_submodule_relative_url "(null)" "../foo/bar" "../submodule" "../foo/submodule"
> >>  test_submodule_relative_url "(null)" "../foo/submodule" "../submodule" "../foo/submodule"
> >>  test_submodule_relative_url "(null)" "../foo" "../submodule" "../submodule"
> >
> > I see that this already made it to `next`. I saw that because it breaks
> > the build of Git for Windows (this was not noticed earlier because other
> > compile failures prevented the tests from running), as now the test cases
> > 173 and 177 of t0060 fail (*not* the newly introduced 163).
> >
> > Here is the output with -v -x:
> >
> > -- snip --
> > [...]
> > expecting success:
> > actual=$(git submodule--helper resolve-relative-url-test '(null)' '/usr/src/git/wip/t/trash directory.t0060-path-utils/.' '../.') &&
> > test "$actual" = 'C:/git-sdk-64/usr/src/git/wip/t/trash directory.t0060-path-utils/.'
> >
> > +++ git submodule--helper resolve-relative-url-test '(null)' '/usr/src/git/wip/t/trash directory.t0060-path-utils/.' ../.
> > ++ actual=C:/git-sdk-64/usr/src/git/wip/t/.
> > ++ test C:/git-sdk-64/usr/src/git/wip/t/. = 'C:/git-sdk-64/usr/src/git/wip/t/trash directory.t0060-path-utils/.'
> 
> So this wipes away one dir too many in a test that doesn't end with a
> dir separator

The problem is not *that* simple. You see, on Windows, there are no Unixy
paths (I used to say POSIX but that is not correct, if you think of VMS
paths looking quite a bit different from what Git expects). To appease
Git's assumption about the exact form of paths, the Bash (actually, the
POSIX emulation layer called MSYS2) converts paths of the form
/c/Windows/system32/drivers/etc/hosts to
C:/Windows/system32/drivers/etc/hosts.

Please note that paths that are already in the latter form are not
touched.

And note also that URLs (actually, anything matching "^[A-Za-z]+://") are
*also* not converted.

The paths that *are* converted can also be of the form /etc/passwd, in
which case the path is prefixed with the Windows directory in whose
usr/bin/ subdirectory the MSYS2 runtime lives.

In that latter case, i.e. Unixy paths being converted to Windows ones, the
very special case of a trailing "/." is truncated to "/" (IIRC there are
some Windows programs that do not take well to "." referring to a
directory, but my memory on that is flakey).
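
To summarize with a few examples (the C:/git-sdk-64 prefix is just where
this particular SDK happens to live):

  /c/Windows/system32  ->  C:/Windows/system32     (Unixy path, converted)
  C:/Windows/system32  ->  C:/Windows/system32     (already Windows form, untouched)
  http://a.com/b       ->  http://a.com/b          (matches "^[A-Za-z]+://", untouched)
  /etc/hosts           ->  C:/git-sdk-64/etc/hosts (prefixed with the runtime's root)
  /foo/.               ->  C:/git-sdk-64/foo/      (trailing "/." truncated to "/")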

> (In Windows that is '/' and '\' only, no dots?)

Most Windows functions handle forward slashes just fine. Certainly all
functions involved in the code path in question.

> > One very, very ugly workaround for this newly-introduced breakage would be
> > this:
> >
> > -- snip --
> > diff --git a/t/t0060-path-utils.sh b/t/t0060-path-utils.sh
> > index 82b98f8..abd82e9 100755
> > --- a/t/t0060-path-utils.sh
> > +++ b/t/t0060-path-utils.sh
> > @@ -328,11 +328,11 @@ test_submodule_relative_url "(null)" "./foo" "../submodule"

Re: Re* [PATCH v3 05/25] sequencer: eventually release memory allocated for the option values

2016-10-13 Thread Johannes Schindelin
Hi Junio,

On Wed, 12 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Tue, 11 Oct 2016, Junio C Hamano wrote:
> >
> >> The only reason why the OPT_STRDUP appeared convenient was because
> >> options[] element happened to use a field in the structure directly.
> >> The patch under discussion does an equivalent of
> >> 
> >> app.x_field = xstrdup_or_null(opt_x);
> >
> > Oh, that xstrdup_or_null() function slipped by me. My local patches use it
> > now.
> 
> It has slipped many people ;-)

Thanks, I feel better now ;-)

Ciao,
Dscho


Re: [PATCH v3 13/25] sequencer: prepare for rebase -i's commit functionality

2016-10-13 Thread Johannes Schindelin
Hi Junio,

On Wed, 12 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> Hmph, didn't we recently add parse_key_value_squoted() to build
> >> read_author_script() in builtin/am.c on top of it, so that this
> >> piece of code can also take advantage of and share the parser?
> >
> > I already pointed out that the author-script file may *not* be quoted.
> 
> I think my puzzlement comes from here.  What makes it OK for "am" to
> expect the contents of author-script file to be quoted but it is not
> OK to expect the same here?  What makes it not quoted for _this_
> reader, in other words?

The `git am` command is inherently *not* interactive, while the
interactive rebase, well, is.

As such, we must assume that enterprisey users did come up with scripts
that edit, or create, author-script files, and exploited the fact that the
interactive rebase previously sourced them.

Come to think of it, there is a bigger problem here, as users might have
abused the author-script to execute commands in rebase -i's own context.
Not sure we can do anything about that.

But the point stands, if anybody used unquoted, or differently quoted,
values in author-script, we should at least attempt within reason to
support that.
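
For example, git itself writes the file sq-quoted, like the first form
below, but we should also accept hand-edited variants like the second
(the values are made up):

  GIT_AUTHOR_NAME='A U Thor'
  GIT_AUTHOR_EMAIL='author@example.org'
  GIT_AUTHOR_DATE='@1234567890 +0200'

  GIT_AUTHOR_NAME="A U Thor"
  GIT_AUTHOR_EMAIL=author@example.org
  GIT_AUTHOR_DATE=@1234567890 +0200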

> I am not sure what you meant by "nominally related", but the purpose
> of the author-script in these two codepaths is the same, isn't it?

The purpose is, but the means aren't. As I pointed out above, the
interactive rebase is inherently much more interactive, and needs to be
much more forgiving in its input, than `git am`.

> Somebody leaves the author information from the source (either from
> an e-mailed patch or an existing commit), so that a later step can
> use that pieces of information left in the file when (re)creating a
> commit to record the tree made by using pieces of information from
> the source.
> 
> Are our use in the author-script in these two codepaths _already_
> inconsistent?  IOW, "am" never writes malformed unquoted values,
> while the sequencer writes out in a way that is randomly quoted or
> not quoted, iow, if you fed such an author-file to "am", it wouldn't
> understand it?

We heed Postel's Law: what the sequencer writes is in a very strict
format, but what the sequencer accepts need not be.

> I fully support your position to use different codepaths, if the
> file that has the same name and that is used for the same purpose
> uses different format in these two separate codepaths and the users
> already expect them to be different.  We obviously need to have two
> separate parsers.

Well, traditionally we *do* have separate parsers. I do not say that we
need to keep that, but given what I said above, it might not be a bad idea
to keep the lenient parser required by `git rebase -i` separate from the
one used by `git am` so that the latter can be faster (by making
assumptions the other parser cannot).

> But if that is not the case, IOW, if "am"'s author-script shares the
> same issue (i.e. "'am' initially writes the file properly quoted,
> but this or that can happen to change its quoting and we need to
> read from such a file"), then perhaps sharing needs to happen the
> other way around?  This patch may prepare "rebase -i" side for the
> "this or that" (I still do not know what they are) to allow the
> resulting file read correctly, but the same "this or that" can break
> what "am" has used and is in use there if that is the case, no?
> 
> What makes it OK for "am" to expect the contents of author-script
> file to be quoted but it is not OK to expect the same here?  What
> makes it not quoted for _this_ reader, and doesn't "am" share the
> same issue?

The fact that `git am` is *non-interactive*.

Ciao,
Dscho


Re: [PATCH v3 12/25] sequencer: remember the onelines when parsing the todo file

2016-10-13 Thread Johannes Schindelin
Hi Junio,

On Wed, 12 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> > +const char *arg;
> >> > +int arg_len;
> >> >  size_t offset_in_buf;
> >> 
> >> micronit: you can make it to size_t and lose the cast below, no?
> >
> > No. The primary users of arg_len call a printf() style function with %.*s,
> > expecting an int. So your suggestion would lose one cast, but introduce at
> > least four casts in return.
> 
> Actually my point was not the number of casts required, but more
> about using the correct type to store things.  Granted, I do not
> expect each of the lines would ever get too long to exceed "int"
> (but fit in "size_t") in practice, and from that point of view, one
> may be able to argue that "int" and "size_t" are both correct types,
> but that argument applies equally to offset_in_buf, so...

You cannot make it a size_t without changing the printf() standard to
expect size_t for %.*s, because that is the intended usage of that field
in these here patches.
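
Concretely, the intended call sites look like this (where item stands
for the todo-list entry these patches introduce):

  printf("%.*s", item->arg_len, item->arg);        /* arg_len is int */
  printf("%.*s", (int)item->arg_len, item->arg);   /* cast needed if size_t */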

Ciao,
Dscho


Re: interactive rebase should better highlight the not-applying commit

2016-10-13 Thread Johannes Schindelin
Hi Joshua,

On Wed, 12 Oct 2016, Joshua N Pritikin wrote:

> On Wed, Oct 12, 2016 at 06:24:37PM +0200, Johannes Schindelin wrote:
> 
> > But maybe I read it all wrong and you do want to make this happen
> > yourself, and you simply want a little advice how to go about it?
> 
> Ugh, if you insist.

I don't. If you want that feature to see the light of day, you should
insist yourself ;-)

> > > On Tue, Oct 11, 2016 at 02:25:19PM -0700, Stefan Beller wrote:
> > > > On Tue, Oct 11, 2016 at 12:07 PM, Joshua N Pritikin 
> > > >  wrote:
> > > > However, IIUC, rebase is currently being completely rewritten/ported
> > > > to C, where it is easier to add color support, as we do have some
> > > > color support in there already.
> > > 
> > > Sounds great. Is there a beta release that I can try out?
> > 
> > There is no release as such, unless you count Git for Windows v2.10.0.
> 
> Nope, that doesn't count. ;-)

Sometimes honesty goes too far. You basically told me that what I work on
does not count. That does not exactly curry my favor.

> > But you can try the `interactive-rebase` branch of
> > https://github.com/dscho/git; please note, though, that my main aim
> > was to be as faithful as possible in the conversion (modulo speed, of
> > course).
> 
> Hm OK
> 
> > > Sometimes I do a rebase to fix some tiny thing 10-15 commits from HEAD.
> > > Maybe only 1 file is affected and there are no merge conflicts, but when
> > > rebase reapplies all the commits, the timestamps of lots of unmodified
> > > files change even though they are unmodified compared to before the
> > > rebase.
> > 
> > Well, they *were* modified, right?
> 
> Were they? Isn't that just an artefact of the implementation?

Yes, they were modified, as the todo script you saved for the interactive
rebase to perform told it to cherry-pick those changes. That is a worktree
operation, operating on files, not a repository operation working on
objects in Git's database.

> > A workaround would be to create a new worktree using the awesome `git
> > worktree` command, perform the rebase there (on an unnamed branch --
> > AKA "detached HEAD", no relation to Helloween), and then come back to
> > the original worktree and reset --hard to the new revision. That reset
> > would detect that there are actually no changes required to said
> > files.
> 
> What would be the problem with doing this by default? Or could it be a
> configuration option that can be enabled?

It could definitely be a new feature that is triggered by a new (opt-in)
configuration option.

It cannot be on by default, at least not in the short run, because those
cherry-picks can fail with merge conflicts and power users of the
interactive rebase expect those conflicts to show in the current worktree.
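
Spelled out, the workaround I described above would be roughly this
(paths made up; flag spellings may differ between git versions):

  git worktree add --detach ../rebase-tmp HEAD
  (cd ../rebase-tmp && git rebase -i HEAD~15)
  git reset --hard "$(git -C ../rebase-tmp rev-parse HEAD)"
  rm -rf ../rebase-tmp && git worktree prune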

Ciao,
Johannes


Re: problem with git worktree and git svn

2016-10-13 Thread Duy Nguyen
On Thu, Oct 13, 2016 at 8:52 AM, Eric Wong  wrote:
> +sub svn_dir {
> +   my $git_dir = scalar @_ ? $_[0] : $ENV{GIT_DIR};
> +   my $common = $ENV{GIT_COMMON_DIR} || "$git_dir/commondir";
> +   $git_dir .= '/'.::file_to_s($common) if -e $common;
> +   my $svn_dir = $git_dir . '/svn';
> +   $svn_dir =~ tr!/!/!s;
> +   $svn_dir;
> +}


If this were a shell script, this function could be just

svn_dir() {
git rev-parse --git-path svn
}

which should give you the correct path in either single or multi-worktree
context, and you don't need to bother with details like
$GIT_COMMON_DIR. But I don't know how the Perl bindings are implemented;
I don't know if we have something similar (or how easy it would be to
add, like a Git::git_path()).
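
If not, an untested sketch in Perl, simply shelling out via Git.pm's
command_oneline, could look like:

  sub svn_dir {
          # let rev-parse resolve $GIT_COMMON_DIR and friends for us
          command_oneline(qw/rev-parse --git-path svn/);
  }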

I don't know much about git-svn, but from the look of it I agree
replacing $ENV{GIT_DIR}/svn with svn_dir() should fix it, assuming
that you don't hard code $ENV{GIT_DIR}/blahblah somewhere else. I
don't see any other potential problems (from multi-worktree
perspective).
-- 
Duy


Re: [PATCH] worktree: allow the main branch of a bare repository to be checked out

2016-10-13 Thread Duy Nguyen
On Thu, Oct 13, 2016 at 1:50 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> Dennis Kaarsemaker  writes:
>>
>>> OK, so here it is as a proper patch.
>
> Here is what I queued.  Duy, what do you think?  It seems OK to me.

Ack. Thanks both.
-- 
Duy


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 12:45 AM, Junio C Hamano wrote:
> Vegard Nossum  writes:
>
>> A closer inspection reveals the problem to really be that this is an
>> extremely hot path with more than -- holy cow -- 4,106,756,451
>> iterations on the 'packed_git' list for a single 'git fetch' on my
>> repository. I'm guessing the patch above just made the inner loop
>> ever so slightly slower.
>
> Very plausible, and this ...
>
>> My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
>> and 4 tmp_pack_* files).
>
> ... may explain why nobody else has seen a difference.
>
> Is there a reason why your repository has that many pack files?  Is
> automatic GC not working for some reason?

Oops. I disabled gc a while ago; one reason I did that is that it takes
a long time to run and it has a tendency to kick in at the worst time. I
guess I should really put it in cron then.

I'm not sure if this is related, but I also had a problem with GitPython
and large pack files in the past (" ValueError: Couldn't obtain fanout
table or warning: packfile ./objects/pack/pack cannot be accessed")
and I have pack.packSizeLimit set to 512m to fix that.
Although the whole repo is 17G so I guess it shouldn't be necessary to
have that many pack files.

Will try Jeff's patch, then a gc. Thanks!


Vegard


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 01:47 AM, Jeff King wrote:

On Wed, Oct 12, 2016 at 07:18:07PM -0400, Jeff King wrote:


Also, is it possible to make the repository in question available? I
might be able to reproduce based on your description, but it would save
time if I could directly run gdb on your example.


I won't be able to make the repository available, sorry.


I tried this by making a bunch of packs in linux.git (my standard "this
is pretty big" repo), like so:

  for i in $(seq 1000); do
git rev-list --objects HEAD~$((i+1))..HEAD~$i |
git pack-objects --delta-base-offset .git/objects/pack/pack
  done

and then doing a 25,000-object fetch from upstream (no significance to
the number; that's just how far behind upstream I happened to be).

However, I didn't notice any regression. In fact, it was much _slower_
under v1.9.0, because that older version didn't have threaded index-pack.

If you can't share the repo directly, can you tell us more about your
fetch? How many objects are in your repository? How many objects are
fetched? How many refs are there on the remote side?


The fetch doesn't actually get anything from the remote as everything is
already up to date (that makes the 2m40s times even more frustrating in
a way :-)). Here's count-objects:

$ git count-objects -v
warning: garbage found: .git/objects/pack/tmp_pack_pAZcu4
warning: garbage found: .git/objects/pack/tmp_pack_KhzrrI
warning: garbage found: .git/objects/pack/tmp_pack_mycfro
warning: garbage found: .git/objects/pack/tmp_pack_2kxKOn
count: 51609
size: 288768
in-pack: 23902336
packs: 1044
size-pack: 16588157
prune-packable: 48628
garbage: 4
size-garbage: 84792

There are some 20k refs on the remote, closer to 25k locally.

I'll try to get a profile (without your patch) before doing a gc run.


Vegard


Re: Huge performance bottleneck reading packs

2016-10-13 Thread Vegard Nossum

On 10/13/2016 01:01 AM, Jeff King wrote:

On Thu, Oct 13, 2016 at 12:30:52AM +0200, Vegard Nossum wrote:


However, the commit found by 'git blame' above appears just fine to me;
I haven't been able to spot a bug in it.

A closer inspection reveals the problem to really be that this is an
extremely hot path with more than -- holy cow -- 4,106,756,451
iterations on the 'packed_git' list for a single 'git fetch' on my
repository. I'm guessing the patch above just made the inner loop
ever so slightly slower.

My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
and 4 tmp_pack_* files).


Yeah. I agree that the commit you found makes the check a little more
expensive, but I think the root of the problem is calling
prepare_packed_git_one many times. This _should_ happen once for each
pack at program startup, and possibly again if we need to re-scan the
pack directory to account for racing with a simultaneous repack.

The latter is generally triggered when we fail to look up an object we
expect to exist. So I'd suspect 45e8a74 (has_sha1_file: re-check pack
directory before giving up, 2013-08-30) is playing a part. We dealt with
that to some degree in 0eeb077 (index-pack: avoid excessive re-reading
of pack directory, 2015-06-09), but it would not surprise me if there is
another spot that needs similar treatment.
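
The general shape of that re-scan logic is (function names here are
illustrative, not the exact ones in sha1_file.c):

  static int object_exists(const unsigned char *sha1)
  {
          if (find_in_packs(sha1) || find_loose(sha1))
                  return 1;
          /* miss: a simultaneous repack may have replaced the packs
           * we knew about, so re-scan objects/pack/ once and retry */
          reprepare_packed_git();
          return find_in_packs(sha1);
  }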

Does the patch below help?


Yes, ~2m10s -> ~1m25s when I test a git fetch this morning (the remaining
variation in time may be due to other programs using the CPU, but I ran
with and without the patch multiple times and the difference is
consistent).

Thanks,


Vegard