date:20140212

Re: [PATCH v2] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread Junio C Hamano

sza...@chromium.org writes:

> From 0a59547f3e95ddecf7606c5f259ae6177c5a104f Mon Sep 17 00:00:00 2001

Please drop this line.

> From: Stefan Zager 

Please drop this line and instead have it in your e-mail header.

> Date: Mon, 10 Feb 2014 16:55:12 -0800

The date in your e-mail header, which is the first time general
public saw this particular version of the patch, is used by default
as the author time.  Unless there is a compelling reason to override
that with an in-body header, please drop this line.

> Subject: [PATCH] Make the global packed_git variable static to sha1_file.c.

Please drop this line and instead have it in your e-mail header.

> This patch is a pure refactor with no functional changes,...
>
> diff --git a/builtin/count-objects.c b/builtin/count-objects.c
> index a7f70cb..6554dfe 100644
> --- a/builtin/count-objects.c
> +++ b/builtin/count-objects.c
> @@ -83,14 +83,32 @@ static char const * const count_objects_usage[] = {
>   NULL
>  };
>  
> +struct pack_data {
> + unsigned long packed;
> + off_t size_pack;
> + unsigned long num_pack;
> +};
> +
> +int pack_data_fn(struct packed_git *p, void *data)

Can't/shouldn't this be static?

Also I'd suggest s/pack_data_fn/collect_pack_data/ or something.
"_fn" may be a good suffix for typedef'ed typename used in a
callback function, but for a concrete function, it only adds noise
without giving us anything new.

Yes, I know there are a few existing violators, but we
shouldn't make the codebase worse, using their existence due
to past carelessness as an excuse.

> diff --git a/cache.h b/cache.h
> index dc040fb..542a9d9 100644
> --- a/cache.h
> +++ b/cache.h
> ...
> +/* The 'hint' argument is for the commonly-used 'last found pack' 
> optimization.
> + * It can be NULL.
> + */

/*
 * Please try to have opening slash-asterisk and closing
 * asterisk-slash in a multi-line comment block on their
 * own lines by themselves.
 */

> +extern void foreach_packed_git(packed_git_foreach_fn fn, struct packed_git 
> *hint, void *data);
> +
> +extern size_t packed_git_count();
> +extern size_t packed_git_local_count();

extern size_t packed_git_count(void);
extern size_t packed_git_local_count(void);

> diff --git a/sha1_file.c b/sha1_file.c
> index 6e8c05d..aeeb7e6 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -60,6 +60,7 @@ static struct cached_object empty_tree = {
>   0
>  };
>  
> +static struct packed_git *packed_git;
>  static struct packed_git *last_found_pack;
>  
>  static struct cached_object *find_cached_object(const unsigned char *sha1)
> @@ -468,7 +469,6 @@ static unsigned int pack_open_fds;
>  static unsigned int pack_max_fds;
>  static size_t peak_pack_mapped;
>  static size_t pack_mapped;
> -struct packed_git *packed_git;

Hmm, any particular reason why only this variable and not others are
moved up?

> @@ -1091,6 +1091,37 @@ struct packed_git *add_packed_git(const char *path, 
> int path_len, int local)
>   return p;
>  }
>  
> +void foreach_packed_git(packed_git_foreach_fn fn, struct packed_git *hint, 
> void *data)
> +{
> + struct packed_git *p;
> + if (hint && ((*fn)(hint, data)))
> + return;
> + for (p = packed_git; p; p = p->next)
> + if (p != hint && (*fn)(p, data))
> + return;

(mental note) In the new API, a non-zero return signals an early
return/break from the loop.

> +}
> +
> +size_t packed_git_count()

size_t packed_git_count(void)

> +{
> + size_t res = 0;
> + struct packed_git *p;
> +
> + for (p = packed_git; p; p = p->next)
> + ++res;

When pre- or post- increment does not make any difference (i.e. you
do not use its value), please stick to post-increment.  pre-increment
looks unusual in our codebase and becomes distracting while reading
it through.

Same comments for packed-git-local-count.

> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
> index 541667f..bc3074b 100644
> --- a/builtin/pack-objects.c
> +++ b/builtin/pack-objects.c
> @@ -900,14 +900,45 @@ static int no_try_delta(const char *path)
>   return 0;
>  }
>  
> +struct find_pack_data {
> + const unsigned char *sha1;
> + off_t offset;
> + struct packed_git *found_pack;
> + int exclude;
> + int found_non_local_pack;
> + int found_pack_keep;
> +};
> +
> +static int find_pack_fn(struct packed_git *p, void *data)
> +{
> + struct find_pack_data *fpd = (struct find_pack_data *) data;
> + off_t offset = find_pack_entry_one(fpd->sha1, p);
> + if (offset) {
> + if (!fpd->found_pack) {
> + if (!is_pack_valid(p)) {
> + warning("packfile %s cannot be accessed", 
> p->pack_name);
> + return 0;
> + }
> + fpd->offset = offset;
> + fpd->found_pack = p;
> + }
> + if (fpd->exclude)
> + return 1;
> +

Re: [PATCH] tests: turn on network daemon tests by default

2014-02-12 Thread Jeff King

On Tue, Feb 11, 2014 at 03:58:27PM -0800, Junio C Hamano wrote:

> Sure. One immediate complaint is that I would probably need to do
> something like this in the build automation:
> 
>   if testing a branch without this patch
> then
>   : do nothing
>   else
>   GIT_TEST_GIT_DAEMON=false
>   fi
> 
> Arguably, it is the fault of the current/original code that treated
> *any* non-empty value that is set in the environment variable as
> "true"---if it paid attention to GIT_TEST_GIT_DAEMON=NoThanks, we
> wouldn't have to have a workaround like this.

Yes, I didn't really think about build config that works reliably
between both versions (though personally, I think you should be building
with GIT_TEST_GIT_DAEMON=true :) ).

> I wonder if GIT_TEST_X=$(test_tristate "$GIT_TEST_X") pattern can be
> made a bit more friendly, though.  For example, can we behave
> differently depending on the reason why $GIT_TEST_X is empty?
> 
>  - People who have *not* been opting in to the expensive tests have
>not done anything special; GIT_TEST_X environment variable did
>not exist for them (i.e. unset), and we used to skip when
>"$GIT_TEST_X" is an empty string.
> 
>  - We want to encourage people who do not care to run these tests.
>If people do not do anything, their $GIT_TEST_X will continue to
>be an empty string without GIT_TEST_X variable in the
>environment.
> 
> If we let people who *do* want to opt out of the expensive tests by
> explicitly setting $GIT_TEST_X to an empty string in the new scheme,
> wouldn't the transition go a lot smoother?

Hmm. So you are suggesting that the old code treated "undefined" and
"empty" the same (as "false"). But that in the new code, we would treat
them _differently_, taking undefined to mean "auto" and empty to mean
"false". I suppose that works, but it is rather unfortunate that the end
state we are left with (for all time) makes such a confusing and subtle
distinction.

I think this should work OK with the existing Makefile conventions. That
is, we do not ever set GIT_TEST_HTTPD in the Makefile ourselves, but
rely on it being either unset or set to whatever the user likes (this is
opposed to something like CFLAGS, where the distinction is long gone).

So I'm not excited about it, but I do not think there is any other
loophole through which we can maintain compatibility. If that's
important, I think we have to do it.

> The caller may have to pass the name of the variable:
> 
>   GIT_TEST_DAEMON=$(test_tristate GIT_TEST_DAEMON)

I don't think that's a big deal. I actually was tempted to just make
this:

  test_normalize_tristate GIT_TEST_DAEMON

in the first place, since you would always want to look at the
normalized value from there on out.

> and then the callee may become
> 
>   test_tristate () {
>   variable=$1
> if eval "test x\"\${$variable+isset}\" = xisset"

Hmm, today I learned about "{foo:+bar}" versus "${foo+bar}". I'm not
sure how that bit of shell trivia escaped me for so many years.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Process of updating a local branch from another with tracking

2014-02-12 Thread Robert Dailey

I have the following alias defined:
sync = "!f() { cbr=$(git rev-parse --abbrev-ref HEAD); git co $1 &&
git pull && git co $cbr && git rebase $1; }; f"

The goal is to basically update a local branch which tracks a branch
on a remote, and then rebase my local topic branch onto that updated
local branch.

I do this instead of just rebasing onto origin/master. Example:

git checkout master
git pull --rebase origin master
git checkout topic1
git rebase master

I could do this instead but not sure if it is recommended:

git checkout topic1
git fetch
git rebase origin/master

Any thoughts on the alias I created? I'm a Windows user and still new
to linux stuff, so if it sucks or could be simplified just let me
know.

And as a secondary question, just curious what people think about
rebasing onto a remote tracking branch. When I do merges I usually
refer to the remote branch, but during rebase I use the local branch
instead, but I don't know if there is any functional reason to not
skip the local branch and go straight to the remote tracking branch
(it saves a step).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

What's cooking in git.git (Feb 2014, #04; Wed, 12)

2014-02-12 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

As a workaround to make life easier for third-party tools, the
upcoming major release will be called "Git 1.9.0" (not "Git 1.9").
The first maintenance release for it will be "Git 1.9.1", and the
major release after "Git 1.9.0" will either be "Git 2.0.0" or "Git
1.10.0".

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* al/docs (2014-02-11) 4 commits
 - docs/git-blame: explain more clearly the example pickaxe use
 - docs/git-clone: clarify use of --no-hardlinks option
 - docs/git-remote: capitalize first word of initial blurb
 - docs/merge-strategies: remove hyphen from mis-merges

 A handful of documentation updates, all trivially harmless.

 Will merge to 'next'.


* jk/test-ports (2014-02-10) 2 commits
 - tests: auto-set git-daemon port
 - tests: auto-set LIB_HTTPD_PORT from test name
 (this branch is tangled with nd/http-fetch-shallow-fix.)

 Avoid having to assign port number to be used in tests manually.

 Will merge to 'next'.


* nd/daemonize-gc (2014-02-10) 2 commits
 - gc: config option for running --auto in background
 - daemon: move daemonize() to libgit.a

 Allow running "gc --auto" in the background.

 Will merge to 'next'.


* nd/gitignore-trailing-whitespace (2014-02-10) 2 commits
 - dir: ignore trailing spaces in exclude patterns
 - dir: warn about trailing spaces in exclude patterns

 Warn and then ignore trailing whitespaces in .gitignore files,
 unless they are quoted for fnmatch(3), e.g. "path\ ".


* nd/log-show-linear-break (2014-02-10) 1 commit
 - log: add --show-linear-break to help see non-linear history


* ss/completion-rec-sub-fetch-push (2014-02-11) 1 commit
 - completion: teach --recurse-submodules to fetch, pull and push


* ks/tree-diff-more (2014-02-12) 16 commits
 - tree-diff: reuse base str(buf) memory on sub-tree recursion
 - tree-diff: no need to call "full" diff_tree_sha1 from show_path()
 - tree-diff: rework diff_tree interface to be sha1 based
 - tree-diff: remove special-case diff-emitting code for empty-tree cases
 - tree-diff: simplify tree_entry_pathcmp
 - tree-diff: show_path prototype is not needed anymore
 - tree-diff: rename compare_tree_entry -> tree_entry_pathcmp
 - tree-diff: move all action-taking code out of compare_tree_entry()
 - tree-diff: don't assume compare_tree_entry() returns -1,0,1
 - FIXUP!
 - tree-diff: consolidate code for emitting diffs and recursion in one place
 - tree-diff: show_tree() is not needed
 - tree-diff: no need to pass match to skip_uninteresting()
 - tree-diff: no need to manually verify that there is no mode change for a path
 - combine-diff: move changed-paths scanning logic into its own function
 - combine-diff: move show_log_first logic/action out of paths scanning
 (this branch uses ks/combine-diff and ks/tree-diff-walk.)


* jh/note-trees-record-blobs (2014-02-12) 1 commit
 - notes: Disallow reusing non-blob as a note object


* jk/run-network-tests-by-default (2014-02-12) 1 commit
 - tests: turn on network daemon tests by default

 Teach "make test" to run networking tests when possible by default.

 Needs a bit more work.
 e.g. $gmane/242013

--
[Stalled]

* po/everyday-doc (2014-01-27) 1 commit
 - Make 'git help everyday' work

 This may make the said command to emit something, but the source is
 not meant to be formatted into a manual pages to begin with, and
 also its contents are a bit stale.  It may be a good first step in
 the right direction, but needs more work to at least get the
 mark-up right before public consumption.

 Will hold.


* jk/branch-at-publish-rebased (2014-01-17) 5 commits
 - t1507 (rev-parse-upstream): fix typo in test title
 - implement @{publish} shorthand
 - branch_get: provide per-branch pushremote pointers
 - branch_get: return early on error
 - sha1_name: refactor upstream_mark

 Give an easier access to the tracking branches from "other" side in
 a triangular workflow by introducing B@{publish} that works in a
 similar way to how B@{upstream} does.

 Meant to be used as a basis for whatever Ram wants to build on.

 Will hold.


* rb/merge-prepare-commit-msg-hook (2014-01-10) 4 commits
 - merge: drop unused arg from abort_commit method signature
 - merge: make prepare_to_commit responsible for write_merge_state
 - t7505: ensure cleanup after hook blocks merge
 - t7505: add missing &&

 Expose more merge states (e.g. $GIT_DIR/MERGE_MODE) to hooks that
 run during "git merge".  The log message stresses too much on one
 hook, prepare-commit-msg, but it would equally apply to other hooks
 like post-merge, I think.

 Waiting for a reroll.


* jl/submodule-recursive-checkout (2013-12-26) 5 commits
 - Teach checkout

Re: [PATCH] tests: turn on network daemon tests by default

2014-02-12 Thread Jeff King

On Wed, Feb 12, 2014 at 11:06:43AM -0800, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Agreed, but I think the only way to know the size of those fallouts is
> > to try it and see who complains.  I would not normally be so cavalier
> > with git itself, but I think for the test infrastructure, we have a
> > small, tech-savvy audience that can help us iterate on it without too
> > much pain.
> 
> There is another.

I somehow pictured this while reading your email:
http://youtu.be/X4JVcvR7IM0

> $ GIT_TEST_HTTPD=false sh t5537-fetch-shallow.sh 
> ok 1 - setup
> ok 2 - setup shallow clone
> ok 3 - clone from shallow clone
> ok 4 - fetch from shallow clone
> ok 5 - fetch --depth from shallow clone
> ok 6 - fetch --unshallow from shallow clone
> ok 7 - fetch something upstream has but hidden by clients shallow boundaries
> ok 8 - fetch that requires changes in .git/shallow is filtered
> ok 9 - fetch --update-shallow
> error: Can't use skip_all after running some tests

Ah, yeah, I did not notice that t5537 does its own special handling of
GIT_TEST_HTTPD. I think it is wrong to do so, and it is already buggy in
the case that GIT_TEST_HTTPD is set, but apache cannot be started.

E.g., with the current "master":

  $ sudo chmod -x `which apache2`
  $ GIT_TEST_HTTPD=1 ./t5537-fetch-shallow.sh
  ok 1 - setup
  ok 2 - setup shallow clone
  ok 3 - clone from shallow clone
  ok 4 - fetch from shallow clone
  ok 5 - fetch --depth from shallow clone
  ok 6 - fetch --unshallow from shallow clone
  ok 7 - fetch something upstream has but hidden by clients shallow boundaries
  ok 8 - fetch that requires changes in .git/shallow is filtered
  ok 9 - fetch --update-shallow
  error: Can't use skip_all after running some tests

lib-httpd was never designed to be included from anywhere except the
beginning of the file. But that wouldn't be right for t5537, because it
wants to run some of the tests, even if apache setup fails. The right
way to do it is probably to have lib-httpd do all of its work in a lazy
prereq. I don't know how clunky that will end up, though; it might be
simpler to just move the shallow http test into one of the http-fetch
scripts.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] tests: turn on network daemon tests by default

2014-02-12 Thread Junio C Hamano

Jeff King  writes:

>   test_normalize_tristate GIT_TEST_DAEMON

Heh, great minds think alike.  This is what I am playing with,
without committing (because I do like your "ask config if this is a
kind of various boolean 'false' representations, which I haven't
managed to add to it).


 t/lib-git-daemon.sh |  2 +-
 t/lib-httpd.sh  |  2 +-
 t/test-lib-functions.sh | 45 +++--
 3 files changed, 29 insertions(+), 20 deletions(-)

diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh
index 36106de..615bf5d 100644
--- a/t/lib-git-daemon.sh
+++ b/t/lib-git-daemon.sh
@@ -16,7 +16,7 @@
 #  stop_git_daemon
 #  test_done
 
-GIT_TEST_GIT_DAEMON=$(test_tristate "$GIT_TEST_GIT_DAEMON")
+test_tristate GIT_TEST_GIT_DAEMON
 if test "$GIT_TEST_GIT_DAEMON" = false
 then
skip_all="git-daemon testing disabled (unset GIT_TEST_GIT_DAEMON to 
enable)"
diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 583fb14..f9c2e22 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -30,7 +30,7 @@
 # Copyright (c) 2008 Clemens Buchacher 
 #
 
-GIT_TEST_HTTPD=$(test_tristate "$GIT_TEST_HTTPD")
+test_tristate GIT_TEST_HTTPD
 if test "$GIT_TEST_HTTPD" = false
 then
skip_all="Network testing disabled (unset GIT_TEST_HTTPD to enable)"
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 3cc09c0..21c5214 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -716,27 +716,36 @@ perl () {
command "$PERL_PATH" "$@"
 }
 
-# Normalize the value given in $1 to one of "true", "false", or "auto". The
-# result is written to stdout. E.g.:
+# Given a variable $1, normalize the value of it to one of "true",
+# "false", or "auto" and store the result to it.
 #
-# GIT_TEST_HTTPD=$(test_tristate "$GIT_TEST_HTTPD")
+# test_tristate GIT_TEST_HTTPD
 #
+# A variable set to an empty string is set to 'false'.
+# A variable set to 'false' or 'auto' keeps its value.
+# Anything else is set to 'true'.
+# An unset variable defaults to 'auto'.
+#
+# The last rule is to allow people to set the variable to an empty
+# string and export it to decline testing the particular feature
+# for versions both before and after this change.  We used to treat
+# both unset and empty variable as a signal for "do not test" and
+# took any non-empty string as "please test".
+
 test_tristate () {
-   case "$1" in
-   ""|auto)
-   echo auto
-   ;;
-   *)
-   # Rely on git-config to handle all the variants of
-   # true/1/on/etc that we allow.
-   if ! git -c magic.hack="$1" config --bool magic.hack 2>/dev/null
-   then
-   # If git-config failed, it was some non-bool value like
-   # YesPlease. Default this to "true" for historical
-   # compatibility.
-   echo true
-   fi
-   esac
+   if eval "test x\"\${$1+isset}\" = xisset"
+   then
+   # explicitly set
+   eval "
+   case \"\$$1\" in
+   '') $1=false ;;
+   false | auto) ;;
+   *)  $1=true ;;
+   esac
+   "
+   else
+   eval "$1=auto"
+   fi
 }
 
 # Exit the test suite, either by skipping all remaining tests or by
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Junio C Hamano

On Wed, Feb 12, 2014 at 12:27 PM, Stefan Zager  wrote:
> On Wed, Feb 12, 2014 at 12:06 PM, Junio C Hamano  wrote:
>> Stefan Zager  writes:
>>
>> Calls to write (and preparation of data to be written) will then
>> remain single-threaded, but it sounds like that codepath is not the
>> bottleneck in your measurement, so
>
> Yes, I considered that as well.  At a minimum, that would still
> require attr.c to implement thread locking, since attribute files must
> be parsed to look for stream filters.  I have already done that work.

I would have imagined that use of the attribute system belongs to "write and
preparation of data to be written" category, i.e. the single threaded
part of the
kludge I outlined.

> But I'm not sure it's the best long-term approach to add convoluted
> custom threading solutions to each git operation as it appears on the
> performance radar.

Yeah, it depends on how clean and non-intrusive an abstraction we can make.
The kludge I outlined is certainly not very pretty.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Mike Hommey

On Wed, Feb 12, 2014 at 08:15:24PM +0100, David Kastrup wrote:
> Stefan Zager  writes:
> 
> > On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup  wrote:
> >
> >> Really, give the above patch a try.  I am taking longer to finish it
> >> than anticipated (with a lot due to procrastination but that is,
> >> unfortunately, a large part of my workflow), and it's cutting into my
> >> "paychecks" (voluntary donations which to a good degree depend on timely
> >> and nontrivial progress reports for my freely available work on GNU
> >> LilyPond).
> >
> > I will give that a try.  How much of a performance improvement have
> > you clocked?
> 
> Depends on file type and size.  With large files with lots of small
> changes, performance improvements get more impressive.
> 
> Some ugly real-world examples are the Emacs repository, src/xdisp.c
> (performance improvement about a factor of 3), a large file in the style
> of /usr/share/dict/words clocking in at a factor of about 5.
> 
> Again, that's with an SSD and ext4 filesystem on GNU/Linux, and there
> are no improvements in system time (I/O) except for patch 4 of the
> series which helps perhaps 20% or so.
> 
> So the benefits of the patch will come into play mostly for big, bad
> files on Windows: other than that, the I/O time is likely to be the
> dominant player anyway.

How much fragmentation does that add to the files, though?

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Thomas Adam

On 12 February 2014 20:59, Jeff King  wrote:
> +sub decode {
> +   my $orig = shift;
> +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
> +   return defined $decoded ?

I'd still advocate checking $@ here, rather than the defined $decoded check.

> +  ($decoded, sub { encode_utf8(shift) }) :
> +  ($orig, sub { shift });
> +}
> +

-- Thomas Adam
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Jeff King

On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote:

> On 12 February 2014 20:59, Jeff King  wrote:
> > +sub decode {
> > +   my $orig = shift;
> > +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
> > +   return defined $decoded ?
> 
> I'd still advocate checking $@ here, rather than the defined $decoded check.

I don't mind changing it, but for my edification, what is the advantage?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Mike Hommey

On Wed, Feb 12, 2014 at 12:00:19PM +0100, Karsten Blees wrote:
> Am 12.02.2014 04:43, schrieb Duy Nguyen:
> > On Wed, Feb 12, 2014 at 9:02 AM, Robin H. Johnson  
> > wrote:
> >> On Tue, Feb 11, 2014 at 05:54:51PM -0800,  Stefan Zager wrote:
> >>> We in the chromium project have a keen interest in adding threading to
> >>> git in the pursuit of performance for lengthy operations (checkout,
> >>> status, blame, ...).  Our motivation comes from hitting some
> >>> performance walls when working with repositories the size of chromium
> >>> and blink:
> >> +1 from Gentoo on performance improvements for large repos.
> >>
> >> The main repository in the ongoing Git migration project looks to be in
> >> the 1.5GB range (and for those that want to propose splitting it up, we
> >> have explored that option and found it lacking), with very deep history
> >> (but no branches of note, and very few tags).
> > 
> > From v1.9 shallow clone should work for all push/pull/clone... so
> > history depth does not matter (on the client side). As for
> > gentoo-x86's large worktree, using index v4 and avoid full-tree
> > operations (e.g. "status .", not "status"..) should make all
> > operations reasonably fast. I plan to make "status" fast even without
> > path limiting with the help of inotify, but that's not going to be
> > finished soon. Did I miss anything else?
> > 
> 
> Regarding git-status on msysgit, enable core.preloadindex and core.fscache 
> (as of 1.8.5.2).
> 
> There's no inotify on Windows, and I gave up using ReadDirectoryChangesW to
> keep fscache up to date, as it _may_ report DOS file names (e.g. C:\PROGRA~1
> instead of C:\Program Files).

You can use GetLongPathNameW to get the latter from the former.

Mike
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Karsten Blees

Am 13.02.2014 00:03, schrieb Mike Hommey:
> On Wed, Feb 12, 2014 at 12:00:19PM +0100, Karsten Blees wrote:
>> Am 12.02.2014 04:43, schrieb Duy Nguyen:
>>> On Wed, Feb 12, 2014 at 9:02 AM, Robin H. Johnson  
>>> wrote:
 On Tue, Feb 11, 2014 at 05:54:51PM -0800,  Stefan Zager wrote:
> We in the chromium project have a keen interest in adding threading to
> git in the pursuit of performance for lengthy operations (checkout,
> status, blame, ...).  Our motivation comes from hitting some
> performance walls when working with repositories the size of chromium
> and blink:
 +1 from Gentoo on performance improvements for large repos.

 The main repository in the ongoing Git migration project looks to be in
 the 1.5GB range (and for those that want to propose splitting it up, we
 have explored that option and found it lacking), with very deep history
 (but no branches of note, and very few tags).
>>>
>>> From v1.9 shallow clone should work for all push/pull/clone... so
>>> history depth does not matter (on the client side). As for
>>> gentoo-x86's large worktree, using index v4 and avoid full-tree
>>> operations (e.g. "status .", not "status"..) should make all
>>> operations reasonably fast. I plan to make "status" fast even without
>>> path limiting with the help of inotify, but that's not going to be
>>> finished soon. Did I miss anything else?
>>>
>>
>> Regarding git-status on msysgit, enable core.preloadindex and core.fscache 
>> (as of 1.8.5.2).
>>
>> There's no inotify on Windows, and I gave up using ReadDirectoryChangesW to
>> keep fscache up to date, as it _may_ report DOS file names (e.g. C:\PROGRA~1
>> instead of C:\Program Files).
> 
> You can use GetLongPathNameW to get the latter from the former.
> 
> Mike
> 

Except if its a delete or rename notification...my final ReadDirectoryChangesW 
version cached the files by their long _and_ short names, but was so complex 
that it slowed most commands down rather than speeding them up :-)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread brian m. carlson

On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote:
> On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote:
> 
> > On 12 February 2014 20:59, Jeff King  wrote:
> > > +sub decode {
> > > +   my $orig = shift;
> > > +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
> > > +   return defined $decoded ?
> > 
> > I'd still advocate checking $@ here, rather than the defined $decoded check.
> 
> I don't mind changing it, but for my edification, what is the advantage?

The documentation for decode_utf8 isn't clear, but I don't know if it
can ever return undef.  What, for example, does it return if $orig is
not defined?  That's the benefit: it's immediately clear to the user
that you're interested in whether it threw an exception, rather than
whether it produced a given value.

That said, $DAYJOB is a Perl shop, and I would certainly not reject this
code in review, and depending on the situation, I might even write
something like this.  I personally think it's fine.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

signature.asc
Description: Digital signature

Re: [PATCH] tests: turn on network daemon tests by default

2014-02-12 Thread Duy Nguyen

On Thu, Feb 13, 2014 at 5:12 AM, Jeff King  wrote:
> lib-httpd was never designed to be included from anywhere except the
> beginning of the file. But that wouldn't be right for t5537, because it
> wants to run some of the tests, even if apache setup fails. The right
> way to do it is probably to have lib-httpd do all of its work in a lazy
> prereq. I don't know how clunky that will end up, though; it might be
> simpler to just move the shallow http test into one of the http-fetch
> scripts.

I'll move it out later.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Jeff King

On Thu, Feb 13, 2014 at 01:17:54AM +, brian m. carlson wrote:

> On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote:
> > On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote:
> > 
> > > On 12 February 2014 20:59, Jeff King  wrote:
> > > > +sub decode {
> > > > +   my $orig = shift;
> > > > +   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
> > > > +   return defined $decoded ?
> > > 
> > > I'd still advocate checking $@ here, rather than the defined $decoded 
> > > check.
> > 
> > I don't mind changing it, but for my edification, what is the advantage?
> 
> The documentation for decode_utf8 isn't clear, but I don't know if it
> can ever return undef.  What, for example, does it return if $orig is
> not defined?  That's the benefit: it's immediately clear to the user
> that you're interested in whether it threw an exception, rather than
> whether it produced a given value.

I'd argue that I am more interested in whether it returned a value. Let
us imagine for a moment that decode_utf8 could return undef without
throwing an exception. What should the function return in such a case?

I think the only sensible thing is the original (and to indicate that
the result was not converted).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread brian m. carlson

On Tue, Feb 11, 2014 at 05:54:51PM -0800, Stefan Zager wrote:
> To this end, I'd like to start submitting patches that make the code
> base generally more thread-safe and thread-friendly.  Right after this
> email, I'm going to send the first such patch, which makes the global
> list of pack files (packed_git) internal to sha1_file.c.

I'm definitely interested in this if it also works on POSIX systems.  At
work, we have a 7.6 GiB repo (packed)[0], so while performance is not
bad, I certainly wouldn't object if it were better.

[0] Using du -sh.  For comparison, the Linux kernel repo is 1.4 GiB.

-- 
brian m. carlson / brian with sandals: Houston, Texas, US
+1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187

signature.asc
Description: Digital signature

Re: feature request: text=input option in .gitattributes

2014-02-12 Thread Cameron Taggart

Thank you Torsten. Could some one help me clarify what this means?

https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
"This ensures that all files that git considers to be text will have
normalized (LF) line endings in the repository. The core.eol
configuration variable controls which line endings git will use for
normalized files in your working directory; the default is to use the
native line ending for your platform, or CRLF if core.autocrlf is
set."

Does this mean?
1) the files in your working directory should only be set to CRLF if
your native line endings are core.eol=crlf and core.autocrlf=true?
2) the files in your working directory should be set to CRLF by
default if your native line endings are core.eol=crlf and not
core.autocrlf=input

I was hoping it would be #1, but it does not appear to be working that
way. The F# Compiler Service that I want to source index has
text=auto. It can be used to test.

git clone https://github.com/fsharp/FSharp.Compiler.Service fcs

git config --global core.eol crlf
git clone https://github.com/fsharp/FSharp.Compiler.Service fcsCoreEolCrlf

Please help clarify how git should be working, so that I can either
log a bug or a feature request.

cheers,
Cameron

On Wed, Feb 12, 2014 at 9:56 AM, Torsten Bögershausen  wrote:
> On 2014-02-11 15.57, Cameron Taggart wrote:
>> After requesting this as
>> https://github.com/msysgit/msysgit/issues/164, I was told to take it
>> upstream, so here I am.
>>
>> I would like a text=input feature added that has the same behavior as
>> text=auto, except that it defaults to core.autocrlf=input behavior
>> instead of core.autocrlf=true.
> If you want to normailze your repo, and keep it normalized,
> I can  recommend to use .gitattributes.
>
> Please see the excellent page here:
> https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
> And especially this part:
>
> $ echo "* text=auto" >>.gitattributes
> $ rm .git/index # Remove the index to force git to
> $ git reset # re-scan the working directory
> $ git status# Show files that will be normalized
> $ git add -u
> $ git add .gitattributes
> $ git commit -m "Introduce end-of-line normalization"
>
>
>
> /Torsten
>
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-note -C changes commit type?

2014-02-12 Thread Joachim Breitner

Dear Johan,

Am Mittwoch, den 12.02.2014, 00:52 +0100 schrieb Johan Herland:
> On Tue, Feb 11, 2014 at 6:23 PM, Joachim Breitner
>  wrote:
> > judging from the documentation I got the impression that I can pass any
> > git object has to "git note -C " and it would stored as-is. But it
> > seems the objects gets mangled somehow...
> 
> ...well... the documentation does not say "any object", it actually
> explicitly says "blob object"... ;)

ok, my bad; guess I’m not fully versed with gits terminology.

> You would have a notes ref "refs/notes/history" whose tree would
> contain an entry named e1bfac434ebd3135a3784f6fc802f235098eebd0
> pointing to a _commit_ (3d7de37...). Obviously, it would not make
> sense to use refs/notes/history while displaying the commit log ("git
> log --notes=history"), as the raw commit object would be shown in the
> log. However, more fundamentally: a tree referring to a _commit_ is
> usually how Git stores _submodule_ links (i.e. which revision of the
> named submodule is to be used by this super-repo tree), and I'm (off
> the top of my head) not at all sure that such a submodule link in a
> notes tree is handled "sanely" by Git - or even that it makes sense at
> all. For one, I'm not sure that Git requires (or even expects) the
> commit object referenced by a tree to be present in the same object
> DB. So if you share your notes, I don't know whether or not the
> fetch/push machinery will include the commit object in the shared
> notes... These are questions that should be answered before we decide
> whether using commits directly as notes makes sense.

If that is the case, then my approach is indeed flawed. The main point
of the exercise is to have a tree that follows another commit (or, as a
next-best approximation, a note attached to that commit) around.

> In that case, you might be better off using an explicit
> ref to keep that history alive; e.g. you could create
> refs/history/e1bfac4 to point to 3d7de37 ("git update-ref
> refs/history/e1bfac4 3d7de37"), and keep everything
> alive/reachable/shareable that way...

That’s an interesting idea; instead of relying on the notes feature
putting the hash in the ref name. But I wonder how that scales – imagine
every second feature merged into Linux¹ also having such a history ref? 

I guess having a way for a tree to reference commits in a way that is
followed by git gc, i.e. separate from submodules, would allow a less
noisy implementation, and possibly create the opportunity for many other
strange uses of git :-)

Greetings,
Joachim

¹ I’m not proposing for anyone else but me to use this, at the moment,
don’t worry :-). But I am considering to use it in the context of GHC,
which isn’t a small project either.

-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0x4743206C
  Debian Developer: nome...@debian.org

signature.asc
Description: This is a digitally signed message part

Re: [PATCH 00/11] More preparatory work for multiparent tree-walker

2014-02-12 Thread Kirill Smelkov

On Tue, Feb 11, 2014 at 11:59:02AM -0800, Junio C Hamano wrote:
> Kirill Smelkov  writes:
> 
> > Sorry for the confusion. Could you please do the following:
> >
> > Patches should be applied over to ks/tree-diff-walk
> > (74aa4a18). Before applying the patches, please cherry-pick
> >
> > c90483d9(tree-diff: no need to manually verify that there is no
> >  mode change for a path)
> >
> > ef4f0928(tree-diff: no need to pass match to
> >  skip_uninteresting())
> >
> > into that branch, and drop them from ks/combine-diff - we'll have one
> > branch reworking the diff tree-walker, and the other taking advantage of
> > it for combine-diff.
> 
> As long as that does not lose changes to tests and clean-ups, I'm
> fine with that direction.  For example, I do not know if you want to
> lose e3f62d12 (diffcore-order: export generic ordering interface,
> 2014-01-20), which is part of the combine-diff topic.

Sorry for the confusion again, and please don't worry: we are not going
to lose anything - my only plea here was to transfer two of the patches
to more appropriate topic.

That couple touches tree-diff.c - they were some initial cleanups I've
noticed while working on separate combine-diff tree-walker, which we
decided to drop instead of generalizing diff tree-walker to handle all
cases. Only the cleanups are still relevant and needed as a base for
what I've sent you here.

And as to e3f62d12 (diffcore-order: export generic ordering interface,
2014-01-20) and other patches on ks/diff-c-with-diff-order topic - they
stay as they are - we do not need to rework them as ks/combine-diff
builds on top of the topic and generalizing diff tree-walker is
orthogonal work.

So in short: could you please transform the two "tree-diff" patches from
ks/combine-diff to ks/tree-diff-walk, then apply sent-here patches to
ks/tree-diff-walk; thats all.

Thanks,
Kirill
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-note -C changes commit type?

2014-02-12 Thread Johan Herland

On Wed, Feb 12, 2014 at 1:06 AM, Junio C Hamano  wrote:
> Johan Herland  writes:
>> There is currently no way the "git notes" commands will allow you to
>> store the 3d7de37 commit object directly as a note. There is also
>> (AFAICS) no easy workaround (git fast-import could've been a
>> workaround if it did not already require the first N/notemodify
>> argument to be a blob object). The best alternative, off the top of my
>> head, would be to write your own program using the notes.h API to
>> manipulate the notes tree directly (or - suboptimally - use other
>> low-level Git operations to do the same).
>
> Even worse. I do not think such a non-blob object in the notes tree
> does not participate in the reachability at all, so you won't be
> able to fetch "refs/notes/whatever" and expect to get a useful
> result.

s/non-blob/non-(blob-or-tree)/

Any object type that is deemed reachable by reference from a regular
git tree object will also be usable (as far as reachability goes) in a
notes tree.

> I do not think storing the raw bits of commit object as a
> blob in the notes tree is useful behaviour, either.  The command
> probably should refuse to get anything non-blob via that option.

Patch coming up...

> Perhaps the notes entry should just note the object name of whatever
> commit it wants to refer to in a *blob*?

Agreed.

...Johan

-- 
Johan Herland, 
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] notes: Disallow reusing non-blob as a note object

2014-02-12 Thread Johan Herland

Currently "git notes add -C $object" will read the raw bytes from $object,
and then copy those bytes into the note object, which is hardcoded to be
of type blob. This means that if the given $object is a non-blob (e.g.
tree or commit), the raw bytes from that object is copied into a blob
object. This is probably not useful, and certainly not what any sane
user would expect. So disallow it, by erroring out if the $object passed
to the -C option is not a blob.

The fix also applies to the -c option (in which the user is prompted to
edit/verify the note contents in a text editor), and also when -c/-C is
passed to "git notes append" (which appends the $object contents to an
existing note object). In both cases, passing a non-blob $object does not
make sense.

Also add a couple of tests demonstrating expected behavior.

Suggested-by: Junio C Hamano 
Signed-off-by: Johan Herland 
---
 builtin/notes.c  |  6 +-
 t/t3301-notes.sh | 27 +++
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/notes.c b/builtin/notes.c
index 2b24d05..bb89930 100644
--- a/builtin/notes.c
+++ b/builtin/notes.c
@@ -269,7 +269,11 @@ static int parse_reuse_arg(const struct option *opt, const 
char *arg, int unset)
die(_("Failed to resolve '%s' as a valid ref."), arg);
if (!(buf = read_sha1_file(object, &type, &len)) || !len) {
free(buf);
-   die(_("Failed to read object '%s'."), arg);;
+   die(_("Failed to read object '%s'."), arg);
+   }
+   if (type != OBJ_BLOB) {
+   free(buf);
+   die(_("Cannot read note data from non-blob object '%s'."), arg);
}
strbuf_add(&(msg->buf), buf, len);
free(buf);
diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh
index 16de05a..3bb79a4 100755
--- a/t/t3301-notes.sh
+++ b/t/t3301-notes.sh
@@ -812,6 +812,33 @@ test_expect_success 'create note from non-existing note 
with "git notes add -C"
test_must_fail git notes list HEAD
 '
 
+test_expect_success 'create note from non-blob with "git notes add -C" fails' '
+   commit=$(git rev-parse --verify HEAD) &&
+   tree=$(git rev-parse --verify HEAD:) &&
+   test_must_fail git notes add -C $commit &&
+   test_must_fail git notes add -C $tree &&
+   test_must_fail git notes list HEAD
+'
+
+cat > expect << EOF
+commit 80d796defacd5db327b7a4e50099663902fbdc5c
+Author: A U Thor 
+Date:   Thu Apr 7 15:20:13 2005 -0700
+
+8th
+
+Notes (other):
+This is a blob object
+EOF
+
+test_expect_success 'create note from blob with "git notes add -C" reuses blob 
id' '
+   blob=$(echo "This is a blob object" | git hash-object -w --stdin) &&
+   git notes add -C $blob &&
+   git log -1 > actual &&
+   test_cmp expect actual &&
+   test "$(git notes list HEAD)" = "$blob"
+'
+
 cat > expect << EOF
 commit 016e982bad97eacdbda0fcbd7ce5b0ba87c81f1b
 Author: A U Thor 
-- 
1.8.4.653.g2df02b3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-note -C changes commit type?

2014-02-12 Thread Johan Herland

On Wed, Feb 12, 2014 at 9:53 AM, Joachim Breitner
 wrote:
> Am Mittwoch, den 12.02.2014, 00:52 +0100 schrieb Johan Herland:
>> You would have a notes ref "refs/notes/history" whose tree would
>> contain an entry named e1bfac434ebd3135a3784f6fc802f235098eebd0
>> pointing to a _commit_ (3d7de37...). Obviously, it would not make
>> sense to use refs/notes/history while displaying the commit log ("git
>> log --notes=history"), as the raw commit object would be shown in the
>> log. However, more fundamentally: a tree referring to a _commit_ is
>> usually how Git stores _submodule_ links (i.e. which revision of the
>> named submodule is to be used by this super-repo tree), and I'm (off
>> the top of my head) not at all sure that such a submodule link in a
>> notes tree is handled "sanely" by Git - or even that it makes sense at
>> all. For one, I'm not sure that Git requires (or even expects) the
>> commit object referenced by a tree to be present in the same object
>> DB. So if you share your notes, I don't know whether or not the
>> fetch/push machinery will include the commit object in the shared
>> notes... These are questions that should be answered before we decide
>> whether using commits directly as notes makes sense.
>
> If that is the case, then my approach is indeed flawed. The main point
> of the exercise is to have a tree that follows another commit (or, as a
> next-best approximation, a note attached to that commit) around.
>
>> In that case, you might be better off using an explicit
>> ref to keep that history alive; e.g. you could create
>> refs/history/e1bfac4 to point to 3d7de37 ("git update-ref
>> refs/history/e1bfac4 3d7de37"), and keep everything
>> alive/reachable/shareable that way...
>
> That’s an interesting idea; instead of relying on the notes feature
> putting the hash in the ref name. But I wonder how that scales – imagine
> every second feature merged into Linux¹ also having such a history ref?

Ah, that will probably not scale very well.

> I guess having a way for a tree to reference commits in a way that is
> followed by git gc, i.e. separate from submodules, would allow a less
> noisy implementation, and possibly create the opportunity for many other
> strange uses of git :-)

Here's another way to solve your problem, which should be fairly
transparent and performant:

Whenever you want to reference "history" of a commit (I'm using quotes
here, because we're not talking about the "regular" git sense of
history, i.e. the commit graph), you perform the following two steps:

1. Append the "historical" commit SHA-1 (3d7de37 in your example) to a
note on the "current" commit (e1bfac4). E.g.:

git notes --ref history append -m 3d7de37... e1bfac4...

2. Perform some automated merge into a "history"-tracking ref (e.g.
refs/history), to keep the "historical" commits reachable.

(You can easily wrap both steps into a script to automate things.)

Step #1 encodes the "history" of a commit in a note, but does not keep
the "history" reachable.

Step #2 keeps all "historical" commits reachable by making them part
of the history (in the git sense - without quotes) of a proper ref
(refs/history). The actual result/outcome of the merge is not
interesting. It only exists to insert the "historical" commit
(3d7de37) into the ancestry for refs/history. Since the actual merge
itself is uninteresting, you should probably use a merge strategy that
never yields conflicts, e.g. "-s ours"

You can now share the "history" by pushing/fetching the two refs
refs/notes/history and refs/history.

(In theory, you might even be able to combine the two refs, by
performing the merge directly into refs/notes/history, always taking
care to retain the notes tree contents as the result of the merge. In
other words, after you do step #1 (append the note), you manually
rewrite the just-created tip of refs/notes/history to include 3d7de37
as a second parent. This keeps 3d7de37 reachable (and it will be
shared when you share refs/notes/history), and it should not interfere
with the notes infrastructure, as they only look at the current state
of the notes tree.)

Hope this helps,

...Johan

-- 
Johan Herland, 
www.herland.net
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-note -C changes commit type?

2014-02-12 Thread Joachim Breitner

Dear Johan,

thanks for the patch!

Am Mittwoch, den 12.02.2014, 11:26 +0100 schrieb Johan Herland:
> Here's another way to solve your problem, which should be fairly
> transparent and performant:
> 
> Whenever you want to reference "history" of a commit (I'm using quotes
> here, because we're not talking about the "regular" git sense of
> history, i.e. the commit graph), you perform the following two steps:
> 
> 1. Append the "historical" commit SHA-1 (3d7de37 in your example) to a
> note on the "current" commit (e1bfac4). E.g.:
> 
> git notes --ref history append -m 3d7de37... e1bfac4...
> 
> 2. Perform some automated merge into a "history"-tracking ref (e.g.
> refs/history), to keep the "historical" commits reachable.
> 
> (You can easily wrap both steps into a script to automate things.)
> 
> Step #1 encodes the "history" of a commit in a note, but does not keep
> the "history" reachable.
> 
> Step #2 keeps all "historical" commits reachable by making them part
> of the history (in the git sense - without quotes) of a proper ref
> (refs/history). The actual result/outcome of the merge is not
> interesting. It only exists to insert the "historical" commit
> (3d7de37) into the ancestry for refs/history. Since the actual merge
> itself is uninteresting, you should probably use a merge strategy that
> never yields conflicts, e.g. "-s ours"
> 
> You can now share the "history" by pushing/fetching the two refs
> refs/notes/history and refs/history.
>
> (In theory, you might even be able to combine the two refs, by
> performing the merge directly into refs/notes/history, always taking
> care to retain the notes tree contents as the result of the merge. In
> other words, after you do step #1 (append the note), you manually
> rewrite the just-created tip of refs/notes/history to include 3d7de37
> as a second parent. This keeps 3d7de37 reachable (and it will be
> shared when you share refs/notes/history), and it should not interfere
> with the notes infrastructure, as they only look at the current state
> of the notes tree.)

That is quite a good approximation. What it doesn’t do is dropping
history (in the refs/history sense) of commits that disappear, but the
same problem exists with notes. Thanks!


I guess there are no plans to make the commit object format itself
extensible, are they? Extensible in the sense that I can add a custom
field to it (e.g. history:). Git would not have to know anything about
the field besides its type, i.e. that it contains refs that it has to
follow. Very much like "parent:", just without the semantics of it wrt.
"git log" and the like.


Greetings,
Joachim
-- 
Joachim “nomeata” Breitner
  m...@joachim-breitner.de • http://www.joachim-breitner.de/
  Jabber: nome...@joachim-breitner.de  • GPG-Key: 0x4743206C
  Debian Developer: nome...@debian.org


signature.asc
Description: This is a digitally signed message part

Re: Make the git codebase thread-safe

2014-02-12 Thread Karsten Blees

Am 12.02.2014 04:43, schrieb Duy Nguyen:
> On Wed, Feb 12, 2014 at 9:02 AM, Robin H. Johnson  wrote:
>> On Tue, Feb 11, 2014 at 05:54:51PM -0800,  Stefan Zager wrote:
>>> We in the chromium project have a keen interest in adding threading to
>>> git in the pursuit of performance for lengthy operations (checkout,
>>> status, blame, ...).  Our motivation comes from hitting some
>>> performance walls when working with repositories the size of chromium
>>> and blink:
>> +1 from Gentoo on performance improvements for large repos.
>>
>> The main repository in the ongoing Git migration project looks to be in
>> the 1.5GB range (and for those that want to propose splitting it up, we
>> have explored that option and found it lacking), with very deep history
>> (but no branches of note, and very few tags).
> 
> From v1.9 shallow clone should work for all push/pull/clone... so
> history depth does not matter (on the client side). As for
> gentoo-x86's large worktree, using index v4 and avoid full-tree
> operations (e.g. "status .", not "status"..) should make all
> operations reasonably fast. I plan to make "status" fast even without
> path limiting with the help of inotify, but that's not going to be
> finished soon. Did I miss anything else?
> 

Regarding git-status on msysgit, enable core.preloadindex and core.fscache (as 
of 1.8.5.2).

There's no inotify on Windows, and I gave up using ReadDirectoryChangesW to 
keep fscache up to date, as it _may_ report DOS file names (e.g. C:\PROGRA~1 
instead of C:\Program Files).
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread Erik Faye-Lund

On Wed, Feb 12, 2014 at 8:27 AM, Johannes Sixt  wrote:
> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
> the following symptoms. I haven't followed the topic. Have there been
> patches floating that addressed the problem in one way or another?
>
> (gdb) run
> Starting program: D:\Src\mingw-git\t\trash 
> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD
> [New thread 3528.0x8d4]
> Bitmap v1 test (20 entries loaded)
> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 15873b36 
> checksum
>
> Breakpoint 1, die (err=0x5939e9 "Out of memory, realloc failed") at usage.c:97
> 97  if (die_is_recursing()) {
> (gdb) bt
> #0  die (err=0x5939e9 "Out of memory, realloc failed") at usage.c:97
> #1  0x00487c4c in xrealloc (ptr=0x12b10008, size=837107040) at wrapper.c:109

~800 megs is a pretty large allocation for 32-bit systems. What gives?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Failed to find remote helpers after install from source

2014-02-12 Thread Antonio Pérez Barrero

Hi,

I installed git 1.8.5.4 from source under $HOME/bin. My system is SUSE
Linux Enterprise Server 11 SP1 (x86_64). I followed instructions and
just run make && make install.

After installation I cannot clone repos from https urls, getting the
following error:

$ git clone https://github.com/apbarrero/pyrad.git
Cloning into 'pyrad'...
fatal: Unable to find remote helper for 'https'

I checked that git is properly compiled against libcurl-devel and
libexpat-devel packages. Then I tryed using git-remote-https installed
under $HOME/bin/libexec/git-core/ and it worked fine to connect to the
same remote:

$ $HOME/libexec/git-core/git-remote-https https://github.com/apbarrero/pyrad.git
capabilities
fetch
option
push
check-connectivity

list
@refs/heads/master HEAD
9599cf833354793b81d2a47504826332473bcb12 refs/heads/master
1f8f2b995bb5ab55e6c6f1051ccb44875ab1e60d refs/tags/0.6
68552227901d377b513884c70d9582da0329a270 refs/tags/0.6^{}
e0cd958edc5b3aad7e31435990674e2cff4e3b7e refs/tags/0.7
c50213b2d4213f3574c1a6b454e6887a529de340 refs/tags/0.7^{}
6085deb4ee37862d65f4a26f472fa2d1894a4331 refs/tags/0.8
33902c5b3da1272a4f5930815f561b8068315ba3 refs/tags/0.8^{}
5a45639faaf1cbf7622fe47e2795d6f5a0ee6658 refs/tags/0.9
edd69b9014d7e5bbf9da203d7db9a26587756aa4 refs/tags/0.9^{}
6e3b16ed19b329be944bd1b10aa17d02eb473009 refs/tags/1.0
30beedc5c4e56a15f4025d25331515aa2a917234 refs/tags/1.0^{}
dacf4bd37aaddd3872faeb8a77c801fe3c8550cb refs/tags/1.1
4c3e2d6700947ca6ea7b3319ff52abb7029bf3be refs/tags/1.1^{}
9a7f5a4e9fe19ef9f45db4e28a7d4648a011cc9b refs/tags/1.2
ee7ec8f2b37da5e84bf0fbb83e214a8bd3cfdf70 refs/tags/1.2^{}
e16af24d814e8d8c83b172ca6103fd3ab93b08db refs/tags/2.0
ce4a625caf5c6d892e020ec150373043a203366e refs/tags/2.0^{}

I looked at this post in stackoverflow and use strace to find out git
clone is trying to look for remote helpers under
$HOME/bin/libexec/git-core, but make install left the libexec/git-core
directory under $HOME, instead of$HOME/bin.

...
[pid 675896] execve("$HOME/bin/libexec/git-core/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https",
"origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) =
-1 ENOENT (No such file or directory)
[pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https",
"origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) =
-1 ENOENT (No such file or directory)
[pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https",
"origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) =
-1 ENOENT (No such file or directory)
[pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https",
"origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) =
-1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/local/bin/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/bin/git-remote-https", ["git-remote-https",
"origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) =
-1 ENOENT (No such file or directory)
[pid 675896] execve("/bin/git-remote-https", ["git-remote-https",
"origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) =
-1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/bin/X11/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/X11R6/bin/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/games/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/lib64/jvm/jre/bin/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/lib/mit/bin/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
[pid 675896] execve("/usr/lib/mit/sbin/git-remote-https",
["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...],
[/* 88 vars */]) = -1 ENOENT (No such file or directory)
...

Creating a soft link under $HOME/bin/ to ../libexec got it working.

So it look like it is rather an installation issue or a bug in the
directory list where git looks for binaries under libexec or I missed
some parameter to pass to make.

Regards,

-- 
   Antonio Pérez Barrero
   apbarr...@gmail.com
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kern

Re: Make the git codebase thread-safe

2014-02-12 Thread Erik Faye-Lund

On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager  wrote:
> We in the chromium project have a keen interest in adding threading to
> git in the pursuit of performance for lengthy operations (checkout,
> status, blame, ...).  Our motivation comes from hitting some
> performance walls when working with repositories the size of chromium
> and blink:
>
> https://chromium.googlesource.com/chromium/src
> https://chromium.googlesource.com/chromium/blink
>
> We are particularly concerned with the performance of msysgit, and we
> have already chalked up a significant performance gain by turning on
> the threading code in pack-objects (which was already enabled for
> posix platforms, but not on msysgit, owing to the lack of a correct
> pread implementation).

How did you manage to do this? I'm not aware of any way to implement
pread on Windows (without going down the insanity-path of wrapping and
potentially locking inside every IO operation)...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread Johannes Sixt

Am 2/12/2014 12:56, schrieb Erik Faye-Lund:
> On Wed, Feb 12, 2014 at 8:27 AM, Johannes Sixt  wrote:
>> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
>> the following symptoms. I haven't followed the topic. Have there been
>> patches floating that addressed the problem in one way or another?
>>
>> (gdb) run
>> Starting program: D:\Src\mingw-git\t\trash 
>> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD
>> [New thread 3528.0x8d4]
>> Bitmap v1 test (20 entries loaded)
>> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 
>> 15873b36 checksum
>>
>> Breakpoint 1, die (err=0x5939e9 "Out of memory, realloc failed") at 
>> usage.c:97
>> 97  if (die_is_recursing()) {
>> (gdb) bt
>> #0  die (err=0x5939e9 "Out of memory, realloc failed") at usage.c:97
>> #1  0x00487c4c in xrealloc (ptr=0x12b10008, size=837107040) at wrapper.c:109
> 
> ~800 megs is a pretty large allocation for 32-bit systems. What gives?

That's exactly the problem: why would a tiny repository from the test
suite require such a large allocation? (Not to mention that the allocation
ultimately fails in my case.)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread David Kastrup

Johannes Sixt  writes:

> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
> the following symptoms. I haven't followed the topic. Have there been
> patches floating that addressed the problem in one way or another?
>
> (gdb) run
> Starting program: D:\Src\mingw-git\t\trash 
> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD
> [New thread 3528.0x8d4]
> Bitmap v1 test (20 entries loaded)
> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 15873b36 
> checksum

Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help?

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread Johannes Sixt

Am 2/12/2014 13:55, schrieb David Kastrup:
> Johannes Sixt  writes:
> 
>> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
>> the following symptoms. I haven't followed the topic. Have there been
>> patches floating that addressed the problem in one way or another?
>>
>> (gdb) run
>> Starting program: D:\Src\mingw-git\t\trash 
>> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD
>> [New thread 3528.0x8d4]
>> Bitmap v1 test (20 entries loaded)
>> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 
>> 15873b36 checksum
> 
> Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help?

YES! t5310 passes after reverting this commit.

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread David Kastrup

Johannes Sixt  writes:

> Am 2/12/2014 13:55, schrieb David Kastrup:
>> Johannes Sixt  writes:
>> 
>>> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
>>> the following symptoms. I haven't followed the topic. Have there been
>>> patches floating that addressed the problem in one way or another?
>>>
>>> (gdb) run
>>> Starting program: D:\Src\mingw-git\t\trash
>>> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap
>>> HEAD
>>> [New thread 3528.0x8d4]
>>> Bitmap v1 test (20 entries loaded)
>>> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits
>>> / 15873b36 checksum
>> 
>> Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help?
>
> YES! t5310 passes after reverting this commit.

Oh.  I just looked through the backtrace until finding a routine
reasonably related with the regtest and checked for the last commit
changing it, then posted my question.

Then I looked through the diff of the patch and considered it
unconspicuous.  So I commenced reading through earlier commits.

I actually don't have a good idea what might be wrong here.  The code is
somewhat distasteful as it basically uses eword_t and uint64_t
interchangeably, but then this does match its current definition.

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread Duy Nguyen

On Wed, Feb 12, 2014 at 8:23 PM, David Kastrup  wrote:
> Johannes Sixt  writes:
>
>> Am 2/12/2014 13:55, schrieb David Kastrup:
>>> Johannes Sixt  writes:
>>>
 Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
 the following symptoms. I haven't followed the topic. Have there been
 patches floating that addressed the problem in one way or another?

 (gdb) run
 Starting program: D:\Src\mingw-git\t\trash
 directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap
 HEAD
 [New thread 3528.0x8d4]
 Bitmap v1 test (20 entries loaded)
 Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits
 / 15873b36 checksum
>>>
>>> Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help?
>>
>> YES! t5310 passes after reverting this commit.
>
> Oh.  I just looked through the backtrace until finding a routine
> reasonably related with the regtest and checked for the last commit
> changing it, then posted my question.
>
> Then I looked through the diff of the patch and considered it
> unconspicuous.  So I commenced reading through earlier commits.
>
> I actually don't have a good idea what might be wrong here.  The code is
> somewhat distasteful as it basically uses eword_t and uint64_t
> interchangeably, but then this does match its current definition.

Perhaps __BYTE_ORDER or __BIG_ENDIAN is misdefined and the ntohll() is skipped?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread Erik Faye-Lund

On Wed, Feb 12, 2014 at 3:09 PM, Duy Nguyen  wrote:
> On Wed, Feb 12, 2014 at 8:23 PM, David Kastrup  wrote:
>> Johannes Sixt  writes:
>>
>>> Am 2/12/2014 13:55, schrieb David Kastrup:
 Johannes Sixt  writes:

> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
> the following symptoms. I haven't followed the topic. Have there been
> patches floating that addressed the problem in one way or another?
>
> (gdb) run
> Starting program: D:\Src\mingw-git\t\trash
> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap
> HEAD
> [New thread 3528.0x8d4]
> Bitmap v1 test (20 entries loaded)
> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits
> / 15873b36 checksum

 Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help?
>>>
>>> YES! t5310 passes after reverting this commit.
>>
>> Oh.  I just looked through the backtrace until finding a routine
>> reasonably related with the regtest and checked for the last commit
>> changing it, then posted my question.
>>
>> Then I looked through the diff of the patch and considered it
>> unconspicuous.  So I commenced reading through earlier commits.
>>
>> I actually don't have a good idea what might be wrong here.  The code is
>> somewhat distasteful as it basically uses eword_t and uint64_t
>> interchangeably, but then this does match its current definition.
>
> Perhaps __BYTE_ORDER or __BIG_ENDIAN is misdefined and the ntohll() is 
> skipped?

That is indeed the case.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-12 Thread David Kastrup

Making a single preparation run for counting the lines will avoid memory
fragmentation.  Also, fix the allocated memory size which was wrong
when sizeof(int *) != sizeof(int), and would have been too small
for sizeof(int *) < sizeof(int), admittedly unlikely.

Signed-off-by: David Kastrup 
---

Since there was no feedback after the last defense/explanation of the
coding choices, the code rewritten by this patch was much more awful,
and the kind of style requests (fixed already in the last iteration)
are not actually heeded by the core developers themselves, I have no
idea whether this patch will be dropped just like the last one.

As opposed to the last try, this incorporates a suggestion from Jeff
to change sizeof(type) to sizeof(expression) which is not helping much
since the types of lineno and sb->lineno still need to be changed in
sync.

It also fiddles cosmetically with the code layout of the loops.

builtin/blame.c | 46 +++---
 1 file changed, 31 insertions(+), 15 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index e44a6bb..1aefedf 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1772,25 +1772,41 @@ static int prepare_lines(struct scoreboard *sb)
 {
const char *buf = sb->final_buf;
unsigned long len = sb->final_buf_size;
-   int num = 0, incomplete = 0, bol = 1;
+   const char *end = buf + len;
+   const char *p;
+   int *lineno;
+   int num = 0, incomplete = 0;
 
-   if (len && buf[len-1] != '\n')
-   incomplete++; /* incomplete line at the end */
-   while (len--) {
-   if (bol) {
-   sb->lineno = xrealloc(sb->lineno,
- sizeof(int *) * (num + 1));
-   sb->lineno[num] = buf - sb->final_buf;
-   bol = 0;
-   }
-   if (*buf++ == '\n') {
+   for (p = buf;;) {
+   p = memchr(p, '\n', end - p);
+   if (p) {
+   p++;
num++;
-   bol = 1;
+   continue;
}
+   break;
}
-   sb->lineno = xrealloc(sb->lineno,
- sizeof(int *) * (num + incomplete + 1));
-   sb->lineno[num + incomplete] = buf - sb->final_buf;
+
+   if (len && end[-1] != '\n')
+   incomplete++; /* incomplete line at the end */
+
+   sb->lineno = xmalloc(sizeof(*sb->lineno) * (num + incomplete + 1));
+   lineno = sb->lineno;
+
+   *lineno++ = 0;
+   for (p = buf;;) {
+   p = memchr(p, '\n', end - p);
+   if (p) {
+   p++;
+   *lineno++ = p - buf;
+   continue;
+   }
+   break;
+   }
+
+   if (incomplete)
+   *lineno++ = len;
+
sb->num_lines = num + incomplete;
return sb->num_lines;
 }
-- 
1.8.3.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: pack bitmap woes on Windows

2014-02-12 Thread Erik Faye-Lund

On Wed, Feb 12, 2014 at 3:22 PM, Erik Faye-Lund  wrote:
> On Wed, Feb 12, 2014 at 3:09 PM, Duy Nguyen  wrote:
>> On Wed, Feb 12, 2014 at 8:23 PM, David Kastrup  wrote:
>>> Johannes Sixt  writes:
>>>
 Am 2/12/2014 13:55, schrieb David Kastrup:
> Johannes Sixt  writes:
>
>> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with
>> the following symptoms. I haven't followed the topic. Have there been
>> patches floating that addressed the problem in one way or another?
>>
>> (gdb) run
>> Starting program: D:\Src\mingw-git\t\trash
>> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap
>> HEAD
>> [New thread 3528.0x8d4]
>> Bitmap v1 test (20 entries loaded)
>> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits
>> / 15873b36 checksum
>
> Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help?

 YES! t5310 passes after reverting this commit.
>>>
>>> Oh.  I just looked through the backtrace until finding a routine
>>> reasonably related with the regtest and checked for the last commit
>>> changing it, then posted my question.
>>>
>>> Then I looked through the diff of the patch and considered it
>>> unconspicuous.  So I commenced reading through earlier commits.
>>>
>>> I actually don't have a good idea what might be wrong here.  The code is
>>> somewhat distasteful as it basically uses eword_t and uint64_t
>>> interchangeably, but then this does match its current definition.
>>
>> Perhaps __BYTE_ORDER or __BIG_ENDIAN is misdefined and the ntohll() is 
>> skipped?
>
> That is indeed the case.

Looking a bit at it, the whole byte-order detection mess seems insane to me.

MinGW falls inside the "defined(__GNUC__) && (defined(__i386__) ||
defined(__x86_64__))"-block, but does not define __BYTE_ORDER.

It seems the author of a201c20b41a2f0725977bcb89a2a66135d776ba2
assumes __BYTE_ORDER was guaranteed to always be defined, probably
fooled by the following check:

#if !defined(__BYTE_ORDER)
# error "Cannot determine endianness"
#endif

However, that check is only performed if we're NOT building with GCC
on x86/x64 nor MSVC (which I don't think defined __BYTE_ORDER either)

The following would make __BYTE_ORDER a guarantee, and show that MinGW
does not define it:

---8<---

diff --git a/compat/bswap.h b/compat/bswap.h
index 120c6c1..c73bf0e 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -109,10 +109,6 @@ static inline uint64_t git_bswap64(uint64_t x)
 # endif
 #endif

-#if !defined(__BYTE_ORDER)
-# error "Cannot determine endianness"
-#endif
-
 #if __BYTE_ORDER == __BIG_ENDIAN
 # define ntohll(n) (n)
 # define htonll(n) (n)
@@ -123,6 +119,10 @@ static inline uint64_t git_bswap64(uint64_t x)

 #endif

+#if !defined(__BYTE_ORDER)
+# error "Cannot determine endianness"
+#endif
+
 /*
  * Performance might be improved if the CPU architecture is OK with
  * unaligned 32-bit loads and a fast ntohl() is available.

---8<---

But another level of insanity (IMO) is defining double-underscore
macros. These symbols are reserved for the implementation. Sure,
knowing we're on a given implementation and defining something *else*
dependent on them is fine. But defining them is just yuckiness, and
not very standard-kosher.

IMO, we should rather do the definition the other way around:

#if !defined(BYTE_ORDER)
# if defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN)
#  define BYTE_ORDER __BYTE_ORDER
#  define LITTLE_ENDIAN __LITTLE_ENDIAN
#  define BIG_ENDIAN __BIG_ENDIAN
# endif
#endif

...and only referred to BYTE_ORDER, LITTLE_ENDIAN and BIG_ENDIAN.


That way we'd follow closer to where opengroup are heading:

http://www.opengroup.org/austin/docs/austin_514.txt
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: feature request: text=input option in .gitattributes

2014-02-12 Thread Torsten Bögershausen

On 2014-02-11 15.57, Cameron Taggart wrote:
> After requesting this as
> https://github.com/msysgit/msysgit/issues/164, I was told to take it
> upstream, so here I am.
> 
> I would like a text=input feature added that has the same behavior as
> text=auto, except that it defaults to core.autocrlf=input behavior
> instead of core.autocrlf=true. 
If you want to normailze your repo, and keep it normalized,
I can  recommend to use .gitattributes.

Please see the excellent page here:
https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html
And especially this part:

$ echo "* text=auto" >>.gitattributes
$ rm .git/index # Remove the index to force git to
$ git reset # re-scan the working directory
$ git status# Show files that will be normalized
$ git add -u
$ git add .gitattributes
$ git commit -m "Introduce end-of-line normalization"



/Torsten

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Turning a test script into something usable in t/perf

2014-02-12 Thread David Kastrup


Hi,

I find that I have little clue about how to convert the following brief
test script into some test to place in t/perf:

#!/bin/sh
rm -rf /tmp/git-test
mkdir /tmp/git-test
cd /tmp/git-test
git init
LIMIT=20
yes a|head -$LIMIT >data
yes b|head -$LIMIT >data2
git add data data2
git commit -m "split"
git rm data2
yes 'a
b' | head -$(($LIMIT*2)) >data
git add data
git commit -m "combined"
time git blame data >/dev/null

The variable LIMIT is the deciding factor for determining performance
which, with the code in current master, is rather measurably O(LIMIT^2).
I think that the current test takes about 15 minutes to complete on my
computer.

Obviously, that's sort of excessive: there is little point in choosing
sizes that show off more than two orders of magnitude in improvement.

Now the pathological cases are lots of small but attributable fragments
in the blamed source files.  One real-world project that is hit rather
hard is an alphabetically sorted large list of words that tends to get
insertions/deletions of few scattered lines at a time.

Should one aim for an actually pathological case like in this script?
Should one try benchmarking with one of the stock repositories instead
that don't really demonstrate well just how bad the behavior might
become and which code passages are dominant regarding the quadratic
behavior?

-- 
David Kastrup

Re: pack bitmap woes on Windows

2014-02-12 Thread Jeff King

On Wed, Feb 12, 2014 at 03:49:24PM +0100, Erik Faye-Lund wrote:

> It seems the author of a201c20b41a2f0725977bcb89a2a66135d776ba2
> assumes __BYTE_ORDER was guaranteed to always be defined, probably
> fooled by the following check:
> 
> #if !defined(__BYTE_ORDER)
> # error "Cannot determine endianness"
> #endif
> 
> However, that check is only performed if we're NOT building with GCC
> on x86/x64 nor MSVC (which I don't think defined __BYTE_ORDER either)
> 
> The following would make __BYTE_ORDER a guarantee, and show that MinGW
> does not define it:

Yes, that is the problem. Sorry, this got brought up earlier[1], but the
discussion went on and I did not notice that Brian's patch did not get
applied.

After re-reading the discussion, I think the patch below is the best
solution. Can you confirm that it fixes the problem for you?

-Peff

[1] http://thread.gmane.org/gmane.comp.version-control.git/241278

-- >8 --
Subject: ewah: unconditionally ntohll ewah data

Commit a201c20 tried to optimize out a loop like:

  for (i = 0; i < len; i++)
  data[i] = ntohll(data[i]);

in the big-endian case, because we know that ntohll is a
noop, and we do not need to pay the cost of the loop at all.
However, it mistakenly assumed that __BYTE_ORDER was always
defined, whereas it may not be on systems which do not
define it by default, and where we did not need to define it
to set up the ntohll macro. This includes OS X and Windows.

We could muck with the ordering in compat/bswap.h to make
sure it is defined unconditionally, but it is simpler to
still to just execute the loop unconditionally. That avoids
the application code knowing anything about these magic
macros, and lets it depend only on having ntohll defined.

And since the resulting loop looks like (on a big-endian
system):

  for (i = 0; i < len; i++)
  data[i] = data[i];

any decent compiler can probably optimize it out.

Original report and analysis by Brian Gernhardt.

Signed-off-by: Jeff King 
---
 ewah/ewah_io.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index 4a7fae6..f7f700e 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -113,6 +113,7 @@ int ewah_serialize(struct ewah_bitmap *self, int fd)
 int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len)
 {
uint8_t *ptr = map;
+   size_t i;

self->bit_size = get_be32(ptr);
ptr += sizeof(uint32_t);
@@ -135,13 +136,8 @@ int ewah_read_mmap(struct ewah_bitmap *self, void *map, 
size_t len)
memcpy(self->buffer, ptr, self->buffer_size * sizeof(uint64_t));
ptr += self->buffer_size * sizeof(uint64_t);

-#if __BYTE_ORDER != __BIG_ENDIAN
-   {
-   size_t i;
-   for (i = 0; i < self->buffer_size; ++i)
-   self->buffer[i] = ntohll(self->buffer[i]);
-   }
-#endif
+   for (i = 0; i < self->buffer_size; ++i)
+   self->buffer[i] = ntohll(self->buffer[i]);

self->rlw = self->buffer + get_be32(ptr);

-- 
1.8.5.2.500.g8060133

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/11] More preparatory work for multiparent tree-walker

2014-02-12 Thread Junio C Hamano

Junio C Hamano  writes:

> Kirill Smelkov  writes:
>
>> Sorry for the confusion. Could you please do the following:
>>
>> Patches should be applied over to ks/tree-diff-walk
>> (74aa4a18). Before applying the patches, please cherry-pick
>>
>> c90483d9(tree-diff: no need to manually verify that there is no
>>  mode change for a path)
>>
>> ef4f0928(tree-diff: no need to pass match to
>>  skip_uninteresting())
>>
>> into that branch, and drop them from ks/combine-diff - we'll have one
>> branch reworking the diff tree-walker, and the other taking advantage of
>> it for combine-diff.
>
> As long as that does not lose changes to tests and clean-ups, I'm
> fine with that direction.  For example, I do not know if you want to
> lose e3f62d12 (diffcore-order: export generic ordering interface,
> 2014-01-20), which is part of the combine-diff topic.

Ahh, sorry, I misread the "drop" as "salvage these two and drop the
rest".  The new series does apply cleanly on a commit in master..pu
that has both ks/tree-diff-walk and ks/combine-diff, and the latter
is not yet in 'next' so we are free to reorganize.

Let me flip the latter topic around, also queue these updates and
push the result out on 'pu'.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 2/2] gc: config option for running --auto in background

2014-02-12 Thread Junio C Hamano

Duy Nguyen  writes:

> On Tue, Feb 11, 2014 at 2:11 AM, Junio C Hamano  wrote:
>> On Mon, Feb 10, 2014 at 10:43 AM, Junio C Hamano  wrote:
 If --quiet is set, we should not be printing anyway. If not, I thinkg
 we could only print "auto packing in background.." when we actually
 can do that, else just print the old message. It means an #ifdef
 NO_POSIX_GOODIES here again though..
>>>
>>> Didn't you change it not to die but return nosys or something?
>>
>> Ah, the problem is that it is too late to take back "... will do so in
>> the background" when you noticed that daemonize() did not succeed, so
>> you would need a way to see if we can daemonize() before actually
>> doing so if you want to give different messages.
>>
>> "int can_daemonize(void)" could be an answer that is nicer than
>> NO_POSIX_GOODIES, but I am not sure if it is worth it.
>
> Or we could pass the "quiet" flag to daemonize() and let it print
> something in the #ifdef NO_POSIX_GOODIES part.

Hmph...  What would that something say?  "I was asked to gc in the
background but I can't here" is not suitable for daemonize() that is
not specific to "gc".

The flow I had in mind was something along the lines of this

if (!quiet) {
if (detach_auto && can_daemonize())
say "auto packing in the background";
else
say "auto packing"
}
if (detach_auto && can_daemonize())
daemonize();

If we had daemonize(noisy=1) and coded it this way:

if (!quiet)
say "auto packing";
if (detach_auto)
daemonize(!quiet);

we could do something like:

daemonize(int noisy) {
if (noisy && !defined(NO_POSIX_GOODIES))
say ", and doing so in the background";
... do the actual daemonizing ...
}

but that feels ugly.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Tue, Feb 11, 2014 at 6:11 PM, Duy Nguyen  wrote:
>
> I have no comments about thread safety improvements (well, not yet).
> If you have investigated about git performance on chromium
> repositories, could you please sum it up? Threading may be an option
> to improve performance, but it's probably not the only option.

Well, the painful operations that we use frequently are pack-objects,
checkout, status, and blame.  Anything on Windows that touches a lot
of files is miserable due to the usual file system slowness on
Windows, and luafv.sys (the UAC file virtualization driver) seems to
make it much worse.

With threading turned on, pack-objects on Windows now takes about
twice as long as on Linux, which is still more than a 2x improvement
over the non-threaded operation.

Checkout is really bad on Windows.  The blink repository is ~200K
files, and a full clean checkout from the index takes about 10 seconds
on Linux, and about 3:30 on Windows.  I used the Very Sleepy profiler
to see where all the time was spent on Windows: 55% of the time was
spent in OpenFile, and 25% in CloseFile (both in win32).  My immediate
goal is to add threading to checkout, so those file system calls can
be done in parallel.

Enabling the fscache speeds up status quite a bit.  I'm optimistic
that parallelizing the stat calls will yield a further improvement.
Beyond that, it may not be possible to do much more without using a
file system watcher daemon, like facebook does with mercurial.
(https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/)

Blame is something that chromium and blink developers use heavily, and
it is not unusual for a blame invocation on the blink repository to
run for 30 seconds.  It seems like it should be possible to
parallelize blame, but it requires pack file operations to be
thread-safe.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Tue, Feb 11, 2014 at 7:43 PM, Duy Nguyen  wrote:
>
> From v1.9 shallow clone should work for all push/pull/clone... so
> history depth does not matter (on the client side). As for
> gentoo-x86's large worktree, using index v4 and avoid full-tree
> operations (e.g. "status .", not "status"..) should make all
> operations reasonably fast. I plan to make "status" fast even without
> path limiting with the help of inotify, but that's not going to be
> finished soon. Did I miss anything else?

Chromium developers frequently want to run status over their entire
checkout, and a lot of them run 'git commit -a'.  We want to do
everything possible to speed this up.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Wed, Feb 12, 2014 at 3:59 AM, Erik Faye-Lund  wrote:
> On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager  wrote:
>>
>> We are particularly concerned with the performance of msysgit, and we
>> have already chalked up a significant performance gain by turning on
>> the threading code in pack-objects (which was already enabled for
>> posix platforms, but not on msysgit, owing to the lack of a correct
>> pread implementation).
>
> How did you manage to do this? I'm not aware of any way to implement
> pread on Windows (without going down the insanity-path of wrapping and
> potentially locking inside every IO operation)...

I don't want to steal the thunder of my coworker, who wrote the
implementation.  He plans to submit it upstream soon-ish.  It relies
on using the lpOverlapped argument to ReadFile(), with some additional
tomfoolery to make sure that the implicit position pointer for the
file descriptor doesn't get modified.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread Stefan Zager

On Tue, Feb 11, 2014 at 11:29 PM, Chris Packham  wrote:
> Hi,
>
> On 12/02/14 14:57, Stefan Zager wrote:
>> From b4796d9d99c03b0b7cddd50808a41413e45f1129 Mon Sep 17 00:00:00 2001
>> From: Stefan Zager 
>> Date: Mon, 10 Feb 2014 16:55:12 -0800
>> Subject: [PATCH] Make the global packed_git variable static to sha1_file.c.
>
> I'm not really qualified to comment on substance but there are some
> basic style issues w.r.t. whitespace namely using 4 spaces for indent
> and mixing tabs/spaces. This might seem pedantic for the first round of
> a patch but it does put off reviewers.
>
> From Documentation/CodingGuidelines:
>
>  - We use tabs to indent, and interpret tabs as taking up to
>8 spaces.

My bad, I will upload a fixed patch.  In my defense: I edited the code
in emacs and then ran "M-x tabify" over the entire file.  But that had
the unfortunate side effect of adding a bunch of whitespace-only
changes to the diff, illuminating the fact that there is already mixed
whitespace in the existing code.  So I had to go back and selectively
tabify my changes, and I clearly missed a bunch.

If anyone has a recommendation for a less labor-intensive way to do
this in emacs, I'd be very grateful.

Thanks,

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Erik Faye-Lund

On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:
> On Wed, Feb 12, 2014 at 3:59 AM, Erik Faye-Lund  wrote:
>> On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager  wrote:
>>>
>>> We are particularly concerned with the performance of msysgit, and we
>>> have already chalked up a significant performance gain by turning on
>>> the threading code in pack-objects (which was already enabled for
>>> posix platforms, but not on msysgit, owing to the lack of a correct
>>> pread implementation).
>>
>> How did you manage to do this? I'm not aware of any way to implement
>> pread on Windows (without going down the insanity-path of wrapping and
>> potentially locking inside every IO operation)...
>
> I don't want to steal the thunder of my coworker, who wrote the
> implementation.  He plans to submit it upstream soon-ish.  It relies
> on using the lpOverlapped argument to ReadFile(), with some additional
> tomfoolery to make sure that the implicit position pointer for the
> file descriptor doesn't get modified.

Is the code available somewhere? I'm especially interested in the
"additional tomfoolery to make sure that the implicit position pointer
for the file descriptor doesn't get modified"-part, as this was what I
ended up butting my head into when trying to do this myself.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Matthieu Moy

Stefan Zager  writes:

> I'm optimistic that parallelizing the stat calls will yield a further
> improvement.

It has already been mentionned in the thread, but in case you overlooked
it: did you look at core.preloadindex? It seems at least very close to
what you want.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund  wrote:
> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:
>>
>> I don't want to steal the thunder of my coworker, who wrote the
>> implementation.  He plans to submit it upstream soon-ish.  It relies
>> on using the lpOverlapped argument to ReadFile(), with some additional
>> tomfoolery to make sure that the implicit position pointer for the
>> file descriptor doesn't get modified.
>
> Is the code available somewhere? I'm especially interested in the
> "additional tomfoolery to make sure that the implicit position pointer
> for the file descriptor doesn't get modified"-part, as this was what I
> ended up butting my head into when trying to do this myself.

https://chromium-review.googlesource.com/#/c/186104/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Wed, Feb 12, 2014 at 10:33 AM, Matthieu Moy
 wrote:
> Stefan Zager  writes:
>
>> I'm optimistic that parallelizing the stat calls will yield a further
>> improvement.
>
> It has already been mentionned in the thread, but in case you overlooked
> it: did you look at core.preloadindex? It seems at least very close to
> what you want.

Ah yes, sorry, I overlooked that.  We have indeed turned on
core.preloadindex, and it does indeed speed up status.  That speedup
is reflected in my previous comments about our observations working
with chromium and blink.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Erik Faye-Lund

On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager  wrote:
> On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund  wrote:
>> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:
>>>
>>> I don't want to steal the thunder of my coworker, who wrote the
>>> implementation.  He plans to submit it upstream soon-ish.  It relies
>>> on using the lpOverlapped argument to ReadFile(), with some additional
>>> tomfoolery to make sure that the implicit position pointer for the
>>> file descriptor doesn't get modified.
>>
>> Is the code available somewhere? I'm especially interested in the
>> "additional tomfoolery to make sure that the implicit position pointer
>> for the file descriptor doesn't get modified"-part, as this was what I
>> ended up butting my head into when trying to do this myself.
>
> https://chromium-review.googlesource.com/#/c/186104/

ReOpenFile, that's fantastic. Thanks a lot!
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread David Kastrup

Stefan Zager  writes:

> On Tue, Feb 11, 2014 at 6:11 PM, Duy Nguyen  wrote:
>>
>> I have no comments about thread safety improvements (well, not yet).
>> If you have investigated about git performance on chromium
>> repositories, could you please sum it up? Threading may be an option
>> to improve performance, but it's probably not the only option.
>
> Well, the painful operations that we use frequently are pack-objects,
> checkout, status, and blame.

Have you checked the patch in
http://thread.gmane.org/gmane.comp.version-control.git/241448> and
followups,
Message-ID: <1391454849-26558-1-git-send-email-...@gnu.org>?

While this does not yet support -M and -C options, it's conceivable that
you don't use them in your server/scripts.

> Anything on Windows that touches a lot of files is miserable due to
> the usual file system slowness on Windows, and luafv.sys (the UAC file
> virtualization driver) seems to make it much worse.

There is an obvious solution here...  Dedicated hardware is not that
expensive.  Virtualization will always have a price.

> Blame is something that chromium and blink developers use heavily, and
> it is not unusual for a blame invocation on the blink repository to
> run for 30 seconds.  It seems like it should be possible to
> parallelize blame, but it requires pack file operations to be
> thread-safe.

Really, give the above patch a try.  I am taking longer to finish it
than anticipated (with a lot due to procrastination but that is,
unfortunately, a large part of my workflow), and it's cutting into my
"paychecks" (voluntary donations which to a good degree depend on timely
and nontrivial progress reports for my freely available work on GNU
LilyPond).

Note that it looks like the majority of the remaining time on GNU/Linux
tends to be spent in system time: I/O time, memory management.  And I
have an SSD drive.  When using packed repositories of considerable size,
decompression comes into play as well.  I don't think that you can hope
to get noticeably higher I/O throughput by multithreading, so really,
really, really consider dedicated hardware running on a native Linux
file system.

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread David Kastrup

Stefan Zager  writes:

> On Tue, Feb 11, 2014 at 11:29 PM, Chris Packham  
> wrote:
>> Hi,
>>
>> On 12/02/14 14:57, Stefan Zager wrote:
>>> From b4796d9d99c03b0b7cddd50808a41413e45f1129 Mon Sep 17 00:00:00 2001
>>> From: Stefan Zager 
>>> Date: Mon, 10 Feb 2014 16:55:12 -0800
>>> Subject: [PATCH] Make the global packed_git variable static to sha1_file.c.
>>
>> I'm not really qualified to comment on substance but there are some
>> basic style issues w.r.t. whitespace namely using 4 spaces for indent
>> and mixing tabs/spaces. This might seem pedantic for the first round of
>> a patch but it does put off reviewers.
>>
>> From Documentation/CodingGuidelines:
>>
>>  - We use tabs to indent, and interpret tabs as taking up to
>>8 spaces.
>
> My bad, I will upload a fixed patch.  In my defense: I edited the code
> in emacs and then ran "M-x tabify" over the entire file.  But that had
> the unfortunate side effect of adding a bunch of whitespace-only
> changes to the diff, illuminating the fact that there is already mixed
> whitespace in the existing code.  So I had to go back and selectively
> tabify my changes, and I clearly missed a bunch.
>
> If anyone has a recommendation for a less labor-intensive way to do
> this in emacs, I'd be very grateful.

C-c . RET linux RET before entering any changes.

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread szager

>From 0a59547f3e95ddecf7606c5f259ae6177c5a104f Mon Sep 17 00:00:00 2001
From: Stefan Zager 
Date: Mon, 10 Feb 2014 16:55:12 -0800
Subject: [PATCH] Make the global packed_git variable static to sha1_file.c.

This is a first step in making the codebase thread-safe.  By and
large, the operations which might benefit from threading are those
that work with pack files (e.g., checkout, blame), so the focus of
this patch is stop leaking the global list of pack files outside of
sha1_file.c.

The next step will be to control access to the list of pack files
with a mutex.  However, that alone is not enough to make pack file
access thread safe.  Even in a read-only operation, the window list
associated with each pack file will need to be controlled.
Additionally, the global counters in sha1_file.c will need to be
controlled.

This patch is a pure refactor with no functional changes, so it
shouldn't require any additional tests.  Adding the actual locks
will be a functional change, and will require additional tests.

Signed-off-by: Stefan Zager 
---
 builtin/count-objects.c  |  44 ++-
 builtin/fsck.c   |  46 +++-
 builtin/gc.c |  26 +++
 builtin/pack-objects.c   | 188 ---
 builtin/pack-redundant.c |  37 +++---
 cache.h  |  16 +++-
 fast-import.c|   4 +-
 http-backend.c   |  28 ---
 http-push.c  |   4 +-
 http-walker.c|   2 +-
 pack-revindex.c  |  20 ++---
 server-info.c|  35 +
 sha1_file.c  |  35 -
 sha1_name.c  |  18 -
 14 files changed, 315 insertions(+), 188 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index a7f70cb..6554dfe 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -83,14 +83,32 @@ static char const * const count_objects_usage[] = {
NULL
 };
 
+struct pack_data {
+   unsigned long packed;
+   off_t size_pack;
+   unsigned long num_pack;
+};
+
+int pack_data_fn(struct packed_git *p, void *data)
+{
+   struct pack_data *pd = (struct pack_data *) data;
+   if (p->pack_local && !open_pack_index(p)) {
+   pd->packed += p->num_objects;
+   pd->size_pack += p->pack_size + p->index_size;
+   pd->num_pack++;
+   }
+   return 0;
+}
+
 int cmd_count_objects(int argc, const char **argv, const char *prefix)
 {
int i, verbose = 0, human_readable = 0;
const char *objdir = get_object_directory();
int len = strlen(objdir);
char *path = xmalloc(len + 50);
-   unsigned long loose = 0, packed = 0, packed_loose = 0;
+   unsigned long loose = 0, packed_loose = 0;
off_t loose_size = 0;
+   struct pack_data pd = {0,0,0};
struct option opts[] = {
OPT__VERBOSE(&verbose, N_("be verbose")),
OPT_BOOL('H', "human-readable", &human_readable,
@@ -118,41 +136,29 @@ int cmd_count_objects(int argc, const char **argv, const 
char *prefix)
closedir(d);
}
if (verbose) {
-   struct packed_git *p;
-   unsigned long num_pack = 0;
-   off_t size_pack = 0;
struct strbuf loose_buf = STRBUF_INIT;
struct strbuf pack_buf = STRBUF_INIT;
struct strbuf garbage_buf = STRBUF_INIT;
-   if (!packed_git)
-   prepare_packed_git();
-   for (p = packed_git; p; p = p->next) {
-   if (!p->pack_local)
-   continue;
-   if (open_pack_index(p))
-   continue;
-   packed += p->num_objects;
-   size_pack += p->pack_size + p->index_size;
-   num_pack++;
-   }
+   prepare_packed_git();
+   foreach_packed_git(pack_data_fn, NULL, &pd);
 
if (human_readable) {
strbuf_humanise_bytes(&loose_buf, loose_size);
-   strbuf_humanise_bytes(&pack_buf, size_pack);
+   strbuf_humanise_bytes(&pack_buf, pd.size_pack);
strbuf_humanise_bytes(&garbage_buf, size_garbage);
} else {
strbuf_addf(&loose_buf, "%lu",
(unsigned long)(loose_size / 1024));
strbuf_addf(&pack_buf, "%lu",
-   (unsigned long)(size_pack / 1024));
+   (unsigned long)(pd.size_pack / 1024));
strbuf_addf(&garbage_buf, "%lu",
(unsigned long)(size_garbage / 1024));
}
 
printf("count: %lu\n", loose);
printf("size: %s\n", loose_buf.buf);
-   printf("in-pack: %lu\n", packed);
-

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup  wrote:
> Stefan Zager  writes:
>
>> Anything on Windows that touches a lot of files is miserable due to
>> the usual file system slowness on Windows, and luafv.sys (the UAC file
>> virtualization driver) seems to make it much worse.
>
> There is an obvious solution here...  Dedicated hardware is not that
> expensive.  Virtualization will always have a price.

Not sure I follow you.  We need to support people developing,
building, and testing on natively Windows machines.  And we need to
support users with reasonable hardware, including spinning disks.  If
we were only interested in optimizing for Google employees, each of
whom has one or more small nuclear reactors under their desk, this
would be easy.

>> Blame is something that chromium and blink developers use heavily, and
>> it is not unusual for a blame invocation on the blink repository to
>> run for 30 seconds.  It seems like it should be possible to
>> parallelize blame, but it requires pack file operations to be
>> thread-safe.
>
> Really, give the above patch a try.  I am taking longer to finish it
> than anticipated (with a lot due to procrastination but that is,
> unfortunately, a large part of my workflow), and it's cutting into my
> "paychecks" (voluntary donations which to a good degree depend on timely
> and nontrivial progress reports for my freely available work on GNU
> LilyPond).

I will give that a try.  How much of a performance improvement have you clocked?

> Note that it looks like the majority of the remaining time on GNU/Linux
> tends to be spent in system time: I/O time, memory management.  And I
> have an SSD drive.  When using packed repositories of considerable size,
> decompression comes into play as well.  I don't think that you can hope
> to get noticeably higher I/O throughput by multithreading, so really,
> really, really consider dedicated hardware running on a native Linux
> file system.

I have a background in hardware, and I have much more faith in modern
disk schedulers :)

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] tests: turn on network daemon tests by default

2014-02-12 Thread Junio C Hamano

Jeff King  writes:

> Agreed, but I think the only way to know the size of those fallouts is
> to try it and see who complains.  I would not normally be so cavalier
> with git itself, but I think for the test infrastructure, we have a
> small, tech-savvy audience that can help us iterate on it without too
> much pain.

There is another.

$ GIT_TEST_HTTPD=false sh t5537-fetch-shallow.sh 
ok 1 - setup
ok 2 - setup shallow clone
ok 3 - clone from shallow clone
ok 4 - fetch from shallow clone
ok 5 - fetch --depth from shallow clone
ok 6 - fetch --unshallow from shallow clone
ok 7 - fetch something upstream has but hidden by clients shallow boundaries
ok 8 - fetch that requires changes in .git/shallow is filtered
ok 9 - fetch --update-shallow
error: Can't use skip_all after running some tests

Under 'prove' test target, the way it exits causes:

*** prove ***
t5537-fetch-shallow.sh .. Dubious, test returned 1 (wstat 256, 0x100)
All 9 subtests passed

which leads to:

Test Summary Report
---
t5537-fetch-shallow.sh (Wstat: 256 Tests: 9 Failed: 0)
  Non-zero exit status: 1
  Parse errors: No plan found in TAP output


On the 'master' branch without these "auto opt-in" patches,

$ GIT_TEST_HTTPD= sh t5537-fetch-shallow.sh 
ok 1 - setup
ok 2 - setup shallow clone
ok 3 - clone from shallow clone
ok 4 - fetch from shallow clone
ok 5 - fetch --depth from shallow clone
ok 6 - fetch --unshallow from shallow clone
ok 7 - fetch something upstream has but hidden by clients shallow boundaries
ok 8 - fetch that requires changes in .git/shallow is filtered
ok 9 - fetch --update-shallow
skipping remaining tests, git built without http support
# passed all 9 test(s)
1..9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread David Kastrup

Stefan Zager  writes:

> On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup  wrote:
>
>> Really, give the above patch a try.  I am taking longer to finish it
>> than anticipated (with a lot due to procrastination but that is,
>> unfortunately, a large part of my workflow), and it's cutting into my
>> "paychecks" (voluntary donations which to a good degree depend on timely
>> and nontrivial progress reports for my freely available work on GNU
>> LilyPond).
>
> I will give that a try.  How much of a performance improvement have
> you clocked?

Depends on file type and size.  With large files with lots of small
changes, performance improvements get more impressive.

Some ugly real-world examples are the Emacs repository, src/xdisp.c
(performance improvement about a factor of 3), a large file in the style
of /usr/share/dict/words clocking in at a factor of about 5.

Again, that's with an SSD and ext4 filesystem on GNU/Linux, and there
are no improvements in system time (I/O) except for patch 4 of the
series which helps perhaps 20% or so.

So the benefits of the patch will come into play mostly for big, bad
files on Windows: other than that, the I/O time is likely to be the
dominant player anyway.

If you have benchmarked the stuff, for annoying cases expect I/O time to
go down maybe 10-20%, and user time to drop by a factor of 4.  Under
GNU/Linux, that makes for a significant overall improvement.  On
Windows, the payback is likely quite less because of the worse I/O
performance.  Pity.

-- 
David Kastrup
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread Junio C Hamano

Stefan Zager  writes:

> If anyone has a recommendation for a less labor-intensive way to do
> this in emacs, I'd be very grateful.

This is not "do this in emacs", but here is a possible approach.

You can ask "git diff" about what you changed, and actually apply
the change while fixing whitespace errors.  I.e.

git diff sha1_file.c | git apply --cached --whitespace=fix
git diff
git checkout sha1_file.c

The first step will add a cleaned-up version to your index.

The second "diff" (optional) is to see what whitespace errors are
introduced when going from that cleaned-up version to what you have
in the working tree.

With the last step you would update the working tree version to the
cleaned-up version from the index.

[alias]
wsadd = "!sh -c 'git diff -- \"$@\" | git apply --cached 
--whitespace=fix;\
git co -- ${1-.} \"$@\"' -"
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Karsten Blees

Am 12.02.2014 19:37, schrieb Erik Faye-Lund:
> On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager  wrote:
>> On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund  wrote:
>>> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:

 I don't want to steal the thunder of my coworker, who wrote the
 implementation.  He plans to submit it upstream soon-ish.  It relies
 on using the lpOverlapped argument to ReadFile(), with some additional
 tomfoolery to make sure that the implicit position pointer for the
 file descriptor doesn't get modified.
>>>
>>> Is the code available somewhere? I'm especially interested in the
>>> "additional tomfoolery to make sure that the implicit position pointer
>>> for the file descriptor doesn't get modified"-part, as this was what I
>>> ended up butting my head into when trying to do this myself.
>>
>> https://chromium-review.googlesource.com/#/c/186104/
> 
> ReOpenFile, that's fantastic. Thanks a lot!

...but should be loaded dynamically via GetProcAddress, or are we ready to drop 
XP support?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Wed, Feb 12, 2014 at 11:22 AM, Karsten Blees  wrote:
> Am 12.02.2014 19:37, schrieb Erik Faye-Lund:
>> On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager  wrote:
>>> On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund  
>>> wrote:
 On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager  wrote:
>
> I don't want to steal the thunder of my coworker, who wrote the
> implementation.  He plans to submit it upstream soon-ish.  It relies
> on using the lpOverlapped argument to ReadFile(), with some additional
> tomfoolery to make sure that the implicit position pointer for the
> file descriptor doesn't get modified.

 Is the code available somewhere? I'm especially interested in the
 "additional tomfoolery to make sure that the implicit position pointer
 for the file descriptor doesn't get modified"-part, as this was what I
 ended up butting my head into when trying to do this myself.
>>>
>>> https://chromium-review.googlesource.com/#/c/186104/
>>
>> ReOpenFile, that's fantastic. Thanks a lot!
>
> ...but should be loaded dynamically via GetProcAddress, or are we ready to 
> drop XP support?

Right, that is an issue.  From our perspective, it's well past time to
drop XP support.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line

2014-02-12 Thread Junio C Hamano

David Kastrup  writes:

> Making a single preparation run for counting the lines will avoid memory
> fragmentation.  Also, fix the allocated memory size which was wrong
> when sizeof(int *) != sizeof(int), and would have been too small
> for sizeof(int *) < sizeof(int), admittedly unlikely.
>
> Signed-off-by: David Kastrup 
> ---

I think I took sizeof(int*)->sizeof(int) patch to the 'next' branch
already, which might have to conflict with this clean-up, but it
should be trivial to resolve.

Thanks for resending.  I was busy elsewhere (i.e. "no feedback" does
not mean "silent rejection" nor "silent agreement" at least from
me), and such a resend does help prevent patches fall thru cracks.

> diff --git a/builtin/blame.c b/builtin/blame.c
> index e44a6bb..1aefedf 100644
> --- a/builtin/blame.c
> +++ b/builtin/blame.c
> @@ -1772,25 +1772,41 @@ static int prepare_lines(struct scoreboard *sb)
>  {
>   const char *buf = sb->final_buf;
>   unsigned long len = sb->final_buf_size;
> + const char *end = buf + len;
> + const char *p;
> + int *lineno;
> + int num = 0, incomplete = 0;
>  
> + for (p = buf;;) {
> + p = memchr(p, '\n', end - p);
> + if (p) {
> + p++;
>   num++;
> + continue;
>   }
> + break;
>   }
> +
> + if (len && end[-1] != '\n')
> + incomplete++; /* incomplete line at the end */
> +
> + sb->lineno = xmalloc(sizeof(*sb->lineno) * (num + incomplete + 1));
> + lineno = sb->lineno;
> +
> + *lineno++ = 0;
> + for (p = buf;;) {
> + p = memchr(p, '\n', end - p);
> + if (p) {
> + p++;
> + *lineno++ = p - buf;
> + continue;
> + }
> + break;
> + }
> +
> + if (incomplete)
> + *lineno++ = len;
> +
>   sb->num_lines = num + incomplete;
>   return sb->num_lines;
>  }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] tree-walk: be more specific about corrupt tree errors

2014-02-12 Thread Jeff King

When the tree-walker runs into an error, it just calls
die(), and the message is always "corrupt tree file".
However, we are actually covering several cases here; let's
give the user a hint about what happened.

Let's also avoid using the word "corrupt", which makes it
seem like the data bit-rotted on disk. Our sha1 check would
already have found that. These errors are ones of data that
is malformed in the first place.

Signed-off-by: Jeff King 
---
Michael and I have been looking off-list at some bogus trees (created by
a non-git.git implementation). git-fsck unhelpfully dies during the
tree-walk:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  fatal: corrupt tree file

I think in the long run we want to either teach fsck to avoid the
regular tree-walker or to set a special "continue as much as you can"
flag. That will let us keep going to find more errors, do our usual fsck
error checks (which are more strict), and especially report on _which_
object was broken (we can't do that here because we are deep in the call
stack and may not even have a real object yet).

This patch at least gives us slightly more specific error messages (both
for fsck and for other commands). And it may provide a first step in
clarity if we follow the "continue as much as you can" path.

 tree-walk.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/tree-walk.c b/tree-walk.c
index 79dba1d..d53b084 100644
--- a/tree-walk.c
+++ b/tree-walk.c
@@ -28,11 +28,13 @@ static void decode_tree_entry(struct tree_desc *desc, const 
char *buf, unsigned
unsigned int mode, len;
 
if (size < 24 || buf[size - 21])
-   die("corrupt tree file");
+   die("truncated tree file");
 
path = get_mode(buf, &mode);
-   if (!path || !*path)
-   die("corrupt tree file");
+   if (!path)
+   die("malformed mode in tree entry");
+   if (!*path)
+   die("empty filename in tree entry");
len = strlen(path) + 1;
 
/* Initialize the descriptor entry */
@@ -81,7 +83,7 @@ void update_tree_entry(struct tree_desc *desc)
unsigned long len = end - (const unsigned char *)buf;
 
if (size < len)
-   die("corrupt tree file");
+   die("truncated tree file");
buf = end;
size -= len;
desc->buffer = buf;
-- 
1.8.5.2.500.g8060133
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Junio C Hamano

Stefan Zager  writes:

> ...  I used the Very Sleepy profiler
> to see where all the time was spent on Windows: 55% of the time was
> spent in OpenFile, and 25% in CloseFile (both in win32).

This is somewhat interesting.

When we check things out, checkout_paths() has a list of paths to be
checked out, and iterates over them and call checkout_entry().

I wonder if you can:

 - introduce a version of checkout_entry() that takes file
   descriptors to write to;

 - have an asynchronous helper threads that pre-open the paths to be
   written out and feed  to a
   queue;

 - restructure that loop so that it reads the  from the queue, performs the actual writing out,
   and then feeds  to another queue; and

 - have another asynchronous helper threads that reads  from the queue and close them.

Calls to write (and preparation of data to be written) will then
remain single-threaded, but it sounds like that codepath is not the
bottleneck in your measurement, so

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2] Make the global packed_git variable static to sha1_file.c.

2014-02-12 Thread Junio C Hamano

I'll locally fix up these style issues before commenting on the patch.

Thanks.

ERROR: space required after that ',' (ctx:VxV)
#78: FILE: builtin/count-objects.c:111:
+   struct pack_data pd = {0,0,0};
^

ERROR: space required after that ',' (ctx:VxV)
#78: FILE: builtin/count-objects.c:111:
+   struct pack_data pd = {0,0,0};
  ^

ERROR: space required after that ',' (ctx:VxV)
#171: FILE: builtin/fsck.c:683:
+   struct verify_packs_data vpd = {0,0,0};
 ^

ERROR: space required after that ',' (ctx:VxV)
#171: FILE: builtin/fsck.c:683:
+   struct verify_packs_data vpd = {0,0,0};
   ^

ERROR: space required after that ',' (ctx:VxV)
#541: FILE: builtin/pack-redundant.c:591:
+   struct add_pack_data apd = {filename,0,NULL};
^

ERROR: space required after that ',' (ctx:VxV)
#541: FILE: builtin/pack-redundant.c:591:
+   struct add_pack_data apd = {filename,0,NULL};
  ^

ERROR: space required after that ',' (ctx:VxV)
#565: FILE: builtin/pack-redundant.c:604:
+   struct add_pack_data apd = {NULL,0,NULL};
^

ERROR: space required after that ',' (ctx:VxV)
#565: FILE: builtin/pack-redundant.c:604:
+   struct add_pack_data apd = {NULL,0,NULL};
  ^

ERROR: space prohibited after that '-' (ctx:WxW)
#726: FILE: pack-revindex.c:47:
+   num = - 1 - num;
  ^

ERROR: space required after that ',' (ctx:VxV)
#901: FILE: sha1_name.c:195:
+   struct disambiguate_data d = {len,bin_pfx,ds};
 ^

ERROR: space required after that ',' (ctx:VxV)
#901: FILE: sha1_name.c:195:
+   struct disambiguate_data d = {len,bin_pfx,ds};
 ^

total: 11 errors, 0 warnings, 781 lines checked
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Make the git codebase thread-safe

2014-02-12 Thread Stefan Zager

On Wed, Feb 12, 2014 at 12:06 PM, Junio C Hamano  wrote:
> Stefan Zager  writes:
>
>> ...  I used the Very Sleepy profiler
>> to see where all the time was spent on Windows: 55% of the time was
>> spent in OpenFile, and 25% in CloseFile (both in win32).
>
> This is somewhat interesting.
>
> When we check things out, checkout_paths() has a list of paths to be
> checked out, and iterates over them and call checkout_entry().
>
> I wonder if you can:
>
>  - introduce a version of checkout_entry() that takes file
>descriptors to write to;
>
>  - have an asynchronous helper threads that pre-open the paths to be
>written out and feed  to a
>queue;
>
>  - restructure that loop so that it reads the to be written> from the queue, performs the actual writing out,
>and then feeds  to another queue; and
>
>  - have another asynchronous helper threads that reads descriptor to be closed> from the queue and close them.
>
> Calls to write (and preparation of data to be written) will then
> remain single-threaded, but it sounds like that codepath is not the
> bottleneck in your measurement, so

Yes, I considered that as well.  At a minimum, that would still
require attr.c to implement thread locking, since attribute files must
be parsed to look for stream filters.  I have already done that work.

But I'm not sure it's the best long-term approach to add convoluted
custom threading solutions to each git operation as it appears on the
performance radar.  I'm hoping to make the entire code base more
thread-friendly, so that threading can be added in a more natural and
idiomatic (and less painful) way.

For example, the most natural way to add threading to checkout would
be in the loops over the index in check_updates() in unpack-trees.c.
If attr.c and sha1_file.c were thread-safe, then it would be possible
to thread checkout entirely in check_updates(), with a pretty compact
code change.  I have already done the work in attr.c; sha1_file.c is
hairier, but do-able.

Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Introduce experimental remote object access mode

2014-02-12 Thread Shawn Pearce

On Tue, Feb 11, 2014 at 11:29 AM, Junio C Hamano  wrote:
> Shawn Pearce  writes:
>
>> Why would you do this? Perhaps you need more time in your day
>> to consume tea or coffee. Set GIT_RTT and enjoy a beverage.
>
> So the conclusion is that it is not practical to do a lazy fetch if
> it is done extremely naively at "we want this object --- wait a bit
> and we'll give you" level?

Yes, this is what I thought when someone proposed this hack in
sha1_file.c to me on Monday. So I ran a quick experiment to see if my
instinct was right.

> I am wondering if we can do a bit better, like "we want this object
> --- wait a bit, ah that's a commit, so it is likely that you may
> want the trees and blobs associated with it, too, if not right now
> but in a near future, let me push a pack that holds them to you"?

Ah, smart observation. That might work. However I doubt it.

I implemented a version of Git on top of Google Bigtable (and Apache
HBase and Apache Cassandra) multiple times using JGit. tl;dr: this
approach doesn't work in practice.

The naive implementation for these distributed NoSQL systems is to
store each object in its own row keyed by SHA-1, and lookup the object
when you want it. This is very slow and is more or less what this
stupid patch shows. Worse, none of them were able to even get close to
the 1ms latency I used in this example.

In another implementation (which I published into JGit as the "DHT"
backend) I stored a group of related commits together in a row. Row
target sizes were in the 1-2 MiB range when using pack style
compression for commits, so the average row held hundreds of commits.
Reading one commit would actually slurp back a number of related
commits. The idea was if we need commit A now we will need B, C, D, E
(its parents and ancestors) soon as the application walks the revision
history, like rev-list or pack-objects.

For a process like pack-objects this almost seems to work. If commits
are together we can slurp a group at a time to amortize the round trip
latency. Unfortunately the application can still go through hundreds
of commits faster than the real world RTT is. So I tried to fix this
by storing an extra metadata pointer in each row to identify the next
row, so next block of commits could start loading right away. Its
still slow, as the application can scan through data faster than the
RTT.

At least for pack generation the traversal code does commits and
builds up a list of all root trees. The root trees can be async loaded
in batches, but the depth first traversal is still a killer. There are
stalls while the application waits for the next subtree, even if you
cluster the trees also into groups using depth first traversal the way
the packer produces pack files today.

Junio's idea to cluster data by commit and its related trees and blobs
is just a different data organization. It may be necessary to make two
copies of the data, one clustered by commits and another by
commit+tree+blob to satisfy different access patterns. And in the
commit+tree+blob case you may need multiple redundant copies of blobs
near commits that use them if those commits are frequently accessed.
Its a lot of redundant disk space.

We always say disk is cheap, but disk is slow and not getting faster.
SSDs are helping, but SSDs are expensive and have size limitations
compared to spinning disk. Just making many copies of data isn't
necessarily a solution.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] contrib/diff-highlight: multibyte characters diff

2014-02-12 Thread Jeff King

On Tue, Feb 11, 2014 at 06:09:10PM +0900, Yoshihiro Sugi wrote:

> diff-highlight split each hunks and compare them as byte sequences.
> it causes problems when diff hunks include multibyte characters.
> This change enable to work on such cases by decoding inputs and encoding 
> output as utf8 string.

Thanks for looking at this. I didn't consider multibyte characters at
all when I wrote the original.

Sadly, applying your patch seems to cause diff-highlight to take about
1.5x as much CPU (I just tried it on the complete output of "git log
--all -p" in my git.git repository, which went from ~60s to ~90s). That
is not necessarily a deal-breaker, as it is more important to be
correct. But I wonder if there is any way we can optimize it. From my
understanding, it may not be the encoding/decoding itself, but rather
that the utf8-aware text routines are slower. More on that below.

>  # Highlight by reversing foreground and background. You could do
>  # other things like bold or underline if you prefer.
> @@ -15,8 +16,9 @@ my @added;
>  my $in_hunk;
>  
>  while (<>) {
> + $_ = decode_utf8($_);

What happens when the diff content is not utf8 at all? The default
failure mode for decode_utf8 seems to be to replace non-utf8 sequences
with a substitution character. Which means we would corrupt something
like iso8859-1, which works fine in the current code.

>   if (!$in_hunk) {
> - print;
> + print encode_utf8($_);

And we have the opposite question here. We know that what we have in the
string is perl's internal format. But how do we know that what the
user's terminal wants is utf8?

I think in the most general case, we would need to auto-detect the
encoding for each string (since there is no guarantee that a file even
retains the same encoding through its history). And then probably punt on
highlighting when comparing two lines in different encodings, and if
they are in the same encoding, treat them as a sequence of code points
rather than octets when comparing.

That's probably way too slow and complicated to worry about. And
frankly, I am OK with making the tool work _best_ with UTF-8, as long as
it can fail gracefully for other encodings (i.e., if you can turn off
the utf8 magic when you have another encoding, and we will pass it
through blindly as octets).

Would it be enough to do something like the patch below? The specific
things I am shooting for are:

  - the only part of the code that cares about the sequence is the
split/highlight bit, and most lines do not get highlighted at all.
Therefore it tries to lazily do the decode step, which gets back our
lost performance (I measured ~12% slowdown).

  - we try the utf8 decode and if it fails, fall back to treating it
like a sequence of bytes. That doesn't help people with other
multibyte encodings, but at least lets single-byte encodings besides
utf8 continue to work.

I constructed a very simple test of:

  git init &&
  printf 'cont\xc3\xa9nt\n' >file &&
  git add file &&
  printf 'cont\xc3\xb6nt\n' >file &&
  git diff

and it seemed to work. Can you confirm that it does the right thing on
your more complicated cases?

-Peff

---
diff --git a/contrib/diff-highlight/diff-highlight 
b/contrib/diff-highlight/diff-highlight
index c4404d4..9447ba2 100755
--- a/contrib/diff-highlight/diff-highlight
+++ b/contrib/diff-highlight/diff-highlight
@@ -2,6 +2,7 @@

 use warnings FATAL => 'all';
 use strict;
+use Encode qw(decode_utf8 encode_utf8);

 # Highlight by reversing foreground and background. You could do
 # other things like bold or underline if you prefer.
@@ -73,13 +74,23 @@ sub show_hunk {

my @queue;
for (my $i = 0; $i < @$a; $i++) {
-   my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]);
-   print $rm;
-   push @queue, $add;
+   my ($a_dec, $encode_rm) = decode($a->[$i]);
+   my ($b_dec, $encode_add) = decode($b->[$i]);
+   my ($rm, $add) = highlight_pair($a_dec, $b_dec);
+   print $encode_rm->($rm);
+   push @queue, $encode_add->($add);
}
print @queue;
 }

+sub decode {
+   my $orig = shift;
+   my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) };
+   return defined $decoded ?
+  ($decoded, sub { encode_utf8(shift) }) :
+  ($orig, sub { shift });
+}
+
 sub highlight_pair {
my @a = split_line(shift);
my @b = split_line(shift);
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

65 matches

Mail list logo