Re: [PATCH v2] Make the global packed_git variable static to sha1_file.c.
sza...@chromium.org writes: > From 0a59547f3e95ddecf7606c5f259ae6177c5a104f Mon Sep 17 00:00:00 2001 Please drop this line. > From: Stefan Zager Please drop this line and instead have it in your e-mail header. > Date: Mon, 10 Feb 2014 16:55:12 -0800 The date in your e-mail header, which is the first time general public saw this particular version of the patch, is used by default as the author time. Unless there is a compelling reason to override that with an in-body header, please drop this line. > Subject: [PATCH] Make the global packed_git variable static to sha1_file.c. Please drop this line and instead have it in your e-mail header. > This patch is a pure refactor with no functional changes,... > > diff --git a/builtin/count-objects.c b/builtin/count-objects.c > index a7f70cb..6554dfe 100644 > --- a/builtin/count-objects.c > +++ b/builtin/count-objects.c > @@ -83,14 +83,32 @@ static char const * const count_objects_usage[] = { > NULL > }; > > +struct pack_data { > + unsigned long packed; > + off_t size_pack; > + unsigned long num_pack; > +}; > + > +int pack_data_fn(struct packed_git *p, void *data) Can't/shouldn't this be static? Also I'd suggest s/pack_data_fn/collect_pack_data/ or something. "_fn" may be a good suffix for typedef'ed typename used in a callback function, but for a concrete function, it only adds noise without giving us anything new. Yes, I know there are a few existing violators, but we shouldn't make the codebase worse, using their existence due to past carelessness as an excuse. > diff --git a/cache.h b/cache.h > index dc040fb..542a9d9 100644 > --- a/cache.h > +++ b/cache.h > ... > +/* The 'hint' argument is for the commonly-used 'last found pack' > optimization. > + * It can be NULL. > + */ /* * Please try to have opening slash-asterisk and closing * asterisk-slash in a multi-line comment block on their * own lines by themselves. */ > +extern void foreach_packed_git(packed_git_foreach_fn fn, struct packed_git > *hint, void *data); > + > +extern size_t packed_git_count(); > +extern size_t packed_git_local_count(); extern size_t packed_git_count(void); extern size_t packed_git_local_count(void); > diff --git a/sha1_file.c b/sha1_file.c > index 6e8c05d..aeeb7e6 100644 > --- a/sha1_file.c > +++ b/sha1_file.c > @@ -60,6 +60,7 @@ static struct cached_object empty_tree = { > 0 > }; > > +static struct packed_git *packed_git; > static struct packed_git *last_found_pack; > > static struct cached_object *find_cached_object(const unsigned char *sha1) > @@ -468,7 +469,6 @@ static unsigned int pack_open_fds; > static unsigned int pack_max_fds; > static size_t peak_pack_mapped; > static size_t pack_mapped; > -struct packed_git *packed_git; Hmm, any particular reason why only this variable and not others are moved up? > @@ -1091,6 +1091,37 @@ struct packed_git *add_packed_git(const char *path, > int path_len, int local) > return p; > } > > +void foreach_packed_git(packed_git_foreach_fn fn, struct packed_git *hint, > void *data) > +{ > + struct packed_git *p; > + if (hint && ((*fn)(hint, data))) > + return; > + for (p = packed_git; p; p = p->next) > + if (p != hint && (*fn)(p, data)) > + return; (mental note) In the new API, a non-zero return signals an early return/break from the loop. > +} > + > +size_t packed_git_count() size_t packed_git_count(void) > +{ > + size_t res = 0; > + struct packed_git *p; > + > + for (p = packed_git; p; p = p->next) > + ++res; When pre- or post- increment does not make any difference (i.e. you do not use its value), please stick to post-increment. pre-increment looks unusual in our codebase and becomes distracting while reading it through. Same comments for packed-git-local-count. > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 541667f..bc3074b 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -900,14 +900,45 @@ static int no_try_delta(const char *path) > return 0; > } > > +struct find_pack_data { > + const unsigned char *sha1; > + off_t offset; > + struct packed_git *found_pack; > + int exclude; > + int found_non_local_pack; > + int found_pack_keep; > +}; > + > +static int find_pack_fn(struct packed_git *p, void *data) > +{ > + struct find_pack_data *fpd = (struct find_pack_data *) data; > + off_t offset = find_pack_entry_one(fpd->sha1, p); > + if (offset) { > + if (!fpd->found_pack) { > + if (!is_pack_valid(p)) { > + warning("packfile %s cannot be accessed", > p->pack_name); > + return 0; > + } > + fpd->offset = offset; > + fpd->found_pack = p; > + } > + if (fpd->exclude) > + return 1; > +
Re: [PATCH] tests: turn on network daemon tests by default
On Tue, Feb 11, 2014 at 03:58:27PM -0800, Junio C Hamano wrote: > Sure. One immediate complaint is that I would probably need to do > something like this in the build automation: > > if testing a branch without this patch > then > : do nothing > else > GIT_TEST_GIT_DAEMON=false > fi > > Arguably, it is the fault of the current/original code that treated > *any* non-empty value that is set in the environment variable as > "true"---if it paid attention to GIT_TEST_GIT_DAEMON=NoThanks, we > wouldn't have to have a workaround like this. Yes, I didn't really think about build config that works reliably between both versions (though personally, I think you should be building with GIT_TEST_GIT_DAEMON=true :) ). > I wonder if GIT_TEST_X=$(test_tristate "$GIT_TEST_X") pattern can be > made a bit more friendly, though. For example, can we behave > differently depending on the reason why $GIT_TEST_X is empty? > > - People who have *not* been opting in to the expensive tests have >not done anything special; GIT_TEST_X environment variable did >not exist for them (i.e. unset), and we used to skip when >"$GIT_TEST_X" is an empty string. > > - We want to encourage people who do not care to run these tests. >If people do not do anything, their $GIT_TEST_X will continue to >be an empty string without GIT_TEST_X variable in the >environment. > > If we let people who *do* want to opt out of the expensive tests by > explicitly setting $GIT_TEST_X to an empty string in the new scheme, > wouldn't the transition go a lot smoother? Hmm. So you are suggesting that the old code treated "undefined" and "empty" the same (as "false"). But that in the new code, we would treat them _differently_, taking undefined to mean "auto" and empty to mean "false". I suppose that works, but it is rather unfortunate that the end state we are left with (for all time) makes such a confusing and subtle distinction. I think this should work OK with the existing Makefile conventions. That is, we do not ever set GIT_TEST_HTTPD in the Makefile ourselves, but rely on it being either unset or set to whatever the user likes (this is opposed to something like CFLAGS, where the distinction is long gone). So I'm not excited about it, but I do not think there is any other loophole through which we can maintain compatibility. If that's important, I think we have to do it. > The caller may have to pass the name of the variable: > > GIT_TEST_DAEMON=$(test_tristate GIT_TEST_DAEMON) I don't think that's a big deal. I actually was tempted to just make this: test_normalize_tristate GIT_TEST_DAEMON in the first place, since you would always want to look at the normalized value from there on out. > and then the callee may become > > test_tristate () { > variable=$1 > if eval "test x\"\${$variable+isset}\" = xisset" Hmm, today I learned about "{foo:+bar}" versus "${foo+bar}". I'm not sure how that bit of shell trivia escaped me for so many years. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Process of updating a local branch from another with tracking
I have the following alias defined: sync = "!f() { cbr=$(git rev-parse --abbrev-ref HEAD); git co $1 && git pull && git co $cbr && git rebase $1; }; f" The goal is to basically update a local branch which tracks a branch on a remote, and then rebase my local topic branch onto that updated local branch. I do this instead of just rebasing onto origin/master. Example: git checkout master git pull --rebase origin master git checkout topic1 git rebase master I could do this instead but not sure if it is recommended: git checkout topic1 git fetch git rebase origin/master Any thoughts on the alias I created? I'm a Windows user and still new to linux stuff, so if it sucks or could be simplified just let me know. And as a secondary question, just curious what people think about rebasing onto a remote tracking branch. When I do merges I usually refer to the remote branch, but during rebase I use the local branch instead, but I don't know if there is any functional reason to not skip the local branch and go straight to the remote tracking branch (it saves a step). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
What's cooking in git.git (Feb 2014, #04; Wed, 12)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. As a workaround to make life easier for third-party tools, the upcoming major release will be called "Git 1.9.0" (not "Git 1.9"). The first maintenance release for it will be "Git 1.9.1", and the major release after "Git 1.9.0" will either be "Git 2.0.0" or "Git 1.10.0". You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [New Topics] * al/docs (2014-02-11) 4 commits - docs/git-blame: explain more clearly the example pickaxe use - docs/git-clone: clarify use of --no-hardlinks option - docs/git-remote: capitalize first word of initial blurb - docs/merge-strategies: remove hyphen from mis-merges A handful of documentation updates, all trivially harmless. Will merge to 'next'. * jk/test-ports (2014-02-10) 2 commits - tests: auto-set git-daemon port - tests: auto-set LIB_HTTPD_PORT from test name (this branch is tangled with nd/http-fetch-shallow-fix.) Avoid having to assign port number to be used in tests manually. Will merge to 'next'. * nd/daemonize-gc (2014-02-10) 2 commits - gc: config option for running --auto in background - daemon: move daemonize() to libgit.a Allow running "gc --auto" in the background. Will merge to 'next'. * nd/gitignore-trailing-whitespace (2014-02-10) 2 commits - dir: ignore trailing spaces in exclude patterns - dir: warn about trailing spaces in exclude patterns Warn and then ignore trailing whitespaces in .gitignore files, unless they are quoted for fnmatch(3), e.g. "path\ ". * nd/log-show-linear-break (2014-02-10) 1 commit - log: add --show-linear-break to help see non-linear history * ss/completion-rec-sub-fetch-push (2014-02-11) 1 commit - completion: teach --recurse-submodules to fetch, pull and push * ks/tree-diff-more (2014-02-12) 16 commits - tree-diff: reuse base str(buf) memory on sub-tree recursion - tree-diff: no need to call "full" diff_tree_sha1 from show_path() - tree-diff: rework diff_tree interface to be sha1 based - tree-diff: remove special-case diff-emitting code for empty-tree cases - tree-diff: simplify tree_entry_pathcmp - tree-diff: show_path prototype is not needed anymore - tree-diff: rename compare_tree_entry -> tree_entry_pathcmp - tree-diff: move all action-taking code out of compare_tree_entry() - tree-diff: don't assume compare_tree_entry() returns -1,0,1 - FIXUP! - tree-diff: consolidate code for emitting diffs and recursion in one place - tree-diff: show_tree() is not needed - tree-diff: no need to pass match to skip_uninteresting() - tree-diff: no need to manually verify that there is no mode change for a path - combine-diff: move changed-paths scanning logic into its own function - combine-diff: move show_log_first logic/action out of paths scanning (this branch uses ks/combine-diff and ks/tree-diff-walk.) * jh/note-trees-record-blobs (2014-02-12) 1 commit - notes: Disallow reusing non-blob as a note object * jk/run-network-tests-by-default (2014-02-12) 1 commit - tests: turn on network daemon tests by default Teach "make test" to run networking tests when possible by default. Needs a bit more work. e.g. $gmane/242013 -- [Stalled] * po/everyday-doc (2014-01-27) 1 commit - Make 'git help everyday' work This may make the said command to emit something, but the source is not meant to be formatted into a manual pages to begin with, and also its contents are a bit stale. It may be a good first step in the right direction, but needs more work to at least get the mark-up right before public consumption. Will hold. * jk/branch-at-publish-rebased (2014-01-17) 5 commits - t1507 (rev-parse-upstream): fix typo in test title - implement @{publish} shorthand - branch_get: provide per-branch pushremote pointers - branch_get: return early on error - sha1_name: refactor upstream_mark Give an easier access to the tracking branches from "other" side in a triangular workflow by introducing B@{publish} that works in a similar way to how B@{upstream} does. Meant to be used as a basis for whatever Ram wants to build on. Will hold. * rb/merge-prepare-commit-msg-hook (2014-01-10) 4 commits - merge: drop unused arg from abort_commit method signature - merge: make prepare_to_commit responsible for write_merge_state - t7505: ensure cleanup after hook blocks merge - t7505: add missing && Expose more merge states (e.g. $GIT_DIR/MERGE_MODE) to hooks that run during "git merge". The log message stresses too much on one hook, prepare-commit-msg, but it would equally apply to other hooks like post-merge, I think. Waiting for a reroll. * jl/submodule-recursive-checkout (2013-12-26) 5 commits - Teach checkout
Re: [PATCH] tests: turn on network daemon tests by default
On Wed, Feb 12, 2014 at 11:06:43AM -0800, Junio C Hamano wrote: > Jeff King writes: > > > Agreed, but I think the only way to know the size of those fallouts is > > to try it and see who complains. I would not normally be so cavalier > > with git itself, but I think for the test infrastructure, we have a > > small, tech-savvy audience that can help us iterate on it without too > > much pain. > > There is another. I somehow pictured this while reading your email: http://youtu.be/X4JVcvR7IM0 > $ GIT_TEST_HTTPD=false sh t5537-fetch-shallow.sh > ok 1 - setup > ok 2 - setup shallow clone > ok 3 - clone from shallow clone > ok 4 - fetch from shallow clone > ok 5 - fetch --depth from shallow clone > ok 6 - fetch --unshallow from shallow clone > ok 7 - fetch something upstream has but hidden by clients shallow boundaries > ok 8 - fetch that requires changes in .git/shallow is filtered > ok 9 - fetch --update-shallow > error: Can't use skip_all after running some tests Ah, yeah, I did not notice that t5537 does its own special handling of GIT_TEST_HTTPD. I think it is wrong to do so, and it is already buggy in the case that GIT_TEST_HTTPD is set, but apache cannot be started. E.g., with the current "master": $ sudo chmod -x `which apache2` $ GIT_TEST_HTTPD=1 ./t5537-fetch-shallow.sh ok 1 - setup ok 2 - setup shallow clone ok 3 - clone from shallow clone ok 4 - fetch from shallow clone ok 5 - fetch --depth from shallow clone ok 6 - fetch --unshallow from shallow clone ok 7 - fetch something upstream has but hidden by clients shallow boundaries ok 8 - fetch that requires changes in .git/shallow is filtered ok 9 - fetch --update-shallow error: Can't use skip_all after running some tests lib-httpd was never designed to be included from anywhere except the beginning of the file. But that wouldn't be right for t5537, because it wants to run some of the tests, even if apache setup fails. The right way to do it is probably to have lib-httpd do all of its work in a lazy prereq. I don't know how clunky that will end up, though; it might be simpler to just move the shallow http test into one of the http-fetch scripts. -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tests: turn on network daemon tests by default
Jeff King writes: > test_normalize_tristate GIT_TEST_DAEMON Heh, great minds think alike. This is what I am playing with, without committing (because I do like your "ask config if this is a kind of various boolean 'false' representations, which I haven't managed to add to it). t/lib-git-daemon.sh | 2 +- t/lib-httpd.sh | 2 +- t/test-lib-functions.sh | 45 +++-- 3 files changed, 29 insertions(+), 20 deletions(-) diff --git a/t/lib-git-daemon.sh b/t/lib-git-daemon.sh index 36106de..615bf5d 100644 --- a/t/lib-git-daemon.sh +++ b/t/lib-git-daemon.sh @@ -16,7 +16,7 @@ # stop_git_daemon # test_done -GIT_TEST_GIT_DAEMON=$(test_tristate "$GIT_TEST_GIT_DAEMON") +test_tristate GIT_TEST_GIT_DAEMON if test "$GIT_TEST_GIT_DAEMON" = false then skip_all="git-daemon testing disabled (unset GIT_TEST_GIT_DAEMON to enable)" diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index 583fb14..f9c2e22 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -30,7 +30,7 @@ # Copyright (c) 2008 Clemens Buchacher # -GIT_TEST_HTTPD=$(test_tristate "$GIT_TEST_HTTPD") +test_tristate GIT_TEST_HTTPD if test "$GIT_TEST_HTTPD" = false then skip_all="Network testing disabled (unset GIT_TEST_HTTPD to enable)" diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh index 3cc09c0..21c5214 100644 --- a/t/test-lib-functions.sh +++ b/t/test-lib-functions.sh @@ -716,27 +716,36 @@ perl () { command "$PERL_PATH" "$@" } -# Normalize the value given in $1 to one of "true", "false", or "auto". The -# result is written to stdout. E.g.: +# Given a variable $1, normalize the value of it to one of "true", +# "false", or "auto" and store the result to it. # -# GIT_TEST_HTTPD=$(test_tristate "$GIT_TEST_HTTPD") +# test_tristate GIT_TEST_HTTPD # +# A variable set to an empty string is set to 'false'. +# A variable set to 'false' or 'auto' keeps its value. +# Anything else is set to 'true'. +# An unset variable defaults to 'auto'. +# +# The last rule is to allow people to set the variable to an empty +# string and export it to decline testing the particular feature +# for versions both before and after this change. We used to treat +# both unset and empty variable as a signal for "do not test" and +# took any non-empty string as "please test". + test_tristate () { - case "$1" in - ""|auto) - echo auto - ;; - *) - # Rely on git-config to handle all the variants of - # true/1/on/etc that we allow. - if ! git -c magic.hack="$1" config --bool magic.hack 2>/dev/null - then - # If git-config failed, it was some non-bool value like - # YesPlease. Default this to "true" for historical - # compatibility. - echo true - fi - esac + if eval "test x\"\${$1+isset}\" = xisset" + then + # explicitly set + eval " + case \"\$$1\" in + '') $1=false ;; + false | auto) ;; + *) $1=true ;; + esac + " + else + eval "$1=auto" + fi } # Exit the test suite, either by skipping all remaining tests or by -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 12:27 PM, Stefan Zager wrote: > On Wed, Feb 12, 2014 at 12:06 PM, Junio C Hamano wrote: >> Stefan Zager writes: >> >> Calls to write (and preparation of data to be written) will then >> remain single-threaded, but it sounds like that codepath is not the >> bottleneck in your measurement, so > > Yes, I considered that as well. At a minimum, that would still > require attr.c to implement thread locking, since attribute files must > be parsed to look for stream filters. I have already done that work. I would have imagined that use of the attribute system belongs to "write and preparation of data to be written" category, i.e. the single threaded part of the kludge I outlined. > But I'm not sure it's the best long-term approach to add convoluted > custom threading solutions to each git operation as it appears on the > performance radar. Yeah, it depends on how clean and non-intrusive an abstraction we can make. The kludge I outlined is certainly not very pretty. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 08:15:24PM +0100, David Kastrup wrote: > Stefan Zager writes: > > > On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup wrote: > > > >> Really, give the above patch a try. I am taking longer to finish it > >> than anticipated (with a lot due to procrastination but that is, > >> unfortunately, a large part of my workflow), and it's cutting into my > >> "paychecks" (voluntary donations which to a good degree depend on timely > >> and nontrivial progress reports for my freely available work on GNU > >> LilyPond). > > > > I will give that a try. How much of a performance improvement have > > you clocked? > > Depends on file type and size. With large files with lots of small > changes, performance improvements get more impressive. > > Some ugly real-world examples are the Emacs repository, src/xdisp.c > (performance improvement about a factor of 3), a large file in the style > of /usr/share/dict/words clocking in at a factor of about 5. > > Again, that's with an SSD and ext4 filesystem on GNU/Linux, and there > are no improvements in system time (I/O) except for patch 4 of the > series which helps perhaps 20% or so. > > So the benefits of the patch will come into play mostly for big, bad > files on Windows: other than that, the I/O time is likely to be the > dominant player anyway. How much fragmentation does that add to the files, though? Mike -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On 12 February 2014 20:59, Jeff King wrote: > +sub decode { > + my $orig = shift; > + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; > + return defined $decoded ? I'd still advocate checking $@ here, rather than the defined $decoded check. > + ($decoded, sub { encode_utf8(shift) }) : > + ($orig, sub { shift }); > +} > + -- Thomas Adam -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote: > On 12 February 2014 20:59, Jeff King wrote: > > +sub decode { > > + my $orig = shift; > > + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; > > + return defined $decoded ? > > I'd still advocate checking $@ here, rather than the defined $decoded check. I don't mind changing it, but for my edification, what is the advantage? -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 12:00:19PM +0100, Karsten Blees wrote: > Am 12.02.2014 04:43, schrieb Duy Nguyen: > > On Wed, Feb 12, 2014 at 9:02 AM, Robin H. Johnson > > wrote: > >> On Tue, Feb 11, 2014 at 05:54:51PM -0800, Stefan Zager wrote: > >>> We in the chromium project have a keen interest in adding threading to > >>> git in the pursuit of performance for lengthy operations (checkout, > >>> status, blame, ...). Our motivation comes from hitting some > >>> performance walls when working with repositories the size of chromium > >>> and blink: > >> +1 from Gentoo on performance improvements for large repos. > >> > >> The main repository in the ongoing Git migration project looks to be in > >> the 1.5GB range (and for those that want to propose splitting it up, we > >> have explored that option and found it lacking), with very deep history > >> (but no branches of note, and very few tags). > > > > From v1.9 shallow clone should work for all push/pull/clone... so > > history depth does not matter (on the client side). As for > > gentoo-x86's large worktree, using index v4 and avoid full-tree > > operations (e.g. "status .", not "status"..) should make all > > operations reasonably fast. I plan to make "status" fast even without > > path limiting with the help of inotify, but that's not going to be > > finished soon. Did I miss anything else? > > > > Regarding git-status on msysgit, enable core.preloadindex and core.fscache > (as of 1.8.5.2). > > There's no inotify on Windows, and I gave up using ReadDirectoryChangesW to > keep fscache up to date, as it _may_ report DOS file names (e.g. C:\PROGRA~1 > instead of C:\Program Files). You can use GetLongPathNameW to get the latter from the former. Mike -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
Am 13.02.2014 00:03, schrieb Mike Hommey: > On Wed, Feb 12, 2014 at 12:00:19PM +0100, Karsten Blees wrote: >> Am 12.02.2014 04:43, schrieb Duy Nguyen: >>> On Wed, Feb 12, 2014 at 9:02 AM, Robin H. Johnson >>> wrote: On Tue, Feb 11, 2014 at 05:54:51PM -0800, Stefan Zager wrote: > We in the chromium project have a keen interest in adding threading to > git in the pursuit of performance for lengthy operations (checkout, > status, blame, ...). Our motivation comes from hitting some > performance walls when working with repositories the size of chromium > and blink: +1 from Gentoo on performance improvements for large repos. The main repository in the ongoing Git migration project looks to be in the 1.5GB range (and for those that want to propose splitting it up, we have explored that option and found it lacking), with very deep history (but no branches of note, and very few tags). >>> >>> From v1.9 shallow clone should work for all push/pull/clone... so >>> history depth does not matter (on the client side). As for >>> gentoo-x86's large worktree, using index v4 and avoid full-tree >>> operations (e.g. "status .", not "status"..) should make all >>> operations reasonably fast. I plan to make "status" fast even without >>> path limiting with the help of inotify, but that's not going to be >>> finished soon. Did I miss anything else? >>> >> >> Regarding git-status on msysgit, enable core.preloadindex and core.fscache >> (as of 1.8.5.2). >> >> There's no inotify on Windows, and I gave up using ReadDirectoryChangesW to >> keep fscache up to date, as it _may_ report DOS file names (e.g. C:\PROGRA~1 >> instead of C:\Program Files). > > You can use GetLongPathNameW to get the latter from the former. > > Mike > Except if its a delete or rename notification...my final ReadDirectoryChangesW version cached the files by their long _and_ short names, but was so complex that it slowed most commands down rather than speeding them up :-) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote: > On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote: > > > On 12 February 2014 20:59, Jeff King wrote: > > > +sub decode { > > > + my $orig = shift; > > > + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; > > > + return defined $decoded ? > > > > I'd still advocate checking $@ here, rather than the defined $decoded check. > > I don't mind changing it, but for my edification, what is the advantage? The documentation for decode_utf8 isn't clear, but I don't know if it can ever return undef. What, for example, does it return if $orig is not defined? That's the benefit: it's immediately clear to the user that you're interested in whether it threw an exception, rather than whether it produced a given value. That said, $DAYJOB is a Perl shop, and I would certainly not reject this code in review, and depending on the situation, I might even write something like this. I personally think it's fine. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 signature.asc Description: Digital signature
Re: [PATCH] tests: turn on network daemon tests by default
On Thu, Feb 13, 2014 at 5:12 AM, Jeff King wrote: > lib-httpd was never designed to be included from anywhere except the > beginning of the file. But that wouldn't be right for t5537, because it > wants to run some of the tests, even if apache setup fails. The right > way to do it is probably to have lib-httpd do all of its work in a lazy > prereq. I don't know how clunky that will end up, though; it might be > simpler to just move the shallow http test into one of the http-fetch > scripts. I'll move it out later. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Thu, Feb 13, 2014 at 01:17:54AM +, brian m. carlson wrote: > On Wed, Feb 12, 2014 at 06:27:40PM -0500, Jeff King wrote: > > On Wed, Feb 12, 2014 at 11:10:49PM +, Thomas Adam wrote: > > > > > On 12 February 2014 20:59, Jeff King wrote: > > > > +sub decode { > > > > + my $orig = shift; > > > > + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; > > > > + return defined $decoded ? > > > > > > I'd still advocate checking $@ here, rather than the defined $decoded > > > check. > > > > I don't mind changing it, but for my edification, what is the advantage? > > The documentation for decode_utf8 isn't clear, but I don't know if it > can ever return undef. What, for example, does it return if $orig is > not defined? That's the benefit: it's immediately clear to the user > that you're interested in whether it threw an exception, rather than > whether it produced a given value. I'd argue that I am more interested in whether it returned a value. Let us imagine for a moment that decode_utf8 could return undef without throwing an exception. What should the function return in such a case? I think the only sensible thing is the original (and to indicate that the result was not converted). -Peff -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Tue, Feb 11, 2014 at 05:54:51PM -0800, Stefan Zager wrote: > To this end, I'd like to start submitting patches that make the code > base generally more thread-safe and thread-friendly. Right after this > email, I'm going to send the first such patch, which makes the global > list of pack files (packed_git) internal to sha1_file.c. I'm definitely interested in this if it also works on POSIX systems. At work, we have a 7.6 GiB repo (packed)[0], so while performance is not bad, I certainly wouldn't object if it were better. [0] Using du -sh. For comparison, the Linux kernel repo is 1.4 GiB. -- brian m. carlson / brian with sandals: Houston, Texas, US +1 832 623 2791 | http://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: RSA v4 4096b: 88AC E9B2 9196 305B A994 7552 F1BA 225C 0223 B187 signature.asc Description: Digital signature
Re: feature request: text=input option in .gitattributes
Thank you Torsten. Could some one help me clarify what this means? https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html "This ensures that all files that git considers to be text will have normalized (LF) line endings in the repository. The core.eol configuration variable controls which line endings git will use for normalized files in your working directory; the default is to use the native line ending for your platform, or CRLF if core.autocrlf is set." Does this mean? 1) the files in your working directory should only be set to CRLF if your native line endings are core.eol=crlf and core.autocrlf=true? 2) the files in your working directory should be set to CRLF by default if your native line endings are core.eol=crlf and not core.autocrlf=input I was hoping it would be #1, but it does not appear to be working that way. The F# Compiler Service that I want to source index has text=auto. It can be used to test. git clone https://github.com/fsharp/FSharp.Compiler.Service fcs git config --global core.eol crlf git clone https://github.com/fsharp/FSharp.Compiler.Service fcsCoreEolCrlf Please help clarify how git should be working, so that I can either log a bug or a feature request. cheers, Cameron On Wed, Feb 12, 2014 at 9:56 AM, Torsten Bögershausen wrote: > On 2014-02-11 15.57, Cameron Taggart wrote: >> After requesting this as >> https://github.com/msysgit/msysgit/issues/164, I was told to take it >> upstream, so here I am. >> >> I would like a text=input feature added that has the same behavior as >> text=auto, except that it defaults to core.autocrlf=input behavior >> instead of core.autocrlf=true. > If you want to normailze your repo, and keep it normalized, > I can recommend to use .gitattributes. > > Please see the excellent page here: > https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html > And especially this part: > > $ echo "* text=auto" >>.gitattributes > $ rm .git/index # Remove the index to force git to > $ git reset # re-scan the working directory > $ git status# Show files that will be normalized > $ git add -u > $ git add .gitattributes > $ git commit -m "Introduce end-of-line normalization" > > > > /Torsten > -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-note -C changes commit type?
Dear Johan, Am Mittwoch, den 12.02.2014, 00:52 +0100 schrieb Johan Herland: > On Tue, Feb 11, 2014 at 6:23 PM, Joachim Breitner > wrote: > > judging from the documentation I got the impression that I can pass any > > git object has to "git note -C " and it would stored as-is. But it > > seems the objects gets mangled somehow... > > ...well... the documentation does not say "any object", it actually > explicitly says "blob object"... ;) ok, my bad; guess I’m not fully versed with gits terminology. > You would have a notes ref "refs/notes/history" whose tree would > contain an entry named e1bfac434ebd3135a3784f6fc802f235098eebd0 > pointing to a _commit_ (3d7de37...). Obviously, it would not make > sense to use refs/notes/history while displaying the commit log ("git > log --notes=history"), as the raw commit object would be shown in the > log. However, more fundamentally: a tree referring to a _commit_ is > usually how Git stores _submodule_ links (i.e. which revision of the > named submodule is to be used by this super-repo tree), and I'm (off > the top of my head) not at all sure that such a submodule link in a > notes tree is handled "sanely" by Git - or even that it makes sense at > all. For one, I'm not sure that Git requires (or even expects) the > commit object referenced by a tree to be present in the same object > DB. So if you share your notes, I don't know whether or not the > fetch/push machinery will include the commit object in the shared > notes... These are questions that should be answered before we decide > whether using commits directly as notes makes sense. If that is the case, then my approach is indeed flawed. The main point of the exercise is to have a tree that follows another commit (or, as a next-best approximation, a note attached to that commit) around. > In that case, you might be better off using an explicit > ref to keep that history alive; e.g. you could create > refs/history/e1bfac4 to point to 3d7de37 ("git update-ref > refs/history/e1bfac4 3d7de37"), and keep everything > alive/reachable/shareable that way... That’s an interesting idea; instead of relying on the notes feature putting the hash in the ref name. But I wonder how that scales – imagine every second feature merged into Linux¹ also having such a history ref? I guess having a way for a tree to reference commits in a way that is followed by git gc, i.e. separate from submodules, would allow a less noisy implementation, and possibly create the opportunity for many other strange uses of git :-) Greetings, Joachim ¹ I’m not proposing for anyone else but me to use this, at the moment, don’t worry :-). But I am considering to use it in the context of GHC, which isn’t a small project either. -- Joachim “nomeata” Breitner m...@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nome...@joachim-breitner.de • GPG-Key: 0x4743206C Debian Developer: nome...@debian.org signature.asc Description: This is a digitally signed message part
Re: [PATCH 00/11] More preparatory work for multiparent tree-walker
On Tue, Feb 11, 2014 at 11:59:02AM -0800, Junio C Hamano wrote: > Kirill Smelkov writes: > > > Sorry for the confusion. Could you please do the following: > > > > Patches should be applied over to ks/tree-diff-walk > > (74aa4a18). Before applying the patches, please cherry-pick > > > > c90483d9(tree-diff: no need to manually verify that there is no > > mode change for a path) > > > > ef4f0928(tree-diff: no need to pass match to > > skip_uninteresting()) > > > > into that branch, and drop them from ks/combine-diff - we'll have one > > branch reworking the diff tree-walker, and the other taking advantage of > > it for combine-diff. > > As long as that does not lose changes to tests and clean-ups, I'm > fine with that direction. For example, I do not know if you want to > lose e3f62d12 (diffcore-order: export generic ordering interface, > 2014-01-20), which is part of the combine-diff topic. Sorry for the confusion again, and please don't worry: we are not going to lose anything - my only plea here was to transfer two of the patches to more appropriate topic. That couple touches tree-diff.c - they were some initial cleanups I've noticed while working on separate combine-diff tree-walker, which we decided to drop instead of generalizing diff tree-walker to handle all cases. Only the cleanups are still relevant and needed as a base for what I've sent you here. And as to e3f62d12 (diffcore-order: export generic ordering interface, 2014-01-20) and other patches on ks/diff-c-with-diff-order topic - they stay as they are - we do not need to rework them as ks/combine-diff builds on top of the topic and generalizing diff tree-walker is orthogonal work. So in short: could you please transform the two "tree-diff" patches from ks/combine-diff to ks/tree-diff-walk, then apply sent-here patches to ks/tree-diff-walk; thats all. Thanks, Kirill -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-note -C changes commit type?
On Wed, Feb 12, 2014 at 1:06 AM, Junio C Hamano wrote: > Johan Herland writes: >> There is currently no way the "git notes" commands will allow you to >> store the 3d7de37 commit object directly as a note. There is also >> (AFAICS) no easy workaround (git fast-import could've been a >> workaround if it did not already require the first N/notemodify >> argument to be a blob object). The best alternative, off the top of my >> head, would be to write your own program using the notes.h API to >> manipulate the notes tree directly (or - suboptimally - use other >> low-level Git operations to do the same). > > Even worse. I do not think such a non-blob object in the notes tree > does not participate in the reachability at all, so you won't be > able to fetch "refs/notes/whatever" and expect to get a useful > result. s/non-blob/non-(blob-or-tree)/ Any object type that is deemed reachable by reference from a regular git tree object will also be usable (as far as reachability goes) in a notes tree. > I do not think storing the raw bits of commit object as a > blob in the notes tree is useful behaviour, either. The command > probably should refuse to get anything non-blob via that option. Patch coming up... > Perhaps the notes entry should just note the object name of whatever > commit it wants to refer to in a *blob*? Agreed. ...Johan -- Johan Herland, www.herland.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] notes: Disallow reusing non-blob as a note object
Currently "git notes add -C $object" will read the raw bytes from $object, and then copy those bytes into the note object, which is hardcoded to be of type blob. This means that if the given $object is a non-blob (e.g. tree or commit), the raw bytes from that object is copied into a blob object. This is probably not useful, and certainly not what any sane user would expect. So disallow it, by erroring out if the $object passed to the -C option is not a blob. The fix also applies to the -c option (in which the user is prompted to edit/verify the note contents in a text editor), and also when -c/-C is passed to "git notes append" (which appends the $object contents to an existing note object). In both cases, passing a non-blob $object does not make sense. Also add a couple of tests demonstrating expected behavior. Suggested-by: Junio C Hamano Signed-off-by: Johan Herland --- builtin/notes.c | 6 +- t/t3301-notes.sh | 27 +++ 2 files changed, 32 insertions(+), 1 deletion(-) diff --git a/builtin/notes.c b/builtin/notes.c index 2b24d05..bb89930 100644 --- a/builtin/notes.c +++ b/builtin/notes.c @@ -269,7 +269,11 @@ static int parse_reuse_arg(const struct option *opt, const char *arg, int unset) die(_("Failed to resolve '%s' as a valid ref."), arg); if (!(buf = read_sha1_file(object, &type, &len)) || !len) { free(buf); - die(_("Failed to read object '%s'."), arg);; + die(_("Failed to read object '%s'."), arg); + } + if (type != OBJ_BLOB) { + free(buf); + die(_("Cannot read note data from non-blob object '%s'."), arg); } strbuf_add(&(msg->buf), buf, len); free(buf); diff --git a/t/t3301-notes.sh b/t/t3301-notes.sh index 16de05a..3bb79a4 100755 --- a/t/t3301-notes.sh +++ b/t/t3301-notes.sh @@ -812,6 +812,33 @@ test_expect_success 'create note from non-existing note with "git notes add -C" test_must_fail git notes list HEAD ' +test_expect_success 'create note from non-blob with "git notes add -C" fails' ' + commit=$(git rev-parse --verify HEAD) && + tree=$(git rev-parse --verify HEAD:) && + test_must_fail git notes add -C $commit && + test_must_fail git notes add -C $tree && + test_must_fail git notes list HEAD +' + +cat > expect << EOF +commit 80d796defacd5db327b7a4e50099663902fbdc5c +Author: A U Thor +Date: Thu Apr 7 15:20:13 2005 -0700 + +8th + +Notes (other): +This is a blob object +EOF + +test_expect_success 'create note from blob with "git notes add -C" reuses blob id' ' + blob=$(echo "This is a blob object" | git hash-object -w --stdin) && + git notes add -C $blob && + git log -1 > actual && + test_cmp expect actual && + test "$(git notes list HEAD)" = "$blob" +' + cat > expect << EOF commit 016e982bad97eacdbda0fcbd7ce5b0ba87c81f1b Author: A U Thor -- 1.8.4.653.g2df02b3 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-note -C changes commit type?
On Wed, Feb 12, 2014 at 9:53 AM, Joachim Breitner wrote: > Am Mittwoch, den 12.02.2014, 00:52 +0100 schrieb Johan Herland: >> You would have a notes ref "refs/notes/history" whose tree would >> contain an entry named e1bfac434ebd3135a3784f6fc802f235098eebd0 >> pointing to a _commit_ (3d7de37...). Obviously, it would not make >> sense to use refs/notes/history while displaying the commit log ("git >> log --notes=history"), as the raw commit object would be shown in the >> log. However, more fundamentally: a tree referring to a _commit_ is >> usually how Git stores _submodule_ links (i.e. which revision of the >> named submodule is to be used by this super-repo tree), and I'm (off >> the top of my head) not at all sure that such a submodule link in a >> notes tree is handled "sanely" by Git - or even that it makes sense at >> all. For one, I'm not sure that Git requires (or even expects) the >> commit object referenced by a tree to be present in the same object >> DB. So if you share your notes, I don't know whether or not the >> fetch/push machinery will include the commit object in the shared >> notes... These are questions that should be answered before we decide >> whether using commits directly as notes makes sense. > > If that is the case, then my approach is indeed flawed. The main point > of the exercise is to have a tree that follows another commit (or, as a > next-best approximation, a note attached to that commit) around. > >> In that case, you might be better off using an explicit >> ref to keep that history alive; e.g. you could create >> refs/history/e1bfac4 to point to 3d7de37 ("git update-ref >> refs/history/e1bfac4 3d7de37"), and keep everything >> alive/reachable/shareable that way... > > That’s an interesting idea; instead of relying on the notes feature > putting the hash in the ref name. But I wonder how that scales – imagine > every second feature merged into Linux¹ also having such a history ref? Ah, that will probably not scale very well. > I guess having a way for a tree to reference commits in a way that is > followed by git gc, i.e. separate from submodules, would allow a less > noisy implementation, and possibly create the opportunity for many other > strange uses of git :-) Here's another way to solve your problem, which should be fairly transparent and performant: Whenever you want to reference "history" of a commit (I'm using quotes here, because we're not talking about the "regular" git sense of history, i.e. the commit graph), you perform the following two steps: 1. Append the "historical" commit SHA-1 (3d7de37 in your example) to a note on the "current" commit (e1bfac4). E.g.: git notes --ref history append -m 3d7de37... e1bfac4... 2. Perform some automated merge into a "history"-tracking ref (e.g. refs/history), to keep the "historical" commits reachable. (You can easily wrap both steps into a script to automate things.) Step #1 encodes the "history" of a commit in a note, but does not keep the "history" reachable. Step #2 keeps all "historical" commits reachable by making them part of the history (in the git sense - without quotes) of a proper ref (refs/history). The actual result/outcome of the merge is not interesting. It only exists to insert the "historical" commit (3d7de37) into the ancestry for refs/history. Since the actual merge itself is uninteresting, you should probably use a merge strategy that never yields conflicts, e.g. "-s ours" You can now share the "history" by pushing/fetching the two refs refs/notes/history and refs/history. (In theory, you might even be able to combine the two refs, by performing the merge directly into refs/notes/history, always taking care to retain the notes tree contents as the result of the merge. In other words, after you do step #1 (append the note), you manually rewrite the just-created tip of refs/notes/history to include 3d7de37 as a second parent. This keeps 3d7de37 reachable (and it will be shared when you share refs/notes/history), and it should not interfere with the notes infrastructure, as they only look at the current state of the notes tree.) Hope this helps, ...Johan -- Johan Herland, www.herland.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-note -C changes commit type?
Dear Johan, thanks for the patch! Am Mittwoch, den 12.02.2014, 11:26 +0100 schrieb Johan Herland: > Here's another way to solve your problem, which should be fairly > transparent and performant: > > Whenever you want to reference "history" of a commit (I'm using quotes > here, because we're not talking about the "regular" git sense of > history, i.e. the commit graph), you perform the following two steps: > > 1. Append the "historical" commit SHA-1 (3d7de37 in your example) to a > note on the "current" commit (e1bfac4). E.g.: > > git notes --ref history append -m 3d7de37... e1bfac4... > > 2. Perform some automated merge into a "history"-tracking ref (e.g. > refs/history), to keep the "historical" commits reachable. > > (You can easily wrap both steps into a script to automate things.) > > Step #1 encodes the "history" of a commit in a note, but does not keep > the "history" reachable. > > Step #2 keeps all "historical" commits reachable by making them part > of the history (in the git sense - without quotes) of a proper ref > (refs/history). The actual result/outcome of the merge is not > interesting. It only exists to insert the "historical" commit > (3d7de37) into the ancestry for refs/history. Since the actual merge > itself is uninteresting, you should probably use a merge strategy that > never yields conflicts, e.g. "-s ours" > > You can now share the "history" by pushing/fetching the two refs > refs/notes/history and refs/history. > > (In theory, you might even be able to combine the two refs, by > performing the merge directly into refs/notes/history, always taking > care to retain the notes tree contents as the result of the merge. In > other words, after you do step #1 (append the note), you manually > rewrite the just-created tip of refs/notes/history to include 3d7de37 > as a second parent. This keeps 3d7de37 reachable (and it will be > shared when you share refs/notes/history), and it should not interfere > with the notes infrastructure, as they only look at the current state > of the notes tree.) That is quite a good approximation. What it doesn’t do is dropping history (in the refs/history sense) of commits that disappear, but the same problem exists with notes. Thanks! I guess there are no plans to make the commit object format itself extensible, are they? Extensible in the sense that I can add a custom field to it (e.g. history:). Git would not have to know anything about the field besides its type, i.e. that it contains refs that it has to follow. Very much like "parent:", just without the semantics of it wrt. "git log" and the like. Greetings, Joachim -- Joachim “nomeata” Breitner m...@joachim-breitner.de • http://www.joachim-breitner.de/ Jabber: nome...@joachim-breitner.de • GPG-Key: 0x4743206C Debian Developer: nome...@debian.org signature.asc Description: This is a digitally signed message part
Re: Make the git codebase thread-safe
Am 12.02.2014 04:43, schrieb Duy Nguyen: > On Wed, Feb 12, 2014 at 9:02 AM, Robin H. Johnson wrote: >> On Tue, Feb 11, 2014 at 05:54:51PM -0800, Stefan Zager wrote: >>> We in the chromium project have a keen interest in adding threading to >>> git in the pursuit of performance for lengthy operations (checkout, >>> status, blame, ...). Our motivation comes from hitting some >>> performance walls when working with repositories the size of chromium >>> and blink: >> +1 from Gentoo on performance improvements for large repos. >> >> The main repository in the ongoing Git migration project looks to be in >> the 1.5GB range (and for those that want to propose splitting it up, we >> have explored that option and found it lacking), with very deep history >> (but no branches of note, and very few tags). > > From v1.9 shallow clone should work for all push/pull/clone... so > history depth does not matter (on the client side). As for > gentoo-x86's large worktree, using index v4 and avoid full-tree > operations (e.g. "status .", not "status"..) should make all > operations reasonably fast. I plan to make "status" fast even without > path limiting with the help of inotify, but that's not going to be > finished soon. Did I miss anything else? > Regarding git-status on msysgit, enable core.preloadindex and core.fscache (as of 1.8.5.2). There's no inotify on Windows, and I gave up using ReadDirectoryChangesW to keep fscache up to date, as it _may_ report DOS file names (e.g. C:\PROGRA~1 instead of C:\Program Files). -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
On Wed, Feb 12, 2014 at 8:27 AM, Johannes Sixt wrote: > Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with > the following symptoms. I haven't followed the topic. Have there been > patches floating that addressed the problem in one way or another? > > (gdb) run > Starting program: D:\Src\mingw-git\t\trash > directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD > [New thread 3528.0x8d4] > Bitmap v1 test (20 entries loaded) > Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 15873b36 > checksum > > Breakpoint 1, die (err=0x5939e9 "Out of memory, realloc failed") at usage.c:97 > 97 if (die_is_recursing()) { > (gdb) bt > #0 die (err=0x5939e9 "Out of memory, realloc failed") at usage.c:97 > #1 0x00487c4c in xrealloc (ptr=0x12b10008, size=837107040) at wrapper.c:109 ~800 megs is a pretty large allocation for 32-bit systems. What gives? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Failed to find remote helpers after install from source
Hi, I installed git 1.8.5.4 from source under $HOME/bin. My system is SUSE Linux Enterprise Server 11 SP1 (x86_64). I followed instructions and just run make && make install. After installation I cannot clone repos from https urls, getting the following error: $ git clone https://github.com/apbarrero/pyrad.git Cloning into 'pyrad'... fatal: Unable to find remote helper for 'https' I checked that git is properly compiled against libcurl-devel and libexpat-devel packages. Then I tryed using git-remote-https installed under $HOME/bin/libexec/git-core/ and it worked fine to connect to the same remote: $ $HOME/libexec/git-core/git-remote-https https://github.com/apbarrero/pyrad.git capabilities fetch option push check-connectivity list @refs/heads/master HEAD 9599cf833354793b81d2a47504826332473bcb12 refs/heads/master 1f8f2b995bb5ab55e6c6f1051ccb44875ab1e60d refs/tags/0.6 68552227901d377b513884c70d9582da0329a270 refs/tags/0.6^{} e0cd958edc5b3aad7e31435990674e2cff4e3b7e refs/tags/0.7 c50213b2d4213f3574c1a6b454e6887a529de340 refs/tags/0.7^{} 6085deb4ee37862d65f4a26f472fa2d1894a4331 refs/tags/0.8 33902c5b3da1272a4f5930815f561b8068315ba3 refs/tags/0.8^{} 5a45639faaf1cbf7622fe47e2795d6f5a0ee6658 refs/tags/0.9 edd69b9014d7e5bbf9da203d7db9a26587756aa4 refs/tags/0.9^{} 6e3b16ed19b329be944bd1b10aa17d02eb473009 refs/tags/1.0 30beedc5c4e56a15f4025d25331515aa2a917234 refs/tags/1.0^{} dacf4bd37aaddd3872faeb8a77c801fe3c8550cb refs/tags/1.1 4c3e2d6700947ca6ea7b3319ff52abb7029bf3be refs/tags/1.1^{} 9a7f5a4e9fe19ef9f45db4e28a7d4648a011cc9b refs/tags/1.2 ee7ec8f2b37da5e84bf0fbb83e214a8bd3cfdf70 refs/tags/1.2^{} e16af24d814e8d8c83b172ca6103fd3ab93b08db refs/tags/2.0 ce4a625caf5c6d892e020ec150373043a203366e refs/tags/2.0^{} I looked at this post in stackoverflow and use strace to find out git clone is trying to look for remote helpers under $HOME/bin/libexec/git-core, but make install left the libexec/git-core directory under $HOME, instead of$HOME/bin. ... [pid 675896] execve("$HOME/bin/libexec/git-core/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("$HOME/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/local/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/bin/X11/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/X11R6/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/games/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/lib64/jvm/jre/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/lib/mit/bin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) [pid 675896] execve("/usr/lib/mit/sbin/git-remote-https", ["git-remote-https", "origin", "https://github.com/apbarrero/pyr";...], [/* 88 vars */]) = -1 ENOENT (No such file or directory) ... Creating a soft link under $HOME/bin/ to ../libexec got it working. So it look like it is rather an installation issue or a bug in the directory list where git looks for binaries under libexec or I missed some parameter to pass to make. Regards, -- Antonio Pérez Barrero apbarr...@gmail.com -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kern
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager wrote: > We in the chromium project have a keen interest in adding threading to > git in the pursuit of performance for lengthy operations (checkout, > status, blame, ...). Our motivation comes from hitting some > performance walls when working with repositories the size of chromium > and blink: > > https://chromium.googlesource.com/chromium/src > https://chromium.googlesource.com/chromium/blink > > We are particularly concerned with the performance of msysgit, and we > have already chalked up a significant performance gain by turning on > the threading code in pack-objects (which was already enabled for > posix platforms, but not on msysgit, owing to the lack of a correct > pread implementation). How did you manage to do this? I'm not aware of any way to implement pread on Windows (without going down the insanity-path of wrapping and potentially locking inside every IO operation)... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
Am 2/12/2014 12:56, schrieb Erik Faye-Lund: > On Wed, Feb 12, 2014 at 8:27 AM, Johannes Sixt wrote: >> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with >> the following symptoms. I haven't followed the topic. Have there been >> patches floating that addressed the problem in one way or another? >> >> (gdb) run >> Starting program: D:\Src\mingw-git\t\trash >> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD >> [New thread 3528.0x8d4] >> Bitmap v1 test (20 entries loaded) >> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / >> 15873b36 checksum >> >> Breakpoint 1, die (err=0x5939e9 "Out of memory, realloc failed") at >> usage.c:97 >> 97 if (die_is_recursing()) { >> (gdb) bt >> #0 die (err=0x5939e9 "Out of memory, realloc failed") at usage.c:97 >> #1 0x00487c4c in xrealloc (ptr=0x12b10008, size=837107040) at wrapper.c:109 > > ~800 megs is a pretty large allocation for 32-bit systems. What gives? That's exactly the problem: why would a tiny repository from the test suite require such a large allocation? (Not to mention that the allocation ultimately fails in my case.) -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
Johannes Sixt writes: > Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with > the following symptoms. I haven't followed the topic. Have there been > patches floating that addressed the problem in one way or another? > > (gdb) run > Starting program: D:\Src\mingw-git\t\trash > directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD > [New thread 3528.0x8d4] > Bitmap v1 test (20 entries loaded) > Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 15873b36 > checksum Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help? -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
Am 2/12/2014 13:55, schrieb David Kastrup: > Johannes Sixt writes: > >> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with >> the following symptoms. I haven't followed the topic. Have there been >> patches floating that addressed the problem in one way or another? >> >> (gdb) run >> Starting program: D:\Src\mingw-git\t\trash >> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD >> [New thread 3528.0x8d4] >> Bitmap v1 test (20 entries loaded) >> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / >> 15873b36 checksum > > Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help? YES! t5310 passes after reverting this commit. -- Hannes -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
Johannes Sixt writes: > Am 2/12/2014 13:55, schrieb David Kastrup: >> Johannes Sixt writes: >> >>> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with >>> the following symptoms. I haven't followed the topic. Have there been >>> patches floating that addressed the problem in one way or another? >>> >>> (gdb) run >>> Starting program: D:\Src\mingw-git\t\trash >>> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap >>> HEAD >>> [New thread 3528.0x8d4] >>> Bitmap v1 test (20 entries loaded) >>> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits >>> / 15873b36 checksum >> >> Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help? > > YES! t5310 passes after reverting this commit. Oh. I just looked through the backtrace until finding a routine reasonably related with the regtest and checked for the last commit changing it, then posted my question. Then I looked through the diff of the patch and considered it unconspicuous. So I commenced reading through earlier commits. I actually don't have a good idea what might be wrong here. The code is somewhat distasteful as it basically uses eword_t and uint64_t interchangeably, but then this does match its current definition. -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
On Wed, Feb 12, 2014 at 8:23 PM, David Kastrup wrote: > Johannes Sixt writes: > >> Am 2/12/2014 13:55, schrieb David Kastrup: >>> Johannes Sixt writes: >>> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with the following symptoms. I haven't followed the topic. Have there been patches floating that addressed the problem in one way or another? (gdb) run Starting program: D:\Src\mingw-git\t\trash directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap HEAD [New thread 3528.0x8d4] Bitmap v1 test (20 entries loaded) Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits / 15873b36 checksum >>> >>> Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help? >> >> YES! t5310 passes after reverting this commit. > > Oh. I just looked through the backtrace until finding a routine > reasonably related with the regtest and checked for the last commit > changing it, then posted my question. > > Then I looked through the diff of the patch and considered it > unconspicuous. So I commenced reading through earlier commits. > > I actually don't have a good idea what might be wrong here. The code is > somewhat distasteful as it basically uses eword_t and uint64_t > interchangeably, but then this does match its current definition. Perhaps __BYTE_ORDER or __BIG_ENDIAN is misdefined and the ntohll() is skipped? -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
On Wed, Feb 12, 2014 at 3:09 PM, Duy Nguyen wrote: > On Wed, Feb 12, 2014 at 8:23 PM, David Kastrup wrote: >> Johannes Sixt writes: >> >>> Am 2/12/2014 13:55, schrieb David Kastrup: Johannes Sixt writes: > Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with > the following symptoms. I haven't followed the topic. Have there been > patches floating that addressed the problem in one way or another? > > (gdb) run > Starting program: D:\Src\mingw-git\t\trash > directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap > HEAD > [New thread 3528.0x8d4] > Bitmap v1 test (20 entries loaded) > Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits > / 15873b36 checksum Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help? >>> >>> YES! t5310 passes after reverting this commit. >> >> Oh. I just looked through the backtrace until finding a routine >> reasonably related with the regtest and checked for the last commit >> changing it, then posted my question. >> >> Then I looked through the diff of the patch and considered it >> unconspicuous. So I commenced reading through earlier commits. >> >> I actually don't have a good idea what might be wrong here. The code is >> somewhat distasteful as it basically uses eword_t and uint64_t >> interchangeably, but then this does match its current definition. > > Perhaps __BYTE_ORDER or __BIG_ENDIAN is misdefined and the ntohll() is > skipped? That is indeed the case. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] blame.c: prepare_lines should not call xrealloc for every line
Making a single preparation run for counting the lines will avoid memory fragmentation. Also, fix the allocated memory size which was wrong when sizeof(int *) != sizeof(int), and would have been too small for sizeof(int *) < sizeof(int), admittedly unlikely. Signed-off-by: David Kastrup --- Since there was no feedback after the last defense/explanation of the coding choices, the code rewritten by this patch was much more awful, and the kind of style requests (fixed already in the last iteration) are not actually heeded by the core developers themselves, I have no idea whether this patch will be dropped just like the last one. As opposed to the last try, this incorporates a suggestion from Jeff to change sizeof(type) to sizeof(expression) which is not helping much since the types of lineno and sb->lineno still need to be changed in sync. It also fiddles cosmetically with the code layout of the loops. builtin/blame.c | 46 +++--- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index e44a6bb..1aefedf 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -1772,25 +1772,41 @@ static int prepare_lines(struct scoreboard *sb) { const char *buf = sb->final_buf; unsigned long len = sb->final_buf_size; - int num = 0, incomplete = 0, bol = 1; + const char *end = buf + len; + const char *p; + int *lineno; + int num = 0, incomplete = 0; - if (len && buf[len-1] != '\n') - incomplete++; /* incomplete line at the end */ - while (len--) { - if (bol) { - sb->lineno = xrealloc(sb->lineno, - sizeof(int *) * (num + 1)); - sb->lineno[num] = buf - sb->final_buf; - bol = 0; - } - if (*buf++ == '\n') { + for (p = buf;;) { + p = memchr(p, '\n', end - p); + if (p) { + p++; num++; - bol = 1; + continue; } + break; } - sb->lineno = xrealloc(sb->lineno, - sizeof(int *) * (num + incomplete + 1)); - sb->lineno[num + incomplete] = buf - sb->final_buf; + + if (len && end[-1] != '\n') + incomplete++; /* incomplete line at the end */ + + sb->lineno = xmalloc(sizeof(*sb->lineno) * (num + incomplete + 1)); + lineno = sb->lineno; + + *lineno++ = 0; + for (p = buf;;) { + p = memchr(p, '\n', end - p); + if (p) { + p++; + *lineno++ = p - buf; + continue; + } + break; + } + + if (incomplete) + *lineno++ = len; + sb->num_lines = num + incomplete; return sb->num_lines; } -- 1.8.3.2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: pack bitmap woes on Windows
On Wed, Feb 12, 2014 at 3:22 PM, Erik Faye-Lund wrote: > On Wed, Feb 12, 2014 at 3:09 PM, Duy Nguyen wrote: >> On Wed, Feb 12, 2014 at 8:23 PM, David Kastrup wrote: >>> Johannes Sixt writes: >>> Am 2/12/2014 13:55, schrieb David Kastrup: > Johannes Sixt writes: > >> Running test suite of 'next' on Windows fails in t5310-pack-bitmaps with >> the following symptoms. I haven't followed the topic. Have there been >> patches floating that addressed the problem in one way or another? >> >> (gdb) run >> Starting program: D:\Src\mingw-git\t\trash >> directory.t5310-pack-bitmaps/..\..\git.exe rev-list --test-bitmap >> HEAD >> [New thread 3528.0x8d4] >> Bitmap v1 test (20 entries loaded) >> Found bitmap for 537ea4d3eb79c95f602873b1167c480006d2ac2d. 64 bits >> / 15873b36 checksum > > Does reverting a201c20b41a2f0725977bcb89a2a66135d776ba2 help? YES! t5310 passes after reverting this commit. >>> >>> Oh. I just looked through the backtrace until finding a routine >>> reasonably related with the regtest and checked for the last commit >>> changing it, then posted my question. >>> >>> Then I looked through the diff of the patch and considered it >>> unconspicuous. So I commenced reading through earlier commits. >>> >>> I actually don't have a good idea what might be wrong here. The code is >>> somewhat distasteful as it basically uses eword_t and uint64_t >>> interchangeably, but then this does match its current definition. >> >> Perhaps __BYTE_ORDER or __BIG_ENDIAN is misdefined and the ntohll() is >> skipped? > > That is indeed the case. Looking a bit at it, the whole byte-order detection mess seems insane to me. MinGW falls inside the "defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))"-block, but does not define __BYTE_ORDER. It seems the author of a201c20b41a2f0725977bcb89a2a66135d776ba2 assumes __BYTE_ORDER was guaranteed to always be defined, probably fooled by the following check: #if !defined(__BYTE_ORDER) # error "Cannot determine endianness" #endif However, that check is only performed if we're NOT building with GCC on x86/x64 nor MSVC (which I don't think defined __BYTE_ORDER either) The following would make __BYTE_ORDER a guarantee, and show that MinGW does not define it: ---8<--- diff --git a/compat/bswap.h b/compat/bswap.h index 120c6c1..c73bf0e 100644 --- a/compat/bswap.h +++ b/compat/bswap.h @@ -109,10 +109,6 @@ static inline uint64_t git_bswap64(uint64_t x) # endif #endif -#if !defined(__BYTE_ORDER) -# error "Cannot determine endianness" -#endif - #if __BYTE_ORDER == __BIG_ENDIAN # define ntohll(n) (n) # define htonll(n) (n) @@ -123,6 +119,10 @@ static inline uint64_t git_bswap64(uint64_t x) #endif +#if !defined(__BYTE_ORDER) +# error "Cannot determine endianness" +#endif + /* * Performance might be improved if the CPU architecture is OK with * unaligned 32-bit loads and a fast ntohl() is available. ---8<--- But another level of insanity (IMO) is defining double-underscore macros. These symbols are reserved for the implementation. Sure, knowing we're on a given implementation and defining something *else* dependent on them is fine. But defining them is just yuckiness, and not very standard-kosher. IMO, we should rather do the definition the other way around: #if !defined(BYTE_ORDER) # if defined(__BYTE_ORDER) && defined(__LITTLE_ENDIAN) && defined(__BIG_ENDIAN) # define BYTE_ORDER __BYTE_ORDER # define LITTLE_ENDIAN __LITTLE_ENDIAN # define BIG_ENDIAN __BIG_ENDIAN # endif #endif ...and only referred to BYTE_ORDER, LITTLE_ENDIAN and BIG_ENDIAN. That way we'd follow closer to where opengroup are heading: http://www.opengroup.org/austin/docs/austin_514.txt -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: feature request: text=input option in .gitattributes
On 2014-02-11 15.57, Cameron Taggart wrote: > After requesting this as > https://github.com/msysgit/msysgit/issues/164, I was told to take it > upstream, so here I am. > > I would like a text=input feature added that has the same behavior as > text=auto, except that it defaults to core.autocrlf=input behavior > instead of core.autocrlf=true. If you want to normailze your repo, and keep it normalized, I can recommend to use .gitattributes. Please see the excellent page here: https://www.kernel.org/pub/software/scm/git/docs/gitattributes.html And especially this part: $ echo "* text=auto" >>.gitattributes $ rm .git/index # Remove the index to force git to $ git reset # re-scan the working directory $ git status# Show files that will be normalized $ git add -u $ git add .gitattributes $ git commit -m "Introduce end-of-line normalization" /Torsten -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Turning a test script into something usable in t/perf
Hi, I find that I have little clue about how to convert the following brief test script into some test to place in t/perf: #!/bin/sh rm -rf /tmp/git-test mkdir /tmp/git-test cd /tmp/git-test git init LIMIT=20 yes a|head -$LIMIT >data yes b|head -$LIMIT >data2 git add data data2 git commit -m "split" git rm data2 yes 'a b' | head -$(($LIMIT*2)) >data git add data git commit -m "combined" time git blame data >/dev/null The variable LIMIT is the deciding factor for determining performance which, with the code in current master, is rather measurably O(LIMIT^2). I think that the current test takes about 15 minutes to complete on my computer. Obviously, that's sort of excessive: there is little point in choosing sizes that show off more than two orders of magnitude in improvement. Now the pathological cases are lots of small but attributable fragments in the blamed source files. One real-world project that is hit rather hard is an alphabetically sorted large list of words that tends to get insertions/deletions of few scattered lines at a time. Should one aim for an actually pathological case like in this script? Should one try benchmarking with one of the stock repositories instead that don't really demonstrate well just how bad the behavior might become and which code passages are dominant regarding the quadratic behavior? -- David Kastrup
Re: pack bitmap woes on Windows
On Wed, Feb 12, 2014 at 03:49:24PM +0100, Erik Faye-Lund wrote: > It seems the author of a201c20b41a2f0725977bcb89a2a66135d776ba2 > assumes __BYTE_ORDER was guaranteed to always be defined, probably > fooled by the following check: > > #if !defined(__BYTE_ORDER) > # error "Cannot determine endianness" > #endif > > However, that check is only performed if we're NOT building with GCC > on x86/x64 nor MSVC (which I don't think defined __BYTE_ORDER either) > > The following would make __BYTE_ORDER a guarantee, and show that MinGW > does not define it: Yes, that is the problem. Sorry, this got brought up earlier[1], but the discussion went on and I did not notice that Brian's patch did not get applied. After re-reading the discussion, I think the patch below is the best solution. Can you confirm that it fixes the problem for you? -Peff [1] http://thread.gmane.org/gmane.comp.version-control.git/241278 -- >8 -- Subject: ewah: unconditionally ntohll ewah data Commit a201c20 tried to optimize out a loop like: for (i = 0; i < len; i++) data[i] = ntohll(data[i]); in the big-endian case, because we know that ntohll is a noop, and we do not need to pay the cost of the loop at all. However, it mistakenly assumed that __BYTE_ORDER was always defined, whereas it may not be on systems which do not define it by default, and where we did not need to define it to set up the ntohll macro. This includes OS X and Windows. We could muck with the ordering in compat/bswap.h to make sure it is defined unconditionally, but it is simpler to still to just execute the loop unconditionally. That avoids the application code knowing anything about these magic macros, and lets it depend only on having ntohll defined. And since the resulting loop looks like (on a big-endian system): for (i = 0; i < len; i++) data[i] = data[i]; any decent compiler can probably optimize it out. Original report and analysis by Brian Gernhardt. Signed-off-by: Jeff King --- ewah/ewah_io.c | 10 +++--- 1 file changed, 3 insertions(+), 7 deletions(-) diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c index 4a7fae6..f7f700e 100644 --- a/ewah/ewah_io.c +++ b/ewah/ewah_io.c @@ -113,6 +113,7 @@ int ewah_serialize(struct ewah_bitmap *self, int fd) int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len) { uint8_t *ptr = map; + size_t i; self->bit_size = get_be32(ptr); ptr += sizeof(uint32_t); @@ -135,13 +136,8 @@ int ewah_read_mmap(struct ewah_bitmap *self, void *map, size_t len) memcpy(self->buffer, ptr, self->buffer_size * sizeof(uint64_t)); ptr += self->buffer_size * sizeof(uint64_t); -#if __BYTE_ORDER != __BIG_ENDIAN - { - size_t i; - for (i = 0; i < self->buffer_size; ++i) - self->buffer[i] = ntohll(self->buffer[i]); - } -#endif + for (i = 0; i < self->buffer_size; ++i) + self->buffer[i] = ntohll(self->buffer[i]); self->rlw = self->buffer + get_be32(ptr); -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/11] More preparatory work for multiparent tree-walker
Junio C Hamano writes: > Kirill Smelkov writes: > >> Sorry for the confusion. Could you please do the following: >> >> Patches should be applied over to ks/tree-diff-walk >> (74aa4a18). Before applying the patches, please cherry-pick >> >> c90483d9(tree-diff: no need to manually verify that there is no >> mode change for a path) >> >> ef4f0928(tree-diff: no need to pass match to >> skip_uninteresting()) >> >> into that branch, and drop them from ks/combine-diff - we'll have one >> branch reworking the diff tree-walker, and the other taking advantage of >> it for combine-diff. > > As long as that does not lose changes to tests and clean-ups, I'm > fine with that direction. For example, I do not know if you want to > lose e3f62d12 (diffcore-order: export generic ordering interface, > 2014-01-20), which is part of the combine-diff topic. Ahh, sorry, I misread the "drop" as "salvage these two and drop the rest". The new series does apply cleanly on a commit in master..pu that has both ks/tree-diff-walk and ks/combine-diff, and the latter is not yet in 'next' so we are free to reorganize. Let me flip the latter topic around, also queue these updates and push the result out on 'pu'. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] gc: config option for running --auto in background
Duy Nguyen writes: > On Tue, Feb 11, 2014 at 2:11 AM, Junio C Hamano wrote: >> On Mon, Feb 10, 2014 at 10:43 AM, Junio C Hamano wrote: If --quiet is set, we should not be printing anyway. If not, I thinkg we could only print "auto packing in background.." when we actually can do that, else just print the old message. It means an #ifdef NO_POSIX_GOODIES here again though.. >>> >>> Didn't you change it not to die but return nosys or something? >> >> Ah, the problem is that it is too late to take back "... will do so in >> the background" when you noticed that daemonize() did not succeed, so >> you would need a way to see if we can daemonize() before actually >> doing so if you want to give different messages. >> >> "int can_daemonize(void)" could be an answer that is nicer than >> NO_POSIX_GOODIES, but I am not sure if it is worth it. > > Or we could pass the "quiet" flag to daemonize() and let it print > something in the #ifdef NO_POSIX_GOODIES part. Hmph... What would that something say? "I was asked to gc in the background but I can't here" is not suitable for daemonize() that is not specific to "gc". The flow I had in mind was something along the lines of this if (!quiet) { if (detach_auto && can_daemonize()) say "auto packing in the background"; else say "auto packing" } if (detach_auto && can_daemonize()) daemonize(); If we had daemonize(noisy=1) and coded it this way: if (!quiet) say "auto packing"; if (detach_auto) daemonize(!quiet); we could do something like: daemonize(int noisy) { if (noisy && !defined(NO_POSIX_GOODIES)) say ", and doing so in the background"; ... do the actual daemonizing ... } but that feels ugly. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Tue, Feb 11, 2014 at 6:11 PM, Duy Nguyen wrote: > > I have no comments about thread safety improvements (well, not yet). > If you have investigated about git performance on chromium > repositories, could you please sum it up? Threading may be an option > to improve performance, but it's probably not the only option. Well, the painful operations that we use frequently are pack-objects, checkout, status, and blame. Anything on Windows that touches a lot of files is miserable due to the usual file system slowness on Windows, and luafv.sys (the UAC file virtualization driver) seems to make it much worse. With threading turned on, pack-objects on Windows now takes about twice as long as on Linux, which is still more than a 2x improvement over the non-threaded operation. Checkout is really bad on Windows. The blink repository is ~200K files, and a full clean checkout from the index takes about 10 seconds on Linux, and about 3:30 on Windows. I used the Very Sleepy profiler to see where all the time was spent on Windows: 55% of the time was spent in OpenFile, and 25% in CloseFile (both in win32). My immediate goal is to add threading to checkout, so those file system calls can be done in parallel. Enabling the fscache speeds up status quite a bit. I'm optimistic that parallelizing the stat calls will yield a further improvement. Beyond that, it may not be possible to do much more without using a file system watcher daemon, like facebook does with mercurial. (https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/) Blame is something that chromium and blink developers use heavily, and it is not unusual for a blame invocation on the blink repository to run for 30 seconds. It seems like it should be possible to parallelize blame, but it requires pack file operations to be thread-safe. Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Tue, Feb 11, 2014 at 7:43 PM, Duy Nguyen wrote: > > From v1.9 shallow clone should work for all push/pull/clone... so > history depth does not matter (on the client side). As for > gentoo-x86's large worktree, using index v4 and avoid full-tree > operations (e.g. "status .", not "status"..) should make all > operations reasonably fast. I plan to make "status" fast even without > path limiting with the help of inotify, but that's not going to be > finished soon. Did I miss anything else? Chromium developers frequently want to run status over their entire checkout, and a lot of them run 'git commit -a'. We want to do everything possible to speed this up. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 3:59 AM, Erik Faye-Lund wrote: > On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager wrote: >> >> We are particularly concerned with the performance of msysgit, and we >> have already chalked up a significant performance gain by turning on >> the threading code in pack-objects (which was already enabled for >> posix platforms, but not on msysgit, owing to the lack of a correct >> pread implementation). > > How did you manage to do this? I'm not aware of any way to implement > pread on Windows (without going down the insanity-path of wrapping and > potentially locking inside every IO operation)... I don't want to steal the thunder of my coworker, who wrote the implementation. He plans to submit it upstream soon-ish. It relies on using the lpOverlapped argument to ReadFile(), with some additional tomfoolery to make sure that the implicit position pointer for the file descriptor doesn't get modified. Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Make the global packed_git variable static to sha1_file.c.
On Tue, Feb 11, 2014 at 11:29 PM, Chris Packham wrote: > Hi, > > On 12/02/14 14:57, Stefan Zager wrote: >> From b4796d9d99c03b0b7cddd50808a41413e45f1129 Mon Sep 17 00:00:00 2001 >> From: Stefan Zager >> Date: Mon, 10 Feb 2014 16:55:12 -0800 >> Subject: [PATCH] Make the global packed_git variable static to sha1_file.c. > > I'm not really qualified to comment on substance but there are some > basic style issues w.r.t. whitespace namely using 4 spaces for indent > and mixing tabs/spaces. This might seem pedantic for the first round of > a patch but it does put off reviewers. > > From Documentation/CodingGuidelines: > > - We use tabs to indent, and interpret tabs as taking up to >8 spaces. My bad, I will upload a fixed patch. In my defense: I edited the code in emacs and then ran "M-x tabify" over the entire file. But that had the unfortunate side effect of adding a bunch of whitespace-only changes to the diff, illuminating the fact that there is already mixed whitespace in the existing code. So I had to go back and selectively tabify my changes, and I clearly missed a bunch. If anyone has a recommendation for a less labor-intensive way to do this in emacs, I'd be very grateful. Thanks, Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager wrote: > On Wed, Feb 12, 2014 at 3:59 AM, Erik Faye-Lund wrote: >> On Wed, Feb 12, 2014 at 2:54 AM, Stefan Zager wrote: >>> >>> We are particularly concerned with the performance of msysgit, and we >>> have already chalked up a significant performance gain by turning on >>> the threading code in pack-objects (which was already enabled for >>> posix platforms, but not on msysgit, owing to the lack of a correct >>> pread implementation). >> >> How did you manage to do this? I'm not aware of any way to implement >> pread on Windows (without going down the insanity-path of wrapping and >> potentially locking inside every IO operation)... > > I don't want to steal the thunder of my coworker, who wrote the > implementation. He plans to submit it upstream soon-ish. It relies > on using the lpOverlapped argument to ReadFile(), with some additional > tomfoolery to make sure that the implicit position pointer for the > file descriptor doesn't get modified. Is the code available somewhere? I'm especially interested in the "additional tomfoolery to make sure that the implicit position pointer for the file descriptor doesn't get modified"-part, as this was what I ended up butting my head into when trying to do this myself. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
Stefan Zager writes: > I'm optimistic that parallelizing the stat calls will yield a further > improvement. It has already been mentionned in the thread, but in case you overlooked it: did you look at core.preloadindex? It seems at least very close to what you want. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund wrote: > On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager wrote: >> >> I don't want to steal the thunder of my coworker, who wrote the >> implementation. He plans to submit it upstream soon-ish. It relies >> on using the lpOverlapped argument to ReadFile(), with some additional >> tomfoolery to make sure that the implicit position pointer for the >> file descriptor doesn't get modified. > > Is the code available somewhere? I'm especially interested in the > "additional tomfoolery to make sure that the implicit position pointer > for the file descriptor doesn't get modified"-part, as this was what I > ended up butting my head into when trying to do this myself. https://chromium-review.googlesource.com/#/c/186104/ -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 10:33 AM, Matthieu Moy wrote: > Stefan Zager writes: > >> I'm optimistic that parallelizing the stat calls will yield a further >> improvement. > > It has already been mentionned in the thread, but in case you overlooked > it: did you look at core.preloadindex? It seems at least very close to > what you want. Ah yes, sorry, I overlooked that. We have indeed turned on core.preloadindex, and it does indeed speed up status. That speedup is reflected in my previous comments about our observations working with chromium and blink. Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager wrote: > On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund wrote: >> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager wrote: >>> >>> I don't want to steal the thunder of my coworker, who wrote the >>> implementation. He plans to submit it upstream soon-ish. It relies >>> on using the lpOverlapped argument to ReadFile(), with some additional >>> tomfoolery to make sure that the implicit position pointer for the >>> file descriptor doesn't get modified. >> >> Is the code available somewhere? I'm especially interested in the >> "additional tomfoolery to make sure that the implicit position pointer >> for the file descriptor doesn't get modified"-part, as this was what I >> ended up butting my head into when trying to do this myself. > > https://chromium-review.googlesource.com/#/c/186104/ ReOpenFile, that's fantastic. Thanks a lot! -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
Stefan Zager writes: > On Tue, Feb 11, 2014 at 6:11 PM, Duy Nguyen wrote: >> >> I have no comments about thread safety improvements (well, not yet). >> If you have investigated about git performance on chromium >> repositories, could you please sum it up? Threading may be an option >> to improve performance, but it's probably not the only option. > > Well, the painful operations that we use frequently are pack-objects, > checkout, status, and blame. Have you checked the patch in http://thread.gmane.org/gmane.comp.version-control.git/241448> and followups, Message-ID: <1391454849-26558-1-git-send-email-...@gnu.org>? While this does not yet support -M and -C options, it's conceivable that you don't use them in your server/scripts. > Anything on Windows that touches a lot of files is miserable due to > the usual file system slowness on Windows, and luafv.sys (the UAC file > virtualization driver) seems to make it much worse. There is an obvious solution here... Dedicated hardware is not that expensive. Virtualization will always have a price. > Blame is something that chromium and blink developers use heavily, and > it is not unusual for a blame invocation on the blink repository to > run for 30 seconds. It seems like it should be possible to > parallelize blame, but it requires pack file operations to be > thread-safe. Really, give the above patch a try. I am taking longer to finish it than anticipated (with a lot due to procrastination but that is, unfortunately, a large part of my workflow), and it's cutting into my "paychecks" (voluntary donations which to a good degree depend on timely and nontrivial progress reports for my freely available work on GNU LilyPond). Note that it looks like the majority of the remaining time on GNU/Linux tends to be spent in system time: I/O time, memory management. And I have an SSD drive. When using packed repositories of considerable size, decompression comes into play as well. I don't think that you can hope to get noticeably higher I/O throughput by multithreading, so really, really, really consider dedicated hardware running on a native Linux file system. -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Make the global packed_git variable static to sha1_file.c.
Stefan Zager writes: > On Tue, Feb 11, 2014 at 11:29 PM, Chris Packham > wrote: >> Hi, >> >> On 12/02/14 14:57, Stefan Zager wrote: >>> From b4796d9d99c03b0b7cddd50808a41413e45f1129 Mon Sep 17 00:00:00 2001 >>> From: Stefan Zager >>> Date: Mon, 10 Feb 2014 16:55:12 -0800 >>> Subject: [PATCH] Make the global packed_git variable static to sha1_file.c. >> >> I'm not really qualified to comment on substance but there are some >> basic style issues w.r.t. whitespace namely using 4 spaces for indent >> and mixing tabs/spaces. This might seem pedantic for the first round of >> a patch but it does put off reviewers. >> >> From Documentation/CodingGuidelines: >> >> - We use tabs to indent, and interpret tabs as taking up to >>8 spaces. > > My bad, I will upload a fixed patch. In my defense: I edited the code > in emacs and then ran "M-x tabify" over the entire file. But that had > the unfortunate side effect of adding a bunch of whitespace-only > changes to the diff, illuminating the fact that there is already mixed > whitespace in the existing code. So I had to go back and selectively > tabify my changes, and I clearly missed a bunch. > > If anyone has a recommendation for a less labor-intensive way to do > this in emacs, I'd be very grateful. C-c . RET linux RET before entering any changes. -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2] Make the global packed_git variable static to sha1_file.c.
>From 0a59547f3e95ddecf7606c5f259ae6177c5a104f Mon Sep 17 00:00:00 2001 From: Stefan Zager Date: Mon, 10 Feb 2014 16:55:12 -0800 Subject: [PATCH] Make the global packed_git variable static to sha1_file.c. This is a first step in making the codebase thread-safe. By and large, the operations which might benefit from threading are those that work with pack files (e.g., checkout, blame), so the focus of this patch is stop leaking the global list of pack files outside of sha1_file.c. The next step will be to control access to the list of pack files with a mutex. However, that alone is not enough to make pack file access thread safe. Even in a read-only operation, the window list associated with each pack file will need to be controlled. Additionally, the global counters in sha1_file.c will need to be controlled. This patch is a pure refactor with no functional changes, so it shouldn't require any additional tests. Adding the actual locks will be a functional change, and will require additional tests. Signed-off-by: Stefan Zager --- builtin/count-objects.c | 44 ++- builtin/fsck.c | 46 +++- builtin/gc.c | 26 +++ builtin/pack-objects.c | 188 --- builtin/pack-redundant.c | 37 +++--- cache.h | 16 +++- fast-import.c| 4 +- http-backend.c | 28 --- http-push.c | 4 +- http-walker.c| 2 +- pack-revindex.c | 20 ++--- server-info.c| 35 + sha1_file.c | 35 - sha1_name.c | 18 - 14 files changed, 315 insertions(+), 188 deletions(-) diff --git a/builtin/count-objects.c b/builtin/count-objects.c index a7f70cb..6554dfe 100644 --- a/builtin/count-objects.c +++ b/builtin/count-objects.c @@ -83,14 +83,32 @@ static char const * const count_objects_usage[] = { NULL }; +struct pack_data { + unsigned long packed; + off_t size_pack; + unsigned long num_pack; +}; + +int pack_data_fn(struct packed_git *p, void *data) +{ + struct pack_data *pd = (struct pack_data *) data; + if (p->pack_local && !open_pack_index(p)) { + pd->packed += p->num_objects; + pd->size_pack += p->pack_size + p->index_size; + pd->num_pack++; + } + return 0; +} + int cmd_count_objects(int argc, const char **argv, const char *prefix) { int i, verbose = 0, human_readable = 0; const char *objdir = get_object_directory(); int len = strlen(objdir); char *path = xmalloc(len + 50); - unsigned long loose = 0, packed = 0, packed_loose = 0; + unsigned long loose = 0, packed_loose = 0; off_t loose_size = 0; + struct pack_data pd = {0,0,0}; struct option opts[] = { OPT__VERBOSE(&verbose, N_("be verbose")), OPT_BOOL('H', "human-readable", &human_readable, @@ -118,41 +136,29 @@ int cmd_count_objects(int argc, const char **argv, const char *prefix) closedir(d); } if (verbose) { - struct packed_git *p; - unsigned long num_pack = 0; - off_t size_pack = 0; struct strbuf loose_buf = STRBUF_INIT; struct strbuf pack_buf = STRBUF_INIT; struct strbuf garbage_buf = STRBUF_INIT; - if (!packed_git) - prepare_packed_git(); - for (p = packed_git; p; p = p->next) { - if (!p->pack_local) - continue; - if (open_pack_index(p)) - continue; - packed += p->num_objects; - size_pack += p->pack_size + p->index_size; - num_pack++; - } + prepare_packed_git(); + foreach_packed_git(pack_data_fn, NULL, &pd); if (human_readable) { strbuf_humanise_bytes(&loose_buf, loose_size); - strbuf_humanise_bytes(&pack_buf, size_pack); + strbuf_humanise_bytes(&pack_buf, pd.size_pack); strbuf_humanise_bytes(&garbage_buf, size_garbage); } else { strbuf_addf(&loose_buf, "%lu", (unsigned long)(loose_size / 1024)); strbuf_addf(&pack_buf, "%lu", - (unsigned long)(size_pack / 1024)); + (unsigned long)(pd.size_pack / 1024)); strbuf_addf(&garbage_buf, "%lu", (unsigned long)(size_garbage / 1024)); } printf("count: %lu\n", loose); printf("size: %s\n", loose_buf.buf); - printf("in-pack: %lu\n", packed); -
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup wrote: > Stefan Zager writes: > >> Anything on Windows that touches a lot of files is miserable due to >> the usual file system slowness on Windows, and luafv.sys (the UAC file >> virtualization driver) seems to make it much worse. > > There is an obvious solution here... Dedicated hardware is not that > expensive. Virtualization will always have a price. Not sure I follow you. We need to support people developing, building, and testing on natively Windows machines. And we need to support users with reasonable hardware, including spinning disks. If we were only interested in optimizing for Google employees, each of whom has one or more small nuclear reactors under their desk, this would be easy. >> Blame is something that chromium and blink developers use heavily, and >> it is not unusual for a blame invocation on the blink repository to >> run for 30 seconds. It seems like it should be possible to >> parallelize blame, but it requires pack file operations to be >> thread-safe. > > Really, give the above patch a try. I am taking longer to finish it > than anticipated (with a lot due to procrastination but that is, > unfortunately, a large part of my workflow), and it's cutting into my > "paychecks" (voluntary donations which to a good degree depend on timely > and nontrivial progress reports for my freely available work on GNU > LilyPond). I will give that a try. How much of a performance improvement have you clocked? > Note that it looks like the majority of the remaining time on GNU/Linux > tends to be spent in system time: I/O time, memory management. And I > have an SSD drive. When using packed repositories of considerable size, > decompression comes into play as well. I don't think that you can hope > to get noticeably higher I/O throughput by multithreading, so really, > really, really consider dedicated hardware running on a native Linux > file system. I have a background in hardware, and I have much more faith in modern disk schedulers :) Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tests: turn on network daemon tests by default
Jeff King writes: > Agreed, but I think the only way to know the size of those fallouts is > to try it and see who complains. I would not normally be so cavalier > with git itself, but I think for the test infrastructure, we have a > small, tech-savvy audience that can help us iterate on it without too > much pain. There is another. $ GIT_TEST_HTTPD=false sh t5537-fetch-shallow.sh ok 1 - setup ok 2 - setup shallow clone ok 3 - clone from shallow clone ok 4 - fetch from shallow clone ok 5 - fetch --depth from shallow clone ok 6 - fetch --unshallow from shallow clone ok 7 - fetch something upstream has but hidden by clients shallow boundaries ok 8 - fetch that requires changes in .git/shallow is filtered ok 9 - fetch --update-shallow error: Can't use skip_all after running some tests Under 'prove' test target, the way it exits causes: *** prove *** t5537-fetch-shallow.sh .. Dubious, test returned 1 (wstat 256, 0x100) All 9 subtests passed which leads to: Test Summary Report --- t5537-fetch-shallow.sh (Wstat: 256 Tests: 9 Failed: 0) Non-zero exit status: 1 Parse errors: No plan found in TAP output On the 'master' branch without these "auto opt-in" patches, $ GIT_TEST_HTTPD= sh t5537-fetch-shallow.sh ok 1 - setup ok 2 - setup shallow clone ok 3 - clone from shallow clone ok 4 - fetch from shallow clone ok 5 - fetch --depth from shallow clone ok 6 - fetch --unshallow from shallow clone ok 7 - fetch something upstream has but hidden by clients shallow boundaries ok 8 - fetch that requires changes in .git/shallow is filtered ok 9 - fetch --update-shallow skipping remaining tests, git built without http support # passed all 9 test(s) 1..9 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
Stefan Zager writes: > On Wed, Feb 12, 2014 at 10:50 AM, David Kastrup wrote: > >> Really, give the above patch a try. I am taking longer to finish it >> than anticipated (with a lot due to procrastination but that is, >> unfortunately, a large part of my workflow), and it's cutting into my >> "paychecks" (voluntary donations which to a good degree depend on timely >> and nontrivial progress reports for my freely available work on GNU >> LilyPond). > > I will give that a try. How much of a performance improvement have > you clocked? Depends on file type and size. With large files with lots of small changes, performance improvements get more impressive. Some ugly real-world examples are the Emacs repository, src/xdisp.c (performance improvement about a factor of 3), a large file in the style of /usr/share/dict/words clocking in at a factor of about 5. Again, that's with an SSD and ext4 filesystem on GNU/Linux, and there are no improvements in system time (I/O) except for patch 4 of the series which helps perhaps 20% or so. So the benefits of the patch will come into play mostly for big, bad files on Windows: other than that, the I/O time is likely to be the dominant player anyway. If you have benchmarked the stuff, for annoying cases expect I/O time to go down maybe 10-20%, and user time to drop by a factor of 4. Under GNU/Linux, that makes for a significant overall improvement. On Windows, the payback is likely quite less because of the worse I/O performance. Pity. -- David Kastrup -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Make the global packed_git variable static to sha1_file.c.
Stefan Zager writes: > If anyone has a recommendation for a less labor-intensive way to do > this in emacs, I'd be very grateful. This is not "do this in emacs", but here is a possible approach. You can ask "git diff" about what you changed, and actually apply the change while fixing whitespace errors. I.e. git diff sha1_file.c | git apply --cached --whitespace=fix git diff git checkout sha1_file.c The first step will add a cleaned-up version to your index. The second "diff" (optional) is to see what whitespace errors are introduced when going from that cleaned-up version to what you have in the working tree. With the last step you would update the working tree version to the cleaned-up version from the index. [alias] wsadd = "!sh -c 'git diff -- \"$@\" | git apply --cached --whitespace=fix;\ git co -- ${1-.} \"$@\"' -" -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
Am 12.02.2014 19:37, schrieb Erik Faye-Lund: > On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager wrote: >> On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund wrote: >>> On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager wrote: I don't want to steal the thunder of my coworker, who wrote the implementation. He plans to submit it upstream soon-ish. It relies on using the lpOverlapped argument to ReadFile(), with some additional tomfoolery to make sure that the implicit position pointer for the file descriptor doesn't get modified. >>> >>> Is the code available somewhere? I'm especially interested in the >>> "additional tomfoolery to make sure that the implicit position pointer >>> for the file descriptor doesn't get modified"-part, as this was what I >>> ended up butting my head into when trying to do this myself. >> >> https://chromium-review.googlesource.com/#/c/186104/ > > ReOpenFile, that's fantastic. Thanks a lot! ...but should be loaded dynamically via GetProcAddress, or are we ready to drop XP support? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 11:22 AM, Karsten Blees wrote: > Am 12.02.2014 19:37, schrieb Erik Faye-Lund: >> On Wed, Feb 12, 2014 at 7:34 PM, Stefan Zager wrote: >>> On Wed, Feb 12, 2014 at 10:27 AM, Erik Faye-Lund >>> wrote: On Wed, Feb 12, 2014 at 7:20 PM, Stefan Zager wrote: > > I don't want to steal the thunder of my coworker, who wrote the > implementation. He plans to submit it upstream soon-ish. It relies > on using the lpOverlapped argument to ReadFile(), with some additional > tomfoolery to make sure that the implicit position pointer for the > file descriptor doesn't get modified. Is the code available somewhere? I'm especially interested in the "additional tomfoolery to make sure that the implicit position pointer for the file descriptor doesn't get modified"-part, as this was what I ended up butting my head into when trying to do this myself. >>> >>> https://chromium-review.googlesource.com/#/c/186104/ >> >> ReOpenFile, that's fantastic. Thanks a lot! > > ...but should be loaded dynamically via GetProcAddress, or are we ready to > drop XP support? Right, that is an issue. From our perspective, it's well past time to drop XP support. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] blame.c: prepare_lines should not call xrealloc for every line
David Kastrup writes: > Making a single preparation run for counting the lines will avoid memory > fragmentation. Also, fix the allocated memory size which was wrong > when sizeof(int *) != sizeof(int), and would have been too small > for sizeof(int *) < sizeof(int), admittedly unlikely. > > Signed-off-by: David Kastrup > --- I think I took sizeof(int*)->sizeof(int) patch to the 'next' branch already, which might have to conflict with this clean-up, but it should be trivial to resolve. Thanks for resending. I was busy elsewhere (i.e. "no feedback" does not mean "silent rejection" nor "silent agreement" at least from me), and such a resend does help prevent patches fall thru cracks. > diff --git a/builtin/blame.c b/builtin/blame.c > index e44a6bb..1aefedf 100644 > --- a/builtin/blame.c > +++ b/builtin/blame.c > @@ -1772,25 +1772,41 @@ static int prepare_lines(struct scoreboard *sb) > { > const char *buf = sb->final_buf; > unsigned long len = sb->final_buf_size; > + const char *end = buf + len; > + const char *p; > + int *lineno; > + int num = 0, incomplete = 0; > > + for (p = buf;;) { > + p = memchr(p, '\n', end - p); > + if (p) { > + p++; > num++; > + continue; > } > + break; > } > + > + if (len && end[-1] != '\n') > + incomplete++; /* incomplete line at the end */ > + > + sb->lineno = xmalloc(sizeof(*sb->lineno) * (num + incomplete + 1)); > + lineno = sb->lineno; > + > + *lineno++ = 0; > + for (p = buf;;) { > + p = memchr(p, '\n', end - p); > + if (p) { > + p++; > + *lineno++ = p - buf; > + continue; > + } > + break; > + } > + > + if (incomplete) > + *lineno++ = len; > + > sb->num_lines = num + incomplete; > return sb->num_lines; > } -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] tree-walk: be more specific about corrupt tree errors
When the tree-walker runs into an error, it just calls die(), and the message is always "corrupt tree file". However, we are actually covering several cases here; let's give the user a hint about what happened. Let's also avoid using the word "corrupt", which makes it seem like the data bit-rotted on disk. Our sha1 check would already have found that. These errors are ones of data that is malformed in the first place. Signed-off-by: Jeff King --- Michael and I have been looking off-list at some bogus trees (created by a non-git.git implementation). git-fsck unhelpfully dies during the tree-walk: $ git fsck Checking object directories: 100% (256/256), done. fatal: corrupt tree file I think in the long run we want to either teach fsck to avoid the regular tree-walker or to set a special "continue as much as you can" flag. That will let us keep going to find more errors, do our usual fsck error checks (which are more strict), and especially report on _which_ object was broken (we can't do that here because we are deep in the call stack and may not even have a real object yet). This patch at least gives us slightly more specific error messages (both for fsck and for other commands). And it may provide a first step in clarity if we follow the "continue as much as you can" path. tree-walk.c | 10 ++ 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/tree-walk.c b/tree-walk.c index 79dba1d..d53b084 100644 --- a/tree-walk.c +++ b/tree-walk.c @@ -28,11 +28,13 @@ static void decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned unsigned int mode, len; if (size < 24 || buf[size - 21]) - die("corrupt tree file"); + die("truncated tree file"); path = get_mode(buf, &mode); - if (!path || !*path) - die("corrupt tree file"); + if (!path) + die("malformed mode in tree entry"); + if (!*path) + die("empty filename in tree entry"); len = strlen(path) + 1; /* Initialize the descriptor entry */ @@ -81,7 +83,7 @@ void update_tree_entry(struct tree_desc *desc) unsigned long len = end - (const unsigned char *)buf; if (size < len) - die("corrupt tree file"); + die("truncated tree file"); buf = end; size -= len; desc->buffer = buf; -- 1.8.5.2.500.g8060133 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
Stefan Zager writes: > ... I used the Very Sleepy profiler > to see where all the time was spent on Windows: 55% of the time was > spent in OpenFile, and 25% in CloseFile (both in win32). This is somewhat interesting. When we check things out, checkout_paths() has a list of paths to be checked out, and iterates over them and call checkout_entry(). I wonder if you can: - introduce a version of checkout_entry() that takes file descriptors to write to; - have an asynchronous helper threads that pre-open the paths to be written out and feed to a queue; - restructure that loop so that it reads the from the queue, performs the actual writing out, and then feeds to another queue; and - have another asynchronous helper threads that reads from the queue and close them. Calls to write (and preparation of data to be written) will then remain single-threaded, but it sounds like that codepath is not the bottleneck in your measurement, so -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Make the global packed_git variable static to sha1_file.c.
I'll locally fix up these style issues before commenting on the patch. Thanks. ERROR: space required after that ',' (ctx:VxV) #78: FILE: builtin/count-objects.c:111: + struct pack_data pd = {0,0,0}; ^ ERROR: space required after that ',' (ctx:VxV) #78: FILE: builtin/count-objects.c:111: + struct pack_data pd = {0,0,0}; ^ ERROR: space required after that ',' (ctx:VxV) #171: FILE: builtin/fsck.c:683: + struct verify_packs_data vpd = {0,0,0}; ^ ERROR: space required after that ',' (ctx:VxV) #171: FILE: builtin/fsck.c:683: + struct verify_packs_data vpd = {0,0,0}; ^ ERROR: space required after that ',' (ctx:VxV) #541: FILE: builtin/pack-redundant.c:591: + struct add_pack_data apd = {filename,0,NULL}; ^ ERROR: space required after that ',' (ctx:VxV) #541: FILE: builtin/pack-redundant.c:591: + struct add_pack_data apd = {filename,0,NULL}; ^ ERROR: space required after that ',' (ctx:VxV) #565: FILE: builtin/pack-redundant.c:604: + struct add_pack_data apd = {NULL,0,NULL}; ^ ERROR: space required after that ',' (ctx:VxV) #565: FILE: builtin/pack-redundant.c:604: + struct add_pack_data apd = {NULL,0,NULL}; ^ ERROR: space prohibited after that '-' (ctx:WxW) #726: FILE: pack-revindex.c:47: + num = - 1 - num; ^ ERROR: space required after that ',' (ctx:VxV) #901: FILE: sha1_name.c:195: + struct disambiguate_data d = {len,bin_pfx,ds}; ^ ERROR: space required after that ',' (ctx:VxV) #901: FILE: sha1_name.c:195: + struct disambiguate_data d = {len,bin_pfx,ds}; ^ total: 11 errors, 0 warnings, 781 lines checked -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Make the git codebase thread-safe
On Wed, Feb 12, 2014 at 12:06 PM, Junio C Hamano wrote: > Stefan Zager writes: > >> ... I used the Very Sleepy profiler >> to see where all the time was spent on Windows: 55% of the time was >> spent in OpenFile, and 25% in CloseFile (both in win32). > > This is somewhat interesting. > > When we check things out, checkout_paths() has a list of paths to be > checked out, and iterates over them and call checkout_entry(). > > I wonder if you can: > > - introduce a version of checkout_entry() that takes file >descriptors to write to; > > - have an asynchronous helper threads that pre-open the paths to be >written out and feed to a >queue; > > - restructure that loop so that it reads the to be written> from the queue, performs the actual writing out, >and then feeds to another queue; and > > - have another asynchronous helper threads that reads descriptor to be closed> from the queue and close them. > > Calls to write (and preparation of data to be written) will then > remain single-threaded, but it sounds like that codepath is not the > bottleneck in your measurement, so Yes, I considered that as well. At a minimum, that would still require attr.c to implement thread locking, since attribute files must be parsed to look for stream filters. I have already done that work. But I'm not sure it's the best long-term approach to add convoluted custom threading solutions to each git operation as it appears on the performance radar. I'm hoping to make the entire code base more thread-friendly, so that threading can be added in a more natural and idiomatic (and less painful) way. For example, the most natural way to add threading to checkout would be in the loops over the index in check_updates() in unpack-trees.c. If attr.c and sha1_file.c were thread-safe, then it would be possible to thread checkout entirely in check_updates(), with a pretty compact code change. I have already done the work in attr.c; sha1_file.c is hairier, but do-able. Stefan -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Introduce experimental remote object access mode
On Tue, Feb 11, 2014 at 11:29 AM, Junio C Hamano wrote: > Shawn Pearce writes: > >> Why would you do this? Perhaps you need more time in your day >> to consume tea or coffee. Set GIT_RTT and enjoy a beverage. > > So the conclusion is that it is not practical to do a lazy fetch if > it is done extremely naively at "we want this object --- wait a bit > and we'll give you" level? Yes, this is what I thought when someone proposed this hack in sha1_file.c to me on Monday. So I ran a quick experiment to see if my instinct was right. > I am wondering if we can do a bit better, like "we want this object > --- wait a bit, ah that's a commit, so it is likely that you may > want the trees and blobs associated with it, too, if not right now > but in a near future, let me push a pack that holds them to you"? Ah, smart observation. That might work. However I doubt it. I implemented a version of Git on top of Google Bigtable (and Apache HBase and Apache Cassandra) multiple times using JGit. tl;dr: this approach doesn't work in practice. The naive implementation for these distributed NoSQL systems is to store each object in its own row keyed by SHA-1, and lookup the object when you want it. This is very slow and is more or less what this stupid patch shows. Worse, none of them were able to even get close to the 1ms latency I used in this example. In another implementation (which I published into JGit as the "DHT" backend) I stored a group of related commits together in a row. Row target sizes were in the 1-2 MiB range when using pack style compression for commits, so the average row held hundreds of commits. Reading one commit would actually slurp back a number of related commits. The idea was if we need commit A now we will need B, C, D, E (its parents and ancestors) soon as the application walks the revision history, like rev-list or pack-objects. For a process like pack-objects this almost seems to work. If commits are together we can slurp a group at a time to amortize the round trip latency. Unfortunately the application can still go through hundreds of commits faster than the real world RTT is. So I tried to fix this by storing an extra metadata pointer in each row to identify the next row, so next block of commits could start loading right away. Its still slow, as the application can scan through data faster than the RTT. At least for pack generation the traversal code does commits and builds up a list of all root trees. The root trees can be async loaded in batches, but the depth first traversal is still a killer. There are stalls while the application waits for the next subtree, even if you cluster the trees also into groups using depth first traversal the way the packer produces pack files today. Junio's idea to cluster data by commit and its related trees and blobs is just a different data organization. It may be necessary to make two copies of the data, one clustered by commits and another by commit+tree+blob to satisfy different access patterns. And in the commit+tree+blob case you may need multiple redundant copies of blobs near commits that use them if those commits are frequently accessed. Its a lot of redundant disk space. We always say disk is cheap, but disk is slow and not getting faster. SSDs are helping, but SSDs are expensive and have size limitations compared to spinning disk. Just making many copies of data isn't necessarily a solution. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] contrib/diff-highlight: multibyte characters diff
On Tue, Feb 11, 2014 at 06:09:10PM +0900, Yoshihiro Sugi wrote: > diff-highlight split each hunks and compare them as byte sequences. > it causes problems when diff hunks include multibyte characters. > This change enable to work on such cases by decoding inputs and encoding > output as utf8 string. Thanks for looking at this. I didn't consider multibyte characters at all when I wrote the original. Sadly, applying your patch seems to cause diff-highlight to take about 1.5x as much CPU (I just tried it on the complete output of "git log --all -p" in my git.git repository, which went from ~60s to ~90s). That is not necessarily a deal-breaker, as it is more important to be correct. But I wonder if there is any way we can optimize it. From my understanding, it may not be the encoding/decoding itself, but rather that the utf8-aware text routines are slower. More on that below. > # Highlight by reversing foreground and background. You could do > # other things like bold or underline if you prefer. > @@ -15,8 +16,9 @@ my @added; > my $in_hunk; > > while (<>) { > + $_ = decode_utf8($_); What happens when the diff content is not utf8 at all? The default failure mode for decode_utf8 seems to be to replace non-utf8 sequences with a substitution character. Which means we would corrupt something like iso8859-1, which works fine in the current code. > if (!$in_hunk) { > - print; > + print encode_utf8($_); And we have the opposite question here. We know that what we have in the string is perl's internal format. But how do we know that what the user's terminal wants is utf8? I think in the most general case, we would need to auto-detect the encoding for each string (since there is no guarantee that a file even retains the same encoding through its history). And then probably punt on highlighting when comparing two lines in different encodings, and if they are in the same encoding, treat them as a sequence of code points rather than octets when comparing. That's probably way too slow and complicated to worry about. And frankly, I am OK with making the tool work _best_ with UTF-8, as long as it can fail gracefully for other encodings (i.e., if you can turn off the utf8 magic when you have another encoding, and we will pass it through blindly as octets). Would it be enough to do something like the patch below? The specific things I am shooting for are: - the only part of the code that cares about the sequence is the split/highlight bit, and most lines do not get highlighted at all. Therefore it tries to lazily do the decode step, which gets back our lost performance (I measured ~12% slowdown). - we try the utf8 decode and if it fails, fall back to treating it like a sequence of bytes. That doesn't help people with other multibyte encodings, but at least lets single-byte encodings besides utf8 continue to work. I constructed a very simple test of: git init && printf 'cont\xc3\xa9nt\n' >file && git add file && printf 'cont\xc3\xb6nt\n' >file && git diff and it seemed to work. Can you confirm that it does the right thing on your more complicated cases? -Peff --- diff --git a/contrib/diff-highlight/diff-highlight b/contrib/diff-highlight/diff-highlight index c4404d4..9447ba2 100755 --- a/contrib/diff-highlight/diff-highlight +++ b/contrib/diff-highlight/diff-highlight @@ -2,6 +2,7 @@ use warnings FATAL => 'all'; use strict; +use Encode qw(decode_utf8 encode_utf8); # Highlight by reversing foreground and background. You could do # other things like bold or underline if you prefer. @@ -73,13 +74,23 @@ sub show_hunk { my @queue; for (my $i = 0; $i < @$a; $i++) { - my ($rm, $add) = highlight_pair($a->[$i], $b->[$i]); - print $rm; - push @queue, $add; + my ($a_dec, $encode_rm) = decode($a->[$i]); + my ($b_dec, $encode_add) = decode($b->[$i]); + my ($rm, $add) = highlight_pair($a_dec, $b_dec); + print $encode_rm->($rm); + push @queue, $encode_add->($add); } print @queue; } +sub decode { + my $orig = shift; + my $decoded = eval { decode_utf8($orig, Encode::FB_CROAK) }; + return defined $decoded ? + ($decoded, sub { encode_utf8(shift) }) : + ($orig, sub { shift }); +} + sub highlight_pair { my @a = split_line(shift); my @b = split_line(shift); -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html