git ls-files --with-tree documentation
--with-tree= When using --error-unmatch to expand the user supplied (i.e. path pattern) arguments to paths, pretend that paths which were removed in the index since the named are still present. Using this option with -s or -u options does not make any sense. This seems to say that it only affects it when --error-unmatch is used, but in fact it goes deeper; for example I can use it to list files that are present in either the current work tree or some other branch: joey@darkstar:/tmp/v> git checkout foo joey@darkstar:/tmp/v> git ls-files --with-tree=master in-foo in-master joey@darkstar:/tmp/v> git ls-files in-foo joey@darkstar:/tmp/v> git ls-tree master 100644 blob 0242cc10fdf4e9afdfd0928c2a209d4545780168in-master This is very useful behavior, but I'm not sure if I should rely on it behaving this way in the future, given the documentation. t/t3060-ls-files-with-tree.sh does indeed test that it "should add entries from named tree", and it does it without using --error-unmatch. How about changing the documentation to something like this to make more explicit what it does. --with-tree= Treat all files in the as if they were present in the index. When using --error-unmatch to expand the user supplied (i.e. path pattern) arguments to paths, this has the effect that paths which were removed in the index since the named are still present. Using this option with -s or -u options does not make any sense. -- see shy jo
Re: bug when combined with etckeeper
/etc/.git/hooks/pre-commit is installed by etckeeper and runs etckeeper pre-commit, which deals with /etc/.etckeeper, including running "git add .etckeeper". Why that file would match a gitignore seems much less important than why git would run that hook in an entirely different git repository. core.hooksPath By default Git will look for your hooks in the $GIT_DIR/hooks directory. Set this to different path, e.g. /etc/git/hooks, and Git will try to find your hooks in that directory, e.g. /etc/git/hooks/pre-receive instead of in $GIT_DIR/hooks/pre-receive. Hmm, the example "/etc/git/hooks" there is very similar to the "/etc/.git/hooks" used by etckeeper. So my guess is you have core.hooksPath set globally. -- see shy jo signature.asc Description: PGP signature
Re: Git for games working group
Ævar Arnfjörð Bjarmason wrote: > There's surely other aspects of that square peg of large file tracking > not fitting the round hole of file locking, the point of my write-up was > not that *that* solution is perfect, but there's prior art here that's > very easily adopted to distributed locking if someone wanted to scratch > that itch, since the notion of keeping a log of who has/hasn't gotten a > file is very similar to a log of who has/hasn't locked some file(s) in > the tree. Actually they are fundamentally very different. git-annex's tracking of locations of files is eventually consistent, which of course means that at any given point in time it may be currently inconsistent. That is fine for tracking locations of files, but not for locking. When git-annex needs to do an operation that relies on someone else's copy of a file actually being present, it uses real locking. That locking is not centralized, instead it relies on the connections between git repositories. That turns out to be sufficient for git-annex's own locking needs, but it would not be sufficient to avoid file edit conflict problems in eg a split brain situation. -- see shy jo signature.asc Description: PGP signature
Re: clean filter run in top of repo with wrong GIT_WORK_TREE
The clean filter can work around this problem by chdir GIT_PREFIX, but needing to do this in unusual cases seems to be asking for bugs. -- see shy jo signature.asc Description: PGP signature
clean filter run in top of repo with wrong GIT_WORK_TREE
When git is running inside a subdirectory of the repository, and needs to run the clean filter, it runs it chdired back to the top of the repository. However, if git was run with a relative --work-tree, it passes that relative path in GIT_WORK_TREE on to the clean filter. If git was run with eg, "--work-tree=..", the clean filter sees a work tree that is outside the repository. It might then read files located outside the repository. That seems like it could have security consequences, but it's certianly a surprising problem to need to deal with when writing a clean filter. Brian posted a fix for a very similar bug in sequencer.c on the 14th, so it seems likely there are other occurances of the same problem elsewhere. Demonstration of this bug: joey@darkstar:~/tmp/repo>cat .gitattributes * filter=foo joey@darkstar:~/tmp/repo>git config filter.foo.clean clean-filter %f joey@darkstar:~/tmp/repo>cat ~/bin/clean-filter #!/bin/sh pwd >&2 echo $GIT_WORK_TREE >&2 ls "$GIT_WORK_TREE/$1" joey@darkstar:~/tmp/repo>cd foo/bar/ joey@darkstar:~/tmp/repo/foo/bar>ls x joey@darkstar:~/tmp/repo/foo/bar>touch x joey@darkstar:~/tmp/repo/foo/bar>git --work-tree=../.. ls-files --modified /home/joey/tmp/repo ../.. ls: cannot access '../../foo/bar/x': No such file or directory git version 2.18.0 -- see shy jo signature.asc Description: PGP signature
Re: Fetch-hooks
Leo Gaspard wrote: > I just wanted to check, you did not put the Signed-off-by line in > patches in https://marc.info/?l=git=132491485901482=2 > > Could you confirm that the patches you sent are “covered under an > appropriate open source license and I have the right under that license > to submit that work with modifications, whether created in whole or in > part by me, under the same open source license (unless I am permitted to > submit under a different license), as indicated in the file”, so that I > could send the patch I wrote based on yours with a Signed-off-by line of > my own without breaking the DCO? Yes; my patches are under the same GPL-2 as the rest of git. -- see shy jo signature.asc Description: PGP signature
Re: Fetch-hooks
Leo Gaspard wrote: > That said, I just came upon [1] (esp. the description [2] and the patch > [3]), and wondered: it looks like the patch was abandoned midway in > favor of a hook refactoring. Would you happen to know whether the hook > refactoring eventually took place, and/or whether this patch was > resubmitted later, and/or whether it would still be possible to merge > this now? (not having any experience with git's internals yet, I don't > really know whether these are stupid questions or not) > > PS: Cc'ing Joey, as you most likely know best what eventually happened, > if you can remember it? I don't remember it well, but reviewing the thread, I think it foundered on this comment by Junio: > That use case sounds like that "git fetch" is called as a first class UI, > which is covered by "git myfetch" (you can call it "git annex fetch") > wrapper approach, the canonical example of a hook that we explicitly do ^ > not want to add. ^^^ While I still think a fetch hook would be a good idea for reasons of composability, I then just went off and implemented such a wrapper for my own particular use case, and the wrapper program then grew to cover use cases that a hook would not have been able to cover, so ... -- see shy jo signature.asc Description: PGP signature
Re: [PATCH] link_alt_odb_entries: make empty input a noop
Jeff King wrote: > This should make Joey's immediate pain go away, though only by papering > it over. I tend to agree that we shouldn't be looking at $PWD at all > here. I've confirmed that Jeff's patch fixes the case I was having trouble with. -- see shy jo signature.asc Description: PGP signature
use of PWD
In strbuf_add_absolute_path, git uses PWD if set when making relative paths absolute, otherwise it falls back to getcwd(3). Using PWD may not be a good idea. Here's one case where it confuses git badly: joey@darkstar:/>sudo ln -s /media/hd/repo hd joey@darkstar:/>cd /hd/repo joey@darkstar:/hd/repo>git --git-dir=../../../home/joey/tmp/repo/.git cat-file -t HEAD fatal: unable to normalize object directory: /hd/repo/../../../home/joey/tmp/repo/.git/objects joey@darkstar:/hd/repo>ls -d ../../../home/joey/tmp/repo/.git ../../../home/joey/tmp/repo/.git/ In that situation where cd has followed a symlink to a different depth, there seems to be no way to give git a relative path that works. Other numbers of ../ also don't work. What does work is to unset PWD: joey@darkstar:/hd/repo>PWD= git --git-dir=../../../home/joey/tmp/repo/.git cat-file -t HEAD commit So why does git use PWD at all? Some shell code used pwd earlier (leading to similar bugs like the one fixed in v1.5.1.5), but in the C code, it was first introduced in commit 1b9a9467f8b9a8da2fe58d10ae16779492aa7737, which speaks of the "user's view of the current directory", which is what PWD is. The use of PWD in that commit may be ok. Then in commit 10c4c881c4d2cb0ece0508e7142e189e68445257, the limited use of PWD broadened a lot, seemingly without intending to look at the "user's view of the current directory" anymore, due to reusing the code from the earlier commit. -- see shy jo signature.asc Description: PGP signature
Re: reversion in GIT_COMMON_DIR refs path
Joey Hess wrote: > Bisecting this test suite failure > https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/ > I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git. > > It seems that changed resolving refs paths when GIT_DIR and GIT_COMMON_DIR > are both set. While before refs were looked for in GIT_COMMON_DIR, > now they're not. In case there's any doubt about whether this is a reversion or an intentional change, see gitrepository-layout(5): refs References are stored in subdirectories of this directory. The git prune command knows to preserve objects reachable from refs found in this directory and its subdirectories. This directory is ignored if $GIT_COMMON_DIR is set and "$GIT_COMMON_DIR/refs" will be used instead. So the documented behavior is broken. -- see shy jo signature.asc Description: PGP signature
Re: [PATCH] tests: add an optional test to test git-annex
Nice work. Note that you can export BUILDER=stack and git-annex will build with a known good dependency stack, which can be more reliable/cross platform than using apt to install its build dependencies. That needs https://docs.haskellstack.org/ installed. Also it currently needs GIT_TEST_GIT_ANNEX_REVISION=master since I improved git-annex's Makefile slightly. -- see shy jo signature.asc Description: PGP signature
Re: reversion in GIT_COMMON_DIR refs path
Ævar Arnfjörð Bjarmason wrote: > On Tue, May 16, 2017 at 7:10 PM, Joey Hess <i...@joeyh.name> wrote: > > Bisecting this test suite failure > > https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/ > > I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git. > > That links's broken for me. Looking at your wiki it looks like you > mean: > https://git-annex.branchable.com/bugs/git-annex_in_nixpkgs_fails_with_git-2.13.0/ Thanks for correcting that > I have no idea what this bug is about, but side-question: It looks > like this is git-annex's own test suite that's failing with 2.13.0, is > that right? Yes indeed. > It would be very nice to have a test in git itself to test with > git-annex. I.e. some optional test that just pulls down the latest > git-annex release & runs its tests against the git we're building. > > Thanks for annex b.t.w., I use it a lot. If the git devs are ok with this, I certianly would be happy if such tests were run, at least occasionally, on the git side! -- see shy jo signature.asc Description: PGP signature
reversion in GIT_COMMON_DIR refs path
Bisecting this test suite failure https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/ I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git. It seems that changed resolving refs paths when GIT_DIR and GIT_COMMON_DIR are both set. While before refs were looked for in GIT_COMMON_DIR, now they're not. Test case: #!/bin/sh set -e set -x rm -rf testdir git init testdir cd testdir echo 1 > foo git add foo git commit -m add mkdir dummy mkdir dummy/overlay cp .git/index .git/HEAD dummy/overlay #cp .git/refs .git/packed-refs dummy/overlay -a cd dummy export GIT_COMMON_DIR=`pwd`/../.git export GIT_DIR=`pwd`/overlay git rev-parse --git-path refs/heads/master git show refs/heads/master This script succeeds with git 2.11.0, but with 2.13.0, it fails: fatal: ambiguous argument 'refs/heads/master': unknown revision or path not in the working tree. It seems to be failing to look up refs in GIT_COMMON_DIR. Note that uncommenting the commented out line in the script, to copy the refs into GIT_DIR, makes it succeed. git rev-parse --git-path refs/heads/master shows the GIT_COMMON_DIR/refs path still (as gitrepository-layout documents). So this reversion made different parts of git disagreeing about the refs path.
Re: SHA1 collisions found
Linus Torvalds wrote: > So you'd have to be able to attack both the full SHA1, _and_ whatever > other different good hash to 128 bits. There's a surprising result of combining iterated hash functions, that the combination is no more difficult to attack than the strongest hash function used. https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472.pdf Perhaps you already knew about this, but I had only heard rumors that was the case, until I found that reference recently. -- see shy jo signature.asc Description: PGP signature
Re: SHA1 collisions found
Jeff King wrote: > It's not an identical prefix, but I think collision attacks generally > are along the lines of selecting two prefixes followed by garbage, and > then mutating the garbage on both sides. That would "work" in this case > (modulo the fact that git would complain about the NUL). > > I haven't read the paper yet to see if that is the case here, though. The current attack is an identical-prefix attack, not chosen-prefix, so not quite to that point yet. The MD5 chosen-prefix attack was 2^15 harder than the known-prefix attack, but who knows if the numbers will be comprable for SHA1. > A related case is if you could stick a "cruft " header at the end of > the commit headers, and mutate its value (avoiding newlines). fsck > doesn't complain about that. git log and git show don't show such cruft headers either. BTW, the SHA attack only added ~128 bytes to the pdfs, not really a huge amount of garbage. -- see shy jo signature.asc Description: PGP signature
Re: SHA1 collisions found
Joey Hess wrote: > Linus Torvalds wrote: > > What you describe pretty much already requires a pre-image attack, > > which the new attack is _not_. > > > > It's not clear that the "good" object can be anything sane. > > Generate a regular commit object; use the entire commit object + NUL as the > chosen prefix, and use the identical-prefix collision attack to generate > the colliding good/bad objects. > > (The size in git's object header is a minor complication. Set the size > field to something sufficiently large, and then pad out the colliding > objects to that size once they're generated.) Sorry! While that would work, it's a useless attack because the good and bad commit objects still point to the same tree. It would be interesting to have such colliding objects, to see what beaks, but probably not worth $75k to generate them. -- see shy jo signature.asc Description: PGP signature
Re: SHA1 collisions found
Linus Torvalds wrote: > What you describe pretty much already requires a pre-image attack, > which the new attack is _not_. > > It's not clear that the "good" object can be anything sane. Generate a regular commit object; use the entire commit object + NUL as the chosen prefix, and use the identical-prefix collision attack to generate the colliding good/bad objects. (The size in git's object header is a minor complication. Set the size field to something sufficiently large, and then pad out the colliding objects to that size once they're generated.) -- see shy jo signature.asc Description: PGP signature
Re: SHA1 collisions found
Linus Torvalds wrote: > I haven't seen the attack yet, but git doesn't actually just hash the > data, it does prepend a type/length field to it. That usually tends to > make collision attacks much harder, because you either have to make > the resulting size the same too, or you have to be able to also edit > the size field in the header. I have some sha1 collisions (and other fun along these lines) in https://github.com/joeyh/supercollider That includes two files with the same SHA and size, which do get different blobs thanks to the way git prepends the header to the content. joey@darkstar:~/tmp/supercollider>sha1sum bad.pdf good.pdf d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a bad.pdf d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a good.pdf joey@darkstar:~/tmp/supercollider>git ls-tree HEAD 100644 blob ca44e9913faf08d625346205e228e2265dd12b65bad.pdf 100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1good.pdf While appending identical data to these colliding files does generate other collisions, prepending data does not. It would cost 6500 CPU years + 100 GPU years to generate valid colliding git objects using the methods of the paper's authors. That might be cost effective if it helped get a backdoor into eg, the kernel. > (b) we can probably easily add some extra sanity checks to the opaque > data we do have, to make it much harder to do the hiding of random > data that these attacks pretty much always depend on. For example, git fsck does warn about a commit message with opaque data hidden after a NUL. But, git show/merge/pull give no indication that something funky is going on when working with such commits. -- see shy jo signature.asc Description: PGP signature
Re: SHA1 collisions found
Junio C Hamano wrote: > On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <i...@joeyh.name> wrote: > > > > Since we now have collisions in valid PDF files, collisions in valid git > > commit and tree objects are probably able to be constructed. > > That may be true, but > https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/ That's about someone replacing an valid object in Linus's repository with an invalid random blob they found that collides. This SHA1 break doesn't allow generating such a blob anyway. Linus is right, that's an impractical attack. Attacks using this SHA1 break will look something more like: * I push a "bad" object to a repo on github I set up under a pseudonym. * I publish a "good" object in a commit and convince the maintainer to merge it. * I wait for the maintainer to push to github. * I wait for github to deduplicate and hope they'll replace the good object with the bad one I pre-uploaded, thus silently changing the content of the good commit the maintainer reviewed and pushed. * The bad object is pulled from github and deployed. * The maintainer still has the good object. They may not notice the bad object is out there for a long time. Of course, it doesn't need to involve Github, and doesn't need to rely on internal details of their deduplication[1]; that only let me publish the bad object under a psydonym. -- see shy jo [1] Which I'm only guessing about, but now that we have colliding objects, we can upload them to different repos and see if such dedupication happens. signature.asc Description: PGP signature
SHA1 collisions found
https://shattered.io/static/shattered.pdf https://freedom-to-tinker.com/2017/02/23/rip-sha-1/ IIRC someone has been working on parameterizing git's SHA1 assumptions so a repository could eventually use a more secure hash. How far has that gotten? There are still many "40" constants in git.git HEAD. In the meantime, git commit -S, and checks that commits are signed, seems like the only way to mitigate against attacks such as the ones described in the threads at https://joeyh.name/blog/sha-1/ and https://joeyh.name/blog/entry/size_of_the_git_sha1_collision_attack_surface/ Since we now have collisions in valid PDF files, collisions in valid git commit and tree objects are probably able to be constructed. -- see shy jo signature.asc Description: PGP signature
Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)
Torsten Bögershausen wrote re jh/clean-smudge-annex: > The thing is that we need to check the file system to find .gitatttibutes, > even if we just did it 1 nanosecond ago. > > So the .gitattributes is done 3 times: > -1 would_convert_to_git_filter_fd( > -2 assert(would_convert_to_git_filter_fd(path)); > -3 convert.c/convert_to_git_filter_fd() > > The only situation where this could be useful is when the .gitattributes > change between -1 and -2, > but then they would have changed between -1 and -3, and convert.c > will die(). > > Does it make sense to remove -2 ? There's less redundant work going on than at first seems, because .gitattribute files are only read once and cached. Verified by strace. So, the redundant work is only in the processing that convert_attrs() does of the cached .gitattributes. Notice that there was a similar redundant triple call to convert_attrs() before my patch set: 1. index_fd checks would_convert_to_git_filter_fd 2. index_stream_convert_blob does assert(would_convert_to_git_filter_fd(path)) (Again redundantly since 1. is its only caller and has already checked.) 3. in convert_to_git_filter_fd If convert_attrs() is somehow expensive, it might be worth passing a struct conv_attrs * into the functions that currently call convert_attrs(). But it does not look expensive to me. I think it would be safe enough to remove both asserts, at least as the code is structured now. -- see shy jo signature.asc Description: PGP signature
[PATCH v5 4/8] use smudgeToFile in git checkout etc
This makes git checkout, git reset, etc use smudgeToFile. Includes test cases. (There's a call to convert_to_working_tree in merge-recursive.c that could also be made to use smudgeToFile as well.) Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 40 t/t0021-conversion.sh | 34 -- 2 files changed, 64 insertions(+), 10 deletions(-) diff --git a/entry.c b/entry.c index 519e042..81d12a1 100644 --- a/entry.c +++ b/entry.c @@ -146,6 +146,7 @@ static int write_entry(struct cache_entry *ce, unsigned long size; size_t wrote, newsize = 0; struct stat st; + int regular_file, smudge_to_file; if (ce_mode_s_ifmt == S_IFREG) { struct stream_filter *filter = get_stream_filter(ce->name, ce->sha1); @@ -175,8 +176,13 @@ static int write_entry(struct cache_entry *ce, /* * Convert from git internal format to working tree format +* unless the smudgeToFile filter can write to the +* file directly. */ - if (ce_mode_s_ifmt == S_IFREG && + regular_file = ce_mode_s_ifmt == S_IFREG; + smudge_to_file = regular_file + && can_smudge_to_file(ce->name); + if (regular_file && !smudge_to_file && convert_to_working_tree(ce->name, new, size, )) { free(new); new = strbuf_detach(, ); @@ -189,13 +195,31 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } - wrote = write_in_full(fd, new, size); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - free(new); - if (wrote != size) - return error("unable to write file %s", path); + if (!smudge_to_file) { + wrote = write_in_full(fd, new, size); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + free(new); + if (wrote != size) + return error("unable to write file %s", path); + } + else { + close(fd); + convert_to_working_tree_filter_to_file(ce->name, path, new, size); + free(new); + /* +* The smudgeToFile filter may have replaced the +* file; open it to make sure that the file +* exists. +*/ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index bd84b80..ea18b17 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -14,12 +14,20 @@ chmod +x rot13.sh cat <rot13-from-file.sh #!$SHELL_PATH -fsfile="\$1" +srcfile="\$1" touch rot13-from-file.ran -cat "\$fsfile" | ./rot13.sh +cat "\$srcfile" | ./rot13.sh EOF chmod +x rot13-from-file.sh +cat <rot13-to-file.sh +#!$SHELL_PATH +destfile="\$1" +touch rot13-to-file.ran +./rot13.sh >"\$destfile" +EOF +chmod +x rot13-to-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when adding a file' ' test_cmp test fstest.t ' +test_expect_success 'smudgeToFile filter is used when checking out a file' ' + test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && + + rm -f fstest.t && + git checkout -- fstest.t && + test_cmp test fstest.t && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran +' + test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' test_config filter.noclean.smudge ./rot13.sh && test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && @@ -304,4 +323,15 @@ test_expect_success 'cleanFromFile filter is not used when clean filter is not c test_cmp test actual '
[PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu)
Back from vacation with a reroll of jh/clean-smudge-annex. Deals with conflicting changes from cc/apply-am in pu. Since tb/convert-peek-in-index is not currently in pu, this reroll isn't based on it, and will conflict if that topic gets added back into pu. Not sure what the status of tb/convert-peek-in-index is at this point? Improvements from Junio's review: fix build with DEVELOPER=1 style fixes use test_cmp in test cases improve robustness of a test case clean up some confusing code small performance tweak Joey Hess (8): clarify %f documentation add smudgeToFile and cleanFromFile filter configs use cleanFromFile in git add use smudgeToFile in git checkout etc warn on unusable smudgeToFile/cleanFromFile config better recovery from failure of smudgeToFile filter use smudgeToFile filter in git am use smudgeToFile filter in recursive merge Documentation/config.txt| 18 - Documentation/gitattributes.txt | 42 apply.c | 16 + convert.c | 148 convert.h | 10 +++ entry.c | 59 merge-recursive.c | 53 +++--- sha1_file.c | 42 ++-- t/t0021-conversion.sh | 117 +++ 9 files changed, 459 insertions(+), 46 deletions(-) -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs
This adds new smudgeToFile and cleanFromFile filter commands, which are similar to smudge and clean but allow direct access to files on disk. This interface can be much more efficient when operating on large files, because the whole file content does not need to be streamed through the filter. It even allows for things like cleanFromFile commands that avoid reading the whole content of the file, and for smudgeToFile commands that populate a work tree file using an efficient Copy On Write operation. The new filter commands will not be used for all filtering. They are efficient to use when git add is adding a file, or when the work tree is being updated, but not a good fit when git is internally filtering blob objects in memory for eg, a diff. So, a user who wants to use smudgeToFile should also provide a smudge command to be used in cases where smudgeToFile is not used. And ditto with cleanFromFile and clean. To avoid foot-shooting configurations, the new commands are not used unless the old commands are also configured. That also ensures that a filter driver configuration that includes these new commands will work, although less efficiently, when used with an older version of git that does not support them. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/config.txt| 18 ++- Documentation/gitattributes.txt | 37 ++ convert.c | 111 +++- convert.h | 10 4 files changed, 160 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 19493aa..a55bed8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1325,15 +1325,29 @@ format.useAutoBase:: format-patch by default. filter..clean:: - The command which is used to convert the content of a worktree + The command which is used as a filter to convert the content of a worktree file to a blob upon checkin. See linkgit:gitattributes[5] for details. filter..smudge:: - The command which is used to convert the content of a blob + The command which is used as a filter to convert the content of a blob object to a worktree file upon checkout. See linkgit:gitattributes[5] for details. +filter..cleanFromFile:: + Similar to filter..clean but the specified command + directly accesses a worktree file on disk, rather than + receiving the file content from standard input. + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. + +filter..smudgeToFile:: + Similar to filter..smudge but the specified command + writes the content of a blob directly to a worktree file, + rather than to standard output. + Only used when filter..smudge is also configured. + See linkgit:gitattributes[5] for details. + fsck.:: Allows overriding the message type (error, warn or ignore) of a specific message ID such as `missingEmail`. diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 197ece8..a58aafc 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input. +There are two extra commands "cleanFromFile" and "smudgeToFile", which +can optionally be set in a filter driver. These are similar to the "clean" +and "smudge" commands, but avoid needing to pipe the contents of files +through the filters, and instead read/write files in the filesystem. +This can be more efficient when using filters with large files that are not +directly stored in the repository. + +Both "cleanFromFile" and "smudgeToFile" are provided a path as an +added parameter after the configured command line. + +The "cleanFromFile" command is provided the path to the file that +it should clean. Like the "clean" command, it should output the cleaned +version to standard output. + +The "smudgeToFile" command is provided a path to the file that it +should write to. (This file will already exist, as an empty file that can +be written to or replaced.) Like the "smudge" command, "smudgeToFile" +is fed the blob object from its standard input. + +Some git operations that need to apply filters cannot use "cleanFromFile" +and "smudgeToFile", since the files are not present to disk. So, to avoid +inconsistent behavior, "cleanFromFile" will only be used if "clean" is +also configured, and "smudgeToFile" will only be used if "smudge" is also +configured. + +An example large file storage filter driver using cleanFromFile and +sm
[PATCH v5 7/8] use smudgeToFile filter in git am
git am updates the work tree and so should use the smudgeToFile filter. This includes some refactoring into convert_to_working_tree_filter_to_file to make it check the file after running the smudgeToFile command, and clean up from a failing command. Signed-off-by: Joey Hess <jo...@joeyh.name> --- apply.c | 16 convert.c | 25 +++-- entry.c | 21 - t/t0021-conversion.sh | 13 + 4 files changed, 56 insertions(+), 19 deletions(-) diff --git a/apply.c b/apply.c index 4a6b2db..7db8344 100644 --- a/apply.c +++ b/apply.c @@ -4322,6 +4322,22 @@ static int try_create_file(const char *path, unsigned int mode, const char *buf, if (fd < 0) return 1; + if (can_smudge_to_file(path)) { + close(fd); + fd = convert_to_working_tree_filter_to_file(path, path, buf, size); + if (fd < 0) { + /* smudgeToFile filter failed; continue +* with regular file creation instead. */ + fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 0100) ? 0777 : 0666); + if (fd < 0) + return -1; + } + else { + close(fd); + return 0; + } + } + if (convert_to_working_tree(path, buf, size, )) { size = nbuf.len; buf = nbuf.buf; diff --git a/convert.c b/convert.c index e1b0b44..3746ad5 100644 --- a/convert.c +++ b/convert.c @@ -1030,13 +1030,34 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc return convert_to_working_tree_internal(path, NULL, src, len, dst, 0); } +/* + * Returns fd open to read the worktree file on success. + * On failure, the worktree file will not exist. + */ int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len) { struct strbuf output = STRBUF_INIT; - int ret = convert_to_working_tree_internal(path, destpath, src, len, , 0); + int ok = convert_to_working_tree_internal(path, destpath, src, len, , 0); /* The smudgeToFile filter stdout is not used. */ strbuf_release(); - return ret; + if (ok) { + /* +* Open the file to make sure that it's present +* (and readable) after the command populated it. +*/ + int fd = open(path, O_RDONLY); + if (fd < 0) + unlink(path); + return fd; + } + else { + /* +* The command could have created the file before failing, +* so delete it. +*/ + unlink(path); + return -1; + } } int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst) diff --git a/entry.c b/entry.c index 7811e31..40662eb 100644 --- a/entry.c +++ b/entry.c @@ -191,34 +191,21 @@ static int write_entry(struct cache_entry *ce, if (smudge_to_file) { close(fd); - if (convert_to_working_tree_filter_to_file(ce->name, path, new, size)) { + fd = convert_to_working_tree_filter_to_file(ce->name, path, new, size); + if (fd >= 0) { free(new); - /* -* The smudgeToFile filter may have replaced -* or deleted the file; reopen it to make -* sure that the file exists. -*/ - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("unable to create file %s", path); if (!to_tempfile) fstat_done = fstat_output(fd, state, ); close(fd); } else { - /* -* The failing smudgeToFile filter may have -* deleted or replaced the file; delete -* the file and re-open for recovery write. -*/ - unlink(path); + /* Fall through to normal write below. */ + smudge_to_file = 0; fd = open_output_fd(path, ce, to_tempfile); if (fd < 0) { free(new); retur
[PATCH v5 6/8] better recovery from failure of smudgeToFile filter
If the smudgeToFile filter fails, it can leave the worktree file with the wrong content, or even deleted. Recover from this by falling back to running the smudge filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 66 ++- t/t0021-conversion.sh | 24 +++ 2 files changed, 68 insertions(+), 22 deletions(-) diff --git a/entry.c b/entry.c index 81d12a1..7811e31 100644 --- a/entry.c +++ b/entry.c @@ -182,12 +182,6 @@ static int write_entry(struct cache_entry *ce, regular_file = ce_mode_s_ifmt == S_IFREG; smudge_to_file = regular_file && can_smudge_to_file(ce->name); - if (regular_file && !smudge_to_file && - convert_to_working_tree(ce->name, new, size, )) { - free(new); - new = strbuf_detach(, ); - size = newsize; - } fd = open_output_fd(path, ce, to_tempfile); if (fd < 0) { @@ -195,7 +189,51 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } + if (smudge_to_file) { + close(fd); + if (convert_to_working_tree_filter_to_file(ce->name, path, new, size)) { + free(new); + /* +* The smudgeToFile filter may have replaced +* or deleted the file; reopen it to make +* sure that the file exists. +*/ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } + else { + /* +* The failing smudgeToFile filter may have +* deleted or replaced the file; delete +* the file and re-open for recovery write. +*/ + unlink(path); + fd = open_output_fd(path, ce, to_tempfile); + if (fd < 0) { + free(new); + return error_errno("unable to create file %s", path); + } + /* Fall through to normal write below. */ + smudge_to_file = 0; + } + } + + /* +* Not an else of above if (smudge_to_file) because the +* smudgeToFile filter may fail and in that case this is +* run to recover. +*/ if (!smudge_to_file) { + if (regular_file && + convert_to_working_tree(ce->name, new, size, )) { + free(new); + new = strbuf_detach(, ); + size = newsize; + } wrote = write_in_full(fd, new, size); if (!to_tempfile) fstat_done = fstat_output(fd, state, ); @@ -204,22 +242,6 @@ static int write_entry(struct cache_entry *ce, if (wrote != size) return error("unable to write file %s", path); } - else { - close(fd); - convert_to_working_tree_filter_to_file(ce->name, path, new, size); - free(new); - /* -* The smudgeToFile filter may have replaced the -* file; open it to make sure that the file -* exists. -*/ - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("unable to create file %s", path); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index ea18b17..0efad9b 100755 --- a/t/t0021-conversion.sh ++
[PATCH v5 1/8] clarify %f documentation
It's natural to expect %f to be an actual file on disk; help avoid that mistake. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/gitattributes.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index f2afdb6..197ece8 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -379,6 +379,11 @@ substitution. For example: smudge = git-p4-filter --smudge %f +Note that "%f" is the name of the path that is being worked on. Depending +on the version that is being filtered, the corresponding file on disk may +not exist, or may have different contents. So, smudge and clean commands +should not try to access the file on disk, but only act as filters on the +content provided to them on standard input. Interaction between checkin/checkout attributes ^^^ -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config
Let the user know when they have a smudgeToFile/cleanFromFile config that cannot be used because the corresponding smudge/clean config is missing. The warning is only displayed a maximum of once per git invocation, and only when doing an operation that would use the filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- convert.c | 36 ++-- 1 file changed, 26 insertions(+), 10 deletions(-) diff --git a/convert.c b/convert.c index eb7774f..e1b0b44 100644 --- a/convert.c +++ b/convert.c @@ -845,34 +845,50 @@ int would_convert_to_git_filter_fd(const char *path) return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean); } +static int can_filter_file(const char *filefilter, const char *filefiltername, + const char *stdiofilter, const char *stdiofiltername, + const struct conv_attrs *ca, + int *warncount) +{ + if (!filefilter) + return 0; + + if (stdiofilter) + return 1; + + if (*warncount == 0) + warning("Not running your configured filter.%s.%s command, because filter.%s.%s is not configured", + ca->drv->name, filefiltername, + ca->drv->name, stdiofiltername); + *warncount=*warncount+1; + + return 0; +} + int can_clean_from_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* -* Only use the cleanFromFile filter when the clean filter is also -* configured. -*/ - return (ca.drv->clean_from_file && ca.drv->clean); + return can_filter_file(ca.drv->clean_from_file, "cleanFromFile", + ca.drv->clean, "clean", , ); } int can_smudge_to_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* -* Only use the smudgeToFile filter when the smudge filter is also -* configured. -*/ - return (ca.drv->smudge_to_file && ca.drv->smudge); + return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile", + ca.drv->smudge, "smudge", , ); } const char *get_convert_attr_ascii(const char *path) -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v5 8/8] use smudgeToFile filter in recursive merge
Recursive merge updates the work tree and so should use the smudgeToFile filter. At this point, smudgeToFile is run by everything that updates work tree files. Signed-off-by: Joey Hess <jo...@joeyh.name> --- merge-recursive.c | 53 --- t/t0021-conversion.sh | 16 +++- 2 files changed, 57 insertions(+), 12 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index a4a1195..5fe3f50 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -758,6 +758,7 @@ static void update_file_flags(struct merge_options *o, enum object_type type; void *buf; unsigned long size; + int isreg; if (S_ISGITLINK(mode)) { /* @@ -774,22 +775,16 @@ static void update_file_flags(struct merge_options *o, die(_("cannot read object %s '%s'"), oid_to_hex(oid), path); if (type != OBJ_BLOB) die(_("blob expected for %s '%s'"), oid_to_hex(oid), path); - if (S_ISREG(mode)) { - struct strbuf strbuf = STRBUF_INIT; - if (convert_to_working_tree(path, buf, size, )) { - free(buf); - size = strbuf.len; - buf = strbuf_detach(, NULL); - } - } if (make_room_for_path(o, path) < 0) { update_wd = 0; free(buf); goto update_index; } - if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) { + isreg = S_ISREG(mode); + if (isreg || (!has_symlinks && S_ISLNK(mode))) { int fd; + int smudge_to_file; if (mode & 0100) mode = 0777; else @@ -797,8 +792,44 @@ static void update_file_flags(struct merge_options *o, fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode); if (fd < 0) die_errno(_("failed to open '%s'"), path); - write_in_full(fd, buf, size); - close(fd); + + smudge_to_file = can_smudge_to_file(path); + if (smudge_to_file) { + close(fd); + fd = convert_to_working_tree_filter_to_file(path, path, buf, size); + if (fd < 0) { + /* +* smudgeToFile filter failed; +* continue with regular file +* creation. +*/ + smudge_to_file = 0; + fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode); + if (fd < 0) + die_errno(_("failed to open '%s'"), path); + } + else { + close(fd); + } + } + + /* +* Not an else of above if (smudge_to_file) because +* the smudgeToFile filter may fail and in that case +* this is run to recover. +*/ + if (!smudge_to_file) { + if (isreg) { + struct strbuf strbuf = STRBUF_INIT; + if (convert_to_working_tree(path, buf, size, )) { + free(buf); + size = strbuf.len; + buf = strbuf_detach(, NULL); + } + } + write_in_full(fd, buf, size); + close(fd); + } } else if (S_ISLNK(mode)) { char *lnk = xmemdupz(buf, size); safe_create_leading_directories_const(path); diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 42b28aa..64b2b8f 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of smudgeToFile filter that deletes t test_cmp test fstest.t ' +test_expect_success 'smudgeToFile filter is used in merge' ' + test_config filter.rot13.smudge
[PATCH v5 3/8] use cleanFromFile in git add
Includes test cases. Signed-off-by: Joey Hess <jo...@joeyh.name> --- sha1_file.c | 42 -- t/t0021-conversion.sh | 36 2 files changed, 72 insertions(+), 6 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 2fc22b0..549a20f 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3335,6 +3335,29 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd, return ret; } +static int index_from_file_convert_blob(unsigned char *sha1, + const char *path, unsigned flags) +{ + int ret; + const int write_object = flags & HASH_WRITE_OBJECT; + struct strbuf sbuf = STRBUF_INIT; + + assert(path); + assert(can_clean_from_file(path)); + + convert_to_git_filter_from_file(path, , +write_object ? safe_crlf : SAFE_CRLF_FALSE); + + if (write_object) + ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), + sha1); + else + ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), +sha1); + strbuf_release(); + return ret; +} + static int index_pipe(unsigned char *sha1, int fd, enum object_type type, const char *path, unsigned flags) { @@ -3427,12 +3450,19 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned switch (st->st_mode & S_IFMT) { case S_IFREG: - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("open(\"%s\")", path); - if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) - return error("%s: failed to insert into database", -path); + if (can_clean_from_file(path)) { + if (index_from_file_convert_blob(sha1, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } + else { + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("open(\"%s\")", path); + if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } break; case S_IFLNK: if (strbuf_readlink(, path, st->st_size)) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 7bac2bc..bd84b80 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -12,6 +12,14 @@ tr \ EOF chmod +x rot13.sh +cat <rot13-from-file.sh +#!$SHELL_PATH +fsfile="\$1" +touch rot13-from-file.ran +cat "\$fsfile" | ./rot13.sh +EOF +chmod +x rot13-from-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' ' test_must_be_empty err ' +test_expect_success 'cleanFromFile filter is used when adding a file' ' + test_config filter.rot13.cleanFromFile ./rot13-from-file.sh && + + echo "*.t filter=rot13" >.gitattributes && + + cat test >fstest.t && + git add fstest.t && + test -e rot13-from-file.ran && + rm -f rot13-from-file.ran && + + rm -f fstest.t && + git checkout -- fstest.t && + test_cmp test fstest.t +' + +test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' + test_config filter.noclean.smudge ./rot13.sh && + test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && + + echo "*.no filter=noclean" >.gitattributes && + + cat test >test.no && + git add test.no && + test ! -e rot13-from-file.ran && + git cat-file blob :test.no >actual && + test_cmp test actual +' + test_done -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC] Long running Git clean/smudge filter
larsxschnei...@gmail.com wrote: > (2) Joey's topic, which is the base for my patch, looks stalled for more than > 2 weeks: > http://thread.gmane.org/gmane.comp.version-control.git/297994/focus=298006 > I would be happy to address Junio's comments and post a reroll. However, > I don't want to interfere with Joey's plans. I've been on vacation and plan to get back to that in the upcoming week. > @Joey (in case you are reading this): > My patch changes your initial idea quite a bit. However, I believe it is an > improvement that could be beneficial for git-annex, too. Would you prefer to > work with me on the combination of our ideas (file clean/smudge + long running > clean/smudge processes) or would you prefer to keep your interface? Long running filters mean that you need a more complicated protocol to speak over the pipes. Seems that such a protocol could be designed to work with the original smudge/clean filters as well as with my smudgeToFile/cleanFromFile filters. Assuming that there's a way to tell whether the filters support being long-running or not. Note that the interface we arrived at for smudgeToFile/cleanFromFile is as similar as possible to smudge/clean, so the filter developer only has to change one thing. That's a big plus, and so I don't like diverging the two interfaces. So, I don't want to entangle these two ideas very much. -- see shy jo signature.asc Description: PGP signature
[PATCH v4 3/8] use cleanFromFile in git add
Includes test cases. Signed-off-by: Joey Hess <jo...@joeyh.name> --- sha1_file.c | 44 ++-- t/t0021-conversion.sh | 36 2 files changed, 74 insertions(+), 6 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 55604b6..df62eaf 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3339,6 +3339,31 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd, return ret; } +static int index_from_file_convert_blob(unsigned char *sha1, + const char *path, unsigned flags) +{ + int ret; + const int write_object = flags & HASH_WRITE_OBJECT; + const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH; + struct strbuf sbuf = STRBUF_INIT; + + assert(path); + assert(can_clean_from_file(path)); + + convert_to_git_filter_from_file(path, , +write_object ? safe_crlf : SAFE_CRLF_FALSE, +valid_sha1 ? sha1 : NULL); + + if (write_object) + ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), + sha1); + else + ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), +sha1); + strbuf_release(); + return ret; +} + static int index_pipe(unsigned char *sha1, int fd, enum object_type type, const char *path, unsigned flags) { @@ -3433,12 +3458,19 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned switch (st->st_mode & S_IFMT) { case S_IFREG: - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("open(\"%s\")", path); - if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) - return error("%s: failed to insert into database", -path); + if (can_clean_from_file(path)) { + if (index_from_file_convert_blob(sha1, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } + else { + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("open(\"%s\")", path); + if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } break; case S_IFLNK: if (strbuf_readlink(, path, st->st_size)) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 7bac2bc..407d5d6 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -12,6 +12,14 @@ tr \ EOF chmod +x rot13.sh +cat <rot13-from-file.sh +#!$SHELL_PATH +fsfile="\$1" +touch rot13-from-file.ran +cat "\$fsfile" | ./rot13.sh +EOF +chmod +x rot13-from-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' ' test_must_be_empty err ' +test_expect_success 'cleanFromFile filter is used when adding a file' ' + test_config filter.rot13.cleanFromFile ./rot13-from-file.sh && + + echo "*.t filter=rot13" >.gitattributes && + + cat test >fstest.t && + git add fstest.t && + test -e rot13-from-file.ran && + rm -f rot13-from-file.ran && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test fstest.t +' + +test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' + test_config filter.noclean.smudge ./rot13.sh && + test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && + + echo "*.no filter=noclean" >.gitattributes && + + cat test >test.no && + git add test.no && + test ! -e rot13-from-file.ran && + git cat-file blob :test.no >actual && + cmp test actual +' + test_done -- 2.9.0.587.ga3bedf2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 2/8] add smudgeToFile and cleanFromFile filter configs
This adds new smudgeToFile and cleanFromFile filter commands, which are similar to smudge and clean but allow direct access to files on disk. This interface can be much more efficient when operating on large files, because the whole file content does not need to be streamed through the filter. It even allows for things like cleanFromFile commands that avoid reading the whole content of the file, and for smudgeToFile commands that populate a work tree file using an efficient Copy On Write operation. The new filter commands will not be used for all filtering. They are efficient to use when git add is adding a file, or when the work tree is being updated, but not a good fit when git is internally filtering blob objects in memory for eg, a diff. So, a user who wants to use smudgeToFile should also provide a smudge command to be used in cases where smudgeToFile is not used. And ditto with cleanFromFile and clean. To avoid foot-shooting configurations, the new commands are not used unless the old commands are also configured. That also ensures that a filter driver configuration that includes these new commands will work, although less efficiently, when used with an older version of git that does not support them. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/config.txt| 18 ++- Documentation/gitattributes.txt | 37 + convert.c | 114 ++-- convert.h | 11 4 files changed, 162 insertions(+), 18 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 0d4095f..b76dad3 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1306,15 +1306,29 @@ format.useAutoBase:: format-patch by default. filter..clean:: - The command which is used to convert the content of a worktree + The command which is used as a filter to convert the content of a worktree file to a blob upon checkin. See linkgit:gitattributes[5] for details. filter..smudge:: - The command which is used to convert the content of a blob + The command which is used as a filter to convert the content of a blob object to a worktree file upon checkout. See linkgit:gitattributes[5] for details. +filter..cleanFromFile:: + Similar to filter..clean but the specified command + directly accesses a worktree file on disk, rather than + receiving the file content from standard input. + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. + +filter..smudgeToFile:: + Similar to filter..smudge but the specified command + writes the content of a blob directly to a worktree file, + rather than to standard output. + Only used when filter..smudge is also configured. + See linkgit:gitattributes[5] for details. + fsck.:: Allows overriding the message type (error, warn or ignore) of a specific message ID such as `missingEmail`. diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 197ece8..a58aafc 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input. +There are two extra commands "cleanFromFile" and "smudgeToFile", which +can optionally be set in a filter driver. These are similar to the "clean" +and "smudge" commands, but avoid needing to pipe the contents of files +through the filters, and instead read/write files in the filesystem. +This can be more efficient when using filters with large files that are not +directly stored in the repository. + +Both "cleanFromFile" and "smudgeToFile" are provided a path as an +added parameter after the configured command line. + +The "cleanFromFile" command is provided the path to the file that +it should clean. Like the "clean" command, it should output the cleaned +version to standard output. + +The "smudgeToFile" command is provided a path to the file that it +should write to. (This file will already exist, as an empty file that can +be written to or replaced.) Like the "smudge" command, "smudgeToFile" +is fed the blob object from its standard input. + +Some git operations that need to apply filters cannot use "cleanFromFile" +and "smudgeToFile", since the files are not present to disk. So, to avoid +inconsistent behavior, "cleanFromFile" will only be used if "clean" is +also configured, and "smudgeToFile" will only be used if "smudge" is also +configured. + +An example large file storage filter driver using cleanFromFile and +sm
Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)
Torsten Bögershausen wrote: > There is a conflict in pu: > "jh/clean-smudge-annex" does not work together with "tb/convert-peek-in-index" > > (And currently pu didn't compile) I'm sending a v4 of jh/clean-smudge-annex that is rebased on top of tb/convert-peek-in-index to fix this. > (I will hopefully be able to do a separate review of the smudge/clean patch) Would be appreciated. It'll be 2 weeks until I can work on this any more. > (And jo...@joeyh.name is not reachable from web.de) I'd like to fix whatever's broken; you could send details out of band to joeyh...@gmail.com -- see shy jo signature.asc Description: PGP signature
[PATCH v4 7/8] use smudgeToFile filter in git am
git am updates the work tree and so should use the smudgeToFile filter. This includes some refactoring into convert_to_working_tree_filter_to_file to make it check the file after running the smudgeToFile command, and clean up from a failing command. Signed-off-by: Joey Hess <jo...@joeyh.name> --- builtin/apply.c | 16 convert.c | 19 +-- entry.c | 15 +++ t/t0021-conversion.sh | 13 + 4 files changed, 49 insertions(+), 14 deletions(-) diff --git a/builtin/apply.c b/builtin/apply.c index b24c6ba..b027ab7 100644 --- a/builtin/apply.c +++ b/builtin/apply.c @@ -4250,6 +4250,22 @@ static int try_create_file(const char *path, unsigned int mode, const char *buf, if (fd < 0) return -1; + if (can_smudge_to_file(path)) { + close(fd); + fd = convert_to_working_tree_filter_to_file(path, path, buf, size); + if (fd < 0) { + /* smudgeToFile filter failed; continue +* with regular file creation. */ + fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 0100) ? 0777 : 0666); + if (fd < 0) + return -1; + } + else { + close(fd); + return 0; + } + } + if (convert_to_working_tree(path, buf, size, )) { size = nbuf.len; buf = nbuf.buf; diff --git a/convert.c b/convert.c index 378a3bd..4fe89e0 100644 --- a/convert.c +++ b/convert.c @@ -1048,13 +1048,28 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc return convert_to_working_tree_internal(path, NULL, src, len, dst, 0); } +/* Returns fd open to read the worktree file on success. + * On failure, the worktree file will not exist. */ int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len) { struct strbuf output = STRBUF_INIT; - int ret = convert_to_working_tree_internal(path, destpath, src, len, , 0); + int ok = convert_to_working_tree_internal(path, destpath, src, len, , 0); /* The smudgeToFile filter stdout is not used. */ strbuf_release(); - return ret; + if (ok) { + /* Open the file to make sure that it's present +* (and readable) after the command populated it. */ + int fd = open(path, O_RDONLY); + if (fd < 0) + unlink(path); + return fd; + } + else { + /* The command could have created the file before failing, +* so delete it. */ + unlink(path); + return -1; + } } int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst) diff --git a/entry.c b/entry.c index 8322127..2f7c4fd 100644 --- a/entry.c +++ b/entry.c @@ -190,13 +190,10 @@ static int write_entry(struct cache_entry *ce, if (smudge_to_file) { close(fd); - if (! convert_to_working_tree_filter_to_file(ce->name, path, new, size)) { + fd = convert_to_working_tree_filter_to_file(ce->name, path, new, size); + if (fd < 0) { smudge_to_file = 0; - /* The failing smudgeToFile filter may have -* deleted or replaced the file; delete -* the file and re-open for recovery write. -*/ - unlink(path); + /* re-open for recovery write */ fd = open_output_fd(path, ce, to_tempfile); if (fd < 0) { free(new); @@ -205,12 +202,6 @@ static int write_entry(struct cache_entry *ce, } else { free(new); - /* The smudgeToFile filter may have replaced -* or deleted the file; reopen it to make sure -* that the file exists. */ - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("unable to create file %s", path); if (!to_tempfile) fstat_done = fstat_output(fd, state, ); close(fd); diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index c0b4709..fd07bd6 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -334,
[PATCH v4 5/8] warn on unusable smudgeToFile/cleanFromFile config
Let the user know when they have a smudgeToFile/cleanFromFile config that cannot be used because the corresponding smudge/clean config is missing. The warning is only displayed a maximum of once per git invocation, and only when doing an operation that would use the filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- convert.c | 34 ++ 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/convert.c b/convert.c index 9dde406..378a3bd 100644 --- a/convert.c +++ b/convert.c @@ -859,32 +859,50 @@ int would_convert_to_git_filter_fd(const char *path) return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean); } +static int can_filter_file(const char *filefilter, const char *filefiltername, + const char *stdiofilter, const char *stdiofiltername, + const struct conv_attrs *ca, + int *warncount) +{ + if (! filefilter) + return 0; + + if (stdiofilter) + return 1; + + if (*warncount == 0) + warning("Not running your configured filter.%s.%s command, because filter.%s.%s is not configured", + ca->drv->name, filefiltername, + ca->drv->name, stdiofiltername); + *warncount=*warncount+1; + + return 0; +} + int can_clean_from_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* Only use the cleanFromFile filter when the clean filter is also -* configured. -*/ - return (ca.drv->clean_from_file && ca.drv->clean); + return can_filter_file(ca.drv->clean_from_file, "cleanFromFile", + ca.drv->clean, "clean", , ); } int can_smudge_to_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* Only use the smudgeToFile filter when the smudge filter is also -* configured. -*/ - return (ca.drv->smudge_to_file && ca.drv->smudge); + return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile", + ca.drv->smudge, "smudge", , ); } const char *get_convert_attr_ascii(const char *path) -- 2.9.0.587.ga3bedf2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 8/8] use smudgeToFile filter in recursive merge
Recursive merge updates the work tree and so should use the smudgeToFile filter. At this point, smudgeToFile is run by everything that updates work tree files. Signed-off-by: Joey Hess <jo...@joeyh.name> --- merge-recursive.c | 42 -- t/t0021-conversion.sh | 16 +++- 2 files changed, 47 insertions(+), 11 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 48fe7e7..7d38cf8 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -765,14 +765,6 @@ static void update_file_flags(struct merge_options *o, die(_("cannot read object %s '%s'"), oid_to_hex(oid), path); if (type != OBJ_BLOB) die(_("blob expected for %s '%s'"), oid_to_hex(oid), path); - if (S_ISREG(mode)) { - struct strbuf strbuf = STRBUF_INIT; - if (convert_to_working_tree(path, buf, size, )) { - free(buf); - size = strbuf.len; - buf = strbuf_detach(, NULL); - } - } if (make_room_for_path(o, path) < 0) { update_wd = 0; @@ -781,6 +773,7 @@ static void update_file_flags(struct merge_options *o, } if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) { int fd; + int isreg = S_ISREG(mode); if (mode & 0100) mode = 0777; else @@ -788,8 +781,37 @@ static void update_file_flags(struct merge_options *o, fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode); if (fd < 0) die_errno(_("failed to open '%s'"), path); - write_in_full(fd, buf, size); - close(fd); + + int smudge_to_file = can_smudge_to_file(path); + if (smudge_to_file) { + close(fd); + fd = convert_to_working_tree_filter_to_file(path, path, buf, size); + if (fd < 0) { + /* smudgeToFile filter failed; +* continue with regular file +* creation. */ + smudge_to_file = 0; + fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode); + if (fd < 0) + die_errno(_("failed to open '%s'"), path); + } + else { + close(fd); + } + } + + if (! smudge_to_file) { + if (isreg) { + struct strbuf strbuf = STRBUF_INIT; + if (convert_to_working_tree(path, buf, size, )) { + free(buf); + size = strbuf.len; + buf = strbuf_detach(, NULL); + } + } + write_in_full(fd, buf, size); + close(fd); + } } else if (S_ISLNK(mode)) { char *lnk = xmemdupz(buf, size); safe_create_leading_directories_const(path); diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index fd07bd6..2722013 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of smudgeToFile filter that deletes t cmp test fstest.t ' +test_expect_success 'smudgeToFile filter is used in merge' ' + test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && + + git commit -m "added fstest.t" fstest.t && + git checkout -b old && + git reset --hard HEAD^ && + git merge master && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran && + + cmp test fstest.t && + git checkout master +' + test_expect_success 'smudgeToFile filter is used by git am' ' test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && - git commit fstest.t -m "added fstest.t" && git format-patch HEAD^ --stdout > fstest.patch && git reset --hard HEAD^ && git am &
[PATCH v4 6/8] better recovery from failure of smudgeToFile filter
If the smudgeToFile filter fails, it can leave the worktree file with the wrong content, or even deleted. Recover from this by falling back to running the smudge filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 55 --- t/t0021-conversion.sh | 24 ++ 2 files changed, 59 insertions(+), 20 deletions(-) diff --git a/entry.c b/entry.c index 97975e5..8322127 100644 --- a/entry.c +++ b/entry.c @@ -181,12 +181,6 @@ static int write_entry(struct cache_entry *ce, int regular_file = ce_mode_s_ifmt == S_IFREG; int smudge_to_file = regular_file && can_smudge_to_file(ce->name); - if (regular_file && ! smudge_to_file && - convert_to_working_tree(ce->name, new, size, )) { - free(new); - new = strbuf_detach(, ); - size = newsize; - } fd = open_output_fd(path, ce, to_tempfile); if (fd < 0) { @@ -194,7 +188,42 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } + if (smudge_to_file) { + close(fd); + if (! convert_to_working_tree_filter_to_file(ce->name, path, new, size)) { + smudge_to_file = 0; + /* The failing smudgeToFile filter may have +* deleted or replaced the file; delete +* the file and re-open for recovery write. +*/ + unlink(path); + fd = open_output_fd(path, ce, to_tempfile); + if (fd < 0) { + free(new); + return error_errno("unable to create file %s", path); + } + } + else { + free(new); + /* The smudgeToFile filter may have replaced +* or deleted the file; reopen it to make sure +* that the file exists. */ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } + } + if (! smudge_to_file) { + if (regular_file && + convert_to_working_tree(ce->name, new, size, )) { + free(new); + new = strbuf_detach(, ); + size = newsize; + } wrote = write_in_full(fd, new, size); if (!to_tempfile) fstat_done = fstat_output(fd, state, ); @@ -203,20 +232,6 @@ static int write_entry(struct cache_entry *ce, if (wrote != size) return error("unable to write file %s", path); } - else { - close(fd); - convert_to_working_tree_filter_to_file(ce->name, path, new, size); - free(new); - /* The smudgeToFile filter may have replaced the -* file; open it to make sure that the file -* exists. */ - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("unable to create file %s", path); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index cba03fd..c0b4709 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -28,6 +28,14 @@ touch rot13-to-file.ran EOF chmod +x rot13-to-file.sh +cat <delete-file-and-fail.sh +#!$SHELL_PATH +destfile="\$1" +rm -f "\$destfile" +exit 1 +EOF +chmod +x delete-file-and-fail.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -310,6 +318,22 @@ test
[PATCH v4 1/8] clarify %f documentation
It's natural to expect %f to be an actual file on disk; help avoid that mistake. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/gitattributes.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index f2afdb6..197ece8 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -379,6 +379,11 @@ substitution. For example: smudge = git-p4-filter --smudge %f +Note that "%f" is the name of the path that is being worked on. Depending +on the version that is being filtered, the corresponding file on disk may +not exist, or may have different contents. So, smudge and clean commands +should not try to access the file on disk, but only act as filters on the +content provided to them on standard input. Interaction between checkin/checkout attributes ^^^ -- 2.9.0.587.ga3bedf2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 0/8] extend smudge/clean filters with direct file access (for pu)
This is the same as v3, except rebased on top of tb/convert-peek-in-index to fix a build failure in pu. Joey Hess (8): clarify %f documentation add smudgeToFile and cleanFromFile filter configs use cleanFromFile in git add use smudgeToFile in git checkout etc warn on unusable smudgeToFile/cleanFromFile config better recovery from failure of smudgeToFile filter use smudgeToFile filter in git am use smudgeToFile filter in recursive merge Documentation/config.txt| 18 - Documentation/gitattributes.txt | 42 builtin/apply.c | 16 + convert.c | 147 +++- convert.h | 11 +++ entry.c | 53 +++ merge-recursive.c | 42 +--- sha1_file.c | 44 ++-- t/t0021-conversion.sh | 115 +++ 9 files changed, 441 insertions(+), 47 deletions(-) -- 2.9.0.587.ga3bedf2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4 4/8] use smudgeToFile in git checkout etc
This makes git checkout, git reset, etc use smudgeToFile. Includes test cases. (There's a call to convert_to_working_tree in merge-recursive.c that could also be made to use smudgeToFile as well.) Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 37 + t/t0021-conversion.sh | 38 +- 2 files changed, 62 insertions(+), 13 deletions(-) diff --git a/entry.c b/entry.c index 519e042..97975e5 100644 --- a/entry.c +++ b/entry.c @@ -175,8 +175,13 @@ static int write_entry(struct cache_entry *ce, /* * Convert from git internal format to working tree format +* unless the smudgeToFile filter can write to the +* file directly. */ - if (ce_mode_s_ifmt == S_IFREG && + int regular_file = ce_mode_s_ifmt == S_IFREG; + int smudge_to_file = regular_file + && can_smudge_to_file(ce->name); + if (regular_file && ! smudge_to_file && convert_to_working_tree(ce->name, new, size, )) { free(new); new = strbuf_detach(, ); @@ -189,13 +194,29 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } - wrote = write_in_full(fd, new, size); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - free(new); - if (wrote != size) - return error("unable to write file %s", path); + if (! smudge_to_file) { + wrote = write_in_full(fd, new, size); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + free(new); + if (wrote != size) + return error("unable to write file %s", path); + } + else { + close(fd); + convert_to_working_tree_filter_to_file(ce->name, path, new, size); + free(new); + /* The smudgeToFile filter may have replaced the +* file; open it to make sure that the file +* exists. */ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 407d5d6..cba03fd 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -14,12 +14,20 @@ chmod +x rot13.sh cat <rot13-from-file.sh #!$SHELL_PATH -fsfile="\$1" +srcfile="\$1" touch rot13-from-file.ran -cat "\$fsfile" | ./rot13.sh +cat "\$srcfile" | ./rot13.sh EOF chmod +x rot13-from-file.sh +cat <rot13-to-file.sh +#!$SHELL_PATH +destfile="\$1" +touch rot13-to-file.ran +./rot13.sh > "\$destfile" +EOF +chmod +x rot13-to-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when adding a file' ' cmp test fstest.t ' +test_expect_success 'smudgeToFile filter is used when checking out a file' ' + test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test fstest.t && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran +' + test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' test_config filter.noclean.smudge ./rot13.sh && test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && @@ -299,9 +318,18 @@ test_expect_success 'cleanFromFile filter is not used when clean filter is not c cat test >test.no && git add test.no && - test ! -e rot13-from-file.ran && - git cat-file blob :test.no >actual && - cmp test actual + test ! -e rot13-from-file.ran +' + +test_expect_success 'smudgeToFile filter is not used when smudge filter is not configured' ' + test_config filter.nosmudge.clean ./rot13.sh && + test_
[PATCH v3 3/8] use cleanFromFile in git add
Includes test cases. Signed-off-by: Joey Hess <jo...@joeyh.name> --- sha1_file.c | 42 -- t/t0021-conversion.sh | 36 2 files changed, 72 insertions(+), 6 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index d5e1121..8df86a0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3329,6 +3329,29 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd, return ret; } +static int index_from_file_convert_blob(unsigned char *sha1, + const char *path, unsigned flags) +{ + int ret; + const int write_object = flags & HASH_WRITE_OBJECT; + struct strbuf sbuf = STRBUF_INIT; + + assert(path); + assert(can_clean_from_file(path)); + + convert_to_git_filter_from_file(path, , +write_object ? safe_crlf : SAFE_CRLF_FALSE); + + if (write_object) + ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), + sha1); + else + ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), +sha1); + strbuf_release(); + return ret; +} + static int index_pipe(unsigned char *sha1, int fd, enum object_type type, const char *path, unsigned flags) { @@ -3421,12 +3444,19 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned switch (st->st_mode & S_IFMT) { case S_IFREG: - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("open(\"%s\")", path); - if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) - return error("%s: failed to insert into database", -path); + if (can_clean_from_file(path)) { + if (index_from_file_convert_blob(sha1, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } + else { + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("open(\"%s\")", path); + if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } break; case S_IFLNK: if (strbuf_readlink(, path, st->st_size)) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 7bac2bc..407d5d6 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -12,6 +12,14 @@ tr \ EOF chmod +x rot13.sh +cat <rot13-from-file.sh +#!$SHELL_PATH +fsfile="\$1" +touch rot13-from-file.ran +cat "\$fsfile" | ./rot13.sh +EOF +chmod +x rot13-from-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' ' test_must_be_empty err ' +test_expect_success 'cleanFromFile filter is used when adding a file' ' + test_config filter.rot13.cleanFromFile ./rot13-from-file.sh && + + echo "*.t filter=rot13" >.gitattributes && + + cat test >fstest.t && + git add fstest.t && + test -e rot13-from-file.ran && + rm -f rot13-from-file.ran && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test fstest.t +' + +test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' + test_config filter.noclean.smudge ./rot13.sh && + test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && + + echo "*.no filter=noclean" >.gitattributes && + + cat test >test.no && + git add test.no && + test ! -e rot13-from-file.ran && + git cat-file blob :test.no >actual && + cmp test actual +' + test_done -- 2.9.0.8.g973eabb.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 8/8] use smudgeToFile filter in recursive merge
Recursive merge updates the work tree and so should use the smudgeToFile filter. At this point, smudgeToFile is run by everything that updates work tree files. Signed-off-by: Joey Hess <jo...@joeyh.name> --- merge-recursive.c | 42 -- t/t0021-conversion.sh | 16 +++- 2 files changed, 47 insertions(+), 11 deletions(-) diff --git a/merge-recursive.c b/merge-recursive.c index 65cb5d6..012fe38 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -765,14 +765,6 @@ static void update_file_flags(struct merge_options *o, die(_("cannot read object %s '%s'"), sha1_to_hex(sha), path); if (type != OBJ_BLOB) die(_("blob expected for %s '%s'"), sha1_to_hex(sha), path); - if (S_ISREG(mode)) { - struct strbuf strbuf = STRBUF_INIT; - if (convert_to_working_tree(path, buf, size, )) { - free(buf); - size = strbuf.len; - buf = strbuf_detach(, NULL); - } - } if (make_room_for_path(o, path) < 0) { update_wd = 0; @@ -781,6 +773,7 @@ static void update_file_flags(struct merge_options *o, } if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) { int fd; + int isreg = S_ISREG(mode); if (mode & 0100) mode = 0777; else @@ -788,8 +781,37 @@ static void update_file_flags(struct merge_options *o, fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode); if (fd < 0) die_errno(_("failed to open '%s'"), path); - write_in_full(fd, buf, size); - close(fd); + + int smudge_to_file = can_smudge_to_file(path); + if (smudge_to_file) { + close(fd); + fd = convert_to_working_tree_filter_to_file(path, path, buf, size); + if (fd < 0) { + /* smudgeToFile filter failed; +* continue with regular file +* creation. */ + smudge_to_file = 0; + fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode); + if (fd < 0) + die_errno(_("failed to open '%s'"), path); + } + else { + close(fd); + } + } + + if (! smudge_to_file) { + if (isreg) { + struct strbuf strbuf = STRBUF_INIT; + if (convert_to_working_tree(path, buf, size, )) { + free(buf); + size = strbuf.len; + buf = strbuf_detach(, NULL); + } + } + write_in_full(fd, buf, size); + close(fd); + } } else if (S_ISLNK(mode)) { char *lnk = xmemdupz(buf, size); safe_create_leading_directories_const(path); diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index fd07bd6..2722013 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of smudgeToFile filter that deletes t cmp test fstest.t ' +test_expect_success 'smudgeToFile filter is used in merge' ' + test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && + + git commit -m "added fstest.t" fstest.t && + git checkout -b old && + git reset --hard HEAD^ && + git merge master && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran && + + cmp test fstest.t && + git checkout master +' + test_expect_success 'smudgeToFile filter is used by git am' ' test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && - git commit fstest.t -m "added fstest.t" && git format-patch HEAD^ --stdout > fstest.patch && git reset --hard HEAD^ && git am <
[PATCH v3 4/8] use smudgeToFile in git checkout etc
This makes git checkout, git reset, etc use smudgeToFile. Includes test cases. (There's a call to convert_to_working_tree in merge-recursive.c that could also be made to use smudgeToFile as well.) Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 37 + t/t0021-conversion.sh | 38 +- 2 files changed, 62 insertions(+), 13 deletions(-) diff --git a/entry.c b/entry.c index 519e042..97975e5 100644 --- a/entry.c +++ b/entry.c @@ -175,8 +175,13 @@ static int write_entry(struct cache_entry *ce, /* * Convert from git internal format to working tree format +* unless the smudgeToFile filter can write to the +* file directly. */ - if (ce_mode_s_ifmt == S_IFREG && + int regular_file = ce_mode_s_ifmt == S_IFREG; + int smudge_to_file = regular_file + && can_smudge_to_file(ce->name); + if (regular_file && ! smudge_to_file && convert_to_working_tree(ce->name, new, size, )) { free(new); new = strbuf_detach(, ); @@ -189,13 +194,29 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } - wrote = write_in_full(fd, new, size); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - free(new); - if (wrote != size) - return error("unable to write file %s", path); + if (! smudge_to_file) { + wrote = write_in_full(fd, new, size); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + free(new); + if (wrote != size) + return error("unable to write file %s", path); + } + else { + close(fd); + convert_to_working_tree_filter_to_file(ce->name, path, new, size); + free(new); + /* The smudgeToFile filter may have replaced the +* file; open it to make sure that the file +* exists. */ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 407d5d6..cba03fd 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -14,12 +14,20 @@ chmod +x rot13.sh cat <rot13-from-file.sh #!$SHELL_PATH -fsfile="\$1" +srcfile="\$1" touch rot13-from-file.ran -cat "\$fsfile" | ./rot13.sh +cat "\$srcfile" | ./rot13.sh EOF chmod +x rot13-from-file.sh +cat <rot13-to-file.sh +#!$SHELL_PATH +destfile="\$1" +touch rot13-to-file.ran +./rot13.sh > "\$destfile" +EOF +chmod +x rot13-to-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when adding a file' ' cmp test fstest.t ' +test_expect_success 'smudgeToFile filter is used when checking out a file' ' + test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test fstest.t && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran +' + test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' test_config filter.noclean.smudge ./rot13.sh && test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && @@ -299,9 +318,18 @@ test_expect_success 'cleanFromFile filter is not used when clean filter is not c cat test >test.no && git add test.no && - test ! -e rot13-from-file.ran && - git cat-file blob :test.no >actual && - cmp test actual + test ! -e rot13-from-file.ran +' + +test_expect_success 'smudgeToFile filter is not used when smudge filter is not configured' ' + test_config filter.nosmudge.clean ./rot13.sh && + test
[PATCH v3 7/8] use smudgeToFile filter in git am
git am updates the work tree and so should use the smudgeToFile filter. This includes some refactoring into convert_to_working_tree_filter_to_file to make it check the file after running the smudgeToFile command, and clean up from a failing command. Signed-off-by: Joey Hess <jo...@joeyh.name> --- builtin/apply.c | 16 convert.c | 19 +-- entry.c | 15 +++ t/t0021-conversion.sh | 13 + 4 files changed, 49 insertions(+), 14 deletions(-) diff --git a/builtin/apply.c b/builtin/apply.c index c770d7d..acd7c3e 100644 --- a/builtin/apply.c +++ b/builtin/apply.c @@ -4109,6 +4109,22 @@ static int try_create_file(const char *path, unsigned int mode, const char *buf, if (fd < 0) return -1; + if (can_smudge_to_file(path)) { + close(fd); + fd = convert_to_working_tree_filter_to_file(path, path, buf, size); + if (fd < 0) { + /* smudgeToFile filter failed; continue +* with regular file creation. */ + fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 0100) ? 0777 : 0666); + if (fd < 0) + return -1; + } + else { + close(fd); + return 0; + } + } + if (convert_to_working_tree(path, buf, size, )) { size = nbuf.len; buf = nbuf.buf; diff --git a/convert.c b/convert.c index b6a76a9..948b52f 100644 --- a/convert.c +++ b/convert.c @@ -1031,13 +1031,28 @@ int convert_to_working_tree(const char *path, const char *src, size_t len, struc return convert_to_working_tree_internal(path, NULL, src, len, dst, 0); } +/* Returns fd open to read the worktree file on success. + * On failure, the worktree file will not exist. */ int convert_to_working_tree_filter_to_file(const char *path, const char *destpath, const char *src, size_t len) { struct strbuf output = STRBUF_INIT; - int ret = convert_to_working_tree_internal(path, destpath, src, len, , 0); + int ok = convert_to_working_tree_internal(path, destpath, src, len, , 0); /* The smudgeToFile filter stdout is not used. */ strbuf_release(); - return ret; + if (ok) { + /* Open the file to make sure that it's present +* (and readable) after the command populated it. */ + int fd = open(path, O_RDONLY); + if (fd < 0) + unlink(path); + return fd; + } + else { + /* The command could have created the file before failing, +* so delete it. */ + unlink(path); + return -1; + } } int renormalize_buffer(const char *path, const char *src, size_t len, struct strbuf *dst) diff --git a/entry.c b/entry.c index 8322127..2f7c4fd 100644 --- a/entry.c +++ b/entry.c @@ -190,13 +190,10 @@ static int write_entry(struct cache_entry *ce, if (smudge_to_file) { close(fd); - if (! convert_to_working_tree_filter_to_file(ce->name, path, new, size)) { + fd = convert_to_working_tree_filter_to_file(ce->name, path, new, size); + if (fd < 0) { smudge_to_file = 0; - /* The failing smudgeToFile filter may have -* deleted or replaced the file; delete -* the file and re-open for recovery write. -*/ - unlink(path); + /* re-open for recovery write */ fd = open_output_fd(path, ce, to_tempfile); if (fd < 0) { free(new); @@ -205,12 +202,6 @@ static int write_entry(struct cache_entry *ce, } else { free(new); - /* The smudgeToFile filter may have replaced -* or deleted the file; reopen it to make sure -* that the file exists. */ - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("unable to create file %s", path); if (!to_tempfile) fstat_done = fstat_output(fd, state, ); close(fd); diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index c0b4709..fd07bd6 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -334,
[PATCH v3 5/8] warn on unusable smudgeToFile/cleanFromFile config
Let the user know when they have a smudgeToFile/cleanFromFile config that cannot be used because the corresponding smudge/clean config is missing. The warning is only displayed a maximum of once per git invocation, and only when doing an operation that would use the filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- convert.c | 34 ++ 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/convert.c b/convert.c index bf63ba0..b6a76a9 100644 --- a/convert.c +++ b/convert.c @@ -847,32 +847,50 @@ int would_convert_to_git_filter_fd(const char *path) return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean); } +static int can_filter_file(const char *filefilter, const char *filefiltername, + const char *stdiofilter, const char *stdiofiltername, + const struct conv_attrs *ca, + int *warncount) +{ + if (! filefilter) + return 0; + + if (stdiofilter) + return 1; + + if (*warncount == 0) + warning("Not running your configured filter.%s.%s command, because filter.%s.%s is not configured", + ca->drv->name, filefiltername, + ca->drv->name, stdiofiltername); + *warncount=*warncount+1; + + return 0; +} + int can_clean_from_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* Only use the cleanFromFile filter when the clean filter is also -* configured. -*/ - return (ca.drv->clean_from_file && ca.drv->clean); + return can_filter_file(ca.drv->clean_from_file, "cleanFromFile", + ca.drv->clean, "clean", , ); } int can_smudge_to_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* Only use the smudgeToFile filter when the smudge filter is also -* configured. -*/ - return (ca.drv->smudge_to_file && ca.drv->smudge); + return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile", + ca.drv->smudge, "smudge", , ); } const char *get_convert_attr_ascii(const char *path) -- 2.9.0.8.g973eabb.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/8] add smudgeToFile and cleanFromFile filter configs
This adds new smudgeToFile and cleanFromFile filter commands, which are similar to smudge and clean but allow direct access to files on disk. This interface can be much more efficient when operating on large files, because the whole file content does not need to be streamed through the filter. It even allows for things like cleanFromFile commands that avoid reading the whole content of the file, and for smudgeToFile commands that populate a work tree file using an efficient Copy On Write operation. The new filter commands will not be used for all filtering. They are efficient to use when git add is adding a file, or when the work tree is being updated, but not a good fit when git is internally filtering blob objects in memory for eg, a diff. So, a user who wants to use smudgeToFile should also provide a smudge command to be used in cases where smudgeToFile is not used. And ditto with cleanFromFile and clean. To avoid foot-shooting configurations, the new commands are not used unless the old commands are also configured. That also ensures that a filter driver configuration that includes these new commands will work, although less efficiently, when used with an older version of git that does not support them. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/config.txt| 18 ++- Documentation/gitattributes.txt | 37 ++ convert.c | 108 ++-- convert.h | 10 4 files changed, 157 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2e1b2e4..c300efe 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1299,15 +1299,29 @@ format.useAutoBase:: format-patch by default. filter..clean:: - The command which is used to convert the content of a worktree + The command which is used as a filter to convert the content of a worktree file to a blob upon checkin. See linkgit:gitattributes[5] for details. filter..smudge:: - The command which is used to convert the content of a blob + The command which is used as a filter to convert the content of a blob object to a worktree file upon checkout. See linkgit:gitattributes[5] for details. +filter..cleanFromFile:: + Similar to filter..clean but the specified command + directly accesses a worktree file on disk, rather than + receiving the file content from standard input. + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. + +filter..smudgeToFile:: + Similar to filter..smudge but the specified command + writes the content of a blob directly to a worktree file, + rather than to standard output. + Only used when filter..smudge is also configured. + See linkgit:gitattributes[5] for details. + fsck.:: Allows overriding the message type (error, warn or ignore) of a specific message ID such as `missingEmail`. diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 145dd10..5ae0783 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -380,6 +380,43 @@ not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input. +There are two extra commands "cleanFromFile" and "smudgeToFile", which +can optionally be set in a filter driver. These are similar to the "clean" +and "smudge" commands, but avoid needing to pipe the contents of files +through the filters, and instead read/write files in the filesystem. +This can be more efficient when using filters with large files that are not +directly stored in the repository. + +Both "cleanFromFile" and "smudgeToFile" are provided a path as an +added parameter after the configured command line. + +The "cleanFromFile" command is provided the path to the file that +it should clean. Like the "clean" command, it should output the cleaned +version to standard output. + +The "smudgeToFile" command is provided a path to the file that it +should write to. (This file will already exist, as an empty file that can +be written to or replaced.) Like the "smudge" command, "smudgeToFile" +is fed the blob object from its standard input. + +Some git operations that need to apply filters cannot use "cleanFromFile" +and "smudgeToFile", since the files are not present to disk. So, to avoid +inconsistent behavior, "cleanFromFile" will only be used if "clean" is +also configured, and "smudgeToFile" will only be used if "smudge" is also +configured. + +An example large file storage filter driver using cleanFromFile and +sm
[PATCH v3 6/8] better recovery from failure of smudgeToFile filter
If the smudgeToFile filter fails, it can leave the worktree file with the wrong content, or even deleted. Recover from this by falling back to running the smudge filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 55 --- t/t0021-conversion.sh | 24 ++ 2 files changed, 59 insertions(+), 20 deletions(-) diff --git a/entry.c b/entry.c index 97975e5..8322127 100644 --- a/entry.c +++ b/entry.c @@ -181,12 +181,6 @@ static int write_entry(struct cache_entry *ce, int regular_file = ce_mode_s_ifmt == S_IFREG; int smudge_to_file = regular_file && can_smudge_to_file(ce->name); - if (regular_file && ! smudge_to_file && - convert_to_working_tree(ce->name, new, size, )) { - free(new); - new = strbuf_detach(, ); - size = newsize; - } fd = open_output_fd(path, ce, to_tempfile); if (fd < 0) { @@ -194,7 +188,42 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } + if (smudge_to_file) { + close(fd); + if (! convert_to_working_tree_filter_to_file(ce->name, path, new, size)) { + smudge_to_file = 0; + /* The failing smudgeToFile filter may have +* deleted or replaced the file; delete +* the file and re-open for recovery write. +*/ + unlink(path); + fd = open_output_fd(path, ce, to_tempfile); + if (fd < 0) { + free(new); + return error_errno("unable to create file %s", path); + } + } + else { + free(new); + /* The smudgeToFile filter may have replaced +* or deleted the file; reopen it to make sure +* that the file exists. */ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } + } + if (! smudge_to_file) { + if (regular_file && + convert_to_working_tree(ce->name, new, size, )) { + free(new); + new = strbuf_detach(, ); + size = newsize; + } wrote = write_in_full(fd, new, size); if (!to_tempfile) fstat_done = fstat_output(fd, state, ); @@ -203,20 +232,6 @@ static int write_entry(struct cache_entry *ce, if (wrote != size) return error("unable to write file %s", path); } - else { - close(fd); - convert_to_working_tree_filter_to_file(ce->name, path, new, size); - free(new); - /* The smudgeToFile filter may have replaced the -* file; open it to make sure that the file -* exists. */ - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("unable to create file %s", path); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index cba03fd..c0b4709 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -28,6 +28,14 @@ touch rot13-to-file.ran EOF chmod +x rot13-to-file.sh +cat <delete-file-and-fail.sh +#!$SHELL_PATH +destfile="\$1" +rm -f "\$destfile" +exit 1 +EOF +chmod +x delete-file-and-fail.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -310,6 +318,22 @@ test
[PATCH v3 0/8] extend smudge/clean filters with direct file access
Reroll of this patch set with changes: * Added additional smudgeToFile calls in git am and recursive merge. * Improved behavior when smudgeToFile filter fails. * Some small improvements to the test cases. I hope this will be the final version. Joey Hess (8): clarify %f documentation add smudgeToFile and cleanFromFile filter configs use cleanFromFile in git add use smudgeToFile in git checkout etc warn on unusable smudgeToFile/cleanFromFile config better recovery from failure of smudgeToFile filter use smudgeToFile filter in git am use smudgeToFile filter in recursive merge Documentation/config.txt| 18 - Documentation/gitattributes.txt | 42 builtin/apply.c | 16 + convert.c | 141 convert.h | 10 +++ entry.c | 53 +++ merge-recursive.c | 42 +--- sha1_file.c | 42 ++-- t/t0021-conversion.sh | 115 9 files changed, 434 insertions(+), 45 deletions(-) -- 2.9.0.8.g973eabb.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/8] clarify %f documentation
It's natural to expect %f to be an actual file on disk; help avoid that mistake. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/gitattributes.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index e3b1de8..145dd10 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -374,6 +374,11 @@ substitution. For example: smudge = git-p4-filter --smudge %f +Note that "%f" is the name of the path that is being worked on. Depending +on the version that is being filtered, the corresponding file on disk may +not exist, or may have different contents. So, smudge and clean commands +should not try to access the file on disk, but only act as filters on the +content provided to them on standard input. Interaction between checkin/checkout attributes ^^^ -- 2.9.0.8.g973eabb.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/4] clarify %f documentation
It's natural to expect %f to be an actual file on disk; help avoid that mistake. Signed-off-by: Joey Hess <jo...@joeyh.name> --- This patch series was meant to contain 5 patches; here's the missing one. This patch will apply cleanly on top of v2.9.0. Documentation/gitattributes.txt | 5 + 1 file changed, 5 insertions(+) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index e3b1de8..145dd10 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -374,6 +374,11 @@ substitution. For example: smudge = git-p4-filter --smudge %f +Note that "%f" is the name of the path that is being worked on. Depending +on the version that is being filtered, the corresponding file on disk may +not exist, or may have different contents. So, smudge and clean commands +should not try to access the file on disk, but only act as filters on the +content provided to them on standard input. Interaction between checkin/checkout attributes ^^^ -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 0/4] extend smudge/clean filters with direct file access
Doing a quick benchmark of this new interface and git-annex's use of it, git checkout of a 1 gigabyte file with git-annex providing the smudge filter took: 19 seconds using the smudge interface 11 seconds using smudgeToFile 0.1 seconds with smudgeToFile and annex.thin set (while also saving 1 gb of disk space!) So around 2x speed improvement due to not needing to pipe the file content through the filter, even without git-annex's annex.thin tricks. -- see shy jo signature.asc Description: PGP signature
[PATCH v2 3/4] use smudgeToFile in git checkout etc
This makes git checkout, git reset, etc use smudgeToFile. Includes test cases. (There's a call to convert_to_working_tree in merge-recursive.c that could also be made to use smudgeToFile as well.) Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 37 + t/t0021-conversion.sh | 38 +- 2 files changed, 62 insertions(+), 13 deletions(-) diff --git a/entry.c b/entry.c index 519e042..97975e5 100644 --- a/entry.c +++ b/entry.c @@ -175,8 +175,13 @@ static int write_entry(struct cache_entry *ce, /* * Convert from git internal format to working tree format +* unless the smudgeToFile filter can write to the +* file directly. */ - if (ce_mode_s_ifmt == S_IFREG && + int regular_file = ce_mode_s_ifmt == S_IFREG; + int smudge_to_file = regular_file + && can_smudge_to_file(ce->name); + if (regular_file && ! smudge_to_file && convert_to_working_tree(ce->name, new, size, )) { free(new); new = strbuf_detach(, ); @@ -189,13 +194,29 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } - wrote = write_in_full(fd, new, size); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - free(new); - if (wrote != size) - return error("unable to write file %s", path); + if (! smudge_to_file) { + wrote = write_in_full(fd, new, size); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + free(new); + if (wrote != size) + return error("unable to write file %s", path); + } + else { + close(fd); + convert_to_working_tree_filter_to_file(ce->name, path, new, size); + free(new); + /* The smudgeToFile filter may have replaced the +* file; open it to make sure that the file +* exists. */ + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("unable to create file %s", path); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 399f92b..a8042d1 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -14,12 +14,20 @@ chmod +x rot13.sh cat <rot13-from-file.sh #!$SHELL_PATH -fsfile="\$1" +srcfile="\$1" touch rot13-from-file.ran -cat "\$fsfile" | ./rot13.sh +cat "\$srcfile" | ./rot13.sh EOF chmod +x rot13-from-file.sh +cat <rot13-to-file.sh +#!$SHELL_PATH +destfile="\$1" +touch rot13-to-file.ran +./rot13.sh > "\$destfile" +EOF +chmod +x rot13-to-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when adding a file' ' cmp test.t fstest.t ' +test_expect_success 'smudgeToFile filter is used when checking out a file' ' + test_config filter.rot13.smudgeToFile ./rot13-to-file.sh && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test.t fstest.t && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran +' + test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' test_config filter.noclean.smudge ./rot13.sh && test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && @@ -299,9 +318,18 @@ test_expect_success 'cleanFromFile filter is not used when clean filter is not c cat test.t >test.no && git add test.no && - test ! -e rot13-from-file.ran && - git cat-file blob :test.no >actual && - cmp test.t actual + test ! -e rot13-from-file.ran +' + +test_expect_success 'smudgeToFile filter is not used when smudge filter is not configured' ' + test_config filter.nosmudge.clean ./rot13.sh &&
[PATCH v2 1/4] add smudgeToFile and cleanFromFile filter configs
This adds new smudgeToFile and cleanFromFile filter commands, which are similar to smudge and clean but allow direct access to files on disk. This interface can be much more efficient when operating on large files, because the whole file content does not need to be streamed through the filter. It even allows for things like cleanFromFile commands that avoid reading the whole content of the file, and for smudgeToFile commands that populate a work tree file using an efficient Copy On Write operation. The new filter commands will not be used for all filtering. They are efficient to use when git add is adding a file, or when the work tree is being updated, but not a good fit when git is internally filtering blob objects in memory for eg, a diff. So, a user who wants to use smudgeToFile should also provide a smudge command to be used in cases where smudgeToFile is not used. And ditto with cleanFromFile and clean. To avoid foot-shooting configurations, the new commands are not used unless the old commands are also configured. That also ensures that a filter driver configuration that includes these new commands will work, although less efficiently, when used with an older version of git that does not support them. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/config.txt| 18 ++- Documentation/gitattributes.txt | 37 ++ convert.c | 108 ++-- convert.h | 10 4 files changed, 157 insertions(+), 16 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2e1b2e4..38f54c1 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1299,15 +1299,29 @@ format.useAutoBase:: format-patch by default. filter..clean:: - The command which is used to convert the content of a worktree + The command which is used as a filter to convert the content of a worktree file to a blob upon checkin. See linkgit:gitattributes[5] for details. filter..smudge:: - The command which is used to convert the content of a blob + The command which is used as a filter to convert the content of a blob object to a worktree file upon checkout. See linkgit:gitattributes[5] for details. +filter..cleanFromFile:: + Similar to filter..clean but the specified command + directly accesses a worktree file on disk, rather than + receiving the file content from standard input. + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. + +filter..smudgeToFile:: + Similar to filter..smudge but the specified command + writes the content of a blob directly to a worktree file, + rather than to standard output. + Only used when filter..smudge is also configured. + See linkgit:gitattributes[5] for details. + fsck.:: Allows overriding the message type (error, warn or ignore) of a specific message ID such as `missingEmail`. diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index 145dd10..5ae0783 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -380,6 +380,43 @@ not exist, or may have different contents. So, smudge and clean commands should not try to access the file on disk, but only act as filters on the content provided to them on standard input. +There are two extra commands "cleanFromFile" and "smudgeToFile", which +can optionally be set in a filter driver. These are similar to the "clean" +and "smudge" commands, but avoid needing to pipe the contents of files +through the filters, and instead read/write files in the filesystem. +This can be more efficient when using filters with large files that are not +directly stored in the repository. + +Both "cleanFromFile" and "smudgeToFile" are provided a path as an +added parameter after the configured command line. + +The "cleanFromFile" command is provided the path to the file that +it should clean. Like the "clean" command, it should output the cleaned +version to standard output. + +The "smudgeToFile" command is provided a path to the file that it +should write to. (This file will already exist, as an empty file that can +be written to or replaced.) Like the "smudge" command, "smudgeToFile" +is fed the blob object from its standard input. + +Some git operations that need to apply filters cannot use "cleanFromFile" +and "smudgeToFile", since the files are not present to disk. So, to avoid +inconsistent behavior, "cleanFromFile" will only be used if "clean" is +also configured, and "smudgeToFile" will only be used if "smudge" is also +configured. + +An example large file storage filter driver using cleanFromFile and +sm
[PATCH v2 4/4] warn on unusable smudgeToFile/cleanFromFile config
Let the user know when they have a smudgeToFile/cleanFromFile config that cannot be used because the corresponding smudge/clean config is missing. The warning is only displayed a maximum of once per git invocation, and only when doing an operation that would use the filter. Signed-off-by: Joey Hess <jo...@joeyh.name> --- convert.c | 34 ++ 1 file changed, 26 insertions(+), 8 deletions(-) diff --git a/convert.c b/convert.c index bf63ba0..84f6bc5 100644 --- a/convert.c +++ b/convert.c @@ -847,32 +847,50 @@ int would_convert_to_git_filter_fd(const char *path) return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean); } +static int can_filter_file(const char *filefilter, const char *filefiltername, + const char *stdiofilter, const char *stdiofiltername, + const struct conv_attrs *ca, + int *warncount) +{ + if (! filefilter) + return 0; + + if (stdiofilter) + return 1; + + if (*warncount == 0) + warning("Not running your configured filter.%s.%s command, because filter.%s.%s is not configured", + ca->drv->name, filefiltername, + ca->drv->name, stdiofiltername); + *warncount=*warncount+1; + + return 0; +} + int can_clean_from_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* Only use the cleanFromFile filter when the clean filter is also -* configured. -*/ - return (ca.drv->clean_from_file && ca.drv->clean); + return can_filter_file(ca.drv->clean_from_file, "cleanFromFile", + ca.drv->clean, "clean", , ); } int can_smudge_to_file(const char *path) { struct conv_attrs ca; + static int warncount = 0; convert_attrs(, path); if (!ca.drv) return 0; - /* Only use the smudgeToFile filter when the smudge filter is also -* configured. -*/ - return (ca.drv->smudge_to_file && ca.drv->smudge); + return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile", + ca.drv->smudge, "smudge", , ); } const char *get_convert_attr_ascii(const char *path) -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 2/4] use cleanFromFile in git add
Includes test cases. Signed-off-by: Joey Hess <jo...@joeyh.name> --- sha1_file.c | 42 -- t/t0021-conversion.sh | 36 2 files changed, 72 insertions(+), 6 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index d5e1121..8df86a0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3329,6 +3329,29 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd, return ret; } +static int index_from_file_convert_blob(unsigned char *sha1, + const char *path, unsigned flags) +{ + int ret; + const int write_object = flags & HASH_WRITE_OBJECT; + struct strbuf sbuf = STRBUF_INIT; + + assert(path); + assert(can_clean_from_file(path)); + + convert_to_git_filter_from_file(path, , +write_object ? safe_crlf : SAFE_CRLF_FALSE); + + if (write_object) + ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), + sha1); + else + ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), +sha1); + strbuf_release(); + return ret; +} + static int index_pipe(unsigned char *sha1, int fd, enum object_type type, const char *path, unsigned flags) { @@ -3421,12 +3444,19 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned switch (st->st_mode & S_IFMT) { case S_IFREG: - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("open(\"%s\")", path); - if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) - return error("%s: failed to insert into database", -path); + if (can_clean_from_file(path)) { + if (index_from_file_convert_blob(sha1, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } + else { + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("open(\"%s\")", path); + if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } break; case S_IFLNK: if (strbuf_readlink(, path, st->st_size)) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 7bac2bc..399f92b 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -12,6 +12,14 @@ tr \ EOF chmod +x rot13.sh +cat <rot13-from-file.sh +#!$SHELL_PATH +fsfile="\$1" +touch rot13-from-file.ran +cat "\$fsfile" | ./rot13.sh +EOF +chmod +x rot13-from-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' ' test_must_be_empty err ' +test_expect_success 'cleanFromFile filter is used when adding a file' ' + test_config filter.rot13.cleanFromFile ./rot13-from-file.sh && + + echo "*.t filter=rot13" >.gitattributes && + + cat test.t >fstest.t && + git add fstest.t && + test -e rot13-from-file.ran && + rm -f rot13-from-file.ran && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test.t fstest.t +' + +test_expect_success 'cleanFromFile filter is not used when clean filter is not configured' ' + test_config filter.noclean.smudge ./rot13.sh && + test_config filter.noclean.cleanFromFile ./rot13-from-file.sh && + + echo "*.no filter=noclean" >.gitattributes && + + cat test.t >test.no && + git add test.no && + test ! -e rot13-from-file.ran && + git cat-file blob :test.no >actual && + cmp test.t actual +' + test_done -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 0/4] extend smudge/clean filters with direct file access
Reroll of this patch set with changes: * Renamed the new filter drivers for consistency with other configs. * Improved documentation with feedback from Junio and others. * Eliminated %p and instead append the filename to the commands (separated by a space). * Fixed an FD leak and a space leak. * Only use smudgeToFile with regular files, not symlinks. * After running the smudgeToFile command, double-check that the expected file is present, in case the command was buggy and deleted it. * Added a warning message when the new filter commands are configured but the old ones are not, so that the user knows it's refusing to use their configuration. There's been good and helpful documentation and interface review, but some more code review would be good! Also, git-annex has a improved-smudge-filters branch now that demonstrates this interface. Joey Hess (4): add smudgeToFile and cleanFromFile filter configs use cleanFromFile in git add use smudgeToFile in git checkout etc warn on unusable smudgeToFile/cleanFromFile config Documentation/config.txt| 18 +- Documentation/gitattributes.txt | 37 convert.c | 126 +++- convert.h | 10 entry.c | 37 +--- sha1_file.c | 42 -- t/t0021-conversion.sh | 64 7 files changed, 304 insertions(+), 30 deletions(-) -- 2.8.1 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration
Junio C Hamano wrote: > Would an interface that always appends the pathname at the end of > the command line string work? One problem with this is that "appends" is subtly unclear in this case. With the example of smugeToFile = cmd --to-file it seems that a space should be added by git before the filename. On the other handle, consider smugeToFile = cmd --to-file= here a space is not wanted before the filename. So, either a space is automatically included before the filename and the second example breaks, or no space is included, and to make the first example work would need careful inclusion of the trailing space with quoting to prevent it being elided eg, smugeToFile = "cmd --to-file " %p does avoid this ambiguity. But as Junio noted, %p is mandatory in the command for it to possibly work. Git could refuse to use smugeToFile = cmd as not containing a %p and so not possibly being able to work. Or we could pick one of the two methods of appending the file (I prefer not including a space before it as more flexible), and anything using this interface would need to design its command line parsing with this interface in mind, and would probably choose to use --to-file=foo rather than --to-file foo. -- see shy jo signature.asc Description: PGP signature
Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration
Junio C Hamano wrote: > Would an interface that always appends the pathname at the end of > the command line string work? I'm ok with this, and like getting rid of %p as it's not distinguishable from %f without reading the documentation. The sh -c trick can of course be used if some other ordering of parameters is needed. Probably anything using this interface is gonna be implemented with the interface in mind from the beginning and won't need such a trick. -- see shy jo signature.asc Description: PGP signature
Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration
Junio C Hamano wrote: > There is what we would want to fix, though. "worktree file" should > be spelled "working tree file". This used not to matter before "git > worktree" was invented (before that we used these two terms > interchangeably), but these days the distinction matters. The existing documentation that I am patching uses the term "worktree file" which is why I continued to use that wording. (Unless this is a documentation transition that you want to happen peicemeal as documentation is updated for other reasons?) > > +filter..clean-from-file:: > > Documentation/config.txt hopefully lists all the configuration, but > I do not see anything that uses 'words-joined-with-dash' format. > Please do not invent new out-of-convention names. Point taken; I'll use cleanFromFile and smudgeToFile. Here's a revised version of the documentation that I think takes the other suggestions onboard. I emphasise that clean and smudge operate as filters, to contrast better with cleanFromFile and smudgeToFile not operating as regular stdio filters. filter..clean:: - The command which is used to convert the content of a worktree + The command which is used as a filter to convert the content of a worktree file to a blob upon checkin. See linkgit:gitattributes[5] for details. filter..smudge:: - The command which is used to convert the content of a blob + The command which is used as a filter to convert the content of a blob object to a worktree file upon checkout. See linkgit:gitattributes[5] for details. +filter..cleanFromFile:: + Similar to filter..clean but the specified command + directly accesses a worktree file on disk, rather than + receiving the file content from standard input. + In the command, "%p" is replaced with the name of the file. + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. + +filter..smudgeToFile:: + Similar to filter..smudge but the specified command + writes the content of a blob directly to a worktree file, + rather than to standard output. + In the command, "%p" is replaced with the name of the file. + Only used when filter..smudge is also configured. + See linkgit:gitattributes[5] for details. + This could be extended more, but I think this should describe the config settings concisely and point to the more involved discussion of filter drivers in gitattributes. -- see shy jo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration
Michael J Gruber wrote: > I'm not sure this will save all feet. I foresee "why is smudge-to-file > not doing anything" reports... It could display a warning message if smudge-to-file is set and smudge is not. > In addition, it opens the way to doing completely different things in > smudge and smudge-to-file - which partly is intentional, of course. They can be implemented very differently, but need to provide the same file content. Otherwise git checkout and git diff would show different content for the same file, for example. This is up to the implementor of a filter driver to get right. > Do you make any promises that %p is a seekable file? Yes, %p is a regular file and so is seekable, statable, etc. -- see shy jo signature.asc Description: PGP signature
Re: [PATCH 1/4] clarify %f documentation
Junio C Hamano wrote: > "tracked by Git" is not all that interesting, compared to the fact > that your filter needs to give contents relevant to that path > because that is what the command line argument Git gives you with > '%f' means. It is not a random filename "tracked by Git". Among 47 > other files tracked by Git, the single one being given is the one > the code that drives the filter is WORKING ON, and I think that > needs to be written in the description, hence "the path that is > being worked on" was my suggestion. Ah, "being worked on" does clarify it well, I think: + Note that "%f" is the name of the path that is being worked on. Depending + on the version that is being filtered, the corresponding file on disk may + not exist, or may have different contents. So, smudge and clean commands + should not try to access the file on disk, but only act as filters on the + content provided to them on standard input. -- see shy jo signature.asc Description: PGP signature
Re: [PATCH 1/4] clarify %f documentation
Junio C Hamano wrote: > I agree that "the name of the file" can be interpreted in many ways, > and I agree that it would be a good idea to find a better phrase to > name the path that is being worked on, but I do not think "the file > in the git repository" is that phrase. > I think using the word "path" somewhere in the updated description > is more likely to have the effect you desire. "path" is also very ambiguous. I see that "tracked" is often used to describe what %f is, so how about: + Note that "%f" is the name of a file tracked by Git. Depending on the + version that is being filtered, the corresponding file on disk may not + exist, or may have different contents. So, smudge and clean commands should + not try to access the file on disk. -- see shy jo signature.asc Description: PGP signature
Re: [PATCH 4/4] use smudge-to-file in git checkout etc
Joey Hess wrote: > + int smudge_to_file = can_smudge_to_file(ce->name); > if (ce_mode_s_ifmt == S_IFREG && > + ! smudge_to_file && > convert_to_working_tree(ce->name, new, size, )) { > free(new); > new = strbuf_detach(, ); > @@ -189,13 +193,29 @@ static int write_entry(struct cache_entry *ce, > + if (! can_smudge_to_file(ce->name)) { > + } > + else { > + close(fd); > + convert_to_working_tree_filter_to_file(ce->name, path, > new, size); Oops, I had meant to avoid using smudge-to-file when e_mode_s_ifmt != S_IFREG, and forgot it in this patch, so it does the wrong thing for symlinks. I'll send an updated patch set fixing this after any other review. -- see shy jo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] use smudge-to-file in git checkout etc
This makes git checkout, git reset, etc use smudge-to-file. Includes test cases. (There's a call to convert_to_working_tree in merge-recursive.c that could also be made to use smudge-to-file as well.) Signed-off-by: Joey Hess <jo...@joeyh.name> --- entry.c | 34 +++--- t/t0021-conversion.sh | 44 2 files changed, 63 insertions(+), 15 deletions(-) diff --git a/entry.c b/entry.c index 519e042..6a23159 100644 --- a/entry.c +++ b/entry.c @@ -175,8 +175,12 @@ static int write_entry(struct cache_entry *ce, /* * Convert from git internal format to working tree format +* unless the smudge-to-file filter can write to the +* file directly. */ + int smudge_to_file = can_smudge_to_file(ce->name); if (ce_mode_s_ifmt == S_IFREG && + ! smudge_to_file && convert_to_working_tree(ce->name, new, size, )) { free(new); new = strbuf_detach(, ); @@ -189,13 +193,29 @@ static int write_entry(struct cache_entry *ce, return error_errno("unable to create file %s", path); } - wrote = write_in_full(fd, new, size); - if (!to_tempfile) - fstat_done = fstat_output(fd, state, ); - close(fd); - free(new); - if (wrote != size) - return error("unable to write file %s", path); + if (! can_smudge_to_file(ce->name)) { + wrote = write_in_full(fd, new, size); + if (!to_tempfile) + fstat_done = fstat_output(fd, state, ); + close(fd); + free(new); + if (wrote != size) + return error("unable to write file %s", path); + } + else { + close(fd); + convert_to_working_tree_filter_to_file(ce->name, path, new, size); + free(new); + if (!to_tempfile) { + /* Re-open the file to stat it; the +* smudge-to-file filter may have replaced +* the file. */ + fd = open(path, O_RDONLY); + if (fd < 0) { + return error_errno("unable to create file %s", path); + } + } + } break; case S_IFGITLINK: if (to_tempfile) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 1043ea5..b0e2e5e 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -14,12 +14,20 @@ chmod +x rot13.sh cat <rot13-from-file.sh #!$SHELL_PATH -fsfile="\$1" +srcfile="\$1" touch rot13-from-file.ran -cat "\$fsfile" | ./rot13.sh +cat "\$srcfile" | ./rot13.sh EOF chmod +x rot13-from-file.sh +cat <rot13-to-file.sh +#!$SHELL_PATH +destfile="\$1" +touch rot13-to-file.ran +./rot13.sh > "\$destfile" +EOF +chmod +x rot13-to-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -291,17 +299,37 @@ test_expect_success 'clean-from-file filter is used when adding a file' ' cmp test.t fstest.t ' +test_expect_success 'smudge-to-file filter is used when checking out a file' ' + test_config filter.rot13.smudge-to-file "./rot13-to-file.sh %p" && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test.t fstest.t && + + test -e rot13-to-file.ran && + rm -f rot13-to-file.ran +' + test_expect_success 'clean-from-file filter is not used when clean filter is not configured' ' - test_config filter.no.smudge ./rot13.sh && - test_config filter.no.clean-from-file "./rot13-from-file.sh %p" && + test_config filter.noclean.smudge ./rot13.sh && + test_config filter.noclean.clean-from-file "./rot13-from-file.sh %p" && - echo "*.no filter=no" >.gitattributes && + echo "*.no filter=noclean" >.gitattributes && cat test.t >test.no && git add test.no && - test ! -e rot13-from-file.ran && - git cat-file blob :test.no >actual && - cmp test.t actual + test ! -e rot13-from-file.ran +' + +test_expect_success 'smudge-to
[PATCH 0/4] extend smudge/clean filters with direct file access
As discussed in this thread: http://thread.gmane.org/gmane.comp.version-control.git/294425 This adds smudge-to-file and clean-from-file commands supplimenting the smudge and clean filters. This interface can be much more efficient when operating on large files, because the whole file content does not need to be streamed through the filter. It even allows for things like clean-from-file commands that avoid reading the whole content of the file, and for smudge-to-file commands that populate a work tree file using an efficient Copy On Write operation. Joey Hess (4): clarify %f documentation add smudge-to-file and clean-from-file filter configuration use clean-from-file in git add use smudge-to-file in git checkout etc Documentation/config.txt| 27 +++--- Documentation/gitattributes.txt | 44 - convert.c | 107 ++-- convert.h | 10 entry.c | 34 ++--- sha1_file.c | 42 +--- t/t0021-conversion.sh | 64 7 files changed, 292 insertions(+), 36 deletions(-) -- 2.9.0.4.g2856e74.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] clarify %f documentation
It's natural to expect %f to be an actual file on disk; help avoid that mistake. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/gitattributes.txt | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index e3b1de8..e077cc9 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -365,8 +365,8 @@ you can declare that the filter is `required`, in the configuration: Sequence "%f" on the filter command line is replaced with the name of -the file the filter is working on. A filter might use this in keyword -substitution. For example: +the file in the git repository the filter is working on. +A filter might use this in keyword substitution. For example: [filter "p4"] @@ -374,6 +374,9 @@ substitution. For example: smudge = git-p4-filter --smudge %f +Note that the "%f" is the name of a file in the git repository; the +corresponding file on disk may not exist, or may have unrelated contents to +what git is filtering. Interaction between checkin/checkout attributes ^^^ -- 2.9.0.4.g2856e74.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] add smudge-to-file and clean-from-file filter configuration
This adds new smudge-to-file and clean-from-file filter commands, which are similar to smudge and clean but allow direct access to files on disk. In smudge-to-file and clean-from-file, "%p" is expanded to the path to the file that should be cleaned from, or smudged to. This interface can be much more efficient when operating on large files, because the whole file content does not need to be streamed through the filter. It even allows for things like clean-from-file commands that avoid reading the whole content of the file, and for smudge-to-file commands that populate a work tree file using an efficient Copy On Write operation. The new filter commands will not be used for all filtering. They are efficient to use when git add is adding a file, or when the work tree is being updated, but not a good fit when git is internally filtering blob objects in memory for eg, a diff. So, a user who wants to use smudge-to-file should also provide a smudge command to be used in cases where smudge-to-file is not used. And ditto with clean-from-file and clean. To avoid foot-shooting configurations, the new commands are not used unless the old commands are also configured. That also ensures that a filter driver configuration that includes these new commands will work, although less efficiently, when used with an older version of git that does not support them. Signed-off-by: Joey Hess <jo...@joeyh.name> --- Documentation/config.txt| 27 +++--- Documentation/gitattributes.txt | 37 ++ convert.c | 107 ++-- convert.h | 10 4 files changed, 160 insertions(+), 21 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 2e1b2e4..bbb9296 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1299,14 +1299,29 @@ format.useAutoBase:: format-patch by default. filter..clean:: - The command which is used to convert the content of a worktree - file to a blob upon checkin. See linkgit:gitattributes[5] for - details. + The command which is used on checkin to convert the content of + a worktree file (provided on stdin) to a blob (written to stdout). + See linkgit:gitattributes[5] for details. filter..smudge:: - The command which is used to convert the content of a blob - object to a worktree file upon checkout. See - linkgit:gitattributes[5] for details. + The command which is used on checkout to convert the content of a + blob object (provided on stdin) to a worktree file (written to + stdout). + See linkgit:gitattributes[5] for details. + +filter..clean-from-file:: + Optional command which is used on checkin to convert the content + of a worktree file, which can be read from disk, to a blob + (written to stdout). + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. + +filter..smudge-to-file:: + Optional command which is used to convert the content of a blob + object (provided on stdin) to a worktree file, writing directly + to the file. + Only used when filter..clean is also configured. + See linkgit:gitattributes[5] for details. fsck.:: Allows overriding the message type (error, warn or ignore) of a diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt index e077cc9..32621e7 100644 --- a/Documentation/gitattributes.txt +++ b/Documentation/gitattributes.txt @@ -378,6 +378,43 @@ Note that the "%f" is the name of a file in the git repository; the corresponding file on disk may not exist, or may have unrelated contents to what git is filtering. +There are two extra commands "clean-from-file" and "smudge-to-file", which +can optionally be set in a filter driver. These are similar to the "clean" +and "smudge" commands, but avoid needing to pipe the contents of files +through the filters, and instead reading/writing files in the filesystem. +This can be more efficient when using filters with large files that are not +directly stored in the repository. + +Sequence "%p" on the "clean-from-file" and "smudge-to-file" command line +is replaced with the path to a file that the filter can access. + +In the "clean-from-file" command, "%p" is the path to the file that +it should clean. Like the "clean" command, it should output the cleaned +version to standard output. + +In the "smudge-from-file" command, "%p" is the path to the file that it +should write to. (This file will already exist, as an empty file that can +be written to or replaced.) Like the "smudge" command, "smudge-from-file" +is fed the blob object from its standard input. + +Some git operations that need to apply filters
[PATCH 3/4] use clean-from-file in git add
Includes test cases. Signed-off-by: Joey Hess <jo...@joeyh.name> --- sha1_file.c | 42 -- t/t0021-conversion.sh | 36 2 files changed, 72 insertions(+), 6 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index d5e1121..8df86a0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3329,6 +3329,29 @@ static int index_stream_convert_blob(unsigned char *sha1, int fd, return ret; } +static int index_from_file_convert_blob(unsigned char *sha1, + const char *path, unsigned flags) +{ + int ret; + const int write_object = flags & HASH_WRITE_OBJECT; + struct strbuf sbuf = STRBUF_INIT; + + assert(path); + assert(can_clean_from_file(path)); + + convert_to_git_filter_from_file(path, , +write_object ? safe_crlf : SAFE_CRLF_FALSE); + + if (write_object) + ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), + sha1); + else + ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB), +sha1); + strbuf_release(); + return ret; +} + static int index_pipe(unsigned char *sha1, int fd, enum object_type type, const char *path, unsigned flags) { @@ -3421,12 +3444,19 @@ int index_path(unsigned char *sha1, const char *path, struct stat *st, unsigned switch (st->st_mode & S_IFMT) { case S_IFREG: - fd = open(path, O_RDONLY); - if (fd < 0) - return error_errno("open(\"%s\")", path); - if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) - return error("%s: failed to insert into database", -path); + if (can_clean_from_file(path)) { + if (index_from_file_convert_blob(sha1, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } + else { + fd = open(path, O_RDONLY); + if (fd < 0) + return error_errno("open(\"%s\")", path); + if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0) + return error("%s: failed to insert into database", +path); + } break; case S_IFLNK: if (strbuf_readlink(, path, st->st_size)) diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh index 7bac2bc..1043ea5 100755 --- a/t/t0021-conversion.sh +++ b/t/t0021-conversion.sh @@ -12,6 +12,14 @@ tr \ EOF chmod +x rot13.sh +cat <rot13-from-file.sh +#!$SHELL_PATH +fsfile="\$1" +touch rot13-from-file.ran +cat "\$fsfile" | ./rot13.sh +EOF +chmod +x rot13-from-file.sh + test_expect_success setup ' git config filter.rot13.smudge ./rot13.sh && git config filter.rot13.clean ./rot13.sh && @@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' ' test_must_be_empty err ' +test_expect_success 'clean-from-file filter is used when adding a file' ' + test_config filter.rot13.clean-from-file "./rot13-from-file.sh %p" && + + echo "*.t filter=rot13" >.gitattributes && + + cat test.t >fstest.t && + git add fstest.t && + test -e rot13-from-file.ran && + rm -f rot13-from-file.ran && + + rm -f fstest.t && + git checkout -- fstest.t && + cmp test.t fstest.t +' + +test_expect_success 'clean-from-file filter is not used when clean filter is not configured' ' + test_config filter.no.smudge ./rot13.sh && + test_config filter.no.clean-from-file "./rot13-from-file.sh %p" && + + echo "*.no filter=no" >.gitattributes && + + cat test.t >test.no && + git add test.no && + test ! -e rot13-from-file.ran && + git cat-file blob :test.no >actual && + cmp test.t actual +' + test_done -- 2.9.0.4.g2856e74.dirty -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
merge committing staged deletions?
I have a case where git merge seems to include staged deletions into the merge commit. This seems pretty surprising, dunno if it's a bug. joey@darkstar:~/tmp/x/1>git rm 1 foo joey@darkstar:~/tmp/x/1>git status On branch master Changes to be committed: (use "git reset HEAD ..." to unstage) deleted:1 deleted:foo joey@darkstar:~/tmp/x/1>git merge refs/heads/synced/master --no-ff Already up-to-date! Merge made by the 'recursive' strategy. 1 | 1 - foo | 1 - 2 files changed, 2 deletions(-) delete mode 100644 1 delete mode 100644 foo I thought that a merge would leave staged changes alone, unless they conflict in some way with the changes merged in. So why is merge looking at the staged deletions in this case? I'm using --no-ff because the commit being merged is itself a merge of HEAD and another commit. HEAD and the commit being merged in fact have the same tree, so the right merge solution, AFAICS, would be to keep that tree. I've attached a 1 kb git bundle that you can clone to reproduce this: git clone bundle b cd b git rm 1 git merge remotes/origin/synced/master --no-ff git version 2.8.1 -- see shy jo bundle Description: Binary data
Re: GIT_INDEX_FILE relative path breaks in subdir
Junio C Hamano wrote: > I personally think that it would be OK as long as we do not change > behaviours for those who do not use core.worktree, $GIT_DIR and/or > $GIT_WORK_TREE and change behaviour for others to match that > behaviour, simply because the plain vanilla no-configuration would > be used by the largest number of people. But depending on the size > of the "minority", you may get pushback from them. The minority affected would I think be limited to: 1. People who have a configuration that *always* sets core.worktree etc, and used GIT_INDEX_FILE with a cwd-relative path and it just happened to work for them. 2. People who have gone through the learning curve I've suffered about how relative GIT_INDEX_FILE works, and put in a workaround that's not the obvious "use an absolute path" workaround, but is more complex. Like, checking the git configs and adjusting what the variable is set relative to. And apparently nobody in this set felt worth mentioning this to the list.. I am unsure about the size of 1, and am sure 2 is small to nonexistant. > I am not sure if relative to PWD is useful. If it were relative to > either the GIT_DIR or the GIT_WORK_TREE, i.e. a fixed point, then > you can set and export GIT_INDEX_FILE and chdir around without > having to adjust it. If it were relative to PWD, you would need to > adjust it every time you chdir, no? Good point, I had not considered this use case. Although if I set a long-term environment variable and expect to chdir around, well, that's what absolute paths are for. All my uses of GIT_INDEX_FILE are in short-term contexts where the program does not chdir. -- see shy jo signature.asc Description: PGP signature
Re: GIT_INDEX_FILE relative path breaks in subdir
Junio C Hamano wrote: > Joey Hess <i...@joeyh.name> writes: > > > This seems to make it basically impossible for any program that wants to > > use GIT_INDEX_FILE to use anything other than an absolute path; > > there are too many configurations to keep straight that could change how > > git interprets what should be a simple relative path to a file. > > Thanks for digging. Perhaps this needs to be documented (not "in > this case it is take as relative to that, in this other case, ...", > but "you cannot rely on relative being relative to something you > think"). Documenting it that way feels like ¯\_(ツ)_/¯ I feel it should be made consistently relative to top of work tree. Seems fairly unlikely that any scripts driving git rely on it being relative to the pwd when GIT_WORK_TREE etc is set. (I'd prefer relative to pwd because that is much more sane IMHO, but making that change is more likely to break something.) -- see shy jo signature.asc Description: PGP signature
Re: GIT_INDEX_FILE relative path breaks in subdir
This is actually worse than I thought; when git is being run with a detached work tree, GIT_INDEX_FILE is treated as a path relative to CWD, instead of the normal behavior of relative the top of the work tree. Normal and expected (according to this thread anyway): joey@darkstar:~/src/other/git/Documentation> input="100644 8a1218a1024a212bb3db30becd860315f9f3ac52 1frotz" joey@darkstar:~/src/other/git/Documentation> echo "$input" | GIT_INDEX_FILE=.git/newindex git update-index --index-info joey@darkstar:~/src/other/git/Documentation> ls ../.git/newindex ../.git/newindex Fully detached worktree and git dir: joey@darkstar:/> echo "$input" | GIT_INDEX_FILE=.git/index git --git-dir=/home/joey/src/other/git/.git --work-tree=/home/joey/src/other/git update-index --index-info fatal: Unable to create '/.git/index.lock': No such file or directory joey@darkstar:/> echo "$input" | GIT_WORK_TREE=/home/joey/src/other/git GIT_DIR=/home/joey/src/other/git/.git GIT_INDEX_FILE=.git/newindex git update-index --index-info fatal: Unable to create '/.git/newindex.lock': No such file or directory Work tree detached via git configuration: joey@darkstar:~/src/other/git> git config core.worktree /home/joey/src/other/git/tmptree joey@darkstar:~/src/other/git> mkdir tmptree joey@darkstar:~/src/other/git> cd tmptree/ joey@darkstar:~/src/other/git/tmptree> mkdir Documentation joey@darkstar:~/src/other/git/tmptree> cd Documentation/ joey@darkstar:~/src/other/git/tmptree/Documentation> echo "$input" | GIT_INDEX_FILE=.git/index git update-index --index-info fatal: Unable to create '/home/joey/src/other/git/tmptree/.git/index.lock': No such file or directory This seems to make it basically impossible for any program that wants to use GIT_INDEX_FILE to use anything other than an absolute path; there are too many configurations to keep straight that could change how git interprets what should be a simple relative path to a file. -- see shy jo signature.asc Description: PGP signature
Re: GIT_INDEX_FILE relative path breaks in subdir
Junio C Hamano wrote: > Joey Hess <i...@joeyh.name> writes: > > > Appears to be a bug in git. Seems that it's assuming GIT_INDEX_FILE is > > relative to the top of the worktree and not to the CWD. > > I think that has always been the case. You can always specify it as > relative to the top. Of course, you can use absolute. I think it *has* always been that way, but is there anything in the documentation that indicates this is the case? This could well be best fixed in the documentation. Hmm, at least GIT_OBJECT_DIRECTORY also behaves this way. (OTOH, GIT_DIR does not behave this way). -- see shy jo signature.asc Description: PGP signature
GIT_INDEX_FILE relative path breaks in subdir
joey@darkstar:/tmp>git init test Initialized empty Git repository in /tmp/test/.git/ joey@darkstar:/tmp>cd test joey@darkstar:/tmp/test>mkdir sub joey@darkstar:/tmp/test>cd sub joey@darkstar:/tmp/test/sub>GIT_INDEX_FILE=../.git/otherindex git write-tree fatal: Unable to create '/tmp/test/../.git/otherindex.lock': No such file or directory Appears to be a bug in git. Seems that it's assuming GIT_INDEX_FILE is relative to the top of the worktree and not to the CWD. Workaround: Use absolute path to the index file. joey@darkstar:/tmp/test/sub>GIT_INDEX_FILE=`pwd`/../.git/otherindex git write-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 joey@darkstar:/tmp/test/sub>ls ../.git/otherindex ../.git/otherindex git version 2.8.1 -- see shy jo signature.asc Description: PGP signature
Re: proposal for extending smudge/clean filters with raw file access
Junio C Hamano wrote: > > Secondly, and harder to get around, the filename passed to the clean > > filter is not necessarily a path to the actual existing file that is > > being cleaned. > > Either one of us is confused. I was talking about updating the > current "clean" implementation without changing its interface, > i.e. gets fed via its standard input, expected to respond to its > standard output. There is no filename involved. I'm talking about the %f that can be passed to the clean filter. The context that I left out is that my clean filter could avoid reading all of stdin, and quickly produce the cleaned result, but only if it can examine the file that's being cleaned. Which is not currently entirely safe to use the %f for. There may be a way to make a clean filter that can do something useful without reading all of stdin, and without examining the file that's being cleaned. Maybe. Hard to see how. I don't feel such a hypothetical clean filter is worth changing the current EPIPE behavior to support. So I think it's better to add a separate clean-from-fs and keep the current clean filter interface as it stands. -- see shy jo signature.asc Description: PGP signature
Re: proposal for extending smudge/clean filters with raw file access
Junio C Hamano wrote: > The smudge happens to be the last to run, so it is quite true that > it can say "Hey Git, I've written it out already". > > I didn't check all codepaths to ensure that we won't need the > smudged result in core at all, but I am guessing you did so before > making this proposal, so in that case I would say this sounds fine. Well, the idea is to only use smudge-to-file when the smudged content is going to be written out to a file. Any other code paths that need to smudge some content would use the smudge filter. So, try_create_file would use it. Maybe some other places I have not found yet could as well. -- see shy jo signature.asc Description: PGP signature
Re: proposal for extending smudge/clean filters with raw file access
Junio C Hamano wrote: > This side, I do not think we even need a new variant. We can just > update the code to interact with "clean" so that it the writer to > the pipe ignores SIGPIPE, detects EPIPE on write(2), says "ah, the > other end does not need the full input to produce its output". The > reader from the pipe consumes its output without failing AS LONG AS > the "clean" filter exits with zero (we do check its exit status, > right?) There are two problems with doing that. First, any clean filter that relied on that would not work with older versions of git. Secondly, and harder to get around, the filename passed to the clean filter is not necessarily a path to the actual existing file that is being cleaned. For example, git hash-object --stdin --path=whatever. So the current clean filter can't really safely rely on accessing the file to short-circuit its cleaning. (Although it seems to mostly work.. currently..) > We cannot do a similar "we can just unconditionally update" like I > said on the "clean" side to "smudge", so it would need a new > variant. I do not want to call it anything "raw", as there is > nothing "raw" about it. "smudge-to-fs" would be a better name. "raw" was just a placeholder. "clean-from-fs" and "smudge-to-fs" are pretty descriptive. -- see shy jo signature.asc Description: PGP signature
proposal for extending smudge/clean filters with raw file access
I'm using smudge/clean filters in git-annex now, and it's not been an entirely smooth fit between the interface and what git-annex wants to do. The clean filter has to consume the whole file content on stdin; not reading it all will make git think the clean filter failed. But, git-annex often doesn't need to read the whole content of a work-tree file in order to clean it. The smudge filter has to output the whole file content to stdout. But git-annex often has the file's content on disk already, and could just move it into place in the working tree. This would save CPU and IO and often disk space too. But the smudge interface doesn't let git-annex use the efficient approach. So I propose extending the filter driver with two more optional commands. Call them raw-clean and raw-smudge for now. raw-clean would be like clean, but rather than being fed the whole content of a large file on stdin, it would be passed the filename, and can access the file itself. Like the clean filter, it outputs the cleaned version on stdout. raw-smudge would be like smudge, but rather than needing to output the whole content of a large file on stdout, it would be passed a filename, and can create that file itself. To keep this backwards compatible, and to handle the cases where the object being filtered is not a file on disk, the smudge and clean filters would be required to be configured too, in order for raw-clean and raw-smudge to be used. It seems fairly easy to implement raw-clean. In sha1_file.c, index_path would use raw-clean when available, while index_fd etc keep on using the clean filter. I have not investigated what would be needed to implement raw-smudge yet. -- see shy jo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'next'ed --allow-unrelated-histories could cause lots of grief
Yaroslav Halchenko wrote: > - for git-annex (Joey was CCed from the beginning, not sure if annex > would be affected though), it would be merging of git-annex > branches while joining multiple annexes for the sync (e.g. by git > annex assistant). Not entirely accurate (git-annex merges its git-annex branches using a custom merge method and not involving git-merge). The actual use case is two users (or one user with two devices) each with a git-annex repository who decide to share their files by combining the two repositories. This is pretty far from the kernel world, so it's not like bisection is something they care about. However, I also see --allow-unrelated-histories as very useful to prevent many foot-shooting maneuvers. Especially when a repository has special-purpose branches, like git-annex's git-annex branch, or other branches that are never intended to be merged into master. It's a not entirely uncommon mistake for users to merge in such a branch, and the users who make such a mistake often don't know enough git to easily recover from it. Junio C Hamano wrote: > merge: GIT_MERGE_ALLOW_UNRELATED_HISTORIES environment I hope this patch lands, it will save me a lot of bother. -- see shy jo signature.asc Description: PGP signature
Re: 'next'ed --allow-unrelated-histories could cause lots of grief
Yaroslav Halchenko wrote: > which is planned for the next release. I guess it is indeed a > worthwhile accident-prevention measure BUT not sure if it is so > important as to cause a change in behavior on which some projects using > git through the cmdline interface might have been relying upon for > years! Not only through the command line interface. The git-annex webapp has common use cases that will be broken by this change. > Moreover, it was explicitly stated that "no configuration variable to > enable this by default exists and will not be added", which would cause > 3rd party scripts/code/projects relying on previous behavior to provide > version specific handling (either to add that > --allow-unrelated-histories or not)... very cumbersome! Agreed, a configuration setting that could be passed via -c would be much less cumbersome than checking the version of git in order to only pass the option to git versions that understand it. This would also provide a way to get git pull to allow such merges. Compare with, for example, the change to default to an interactive merge, where GIT_MERGE_AUTOEDIT=no was provided to ease compatability. -- see shy jo signature.asc Description: PGP signature
Re: Migrating away from SHA-1?
Theodore Ts'o wrote: > OK, so how does this map to git? First of all, from a collision > perspective, the two blobs have to map into valid C code Git provides other places to hide the colliding blobs; the best seems to be as an added header in the commit object, or as trailing data after a \0 in the commit message. git is very good at hiding such potentially colliding data from the user, as https://github.com/joeyh/supercollider demonstrates. commit 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7 Author: Joey Hess <j...@kitenet.net> Date: Fri Sep 9 11:49:21 2011 -0400 an innocent commit If this were a sha1 colliding attack, there would be some sort of binary garbage below. Which there isn't. So this can be safely merged. joey@darkstar:~/tmp/supercollider>git cat-file -p 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7 tree 735a7633237c07b398856005de3bc9ea00446747 author Joey Hess <j...@kitenet.net> 1315583361 -0400 committer Joey Hess <j...@kitenet.net> 1315583361 -0400 an innocent commit If this were a sha1 colliding attack, there would be some sort of binary garbage below. Which there isn't. So this can be safely merged. ??b???[?i??ͯ?t? 2??os??Q??H?*zl?RA˂q?E ?E7???\?m???U?>MU GY?d)?ȼ??'g?~D??ɯhQ????/"E??X?m???^??S?D??;w6(?`??>?縘?AѲ?*!??@v>?8??2?!??=*?J ???ynH???c?w?\??K7???N?6?????A5?FM?wZ?~?pKY?R???s7??(?ƶ?_"??m%1a??ʀ??K[ t????!A0?ΈfT.?T?w?ᛵƌ?р???aco?V/2??nَ? ?}?6?_?z?{ (The other possibility would be to hide the colliding blob in the tree object, but that seems unlikely.) -- see shy jo signature.asc Description: PGP signature
bug: git submodule add in of nested submodule handles relative path wrong
joey@darkstar:/tmp/empty>git init sub1 Initialized empty Git repository in /tmp/empty/sub1/.git/ joey@darkstar:/tmp/empty>git init sub2 Initialized empty Git repository in /tmp/empty/sub2/.git/ joey@darkstar:/tmp/empty>cd sub1 joey@darkstar:/tmp/empty/sub1>date > f1 ; git add f1; git commit -m add -q joey@darkstar:/tmp/empty/sub1>cd .. joey@darkstar:/tmp/empty>cd sub2 joey@darkstar:/tmp/empty/sub2>date > f2 ; git add f2; git commit -m add -q joey@darkstar:/tmp/empty/sub2>cd .. joey@darkstar:/tmp/empty>git init repo Initialized empty Git repository in /tmp/empty/repo/.git/ joey@darkstar:/tmp/empty>cd repo joey@darkstar:/tmp/empty/repo>git submodule add ../sub1 1 Cloning into '1'... done. joey@darkstar:/tmp/empty/repo>cd 1 joey@darkstar:/tmp/empty/repo/1>ls f1 joey@darkstar:/tmp/empty/repo/1>git submodule add ../../sub2 2 fatal: repository '/tmp/sub2' does not exist fatal: clone of '/tmp/sub2' into submodule path '2' failed Like the bug I filed yesterday, this is caused by git submodule add's chdir into .git/modules. Workaround is to pass absolute paths to git submodule add. Note that this could be an exploitable security hole under some unusual circumstances. In the example above, any other local user could create a /tmp/sub2 containing something nasty, and git would check it out accidentially. git version 2.7.0 -- see shy jo signature.asc Description: PGP signature
Re: bug: git submodule add fails when .git is a symlink
Junio C Hamano wrote: > A more pertinent question may be which version of Git did the above > ever work, I guess. We fairly liberally chdir around and I do not > think we deliberately avoid assuming that "cd .git && cd .." might > not come back to the original directory, for example, so I wouldn't > be surprised if it never worked. IIRC git used symlinks for .git in submodules before version 1.7.8, so I guess that older versions supported that pretty well. This one case is the only time I've seen a symlink for .git present a problem so far. -- see shy jo signature.asc Description: PGP signature
Re: bug: git submodule add fails when .git is a symlink
Stefan Beller wrote: > To elaborate on that: Starting in 2.7 parts of the submodule stuff > has been rewritten in C, in 2.8 even more and there is more in flight for > > 2.8. > > However your bug is also to be found in 2.6, which doesn't contain any > recent rewrites, so it is a rather long standing bug, I would presume. Yes, I saw it with 2.7.0, but I think the user who reported it was on 2.6. > As a workaround for now: > > echo "gitdir: ../gitdir/.git" > .git Not an option in our particular situation, unfortunately. I wonder if the miscalculated ../../../somedir could cause git to access files outside the git repos? -- see shy jo signature.asc Description: PGP signature
bug: git submodule add fails when .git is a symlink
git init gitdir mkdir worktree cd worktree ln -s ../gitdir/.git .git git submodule add /any/git/repo sub fatal: Could not chdir to '../../../sub': No such file or directory Fairly sure this is a bug.. -- see shy jo signature.asc Description: PGP signature
update index mtime etc metadata
Is there any available plumbing that can change the mtime etc metadata that is recorded in the index for a file, to user-provided values? Or, to force the current file stat metadata to be updated in the index? I know, git update-index --refresh, but I have a case where that's too expensive. I'm using smudge filters; I know that the cleaned version of the file will be unchanged from what's in the index now and only the stat metadata will change, and so I want to avoid git update-index --refresh running the clean filter, which can be quite expensive for a large file. At the moment I don't see a way to do it other than using eg libgit2 to update the appropriate fields in the index structure. -- see shy jo signature.asc Description: PGP signature
git pull --upload-pack reversion in git 2.5.0
In git 2.1.4, I can run: git pull --upload-pack 'echo --foo' This also seems to work in 2.4.6, but in 2.5.0, the option parser does something weird, apparently looking inside the quoted parameter and parsing parameters in there: error: unknown option `foo' usage: git fetch [options] [repository [refspec...]] Needless to say, this broke my use of --upload-pack. -- see shy jo signature.asc Description: Digital signature
Re: git pull --upload-pack reversion in git 2.5.0
I think this comes down to a lack of quoting where git-pull runs git-fetch. Before eb2a8d9ed3fca2ba2f617b704992d483605f3bb6, $@ was passed through to git-fetch, but now there is a $upload_pack which is passed without being quoted. -- see shy jo signature.asc Description: Digital signature
[PATCH] support bash completion for add-on commands
This makes it possible to implement bash completion for add-on commands, that will work even when the bash completion scripts are being loaded on-demand, as is done by the bash-completion package. git's bash completion handles subcommands by running a _git_$command function. As well as the many such functions included in git-completion.bash, there can be other functions defined elsewhere to support third-party add-on git commands, and they'll happily be used. But, bash completion scripts are often loaded on demand, as shown in the completion_loader example in bash's man page, and the bash-completion implementation that is commonly used on many Linux systems. The demand loading will load this very script from some place like /usr/share/bash-completion/completions/git, when the user complete a git command. But, completion scripts for git add-on commands don't get loaded. For example, when I wrote a git-annex bash completion script, bash was unable to tab complete git annex foo, until I tab completed a git-annex command. Which loaded the git-annex completion, and then that same completion worked to make git annex foo tab complete. An inconsistent UI.. So, if the git completion script is unable to find the wanted _git_$command function, have it fall-back to looking for a git-$command completion script, and loading it. The add-on script is looked for in the same directory as the git completion script, which we can find by looking at BASH_SOURCE. Signed-off-by: Joey Hess jo...@joeyh.name --- contrib/completion/git-completion.bash | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index c97c648..ba91b2a 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -2614,7 +2614,16 @@ __git_main () if [ -n $expansion ]; then words[1]=$expansion completion_func=_git_${expansion//-/_} - declare -f $completion_func /dev/null $completion_func + declare -f $completion_func /dev/null $completion_func return + fi + + # As a fallback, if no completion function is defined for the + # command, look for add-on command completion script in same + # directory as this completion script, and if found, source it, + # and restart completion using it. + local compdir=${BASH_SOURCE%/*} + if [ -e $compdir/git-$command ]; then + source $compdir/git-$command __git_main $@ fi } -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
undocumented core.sharedRepository=2 set by git init --shared=world
joey@darkstar:~/tmpgit init --shared=world testrepo Initialized empty shared Git repository in /home/joey/tmp/testrepo/.git/ joey@darkstar:~/tmpgrep shared testrepo/.git/config sharedrepository = 2 This magic value of 2 seems to be undocumented, as is the magic value of 1 that's equvilant to group. I think it would be better to have git init put in world or group and not these magic values. Anyway, I suppose they ought to be documented too. -- see shy jo signature.asc Description: Digital signature
[PATCH] improve documentation for some commands that use pathspecs
After being surprised that git-ls-files expands pathspecs, here's a patch that would have saved me. --- Documentation/git-ls-files.txt | 9 + Documentation/git-ls-tree.txt | 8 2 files changed, 9 insertions(+), 8 deletions(-) diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt index e26f01f..f7a3039 100644 --- a/Documentation/git-ls-files.txt +++ b/Documentation/git-ls-files.txt @@ -17,7 +17,7 @@ SYNOPSIS [--exclude-per-directory=file] [--exclude-standard] [--error-unmatch] [--with-tree=tree-ish] - [--full-name] [--abbrev] [--] [file...] + [--full-name] [--abbrev] [--] [pathspec...] DESCRIPTION --- @@ -101,7 +101,7 @@ OPTIONS --with-tree=tree-ish:: When using --error-unmatch to expand the user supplied - file (i.e. path pattern) arguments to paths, pretend + pathspec arguments to paths, pretend that paths which were removed in the index since the named tree-ish are still present. Using this option with `-s` or `-u` options does not make any sense. @@ -150,9 +150,10 @@ a space) at the start of each line: \--:: Do not interpret any more arguments as options. -file:: +pathspec:: Files to show. If no files are given all files which match the other - specified criteria are shown. + specified criteria are shown. (Note that this isn't really raw + pathnames, but rather a list of patterns to match.) Output -- diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt index 16e87fd..58e7f64 100644 --- a/Documentation/git-ls-tree.txt +++ b/Documentation/git-ls-tree.txt @@ -11,7 +11,7 @@ SYNOPSIS [verse] 'git ls-tree' [-d] [-r] [-t] [-l] [-z] [--name-only] [--name-status] [--full-name] [--full-tree] [--abbrev[=n]] - tree-ish [path...] + tree-ish [pathspec...] DESCRIPTION --- @@ -19,11 +19,11 @@ Lists the contents of a given tree object, like what /bin/ls -a does in the current working directory. Note that: - the behaviour is slightly different from that of /bin/ls in that the - 'path' denotes just a list of patterns to match, e.g. so specifying + 'pathspec' denotes just a list of patterns to match, e.g. so specifying directory name (without '-r') will behave differently, and order of the arguments does not matter. - - the behaviour is similar to that of /bin/ls in that the 'path' is + - the behaviour is similar to that of /bin/ls in that the 'pathspec' is taken as relative to the current working directory. E.g. when you are in a directory 'sub' that has a directory 'dir', you can run 'git ls-tree -r HEAD dir' to list the contents of the tree (that is @@ -72,7 +72,7 @@ OPTIONS Do not limit the listing to the current working directory. Implies --full-name. -[path...]:: +[pathspec...]:: When paths are given, show them (note that this isn't really raw pathnames, but rather a list of patterns to match). Otherwise implicitly uses the root level of the tree as the sole path argument. -- 2.1.4 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
git ls-files wildcard behavior considered harmful
I was very surprised to learn the other day that git ls-files 'foo*' will expand wildcards (including character classes), in the absence of expansion by the shell. (git version 2.1.4) joey@darkstar:~/tmp/aaagit ls-files 'foo*bar' foo*bar foobazbar joey@darkstar:~/tmp/aaagit ls-files '[abc]' [abc] a b As far as I can see this behavior is not documented on the man page, except for a tiny mention in the --with-tree documentation, where it says file (i.e. path pattern). Since I wanted to avoid this wildcard expension, I tried slash-escaping the wildcard characters. This works: joey@darkstar:~/tmp/aaagit ls-files 'foo\*bar' foo*bar joey@darkstar:~/tmp/aaagit ls-files '\[abc\]' [abc] But, there is a weird behavior here with subdirectories. While normally ls-files would recurse, slash-escaped wildcard characters in the directory name prevent recursion. joey@darkstar:~/tmp/aaagit ls-files 'foo[d]' foo[d]/subfile food joey@darkstar:~/tmp/aaagit ls-files 'foo\[d\]' joey@darkstar:~/tmp/aaa The above example shows a case where it's impossible to get ls-files to only list files in a directory and not other files that match the wildcard. This seems like it must be a bug, and it means it's impossible to reliably work around the wildcard expansion behavior. I suspect that this wildcard expansion behavior is useful somewhere. But from the perspective of using ls-files as plumbing, where you want to get out some subset of what was put in, it's not nice. -- see shy jo -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git ls-files wildcard behavior considered harmful
Duy Nguyen wrote: You can do git --literal-pathspecs ls-files ... or set GIT_LITERAL_PATHSPECS. Thanks! --literal-pathspecs does allow getting around this. Now I'm wondering what other parts of plumbing might be doing globbing that I did not anticipate. Maybe I should set the environment variable so I don't need to worry about it.. -- see shy jo signature.asc Description: Digital signature
Re: weaning distributions off tarballs: extended verification of git tags
I support this proposal, as someone who no longer releases tarballs of my software, when I can possibly avoid it. I have worried about signed tags / commits only being a SHA1 break away from useless. As to the implementation, checksumming the collection of raw objects is certainly superior to tar. Colin had suggested sorting the objects by checksum, but I don't think that is necessary. Just stream the commit object, then its tree object, followed by the content of each object listed in the tree, recursing into subtrees as necessary. That will be a stable stream for a given commit, or tree. -- see shy jo signature.asc Description: Digital signature
bug with partial commit and pre-commit hook updating the index
I have found many uses for the feature that lets a pre-commit hook stage changes in the index that will be included in the commit. But now I seem to have found a bug in the support for that, involving partial commits. It seems that, after a partial commit in which the pre-commit hook stages a modification of a file, the index is is left without that staged change. This only occurs with git commit $file, not git commit -a. joey@darkstar:~/tmp/als joey@darkstar:~/tmp/adate foo joey@darkstar:~/tmp/agit add foo joey@darkstar:~/tmp/agit commit -m added regular file foo [master 79d0f1d] added regular file foo 1 file changed, 1 insertion(+) create mode 100644 foo joey@darkstar:~/tmp/amv ~/pre-commit .git/hooks/ joey@darkstar:~/tmp/acat .git/hooks/pre-commit #!/bin/sh ln -vsf /etc/passwd foo git add foo joey@darkstar:~/tmp/adate foo joey@darkstar:~/tmp/agit commit foo -m update ‘foo’ - ‘/etc/passwd’ [master efa9f67] update 1 file changed, 1 insertion(+), 1 deletion(-) rewrite foo (100%) mode change 100644 = 12 So, the pre-commit hook replaced file foo with a symlink, and staged it, and we can see from the commit summary that was correctly included in the commit. But, look here: joey@darkstar:~/tmp/agit status On branch master Changes to be committed: (use git reset HEAD file... to unstage) typechange: foo Changes not staged for commit: (use git add file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) typechange: foo builtin/commit.c has a long comment that talks about a false index which is set up and used during a partial commit. The pre-commit hook is run using this false index, and the commit is generated from it. I guess the bug involves the real index not being updated afterwards to reflect the changes made to the false index. -- see shy jo, resending a mail that vger accepted yesterday but has still not posted -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
bug: git merge --no-commit loses track of file modes in the index
If git merge --no-commit is used to merge a commit adding a file with an unusual mode -- specifically a symlink which has mode 12, it fails to stage the right mode into the index. This only happens when core.symlinks=false. I noticed it on FAT, but have managed to reproduce it on ext4. Here's an example of the bug: joey@darkstar:~git clone r1 r2 Cloning into 'r2'... done. joey@darkstar:~cd r1 joey@darkstar:~/r1ls -l total 0 lrwxrwxrwx 1 joey joey 11 Jun 12 21:23 foo - /etc/passwd joey@darkstar:~/r1git mv foo bar joey@darkstar:~/r1git commit -m moved [master 516a53c] moved 1 file changed, 0 insertions(+), 0 deletions(-) rename foo = bar (100%) joey@darkstar:~/r1cd .. joey@darkstar:~cd r2 joey@darkstar:~/r2git config core.symlinks false joey@darkstar:~/r2git fetch origin remote: Counting objects: 2, done. remote: Total 2 (delta 0), reused 0 (delta 0) Unpacking objects: 100% (2/2), done. From /home/joey/r1 7ab8102..516a53c master - origin/master joey@darkstar:~/r2git merge origin/master --no-commit --no-ff Automatic merge went well; stopped before committing as requested joey@darkstar:~/r2git diff --cached diff --git a/bar b/bar new file mode 100644 index 000..3594e94 --- /dev/null +++ b/bar @@ -0,0 +1 @@ +/etc/passwd \ No newline at end of file diff --git a/foo b/foo deleted file mode 12 index 3594e94..000 --- a/foo +++ /dev/null @@ -1 +0,0 @@ -/etc/passwd \ No newline at end of file joey@darkstar:~/r2git commit -m oops [master 63bd960] oops joey@darkstar:~/r2git show commit 63bd9608c96a91582b27c5853ff58053bab6c71c Merge: 7ab8102 516a53c Author: Joey Hess j...@kitenet.net Date: Thu Jun 12 21:37:35 2014 -0400 oops diff --cc bar index 000,3594e94..3594e94 mode 00,12..100644 --- a/bar +++ b/bar joey@darkstar:~/r2git version git version 2.0.0 -- see shy jo signature.asc Description: Digital signature
Re: RLIMIT_NOFILE fallback
Junio C Hamano wrote: Hmph, perhaps you are right. Like this? Works for me. -- see shy jo signature.asc Description: Digital signature
Re: RLIMIT_NOFILE fallback
Jeff King wrote: I wish we understood why getrlimit was failing. Returning EFAULT seems like an odd choice if it is not implemented for the system. On such a system, do the other fallbacks actually work? Would it work to do: That is, does sysconf actually work on such a system (or does it need a similar run-time fallback)? And either way, we should try falling back to OPEN_MAX rather than 1 if we have it. For what it's worth, the system this happened on was a QNAP TS-219PII Linux willow 2.6.33.2 #1 Fri Mar 1 04:41:48 CST 2013 armv5tel unknown I don't have access to it to run tests of sysconf. (I already suggested its owner upgrade its firmware.) As far as the warning, I am not sure I see a point. The user does not have any useful recourse, and git should continue to operate as normal. Having every single git invocation print by the way, RLIMIT_NOFILE does not work on your system seems like it would get annoying. I agree with that. -- see shy jo signature.asc Description: Digital signature
Re: [Question] Git recovery with HEAD commit broken
Matthieu Moy wrote: Not as far as I know. But git fsck has a --lost-found option that can help recovering unreachable (dangling) commits. You may have a look at http://hackage.haskell.org/package/git-repair but I do not think it would solve your particular case. Well, let's find out.. I corrupted .git/refs/heads/master to refer to a commit that does not exist. The history has a few prior commits. joey@darkstar:/tmp/yygit fsck Checking object directories: 100% (256/256), done. error: HEAD: invalid sha1 pointer 10814e97cc8bf5f6f8ce0c0d5302f778c09cac88 error: refs/heads/master does not point to a valid object! notice: No default references joey@darkstar:/tmp/yy~/src/git-repair/git-repair Running git fsck ... Initialized empty Git repository in /home/joey/tmp/tmprepo.0/.git/ 1 missing objects could not be recovered! To force a recovery to a usable state, retry with the --force parameter. - exit 1 If there had been a remote that had the missing 10814e97cc8bf5f6f8ce0c0d5302f778c09cac88 commit, it would have cloned it from there, and this would have succeeded. But with a fully missing commit, --force is needed to enable more destructive repairs. joey@darkstar:/tmp/yy~/src/git-repair/git-repair --force Running git fsck ... Initialized empty Git repository in /home/joey/tmp/tmprepo.0/.git/ fatal: bad object refs/heads/master fatal: bad object refs/heads/master fatal: bad object refs/heads/master Deleted these local branches, which could not be recovered due to missing objects: refs/heads/master You currently have refs/heads/master checked out. You may have staged changes in the index that can be committed to recover the lost state of this branch! Successfully recovered repository! Please carefully check that the changes mentioned above are ok.. Hmm, that could have gone better. While it successfully detected the broken HEAD, and removed that ref, which is enough to make git fsck pass[1], it failed to find the old ref in the reflog, despite containing code that walks up it to find a usable commit. joey@darkstar:/tmp/yygit reflog fatal: bad default revision 'HEAD' And that's why.. git-reflog requires a valid HEAD to work. Bit of a catch-22. I could work around this by manually parsing the reflog. It would not be the first thing git-repair has to re-implement because the git command isn't robust enough[2]. I have made a TODO about this. OTOH, if a kind git developer would like to make git-reflog work when HEAD is missing, that seems like a generally useful improvement.. -- see shy jo [1] It will make fsck pass 100% of the time -- its test suite randomly corrupts repositories and checks that it can force some repair good enough to make git fsck pass. [2] A particularly annoying one is that git branch -d cannot be used to remove a branch that is directly pointing to a corrupted commit! signature.asc Description: Digital signature