git ls-files --with-tree documentation

2018-10-19 Thread Joey Hess
   --with-tree=
   When using --error-unmatch to expand the user supplied  (i.e.
   path pattern) arguments to paths, pretend that paths which were
   removed in the index since the named  are still present.
   Using this option with -s or -u options does not make any sense.

This seems to say that it only affects it when --error-unmatch is used,
but in fact it goes deeper; for example I can use it to list files that
are present in either the current work tree or some other branch:

joey@darkstar:/tmp/v> git checkout foo
joey@darkstar:/tmp/v> git ls-files --with-tree=master
in-foo
in-master
joey@darkstar:/tmp/v> git ls-files
in-foo
joey@darkstar:/tmp/v> git ls-tree master 
100644 blob 0242cc10fdf4e9afdfd0928c2a209d4545780168in-master

This is very useful behavior, but I'm not sure if I should rely on it
behaving this way in the future, given the documentation.

t/t3060-ls-files-with-tree.sh does indeed test that it
"should add entries from named tree", and it does it without using
--error-unmatch.

How about changing the documentation to something like this to make
more explicit what it does.

   --with-tree=
   Treat all files in the  as if they were present in the 
index.
   When using --error-unmatch to expand the user supplied  (i.e.
   path pattern) arguments to paths, this has the effect that paths 
which were
   removed in the index since the named  are still present.
   Using this option with -s or -u options does not make any sense.

-- 
see shy jo


Re: bug when combined with etckeeper

2018-10-10 Thread Joey Hess
/etc/.git/hooks/pre-commit is installed by etckeeper and runs
etckeeper pre-commit, which deals with /etc/.etckeeper, including
running "git add .etckeeper". Why that file would match a gitignore
seems much less important than why git would run that hook in an
entirely different git repository.

   core.hooksPath
   By default Git will look for your hooks in the $GIT_DIR/hooks
   directory. Set this to different path, e.g.  /etc/git/hooks, and Git
   will try to find your hooks in that directory, e.g.
   /etc/git/hooks/pre-receive instead of in $GIT_DIR/hooks/pre-receive.

Hmm, the example "/etc/git/hooks" there is very similar to the
"/etc/.git/hooks" used by etckeeper. So my guess is you have
core.hooksPath set globally.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: Git for games working group

2018-09-17 Thread Joey Hess
Ævar Arnfjörð Bjarmason wrote:
> There's surely other aspects of that square peg of large file tracking
> not fitting the round hole of file locking, the point of my write-up was
> not that *that* solution is perfect, but there's prior art here that's
> very easily adopted to distributed locking if someone wanted to scratch
> that itch, since the notion of keeping a log of who has/hasn't gotten a
> file is very similar to a log of who has/hasn't locked some file(s) in
> the tree.

Actually they are fundamentally very different. git-annex's tracking of
locations of files is eventually consistent, which of course means that
at any given point in time it may be currently inconsistent. That is
fine for tracking locations of files, but not for locking.

When git-annex needs to do an operation that relies on someone else's
copy of a file actually being present, it uses real locking. That
locking is not centralized, instead it relies on the connections between
git repositories. That turns out to be sufficient for git-annex's own
locking needs, but it would not be sufficient to avoid file edit
conflict problems in eg a split brain situation.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: clean filter run in top of repo with wrong GIT_WORK_TREE

2018-07-17 Thread Joey Hess
The clean filter can work around this problem by chdir GIT_PREFIX,
but needing to do this in unusual cases seems to be asking for bugs.

-- 
see shy jo


signature.asc
Description: PGP signature


clean filter run in top of repo with wrong GIT_WORK_TREE

2018-07-17 Thread Joey Hess
When git is running inside a subdirectory of the repository, 
and needs to run the clean filter, it runs it chdired back to the top of
the repository. However, if git was run with a relative --work-tree,
it passes that relative path in GIT_WORK_TREE on to the clean filter.

If git was run with eg, "--work-tree=..", the clean filter sees a work
tree that is outside the repository. It might then read files located
outside the repository. That seems like it could have security
consequences, but it's certianly a surprising problem to need to deal
with when writing a clean filter.

Brian posted a fix for a very similar bug in sequencer.c on the 14th, 
so it seems likely there are other occurances of the same problem
elsewhere.

Demonstration of this bug:

joey@darkstar:~/tmp/repo>cat .gitattributes
* filter=foo
joey@darkstar:~/tmp/repo>git config filter.foo.clean
clean-filter %f
joey@darkstar:~/tmp/repo>cat ~/bin/clean-filter 
#!/bin/sh
pwd >&2
echo $GIT_WORK_TREE >&2
ls "$GIT_WORK_TREE/$1"
joey@darkstar:~/tmp/repo>cd foo/bar/
joey@darkstar:~/tmp/repo/foo/bar>ls
x
joey@darkstar:~/tmp/repo/foo/bar>touch x
joey@darkstar:~/tmp/repo/foo/bar>git --work-tree=../.. ls-files --modified
/home/joey/tmp/repo
../..
ls: cannot access '../../foo/bar/x': No such file or directory

git version 2.18.0

-- 
see shy jo


signature.asc
Description: PGP signature


Re: Fetch-hooks

2018-02-09 Thread Joey Hess
Leo Gaspard wrote:
> I just wanted to check, you did not put the Signed-off-by line in
> patches in https://marc.info/?l=git=132491485901482=2
> 
> Could you confirm that the patches you sent are “covered under an
> appropriate open source license and I have the right under that license
> to submit that work with modifications, whether created in whole or in
> part by me, under the same open source license (unless I am permitted to
> submit under a different license), as indicated in the file”, so that I
> could send the patch I wrote based on yours with a Signed-off-by line of
> my own without breaking the DCO?

Yes; my patches are under the same GPL-2 as the rest of git.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: Fetch-hooks

2018-02-08 Thread Joey Hess
Leo Gaspard wrote:
> That said, I just came upon [1] (esp. the description [2] and the patch
> [3]), and wondered: it looks like the patch was abandoned midway in
> favor of a hook refactoring. Would you happen to know whether the hook
> refactoring eventually took place, and/or whether this patch was
> resubmitted later, and/or whether it would still be possible to merge
> this now? (not having any experience with git's internals yet, I don't
> really know whether these are stupid questions or not)
> 
> PS: Cc'ing Joey, as you most likely know best what eventually happened,
> if you can remember it?

I don't remember it well, but reviewing the thread, I think it foundered
on this comment by Junio:

> That use case sounds like that "git fetch" is called as a first class UI,
> which is covered by "git myfetch" (you can call it "git annex fetch")
> wrapper approach, the canonical example of a hook that we explicitly do
^
> not want to add.
  ^^^

While I still think a fetch hook would be a good idea for reasons of
composability, I then just went off and implemented such a wrapper for
my own particular use case, and the wrapper program then grew to cover
use cases that a hook would not have been able to cover, so ...

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH] link_alt_odb_entries: make empty input a noop

2017-11-13 Thread Joey Hess
Jeff King wrote:
> This should make Joey's immediate pain go away, though only by papering
> it over. I tend to agree that we shouldn't be looking at $PWD at all
> here.

I've confirmed that Jeff's patch fixes the case I was having trouble with.

-- 
see shy jo


signature.asc
Description: PGP signature


use of PWD

2017-11-07 Thread Joey Hess
In strbuf_add_absolute_path, git uses PWD if set when making relative
paths absolute, otherwise it falls back to getcwd(3). Using PWD may not
be a good idea. Here's one case where it confuses git badly:

joey@darkstar:/>sudo ln -s /media/hd/repo hd
joey@darkstar:/>cd /hd/repo
joey@darkstar:/hd/repo>git --git-dir=../../../home/joey/tmp/repo/.git cat-file 
-t HEAD
fatal: unable to normalize object directory: 
/hd/repo/../../../home/joey/tmp/repo/.git/objects
joey@darkstar:/hd/repo>ls -d ../../../home/joey/tmp/repo/.git
../../../home/joey/tmp/repo/.git/

In that situation where cd has followed a symlink to a different
depth, there seems to be no way to give git a relative path that works.
Other numbers of ../ also don't work.

What does work is to unset PWD:

joey@darkstar:/hd/repo>PWD= git --git-dir=../../../home/joey/tmp/repo/.git 
cat-file -t HEAD
commit

So why does git use PWD at all? Some shell code used pwd earlier
(leading to similar bugs like the one fixed in v1.5.1.5), but in
the C code, it was first introduced in commit
1b9a9467f8b9a8da2fe58d10ae16779492aa7737, which speaks of the "user's
view of the current directory", which is what PWD is. The use of PWD in
that commit may be ok.

Then in commit 10c4c881c4d2cb0ece0508e7142e189e68445257, 
the limited use of PWD broadened a lot, seemingly without
intending to look at the "user's view of the current directory"
anymore, due to reusing the code from the earlier commit.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: reversion in GIT_COMMON_DIR refs path

2017-05-19 Thread Joey Hess
Joey Hess wrote:
> Bisecting this test suite failure
> https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/
> I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git.
> 
> It seems that changed resolving refs paths when GIT_DIR and GIT_COMMON_DIR
> are both set. While before refs were looked for in GIT_COMMON_DIR,
> now they're not.

In case there's any doubt about whether this is a reversion or an
intentional change, see gitrepository-layout(5):

   refs
   References are stored in subdirectories of this directory. The git
   prune command knows to preserve objects reachable from refs found
   in this directory and its subdirectories. This directory is ignored
   if $GIT_COMMON_DIR is set and "$GIT_COMMON_DIR/refs" will be used
   instead.

So the documented behavior is broken.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH] tests: add an optional test to test git-annex

2017-05-16 Thread Joey Hess
Nice work.

Note that you can export BUILDER=stack and git-annex will build with a
known good dependency stack, which can be more reliable/cross platform
than using apt to install its build dependencies. That needs
https://docs.haskellstack.org/ installed. Also it currently needs
GIT_TEST_GIT_ANNEX_REVISION=master since I improved git-annex's
Makefile slightly.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: reversion in GIT_COMMON_DIR refs path

2017-05-16 Thread Joey Hess
Ævar Arnfjörð Bjarmason wrote:
> On Tue, May 16, 2017 at 7:10 PM, Joey Hess <i...@joeyh.name> wrote:
> > Bisecting this test suite failure
> > https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/
> > I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git.
> 
> That links's broken for me. Looking at your wiki it looks like you
> mean: 
> https://git-annex.branchable.com/bugs/git-annex_in_nixpkgs_fails_with_git-2.13.0/

Thanks for correcting that

> I have no idea what this bug is about, but side-question: It looks
> like this is git-annex's own test suite that's failing with 2.13.0, is
> that right?

Yes indeed.

> It would be very nice to have a test in git itself to test with
> git-annex. I.e. some optional test that just pulls down the latest
> git-annex release & runs its tests against the git we're building.
> 
> Thanks for annex b.t.w., I use it a lot.

If the git devs are ok with this, I certianly would be happy if such
tests were run, at least occasionally, on the git side!

-- 
see shy jo


signature.asc
Description: PGP signature


reversion in GIT_COMMON_DIR refs path

2017-05-16 Thread Joey Hess
Bisecting this test suite failure
https://git-annex.branchable.com/git-annex_in_nixpkgs_fails_with_git-2.13.0/
I landed on commit f57f37e2e1bf11ab4cdfd221ad47e961ba9353a0 to git.

It seems that changed resolving refs paths when GIT_DIR and GIT_COMMON_DIR
are both set. While before refs were looked for in GIT_COMMON_DIR,
now they're not.

Test case:

#!/bin/sh
set -e
set -x
rm -rf testdir
git init testdir
cd testdir
echo 1 > foo
git add foo
git commit -m add
mkdir dummy
mkdir dummy/overlay
cp .git/index .git/HEAD dummy/overlay
#cp .git/refs .git/packed-refs dummy/overlay -a
cd dummy
export GIT_COMMON_DIR=`pwd`/../.git
export GIT_DIR=`pwd`/overlay
git rev-parse --git-path refs/heads/master
git show refs/heads/master

This script succeeds with git 2.11.0, but with 2.13.0, it fails:

fatal: ambiguous argument 'refs/heads/master': unknown revision or path not in 
the working tree.

It seems to be failing to look up refs in GIT_COMMON_DIR.
Note that uncommenting the commented out line in the script, to copy the refs
into GIT_DIR, makes it succeed.

git rev-parse --git-path refs/heads/master shows the GIT_COMMON_DIR/refs path
still (as gitrepository-layout documents). So this reversion made
different parts of git disagreeing about the refs path.



Re: SHA1 collisions found

2017-03-02 Thread Joey Hess
Linus Torvalds wrote:
> So you'd have to be able to attack both the full SHA1, _and_ whatever
> other different good hash to 128 bits.

There's a surprising result of combining iterated hash functions, that
the combination is no more difficult to attack than the strongest hash
function used.

https://www.iacr.org/cryptodb/archive/2004/CRYPTO/1472/1472.pdf

Perhaps you already knew about this, but I had only heard rumors
that was the case, until I found that reference recently.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-23 Thread Joey Hess
Jeff King wrote:
> It's not an identical prefix, but I think collision attacks generally
> are along the lines of selecting two prefixes followed by garbage, and
> then mutating the garbage on both sides. That would "work" in this case
> (modulo the fact that git would complain about the NUL).
> 
> I haven't read the paper yet to see if that is the case here, though.

The current attack is an identical-prefix attack, not chosen-prefix, so
not quite to that point yet.

The MD5 chosen-prefix attack was 2^15 harder than the known-prefix attack,
but who knows if the numbers will be comprable for SHA1.

> A related case is if you could stick a "cruft " header at the end of
> the commit headers, and mutate its value (avoiding newlines). fsck
> doesn't complain about that.

git log and git show don't show such cruft headers either.

BTW, the SHA attack only added ~128 bytes to the pdfs, not really a
huge amount of garbage.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-23 Thread Joey Hess
Joey Hess wrote:
> Linus Torvalds wrote:
> > What you describe pretty much already requires a pre-image attack,
> > which the new attack is _not_.
> > 
> > It's not clear that the "good" object can be anything sane.
> 
> Generate a regular commit object; use the entire commit object + NUL as the
> chosen prefix, and use the identical-prefix collision attack to generate
> the colliding good/bad objects.
> 
> (The size in git's object header is a minor complication. Set the size
> field to something sufficiently large, and then pad out the colliding
> objects to that size once they're generated.)

Sorry! While that would work, it's a useless attack because the good and bad
commit objects still point to the same tree.

It would be interesting to have such colliding objects, to see what beaks,
but probably not worth $75k to generate them.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-23 Thread Joey Hess
Linus Torvalds wrote:
> What you describe pretty much already requires a pre-image attack,
> which the new attack is _not_.
> 
> It's not clear that the "good" object can be anything sane.

Generate a regular commit object; use the entire commit object + NUL as the
chosen prefix, and use the identical-prefix collision attack to generate
the colliding good/bad objects.

(The size in git's object header is a minor complication. Set the size
field to something sufficiently large, and then pad out the colliding
objects to that size once they're generated.)

-- 
see shy jo


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-23 Thread Joey Hess
Linus Torvalds wrote:
> I haven't seen the attack yet, but git doesn't actually just hash the
> data, it does prepend a type/length field to it. That usually tends to
> make collision attacks much harder, because you either have to make
> the resulting size the same too, or you have to be able to also edit
> the size field in the header.

I have some sha1 collisions (and other fun along these lines) in 
https://github.com/joeyh/supercollider

That includes two files with the same SHA and size, which do get
different blobs thanks to the way git prepends the header to the
content.

joey@darkstar:~/tmp/supercollider>sha1sum  bad.pdf good.pdf 
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  bad.pdf
d00bbe65d80f6d53d5c15da7c6b4f0a655c5a86a  good.pdf
joey@darkstar:~/tmp/supercollider>git ls-tree HEAD
100644 blob ca44e9913faf08d625346205e228e2265dd12b65bad.pdf
100644 blob 5f90b67523865ad5b1391cb4a1c010d541c816c1good.pdf

While appending identical data to these colliding files does generate
other collisions, prepending data does not.

It would cost 6500 CPU years + 100 GPU years to generate valid colliding
git objects using the methods of the paper's authors. That might be cost
effective if it helped get a backdoor into eg, the kernel.

>  (b) we can probably easily add some extra sanity checks to the opaque
> data we do have, to make it much harder to do the hiding of random
> data that these attacks pretty much always depend on.

For example, git fsck does warn about a commit message with opaque
data hidden after a NUL. But, git show/merge/pull give no indication
that something funky is going on when working with such commits.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: SHA1 collisions found

2017-02-23 Thread Joey Hess
Junio C Hamano wrote:
> On Thu, Feb 23, 2017 at 8:43 AM, Joey Hess <i...@joeyh.name> wrote:
> >
> > Since we now have collisions in valid PDF files, collisions in valid git
> > commit and tree objects are probably able to be constructed.
> 
> That may be true, but
> https://public-inbox.org/git/pine.lnx.4.58.0504291221250.18...@ppc970.osdl.org/

That's about someone replacing an valid object in Linus's repository
with an invalid random blob they found that collides. This SHA1
break doesn't allow generating such a blob anyway. Linus is right,
that's an impractical attack.

Attacks using this SHA1 break will look something more like:

* I push a "bad" object to a repo on github I set up under a
  pseudonym.
* I publish a "good" object in a commit and convince the maintainer to
  merge it.
* I wait for the maintainer to push to github.
* I wait for github to deduplicate and hope they'll replace the good
  object with the bad one I pre-uploaded, thus silently changing the
  content of the good commit the maintainer reviewed and pushed.
* The bad object is pulled from github and deployed.
* The maintainer still has the good object. They may not notice the bad
  object is out there for a long time.

Of course, it doesn't need to involve Github, and doesn't need to
rely on internal details of their deduplication[1]; 
that only let me publish the bad object under a psydonym.

-- 
see shy jo

[1] Which I'm only guessing about, but now that we have colliding
objects, we can upload them to different repos and see if such
dedupication happens.


signature.asc
Description: PGP signature


SHA1 collisions found

2017-02-23 Thread Joey Hess
https://shattered.io/static/shattered.pdf
https://freedom-to-tinker.com/2017/02/23/rip-sha-1/

IIRC someone has been working on parameterizing git's SHA1 assumptions
so a repository could eventually use a more secure hash. How far has
that gotten? There are still many "40" constants in git.git HEAD.

In the meantime, git commit -S, and checks that commits are signed,
seems like the only way to mitigate against attacks such as
the ones described in the threads at
https://joeyh.name/blog/sha-1/ and
https://joeyh.name/blog/entry/size_of_the_git_sha1_collision_attack_surface/

Since we now have collisions in valid PDF files, collisions in valid git
commit and tree objects are probably able to be constructed.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

2016-07-12 Thread Joey Hess
Torsten Bögershausen wrote re jh/clean-smudge-annex:
> The thing is that we need to check the file system to find .gitatttibutes,
> even if we just did it 1 nanosecond ago.
> 
> So the .gitattributes is done 3 times:
> -1 would_convert_to_git_filter_fd(
> -2 assert(would_convert_to_git_filter_fd(path));
> -3 convert.c/convert_to_git_filter_fd()
> 
> The only situation where this could be useful is when the .gitattributes
> change between -1 and -2,
> but then they would have changed between -1 and -3, and convert.c
> will die().
> 
> Does it make sense to remove -2 ?

There's less redundant work going on than at first seems, because 
.gitattribute files are only read once and cached. Verified by strace.

So, the redundant work is only in the processing that convert_attrs() does
of the cached .gitattributes.

Notice that there was a similar redundant triple call to convert_attrs()
before my patch set:

1. index_fd checks would_convert_to_git_filter_fd
2. index_stream_convert_blob does assert(would_convert_to_git_filter_fd(path))
   (Again redundantly since 1. is its only caller and has already
   checked.)
3. in convert_to_git_filter_fd

If convert_attrs() is somehow expensive, it might be worth passing a
struct conv_attrs * into the functions that currently call
convert_attrs(). But it does not look expensive to me.

I think it would be safe enough to remove both asserts, at least as the
code is structured now.

-- 
see shy jo


signature.asc
Description: PGP signature


[PATCH v5 4/8] use smudgeToFile in git checkout etc

2016-07-11 Thread Joey Hess
This makes git checkout, git reset, etc use smudgeToFile.

Includes test cases.

(There's a call to convert_to_working_tree in merge-recursive.c
that could also be made to use smudgeToFile as well.)

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 40 
 t/t0021-conversion.sh | 34 --
 2 files changed, 64 insertions(+), 10 deletions(-)

diff --git a/entry.c b/entry.c
index 519e042..81d12a1 100644
--- a/entry.c
+++ b/entry.c
@@ -146,6 +146,7 @@ static int write_entry(struct cache_entry *ce,
unsigned long size;
size_t wrote, newsize = 0;
struct stat st;
+   int regular_file, smudge_to_file;
 
if (ce_mode_s_ifmt == S_IFREG) {
struct stream_filter *filter = get_stream_filter(ce->name, 
ce->sha1);
@@ -175,8 +176,13 @@ static int write_entry(struct cache_entry *ce,
 
/*
 * Convert from git internal format to working tree format
+* unless the smudgeToFile filter can write to the
+* file directly.
 */
-   if (ce_mode_s_ifmt == S_IFREG &&
+   regular_file = ce_mode_s_ifmt == S_IFREG;
+   smudge_to_file = regular_file
+   && can_smudge_to_file(ce->name);
+   if (regular_file && !smudge_to_file &&
convert_to_working_tree(ce->name, new, size, )) {
free(new);
new = strbuf_detach(, );
@@ -189,13 +195,31 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
-   wrote = write_in_full(fd, new, size);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   free(new);
-   if (wrote != size)
-   return error("unable to write file %s", path);
+   if (!smudge_to_file) {
+   wrote = write_in_full(fd, new, size);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   free(new);
+   if (wrote != size)
+   return error("unable to write file %s", path);
+   }
+   else {
+   close(fd);
+   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
+   free(new);
+   /*
+* The smudgeToFile filter may have replaced the
+* file; open it to make sure that the file
+* exists.
+*/
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create file %s", 
path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index bd84b80..ea18b17 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -14,12 +14,20 @@ chmod +x rot13.sh
 
 cat <rot13-from-file.sh
 #!$SHELL_PATH
-fsfile="\$1"
+srcfile="\$1"
 touch rot13-from-file.ran
-cat "\$fsfile" | ./rot13.sh
+cat "\$srcfile" | ./rot13.sh
 EOF
 chmod +x rot13-from-file.sh
 
+cat <rot13-to-file.sh
+#!$SHELL_PATH
+destfile="\$1"
+touch rot13-to-file.ran
+./rot13.sh >"\$destfile"
+EOF
+chmod +x rot13-to-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when 
adding a file' '
test_cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used when checking out a file' '
+   test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   test_cmp test fstest.t &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
test_config filter.noclean.smudge ./rot13.sh &&
test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
@@ -304,4 +323,15 @@ test_expect_success 'cleanFromFile filter is not used when 
clean filter is not c
test_cmp test actual
 '
 

[PATCH v5 0/8] extend smudge/clean filters with direct file access (for pu)

2016-07-11 Thread Joey Hess
Back from vacation with a reroll of jh/clean-smudge-annex.

Deals with conflicting changes from cc/apply-am in pu.

Since tb/convert-peek-in-index is not currently in pu, this reroll isn't
based on it, and will conflict if that topic gets added back into pu.
Not sure what the status of tb/convert-peek-in-index is at this point?

Improvements from Junio's review:

fix build with DEVELOPER=1
style fixes
use test_cmp in test cases
improve robustness of a test case
clean up some confusing code
small performance tweak

Joey Hess (8):
  clarify %f documentation
  add smudgeToFile and cleanFromFile filter configs
  use cleanFromFile in git add
  use smudgeToFile in git checkout etc
  warn on unusable smudgeToFile/cleanFromFile config
  better recovery from failure of smudgeToFile filter
  use smudgeToFile filter in git am
  use smudgeToFile filter in recursive merge

 Documentation/config.txt|  18 -
 Documentation/gitattributes.txt |  42 
 apply.c |  16 +
 convert.c   | 148 
 convert.h   |  10 +++
 entry.c |  59 
 merge-recursive.c   |  53 +++---
 sha1_file.c |  42 ++--
 t/t0021-conversion.sh   | 117 +++
 9 files changed, 459 insertions(+), 46 deletions(-)

-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 2/8] add smudgeToFile and cleanFromFile filter configs

2016-07-11 Thread Joey Hess
This adds new smudgeToFile and cleanFromFile filter commands,
which are similar to smudge and clean but allow direct access to files on
disk.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like cleanFromFile commands that avoid
reading the whole content of the file, and for smudgeToFile commands that
populate a work tree file using an efficient Copy On Write operation.

The new filter commands will not be used for all filtering. They are
efficient to use when git add is adding a file, or when the work tree is
being updated, but not a good fit when git is internally filtering blob
objects in memory for eg, a diff.

So, a user who wants to use smudgeToFile should also provide a smudge
command to be used in cases where smudgeToFile is not used. And ditto
with cleanFromFile and clean. To avoid foot-shooting configurations, the
new commands are not used unless the old commands are also configured.

That also ensures that a filter driver configuration that includes these
new commands will work, although less efficiently, when used with an older
version of git that does not support them.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/config.txt|  18 ++-
 Documentation/gitattributes.txt |  37 ++
 convert.c   | 111 +++-
 convert.h   |  10 
 4 files changed, 160 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 19493aa..a55bed8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1325,15 +1325,29 @@ format.useAutoBase::
format-patch by default.
 
 filter..clean::
-   The command which is used to convert the content of a worktree
+   The command which is used as a filter to convert the content of a 
worktree
file to a blob upon checkin.  See linkgit:gitattributes[5] for
details.
 
 filter..smudge::
-   The command which is used to convert the content of a blob
+   The command which is used as a filter to convert the content of a blob
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+filter..cleanFromFile::
+   Similar to filter..clean but the specified command
+   directly accesses a worktree file on disk, rather than
+   receiving the file content from standard input.
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
+
+filter..smudgeToFile::
+   Similar to filter..smudge but the specified command
+   writes the content of a blob directly to a worktree file,
+   rather than to standard output.
+   Only used when filter..smudge is also configured.
+   See linkgit:gitattributes[5] for details.
+
 fsck.::
Allows overriding the message type (error, warn or ignore) of a
specific message ID such as `missingEmail`.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 197ece8..a58aafc 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and 
clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+There are two extra commands "cleanFromFile" and "smudgeToFile", which
+can optionally be set in a filter driver. These are similar to the "clean"
+and "smudge" commands, but avoid needing to pipe the contents of files
+through the filters, and instead read/write files in the filesystem.
+This can be more efficient when using filters with large files that are not
+directly stored in the repository.
+
+Both "cleanFromFile" and "smudgeToFile" are provided a path as an
+added parameter after the configured command line.
+
+The "cleanFromFile" command is provided the path to the file that
+it should clean. Like the "clean" command, it should output the cleaned
+version to standard output.
+
+The "smudgeToFile" command is provided a path to the file that it
+should write to. (This file will already exist, as an empty file that can
+be written to or replaced.) Like the "smudge" command, "smudgeToFile"
+is fed the blob object from its standard input.
+
+Some git operations that need to apply filters cannot use "cleanFromFile"
+and "smudgeToFile", since the files are not present to disk. So, to avoid
+inconsistent behavior, "cleanFromFile" will only be used if "clean" is
+also configured, and "smudgeToFile" will only be used if "smudge" is also
+configured.
+
+An example large file storage filter driver using cleanFromFile and
+sm

[PATCH v5 7/8] use smudgeToFile filter in git am

2016-07-11 Thread Joey Hess
git am updates the work tree and so should use the smudgeToFile filter.

This includes some refactoring into convert_to_working_tree_filter_to_file
to make it check the file after running the smudgeToFile command, and clean
up from a failing command.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 apply.c   | 16 
 convert.c | 25 +++--
 entry.c   | 21 -
 t/t0021-conversion.sh | 13 +
 4 files changed, 56 insertions(+), 19 deletions(-)

diff --git a/apply.c b/apply.c
index 4a6b2db..7db8344 100644
--- a/apply.c
+++ b/apply.c
@@ -4322,6 +4322,22 @@ static int try_create_file(const char *path, unsigned 
int mode, const char *buf,
if (fd < 0)
return 1;
 
+   if (can_smudge_to_file(path)) {
+   close(fd);
+   fd = convert_to_working_tree_filter_to_file(path, path, buf, 
size);
+   if (fd < 0) {
+   /* smudgeToFile filter failed; continue
+* with regular file creation instead. */
+   fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 
0100) ? 0777 : 0666);
+   if (fd < 0)
+   return -1;
+   }
+   else {
+   close(fd);
+   return 0;
+   }
+   }
+
if (convert_to_working_tree(path, buf, size, )) {
size = nbuf.len;
buf  = nbuf.buf;
diff --git a/convert.c b/convert.c
index e1b0b44..3746ad5 100644
--- a/convert.c
+++ b/convert.c
@@ -1030,13 +1030,34 @@ int convert_to_working_tree(const char *path, const 
char *src, size_t len, struc
return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
 }
 
+/*
+ * Returns fd open to read the worktree file on success.
+ * On failure, the worktree file will not exist.
+ */
 int convert_to_working_tree_filter_to_file(const char *path, const char 
*destpath, const char *src, size_t len)
 {
struct strbuf output = STRBUF_INIT;
-   int ret = convert_to_working_tree_internal(path, destpath, src, len, 
, 0);
+   int ok = convert_to_working_tree_internal(path, destpath, src, len, 
, 0);
/* The smudgeToFile filter stdout is not used. */
strbuf_release();
-   return ret;
+   if (ok) {
+   /*
+* Open the file to make sure that it's present
+* (and readable) after the command populated it.
+*/
+   int fd = open(path, O_RDONLY);
+   if (fd < 0)
+   unlink(path);
+   return fd;
+   }
+   else {
+   /*
+* The command could have created the file before failing,
+* so delete it.
+*/
+   unlink(path);
+   return -1;
+   }
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct 
strbuf *dst)
diff --git a/entry.c b/entry.c
index 7811e31..40662eb 100644
--- a/entry.c
+++ b/entry.c
@@ -191,34 +191,21 @@ static int write_entry(struct cache_entry *ce,
 
if (smudge_to_file) {
close(fd);
-   if (convert_to_working_tree_filter_to_file(ce->name, 
path, new, size)) {
+   fd = convert_to_working_tree_filter_to_file(ce->name, 
path, new, size);
+   if (fd >= 0) {
free(new);
-   /*
-* The smudgeToFile filter may have replaced
-* or deleted the file; reopen it to make
-* sure that the file exists.
-*/
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("unable to create 
file %s", path);
if (!to_tempfile)
fstat_done = fstat_output(fd, state, 
);
close(fd);
}
else {
-   /*
-* The failing smudgeToFile filter may have
-* deleted or replaced the file; delete
-* the file and re-open for recovery write.
-*/
-   unlink(path);
+   /* Fall through to normal write below. */
+   smudge_to_file = 0;
fd = open_output_fd(path, ce, to_tempfile);
if (fd < 0) {
free(new);
retur

[PATCH v5 6/8] better recovery from failure of smudgeToFile filter

2016-07-11 Thread Joey Hess
If the smudgeToFile filter fails, it can leave the worktree file with the
wrong content, or even deleted. Recover from this by falling back to
running the smudge filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 66 ++-
 t/t0021-conversion.sh | 24 +++
 2 files changed, 68 insertions(+), 22 deletions(-)

diff --git a/entry.c b/entry.c
index 81d12a1..7811e31 100644
--- a/entry.c
+++ b/entry.c
@@ -182,12 +182,6 @@ static int write_entry(struct cache_entry *ce,
regular_file = ce_mode_s_ifmt == S_IFREG;
smudge_to_file = regular_file
&& can_smudge_to_file(ce->name);
-   if (regular_file && !smudge_to_file &&
-   convert_to_working_tree(ce->name, new, size, )) {
-   free(new);
-   new = strbuf_detach(, );
-   size = newsize;
-   }
 
fd = open_output_fd(path, ce, to_tempfile);
if (fd < 0) {
@@ -195,7 +189,51 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
+   if (smudge_to_file) {
+   close(fd);
+   if (convert_to_working_tree_filter_to_file(ce->name, 
path, new, size)) {
+   free(new);
+   /*
+* The smudgeToFile filter may have replaced
+* or deleted the file; reopen it to make
+* sure that the file exists.
+*/
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create 
file %s", path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, 
);
+   close(fd);
+   }
+   else {
+   /*
+* The failing smudgeToFile filter may have
+* deleted or replaced the file; delete
+* the file and re-open for recovery write.
+*/
+   unlink(path);
+   fd = open_output_fd(path, ce, to_tempfile);
+   if (fd < 0) {
+   free(new);
+   return error_errno("unable to create 
file %s", path);
+   }
+   /* Fall through to normal write below. */
+   smudge_to_file = 0;
+   }
+   }
+
+   /*
+* Not an else of above if (smudge_to_file) because the
+* smudgeToFile filter may fail and in that case this is
+* run to recover.
+*/
if (!smudge_to_file) {
+   if (regular_file &&
+   convert_to_working_tree(ce->name, new, size, )) 
{
+   free(new);
+   new = strbuf_detach(, );
+   size = newsize;
+   }
wrote = write_in_full(fd, new, size);
if (!to_tempfile)
fstat_done = fstat_output(fd, state, );
@@ -204,22 +242,6 @@ static int write_entry(struct cache_entry *ce,
if (wrote != size)
return error("unable to write file %s", path);
}
-   else {
-   close(fd);
-   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
-   free(new);
-   /*
-* The smudgeToFile filter may have replaced the
-* file; open it to make sure that the file
-* exists.
-*/
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("unable to create file %s", 
path);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index ea18b17..0efad9b 100755
--- a/t/t0021-conversion.sh
++

[PATCH v5 1/8] clarify %f documentation

2016-07-11 Thread Joey Hess
It's natural to expect %f to be an actual file on disk; help avoid that
mistake.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/gitattributes.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index f2afdb6..197ece8 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -379,6 +379,11 @@ substitution.  For example:
smudge = git-p4-filter --smudge %f
 
 
+Note that "%f" is the name of the path that is being worked on. Depending
+on the version that is being filtered, the corresponding file on disk may
+not exist, or may have different contents. So, smudge and clean commands
+should not try to access the file on disk, but only act as filters on the
+content provided to them on standard input.
 
 Interaction between checkin/checkout attributes
 ^^^
-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 5/8] warn on unusable smudgeToFile/cleanFromFile config

2016-07-11 Thread Joey Hess
Let the user know when they have a smudgeToFile/cleanFromFile config
that cannot be used because the corresponding smudge/clean config
is missing.

The warning is only displayed a maximum of once per git invocation,
and only when doing an operation that would use the filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 convert.c | 36 ++--
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/convert.c b/convert.c
index eb7774f..e1b0b44 100644
--- a/convert.c
+++ b/convert.c
@@ -845,34 +845,50 @@ int would_convert_to_git_filter_fd(const char *path)
return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
 }
 
+static int can_filter_file(const char *filefilter, const char *filefiltername,
+  const char *stdiofilter, const char *stdiofiltername,
+  const struct conv_attrs *ca,
+  int *warncount)
+{
+   if (!filefilter)
+   return 0;
+
+   if (stdiofilter)
+   return 1;
+
+   if (*warncount == 0)
+   warning("Not running your configured filter.%s.%s command, 
because filter.%s.%s is not configured",
+   ca->drv->name, filefiltername,
+   ca->drv->name, stdiofiltername);
+   *warncount=*warncount+1;
+
+   return 0;
+}
+
 int can_clean_from_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /*
-* Only use the cleanFromFile filter when the clean filter is also
-* configured.
-*/
-   return (ca.drv->clean_from_file && ca.drv->clean);
+   return can_filter_file(ca.drv->clean_from_file, "cleanFromFile",
+  ca.drv->clean, "clean", , );
 }
 
 int can_smudge_to_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /*
-* Only use the smudgeToFile filter when the smudge filter is also
-* configured.
-*/
-   return (ca.drv->smudge_to_file && ca.drv->smudge);
+   return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile",
+  ca.drv->smudge, "smudge", , );
 }
 
 const char *get_convert_attr_ascii(const char *path)
-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 8/8] use smudgeToFile filter in recursive merge

2016-07-11 Thread Joey Hess
Recursive merge updates the work tree and so should use the smudgeToFile
filter.

At this point, smudgeToFile is run by everything that updates work
tree files.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 merge-recursive.c | 53 ---
 t/t0021-conversion.sh | 16 +++-
 2 files changed, 57 insertions(+), 12 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index a4a1195..5fe3f50 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -758,6 +758,7 @@ static void update_file_flags(struct merge_options *o,
enum object_type type;
void *buf;
unsigned long size;
+   int isreg;
 
if (S_ISGITLINK(mode)) {
/*
@@ -774,22 +775,16 @@ static void update_file_flags(struct merge_options *o,
die(_("cannot read object %s '%s'"), oid_to_hex(oid), 
path);
if (type != OBJ_BLOB)
die(_("blob expected for %s '%s'"), oid_to_hex(oid), 
path);
-   if (S_ISREG(mode)) {
-   struct strbuf strbuf = STRBUF_INIT;
-   if (convert_to_working_tree(path, buf, size, )) {
-   free(buf);
-   size = strbuf.len;
-   buf = strbuf_detach(, NULL);
-   }
-   }
 
if (make_room_for_path(o, path) < 0) {
update_wd = 0;
free(buf);
goto update_index;
}
-   if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) {
+   isreg = S_ISREG(mode);
+   if (isreg || (!has_symlinks && S_ISLNK(mode))) {
int fd;
+   int smudge_to_file;
if (mode & 0100)
mode = 0777;
else
@@ -797,8 +792,44 @@ static void update_file_flags(struct merge_options *o,
fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
if (fd < 0)
die_errno(_("failed to open '%s'"), path);
-   write_in_full(fd, buf, size);
-   close(fd);
+
+   smudge_to_file = can_smudge_to_file(path);
+   if (smudge_to_file) {
+   close(fd);
+   fd = 
convert_to_working_tree_filter_to_file(path, path, buf, size);
+   if (fd < 0) {
+   /*
+* smudgeToFile filter failed;
+* continue with regular file
+* creation.
+*/
+   smudge_to_file = 0;
+   fd = open(path, O_WRONLY | O_TRUNC | 
O_CREAT, mode);
+   if (fd < 0)
+   die_errno(_("failed to open 
'%s'"), path);
+   }
+   else {
+   close(fd);
+   }
+   }
+
+   /*
+* Not an else of above if (smudge_to_file) because
+* the smudgeToFile filter may fail and in that case
+* this is run to recover.
+*/
+   if (!smudge_to_file) {
+   if (isreg) {
+   struct strbuf strbuf = STRBUF_INIT;
+   if (convert_to_working_tree(path, buf, 
size, )) {
+   free(buf);
+   size = strbuf.len;
+   buf = strbuf_detach(, 
NULL);
+   }
+   }
+   write_in_full(fd, buf, size);
+   close(fd);
+   }
} else if (S_ISLNK(mode)) {
char *lnk = xmemdupz(buf, size);
safe_create_leading_directories_const(path);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 42b28aa..64b2b8f 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of 
smudgeToFile filter that deletes t
test_cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used in merge' '
+   test_config filter.rot13.smudge

[PATCH v5 3/8] use cleanFromFile in git add

2016-07-11 Thread Joey Hess
Includes test cases.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 sha1_file.c   | 42 --
 t/t0021-conversion.sh | 36 
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 2fc22b0..549a20f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3335,6 +3335,29 @@ static int index_stream_convert_blob(unsigned char 
*sha1, int fd,
return ret;
 }
 
+static int index_from_file_convert_blob(unsigned char *sha1,
+ const char *path, unsigned flags)
+{
+   int ret;
+   const int write_object = flags & HASH_WRITE_OBJECT;
+   struct strbuf sbuf = STRBUF_INIT;
+
+   assert(path);
+   assert(can_clean_from_file(path));
+
+   convert_to_git_filter_from_file(path, ,
+write_object ? safe_crlf : SAFE_CRLF_FALSE);
+
+   if (write_object)
+   ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+ sha1);
+   else
+   ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+sha1);
+   strbuf_release();
+   return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
  const char *path, unsigned flags)
 {
@@ -3427,12 +3450,19 @@ int index_path(unsigned char *sha1, const char *path, 
struct stat *st, unsigned
 
switch (st->st_mode & S_IFMT) {
case S_IFREG:
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("open(\"%s\")", path);
-   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
-   return error("%s: failed to insert into database",
-path);
+   if (can_clean_from_file(path)) {
+   if (index_from_file_convert_blob(sha1, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
+   else {
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("open(\"%s\")", path);
+   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
break;
case S_IFLNK:
if (strbuf_readlink(, path, st->st_size))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..bd84b80 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -12,6 +12,14 @@ tr \
 EOF
 chmod +x rot13.sh
 
+cat <rot13-from-file.sh
+#!$SHELL_PATH
+fsfile="\$1"
+touch rot13-from-file.ran
+cat "\$fsfile" | ./rot13.sh
+EOF
+chmod +x rot13-from-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' '
test_must_be_empty err
 '
 
+test_expect_success 'cleanFromFile filter is used when adding a file' '
+   test_config filter.rot13.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.t filter=rot13" >.gitattributes &&
+
+   cat test >fstest.t &&
+   git add fstest.t &&
+   test -e rot13-from-file.ran &&
+   rm -f rot13-from-file.ran &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   test_cmp test fstest.t
+'
+
+test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
+   test_config filter.noclean.smudge ./rot13.sh &&
+   test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.no filter=noclean" >.gitattributes &&
+
+   cat test >test.no &&
+   git add test.no &&
+   test ! -e rot13-from-file.ran &&
+   git cat-file blob :test.no >actual &&
+   test_cmp test actual
+'
+
 test_done
-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Long running Git clean/smudge filter

2016-07-10 Thread Joey Hess
larsxschnei...@gmail.com wrote:
> (2) Joey's topic, which is the base for my patch, looks stalled for more than
> 2 weeks:
> http://thread.gmane.org/gmane.comp.version-control.git/297994/focus=298006
> I would be happy to address Junio's comments and post a reroll. However,
> I don't want to interfere with Joey's plans.

I've been on vacation and plan to get back to that in the upcoming week.

> @Joey (in case you are reading this):
> My patch changes your initial idea quite a bit. However, I believe it is an
> improvement that could be beneficial for git-annex, too. Would you prefer to
> work with me on the combination of our ideas (file clean/smudge + long running
> clean/smudge processes) or would you prefer to keep your interface?

Long running filters mean that you need a more complicated protocol to
speak over the pipes. Seems that such a protocol could be designed to work
with the original smudge/clean filters as well as with my
smudgeToFile/cleanFromFile filters. Assuming that there's a way to
tell whether the filters support being long-running or not.

Note that the interface we arrived at for smudgeToFile/cleanFromFile is as
similar as possible to smudge/clean, so the filter developer only has to
change one thing. That's a big plus, and so I don't like diverging the
two interfaces.

So, I don't want to entangle these two ideas very much.

-- 
see shy jo


signature.asc
Description: PGP signature


[PATCH v4 3/8] use cleanFromFile in git add

2016-06-22 Thread Joey Hess
Includes test cases.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 sha1_file.c   | 44 ++--
 t/t0021-conversion.sh | 36 
 2 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 55604b6..df62eaf 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3339,6 +3339,31 @@ static int index_stream_convert_blob(unsigned char 
*sha1, int fd,
return ret;
 }
 
+static int index_from_file_convert_blob(unsigned char *sha1,
+ const char *path, unsigned flags)
+{
+   int ret;
+   const int write_object = flags & HASH_WRITE_OBJECT;
+   const int valid_sha1 = flags & HASH_USE_SHA_NOT_PATH;
+   struct strbuf sbuf = STRBUF_INIT;
+
+   assert(path);
+   assert(can_clean_from_file(path));
+
+   convert_to_git_filter_from_file(path, ,
+write_object ? safe_crlf : SAFE_CRLF_FALSE,
+valid_sha1 ? sha1 : NULL);
+
+   if (write_object)
+   ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+ sha1);
+   else
+   ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+sha1);
+   strbuf_release();
+   return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
  const char *path, unsigned flags)
 {
@@ -3433,12 +3458,19 @@ int index_path(unsigned char *sha1, const char *path, 
struct stat *st, unsigned
 
switch (st->st_mode & S_IFMT) {
case S_IFREG:
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("open(\"%s\")", path);
-   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
-   return error("%s: failed to insert into database",
-path);
+   if (can_clean_from_file(path)) {
+   if (index_from_file_convert_blob(sha1, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
+   else {
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("open(\"%s\")", path);
+   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
break;
case S_IFLNK:
if (strbuf_readlink(, path, st->st_size))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..407d5d6 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -12,6 +12,14 @@ tr \
 EOF
 chmod +x rot13.sh
 
+cat <rot13-from-file.sh
+#!$SHELL_PATH
+fsfile="\$1"
+touch rot13-from-file.ran
+cat "\$fsfile" | ./rot13.sh
+EOF
+chmod +x rot13-from-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' '
test_must_be_empty err
 '
 
+test_expect_success 'cleanFromFile filter is used when adding a file' '
+   test_config filter.rot13.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.t filter=rot13" >.gitattributes &&
+
+   cat test >fstest.t &&
+   git add fstest.t &&
+   test -e rot13-from-file.ran &&
+   rm -f rot13-from-file.ran &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test fstest.t
+'
+
+test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
+   test_config filter.noclean.smudge ./rot13.sh &&
+   test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.no filter=noclean" >.gitattributes &&
+
+   cat test >test.no &&
+   git add test.no &&
+   test ! -e rot13-from-file.ran &&
+   git cat-file blob :test.no >actual &&
+   cmp test actual
+'
+
 test_done
-- 
2.9.0.587.ga3bedf2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 2/8] add smudgeToFile and cleanFromFile filter configs

2016-06-22 Thread Joey Hess
This adds new smudgeToFile and cleanFromFile filter commands,
which are similar to smudge and clean but allow direct access to files on
disk.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like cleanFromFile commands that avoid
reading the whole content of the file, and for smudgeToFile commands that
populate a work tree file using an efficient Copy On Write operation.

The new filter commands will not be used for all filtering. They are
efficient to use when git add is adding a file, or when the work tree is
being updated, but not a good fit when git is internally filtering blob
objects in memory for eg, a diff.

So, a user who wants to use smudgeToFile should also provide a smudge
command to be used in cases where smudgeToFile is not used. And ditto
with cleanFromFile and clean. To avoid foot-shooting configurations, the
new commands are not used unless the old commands are also configured.

That also ensures that a filter driver configuration that includes these
new commands will work, although less efficiently, when used with an older
version of git that does not support them.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/config.txt|  18 ++-
 Documentation/gitattributes.txt |  37 +
 convert.c   | 114 ++--
 convert.h   |  11 
 4 files changed, 162 insertions(+), 18 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 0d4095f..b76dad3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1306,15 +1306,29 @@ format.useAutoBase::
format-patch by default.
 
 filter..clean::
-   The command which is used to convert the content of a worktree
+   The command which is used as a filter to convert the content of a 
worktree
file to a blob upon checkin.  See linkgit:gitattributes[5] for
details.
 
 filter..smudge::
-   The command which is used to convert the content of a blob
+   The command which is used as a filter to convert the content of a blob
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+filter..cleanFromFile::
+   Similar to filter..clean but the specified command
+   directly accesses a worktree file on disk, rather than
+   receiving the file content from standard input.
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
+
+filter..smudgeToFile::
+   Similar to filter..smudge but the specified command
+   writes the content of a blob directly to a worktree file,
+   rather than to standard output.
+   Only used when filter..smudge is also configured.
+   See linkgit:gitattributes[5] for details.
+
 fsck.::
Allows overriding the message type (error, warn or ignore) of a
specific message ID such as `missingEmail`.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 197ece8..a58aafc 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -385,6 +385,43 @@ not exist, or may have different contents. So, smudge and 
clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+There are two extra commands "cleanFromFile" and "smudgeToFile", which
+can optionally be set in a filter driver. These are similar to the "clean"
+and "smudge" commands, but avoid needing to pipe the contents of files
+through the filters, and instead read/write files in the filesystem.
+This can be more efficient when using filters with large files that are not
+directly stored in the repository.
+
+Both "cleanFromFile" and "smudgeToFile" are provided a path as an
+added parameter after the configured command line.
+
+The "cleanFromFile" command is provided the path to the file that
+it should clean. Like the "clean" command, it should output the cleaned
+version to standard output.
+
+The "smudgeToFile" command is provided a path to the file that it
+should write to. (This file will already exist, as an empty file that can
+be written to or replaced.) Like the "smudge" command, "smudgeToFile"
+is fed the blob object from its standard input.
+
+Some git operations that need to apply filters cannot use "cleanFromFile"
+and "smudgeToFile", since the files are not present to disk. So, to avoid
+inconsistent behavior, "cleanFromFile" will only be used if "clean" is
+also configured, and "smudgeToFile" will only be used if "smudge" is also
+configured.
+
+An example large file storage filter driver using cleanFromFile and
+sm

Re: What's cooking in git.git (Jun 2016, #05; Thu, 16)

2016-06-22 Thread Joey Hess
Torsten Bögershausen wrote:
> There is a conflict in pu:
> "jh/clean-smudge-annex" does not work together with "tb/convert-peek-in-index"
> 
> (And currently pu didn't compile)

I'm sending a v4 of jh/clean-smudge-annex that is rebased on top of
tb/convert-peek-in-index to fix this.

> (I will hopefully be able to do a separate review of the smudge/clean patch)

Would be appreciated. It'll be 2 weeks until I can work on this any more.

> (And jo...@joeyh.name is not reachable from web.de)

I'd like to fix whatever's broken; you could send details out of band to
joeyh...@gmail.com

-- 
see shy jo


signature.asc
Description: PGP signature


[PATCH v4 7/8] use smudgeToFile filter in git am

2016-06-22 Thread Joey Hess
git am updates the work tree and so should use the smudgeToFile filter.

This includes some refactoring into convert_to_working_tree_filter_to_file
to make it check the file after running the smudgeToFile command, and clean
up from a failing command.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 builtin/apply.c   | 16 
 convert.c | 19 +--
 entry.c   | 15 +++
 t/t0021-conversion.sh | 13 +
 4 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index b24c6ba..b027ab7 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -4250,6 +4250,22 @@ static int try_create_file(const char *path, unsigned 
int mode, const char *buf,
if (fd < 0)
return -1;
 
+   if (can_smudge_to_file(path)) {
+   close(fd);
+   fd = convert_to_working_tree_filter_to_file(path, path, buf, 
size);
+   if (fd < 0) {
+   /* smudgeToFile filter failed; continue
+* with regular file creation. */
+   fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 
0100) ? 0777 : 0666);
+   if (fd < 0)
+   return -1;
+   }
+   else {
+   close(fd);
+   return 0;
+   }
+   }
+
if (convert_to_working_tree(path, buf, size, )) {
size = nbuf.len;
buf  = nbuf.buf;
diff --git a/convert.c b/convert.c
index 378a3bd..4fe89e0 100644
--- a/convert.c
+++ b/convert.c
@@ -1048,13 +1048,28 @@ int convert_to_working_tree(const char *path, const 
char *src, size_t len, struc
return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
 }
 
+/* Returns fd open to read the worktree file on success.
+ * On failure, the worktree file will not exist. */
 int convert_to_working_tree_filter_to_file(const char *path, const char 
*destpath, const char *src, size_t len)
 {
struct strbuf output = STRBUF_INIT;
-   int ret = convert_to_working_tree_internal(path, destpath, src, len, 
, 0);
+   int ok = convert_to_working_tree_internal(path, destpath, src, len, 
, 0);
/* The smudgeToFile filter stdout is not used. */
strbuf_release();
-   return ret;
+   if (ok) {
+   /* Open the file to make sure that it's present
+* (and readable) after the command populated it. */
+   int fd = open(path, O_RDONLY);
+   if (fd < 0)
+   unlink(path);
+   return fd;
+   }
+   else {
+   /* The command could have created the file before failing,
+* so delete it. */
+   unlink(path);
+   return -1;
+   }
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct 
strbuf *dst)
diff --git a/entry.c b/entry.c
index 8322127..2f7c4fd 100644
--- a/entry.c
+++ b/entry.c
@@ -190,13 +190,10 @@ static int write_entry(struct cache_entry *ce,
 
if (smudge_to_file) {
close(fd);
-   if (! convert_to_working_tree_filter_to_file(ce->name, 
path, new, size)) {
+   fd = convert_to_working_tree_filter_to_file(ce->name, 
path, new, size);
+   if (fd < 0) {
smudge_to_file = 0;
-   /* The failing smudgeToFile filter may have
-* deleted or replaced the file; delete
-* the file and re-open for recovery write.
-*/
-   unlink(path);
+   /* re-open for recovery write */
fd = open_output_fd(path, ce, to_tempfile);
if (fd < 0) {
free(new);
@@ -205,12 +202,6 @@ static int write_entry(struct cache_entry *ce,
}
else {
free(new);
-   /* The smudgeToFile filter may have replaced
-* or deleted the file; reopen it to make sure
-* that the file exists. */
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("unable to create 
file %s", path);
if (!to_tempfile)
fstat_done = fstat_output(fd, state, 
);
close(fd);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index c0b4709..fd07bd6 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,

[PATCH v4 5/8] warn on unusable smudgeToFile/cleanFromFile config

2016-06-22 Thread Joey Hess
Let the user know when they have a smudgeToFile/cleanFromFile config
that cannot be used because the corresponding smudge/clean config
is missing.

The warning is only displayed a maximum of once per git invocation,
and only when doing an operation that would use the filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 convert.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/convert.c b/convert.c
index 9dde406..378a3bd 100644
--- a/convert.c
+++ b/convert.c
@@ -859,32 +859,50 @@ int would_convert_to_git_filter_fd(const char *path)
return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
 }
 
+static int can_filter_file(const char *filefilter, const char *filefiltername,
+  const char *stdiofilter, const char *stdiofiltername,
+  const struct conv_attrs *ca,
+  int *warncount)
+{
+   if (! filefilter)
+   return 0;
+
+   if (stdiofilter)
+   return 1;
+
+   if (*warncount == 0)
+   warning("Not running your configured filter.%s.%s command, 
because filter.%s.%s is not configured",
+   ca->drv->name, filefiltername,
+   ca->drv->name, stdiofiltername);
+   *warncount=*warncount+1;
+
+   return 0;
+}
+
 int can_clean_from_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /* Only use the cleanFromFile filter when the clean filter is also
-* configured.
-*/
-   return (ca.drv->clean_from_file && ca.drv->clean);
+   return can_filter_file(ca.drv->clean_from_file, "cleanFromFile",
+  ca.drv->clean, "clean", , );
 }
 
 int can_smudge_to_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /* Only use the smudgeToFile filter when the smudge filter is also
-* configured.
-*/
-   return (ca.drv->smudge_to_file && ca.drv->smudge);
+   return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile",
+  ca.drv->smudge, "smudge", , );
 }
 
 const char *get_convert_attr_ascii(const char *path)
-- 
2.9.0.587.ga3bedf2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 8/8] use smudgeToFile filter in recursive merge

2016-06-22 Thread Joey Hess
Recursive merge updates the work tree and so should use the smudgeToFile
filter.

At this point, smudgeToFile is run by everything that updates work
tree files.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 merge-recursive.c | 42 --
 t/t0021-conversion.sh | 16 +++-
 2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 48fe7e7..7d38cf8 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -765,14 +765,6 @@ static void update_file_flags(struct merge_options *o,
die(_("cannot read object %s '%s'"), oid_to_hex(oid), 
path);
if (type != OBJ_BLOB)
die(_("blob expected for %s '%s'"), oid_to_hex(oid), 
path);
-   if (S_ISREG(mode)) {
-   struct strbuf strbuf = STRBUF_INIT;
-   if (convert_to_working_tree(path, buf, size, )) {
-   free(buf);
-   size = strbuf.len;
-   buf = strbuf_detach(, NULL);
-   }
-   }
 
if (make_room_for_path(o, path) < 0) {
update_wd = 0;
@@ -781,6 +773,7 @@ static void update_file_flags(struct merge_options *o,
}
if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) {
int fd;
+   int isreg = S_ISREG(mode);
if (mode & 0100)
mode = 0777;
else
@@ -788,8 +781,37 @@ static void update_file_flags(struct merge_options *o,
fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
if (fd < 0)
die_errno(_("failed to open '%s'"), path);
-   write_in_full(fd, buf, size);
-   close(fd);
+
+   int smudge_to_file = can_smudge_to_file(path);
+   if (smudge_to_file) {
+   close(fd);
+   fd = 
convert_to_working_tree_filter_to_file(path, path, buf, size);
+   if (fd < 0) {
+   /* smudgeToFile filter failed;
+* continue with regular file
+* creation. */
+   smudge_to_file = 0;
+   fd = open(path, O_WRONLY | O_TRUNC | 
O_CREAT, mode);
+   if (fd < 0)
+   die_errno(_("failed to open 
'%s'"), path);
+   }
+   else {
+   close(fd);
+   }
+   }
+
+   if (! smudge_to_file) {
+   if (isreg) {
+   struct strbuf strbuf = STRBUF_INIT;
+   if (convert_to_working_tree(path, buf, 
size, )) {
+   free(buf);
+   size = strbuf.len;
+   buf = strbuf_detach(, 
NULL);
+   }
+   }
+   write_in_full(fd, buf, size);
+   close(fd);
+   }
} else if (S_ISLNK(mode)) {
char *lnk = xmemdupz(buf, size);
safe_create_leading_directories_const(path);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index fd07bd6..2722013 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of 
smudgeToFile filter that deletes t
cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used in merge' '
+   test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+   git commit -m "added fstest.t" fstest.t &&
+   git checkout -b old &&
+   git reset --hard HEAD^ &&
+   git merge master &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran &&
+
+   cmp test fstest.t &&
+   git checkout master
+'
+
 test_expect_success 'smudgeToFile filter is used by git am' '
test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
 
-   git commit fstest.t -m "added fstest.t" &&
git format-patch HEAD^ --stdout > fstest.patch &&
git reset --hard HEAD^ &&
git am &

[PATCH v4 6/8] better recovery from failure of smudgeToFile filter

2016-06-22 Thread Joey Hess
If the smudgeToFile filter fails, it can leave the worktree file with the
wrong content, or even deleted. Recover from this by falling back to
running the smudge filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 55 ---
 t/t0021-conversion.sh | 24 ++
 2 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/entry.c b/entry.c
index 97975e5..8322127 100644
--- a/entry.c
+++ b/entry.c
@@ -181,12 +181,6 @@ static int write_entry(struct cache_entry *ce,
int regular_file = ce_mode_s_ifmt == S_IFREG;
int smudge_to_file = regular_file
&& can_smudge_to_file(ce->name);
-   if (regular_file && ! smudge_to_file &&
-   convert_to_working_tree(ce->name, new, size, )) {
-   free(new);
-   new = strbuf_detach(, );
-   size = newsize;
-   }
 
fd = open_output_fd(path, ce, to_tempfile);
if (fd < 0) {
@@ -194,7 +188,42 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
+   if (smudge_to_file) {
+   close(fd);
+   if (! convert_to_working_tree_filter_to_file(ce->name, 
path, new, size)) {
+   smudge_to_file = 0;
+   /* The failing smudgeToFile filter may have
+* deleted or replaced the file; delete
+* the file and re-open for recovery write.
+*/
+   unlink(path);
+   fd = open_output_fd(path, ce, to_tempfile);
+   if (fd < 0) {
+   free(new);
+   return error_errno("unable to create 
file %s", path);
+   }
+   }
+   else {
+   free(new);
+   /* The smudgeToFile filter may have replaced
+* or deleted the file; reopen it to make sure
+* that the file exists. */
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create 
file %s", path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, 
);
+   close(fd);
+   }
+   }
+
if (! smudge_to_file) {
+   if (regular_file &&
+   convert_to_working_tree(ce->name, new, size, )) 
{
+   free(new);
+   new = strbuf_detach(, );
+   size = newsize;
+   }
wrote = write_in_full(fd, new, size);
if (!to_tempfile)
fstat_done = fstat_output(fd, state, );
@@ -203,20 +232,6 @@ static int write_entry(struct cache_entry *ce,
if (wrote != size)
return error("unable to write file %s", path);
}
-   else {
-   close(fd);
-   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
-   free(new);
-   /* The smudgeToFile filter may have replaced the
-* file; open it to make sure that the file
-* exists. */
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("unable to create file %s", 
path);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index cba03fd..c0b4709 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,6 +28,14 @@ touch rot13-to-file.ran
 EOF
 chmod +x rot13-to-file.sh
 
+cat <delete-file-and-fail.sh
+#!$SHELL_PATH
+destfile="\$1"
+rm -f "\$destfile"
+exit 1
+EOF
+chmod +x delete-file-and-fail.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -310,6 +318,22 @@ test

[PATCH v4 1/8] clarify %f documentation

2016-06-22 Thread Joey Hess
It's natural to expect %f to be an actual file on disk; help avoid that
mistake.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/gitattributes.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index f2afdb6..197ece8 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -379,6 +379,11 @@ substitution.  For example:
smudge = git-p4-filter --smudge %f
 
 
+Note that "%f" is the name of the path that is being worked on. Depending
+on the version that is being filtered, the corresponding file on disk may
+not exist, or may have different contents. So, smudge and clean commands
+should not try to access the file on disk, but only act as filters on the
+content provided to them on standard input.
 
 Interaction between checkin/checkout attributes
 ^^^
-- 
2.9.0.587.ga3bedf2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 0/8] extend smudge/clean filters with direct file access (for pu)

2016-06-22 Thread Joey Hess
This is the same as v3, except rebased on top of tb/convert-peek-in-index
to fix a build failure in pu.

Joey Hess (8):
  clarify %f documentation
  add smudgeToFile and cleanFromFile filter configs
  use cleanFromFile in git add
  use smudgeToFile in git checkout etc
  warn on unusable smudgeToFile/cleanFromFile config
  better recovery from failure of smudgeToFile filter
  use smudgeToFile filter in git am
  use smudgeToFile filter in recursive merge

 Documentation/config.txt|  18 -
 Documentation/gitattributes.txt |  42 
 builtin/apply.c |  16 +
 convert.c   | 147 +++-
 convert.h   |  11 +++
 entry.c |  53 +++
 merge-recursive.c   |  42 +---
 sha1_file.c |  44 ++--
 t/t0021-conversion.sh   | 115 +++
 9 files changed, 441 insertions(+), 47 deletions(-)

-- 
2.9.0.587.ga3bedf2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 4/8] use smudgeToFile in git checkout etc

2016-06-22 Thread Joey Hess
This makes git checkout, git reset, etc use smudgeToFile.

Includes test cases.

(There's a call to convert_to_working_tree in merge-recursive.c
that could also be made to use smudgeToFile as well.)

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 37 +
 t/t0021-conversion.sh | 38 +-
 2 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/entry.c b/entry.c
index 519e042..97975e5 100644
--- a/entry.c
+++ b/entry.c
@@ -175,8 +175,13 @@ static int write_entry(struct cache_entry *ce,
 
/*
 * Convert from git internal format to working tree format
+* unless the smudgeToFile filter can write to the
+* file directly.
 */
-   if (ce_mode_s_ifmt == S_IFREG &&
+   int regular_file = ce_mode_s_ifmt == S_IFREG;
+   int smudge_to_file = regular_file
+   && can_smudge_to_file(ce->name);
+   if (regular_file && ! smudge_to_file &&
convert_to_working_tree(ce->name, new, size, )) {
free(new);
new = strbuf_detach(, );
@@ -189,13 +194,29 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
-   wrote = write_in_full(fd, new, size);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   free(new);
-   if (wrote != size)
-   return error("unable to write file %s", path);
+   if (! smudge_to_file) {
+   wrote = write_in_full(fd, new, size);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   free(new);
+   if (wrote != size)
+   return error("unable to write file %s", path);
+   }
+   else {
+   close(fd);
+   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
+   free(new);
+   /* The smudgeToFile filter may have replaced the
+* file; open it to make sure that the file
+* exists. */
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create file %s", 
path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 407d5d6..cba03fd 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -14,12 +14,20 @@ chmod +x rot13.sh
 
 cat <rot13-from-file.sh
 #!$SHELL_PATH
-fsfile="\$1"
+srcfile="\$1"
 touch rot13-from-file.ran
-cat "\$fsfile" | ./rot13.sh
+cat "\$srcfile" | ./rot13.sh
 EOF
 chmod +x rot13-from-file.sh
 
+cat <rot13-to-file.sh
+#!$SHELL_PATH
+destfile="\$1"
+touch rot13-to-file.ran
+./rot13.sh > "\$destfile"
+EOF
+chmod +x rot13-to-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when 
adding a file' '
cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used when checking out a file' '
+   test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test fstest.t &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
test_config filter.noclean.smudge ./rot13.sh &&
test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
@@ -299,9 +318,18 @@ test_expect_success 'cleanFromFile filter is not used when 
clean filter is not c
 
cat test >test.no &&
git add test.no &&
-   test ! -e rot13-from-file.ran &&
-   git cat-file blob :test.no >actual &&
-   cmp test actual
+   test ! -e rot13-from-file.ran
+'
+
+test_expect_success 'smudgeToFile filter is not used when smudge filter is not 
configured' '
+   test_config filter.nosmudge.clean ./rot13.sh &&
+   test_

[PATCH v3 3/8] use cleanFromFile in git add

2016-06-22 Thread Joey Hess
Includes test cases.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 sha1_file.c   | 42 --
 t/t0021-conversion.sh | 36 
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..8df86a0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3329,6 +3329,29 @@ static int index_stream_convert_blob(unsigned char 
*sha1, int fd,
return ret;
 }
 
+static int index_from_file_convert_blob(unsigned char *sha1,
+ const char *path, unsigned flags)
+{
+   int ret;
+   const int write_object = flags & HASH_WRITE_OBJECT;
+   struct strbuf sbuf = STRBUF_INIT;
+
+   assert(path);
+   assert(can_clean_from_file(path));
+
+   convert_to_git_filter_from_file(path, ,
+write_object ? safe_crlf : SAFE_CRLF_FALSE);
+
+   if (write_object)
+   ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+ sha1);
+   else
+   ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+sha1);
+   strbuf_release();
+   return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
  const char *path, unsigned flags)
 {
@@ -3421,12 +3444,19 @@ int index_path(unsigned char *sha1, const char *path, 
struct stat *st, unsigned
 
switch (st->st_mode & S_IFMT) {
case S_IFREG:
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("open(\"%s\")", path);
-   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
-   return error("%s: failed to insert into database",
-path);
+   if (can_clean_from_file(path)) {
+   if (index_from_file_convert_blob(sha1, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
+   else {
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("open(\"%s\")", path);
+   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
break;
case S_IFLNK:
if (strbuf_readlink(, path, st->st_size))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..407d5d6 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -12,6 +12,14 @@ tr \
 EOF
 chmod +x rot13.sh
 
+cat <rot13-from-file.sh
+#!$SHELL_PATH
+fsfile="\$1"
+touch rot13-from-file.ran
+cat "\$fsfile" | ./rot13.sh
+EOF
+chmod +x rot13-from-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' '
test_must_be_empty err
 '
 
+test_expect_success 'cleanFromFile filter is used when adding a file' '
+   test_config filter.rot13.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.t filter=rot13" >.gitattributes &&
+
+   cat test >fstest.t &&
+   git add fstest.t &&
+   test -e rot13-from-file.ran &&
+   rm -f rot13-from-file.ran &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test fstest.t
+'
+
+test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
+   test_config filter.noclean.smudge ./rot13.sh &&
+   test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.no filter=noclean" >.gitattributes &&
+
+   cat test >test.no &&
+   git add test.no &&
+   test ! -e rot13-from-file.ran &&
+   git cat-file blob :test.no >actual &&
+   cmp test actual
+'
+
 test_done
-- 
2.9.0.8.g973eabb.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 8/8] use smudgeToFile filter in recursive merge

2016-06-22 Thread Joey Hess
Recursive merge updates the work tree and so should use the smudgeToFile
filter.

At this point, smudgeToFile is run by everything that updates work
tree files.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 merge-recursive.c | 42 --
 t/t0021-conversion.sh | 16 +++-
 2 files changed, 47 insertions(+), 11 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 65cb5d6..012fe38 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -765,14 +765,6 @@ static void update_file_flags(struct merge_options *o,
die(_("cannot read object %s '%s'"), sha1_to_hex(sha), 
path);
if (type != OBJ_BLOB)
die(_("blob expected for %s '%s'"), sha1_to_hex(sha), 
path);
-   if (S_ISREG(mode)) {
-   struct strbuf strbuf = STRBUF_INIT;
-   if (convert_to_working_tree(path, buf, size, )) {
-   free(buf);
-   size = strbuf.len;
-   buf = strbuf_detach(, NULL);
-   }
-   }
 
if (make_room_for_path(o, path) < 0) {
update_wd = 0;
@@ -781,6 +773,7 @@ static void update_file_flags(struct merge_options *o,
}
if (S_ISREG(mode) || (!has_symlinks && S_ISLNK(mode))) {
int fd;
+   int isreg = S_ISREG(mode);
if (mode & 0100)
mode = 0777;
else
@@ -788,8 +781,37 @@ static void update_file_flags(struct merge_options *o,
fd = open(path, O_WRONLY | O_TRUNC | O_CREAT, mode);
if (fd < 0)
die_errno(_("failed to open '%s'"), path);
-   write_in_full(fd, buf, size);
-   close(fd);
+
+   int smudge_to_file = can_smudge_to_file(path);
+   if (smudge_to_file) {
+   close(fd);
+   fd = 
convert_to_working_tree_filter_to_file(path, path, buf, size);
+   if (fd < 0) {
+   /* smudgeToFile filter failed;
+* continue with regular file
+* creation. */
+   smudge_to_file = 0;
+   fd = open(path, O_WRONLY | O_TRUNC | 
O_CREAT, mode);
+   if (fd < 0)
+   die_errno(_("failed to open 
'%s'"), path);
+   }
+   else {
+   close(fd);
+   }
+   }
+
+   if (! smudge_to_file) {
+   if (isreg) {
+   struct strbuf strbuf = STRBUF_INIT;
+   if (convert_to_working_tree(path, buf, 
size, )) {
+   free(buf);
+   size = strbuf.len;
+   buf = strbuf_detach(, 
NULL);
+   }
+   }
+   write_in_full(fd, buf, size);
+   close(fd);
+   }
} else if (S_ISLNK(mode)) {
char *lnk = xmemdupz(buf, size);
safe_create_leading_directories_const(path);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index fd07bd6..2722013 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,10 +334,24 @@ test_expect_success 'recovery from failure of 
smudgeToFile filter that deletes t
cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used in merge' '
+   test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+   git commit -m "added fstest.t" fstest.t &&
+   git checkout -b old &&
+   git reset --hard HEAD^ &&
+   git merge master &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran &&
+
+   cmp test fstest.t &&
+   git checkout master
+'
+
 test_expect_success 'smudgeToFile filter is used by git am' '
test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
 
-   git commit fstest.t -m "added fstest.t" &&
git format-patch HEAD^ --stdout > fstest.patch &&
git reset --hard HEAD^ &&
git am < 

[PATCH v3 4/8] use smudgeToFile in git checkout etc

2016-06-22 Thread Joey Hess
This makes git checkout, git reset, etc use smudgeToFile.

Includes test cases.

(There's a call to convert_to_working_tree in merge-recursive.c
that could also be made to use smudgeToFile as well.)

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 37 +
 t/t0021-conversion.sh | 38 +-
 2 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/entry.c b/entry.c
index 519e042..97975e5 100644
--- a/entry.c
+++ b/entry.c
@@ -175,8 +175,13 @@ static int write_entry(struct cache_entry *ce,
 
/*
 * Convert from git internal format to working tree format
+* unless the smudgeToFile filter can write to the
+* file directly.
 */
-   if (ce_mode_s_ifmt == S_IFREG &&
+   int regular_file = ce_mode_s_ifmt == S_IFREG;
+   int smudge_to_file = regular_file
+   && can_smudge_to_file(ce->name);
+   if (regular_file && ! smudge_to_file &&
convert_to_working_tree(ce->name, new, size, )) {
free(new);
new = strbuf_detach(, );
@@ -189,13 +194,29 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
-   wrote = write_in_full(fd, new, size);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   free(new);
-   if (wrote != size)
-   return error("unable to write file %s", path);
+   if (! smudge_to_file) {
+   wrote = write_in_full(fd, new, size);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   free(new);
+   if (wrote != size)
+   return error("unable to write file %s", path);
+   }
+   else {
+   close(fd);
+   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
+   free(new);
+   /* The smudgeToFile filter may have replaced the
+* file; open it to make sure that the file
+* exists. */
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create file %s", 
path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 407d5d6..cba03fd 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -14,12 +14,20 @@ chmod +x rot13.sh
 
 cat <rot13-from-file.sh
 #!$SHELL_PATH
-fsfile="\$1"
+srcfile="\$1"
 touch rot13-from-file.ran
-cat "\$fsfile" | ./rot13.sh
+cat "\$srcfile" | ./rot13.sh
 EOF
 chmod +x rot13-from-file.sh
 
+cat <rot13-to-file.sh
+#!$SHELL_PATH
+destfile="\$1"
+touch rot13-to-file.ran
+./rot13.sh > "\$destfile"
+EOF
+chmod +x rot13-to-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when 
adding a file' '
cmp test fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used when checking out a file' '
+   test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test fstest.t &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
test_config filter.noclean.smudge ./rot13.sh &&
test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
@@ -299,9 +318,18 @@ test_expect_success 'cleanFromFile filter is not used when 
clean filter is not c
 
cat test >test.no &&
git add test.no &&
-   test ! -e rot13-from-file.ran &&
-   git cat-file blob :test.no >actual &&
-   cmp test actual
+   test ! -e rot13-from-file.ran
+'
+
+test_expect_success 'smudgeToFile filter is not used when smudge filter is not 
configured' '
+   test_config filter.nosmudge.clean ./rot13.sh &&
+   test

[PATCH v3 7/8] use smudgeToFile filter in git am

2016-06-22 Thread Joey Hess
git am updates the work tree and so should use the smudgeToFile filter.

This includes some refactoring into convert_to_working_tree_filter_to_file
to make it check the file after running the smudgeToFile command, and clean
up from a failing command.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 builtin/apply.c   | 16 
 convert.c | 19 +--
 entry.c   | 15 +++
 t/t0021-conversion.sh | 13 +
 4 files changed, 49 insertions(+), 14 deletions(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index c770d7d..acd7c3e 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -4109,6 +4109,22 @@ static int try_create_file(const char *path, unsigned 
int mode, const char *buf,
if (fd < 0)
return -1;
 
+   if (can_smudge_to_file(path)) {
+   close(fd);
+   fd = convert_to_working_tree_filter_to_file(path, path, buf, 
size);
+   if (fd < 0) {
+   /* smudgeToFile filter failed; continue
+* with regular file creation. */
+   fd = open(path, O_CREAT | O_EXCL | O_WRONLY, (mode & 
0100) ? 0777 : 0666);
+   if (fd < 0)
+   return -1;
+   }
+   else {
+   close(fd);
+   return 0;
+   }
+   }
+
if (convert_to_working_tree(path, buf, size, )) {
size = nbuf.len;
buf  = nbuf.buf;
diff --git a/convert.c b/convert.c
index b6a76a9..948b52f 100644
--- a/convert.c
+++ b/convert.c
@@ -1031,13 +1031,28 @@ int convert_to_working_tree(const char *path, const 
char *src, size_t len, struc
return convert_to_working_tree_internal(path, NULL, src, len, dst, 0);
 }
 
+/* Returns fd open to read the worktree file on success.
+ * On failure, the worktree file will not exist. */
 int convert_to_working_tree_filter_to_file(const char *path, const char 
*destpath, const char *src, size_t len)
 {
struct strbuf output = STRBUF_INIT;
-   int ret = convert_to_working_tree_internal(path, destpath, src, len, 
, 0);
+   int ok = convert_to_working_tree_internal(path, destpath, src, len, 
, 0);
/* The smudgeToFile filter stdout is not used. */
strbuf_release();
-   return ret;
+   if (ok) {
+   /* Open the file to make sure that it's present
+* (and readable) after the command populated it. */
+   int fd = open(path, O_RDONLY);
+   if (fd < 0)
+   unlink(path);
+   return fd;
+   }
+   else {
+   /* The command could have created the file before failing,
+* so delete it. */
+   unlink(path);
+   return -1;
+   }
 }
 
 int renormalize_buffer(const char *path, const char *src, size_t len, struct 
strbuf *dst)
diff --git a/entry.c b/entry.c
index 8322127..2f7c4fd 100644
--- a/entry.c
+++ b/entry.c
@@ -190,13 +190,10 @@ static int write_entry(struct cache_entry *ce,
 
if (smudge_to_file) {
close(fd);
-   if (! convert_to_working_tree_filter_to_file(ce->name, 
path, new, size)) {
+   fd = convert_to_working_tree_filter_to_file(ce->name, 
path, new, size);
+   if (fd < 0) {
smudge_to_file = 0;
-   /* The failing smudgeToFile filter may have
-* deleted or replaced the file; delete
-* the file and re-open for recovery write.
-*/
-   unlink(path);
+   /* re-open for recovery write */
fd = open_output_fd(path, ce, to_tempfile);
if (fd < 0) {
free(new);
@@ -205,12 +202,6 @@ static int write_entry(struct cache_entry *ce,
}
else {
free(new);
-   /* The smudgeToFile filter may have replaced
-* or deleted the file; reopen it to make sure
-* that the file exists. */
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("unable to create 
file %s", path);
if (!to_tempfile)
fstat_done = fstat_output(fd, state, 
);
close(fd);
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index c0b4709..fd07bd6 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -334,

[PATCH v3 5/8] warn on unusable smudgeToFile/cleanFromFile config

2016-06-22 Thread Joey Hess
Let the user know when they have a smudgeToFile/cleanFromFile config
that cannot be used because the corresponding smudge/clean config
is missing.

The warning is only displayed a maximum of once per git invocation,
and only when doing an operation that would use the filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 convert.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/convert.c b/convert.c
index bf63ba0..b6a76a9 100644
--- a/convert.c
+++ b/convert.c
@@ -847,32 +847,50 @@ int would_convert_to_git_filter_fd(const char *path)
return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
 }
 
+static int can_filter_file(const char *filefilter, const char *filefiltername,
+  const char *stdiofilter, const char *stdiofiltername,
+  const struct conv_attrs *ca,
+  int *warncount)
+{
+   if (! filefilter)
+   return 0;
+
+   if (stdiofilter)
+   return 1;
+
+   if (*warncount == 0)
+   warning("Not running your configured filter.%s.%s command, 
because filter.%s.%s is not configured",
+   ca->drv->name, filefiltername,
+   ca->drv->name, stdiofiltername);
+   *warncount=*warncount+1;
+
+   return 0;
+}
+
 int can_clean_from_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /* Only use the cleanFromFile filter when the clean filter is also
-* configured.
-*/
-   return (ca.drv->clean_from_file && ca.drv->clean);
+   return can_filter_file(ca.drv->clean_from_file, "cleanFromFile",
+  ca.drv->clean, "clean", , );
 }
 
 int can_smudge_to_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /* Only use the smudgeToFile filter when the smudge filter is also
-* configured.
-*/
-   return (ca.drv->smudge_to_file && ca.drv->smudge);
+   return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile",
+  ca.drv->smudge, "smudge", , );
 }
 
 const char *get_convert_attr_ascii(const char *path)
-- 
2.9.0.8.g973eabb.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/8] add smudgeToFile and cleanFromFile filter configs

2016-06-22 Thread Joey Hess
This adds new smudgeToFile and cleanFromFile filter commands,
which are similar to smudge and clean but allow direct access to files on
disk.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like cleanFromFile commands that avoid
reading the whole content of the file, and for smudgeToFile commands that
populate a work tree file using an efficient Copy On Write operation.

The new filter commands will not be used for all filtering. They are
efficient to use when git add is adding a file, or when the work tree is
being updated, but not a good fit when git is internally filtering blob
objects in memory for eg, a diff.

So, a user who wants to use smudgeToFile should also provide a smudge
command to be used in cases where smudgeToFile is not used. And ditto
with cleanFromFile and clean. To avoid foot-shooting configurations, the
new commands are not used unless the old commands are also configured.

That also ensures that a filter driver configuration that includes these
new commands will work, although less efficiently, when used with an older
version of git that does not support them.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/config.txt|  18 ++-
 Documentation/gitattributes.txt |  37 ++
 convert.c   | 108 ++--
 convert.h   |  10 
 4 files changed, 157 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2e1b2e4..c300efe 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1299,15 +1299,29 @@ format.useAutoBase::
format-patch by default.
 
 filter..clean::
-   The command which is used to convert the content of a worktree
+   The command which is used as a filter to convert the content of a 
worktree
file to a blob upon checkin.  See linkgit:gitattributes[5] for
details.
 
 filter..smudge::
-   The command which is used to convert the content of a blob
+   The command which is used as a filter to convert the content of a blob
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+filter..cleanFromFile::
+   Similar to filter..clean but the specified command
+   directly accesses a worktree file on disk, rather than
+   receiving the file content from standard input.
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
+
+filter..smudgeToFile::
+   Similar to filter..smudge but the specified command
+   writes the content of a blob directly to a worktree file,
+   rather than to standard output.
+   Only used when filter..smudge is also configured.
+   See linkgit:gitattributes[5] for details.
+
 fsck.::
Allows overriding the message type (error, warn or ignore) of a
specific message ID such as `missingEmail`.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 145dd10..5ae0783 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -380,6 +380,43 @@ not exist, or may have different contents. So, smudge and 
clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+There are two extra commands "cleanFromFile" and "smudgeToFile", which
+can optionally be set in a filter driver. These are similar to the "clean"
+and "smudge" commands, but avoid needing to pipe the contents of files
+through the filters, and instead read/write files in the filesystem.
+This can be more efficient when using filters with large files that are not
+directly stored in the repository.
+
+Both "cleanFromFile" and "smudgeToFile" are provided a path as an
+added parameter after the configured command line.
+
+The "cleanFromFile" command is provided the path to the file that
+it should clean. Like the "clean" command, it should output the cleaned
+version to standard output.
+
+The "smudgeToFile" command is provided a path to the file that it
+should write to. (This file will already exist, as an empty file that can
+be written to or replaced.) Like the "smudge" command, "smudgeToFile"
+is fed the blob object from its standard input.
+
+Some git operations that need to apply filters cannot use "cleanFromFile"
+and "smudgeToFile", since the files are not present to disk. So, to avoid
+inconsistent behavior, "cleanFromFile" will only be used if "clean" is
+also configured, and "smudgeToFile" will only be used if "smudge" is also
+configured.
+
+An example large file storage filter driver using cleanFromFile and
+sm

[PATCH v3 6/8] better recovery from failure of smudgeToFile filter

2016-06-22 Thread Joey Hess
If the smudgeToFile filter fails, it can leave the worktree file with the
wrong content, or even deleted. Recover from this by falling back to
running the smudge filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 55 ---
 t/t0021-conversion.sh | 24 ++
 2 files changed, 59 insertions(+), 20 deletions(-)

diff --git a/entry.c b/entry.c
index 97975e5..8322127 100644
--- a/entry.c
+++ b/entry.c
@@ -181,12 +181,6 @@ static int write_entry(struct cache_entry *ce,
int regular_file = ce_mode_s_ifmt == S_IFREG;
int smudge_to_file = regular_file
&& can_smudge_to_file(ce->name);
-   if (regular_file && ! smudge_to_file &&
-   convert_to_working_tree(ce->name, new, size, )) {
-   free(new);
-   new = strbuf_detach(, );
-   size = newsize;
-   }
 
fd = open_output_fd(path, ce, to_tempfile);
if (fd < 0) {
@@ -194,7 +188,42 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
+   if (smudge_to_file) {
+   close(fd);
+   if (! convert_to_working_tree_filter_to_file(ce->name, 
path, new, size)) {
+   smudge_to_file = 0;
+   /* The failing smudgeToFile filter may have
+* deleted or replaced the file; delete
+* the file and re-open for recovery write.
+*/
+   unlink(path);
+   fd = open_output_fd(path, ce, to_tempfile);
+   if (fd < 0) {
+   free(new);
+   return error_errno("unable to create 
file %s", path);
+   }
+   }
+   else {
+   free(new);
+   /* The smudgeToFile filter may have replaced
+* or deleted the file; reopen it to make sure
+* that the file exists. */
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create 
file %s", path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, 
);
+   close(fd);
+   }
+   }
+
if (! smudge_to_file) {
+   if (regular_file &&
+   convert_to_working_tree(ce->name, new, size, )) 
{
+   free(new);
+   new = strbuf_detach(, );
+   size = newsize;
+   }
wrote = write_in_full(fd, new, size);
if (!to_tempfile)
fstat_done = fstat_output(fd, state, );
@@ -203,20 +232,6 @@ static int write_entry(struct cache_entry *ce,
if (wrote != size)
return error("unable to write file %s", path);
}
-   else {
-   close(fd);
-   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
-   free(new);
-   /* The smudgeToFile filter may have replaced the
-* file; open it to make sure that the file
-* exists. */
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("unable to create file %s", 
path);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index cba03fd..c0b4709 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,6 +28,14 @@ touch rot13-to-file.ran
 EOF
 chmod +x rot13-to-file.sh
 
+cat <delete-file-and-fail.sh
+#!$SHELL_PATH
+destfile="\$1"
+rm -f "\$destfile"
+exit 1
+EOF
+chmod +x delete-file-and-fail.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -310,6 +318,22 @@ test

[PATCH v3 0/8] extend smudge/clean filters with direct file access

2016-06-22 Thread Joey Hess
Reroll of this patch set with changes:

* Added additional smudgeToFile calls in git am and recursive merge.
* Improved behavior when smudgeToFile filter fails.
* Some small improvements to the test cases.

I hope this will be the final version.

Joey Hess (8):
  clarify %f documentation
  add smudgeToFile and cleanFromFile filter configs
  use cleanFromFile in git add
  use smudgeToFile in git checkout etc
  warn on unusable smudgeToFile/cleanFromFile config
  better recovery from failure of smudgeToFile filter
  use smudgeToFile filter in git am
  use smudgeToFile filter in recursive merge

 Documentation/config.txt|  18 -
 Documentation/gitattributes.txt |  42 
 builtin/apply.c |  16 +
 convert.c   | 141 
 convert.h   |  10 +++
 entry.c |  53 +++
 merge-recursive.c   |  42 +---
 sha1_file.c |  42 ++--
 t/t0021-conversion.sh   | 115 
 9 files changed, 434 insertions(+), 45 deletions(-)

-- 
2.9.0.8.g973eabb.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/8] clarify %f documentation

2016-06-22 Thread Joey Hess
It's natural to expect %f to be an actual file on disk; help avoid that
mistake.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/gitattributes.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..145dd10 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -374,6 +374,11 @@ substitution.  For example:
smudge = git-p4-filter --smudge %f
 
 
+Note that "%f" is the name of the path that is being worked on. Depending
+on the version that is being filtered, the corresponding file on disk may
+not exist, or may have different contents. So, smudge and clean commands
+should not try to access the file on disk, but only act as filters on the
+content provided to them on standard input.
 
 Interaction between checkin/checkout attributes
 ^^^
-- 
2.9.0.8.g973eabb.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/4] clarify %f documentation

2016-06-17 Thread Joey Hess
It's natural to expect %f to be an actual file on disk; help avoid that
mistake.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---

This patch series was meant to contain 5 patches; here's the missing
one. This patch will apply cleanly on top of v2.9.0.

 Documentation/gitattributes.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..145dd10 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -374,6 +374,11 @@ substitution.  For example:
smudge = git-p4-filter --smudge %f
 
 
+Note that "%f" is the name of the path that is being worked on. Depending
+on the version that is being filtered, the corresponding file on disk may
+not exist, or may have different contents. So, smudge and clean commands
+should not try to access the file on disk, but only act as filters on the
+content provided to them on standard input.
 
 Interaction between checkin/checkout attributes
 ^^^
-- 
2.8.1
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/4] extend smudge/clean filters with direct file access

2016-06-17 Thread Joey Hess
Doing a quick benchmark of this new interface and git-annex's use of it, git
checkout of a 1 gigabyte file with git-annex providing the smudge filter took:

19 seconds using the smudge interface
11 seconds using smudgeToFile
0.1 seconds with smudgeToFile and annex.thin set
 (while also saving 1 gb of disk space!)

So around 2x speed improvement due to not needing to pipe the file content
through the filter, even without git-annex's annex.thin tricks.

-- 
see shy jo


signature.asc
Description: PGP signature


[PATCH v2 3/4] use smudgeToFile in git checkout etc

2016-06-17 Thread Joey Hess
This makes git checkout, git reset, etc use smudgeToFile.

Includes test cases.

(There's a call to convert_to_working_tree in merge-recursive.c
that could also be made to use smudgeToFile as well.)

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 37 +
 t/t0021-conversion.sh | 38 +-
 2 files changed, 62 insertions(+), 13 deletions(-)

diff --git a/entry.c b/entry.c
index 519e042..97975e5 100644
--- a/entry.c
+++ b/entry.c
@@ -175,8 +175,13 @@ static int write_entry(struct cache_entry *ce,
 
/*
 * Convert from git internal format to working tree format
+* unless the smudgeToFile filter can write to the
+* file directly.
 */
-   if (ce_mode_s_ifmt == S_IFREG &&
+   int regular_file = ce_mode_s_ifmt == S_IFREG;
+   int smudge_to_file = regular_file
+   && can_smudge_to_file(ce->name);
+   if (regular_file && ! smudge_to_file &&
convert_to_working_tree(ce->name, new, size, )) {
free(new);
new = strbuf_detach(, );
@@ -189,13 +194,29 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
-   wrote = write_in_full(fd, new, size);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   free(new);
-   if (wrote != size)
-   return error("unable to write file %s", path);
+   if (! smudge_to_file) {
+   wrote = write_in_full(fd, new, size);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   free(new);
+   if (wrote != size)
+   return error("unable to write file %s", path);
+   }
+   else {
+   close(fd);
+   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
+   free(new);
+   /* The smudgeToFile filter may have replaced the
+* file; open it to make sure that the file
+* exists. */
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("unable to create file %s", 
path);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 399f92b..a8042d1 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -14,12 +14,20 @@ chmod +x rot13.sh
 
 cat <rot13-from-file.sh
 #!$SHELL_PATH
-fsfile="\$1"
+srcfile="\$1"
 touch rot13-from-file.ran
-cat "\$fsfile" | ./rot13.sh
+cat "\$srcfile" | ./rot13.sh
 EOF
 chmod +x rot13-from-file.sh
 
+cat <rot13-to-file.sh
+#!$SHELL_PATH
+destfile="\$1"
+touch rot13-to-file.ran
+./rot13.sh > "\$destfile"
+EOF
+chmod +x rot13-to-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -291,6 +299,17 @@ test_expect_success 'cleanFromFile filter is used when 
adding a file' '
cmp test.t fstest.t
 '
 
+test_expect_success 'smudgeToFile filter is used when checking out a file' '
+   test_config filter.rot13.smudgeToFile ./rot13-to-file.sh &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test.t fstest.t &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran
+'
+
 test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
test_config filter.noclean.smudge ./rot13.sh &&
test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
@@ -299,9 +318,18 @@ test_expect_success 'cleanFromFile filter is not used when 
clean filter is not c
 
cat test.t >test.no &&
git add test.no &&
-   test ! -e rot13-from-file.ran &&
-   git cat-file blob :test.no >actual &&
-   cmp test.t actual
+   test ! -e rot13-from-file.ran
+'
+
+test_expect_success 'smudgeToFile filter is not used when smudge filter is not 
configured' '
+   test_config filter.nosmudge.clean ./rot13.sh &&

[PATCH v2 1/4] add smudgeToFile and cleanFromFile filter configs

2016-06-17 Thread Joey Hess
This adds new smudgeToFile and cleanFromFile filter commands,
which are similar to smudge and clean but allow direct access to files on
disk.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like cleanFromFile commands that avoid
reading the whole content of the file, and for smudgeToFile commands that
populate a work tree file using an efficient Copy On Write operation.

The new filter commands will not be used for all filtering. They are
efficient to use when git add is adding a file, or when the work tree is
being updated, but not a good fit when git is internally filtering blob
objects in memory for eg, a diff.

So, a user who wants to use smudgeToFile should also provide a smudge
command to be used in cases where smudgeToFile is not used. And ditto
with cleanFromFile and clean. To avoid foot-shooting configurations, the
new commands are not used unless the old commands are also configured.

That also ensures that a filter driver configuration that includes these
new commands will work, although less efficiently, when used with an older
version of git that does not support them.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/config.txt|  18 ++-
 Documentation/gitattributes.txt |  37 ++
 convert.c   | 108 ++--
 convert.h   |  10 
 4 files changed, 157 insertions(+), 16 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2e1b2e4..38f54c1 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1299,15 +1299,29 @@ format.useAutoBase::
format-patch by default.
 
 filter..clean::
-   The command which is used to convert the content of a worktree
+   The command which is used as a filter to convert the content of a 
worktree
file to a blob upon checkin.  See linkgit:gitattributes[5] for
details.
 
 filter..smudge::
-   The command which is used to convert the content of a blob
+   The command which is used as a filter to convert the content of a blob
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+filter..cleanFromFile::
+   Similar to filter..clean but the specified command 
+   directly accesses a worktree file on disk, rather than
+   receiving the file content from standard input. 
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
+
+filter..smudgeToFile::
+   Similar to filter..smudge but the specified command
+   writes the content of a blob directly to a worktree file,
+   rather than to standard output.
+   Only used when filter..smudge is also configured.
+   See linkgit:gitattributes[5] for details.
+
 fsck.::
Allows overriding the message type (error, warn or ignore) of a
specific message ID such as `missingEmail`.
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 145dd10..5ae0783 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -380,6 +380,43 @@ not exist, or may have different contents. So, smudge and 
clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+There are two extra commands "cleanFromFile" and "smudgeToFile", which
+can optionally be set in a filter driver. These are similar to the "clean"
+and "smudge" commands, but avoid needing to pipe the contents of files
+through the filters, and instead read/write files in the filesystem.
+This can be more efficient when using filters with large files that are not
+directly stored in the repository.
+
+Both "cleanFromFile" and "smudgeToFile" are provided a path as an
+added parameter after the configured command line.
+
+The "cleanFromFile" command is provided the path to the file that
+it should clean. Like the "clean" command, it should output the cleaned
+version to standard output.
+
+The "smudgeToFile" command is provided a path to the file that it
+should write to. (This file will already exist, as an empty file that can
+be written to or replaced.) Like the "smudge" command, "smudgeToFile"
+is fed the blob object from its standard input.
+
+Some git operations that need to apply filters cannot use "cleanFromFile"
+and "smudgeToFile", since the files are not present to disk. So, to avoid
+inconsistent behavior, "cleanFromFile" will only be used if "clean" is
+also configured, and "smudgeToFile" will only be used if "smudge" is also
+configured.
+
+An example large file storage filter driver using cleanFromFile and
+sm

[PATCH v2 4/4] warn on unusable smudgeToFile/cleanFromFile config

2016-06-17 Thread Joey Hess
Let the user know when they have a smudgeToFile/cleanFromFile config
that cannot be used because the corresponding smudge/clean config
is missing.

The warning is only displayed a maximum of once per git invocation,
and only when doing an operation that would use the filter.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 convert.c | 34 ++
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/convert.c b/convert.c
index bf63ba0..84f6bc5 100644
--- a/convert.c
+++ b/convert.c
@@ -847,32 +847,50 @@ int would_convert_to_git_filter_fd(const char *path)
return apply_filter(path, NULL, NULL, 0, -1, NULL, ca.drv->clean);
 }
 
+static int can_filter_file(const char *filefilter, const char *filefiltername,
+  const char *stdiofilter, const char *stdiofiltername,
+  const struct conv_attrs *ca,
+  int *warncount)
+{
+   if (! filefilter)
+   return 0;
+
+   if (stdiofilter)
+   return 1;
+
+   if (*warncount == 0)
+   warning("Not running your configured filter.%s.%s command, 
because filter.%s.%s is not configured",
+   ca->drv->name, filefiltername,
+   ca->drv->name, stdiofiltername);
+   *warncount=*warncount+1;
+   
+   return 0;
+}
+
 int can_clean_from_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /* Only use the cleanFromFile filter when the clean filter is also
-* configured.
-*/
-   return (ca.drv->clean_from_file && ca.drv->clean);
+   return can_filter_file(ca.drv->clean_from_file, "cleanFromFile",
+  ca.drv->clean, "clean", , );
 }
 
 int can_smudge_to_file(const char *path)
 {
struct conv_attrs ca;
+   static int warncount = 0;
 
convert_attrs(, path);
if (!ca.drv)
return 0;
 
-   /* Only use the smudgeToFile filter when the smudge filter is also
-* configured.
-*/
-   return (ca.drv->smudge_to_file && ca.drv->smudge);
+   return can_filter_file(ca.drv->smudge_to_file, "smudgeToFile",
+  ca.drv->smudge, "smudge", , );
 }
 
 const char *get_convert_attr_ascii(const char *path)
-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/4] use cleanFromFile in git add

2016-06-17 Thread Joey Hess
Includes test cases.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 sha1_file.c   | 42 --
 t/t0021-conversion.sh | 36 
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..8df86a0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3329,6 +3329,29 @@ static int index_stream_convert_blob(unsigned char 
*sha1, int fd,
return ret;
 }
 
+static int index_from_file_convert_blob(unsigned char *sha1,
+ const char *path, unsigned flags)
+{
+   int ret;
+   const int write_object = flags & HASH_WRITE_OBJECT;
+   struct strbuf sbuf = STRBUF_INIT;
+
+   assert(path);
+   assert(can_clean_from_file(path));
+
+   convert_to_git_filter_from_file(path, ,
+write_object ? safe_crlf : SAFE_CRLF_FALSE);
+
+   if (write_object)
+   ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+ sha1);
+   else
+   ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+sha1);
+   strbuf_release();
+   return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
  const char *path, unsigned flags)
 {
@@ -3421,12 +3444,19 @@ int index_path(unsigned char *sha1, const char *path, 
struct stat *st, unsigned
 
switch (st->st_mode & S_IFMT) {
case S_IFREG:
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("open(\"%s\")", path);
-   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
-   return error("%s: failed to insert into database",
-path);
+   if (can_clean_from_file(path)) {
+   if (index_from_file_convert_blob(sha1, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
+   else {
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("open(\"%s\")", path);
+   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
break;
case S_IFLNK:
if (strbuf_readlink(, path, st->st_size))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..399f92b 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -12,6 +12,14 @@ tr \
 EOF
 chmod +x rot13.sh
 
+cat <rot13-from-file.sh
+#!$SHELL_PATH
+fsfile="\$1"
+touch rot13-from-file.ran
+cat "\$fsfile" | ./rot13.sh
+EOF
+chmod +x rot13-from-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' '
test_must_be_empty err
 '
 
+test_expect_success 'cleanFromFile filter is used when adding a file' '
+   test_config filter.rot13.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.t filter=rot13" >.gitattributes &&
+
+   cat test.t >fstest.t &&
+   git add fstest.t &&
+   test -e rot13-from-file.ran &&
+   rm -f rot13-from-file.ran &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test.t fstest.t
+'
+
+test_expect_success 'cleanFromFile filter is not used when clean filter is not 
configured' '
+   test_config filter.noclean.smudge ./rot13.sh &&
+   test_config filter.noclean.cleanFromFile ./rot13-from-file.sh &&
+
+   echo "*.no filter=noclean" >.gitattributes &&
+
+   cat test.t >test.no &&
+   git add test.no &&
+   test ! -e rot13-from-file.ran &&
+   git cat-file blob :test.no >actual &&
+   cmp test.t actual
+'
+
 test_done
-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/4] extend smudge/clean filters with direct file access

2016-06-17 Thread Joey Hess
Reroll of this patch set with changes:

* Renamed the new filter drivers for consistency with other configs.
* Improved documentation with feedback from Junio and others.
* Eliminated %p and instead append the filename to the commands
  (separated by a space).
* Fixed an FD leak and a space leak.
* Only use smudgeToFile with regular files, not symlinks.
* After running the smudgeToFile command, double-check that the
  expected file is present, in case the command was buggy and deleted it.
* Added a warning message when the new filter commands are configured
  but the old ones are not, so that the user knows it's refusing to use
  their configuration.

There's been good and helpful documentation and interface review,
but some more code review would be good! Also, git-annex has a
improved-smudge-filters branch now that demonstrates this interface.

Joey Hess (4):
  add smudgeToFile and cleanFromFile filter configs
  use cleanFromFile in git add
  use smudgeToFile in git checkout etc
  warn on unusable smudgeToFile/cleanFromFile config

 Documentation/config.txt|  18 +-
 Documentation/gitattributes.txt |  37 
 convert.c   | 126 +++-
 convert.h   |  10 
 entry.c |  37 +---
 sha1_file.c |  42 --
 t/t0021-conversion.sh   |  64 
 7 files changed, 304 insertions(+), 30 deletions(-)

-- 
2.8.1

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration

2016-06-17 Thread Joey Hess
Junio C Hamano wrote:
> Would an interface that always appends the pathname at the end of
> the command line string work?

One problem with this is that "appends" is subtly unclear in this case.

With the example of smugeToFile = cmd --to-file
it seems that a space should be added by git before the filename.

On the other handle, consider smugeToFile = cmd --to-file=
here a space is not wanted before the filename.

So, either a space is automatically included before the filename
and the second example breaks, or no space is included, and
to make the first example work would need careful inclusion of the
trailing space with quoting to prevent it being elided
eg, smugeToFile = "cmd --to-file "

%p does avoid this ambiguity. But as Junio noted, %p is mandatory in the
command for it to possibly work. Git could refuse to use smugeToFile = cmd
as not containing a %p and so not possibly being able to work.

Or we could pick one of the two methods of appending the file
(I prefer not including a space before it as more flexible), and
anything using this interface would need to design its command line
parsing with this interface in mind, and would probably choose to use
--to-file=foo rather than --to-file foo.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration

2016-06-17 Thread Joey Hess
Junio C Hamano wrote:
> Would an interface that always appends the pathname at the end of
> the command line string work?

I'm ok with this, and like getting rid of %p as it's not distinguishable
from %f without reading the documentation.

The sh -c trick can of course be used if some other ordering of
parameters is needed. Probably anything using this interface is gonna be
implemented with the interface in mind from the beginning and won't need
such a trick.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration

2016-06-17 Thread Joey Hess
Junio C Hamano wrote:
> There is what we would want to fix, though.  "worktree file" should
> be spelled "working tree file". This used not to matter before "git
> worktree" was invented (before that we used these two terms
> interchangeably), but these days the distinction matters.

The existing documentation that I am patching uses the term "worktree
file" which is why I continued to use that wording.

(Unless this is a documentation transition that you want to happen
peicemeal as documentation is updated for other reasons?)

> > +filter..clean-from-file::
> 
> Documentation/config.txt hopefully lists all the configuration, but
> I do not see anything that uses 'words-joined-with-dash' format.
> Please do not invent new out-of-convention names.

Point taken; I'll use cleanFromFile and smudgeToFile.

Here's a revised version of the documentation that I think takes the other
suggestions onboard. I emphasise that clean and smudge operate as filters,
to contrast better with cleanFromFile and smudgeToFile not operating as
regular stdio filters.

 filter..clean::
-   The command which is used to convert the content of a worktree
+   The command which is used as a filter to convert the content of a 
worktree
file to a blob upon checkin.  See linkgit:gitattributes[5] for
details.
 
 filter..smudge::
-   The command which is used to convert the content of a blob
+   The command which is used as a filter to convert the content of a blob
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+filter..cleanFromFile::
+   Similar to filter..clean but the specified command 
+   directly accesses a worktree file on disk, rather than
+   receiving the file content from standard input. 
+   In the command, "%p" is replaced with the name of the file. 
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
+
+filter..smudgeToFile::
+   Similar to filter..smudge but the specified command
+   writes the content of a blob directly to a worktree file,
+   rather than to standard output.
+   In the command, "%p" is replaced with the name of the file.
+   Only used when filter..smudge is also configured.
+   See linkgit:gitattributes[5] for details.
+

This could be extended more, but I think this should describe the config
settings concisely and point to the more involved discussion of filter drivers
in gitattributes.

-- 
see shy jo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4] add smudge-to-file and clean-from-file filter configuration

2016-06-17 Thread Joey Hess
Michael J Gruber wrote:
> I'm not sure this will save all feet. I foresee "why is smudge-to-file
> not doing anything" reports...

It could display a warning message if smudge-to-file is set and smudge is
not.

> In addition, it opens the way to doing completely different things in
> smudge and smudge-to-file - which partly is intentional, of course.

They can be implemented very differently, but need to provide the same
file content. Otherwise git checkout and git diff would show different
content for the same file, for example. This is up to the implementor of
a filter driver to get right.

> Do you make any promises that %p is a seekable file?

Yes, %p is a regular file and so is seekable, statable, etc.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH 1/4] clarify %f documentation

2016-06-17 Thread Joey Hess
Junio C Hamano wrote:
> "tracked by Git" is not all that interesting, compared to the fact
> that your filter needs to give contents relevant to that path
> because that is what the command line argument Git gives you with
> '%f' means.  It is not a random filename "tracked by Git".  Among 47
> other files tracked by Git, the single one being given is the one
> the code that drives the filter is WORKING ON, and I think that
> needs to be written in the description, hence "the path that is
> being worked on" was my suggestion.

Ah, "being worked on" does clarify it well, I think:

+ Note that "%f" is the name of the path that is being worked on. Depending
+ on the version that is being filtered, the corresponding file on disk may
+ not exist, or may have different contents. So, smudge and clean commands
+ should not try to access the file on disk, but only act as filters on the
+ content provided to them on standard input.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH 1/4] clarify %f documentation

2016-06-16 Thread Joey Hess
Junio C Hamano wrote:
> I agree that "the name of the file" can be interpreted in many ways,
> and I agree that it would be a good idea to find a better phrase to
> name the path that is being worked on, but I do not think "the file
> in the git repository" is that phrase.

> I think using the word "path" somewhere in the updated description
> is more likely to have the effect you desire.

"path" is also very ambiguous. I see that "tracked" is often used to
describe what %f is, so how about:
 
+ Note that "%f" is the name of a file tracked by Git. Depending on the
+ version that is being filtered, the corresponding file on disk may not
+ exist, or may have different contents. So, smudge and clean commands should
+ not try to access the file on disk.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: [PATCH 4/4] use smudge-to-file in git checkout etc

2016-06-16 Thread Joey Hess
Joey Hess wrote:
> + int smudge_to_file = can_smudge_to_file(ce->name);
>   if (ce_mode_s_ifmt == S_IFREG &&
> + ! smudge_to_file &&
>   convert_to_working_tree(ce->name, new, size, )) {
>   free(new);
>   new = strbuf_detach(, );
> @@ -189,13 +193,29 @@ static int write_entry(struct cache_entry *ce,

> + if (! can_smudge_to_file(ce->name)) {
> + }
> + else {
> + close(fd);
> + convert_to_working_tree_filter_to_file(ce->name, path, 
> new, size);

Oops, I had meant to avoid using smudge-to-file when
e_mode_s_ifmt != S_IFREG, and forgot it in this patch, so it does the
wrong thing for symlinks.

I'll send an updated patch set fixing this after any other review.

-- 
see shy jo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] use smudge-to-file in git checkout etc

2016-06-16 Thread Joey Hess
This makes git checkout, git reset, etc use smudge-to-file.

Includes test cases.

(There's a call to convert_to_working_tree in merge-recursive.c
that could also be made to use smudge-to-file as well.)

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 entry.c   | 34 +++---
 t/t0021-conversion.sh | 44 
 2 files changed, 63 insertions(+), 15 deletions(-)

diff --git a/entry.c b/entry.c
index 519e042..6a23159 100644
--- a/entry.c
+++ b/entry.c
@@ -175,8 +175,12 @@ static int write_entry(struct cache_entry *ce,
 
/*
 * Convert from git internal format to working tree format
+* unless the smudge-to-file filter can write to the
+* file directly.
 */
+   int smudge_to_file = can_smudge_to_file(ce->name);
if (ce_mode_s_ifmt == S_IFREG &&
+   ! smudge_to_file &&
convert_to_working_tree(ce->name, new, size, )) {
free(new);
new = strbuf_detach(, );
@@ -189,13 +193,29 @@ static int write_entry(struct cache_entry *ce,
return error_errno("unable to create file %s", path);
}
 
-   wrote = write_in_full(fd, new, size);
-   if (!to_tempfile)
-   fstat_done = fstat_output(fd, state, );
-   close(fd);
-   free(new);
-   if (wrote != size)
-   return error("unable to write file %s", path);
+   if (! can_smudge_to_file(ce->name)) {
+   wrote = write_in_full(fd, new, size);
+   if (!to_tempfile)
+   fstat_done = fstat_output(fd, state, );
+   close(fd);
+   free(new);
+   if (wrote != size)
+   return error("unable to write file %s", path);
+   }
+   else {
+   close(fd);
+   convert_to_working_tree_filter_to_file(ce->name, path, 
new, size);
+   free(new);
+   if (!to_tempfile) {
+   /* Re-open the file to stat it; the
+* smudge-to-file filter may have replaced
+* the file. */
+   fd = open(path, O_RDONLY);
+   if (fd < 0) {
+   return error_errno("unable to create 
file %s", path);
+   }
+   }
+   }
break;
case S_IFGITLINK:
if (to_tempfile)
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 1043ea5..b0e2e5e 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -14,12 +14,20 @@ chmod +x rot13.sh
 
 cat <rot13-from-file.sh
 #!$SHELL_PATH
-fsfile="\$1"
+srcfile="\$1"
 touch rot13-from-file.ran
-cat "\$fsfile" | ./rot13.sh
+cat "\$srcfile" | ./rot13.sh
 EOF
 chmod +x rot13-from-file.sh
 
+cat <rot13-to-file.sh
+#!$SHELL_PATH
+destfile="\$1"
+touch rot13-to-file.ran
+./rot13.sh > "\$destfile"
+EOF
+chmod +x rot13-to-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -291,17 +299,37 @@ test_expect_success 'clean-from-file filter is used when 
adding a file' '
cmp test.t fstest.t
 '
 
+test_expect_success 'smudge-to-file filter is used when checking out a file' '
+   test_config filter.rot13.smudge-to-file "./rot13-to-file.sh %p" &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test.t fstest.t &&
+
+   test -e rot13-to-file.ran &&
+   rm -f rot13-to-file.ran
+'
+
 test_expect_success 'clean-from-file filter is not used when clean filter is 
not configured' '
-   test_config filter.no.smudge ./rot13.sh &&
-   test_config filter.no.clean-from-file "./rot13-from-file.sh %p" &&
+   test_config filter.noclean.smudge ./rot13.sh &&
+   test_config filter.noclean.clean-from-file "./rot13-from-file.sh %p" &&
 
-   echo "*.no filter=no" >.gitattributes &&
+   echo "*.no filter=noclean" >.gitattributes &&
 
cat test.t >test.no &&
git add test.no &&
-   test ! -e rot13-from-file.ran &&
-   git cat-file blob :test.no >actual &&
-   cmp test.t actual
+   test ! -e rot13-from-file.ran
+'
+
+test_expect_success 'smudge-to

[PATCH 0/4] extend smudge/clean filters with direct file access

2016-06-16 Thread Joey Hess
As discussed in this thread:
http://thread.gmane.org/gmane.comp.version-control.git/294425
This adds smudge-to-file and clean-from-file commands supplimenting the
smudge and clean filters.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like clean-from-file commands that avoid
reading the whole content of the file, and for smudge-to-file commands that
populate a work tree file using an efficient Copy On Write operation.

Joey Hess (4):
  clarify %f documentation
  add smudge-to-file and clean-from-file filter configuration
  use clean-from-file in git add
  use smudge-to-file in git checkout etc

 Documentation/config.txt|  27 +++---
 Documentation/gitattributes.txt |  44 -
 convert.c   | 107 ++--
 convert.h   |  10 
 entry.c |  34 ++---
 sha1_file.c |  42 +---
 t/t0021-conversion.sh   |  64 
 7 files changed, 292 insertions(+), 36 deletions(-)

-- 
2.9.0.4.g2856e74.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] clarify %f documentation

2016-06-16 Thread Joey Hess
It's natural to expect %f to be an actual file on disk; help avoid that
mistake.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/gitattributes.txt | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e3b1de8..e077cc9 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -365,8 +365,8 @@ you can declare that the filter is `required`, in the 
configuration:
 
 
 Sequence "%f" on the filter command line is replaced with the name of
-the file the filter is working on.  A filter might use this in keyword
-substitution.  For example:
+the file in the git repository the filter is working on.
+A filter might use this in keyword substitution.  For example:
 
 
 [filter "p4"]
@@ -374,6 +374,9 @@ substitution.  For example:
smudge = git-p4-filter --smudge %f
 
 
+Note that the "%f" is the name of a file in the git repository; the
+corresponding file on disk may not exist, or may have unrelated contents to
+what git is filtering.
 
 Interaction between checkin/checkout attributes
 ^^^
-- 
2.9.0.4.g2856e74.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] add smudge-to-file and clean-from-file filter configuration

2016-06-16 Thread Joey Hess
This adds new smudge-to-file and clean-from-file filter commands,
which are similar to smudge and clean but allow direct access to files on
disk.

In smudge-to-file and clean-from-file, "%p" is expanded to the path to the
file that should be cleaned from, or smudged to.

This interface can be much more efficient when operating on large files,
because the whole file content does not need to be streamed through the
filter. It even allows for things like clean-from-file commands that avoid
reading the whole content of the file, and for smudge-to-file commands that
populate a work tree file using an efficient Copy On Write operation.

The new filter commands will not be used for all filtering. They are
efficient to use when git add is adding a file, or when the work tree is
being updated, but not a good fit when git is internally filtering blob
objects in memory for eg, a diff.

So, a user who wants to use smudge-to-file should also provide a smudge
command to be used in cases where smudge-to-file is not used. And ditto
with clean-from-file and clean. To avoid foot-shooting configurations, the
new commands are not used unless the old commands are also configured.

That also ensures that a filter driver configuration that includes these
new commands will work, although less efficiently, when used with an older
version of git that does not support them.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 Documentation/config.txt|  27 +++---
 Documentation/gitattributes.txt |  37 ++
 convert.c   | 107 ++--
 convert.h   |  10 
 4 files changed, 160 insertions(+), 21 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2e1b2e4..bbb9296 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1299,14 +1299,29 @@ format.useAutoBase::
format-patch by default.
 
 filter..clean::
-   The command which is used to convert the content of a worktree
-   file to a blob upon checkin.  See linkgit:gitattributes[5] for
-   details.
+   The command which is used on checkin to convert the content of
+   a worktree file (provided on stdin) to a blob (written to stdout).
+   See linkgit:gitattributes[5] for details.
 
 filter..smudge::
-   The command which is used to convert the content of a blob
-   object to a worktree file upon checkout.  See
-   linkgit:gitattributes[5] for details.
+   The command which is used on checkout to convert the content of a
+   blob object (provided on stdin) to a worktree file (written to
+   stdout).
+   See linkgit:gitattributes[5] for details.
+
+filter..clean-from-file::
+   Optional command which is used on checkin to convert the content
+   of a worktree file, which can be read from disk, to a blob
+   (written to stdout).
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
+
+filter..smudge-to-file::
+   Optional command which is used to convert the content of a blob
+   object (provided on stdin) to a worktree file, writing directly
+   to the file.
+   Only used when filter..clean is also configured.
+   See linkgit:gitattributes[5] for details.
 
 fsck.::
Allows overriding the message type (error, warn or ignore) of a
diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e077cc9..32621e7 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -378,6 +378,43 @@ Note that the "%f" is the name of a file in the git 
repository; the
 corresponding file on disk may not exist, or may have unrelated contents to
 what git is filtering.
 
+There are two extra commands "clean-from-file" and "smudge-to-file", which
+can optionally be set in a filter driver. These are similar to the "clean"
+and "smudge" commands, but avoid needing to pipe the contents of files
+through the filters, and instead reading/writing files in the filesystem.
+This can be more efficient when using filters with large files that are not
+directly stored in the repository.
+
+Sequence "%p" on the "clean-from-file" and "smudge-to-file" command line
+is replaced with the path to a file that the filter can access.
+
+In the "clean-from-file" command, "%p" is the path to the file that
+it should clean. Like the "clean" command, it should output the cleaned
+version to standard output.
+
+In the "smudge-from-file" command, "%p" is the path to the file that it
+should write to. (This file will already exist, as an empty file that can
+be written to or replaced.) Like the "smudge" command, "smudge-from-file"
+is fed the blob object from its standard input.
+
+Some git operations that need to apply filters

[PATCH 3/4] use clean-from-file in git add

2016-06-16 Thread Joey Hess
Includes test cases.

Signed-off-by: Joey Hess <jo...@joeyh.name>
---
 sha1_file.c   | 42 --
 t/t0021-conversion.sh | 36 
 2 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..8df86a0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3329,6 +3329,29 @@ static int index_stream_convert_blob(unsigned char 
*sha1, int fd,
return ret;
 }
 
+static int index_from_file_convert_blob(unsigned char *sha1,
+ const char *path, unsigned flags)
+{
+   int ret;
+   const int write_object = flags & HASH_WRITE_OBJECT;
+   struct strbuf sbuf = STRBUF_INIT;
+
+   assert(path);
+   assert(can_clean_from_file(path));
+
+   convert_to_git_filter_from_file(path, ,
+write_object ? safe_crlf : SAFE_CRLF_FALSE);
+
+   if (write_object)
+   ret = write_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+ sha1);
+   else
+   ret = hash_sha1_file(sbuf.buf, sbuf.len, typename(OBJ_BLOB),
+sha1);
+   strbuf_release();
+   return ret;
+}
+
 static int index_pipe(unsigned char *sha1, int fd, enum object_type type,
  const char *path, unsigned flags)
 {
@@ -3421,12 +3444,19 @@ int index_path(unsigned char *sha1, const char *path, 
struct stat *st, unsigned
 
switch (st->st_mode & S_IFMT) {
case S_IFREG:
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
-   return error_errno("open(\"%s\")", path);
-   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
-   return error("%s: failed to insert into database",
-path);
+   if (can_clean_from_file(path)) {
+   if (index_from_file_convert_blob(sha1, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
+   else {
+   fd = open(path, O_RDONLY);
+   if (fd < 0)
+   return error_errno("open(\"%s\")", path);
+   if (index_fd(sha1, fd, st, OBJ_BLOB, path, flags) < 0)
+   return error("%s: failed to insert into 
database",
+path);
+   }
break;
case S_IFLNK:
if (strbuf_readlink(, path, st->st_size))
diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..1043ea5 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -12,6 +12,14 @@ tr \
 EOF
 chmod +x rot13.sh
 
+cat <rot13-from-file.sh
+#!$SHELL_PATH
+fsfile="\$1"
+touch rot13-from-file.ran
+cat "\$fsfile" | ./rot13.sh
+EOF
+chmod +x rot13-from-file.sh
+
 test_expect_success setup '
git config filter.rot13.smudge ./rot13.sh &&
git config filter.rot13.clean ./rot13.sh &&
@@ -268,4 +276,32 @@ test_expect_success 'disable filter with empty override' '
test_must_be_empty err
 '
 
+test_expect_success 'clean-from-file filter is used when adding a file' '
+   test_config filter.rot13.clean-from-file "./rot13-from-file.sh %p" &&
+
+   echo "*.t filter=rot13" >.gitattributes &&
+
+   cat test.t >fstest.t &&
+   git add fstest.t &&
+   test -e rot13-from-file.ran &&
+   rm -f rot13-from-file.ran &&
+
+   rm -f fstest.t &&
+   git checkout -- fstest.t &&
+   cmp test.t fstest.t
+'
+
+test_expect_success 'clean-from-file filter is not used when clean filter is 
not configured' '
+   test_config filter.no.smudge ./rot13.sh &&
+   test_config filter.no.clean-from-file "./rot13-from-file.sh %p" &&
+
+   echo "*.no filter=no" >.gitattributes &&
+
+   cat test.t >test.no &&
+   git add test.no &&
+   test ! -e rot13-from-file.ran &&
+   git cat-file blob :test.no >actual &&
+   cmp test.t actual
+'
+
 test_done
-- 
2.9.0.4.g2856e74.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


merge committing staged deletions?

2016-06-13 Thread Joey Hess
I have a case where git merge seems to include staged deletions into the
merge commit. This seems pretty surprising, dunno if it's a bug.

joey@darkstar:~/tmp/x/1>git rm 1 foo
joey@darkstar:~/tmp/x/1>git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

deleted:1
deleted:foo
joey@darkstar:~/tmp/x/1>git merge refs/heads/synced/master --no-ff
Already up-to-date!
Merge made by the 'recursive' strategy.
 1   | 1 -
 foo | 1 -
 2 files changed, 2 deletions(-)
 delete mode 100644 1
 delete mode 100644 foo

I thought that a merge would leave staged changes alone, unless
they conflict in some way with the changes merged in. 
So why is merge looking at the staged deletions in this case?

I'm using --no-ff because the commit being merged is itself a merge
of HEAD and another commit. HEAD and the commit being merged in fact
have the same tree, so the right merge solution, AFAICS, would be to
keep that tree.

I've attached a 1 kb git bundle that you can clone to reproduce this:

git clone bundle b
cd b
git rm 1
git merge remotes/origin/synced/master --no-ff

git version 2.8.1

-- 
see shy jo


bundle
Description: Binary data


Re: GIT_INDEX_FILE relative path breaks in subdir

2016-05-23 Thread Joey Hess
Junio C Hamano wrote:
> I personally think that it would be OK as long as we do not change
> behaviours for those who do not use core.worktree, $GIT_DIR and/or
> $GIT_WORK_TREE and change behaviour for others to match that
> behaviour, simply because the plain vanilla no-configuration would
> be used by the largest number of people.  But depending on the size
> of the "minority", you may get pushback from them.

The minority affected would I think be limited to:

1. People who have a configuration that *always* sets core.worktree etc,
   and used GIT_INDEX_FILE with a cwd-relative path and it just happened
   to work for them.

2. People who have gone through the learning curve I've suffered about
   how relative GIT_INDEX_FILE works, and put in a workaround that's
   not the obvious "use an absolute path" workaround, but is more
   complex. Like, checking the git configs and adjusting what the
   variable is set relative to. And apparently nobody in this set felt
   worth mentioning this to the list..

I am unsure about the size of 1, and am sure 2 is small to nonexistant.

> I am not sure if relative to PWD is useful.  If it were relative to
> either the GIT_DIR or the GIT_WORK_TREE, i.e. a fixed point, then
> you can set and export GIT_INDEX_FILE and chdir around without
> having to adjust it.  If it were relative to PWD, you would need to
> adjust it every time you chdir, no?

Good point, I had not considered this use case. Although if I set a
long-term environment variable and expect to chdir around, well, that's
what absolute paths are for. All my uses of GIT_INDEX_FILE
are in short-term contexts where the program does not chdir.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: GIT_INDEX_FILE relative path breaks in subdir

2016-05-23 Thread Joey Hess
Junio C Hamano wrote:
> Joey Hess <i...@joeyh.name> writes:
> 
> > This seems to make it basically impossible for any program that wants to
> > use GIT_INDEX_FILE to use anything other than an absolute path;
> > there are too many configurations to keep straight that could change how
> > git interprets what should be a simple relative path to a file.
> 
> Thanks for digging.  Perhaps this needs to be documented (not "in
> this case it is take as relative to that, in this other case, ...",
> but "you cannot rely on relative being relative to something you
> think").

Documenting it that way feels like ¯\_(ツ)_/¯
I feel it should be made consistently relative to top of work tree.

Seems fairly unlikely that any scripts driving git rely on it
being relative to the pwd when GIT_WORK_TREE etc is set.

(I'd prefer relative to pwd because that is much more sane IMHO, but
making that change is more likely to break something.)

-- 
see shy jo


signature.asc
Description: PGP signature


Re: GIT_INDEX_FILE relative path breaks in subdir

2016-05-22 Thread Joey Hess
This is actually worse than I thought; when git is being run with a
detached work tree, GIT_INDEX_FILE is treated as a path relative to CWD,
instead of the normal behavior of relative the top of the work tree.

Normal and expected (according to this thread anyway):

joey@darkstar:~/src/other/git/Documentation> input="100644 
8a1218a1024a212bb3db30becd860315f9f3ac52 1frotz"
joey@darkstar:~/src/other/git/Documentation> echo "$input" | 
GIT_INDEX_FILE=.git/newindex git update-index --index-info 
joey@darkstar:~/src/other/git/Documentation> ls ../.git/newindex 
../.git/newindex

Fully detached worktree and git dir:

joey@darkstar:/> echo "$input" | GIT_INDEX_FILE=.git/index git 
--git-dir=/home/joey/src/other/git/.git --work-tree=/home/joey/src/other/git 
update-index --index-info 
fatal: Unable to create '/.git/index.lock': No such file or directory
joey@darkstar:/> echo "$input" | GIT_WORK_TREE=/home/joey/src/other/git 
GIT_DIR=/home/joey/src/other/git/.git GIT_INDEX_FILE=.git/newindex git 
update-index --index-info 
fatal: Unable to create '/.git/newindex.lock': No such file or directory

Work tree detached via git configuration:

joey@darkstar:~/src/other/git> git config core.worktree 
/home/joey/src/other/git/tmptree
joey@darkstar:~/src/other/git> mkdir tmptree
joey@darkstar:~/src/other/git> cd tmptree/
joey@darkstar:~/src/other/git/tmptree> mkdir Documentation
joey@darkstar:~/src/other/git/tmptree> cd Documentation/
joey@darkstar:~/src/other/git/tmptree/Documentation> echo "$input" | 
GIT_INDEX_FILE=.git/index git update-index --index-info
fatal: Unable to create '/home/joey/src/other/git/tmptree/.git/index.lock': No 
such file or directory

This seems to make it basically impossible for any program that wants to
use GIT_INDEX_FILE to use anything other than an absolute path;
there are too many configurations to keep straight that could change how
git interprets what should be a simple relative path to a file.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: GIT_INDEX_FILE relative path breaks in subdir

2016-05-17 Thread Joey Hess
Junio C Hamano wrote:
> Joey Hess <i...@joeyh.name> writes:
> 
> > Appears to be a bug in git. Seems that it's assuming GIT_INDEX_FILE is
> > relative to the top of the worktree and not to the CWD.
> 
> I think that has always been the case.  You can always specify it as
> relative to the top.  Of course, you can use absolute.

I think it *has* always been that way, but is there anything in the
documentation that indicates this is the case? This could well be best
fixed in the documentation.

Hmm, at least GIT_OBJECT_DIRECTORY also behaves this way.
(OTOH, GIT_DIR does not behave this way).

-- 
see shy jo


signature.asc
Description: PGP signature


GIT_INDEX_FILE relative path breaks in subdir

2016-05-17 Thread Joey Hess
joey@darkstar:/tmp>git init test
Initialized empty Git repository in /tmp/test/.git/
joey@darkstar:/tmp>cd test
joey@darkstar:/tmp/test>mkdir sub
joey@darkstar:/tmp/test>cd sub
joey@darkstar:/tmp/test/sub>GIT_INDEX_FILE=../.git/otherindex git write-tree
fatal: Unable to create '/tmp/test/../.git/otherindex.lock': No such file or 
directory

Appears to be a bug in git. Seems that it's assuming GIT_INDEX_FILE is
relative to the top of the worktree and not to the CWD.

Workaround: Use absolute path to the index file.

joey@darkstar:/tmp/test/sub>GIT_INDEX_FILE=`pwd`/../.git/otherindex git 
write-tree
4b825dc642cb6eb9a060e54bf8d69288fbee4904
joey@darkstar:/tmp/test/sub>ls ../.git/otherindex 
../.git/otherindex

git version 2.8.1

-- 
see shy jo


signature.asc
Description: PGP signature


Re: proposal for extending smudge/clean filters with raw file access

2016-05-12 Thread Joey Hess
Junio C Hamano wrote:
> > Secondly, and harder to get around, the filename passed to the clean
> > filter is not necessarily a path to the actual existing file that is
> > being cleaned.
> 
> Either one of us is confused.  I was talking about updating the
> current "clean" implementation without changing its interface,
> i.e. gets fed via its standard input, expected to respond to its
> standard output.  There is no filename involved.

I'm talking about the %f that can be passed to the clean filter.
The context that I left out is that my clean filter could avoid reading
all of stdin, and quickly produce the cleaned result, but only if it
can examine the file that's being cleaned. Which is not currently
entirely safe to use the %f for.

There may be a way to make a clean filter that can do something useful
without reading all of stdin, and without examining the file that's
being cleaned. Maybe. Hard to see how. I don't feel such a hypothetical
clean filter is worth changing the current EPIPE behavior to support.

So I think it's better to add a separate clean-from-fs and keep the
current clean filter interface as it stands.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: proposal for extending smudge/clean filters with raw file access

2016-05-12 Thread Joey Hess
Junio C Hamano wrote:
> The smudge happens to be the last to run, so it is quite true that
> it can say "Hey Git, I've written it out already".
> 
> I didn't check all codepaths to ensure that we won't need the
> smudged result in core at all, but I am guessing you did so before
> making this proposal, so in that case I would say this sounds fine.

Well, the idea is to only use smudge-to-file when the smudged content is
going to be written out to a file. Any other code paths that need to
smudge some content would use the smudge filter.

So, try_create_file would use it. Maybe some other places I have not
found yet could as well.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: proposal for extending smudge/clean filters with raw file access

2016-05-12 Thread Joey Hess
Junio C Hamano wrote:
> This side, I do not think we even need a new variant.  We can just
> update the code to interact with "clean" so that it the writer to
> the pipe ignores SIGPIPE, detects EPIPE on write(2), says "ah, the
> other end does not need the full input to produce its output".  The
> reader from the pipe consumes its output without failing AS LONG AS
> the "clean" filter exits with zero (we do check its exit status,
> right?)

There are two problems with doing that. First, any clean filter that
relied on that would not work with older versions of git.

Secondly, and harder to get around, the filename passed to the clean
filter is not necessarily a path to the actual existing file that is
being cleaned. For example, git hash-object --stdin --path=whatever.
So the current clean filter can't really safely rely on accessing the
file to short-circuit its cleaning. (Although it seems to mostly work..
currently..)

> We cannot do a similar "we can just unconditionally update" like I
> said on the "clean" side to "smudge", so it would need a new
> variant.  I do not want to call it anything "raw", as there is
> nothing "raw" about it.  "smudge-to-fs" would be a better name.

"raw" was just a placeholder. "clean-from-fs" and "smudge-to-fs" are
pretty descriptive.

-- 
see shy jo


signature.asc
Description: PGP signature


proposal for extending smudge/clean filters with raw file access

2016-05-12 Thread Joey Hess
I'm using smudge/clean filters in git-annex now, and it's not been an
entirely smooth fit between the interface and what git-annex wants
to do.

The clean filter has to consume the whole file content on stdin;
not reading it all will make git think the clean filter failed.
But, git-annex often doesn't need to read the whole content of a
work-tree file in order to clean it.

The smudge filter has to output the whole file content to stdout. But
git-annex often has the file's content on disk already, and could just
move it into place in the working tree. This would save CPU and IO and
often disk space too. But the smudge interface doesn't let git-annex use
the efficient approach.

So I propose extending the filter driver with two more optional
commands. Call them raw-clean and raw-smudge for now.

raw-clean would be like clean, but rather than being fed the whole
content of a large file on stdin, it would be passed the filename, and
can access the file itself. Like the clean filter, it outputs the
cleaned version on stdout.

raw-smudge would be like smudge, but rather than needing to output the
whole content of a large file on stdout, it would be passed a filename,
and can create that file itself.

To keep this backwards compatible, and to handle the cases where the
object being filtered is not a file on disk, the smudge and clean
filters would be required to be configured too, in order for raw-clean
and raw-smudge to be used.

It seems fairly easy to implement raw-clean. In sha1_file.c, index_path
would use raw-clean when available, while index_fd etc keep on using
the clean filter. I have not investigated what would be needed to implement
raw-smudge yet.

-- 
see shy jo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: 'next'ed --allow-unrelated-histories could cause lots of grief

2016-04-21 Thread Joey Hess
Yaroslav Halchenko wrote:
> - for git-annex (Joey was CCed from the beginning, not sure if annex
>   would be affected though), it would be merging of git-annex
>   branches while joining multiple annexes for the sync (e.g. by git
>   annex assistant). 

Not entirely accurate (git-annex merges its git-annex branches using a
custom merge method and not involving git-merge). The actual use case is
two users (or one user with two devices) each with a git-annex
repository who decide to share their files by combining the two
repositories. This is pretty far from the kernel world, so it's not like
bisection is something they care about.

However, I also see --allow-unrelated-histories as very useful to
prevent many foot-shooting maneuvers. Especially when a repository has
special-purpose branches, like git-annex's git-annex branch, or other
branches that are never intended to be merged into master. It's a not
entirely uncommon mistake for users to merge in such a branch, and the
users who make such a mistake often don't know enough git to easily
recover from it.

Junio C Hamano wrote:
>   merge: GIT_MERGE_ALLOW_UNRELATED_HISTORIES environment

I hope this patch lands, it will save me a lot of bother.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: 'next'ed --allow-unrelated-histories could cause lots of grief

2016-04-21 Thread Joey Hess
Yaroslav Halchenko wrote:
> which is planned for the next release.  I guess it is indeed a
> worthwhile accident-prevention measure BUT not sure if it is so
> important as to cause a change in behavior on which some projects using
> git through the cmdline interface might have been relying upon for
> years!

Not only through the command line interface. The git-annex webapp has
common use cases that will be broken by this change.

> Moreover, it was explicitly stated that "no configuration variable to
> enable this by default exists and will not be added", which would cause
> 3rd party scripts/code/projects relying on previous behavior  to provide
> version specific handling (either to add that
> --allow-unrelated-histories or not)... very cumbersome!

Agreed, a configuration setting that could be passed via -c would be
much less cumbersome than checking the version of git in order to only
pass the option to git versions that understand it. This would also
provide a way to get git pull to allow such merges.

Compare with, for example, the change to default to an interactive
merge, where GIT_MERGE_AUTOEDIT=no was provided to ease compatability.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: Migrating away from SHA-1?

2016-04-14 Thread Joey Hess
Theodore Ts'o wrote:
> OK, so how does this map to git?  First of all, from a collision
> perspective, the two blobs have to map into valid C code

Git provides other places to hide the colliding blobs; the best seems to
be as an added header in the commit object, or as trailing data after a \0
in the commit message. git is very good at hiding such potentially
colliding data from the user, as https://github.com/joeyh/supercollider
demonstrates.

commit 24f30db5790b209fa412ce81c5ef2bf8af5fd4d7
Author: Joey Hess <j...@kitenet.net>
Date:   Fri Sep 9 11:49:21 2011 -0400

an innocent commit

If this were a sha1 colliding attack, there would be some sort of binary
garbage below. Which there isn't. So this can be safely merged.

joey@darkstar:~/tmp/supercollider>git cat-file -p 
24f30db5790b209fa412ce81c5ef2bf8af5fd4d7
tree 735a7633237c07b398856005de3bc9ea00446747
author Joey Hess <j...@kitenet.net> 1315583361 -0400
committer Joey Hess <j...@kitenet.net> 1315583361 -0400

an innocent commit

If this were a sha1 colliding attack, there would be some sort of binary
garbage below. Which there isn't. So this can be safely merged.



??b???[?i??ͯ?t?
2??os??Q??H?޸*zl?RA˂q?E
?E7???\?m???U?>MU 
GY?d)?ȼ??'g?~D??ɯhQ????/"E??X?m???^͸??S?D??;w6(?`??>?縘?AѲ?*!??@v>?8??2?!??=*?J
   

???ynH???c?w?\??K7???N?6?????A5?FM?wZ?~?pKY?R???s7??(?ƶ?_"??m%1a??ʀ??K[
t????!A0?ΈfT.?T?w?ᛵƌ?р???aco?V/2??nَ?
?}?6?_?z?{


(The other possibility would be to hide the colliding blob in the tree
object, but that seems unlikely.)

-- 
see shy jo


signature.asc
Description: PGP signature


bug: git submodule add in of nested submodule handles relative path wrong

2016-03-02 Thread Joey Hess
joey@darkstar:/tmp/empty>git init sub1
Initialized empty Git repository in /tmp/empty/sub1/.git/
joey@darkstar:/tmp/empty>git init sub2
Initialized empty Git repository in /tmp/empty/sub2/.git/
joey@darkstar:/tmp/empty>cd sub1
joey@darkstar:/tmp/empty/sub1>date > f1 ; git add f1; git commit -m add -q
joey@darkstar:/tmp/empty/sub1>cd ..
joey@darkstar:/tmp/empty>cd sub2
joey@darkstar:/tmp/empty/sub2>date > f2 ; git add f2; git commit -m add -q
joey@darkstar:/tmp/empty/sub2>cd ..
joey@darkstar:/tmp/empty>git init repo
Initialized empty Git repository in /tmp/empty/repo/.git/
joey@darkstar:/tmp/empty>cd repo
joey@darkstar:/tmp/empty/repo>git submodule add ../sub1 1
Cloning into '1'...
done.
joey@darkstar:/tmp/empty/repo>cd 1
joey@darkstar:/tmp/empty/repo/1>ls
f1
joey@darkstar:/tmp/empty/repo/1>git submodule add ../../sub2 2
fatal: repository '/tmp/sub2' does not exist
fatal: clone of '/tmp/sub2' into submodule path '2' failed

Like the bug I filed yesterday, this is caused by git submodule add's
chdir into .git/modules.

Workaround is to pass absolute paths to git submodule add.

Note that this could be an exploitable security hole under some unusual
circumstances. In the example above, any other local user could create a
/tmp/sub2 containing something nasty, and git would check it out
accidentially.

git version 2.7.0

-- 
see shy jo


signature.asc
Description: PGP signature


Re: bug: git submodule add fails when .git is a symlink

2016-03-01 Thread Joey Hess
Junio C Hamano wrote:
> A more pertinent question may be which version of Git did the above
> ever work, I guess.  We fairly liberally chdir around and I do not
> think we deliberately avoid assuming that "cd .git && cd .." might
> not come back to the original directory, for example, so I wouldn't
> be surprised if it never worked.

IIRC git used symlinks for .git in submodules before version 1.7.8, so I
guess that older versions supported that pretty well.

This one case is the only time I've seen a symlink for .git present a
problem so far.

-- 
see shy jo


signature.asc
Description: PGP signature


Re: bug: git submodule add fails when .git is a symlink

2016-03-01 Thread Joey Hess
Stefan Beller wrote:
> To elaborate on that: Starting in 2.7 parts of the submodule stuff
> has been rewritten in C, in 2.8 even more and there is more in flight for
>  > 2.8.
> 
> However your bug is also to be found in 2.6, which doesn't contain any
> recent rewrites, so it is a rather long standing bug, I would presume.

Yes, I saw it with 2.7.0, but I think the user who reported it was on
2.6.

> As a workaround for now:
> 
> echo "gitdir: ../gitdir/.git" > .git

Not an option in our particular situation, unfortunately.

I wonder if the miscalculated ../../../somedir could cause git to access
files outside the git repos?

-- 
see shy jo


signature.asc
Description: PGP signature


bug: git submodule add fails when .git is a symlink

2016-03-01 Thread Joey Hess
git init gitdir
mkdir worktree
cd worktree
ln -s ../gitdir/.git .git
git submodule add /any/git/repo sub

fatal: Could not chdir to '../../../sub': No such file or directory

Fairly sure this is a bug..

-- 
see shy jo


signature.asc
Description: PGP signature


update index mtime etc metadata

2015-12-14 Thread Joey Hess
Is there any available plumbing that can change the mtime etc metadata
that is recorded in the index for a file, to user-provided values? Or,
to force the current file stat metadata to be updated in the index?

I know, git update-index --refresh, but I have a case where that's too
expensive. I'm using smudge filters; I know that the cleaned version of
the file will be unchanged from what's in the index now and only the
stat metadata will change, and so I want to avoid
git update-index --refresh running the clean filter, which can
be quite expensive for a large file.

At the moment I don't see a way to do it other than using eg libgit2 to
update the appropriate fields in the index structure.

-- 
see shy jo


signature.asc
Description: PGP signature


git pull --upload-pack reversion in git 2.5.0

2015-07-30 Thread Joey Hess
In git 2.1.4, I can run: git pull --upload-pack 'echo --foo'

This also seems to work in 2.4.6, but in 2.5.0, the option parser
does something weird, apparently looking inside the quoted parameter
and parsing parameters in there:

error: unknown option `foo'
usage: git fetch [options] [repository [refspec...]]

Needless to say, this broke my use of --upload-pack.

-- 
see shy jo


signature.asc
Description: Digital signature


Re: git pull --upload-pack reversion in git 2.5.0

2015-07-30 Thread Joey Hess
I think this comes down to a lack of quoting where git-pull runs
git-fetch. Before eb2a8d9ed3fca2ba2f617b704992d483605f3bb6,
$@ was passed through to git-fetch, but now there is a $upload_pack
which is passed without being quoted.

-- 
see shy jo


signature.asc
Description: Digital signature


[PATCH] support bash completion for add-on commands

2015-07-16 Thread Joey Hess
This makes it possible to implement bash completion for add-on commands,
that will work even when the bash completion scripts are being loaded
on-demand, as is done by the bash-completion package.

git's bash completion handles subcommands by running a _git_$command
function. As well as the many such functions included in
git-completion.bash, there can be other functions defined elsewhere
to support third-party add-on git commands, and they'll happily be used.

But, bash completion scripts are often loaded on demand, as shown in the
completion_loader example in bash's man page, and the bash-completion
implementation that is commonly used on many Linux systems. The demand
loading will load this very script from some place like
/usr/share/bash-completion/completions/git, when the user complete a git
command. But, completion scripts for git add-on commands don't get loaded.

For example, when I wrote a git-annex bash completion script,
bash was unable to tab complete git annex foo, until I tab completed a
git-annex command. Which loaded the git-annex completion, and then
that same completion worked to make git annex foo tab complete. An
inconsistent UI..

So, if the git completion script is unable to find the wanted
_git_$command function, have it fall-back to looking for a git-$command
completion script, and loading it. The add-on script is looked for in the
same directory as the git completion script, which we can find by looking
at BASH_SOURCE.

Signed-off-by: Joey Hess jo...@joeyh.name
---
 contrib/completion/git-completion.bash | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index c97c648..ba91b2a 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2614,7 +2614,16 @@ __git_main ()
if [ -n $expansion ]; then
words[1]=$expansion
completion_func=_git_${expansion//-/_}
-   declare -f $completion_func /dev/null  $completion_func
+   declare -f $completion_func /dev/null  $completion_func  
return
+   fi
+
+   # As a fallback, if no completion function is defined for the
+   # command, look for add-on command completion script in same
+   # directory as this completion script, and if found, source it,
+   # and restart completion using it.
+   local compdir=${BASH_SOURCE%/*}
+   if [ -e $compdir/git-$command ]; then
+   source $compdir/git-$command  __git_main $@
fi
 }
 

-- 
2.1.4
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


undocumented core.sharedRepository=2 set by git init --shared=world

2015-07-06 Thread Joey Hess
joey@darkstar:~/tmpgit init --shared=world testrepo
Initialized empty shared Git repository in /home/joey/tmp/testrepo/.git/
joey@darkstar:~/tmpgrep shared testrepo/.git/config 
sharedrepository = 2

This magic value of 2 seems to be undocumented, as is the magic value of 1
that's equvilant to group.

I think it would be better to have git init put in world or group and not
these magic values. Anyway, I suppose they ought to be documented too.

-- 
see shy jo


signature.asc
Description: Digital signature


[PATCH] improve documentation for some commands that use pathspecs

2015-03-31 Thread Joey Hess
After being surprised that git-ls-files expands pathspecs, here's a patch
that would have saved me.
---
 Documentation/git-ls-files.txt | 9 +
 Documentation/git-ls-tree.txt  | 8 
 2 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index e26f01f..f7a3039 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -17,7 +17,7 @@ SYNOPSIS
[--exclude-per-directory=file]
[--exclude-standard]
[--error-unmatch] [--with-tree=tree-ish]
-   [--full-name] [--abbrev] [--] [file...]
+   [--full-name] [--abbrev] [--] [pathspec...]
 
 DESCRIPTION
 ---
@@ -101,7 +101,7 @@ OPTIONS
 
 --with-tree=tree-ish::
When using --error-unmatch to expand the user supplied
-   file (i.e. path pattern) arguments to paths, pretend
+   pathspec arguments to paths, pretend
that paths which were removed in the index since the
named tree-ish are still present.  Using this option
with `-s` or `-u` options does not make any sense.
@@ -150,9 +150,10 @@ a space) at the start of each line:
 \--::
Do not interpret any more arguments as options.
 
-file::
+pathspec::
Files to show. If no files are given all files which match the other
-   specified criteria are shown.
+   specified criteria are shown. (Note that this isn't really raw
+   pathnames, but rather a list of patterns to match.)
 
 Output
 --
diff --git a/Documentation/git-ls-tree.txt b/Documentation/git-ls-tree.txt
index 16e87fd..58e7f64 100644
--- a/Documentation/git-ls-tree.txt
+++ b/Documentation/git-ls-tree.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git ls-tree' [-d] [-r] [-t] [-l] [-z]
[--name-only] [--name-status] [--full-name] [--full-tree] 
[--abbrev[=n]]
-   tree-ish [path...]
+   tree-ish [pathspec...]
 
 DESCRIPTION
 ---
@@ -19,11 +19,11 @@ Lists the contents of a given tree object, like what 
/bin/ls -a does
 in the current working directory.  Note that:
 
  - the behaviour is slightly different from that of /bin/ls in that the
-   'path' denotes just a list of patterns to match, e.g. so specifying
+   'pathspec' denotes just a list of patterns to match, e.g. so specifying
directory name (without '-r') will behave differently, and order of the
arguments does not matter.
 
- - the behaviour is similar to that of /bin/ls in that the 'path' is
+ - the behaviour is similar to that of /bin/ls in that the 'pathspec' is
taken as relative to the current working directory.  E.g. when you are
in a directory 'sub' that has a directory 'dir', you can run 'git
ls-tree -r HEAD dir' to list the contents of the tree (that is
@@ -72,7 +72,7 @@ OPTIONS
Do not limit the listing to the current working directory.
Implies --full-name.
 
-[path...]::
+[pathspec...]::
When paths are given, show them (note that this isn't really raw
pathnames, but rather a list of patterns to match).  Otherwise
implicitly uses the root level of the tree as the sole path argument.
-- 
2.1.4

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


git ls-files wildcard behavior considered harmful

2015-03-30 Thread Joey Hess
I was very surprised to learn the other day that git ls-files 'foo*'
will expand wildcards (including character classes), in the absence of
expansion by the shell. (git version 2.1.4)

joey@darkstar:~/tmp/aaagit ls-files 'foo*bar'
foo*bar
foobazbar
joey@darkstar:~/tmp/aaagit ls-files '[abc]'
[abc]
a
b

As far as I can see this behavior is not documented on the man page,
except for a tiny mention in the --with-tree documentation, where
it says file (i.e. path pattern).

Since I wanted to avoid this wildcard expension, I tried slash-escaping
the wildcard characters. This works:

joey@darkstar:~/tmp/aaagit ls-files 'foo\*bar'
foo*bar
joey@darkstar:~/tmp/aaagit ls-files '\[abc\]'
[abc]

But, there is a weird behavior here with subdirectories. While normally
ls-files would recurse, slash-escaped wildcard characters in the directory
name prevent recursion.

joey@darkstar:~/tmp/aaagit ls-files 'foo[d]'
foo[d]/subfile
food
joey@darkstar:~/tmp/aaagit ls-files 'foo\[d\]'
joey@darkstar:~/tmp/aaa

The above example shows a case where it's impossible to get ls-files
to only list files in a directory and not other files that match the
wildcard. This seems like it must be a bug, and it means it's impossible
to reliably work around the wildcard expansion behavior.

I suspect that this wildcard expansion behavior is useful somewhere.
But from the perspective of using ls-files as plumbing, where you want
to get out some subset of what was put in, it's not nice.

-- 
see shy jo
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git ls-files wildcard behavior considered harmful

2015-03-30 Thread Joey Hess
Duy Nguyen wrote:
 You can do git --literal-pathspecs ls-files ... or set 
 GIT_LITERAL_PATHSPECS.

Thanks! --literal-pathspecs does allow getting around this.

Now I'm wondering what other parts of plumbing might be doing globbing
that I did not anticipate. Maybe I should set the environment variable
so I don't need to worry about it..

-- 
see shy jo


signature.asc
Description: Digital signature


Re: weaning distributions off tarballs: extended verification of git tags

2015-03-02 Thread Joey Hess
I support this proposal, as someone who no longer releases tarballs
of my software, when I can possibly avoid it. I have worried about
signed tags / commits only being a SHA1 break away from useless.

As to the implementation, checksumming the collection of raw objects is
certainly superior to tar. Colin had suggested sorting the objects by
checksum, but I don't think that is necessary. Just stream the commit
object, then its tree object, followed by the content of each object
listed in the tree, recursing into subtrees as necessary. That will be a
stable stream for a given commit, or tree.

-- 
see shy jo


signature.asc
Description: Digital signature


bug with partial commit and pre-commit hook updating the index

2014-10-10 Thread Joey Hess
I have found many uses for the feature that lets a pre-commit hook stage
changes in the index that will be included in the commit. But now I seem
to have found a bug in the support for that, involving partial commits.

It seems that, after a partial commit in which the pre-commit hook
stages a modification of a file, the index is is left without that
staged change. This only occurs with git commit $file, not git commit -a.

joey@darkstar:~/tmp/als
joey@darkstar:~/tmp/adate  foo
joey@darkstar:~/tmp/agit add foo
joey@darkstar:~/tmp/agit commit -m added regular file foo
[master 79d0f1d] added regular file foo
 1 file changed, 1 insertion(+)
 create mode 100644 foo
joey@darkstar:~/tmp/amv ~/pre-commit  .git/hooks/
joey@darkstar:~/tmp/acat  .git/hooks/pre-commit
#!/bin/sh
ln -vsf /etc/passwd foo
git add foo
joey@darkstar:~/tmp/adate  foo
joey@darkstar:~/tmp/agit commit foo -m update
‘foo’ - ‘/etc/passwd’
[master efa9f67] update
 1 file changed, 1 insertion(+), 1 deletion(-)
 rewrite foo (100%)
 mode change 100644 = 12

So, the pre-commit hook replaced file foo with a symlink, and staged it,
and we can see from the commit summary that was correctly included
in the commit. But, look here:

joey@darkstar:~/tmp/agit status
On branch master
Changes to be committed:
  (use git reset HEAD file... to unstage)

typechange: foo

Changes not staged for commit:
  (use git add file... to update what will be committed)
  (use git checkout -- file... to discard changes in working directory)

typechange: foo

builtin/commit.c has a long comment that talks about a false index which
is set up and used during a partial commit. The pre-commit hook is run
using this false index, and the commit is generated from it. I guess the
bug involves the real index not being updated afterwards to reflect the
changes made to the false index.

-- 
see shy jo, resending a mail that vger accepted yesterday but has still 
not posted
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


bug: git merge --no-commit loses track of file modes in the index

2014-06-12 Thread Joey Hess
If git merge --no-commit is used to merge a commit adding a 
file with an unusual mode -- specifically a symlink which has mode 12,
it fails to stage the right mode into the index.

This only happens when core.symlinks=false. I noticed it on FAT, but
have managed to reproduce it on ext4.

Here's an example of the bug:

joey@darkstar:~git clone r1 r2
Cloning into 'r2'...
done.
joey@darkstar:~cd r1
joey@darkstar:~/r1ls -l
total 0
lrwxrwxrwx 1 joey joey 11 Jun 12 21:23 foo - /etc/passwd
joey@darkstar:~/r1git mv foo bar
joey@darkstar:~/r1git commit -m moved
[master 516a53c] moved
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename foo = bar (100%)
joey@darkstar:~/r1cd ..
joey@darkstar:~cd r2
joey@darkstar:~/r2git config core.symlinks false
joey@darkstar:~/r2git fetch origin
remote: Counting objects: 2, done.
remote: Total 2 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (2/2), done.
From /home/joey/r1
   7ab8102..516a53c  master - origin/master
joey@darkstar:~/r2git merge origin/master --no-commit --no-ff
Automatic merge went well; stopped before committing as requested
joey@darkstar:~/r2git diff --cached
diff --git a/bar b/bar
new file mode 100644
index 000..3594e94
--- /dev/null
+++ b/bar
@@ -0,0 +1 @@
+/etc/passwd
\ No newline at end of file
diff --git a/foo b/foo
deleted file mode 12
index 3594e94..000
--- a/foo
+++ /dev/null
@@ -1 +0,0 @@
-/etc/passwd
\ No newline at end of file
joey@darkstar:~/r2git commit -m oops
[master 63bd960] oops
joey@darkstar:~/r2git show
commit 63bd9608c96a91582b27c5853ff58053bab6c71c
Merge: 7ab8102 516a53c
Author: Joey Hess j...@kitenet.net
Date:   Thu Jun 12 21:37:35 2014 -0400

oops

diff --cc bar
index 000,3594e94..3594e94
mode 00,12..100644
--- a/bar
+++ b/bar

joey@darkstar:~/r2git version
git version 2.0.0

-- 
see shy jo


signature.asc
Description: Digital signature


Re: RLIMIT_NOFILE fallback

2013-12-18 Thread Joey Hess
Junio C Hamano wrote:
 Hmph, perhaps you are right.  Like this?

Works for me.

-- 
see shy jo


signature.asc
Description: Digital signature


Re: RLIMIT_NOFILE fallback

2013-12-18 Thread Joey Hess
Jeff King wrote:
 I wish we understood why getrlimit was failing. Returning EFAULT seems
 like an odd choice if it is not implemented for the system. On such a
 system, do the other fallbacks actually work? Would it work to do:
 
 That is, does sysconf actually work on such a system (or does it need a
 similar run-time fallback)? And either way, we should try falling back
 to OPEN_MAX rather than 1 if we have it.

For what it's worth, the system this happened on was a QNAP TS-219PII
Linux willow 2.6.33.2 #1 Fri Mar 1 04:41:48 CST 2013 armv5tel unknown

I don't have access to it to run tests of sysconf. (I already suggested its
owner upgrade its firmware.)

 As far as the warning, I am not sure I see a point. The user does not
 have any useful recourse, and git should continue to operate as normal.
 Having every single git invocation print by the way, RLIMIT_NOFILE does
 not work on your system seems like it would get annoying.

I agree with that.

-- 
see shy jo


signature.asc
Description: Digital signature


Re: [Question] Git recovery with HEAD commit broken

2013-12-11 Thread Joey Hess
Matthieu Moy wrote:
 Not as far as I know. But git fsck has a --lost-found option that can
 help recovering unreachable (dangling) commits.
 
 You may have a look at http://hackage.haskell.org/package/git-repair but
 I do not think it would solve your particular case.

Well, let's find out.. I corrupted .git/refs/heads/master to refer to a
commit that does not exist. The history has a few prior commits.

joey@darkstar:/tmp/yygit fsck
Checking object directories: 100% (256/256), done.
error: HEAD: invalid sha1 pointer
10814e97cc8bf5f6f8ce0c0d5302f778c09cac88
error: refs/heads/master does not point to a valid object!
notice: No default references

joey@darkstar:/tmp/yy~/src/git-repair/git-repair 
Running git fsck ...
Initialized empty Git repository in /home/joey/tmp/tmprepo.0/.git/
1 missing objects could not be recovered!
To force a recovery to a usable state, retry with the --force parameter.
- exit 1

If there had been a remote that had the missing
10814e97cc8bf5f6f8ce0c0d5302f778c09cac88 commit, it would have cloned
it from there, and this would have succeeded. But with a fully missing
commit, --force is needed to enable more destructive repairs.

joey@darkstar:/tmp/yy~/src/git-repair/git-repair --force
Running git fsck ...
Initialized empty Git repository in /home/joey/tmp/tmprepo.0/.git/
fatal: bad object refs/heads/master
fatal: bad object refs/heads/master
fatal: bad object refs/heads/master
Deleted these local branches, which could not be recovered due to missing 
objects:
refs/heads/master
You currently have refs/heads/master checked out. You may have staged changes 
in the index that can be committed to recover the lost state of this branch!
Successfully recovered repository!
Please carefully check that the changes mentioned above are ok..

Hmm, that could have gone better. While it successfully detected the broken
HEAD, and removed that ref, which is enough to make git fsck pass[1],
it failed to find the old ref in the reflog, despite containing code
that walks up it to find a usable commit.

joey@darkstar:/tmp/yygit reflog
fatal: bad default revision 'HEAD'

And that's why.. git-reflog requires a valid HEAD to work. Bit of a catch-22.
I could work around this by manually parsing the reflog. It would not be
the first thing git-repair has to re-implement because the git command
isn't robust enough[2]. 

I have made a TODO about this.
OTOH, if a kind git developer would like to make git-reflog work when HEAD
is missing, that seems like a generally useful improvement..

-- 
see shy jo

[1] It will make fsck pass 100% of the time -- its test suite randomly
corrupts repositories and checks that it can force some repair good
enough to make git fsck pass.
[2] A particularly annoying one is that git branch -d cannot be used
to remove a branch that is directly pointing to a corrupted commit!


signature.asc
Description: Digital signature


  1   2   >