Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-09-13 Thread René Scharfe
Am 13.09.2017 um 14:53 schrieb Jeff King:
> On Wed, Sep 13, 2017 at 12:43:57AM +0200, René Scharfe wrote:
>> Yet another way is have a few levels of nested subdirectories (e.g.
>> d1/d2/d3/file1) and ignoring the entries at the leaved (e.g. file1).
> 
> s/leaved/leaves/ ?

Indeed.

René


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-09-13 Thread Jeff King
On Wed, Sep 13, 2017 at 12:43:57AM +0200, René Scharfe wrote:

> -- >8 --
> Subject: [PATCH] archive: don't add empty directories to archives
> 
> While git doesn't track empty directories, git archive can be tricked
> into putting some into archives.  One way is to construct an empty tree
> object, as t5004 does.  While that is supported by the object database,
> it can't be represented in the index and thus it's unlikely to occur in
> the wild.
> 
> Another way is using the literal name of a directory in an exclude
> pathspec -- its contents are are excluded, but the directory stub is
> included.  That's inconsistent: exclude pathspecs containing wildcards
> don't leave empty directories in the archive.
> 
> Yet another way is have a few levels of nested subdirectories (e.g.
> d1/d2/d3/file1) and ignoring the entries at the leaved (e.g. file1).

s/leaved/leaves/ ?

> The directories with the ignored content are ignored as well (e.g. d3),
> but their empty parents are included (e.g. d2).
> 
> As empty directories are not supported by git, they should also not be
> written into archives.  If an empty directory is really needed then it
> can be tracked and archived by placing an empty .gitignore file in it.
> 
> There already is a mechanism in place for suppressing empty directories.
> When read_tree_recursive() encounters a directory excluded by a pathspec
> then it enters it anyway because it might contain included entries.  It
> calls the callback function before it is able to decide if the directory
> is actually needed.  For that reason git archive adds directories to a
> queue and writes entries for them only when it encounters the first
> child item -- but currently only if pathspecs with wildcards are used.
> 
> Queue *all* directories, no matter if there even are pathspecs present.
> This prevents git archive from writing entries for empty directories in
> all cases.

Nicely explained, and this seems like the right level to be handling it.
Simple, and it will catch the cases we know about _and_ and any new ones
which pop up.

> ---
>  archive.c   | 19 ++-
>  t/t5001-archive-attr.sh |  2 +-
>  t/t5002-archive-attr-pattern.sh |  2 +-
>  t/t5004-archive-corner-cases.sh |  4 ++--
>  4 files changed, 6 insertions(+), 21 deletions(-)

I'm not too familiar with this part of the archive code, but it seemed
pretty easy to follow. The patch looks good to me.

-Peff


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-09-12 Thread René Scharfe
Am 20.08.2017 um 11:06 schrieb Jeff King:
> I actually think it would be reasonable to omit the empty directory in
> t5004. The main thing we care about there is that we produce an archive
> with no files rather than barfing. Checking that the empty directory is
> actually there was mostly "this is what it happens to produce" rather
> than any conscious decision, I think.
> 
> If the new rule is "we omit empty directories", then it would make sense
> for us to follow that, regardless of whether it happened by pathspec
> limiting or if the tree was empty in the first place (and such an
> empty tree is insane anyway; Git tries hard not to create them).

Right.

-- >8 --
Subject: [PATCH] archive: don't add empty directories to archives

While git doesn't track empty directories, git archive can be tricked
into putting some into archives.  One way is to construct an empty tree
object, as t5004 does.  While that is supported by the object database,
it can't be represented in the index and thus it's unlikely to occur in
the wild.

Another way is using the literal name of a directory in an exclude
pathspec -- its contents are are excluded, but the directory stub is
included.  That's inconsistent: exclude pathspecs containing wildcards
don't leave empty directories in the archive.

Yet another way is have a few levels of nested subdirectories (e.g.
d1/d2/d3/file1) and ignoring the entries at the leaved (e.g. file1).
The directories with the ignored content are ignored as well (e.g. d3),
but their empty parents are included (e.g. d2).

As empty directories are not supported by git, they should also not be
written into archives.  If an empty directory is really needed then it
can be tracked and archived by placing an empty .gitignore file in it.

There already is a mechanism in place for suppressing empty directories.
When read_tree_recursive() encounters a directory excluded by a pathspec
then it enters it anyway because it might contain included entries.  It
calls the callback function before it is able to decide if the directory
is actually needed.  For that reason git archive adds directories to a
queue and writes entries for them only when it encounters the first
child item -- but currently only if pathspecs with wildcards are used.

Queue *all* directories, no matter if there even are pathspecs present.
This prevents git archive from writing entries for empty directories in
all cases.

Suggested-by: Jeff King 
Signed-off-by: Rene Scharfe 
---
 archive.c   | 19 ++-
 t/t5001-archive-attr.sh |  2 +-
 t/t5002-archive-attr-pattern.sh |  2 +-
 t/t5004-archive-corner-cases.sh |  4 ++--
 4 files changed, 6 insertions(+), 21 deletions(-)

diff --git a/archive.c b/archive.c
index 1ab8d3a1d7..1e41f4bbeb 100644
--- a/archive.c
+++ b/archive.c
@@ -121,11 +121,6 @@ static int check_attr_export_subst(const struct attr_check 
*check)
return check && ATTR_TRUE(check->items[1].value);
 }
 
-static int should_queue_directories(const struct archiver_args *args)
-{
-   return args->pathspec.has_wildcard;
-}
-
 static int write_archive_entry(const unsigned char *sha1, const char *base,
int baselen, const char *filename, unsigned mode, int stage,
void *context)
@@ -147,7 +142,7 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
strbuf_addch(, '/');
path_without_prefix = path.buf + args->baselen;
 
-   if (!S_ISDIR(mode) || !should_queue_directories(args)) {
+   if (!S_ISDIR(mode)) {
const struct attr_check *check;
check = get_archive_attrs(path_without_prefix);
if (check_attr_export_ignore(check))
@@ -169,14 +164,6 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
return write_entry(args, sha1, path.buf, path.len, mode);
 }
 
-static int write_archive_entry_buf(const unsigned char *sha1, struct strbuf 
*base,
-   const char *filename, unsigned mode, int stage,
-   void *context)
-{
-   return write_archive_entry(sha1, base->buf, base->len,
-filename, mode, stage, context);
-}
-
 static void queue_directory(const unsigned char *sha1,
struct strbuf *base, const char *filename,
unsigned mode, int stage, struct archiver_context *c)
@@ -290,9 +277,7 @@ int write_archive_entries(struct archiver_args *args,
}
 
err = read_tree_recursive(args->tree, "", 0, 0, >pathspec,
- should_queue_directories(args) ?
- queue_or_write_archive_entry :
- write_archive_entry_buf,
+ queue_or_write_archive_entry,
  );
if (err == READ_TREE_RECURSIVE)
err = 0;
diff --git a/t/t5001-archive-attr.sh 

Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-21 Thread Stefan Beller
On Sat, Aug 19, 2017 at 9:53 AM, René Scharfe  wrote:
> Am 19.08.2017 um 07:33 schrieb René Scharfe:
>> When read_tree_recursive() encounters a directory excluded by a pathspec
>> then it enters it anyway because it might contain included entries.  It
>> calls the callback function before it is able to decide if the directory
>> is actually needed.
>>
>> For that reason git archive adds directories to a queue and writes
>> entries for them only when it encounters the first child item -- but
>> only if pathspecs with wildcards are used.  Do the same for literal
>> pathspecs as well, as the reasoning above applies to them, too.  This
>> prevents git archive from writing entries for excluded directories.
>
> This breaks the test "archive empty subtree with no pathspec" in t5004 by
> omitting the empty directory from the archive.  Sorry for missing that!
>
> This is kind of a bonus patch, so please discard it for now; the first
> three are OK IMHO.

I thought so up to reading this patch. I found the naming and
content of should_queue_directories not obvious at first read.
This patch confused me even more for that name to be chosen.
I have no alternative to propose, though.

Thanks,
Stefan


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-20 Thread Jeff King
On Sat, Aug 19, 2017 at 06:53:26PM +0200, René Scharfe wrote:

> Am 19.08.2017 um 07:33 schrieb René Scharfe:
> > When read_tree_recursive() encounters a directory excluded by a pathspec
> > then it enters it anyway because it might contain included entries.  It
> > calls the callback function before it is able to decide if the directory
> > is actually needed.
> > 
> > For that reason git archive adds directories to a queue and writes
> > entries for them only when it encounters the first child item -- but
> > only if pathspecs with wildcards are used.  Do the same for literal
> > pathspecs as well, as the reasoning above applies to them, too.  This
> > prevents git archive from writing entries for excluded directories.
> 
> This breaks the test "archive empty subtree with no pathspec" in t5004 by
> omitting the empty directory from the archive.  Sorry for missing that!
> 
> This is kind of a bonus patch, so please discard it for now; the first
> three are OK IMHO.
> 
> A better version of this patch would at least update t5004 as well, but
> there might be a better way.

I actually think it would be reasonable to omit the empty directory in
t5004. The main thing we care about there is that we produce an archive
with no files rather than barfing. Checking that the empty directory is
actually there was mostly "this is what it happens to produce" rather
than any conscious decision, I think.

If the new rule is "we omit empty directories", then it would make sense
for us to follow that, regardless of whether it happened by pathspec
limiting or if the tree was empty in the first place (and such an
empty tree is insane anyway; Git tries hard not to create them).

-Peff


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-19 Thread Junio C Hamano
René Scharfe  writes:

> No, it's "archive empty subtree by direct pathspec" that's broken.  Gah!
>
>> omitting the empty directory from the archive.  Sorry for missing that!
>> 
>> This is kind of a bonus patch, so please discard it for now; the first
>> three are OK IMHO.

Ah, our mails crossed.  Thanks; will drop the last one for now.


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-19 Thread René Scharfe
Am 19.08.2017 um 18:53 schrieb René Scharfe:
> Am 19.08.2017 um 07:33 schrieb René Scharfe:
>> When read_tree_recursive() encounters a directory excluded by a pathspec
>> then it enters it anyway because it might contain included entries.  It
>> calls the callback function before it is able to decide if the directory
>> is actually needed.
>>
>> For that reason git archive adds directories to a queue and writes
>> entries for them only when it encounters the first child item -- but
>> only if pathspecs with wildcards are used.  Do the same for literal
>> pathspecs as well, as the reasoning above applies to them, too.  This
>> prevents git archive from writing entries for excluded directories.
> 
> This breaks the test "archive empty subtree with no pathspec" in t5004 by

No, it's "archive empty subtree by direct pathspec" that's broken.  Gah!

> omitting the empty directory from the archive.  Sorry for missing that!
> 
> This is kind of a bonus patch, so please discard it for now; the first
> three are OK IMHO.
> 
> A better version of this patch would at least update t5004 as well, but
> there might be a better way.
> 
>>
>> Signed-off-by: Rene Scharfe 
>> ---
>>archive.c   | 2 +-
>>t/t5001-archive-attr.sh | 2 +-
>>2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/archive.c b/archive.c
>> index 1ab8d3a1d7..174c0555b6 100644
>> --- a/archive.c
>> +++ b/archive.c
>> @@ -123,7 +123,7 @@ static int check_attr_export_subst(const struct 
>> attr_check *check)
>>
>>static int should_queue_directories(const struct archiver_args *args)
>>{
>> -return args->pathspec.has_wildcard;
>> +return args->pathspec.nr;
>>}
>>
>>static int write_archive_entry(const unsigned char *sha1, const char 
>> *base,
>> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
>> index 897f6f06d5..e9aa97117a 100755
>> --- a/t/t5001-archive-attr.sh
>> +++ b/t/t5001-archive-attr.sh
>> @@ -73,7 +73,7 @@ test_expect_missingarchive-pathspec/ignored-by-tree
>>test_expect_missing   archive-pathspec/ignored-by-tree.d
>>test_expect_missing   archive-pathspec/ignored-by-tree.d/file
>>test_expect_existsarchive-pathspec/ignored-by-worktree
>> -test_expect_missing archive-pathspec/excluded-by-pathspec.d failure
>> +test_expect_missing archive-pathspec/excluded-by-pathspec.d
>>test_expect_missing   archive-pathspec/excluded-by-pathspec.d/file
>>
>>test_expect_success 'git archive with wildcard pathspec' '
>>


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-19 Thread Junio C Hamano
René Scharfe  writes:

> When read_tree_recursive() encounters a directory excluded by a pathspec
> then it enters it anyway because it might contain included entries.  It
> calls the callback function before it is able to decide if the directory
> is actually needed.
>
> For that reason git archive adds directories to a queue and writes
> entries for them only when it encounters the first child item -- but
> only if pathspecs with wildcards are used.  Do the same for literal
> pathspecs as well, as the reasoning above applies to them, too.  This
> prevents git archive from writing entries for excluded directories.
>
> Signed-off-by: Rene Scharfe 
> ---
>  archive.c   | 2 +-
>  t/t5001-archive-attr.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)

This seems to break t/t5004-archive-corner-cases.sh though...

expecting success:
git archive --format=tar $root_tree -- sub >subtree-path.tar &&
make_dir extract &&
"$TAR" xf subtree-path.tar -C extract &&
check_dir extract sub

--- expect  2017-08-19 16:56:49.761513537 +
+++ actual  2017-08-19 16:56:49.769513535 +
@@ -1,2 +1 @@
 extract
-extract/sub
not ok 10 - archive empty subtree by direct pathspec

> diff --git a/archive.c b/archive.c
> index 1ab8d3a1d7..174c0555b6 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -123,7 +123,7 @@ static int check_attr_export_subst(const struct 
> attr_check *check)
>  
>  static int should_queue_directories(const struct archiver_args *args)
>  {
> - return args->pathspec.has_wildcard;
> + return args->pathspec.nr;
>  }
>  
>  static int write_archive_entry(const unsigned char *sha1, const char *base,
> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
> index 897f6f06d5..e9aa97117a 100755
> --- a/t/t5001-archive-attr.sh
> +++ b/t/t5001-archive-attr.sh
> @@ -73,7 +73,7 @@ test_expect_missing archive-pathspec/ignored-by-tree
>  test_expect_missing  archive-pathspec/ignored-by-tree.d
>  test_expect_missing  archive-pathspec/ignored-by-tree.d/file
>  test_expect_exists   archive-pathspec/ignored-by-worktree
> -test_expect_missing  archive-pathspec/excluded-by-pathspec.d failure
> +test_expect_missing  archive-pathspec/excluded-by-pathspec.d
>  test_expect_missing  archive-pathspec/excluded-by-pathspec.d/file
>  
>  test_expect_success 'git archive with wildcard pathspec' '


Re: [PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-19 Thread René Scharfe
Am 19.08.2017 um 07:33 schrieb René Scharfe:
> When read_tree_recursive() encounters a directory excluded by a pathspec
> then it enters it anyway because it might contain included entries.  It
> calls the callback function before it is able to decide if the directory
> is actually needed.
> 
> For that reason git archive adds directories to a queue and writes
> entries for them only when it encounters the first child item -- but
> only if pathspecs with wildcards are used.  Do the same for literal
> pathspecs as well, as the reasoning above applies to them, too.  This
> prevents git archive from writing entries for excluded directories.

This breaks the test "archive empty subtree with no pathspec" in t5004 by
omitting the empty directory from the archive.  Sorry for missing that!

This is kind of a bonus patch, so please discard it for now; the first
three are OK IMHO.

A better version of this patch would at least update t5004 as well, but
there might be a better way.

> 
> Signed-off-by: Rene Scharfe 
> ---
>   archive.c   | 2 +-
>   t/t5001-archive-attr.sh | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/archive.c b/archive.c
> index 1ab8d3a1d7..174c0555b6 100644
> --- a/archive.c
> +++ b/archive.c
> @@ -123,7 +123,7 @@ static int check_attr_export_subst(const struct 
> attr_check *check)
>   
>   static int should_queue_directories(const struct archiver_args *args)
>   {
> - return args->pathspec.has_wildcard;
> + return args->pathspec.nr;
>   }
>   
>   static int write_archive_entry(const unsigned char *sha1, const char *base,
> diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
> index 897f6f06d5..e9aa97117a 100755
> --- a/t/t5001-archive-attr.sh
> +++ b/t/t5001-archive-attr.sh
> @@ -73,7 +73,7 @@ test_expect_missing archive-pathspec/ignored-by-tree
>   test_expect_missing archive-pathspec/ignored-by-tree.d
>   test_expect_missing archive-pathspec/ignored-by-tree.d/file
>   test_expect_exists  archive-pathspec/ignored-by-worktree
> -test_expect_missing  archive-pathspec/excluded-by-pathspec.d failure
> +test_expect_missing  archive-pathspec/excluded-by-pathspec.d
>   test_expect_missing archive-pathspec/excluded-by-pathspec.d/file
>   
>   test_expect_success 'git archive with wildcard pathspec' '
> 


[PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-18 Thread René Scharfe
When read_tree_recursive() encounters a directory excluded by a pathspec
then it enters it anyway because it might contain included entries.  It
calls the callback function before it is able to decide if the directory
is actually needed.

For that reason git archive adds directories to a queue and writes
entries for them only when it encounters the first child item -- but
only if pathspecs with wildcards are used.  Do the same for literal
pathspecs as well, as the reasoning above applies to them, too.  This
prevents git archive from writing entries for excluded directories.

Signed-off-by: Rene Scharfe 
---
 archive.c   | 2 +-
 t/t5001-archive-attr.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/archive.c b/archive.c
index 1ab8d3a1d7..174c0555b6 100644
--- a/archive.c
+++ b/archive.c
@@ -123,7 +123,7 @@ static int check_attr_export_subst(const struct attr_check 
*check)
 
 static int should_queue_directories(const struct archiver_args *args)
 {
-   return args->pathspec.has_wildcard;
+   return args->pathspec.nr;
 }
 
 static int write_archive_entry(const unsigned char *sha1, const char *base,
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index 897f6f06d5..e9aa97117a 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -73,7 +73,7 @@ test_expect_missing   archive-pathspec/ignored-by-tree
 test_expect_missingarchive-pathspec/ignored-by-tree.d
 test_expect_missingarchive-pathspec/ignored-by-tree.d/file
 test_expect_exists archive-pathspec/ignored-by-worktree
-test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure
+test_expect_missingarchive-pathspec/excluded-by-pathspec.d
 test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file
 
 test_expect_success 'git archive with wildcard pathspec' '
-- 
2.14.1