date:20170818

[PATCH 3/4] archive: don't queue excluded directories

2017-08-18 Thread René Scharfe

Reject directories with the attribute export-ignore already while
queuing them.  This prevents read_tree_recursive() from descending into
them and this avoids write_archive_entry() rejecting them later on,
which queue_or_write_archive_entry() is not prepared for.

Borrow the existing strbuf to build the full path to avoid string
copies and extra allocations; just make sure we restore the original
value before moving on.

Keep checking any other attributes in write_archive_entry() as before,
but avoid checking them twice.

Signed-off-by: Rene Scharfe 
---
 archive.c   | 32 +---
 t/t5001-archive-attr.sh |  4 ++--
 2 files changed, 27 insertions(+), 9 deletions(-)

diff --git a/archive.c b/archive.c
index 8e5f632912..1ab8d3a1d7 100644
--- a/archive.c
+++ b/archive.c
@@ -121,17 +121,21 @@ static int check_attr_export_subst(const struct 
attr_check *check)
return check && ATTR_TRUE(check->items[1].value);
 }
 
+static int should_queue_directories(const struct archiver_args *args)
+{
+   return args->pathspec.has_wildcard;
+}
+
 static int write_archive_entry(const unsigned char *sha1, const char *base,
int baselen, const char *filename, unsigned mode, int stage,
void *context)
 {
static struct strbuf path = STRBUF_INIT;
-   const struct attr_check *check;
struct archiver_context *c = context;
struct archiver_args *args = c->args;
write_archive_entry_fn_t write_entry = c->write_entry;
-   const char *path_without_prefix;
int err;
+   const char *path_without_prefix;
 
args->convert = 0;
strbuf_reset();
@@ -143,10 +147,13 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
strbuf_addch(, '/');
path_without_prefix = path.buf + args->baselen;
 
-   check = get_archive_attrs(path_without_prefix);
-   if (check_attr_export_ignore(check))
-   return 0;
-   args->convert = check_attr_export_subst(check);
+   if (!S_ISDIR(mode) || !should_queue_directories(args)) {
+   const struct attr_check *check;
+   check = get_archive_attrs(path_without_prefix);
+   if (check_attr_export_ignore(check))
+   return 0;
+   args->convert = check_attr_export_subst(check);
+   }
 
if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
if (args->verbose)
@@ -219,6 +226,17 @@ static int queue_or_write_archive_entry(const unsigned 
char *sha1,
}
 
if (S_ISDIR(mode)) {
+   size_t baselen = base->len;
+   const struct attr_check *check;
+
+   /* Borrow base, but restore its original value when done. */
+   strbuf_addstr(base, filename);
+   strbuf_addch(base, '/');
+   check = get_archive_attrs(base->buf);
+   strbuf_setlen(base, baselen);
+
+   if (check_attr_export_ignore(check))
+   return 0;
queue_directory(sha1, base, filename,
mode, stage, c);
return READ_TREE_RECURSIVE;
@@ -272,7 +290,7 @@ int write_archive_entries(struct archiver_args *args,
}
 
err = read_tree_recursive(args->tree, "", 0, 0, >pathspec,
- args->pathspec.has_wildcard ?
+ should_queue_directories(args) ?
  queue_or_write_archive_entry :
  write_archive_entry_buf,
  );
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index 063622bc71..897f6f06d5 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -76,7 +76,7 @@ test_expect_existsarchive-pathspec/ignored-by-worktree
 test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure
 test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file
 
-test_expect_failure 'git archive with wildcard pathspec' '
+test_expect_success 'git archive with wildcard pathspec' '
git archive HEAD ":!excluded-by-p*" >archive-pathspec-wildcard.tar &&
extract_tar_to_dir archive-pathspec-wildcard
 '
@@ -85,7 +85,7 @@ test_expect_missing   archive-pathspec-wildcard/ignored
 test_expect_missingarchive-pathspec-wildcard/ignored-by-tree
 test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d
 test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d/file
-test_expect_exists archive-pathspec-wildcard/ignored-by-worktree failure
+test_expect_exists archive-pathspec-wildcard/ignored-by-worktree
 test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d
 test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d/file
 
-- 
2.14.1

[PATCH 4/4] archive: queue directories for all types of pathspecs

2017-08-18 Thread René Scharfe

When read_tree_recursive() encounters a directory excluded by a pathspec
then it enters it anyway because it might contain included entries.  It
calls the callback function before it is able to decide if the directory
is actually needed.

For that reason git archive adds directories to a queue and writes
entries for them only when it encounters the first child item -- but
only if pathspecs with wildcards are used.  Do the same for literal
pathspecs as well, as the reasoning above applies to them, too.  This
prevents git archive from writing entries for excluded directories.

Signed-off-by: Rene Scharfe 
---
 archive.c   | 2 +-
 t/t5001-archive-attr.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/archive.c b/archive.c
index 1ab8d3a1d7..174c0555b6 100644
--- a/archive.c
+++ b/archive.c
@@ -123,7 +123,7 @@ static int check_attr_export_subst(const struct attr_check 
*check)
 
 static int should_queue_directories(const struct archiver_args *args)
 {
-   return args->pathspec.has_wildcard;
+   return args->pathspec.nr;
 }
 
 static int write_archive_entry(const unsigned char *sha1, const char *base,
diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index 897f6f06d5..e9aa97117a 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -73,7 +73,7 @@ test_expect_missing   archive-pathspec/ignored-by-tree
 test_expect_missingarchive-pathspec/ignored-by-tree.d
 test_expect_missingarchive-pathspec/ignored-by-tree.d/file
 test_expect_exists archive-pathspec/ignored-by-worktree
-test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure
+test_expect_missingarchive-pathspec/excluded-by-pathspec.d
 test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file
 
 test_expect_success 'git archive with wildcard pathspec' '
-- 
2.14.1

[PATCH 2/4] archive: factor out helper functions for handling attributes

2017-08-18 Thread René Scharfe

Add helpers for accessing attributes that encapsulate the details of how
to retrieve their values.

Signed-off-by: Rene Scharfe 
---
 archive.c | 31 +++
 1 file changed, 23 insertions(+), 8 deletions(-)

diff --git a/archive.c b/archive.c
index 557dd2db85..8e5f632912 100644
--- a/archive.c
+++ b/archive.c
@@ -103,12 +103,30 @@ struct archiver_context {
struct directory *bottom;
 };
 
+static const struct attr_check *get_archive_attrs(const char *path)
+{
+   static struct attr_check *check;
+   if (!check)
+   check = attr_check_initl("export-ignore", "export-subst", NULL);
+   return git_check_attr(path, check) ? NULL : check;
+}
+
+static int check_attr_export_ignore(const struct attr_check *check)
+{
+   return check && ATTR_TRUE(check->items[0].value);
+}
+
+static int check_attr_export_subst(const struct attr_check *check)
+{
+   return check && ATTR_TRUE(check->items[1].value);
+}
+
 static int write_archive_entry(const unsigned char *sha1, const char *base,
int baselen, const char *filename, unsigned mode, int stage,
void *context)
 {
static struct strbuf path = STRBUF_INIT;
-   static struct attr_check *check;
+   const struct attr_check *check;
struct archiver_context *c = context;
struct archiver_args *args = c->args;
write_archive_entry_fn_t write_entry = c->write_entry;
@@ -125,13 +143,10 @@ static int write_archive_entry(const unsigned char *sha1, 
const char *base,
strbuf_addch(, '/');
path_without_prefix = path.buf + args->baselen;
 
-   if (!check)
-   check = attr_check_initl("export-ignore", "export-subst", NULL);
-   if (!git_check_attr(path_without_prefix, check)) {
-   if (ATTR_TRUE(check->items[0].value))
-   return 0;
-   args->convert = ATTR_TRUE(check->items[1].value);
-   }
+   check = get_archive_attrs(path_without_prefix);
+   if (check_attr_export_ignore(check))
+   return 0;
+   args->convert = check_attr_export_subst(check);
 
if (S_ISDIR(mode) || S_ISGITLINK(mode)) {
if (args->verbose)
-- 
2.14.1

[PATCH 1/4] t5001: add tests for export-ignore attributes and exclude pathspecs

2017-08-18 Thread René Scharfe

Demonstrate mishandling of the attribute export-ignore by git archive
when used together with pathspecs.  Wildcard pathspecs can even cause it
to abort.  And a directory excluded without a wildcard is still included
as an empty folder in the archive.

Test-case-by: David Adam 
Signed-off-by: Rene Scharfe 
---
 t/t5001-archive-attr.sh | 47 ---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh
index b04d955bfa..063622bc71 100755
--- a/t/t5001-archive-attr.sh
+++ b/t/t5001-archive-attr.sh
@@ -7,11 +7,15 @@ test_description='git archive attribute tests'
 SUBSTFORMAT='%H (%h)%n'
 
 test_expect_exists() {
-   test_expect_success " $1 exists" "test -e $1"
+   test_expect_${2:-success} " $1 exists" "test -e $1"
 }
 
 test_expect_missing() {
-   test_expect_success " $1 does not exist" "test ! -e $1"
+   test_expect_${2:-success} " $1 does not exist" "test ! -e $1"
+}
+
+extract_tar_to_dir () {
+   (mkdir "$1" && cd "$1" && "$TAR" xf -) <"$1.tar"
 }
 
 test_expect_success 'setup' '
@@ -21,12 +25,19 @@ test_expect_success 'setup' '
 
echo ignored by tree >ignored-by-tree &&
echo ignored-by-tree export-ignore >.gitattributes &&
-   git add ignored-by-tree .gitattributes &&
+   mkdir ignored-by-tree.d &&
+   >ignored-by-tree.d/file &&
+   echo ignored-by-tree.d export-ignore >>.gitattributes &&
+   git add ignored-by-tree ignored-by-tree.d .gitattributes &&
 
echo ignored by worktree >ignored-by-worktree &&
echo ignored-by-worktree export-ignore >.gitattributes &&
git add ignored-by-worktree &&
 
+   mkdir excluded-by-pathspec.d &&
+   >excluded-by-pathspec.d/file &&
+   git add excluded-by-pathspec.d &&
+
printf "A\$Format:%s\$O" "$SUBSTFORMAT" >nosubstfile &&
printf "A\$Format:%s\$O" "$SUBSTFORMAT" >substfile1 &&
printf "A not substituted O" >substfile2 &&
@@ -46,7 +57,37 @@ test_expect_success 'git archive' '
 
 test_expect_missingarchive/ignored
 test_expect_missingarchive/ignored-by-tree
+test_expect_missingarchive/ignored-by-tree.d
+test_expect_missingarchive/ignored-by-tree.d/file
 test_expect_exists archive/ignored-by-worktree
+test_expect_exists archive/excluded-by-pathspec.d
+test_expect_exists archive/excluded-by-pathspec.d/file
+
+test_expect_success 'git archive with pathspec' '
+   git archive HEAD ":!excluded-by-pathspec.d" >archive-pathspec.tar &&
+   extract_tar_to_dir archive-pathspec
+'
+
+test_expect_missingarchive-pathspec/ignored
+test_expect_missingarchive-pathspec/ignored-by-tree
+test_expect_missingarchive-pathspec/ignored-by-tree.d
+test_expect_missingarchive-pathspec/ignored-by-tree.d/file
+test_expect_exists archive-pathspec/ignored-by-worktree
+test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure
+test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file
+
+test_expect_failure 'git archive with wildcard pathspec' '
+   git archive HEAD ":!excluded-by-p*" >archive-pathspec-wildcard.tar &&
+   extract_tar_to_dir archive-pathspec-wildcard
+'
+
+test_expect_missingarchive-pathspec-wildcard/ignored
+test_expect_missingarchive-pathspec-wildcard/ignored-by-tree
+test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d
+test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d/file
+test_expect_exists archive-pathspec-wildcard/ignored-by-worktree failure
+test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d
+test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d/file
 
 test_expect_success 'git archive with worktree attributes' '
git archive --worktree-attributes HEAD >worktree.tar &&
-- 
2.14.1

Re: Bug?: git archive exclude pathspec and gitattributes export-ignore

2017-08-18 Thread René Scharfe

Am 14.08.2017 um 18:43 schrieb René Scharfe:
> The real solution is probably to teach tree-walk.c::do_match() how to
> handle attributes and then inject ":(attr:-export-ignore)" as a default
> internal pathspec in archive.c::parse_pathspec_arg() instead of handling
> it in archive.c::write_archive_entry().

That's complicated and I'm not sure anymore if it's even a good idea.
Let's solve this in git archive for now.

  t5001: add tests for export-ignore attributes and exclude pathspecs
  archive: factor out helper functions for handling attributes
  archive: don't queue excluded directories
  archive: queue directories for all types of pathspecs

 archive.c   | 49 +
 t/t5001-archive-attr.sh | 47 ---
 2 files changed, 85 insertions(+), 11 deletions(-)

-- 
2.14.1

Re: [PATCH v2 00/25] Move exported packfile funcs to its own file

2017-08-18 Thread Jonathan Tan

On Fri, 11 Aug 2017 15:41:28 -0400
Ben Peart  wrote:

> Nice to see the pack file functions being refactored out.  I looked at 
> the end result and it looked good to me.

Thanks.

> Do you have the energy to do a similar refactoring for the remaining 
> public functions residing in sha1_file.c?  Perhaps a new sha1_file.h? It 
> would be nice to get more things out of cache.h. :)

I agree that that would be desirable, but for now I'll leave that for
someone else to do :-)

Re: [RFC PATCH] Updated "imported object" design

2017-08-18 Thread Jonathan Tan

On Fri, 18 Aug 2017 10:18:37 -0400
Ben Peart  wrote:

> > But if there was a good way to refer to the "anti-projection" in a
> > virtualized system (that is, the "real" thing or "object" behind the
> > "virtual" thing or "image"), then maybe the "virtualized" language is
> > the best. (And I would gladly change - I'm having a hard time coming up
> > with a name for the "anti-projection" in the "lazy" language.)
> > 
> 
> The most common "anti-virtual" language I'm familiar with is "physical." 
>   Virtual machine <-> physical machine. Virtual world <-> physical 
> world. Virtual repo, commit, tree, blob - physical repo, commit, tree, 
> blob. I'm not thrilled but I think it works...

I was thinking more along the lines of the "entity that projects the
virtualization", not the opposite of a "virtualization" - "physical"
might work for the latter but probably not the former.

After some in-office discussion, if we stick to the "promise" concept,
maybe we have something like this:

  In a partial clone, the origin acts as a promisor of objects. Every
  object obtained from the promisor also acts as a promise that any
  object directly or indirectly referenced from that object is fetchable
  from the promisor.

> > This is not true if you're fetching from another repo 
> 
> This isn't a case we've explicitly dealt with (multiple remotes into a 
> virtualized repo).  Our behavior today would be that once you set the 
> "virtual repo" flag on the repo (this happens at clone for us), all 
> remotes are treated as virtual as well (ie we don't differentiate 
> behavior based on which remote was used).  Our "custom fetcher" always 
> uses "origin" and some custom settings for a cache-server saved in the 
> .git/config file when asked to fetch missing objects.
> 
> This is probably a good model to stick with at least initially as trying 
> to solve multiple possible "virtual" remotes as well as mingling 
> virtualized and non-virtualized remotes and all the mixed cases that can 
> come up makes my head hurt.  We should probably address that in a 
> different thread. :)

OK, let's stick to the current model first then, whether our opinion on
other remotes is (1) "we won't have any other remotes so we don't care",
(2) "we have other remotes but it's fine to make sure that they don't
introduce any new missing objects", or (3) "we need other remotes to
introduce missing objects, but we can build that after this foundation
is laid".

> > or if you're using
> > receive-pack, but (1) I think these are not used as much in such a
> > situation, and (2) if you do use them, the slowness only "kicks in" if
> > you do not have the objects referred to (whether non-"imported" or
> > "imported") and thus have to check the references in all "imported"
> > objects.
> > 
> 
> Is there any case where receive-pack is used on the client side?  I'm 
> only aware of it being used on the server side to receive packs pushed 
> from the client.  If it is not used in a virtualized client, then we 
> would not need to do anything different for receive-pack.

This happens if another repo decides to push to the virtualized client,
which (as I wrote) I don't expect to happen often. My intention is to
ensure that receive-pack will still work.

> That is another good point.  Given the discussion above about not 
> needing to do the connectivity test for fetch/clone - the potential perf 
> hit of loading/parsing all the various objects to build up the oidset is 
> much less of an issue.

Agreed.

Git makes a merge commit but as a normal (non-merge) commit

2017-08-18 Thread hIpPy

While merging if I do certain actions then the merge commit is made
with the merge message but as a normal (non-merge) commit.

Repro steps:
- Set GIT_MERGE_AUTOEDIT=yes (set other than "no") in .bashrc
- Make a merge commit with no conflicts.
  (external text editor shows the generated merge message)
- Focus on Git Bash and Ctrl-C.
- Commit (git commit).

Actual behavior:
Git makes a normal (non-merge) commit (squash merge) but with the
merge commit message.

It looks like a bug to me. This is very confusing later on as the repo
topology would show that the branch is not merged in and there is not
an easy way to find out when the merge was made.

Expected behavior:
Git should stay in a MERGING state. The user can choose to either
abort the merge or continue the merge (git merge --continue OR git
commit).

This does not happen in case of conflicts (at least I'm not able to
repro). I get a (master|MERGING) prompt till I resolve the conflicts
and commit, which goes through correctly as a merge commit.

Environment:
$ git version
git version 2.14.0.windows.2
$ bash --version
GNU bash, version 4.4.12(1)-release (x86_64-pc-msys)

Thanks,
RM

Re: Revision resolution for remote-helpers?

2017-08-18 Thread Jonathan Nieder

Mike Hommey wrote[1]:
> On Fri, Aug 18, 2017 at 03:06:37PM -0700, Jonathan Nieder wrote:
>> Mike Hommey wrote:

>>> The reason for the :: prefix is that it matches the ::
>>> prefix used for remote helpers.
>>>
>>> Now, there are a few caveats:
[...]
>>> - msys likes to completely fuck up command lines when they contain ::.
>>>   For remote helpers, the alternative that works is
>>>   :///etc.
>>
>> Hm --- is there a bug already open about this (e.g. in the Git for
>> Windows project or in msys) where I can read more?
>
> It's entirely an msys problem. Msys has weird rules to translate between
> unix paths and windows paths on the command line, and botches everything
> as a consequence. That's by "design".
>
> http://www.mingw.org/wiki/Posix_path_conversion
>
> (Particularly, see the last two entries)
>
> That only happens when calling native Windows programs from a msys
> shell.

Cc-ing the Git for Windows mailing list as an FYI.

I have faint memories that some developers on that project have had to
delve deep into Msys path modification rules.  It's possible they can
give us advice (e.g. about :: having been a bad choice of
syntax in the first place :)).

Thanks,
Jonathan

[1] https://public-inbox.org/git/20170818221754.3rbh35aewj5xn...@glandium.org/

[PATCH v3 04/23] pack: move open_pack_index(), parse_pack_index()

2017-08-18 Thread Jonathan Tan

alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a
subsequent commit, alloc_packed_git() will be removed from sha1_file.c.

Signed-off-by: Jonathan Tan 
---
 builtin/count-objects.c |   1 +
 builtin/fsck.c  |   1 +
 builtin/pack-objects.c  |   1 +
 cache.h |   8 ---
 pack-bitmap.c   |   1 +
 pack-check.c|   1 +
 packfile.c  | 149 
 packfile.h  |   8 +++
 sha1_file.c | 140 -
 sha1_name.c |   1 +
 10 files changed, 163 insertions(+), 148 deletions(-)

diff --git a/builtin/count-objects.c b/builtin/count-objects.c
index 1d82e61f2..33343818c 100644
--- a/builtin/count-objects.c
+++ b/builtin/count-objects.c
@@ -10,6 +10,7 @@
 #include "builtin.h"
 #include "parse-options.h"
 #include "quote.h"
+#include "packfile.h"
 
 static unsigned long garbage;
 static off_t size_garbage;
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 99dea7adf..c56207b21 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -15,6 +15,7 @@
 #include "progress.h"
 #include "streaming.h"
 #include "decorate.h"
+#include "packfile.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 5c5d3d507..08f05cb84 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -25,6 +25,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "mru.h"
+#include "packfile.h"
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
diff --git a/cache.h b/cache.h
index a0497d469..f271033db 100644
--- a/cache.h
+++ b/cache.h
@@ -1611,8 +1611,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
-
 /* A hook to report invalid files in pack directory */
 #define PACKDIR_FILE_PACK 1
 #define PACKDIR_FILE_IDX 2
@@ -1647,12 +1645,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * mmap the index file for the specified packfile (if it is not
- * already mmapped).  Return 0 on success.
- */
-extern int open_pack_index(struct packed_git *);
-
 /*
  * munmap the index file for the specified packfile (if it is
  * currently mmapped).
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 327634cd7..cb3d14ba4 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -9,6 +9,7 @@
 #include "pack-bitmap.h"
 #include "pack-revindex.h"
 #include "pack-objects.h"
+#include "packfile.h"
 
 /*
  * An entry on the bitmap index, representing the bitmap for a given
diff --git a/pack-check.c b/pack-check.c
index 84469168a..2086f5bb7 100644
--- a/pack-check.c
+++ b/pack-check.c
@@ -2,6 +2,7 @@
 #include "pack.h"
 #include "pack-revindex.h"
 #include "progress.h"
+#include "packfile.h"
 
 struct idx_entry {
off_toffset;
diff --git a/packfile.c b/packfile.c
index 60d9fc3b0..6edc43228 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,5 +1,6 @@
 #include "cache.h"
 #include "mru.h"
+#include "pack.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -59,3 +60,151 @@ void pack_report(void)
pack_open_windows, peak_pack_open_windows,
sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
 }
+
+/*
+ * Open and mmap the index file at path, perform a couple of
+ * consistency checks, then record its information to p.  Return 0 on
+ * success.
+ */
+static int check_packed_git_idx(const char *path, struct packed_git *p)
+{
+   void *idx_map;
+   struct pack_idx_header *hdr;
+   size_t idx_size;
+   uint32_t version, nr, i, *index;
+   int fd = git_open(path);
+   struct stat st;
+
+   if (fd < 0)
+   return -1;
+   if (fstat(fd, )) {
+   close(fd);
+   return -1;
+   }
+   idx_size = xsize_t(st.st_size);
+   if (idx_size < 4 * 256 + 20 + 20) {
+   close(fd);
+   return error("index file %s is too small", path);
+   }
+   idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0);
+   close(fd);
+
+   hdr = idx_map;
+   if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) {
+   version = ntohl(hdr->idx_version);
+   if (version < 2 || version > 2) {
+   munmap(idx_map, idx_size);
+   return error("index file %s is version %"PRIu32
+" and is not supported by this binary"
+" (try upgrading GIT to a newer version)",
+path, version);
+   }
+   } else
+   version = 1;
+
+   nr = 0;
+   index = idx_map;
+   if (version > 1)
+   index += 2;  /* skip index

[PATCH v3 08/23] pack: move unuse_pack()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 1 -
 packfile.c  | 9 +
 packfile.h  | 1 +
 sha1_file.c | 9 -
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index a27018210..0313b0b8d 100644
--- a/cache.h
+++ b/cache.h
@@ -1645,7 +1645,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
diff --git a/packfile.c b/packfile.c
index ea451d27e..0c97c3a1a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -596,3 +596,12 @@ unsigned char *use_pack(struct packed_git *p,
*left = win->len - xsize_t(offset);
return win->base + offset;
 }
+
+void unuse_pack(struct pack_window **w_cursor)
+{
+   struct pack_window *w = *w_cursor;
+   if (w) {
+   w->inuse_cnt--;
+   *w_cursor = NULL;
+   }
+}
diff --git a/packfile.h b/packfile.h
index 97cfc5e70..b5db490ab 100644
--- a/packfile.h
+++ b/packfile.h
@@ -45,6 +45,7 @@ extern void close_pack_index(struct packed_git *);
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, size_t *);
 extern void close_pack_windows(struct packed_git *);
 extern void close_all_packs(void);
+extern void unuse_pack(struct pack_window **);
 
 extern void release_pack_memory(size_t);
 
diff --git a/sha1_file.c b/sha1_file.c
index 7704801d1..84d96d0ab 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -718,15 +718,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void unuse_pack(struct pack_window **w_cursor)
-{
-   struct pack_window *w = *w_cursor;
-   if (w) {
-   w->inuse_cnt--;
-   *w_cursor = NULL;
-   }
-}
-
 static struct packed_git *alloc_packed_git(int extra)
 {
struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 09/23] pack: move add_packed_git()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 connected.c |  1 +
 packfile.c  | 53 +
 packfile.h  |  1 +
 sha1_file.c | 61 -
 5 files changed, 55 insertions(+), 62 deletions(-)

diff --git a/cache.h b/cache.h
index 0313b0b8d..3625509f9 100644
--- a/cache.h
+++ b/cache.h
@@ -1646,7 +1646,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
 extern int odb_pack_keep(const char *name);
 
 extern void clear_delta_base_cache(void);
-extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
diff --git a/connected.c b/connected.c
index 136c2ac16..3e3f0148c 100644
--- a/connected.c
+++ b/connected.c
@@ -3,6 +3,7 @@
 #include "sigchain.h"
 #include "connected.h"
 #include "transport.h"
+#include "pack.h"
 
 /*
  * If we feed all the commits we want to verify to this command
diff --git a/packfile.c b/packfile.c
index 0c97c3a1a..d1433d8c7 100644
--- a/packfile.c
+++ b/packfile.c
@@ -605,3 +605,56 @@ void unuse_pack(struct pack_window **w_cursor)
*w_cursor = NULL;
}
 }
+
+static void try_to_free_pack_memory(size_t size)
+{
+   release_pack_memory(size);
+}
+
+struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
+{
+   static int have_set_try_to_free_routine;
+   struct stat st;
+   size_t alloc;
+   struct packed_git *p;
+
+   if (!have_set_try_to_free_routine) {
+   have_set_try_to_free_routine = 1;
+   set_try_to_free_routine(try_to_free_pack_memory);
+   }
+
+   /*
+* Make sure a corresponding .pack file exists and that
+* the index looks sane.
+*/
+   if (!strip_suffix_mem(path, _len, ".idx"))
+   return NULL;
+
+   /*
+* ".pack" is long enough to hold any suffix we're adding (and
+* the use xsnprintf double-checks that)
+*/
+   alloc = st_add3(path_len, strlen(".pack"), 1);
+   p = alloc_packed_git(alloc);
+   memcpy(p->pack_name, path, path_len);
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
+   if (!access(p->pack_name, F_OK))
+   p->pack_keep = 1;
+
+   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
+   if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) {
+   free(p);
+   return NULL;
+   }
+
+   /* ok, it looks sane as far as we can check without
+* actually mapping the pack file.
+*/
+   p->pack_size = st.st_size;
+   p->pack_local = local;
+   p->mtime = st.st_mtime;
+   if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1))
+   hashclr(p->sha1);
+   return p;
+}
diff --git a/packfile.h b/packfile.h
index b5db490ab..1e932a49e 100644
--- a/packfile.h
+++ b/packfile.h
@@ -46,6 +46,7 @@ extern unsigned char *use_pack(struct packed_git *, struct 
pack_window **, off_t
 extern void close_pack_windows(struct packed_git *);
 extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
+extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 extern void release_pack_memory(size_t);
 
diff --git a/sha1_file.c b/sha1_file.c
index 84d96d0ab..0929fc10e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -718,67 +718,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-static struct packed_git *alloc_packed_git(int extra)
-{
-   struct packed_git *p = xmalloc(st_add(sizeof(*p), extra));
-   memset(p, 0, sizeof(*p));
-   p->pack_fd = -1;
-   return p;
-}
-
-static void try_to_free_pack_memory(size_t size)
-{
-   release_pack_memory(size);
-}
-
-struct packed_git *add_packed_git(const char *path, size_t path_len, int local)
-{
-   static int have_set_try_to_free_routine;
-   struct stat st;
-   size_t alloc;
-   struct packed_git *p;
-
-   if (!have_set_try_to_free_routine) {
-   have_set_try_to_free_routine = 1;
-   set_try_to_free_routine(try_to_free_pack_memory);
-   }
-
-   /*
-* Make sure a corresponding .pack file exists and that
-* the index looks sane.
-*/
-   if (!strip_suffix_mem(path, _len, ".idx"))
-   return NULL;
-
-   /*
-* ".pack" is long enough to hold any suffix we're adding (and
-* the use xsnprintf double-checks that)
-*/
-   alloc = st_add3(path_len, strlen(".pack"), 1);
-   p = alloc_packed_git(alloc);
-   memcpy(p->pack_name, path, path_len);
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep");
-   if (!access(p->pack_name, F_OK))
-   p->pack_keep = 1;
-
-   xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack");
-

[PATCH v3 20/23] pack: move find_pack_entry() and make it global

2017-08-18 Thread Jonathan Tan

This function needs to be global as it is used by sha1_file.c and will
be used by packfile.c.

Signed-off-by: Jonathan Tan 
---
 packfile.c  | 53 +
 packfile.h  |  2 ++
 sha1_file.c | 53 -
 3 files changed, 55 insertions(+), 53 deletions(-)

diff --git a/packfile.c b/packfile.c
index ba3a5eb3a..ae5395f5f 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1787,3 +1787,56 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
return NULL;
 
 }
+
+static int fill_pack_entry(const unsigned char *sha1,
+  struct pack_entry *e,
+  struct packed_git *p)
+{
+   off_t offset;
+
+   if (p->num_bad_objects) {
+   unsigned i;
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
+   return 0;
+   }
+
+   offset = find_pack_entry_one(sha1, p);
+   if (!offset)
+   return 0;
+
+   /*
+* We are about to tell the caller where they can locate the
+* requested object.  We better make sure the packfile is
+* still here and can be accessed before supplying that
+* answer, as it may have been deleted since the index was
+* loaded!
+*/
+   if (!is_pack_valid(p))
+   return 0;
+   e->offset = offset;
+   e->p = p;
+   hashcpy(e->sha1, sha1);
+   return 1;
+}
+
+/*
+ * Iff a pack file contains the object named by sha1, return true and
+ * store its location to e.
+ */
+int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
+{
+   struct mru_entry *p;
+
+   prepare_packed_git();
+   if (!packed_git)
+   return 0;
+
+   for (p = packed_git_mru->head; p; p = p->next) {
+   if (fill_pack_entry(sha1, e, p->item)) {
+   mru_mark(packed_git_mru, p);
+   return 1;
+   }
+   }
+   return 0;
+}
diff --git a/packfile.h b/packfile.h
index a4ff6f6ed..c9b4fcfaf 100644
--- a/packfile.h
+++ b/packfile.h
@@ -118,4 +118,6 @@ extern int packed_object_info(struct packed_git *pack, 
off_t offset, struct obje
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
 extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
 
+extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 8853672d2..76c86639c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1074,59 +1074,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-static int fill_pack_entry(const unsigned char *sha1,
-  struct pack_entry *e,
-  struct packed_git *p)
-{
-   off_t offset;
-
-   if (p->num_bad_objects) {
-   unsigned i;
-   for (i = 0; i < p->num_bad_objects; i++)
-   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
-   return 0;
-   }
-
-   offset = find_pack_entry_one(sha1, p);
-   if (!offset)
-   return 0;
-
-   /*
-* We are about to tell the caller where they can locate the
-* requested object.  We better make sure the packfile is
-* still here and can be accessed before supplying that
-* answer, as it may have been deleted since the index was
-* loaded!
-*/
-   if (!is_pack_valid(p))
-   return 0;
-   e->offset = offset;
-   e->p = p;
-   hashcpy(e->sha1, sha1);
-   return 1;
-}
-
-/*
- * Iff a pack file contains the object named by sha1, return true and
- * store its location to e.
- */
-static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e)
-{
-   struct mru_entry *p;
-
-   prepare_packed_git();
-   if (!packed_git)
-   return 0;
-
-   for (p = packed_git_mru->head; p; p = p->next) {
-   if (fill_pack_entry(sha1, e, p->item)) {
-   mru_mark(packed_git_mru, p);
-   return 1;
-   }
-   }
-   return 0;
-}
-
 static int sha1_loose_object_info(const unsigned char *sha1,
  struct object_info *oi,
  int flags)
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 10/23] pack: move install_packed_git()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 packfile.c  | 11 ++-
 packfile.h  |  2 ++
 sha1_file.c |  9 -
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/cache.h b/cache.h
index 3625509f9..c4d8bee52 100644
--- a/cache.h
+++ b/cache.h
@@ -1619,7 +1619,6 @@ extern void (*report_garbage)(unsigned seen_bits, const 
char *path);
 
 extern void prepare_packed_git(void);
 extern void reprepare_packed_git(void);
-extern void install_packed_git(struct packed_git *pack);
 
 /*
  * Give a rough count of objects in the repository. This sacrifices accuracy
diff --git a/packfile.c b/packfile.c
index d1433d8c7..9a65aa4f6 100644
--- a/packfile.c
+++ b/packfile.c
@@ -28,7 +28,7 @@ static unsigned int pack_used_ctr;
 static unsigned int pack_mmap_calls;
 static unsigned int peak_pack_open_windows;
 static unsigned int pack_open_windows;
-unsigned int pack_open_fds;
+static unsigned int pack_open_fds;
 static unsigned int pack_max_fds;
 static size_t peak_pack_mapped;
 static size_t pack_mapped;
@@ -658,3 +658,12 @@ struct packed_git *add_packed_git(const char *path, size_t 
path_len, int local)
hashclr(p->sha1);
return p;
 }
+
+void install_packed_git(struct packed_git *pack)
+{
+   if (pack->pack_fd != -1)
+   pack_open_fds++;
+
+   pack->next = packed_git;
+   packed_git = pack;
+}
diff --git a/packfile.h b/packfile.h
index 1e932a49e..a18029184 100644
--- a/packfile.h
+++ b/packfile.h
@@ -28,6 +28,8 @@ extern unsigned int pack_open_fds;
 
 extern struct packed_git *parse_pack_index(unsigned char *sha1, const char 
*idx_path);
 
+extern void install_packed_git(struct packed_git *pack);
+
 extern void pack_report(void);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 0929fc10e..b77e7e3c3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -718,15 +718,6 @@ void *xmmap(void *start, size_t length,
return ret;
 }
 
-void install_packed_git(struct packed_git *pack)
-{
-   if (pack->pack_fd != -1)
-   pack_open_fds++;
-
-   pack->next = packed_git;
-   packed_git = pack;
-}
-
 void (*report_garbage)(unsigned seen_bits, const char *path);
 
 static void report_helper(const struct string_list *list,
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 12/23] pack: move unpack_object_header_buffer()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 packfile.c  | 25 +
 packfile.h  |  2 ++
 sha1_file.c | 25 -
 4 files changed, 27 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index 63765d481..75cc0c497 100644
--- a/cache.h
+++ b/cache.h
@@ -1669,7 +1669,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
diff --git a/packfile.c b/packfile.c
index 9cf462856..43b708812 100644
--- a/packfile.c
+++ b/packfile.c
@@ -884,3 +884,28 @@ void reprepare_packed_git(void)
prepare_packed_git_run_once = 0;
prepare_packed_git();
 }
+
+unsigned long unpack_object_header_buffer(const unsigned char *buf,
+   unsigned long len, enum object_type *type, unsigned long *sizep)
+{
+   unsigned shift;
+   unsigned long size, c;
+   unsigned long used = 0;
+
+   c = buf[used++];
+   *type = (c >> 4) & 7;
+   size = c & 15;
+   shift = 4;
+   while (c & 0x80) {
+   if (len <= used || bitsizeof(long) <= shift) {
+   error("bad object header");
+   size = used = 0;
+   break;
+   }
+   c = buf[used++];
+   size += (c & 0x7f) << shift;
+   shift += 7;
+   }
+   *sizep = size;
+   return used;
+}
diff --git a/packfile.h b/packfile.h
index 1cfda1d00..9f36e0112 100644
--- a/packfile.h
+++ b/packfile.h
@@ -62,6 +62,8 @@ extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
+
 extern void release_pack_memory(size_t);
 
 extern int open_packed_git(struct packed_git *p);
diff --git a/sha1_file.c b/sha1_file.c
index 51bb4d1db..b57b0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -914,31 +914,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned 
long *size)
return map_sha1_file_1(NULL, sha1, size);
 }
 
-unsigned long unpack_object_header_buffer(const unsigned char *buf,
-   unsigned long len, enum object_type *type, unsigned long *sizep)
-{
-   unsigned shift;
-   unsigned long size, c;
-   unsigned long used = 0;
-
-   c = buf[used++];
-   *type = (c >> 4) & 7;
-   size = c & 15;
-   shift = 4;
-   while (c & 0x80) {
-   if (len <= used || bitsizeof(long) <= shift) {
-   error("bad object header");
-   size = used = 0;
-   break;
-   }
-   c = buf[used++];
-   size += (c & 0x7f) << shift;
-   shift += 7;
-   }
-   *sizep = size;
-   return used;
-}
-
 static int unpack_sha1_short_header(git_zstream *stream,
unsigned char *map, unsigned long mapsize,
void *buffer, unsigned long bufsiz)
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 11/23] pack: move {,re}prepare_packed_git and approximate_object_count

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/gc.c   |   1 +
 bulk-checkin.c |   1 +
 cache.h|  15 
 connected.c|   2 +-
 fetch-pack.c   |   1 +
 http-backend.c |   1 +
 packfile.c | 217 +
 packfile.h |  16 -
 path.c |   1 +
 server-info.c  |   1 +
 sha1_file.c| 214 
 11 files changed, 238 insertions(+), 232 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index e6b84475a..3c78fcb9b 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -19,6 +19,7 @@
 #include "sigchain.h"
 #include "argv-array.h"
 #include "commit.h"
+#include "packfile.h"
 
 #define FAILED_RUN "failed to run %s"
 
diff --git a/bulk-checkin.c b/bulk-checkin.c
index 5be7ce5c7..9a1f6c49a 100644
--- a/bulk-checkin.c
+++ b/bulk-checkin.c
@@ -6,6 +6,7 @@
 #include "csum-file.h"
 #include "pack.h"
 #include "strbuf.h"
+#include "packfile.h"
 
 static struct bulk_checkin_state {
unsigned plugged:1;
diff --git a/cache.h b/cache.h
index c4d8bee52..63765d481 100644
--- a/cache.h
+++ b/cache.h
@@ -1611,21 +1611,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-/* A hook to report invalid files in pack directory */
-#define PACKDIR_FILE_PACK 1
-#define PACKDIR_FILE_IDX 2
-#define PACKDIR_FILE_GARBAGE 4
-extern void (*report_garbage)(unsigned seen_bits, const char *path);
-
-extern void prepare_packed_git(void);
-extern void reprepare_packed_git(void);
-
-/*
- * Give a rough count of objects in the repository. This sacrifices accuracy
- * for speed.
- */
-unsigned long approximate_object_count(void);
-
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
diff --git a/connected.c b/connected.c
index 3e3f0148c..f416b0505 100644
--- a/connected.c
+++ b/connected.c
@@ -3,7 +3,7 @@
 #include "sigchain.h"
 #include "connected.h"
 #include "transport.h"
-#include "pack.h"
+#include "packfile.h"
 
 /*
  * If we feed all the commits we want to verify to this command
diff --git a/fetch-pack.c b/fetch-pack.c
index fbbc99c88..105506e9a 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -17,6 +17,7 @@
 #include "prio-queue.h"
 #include "sha1-array.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
diff --git a/http-backend.c b/http-backend.c
index 519025d2c..8076b1d5e 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -9,6 +9,7 @@
 #include "string-list.h"
 #include "url.h"
 #include "argv-array.h"
+#include "packfile.h"
 
 static const char content_type[] = "Content-Type";
 static const char content_length[] = "Content-Length";
diff --git a/packfile.c b/packfile.c
index 9a65aa4f6..9cf462856 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,6 +1,9 @@
 #include "cache.h"
 #include "mru.h"
 #include "pack.h"
+#include "dir.h"
+#include "mergesort.h"
+#include "packfile.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -667,3 +670,217 @@ void install_packed_git(struct packed_git *pack)
pack->next = packed_git;
packed_git = pack;
 }
+
+void (*report_garbage)(unsigned seen_bits, const char *path);
+
+static void report_helper(const struct string_list *list,
+ int seen_bits, int first, int last)
+{
+   if (seen_bits == (PACKDIR_FILE_PACK|PACKDIR_FILE_IDX))
+   return;
+
+   for (; first < last; first++)
+   report_garbage(seen_bits, list->items[first].string);
+}
+
+static void report_pack_garbage(struct string_list *list)
+{
+   int i, baselen = -1, first = 0, seen_bits = 0;
+
+   if (!report_garbage)
+   return;
+
+   string_list_sort(list);
+
+   for (i = 0; i < list->nr; i++) {
+   const char *path = list->items[i].string;
+   if (baselen != -1 &&
+   strncmp(path, list->items[first].string, baselen)) {
+   report_helper(list, seen_bits, first, i);
+   baselen = -1;
+   seen_bits = 0;
+   }
+   if (baselen == -1) {
+   const char *dot = strrchr(path, '.');
+   if (!dot) {
+   report_garbage(PACKDIR_FILE_GARBAGE, path);
+   continue;
+   }
+   baselen = dot - path + 1;
+   first = i;
+   }
+   if (!strcmp(path + baselen, "pack"))
+   seen_bits |= 1;
+   else if (!strcmp(path + baselen, "idx"))
+   seen_bits |= 2;
+   }
+   report_helper(list, seen_bits, first, list->nr);
+}
+
+static void prepare_packed_git_one(char *objdir, int local)
+{
+   struct strbuf path = STRBUF_INIT;
+   size_t dirnamelen;
+   DIR

[PATCH v3 05/23] pack: move release_pack_memory()

2017-08-18 Thread Jonathan Tan

The function unuse_one_window() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 git-compat-util.h |  2 --
 packfile.c| 49 +
 packfile.h|  4 
 sha1_file.c   | 49 -
 4 files changed, 53 insertions(+), 51 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index db9c22de7..201056e2d 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -749,8 +749,6 @@ const char *inet_ntop(int af, const void *src, char *dst, 
size_t size);
 extern int git_atexit(void (*handler)(void));
 #endif
 
-extern void release_pack_memory(size_t);
-
 typedef void (*try_to_free_t)(size_t);
 extern try_to_free_t set_try_to_free_routine(try_to_free_t);
 
diff --git a/packfile.c b/packfile.c
index 6edc43228..8daa74ad1 100644
--- a/packfile.c
+++ b/packfile.c
@@ -208,3 +208,52 @@ struct packed_git *parse_pack_index(unsigned char *sha1, 
const char *idx_path)
 
return p;
 }
+
+static void scan_windows(struct packed_git *p,
+   struct packed_git **lru_p,
+   struct pack_window **lru_w,
+   struct pack_window **lru_l)
+{
+   struct pack_window *w, *w_l;
+
+   for (w_l = NULL, w = p->windows; w; w = w->next) {
+   if (!w->inuse_cnt) {
+   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
+   *lru_p = p;
+   *lru_w = w;
+   *lru_l = w_l;
+   }
+   }
+   w_l = w;
+   }
+}
+
+int unuse_one_window(struct packed_git *current)
+{
+   struct packed_git *p, *lru_p = NULL;
+   struct pack_window *lru_w = NULL, *lru_l = NULL;
+
+   if (current)
+   scan_windows(current, _p, _w, _l);
+   for (p = packed_git; p; p = p->next)
+   scan_windows(p, _p, _w, _l);
+   if (lru_p) {
+   munmap(lru_w->base, lru_w->len);
+   pack_mapped -= lru_w->len;
+   if (lru_l)
+   lru_l->next = lru_w->next;
+   else
+   lru_p->windows = lru_w->next;
+   free(lru_w);
+   pack_open_windows--;
+   return 1;
+   }
+   return 0;
+}
+
+void release_pack_memory(size_t need)
+{
+   size_t cur = pack_mapped;
+   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
+   ; /* nothing */
+}
diff --git a/packfile.h b/packfile.h
index 703887d41..f6fe1c741 100644
--- a/packfile.h
+++ b/packfile.h
@@ -43,4 +43,8 @@ extern void pack_report(void);
  */
 extern int open_pack_index(struct packed_git *);
 
+extern int unuse_one_window(struct packed_git *current);
+
+extern void release_pack_memory(size_t);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 475d2032d..d51efd78d 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -680,55 +680,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static void scan_windows(struct packed_git *p,
-   struct packed_git **lru_p,
-   struct pack_window **lru_w,
-   struct pack_window **lru_l)
-{
-   struct pack_window *w, *w_l;
-
-   for (w_l = NULL, w = p->windows; w; w = w->next) {
-   if (!w->inuse_cnt) {
-   if (!*lru_w || w->last_used < (*lru_w)->last_used) {
-   *lru_p = p;
-   *lru_w = w;
-   *lru_l = w_l;
-   }
-   }
-   w_l = w;
-   }
-}
-
-static int unuse_one_window(struct packed_git *current)
-{
-   struct packed_git *p, *lru_p = NULL;
-   struct pack_window *lru_w = NULL, *lru_l = NULL;
-
-   if (current)
-   scan_windows(current, _p, _w, _l);
-   for (p = packed_git; p; p = p->next)
-   scan_windows(p, _p, _w, _l);
-   if (lru_p) {
-   munmap(lru_w->base, lru_w->len);
-   pack_mapped -= lru_w->len;
-   if (lru_l)
-   lru_l->next = lru_w->next;
-   else
-   lru_p->windows = lru_w->next;
-   free(lru_w);
-   pack_open_windows--;
-   return 1;
-   }
-   return 0;
-}
-
-void release_pack_memory(size_t need)
-{
-   size_t cur = pack_mapped;
-   while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
-   ; /* nothing */
-}
-
 static void mmap_limit_check(size_t length)
 {
static size_t limit = 0;
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 18/23] pack: move find_pack_entry_one(), is_pack_valid()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  8 ---
 packfile.c  | 76 -
 packfile.h  |  9 ++--
 sha1_file.c | 73 --
 4 files changed, 82 insertions(+), 84 deletions(-)

diff --git a/cache.h b/cache.h
index ee75a4949..9297d078a 100644
--- a/cache.h
+++ b/cache.h
@@ -1626,14 +1626,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * If the object named sha1 is present in the specified packfile,
- * return its offset within the packfile; otherwise, return 0.
- */
-extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
-
-extern int is_pack_valid(struct packed_git *);
-
 /*
  * Iterate over the files in the loose-object parts of the object
  * directory "path", triggering the following callbacks:
diff --git a/packfile.c b/packfile.c
index e914422e9..ad7336594 100644
--- a/packfile.c
+++ b/packfile.c
@@ -7,6 +7,7 @@
 #include "delta.h"
 #include "list.h"
 #include "streaming.h"
+#include "sha1-lookup.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -509,7 +510,7 @@ static int open_packed_git_1(struct packed_git *p)
return 0;
 }
 
-int open_packed_git(struct packed_git *p)
+static int open_packed_git(struct packed_git *p)
 {
if (!open_packed_git_1(p))
return 0;
@@ -1700,3 +1701,76 @@ off_t nth_packed_object_offset(const struct packed_git 
*p, uint32_t n)
   ntohl(*((uint32_t *)(index + 4)));
}
 }
+
+off_t find_pack_entry_one(const unsigned char *sha1,
+ struct packed_git *p)
+{
+   const uint32_t *level1_ofs = p->index_data;
+   const unsigned char *index = p->index_data;
+   unsigned hi, lo, stride;
+   static int debug_lookup = -1;
+
+   if (debug_lookup < 0)
+   debug_lookup = !!getenv("GIT_DEBUG_LOOKUP");
+
+   if (!index) {
+   if (open_pack_index(p))
+   return 0;
+   level1_ofs = p->index_data;
+   index = p->index_data;
+   }
+   if (p->index_version > 1) {
+   level1_ofs += 2;
+   index += 8;
+   }
+   index += 4 * 256;
+   hi = ntohl(level1_ofs[*sha1]);
+   lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1]));
+   if (p->index_version > 1) {
+   stride = 20;
+   } else {
+   stride = 24;
+   index += 4;
+   }
+
+   if (debug_lookup)
+   printf("%02x%02x%02x... lo %u hi %u nr %"PRIu32"\n",
+  sha1[0], sha1[1], sha1[2], lo, hi, p->num_objects);
+
+   while (lo < hi) {
+   unsigned mi = (lo + hi) / 2;
+   int cmp = hashcmp(index + mi * stride, sha1);
+
+   if (debug_lookup)
+   printf("lo %u hi %u rg %u mi %u\n",
+  lo, hi, hi - lo, mi);
+   if (!cmp)
+   return nth_packed_object_offset(p, mi);
+   if (cmp > 0)
+   hi = mi;
+   else
+   lo = mi+1;
+   }
+   return 0;
+}
+
+int is_pack_valid(struct packed_git *p)
+{
+   /* An already open pack is known to be valid. */
+   if (p->pack_fd != -1)
+   return 1;
+
+   /* If the pack has one window completely covering the
+* file size, the pack is known to be valid even if
+* the descriptor is not currently open.
+*/
+   if (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (!w->offset && w->len == p->pack_size)
+   return 1;
+   }
+
+   /* Force the pack to open to prove its valid. */
+   return !open_packed_git(p);
+}
diff --git a/packfile.h b/packfile.h
index 8deb84bd1..4fca6fb28 100644
--- a/packfile.h
+++ b/packfile.h
@@ -93,6 +93,13 @@ extern const struct object_id *nth_packed_object_oid(struct 
object_id *, struct
  */
 extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
 
+/*
+ * If the object named sha1 is present in the specified packfile,
+ * return its offset within the packfile; otherwise, return 0.
+ */
+extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
+
+extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
@@ -100,8 +107,6 @@ extern int unpack_object_header(struct packed_git *, struct 
pack_window **, off_
 
 extern void

[PATCH v3 03/23] pack: move pack_report()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  2 --
 packfile.c  | 24 
 packfile.h  |  2 ++
 sha1_file.c | 24 
 4 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/cache.h b/cache.h
index aa2b4d390..a0497d469 100644
--- a/cache.h
+++ b/cache.h
@@ -1632,8 +1632,6 @@ unsigned long approximate_object_count(void);
 extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
 struct packed_git *packs);
 
-extern void pack_report(void);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/packfile.c b/packfile.c
index 0f46e0617..60d9fc3b0 100644
--- a/packfile.c
+++ b/packfile.c
@@ -35,3 +35,27 @@ struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
 struct mru *packed_git_mru = _git_mru_storage;
+
+#define SZ_FMT PRIuMAX
+static inline uintmax_t sz_fmt(size_t s) { return s; }
+
+void pack_report(void)
+{
+   fprintf(stderr,
+   "pack_report: getpagesize()= %10" SZ_FMT "\n"
+   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
+   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
+   sz_fmt(getpagesize()),
+   sz_fmt(packed_git_window_size),
+   sz_fmt(packed_git_limit));
+   fprintf(stderr,
+   "pack_report: pack_used_ctr= %10u\n"
+   "pack_report: pack_mmap_calls  = %10u\n"
+   "pack_report: pack_open_windows= %10u / %10u\n"
+   "pack_report: pack_mapped  = "
+   "%10" SZ_FMT " / %10" SZ_FMT "\n",
+   pack_used_ctr,
+   pack_mmap_calls,
+   pack_open_windows, peak_pack_open_windows,
+   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
+}
diff --git a/packfile.h b/packfile.h
index a76bb7cec..bfa94c8fe 100644
--- a/packfile.h
+++ b/packfile.h
@@ -33,4 +33,6 @@ extern unsigned int pack_max_fds;
 extern size_t peak_pack_mapped;
 extern size_t pack_mapped;
 
+extern void pack_report(void);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 2b5ce9959..f7c8152ac 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -30,9 +30,6 @@
 #include "quote.h"
 #include "packfile.h"
 
-#define SZ_FMT PRIuMAX
-static inline uintmax_t sz_fmt(size_t s) { return s; }
-
 const unsigned char null_sha1[20];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
@@ -683,27 +680,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-void pack_report(void)
-{
-   fprintf(stderr,
-   "pack_report: getpagesize()= %10" SZ_FMT "\n"
-   "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n"
-   "pack_report: core.packedGitLimit  = %10" SZ_FMT "\n",
-   sz_fmt(getpagesize()),
-   sz_fmt(packed_git_window_size),
-   sz_fmt(packed_git_limit));
-   fprintf(stderr,
-   "pack_report: pack_used_ctr= %10u\n"
-   "pack_report: pack_mmap_calls  = %10u\n"
-   "pack_report: pack_open_windows= %10u / %10u\n"
-   "pack_report: pack_mapped  = "
-   "%10" SZ_FMT " / %10" SZ_FMT "\n",
-   pack_used_ctr,
-   pack_mmap_calls,
-   pack_open_windows, peak_pack_open_windows,
-   sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped));
-}
-
 /*
  * Open and mmap the index file at path, perform a couple of
  * consistency checks, then record its information to p.  Return 0 on
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 19/23] pack: move find_sha1_pack()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h   |  3 ---
 http-push.c   |  1 +
 http-walker.c |  1 +
 packfile.c| 13 +
 packfile.h|  3 +++
 sha1_file.c   | 13 -
 6 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/cache.h b/cache.h
index 9297d078a..1e90bb754 100644
--- a/cache.h
+++ b/cache.h
@@ -1608,9 +1608,6 @@ struct pack_entry {
struct packed_git *p;
 };
 
-extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
-struct packed_git *packs);
-
 /*
  * Create a temporary file rooted in the object database directory, or
  * die on failure. The filename is taken from "pattern", which should have the
diff --git a/http-push.c b/http-push.c
index c91f40a61..e4c9b065c 100644
--- a/http-push.c
+++ b/http-push.c
@@ -11,6 +11,7 @@
 #include "list-objects.h"
 #include "sigchain.h"
 #include "argv-array.h"
+#include "packfile.h"
 
 #ifdef EXPAT_NEEDS_XMLPARSE_H
 #include 
diff --git a/http-walker.c b/http-walker.c
index ee049cb13..1ae8363de 100644
--- a/http-walker.c
+++ b/http-walker.c
@@ -4,6 +4,7 @@
 #include "http.h"
 #include "list.h"
 #include "transport.h"
+#include "packfile.h"
 
 struct alt_base {
char *base;
diff --git a/packfile.c b/packfile.c
index ad7336594..ba3a5eb3a 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1774,3 +1774,16 @@ int is_pack_valid(struct packed_git *p)
/* Force the pack to open to prove its valid. */
return !open_packed_git(p);
 }
+
+struct packed_git *find_sha1_pack(const unsigned char *sha1,
+ struct packed_git *packs)
+{
+   struct packed_git *p;
+
+   for (p = packs; p; p = p->next) {
+   if (find_pack_entry_one(sha1, p))
+   return p;
+   }
+   return NULL;
+
+}
diff --git a/packfile.h b/packfile.h
index 4fca6fb28..a4ff6f6ed 100644
--- a/packfile.h
+++ b/packfile.h
@@ -42,6 +42,9 @@ extern void install_packed_git(struct packed_git *pack);
  */
 unsigned long approximate_object_count(void);
 
+extern struct packed_git *find_sha1_pack(const unsigned char *sha1,
+struct packed_git *packs);
+
 extern void pack_report(void);
 
 /*
diff --git a/sha1_file.c b/sha1_file.c
index 27714f5e1..8853672d2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1127,19 +1127,6 @@ static int find_pack_entry(const unsigned char *sha1, 
struct pack_entry *e)
return 0;
 }
 
-struct packed_git *find_sha1_pack(const unsigned char *sha1,
- struct packed_git *packs)
-{
-   struct packed_git *p;
-
-   for (p = packs; p; p = p->next) {
-   if (find_pack_entry_one(sha1, p))
-   return p;
-   }
-   return NULL;
-
-}
-
 static int sha1_loose_object_info(const unsigned char *sha1,
  struct object_info *oi,
  int flags)
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 21/23] pack: move has_sha1_pack()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/prune-packed.c | 1 +
 cache.h| 2 --
 diff.c | 1 +
 packfile.c | 6 ++
 packfile.h | 2 ++
 revision.c | 1 +
 sha1_file.c| 6 --
 7 files changed, 11 insertions(+), 8 deletions(-)

diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c
index ac978ad40..97bfde24b 100644
--- a/builtin/prune-packed.c
+++ b/builtin/prune-packed.c
@@ -2,6 +2,7 @@
 #include "cache.h"
 #include "progress.h"
 #include "parse-options.h"
+#include "packfile.h"
 
 static const char * const prune_packed_usage[] = {
N_("git prune-packed [-n | --dry-run] [-q | --quiet]"),
diff --git a/cache.h b/cache.h
index 1e90bb754..286891df4 100644
--- a/cache.h
+++ b/cache.h
@@ -1198,8 +1198,6 @@ extern int check_sha1_signature(const unsigned char 
*sha1, void *buf, unsigned l
 
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
-extern int has_sha1_pack(const unsigned char *sha1);
-
 /*
  * Open the loose object at path, check its sha1, and return the contents,
  * type, and size. If the object is a blob, then "contents" may return NULL,
diff --git a/diff.c b/diff.c
index 85e714f6c..e9a1d6162 100644
--- a/diff.c
+++ b/diff.c
@@ -20,6 +20,7 @@
 #include "string-list.h"
 #include "argv-array.h"
 #include "graph.h"
+#include "packfile.h"
 
 #ifdef NO_FAST_WORKING_DIRECTORY
 #define FAST_WORKING_DIRECTORY 0
diff --git a/packfile.c b/packfile.c
index ae5395f5f..7472ab816 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1840,3 +1840,9 @@ int find_pack_entry(const unsigned char *sha1, struct 
pack_entry *e)
}
return 0;
 }
+
+int has_sha1_pack(const unsigned char *sha1)
+{
+   struct pack_entry e;
+   return find_pack_entry(sha1, );
+}
diff --git a/packfile.h b/packfile.h
index c9b4fcfaf..4945a1505 100644
--- a/packfile.h
+++ b/packfile.h
@@ -120,4 +120,6 @@ extern const struct packed_git *has_packed_and_bad(const 
unsigned char *sha1);
 
 extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e);
 
+extern int has_sha1_pack(const unsigned char *sha1);
+
 #endif
diff --git a/revision.c b/revision.c
index 6603af944..db97b9221 100644
--- a/revision.c
+++ b/revision.c
@@ -19,6 +19,7 @@
 #include "dir.h"
 #include "cache-tree.h"
 #include "bisect.h"
+#include "packfile.h"
 
 volatile show_early_output_fn_t show_early_output;
 
diff --git a/sha1_file.c b/sha1_file.c
index 76c86639c..e4975e0ae 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1630,12 +1630,6 @@ int has_pack_index(const unsigned char *sha1)
return 1;
 }
 
-int has_sha1_pack(const unsigned char *sha1)
-{
-   struct pack_entry e;
-   return find_pack_entry(sha1, );
-}
-
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
if (!startup_info->have_repository)
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 14/23] pack: move unpack_object_header()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 packfile.c  | 26 ++
 packfile.h  |  1 +
 sha1_file.c | 26 --
 4 files changed, 27 insertions(+), 27 deletions(-)

diff --git a/cache.h b/cache.h
index 87f65aeea..7adbc587d 100644
--- a/cache.h
+++ b/cache.h
@@ -1669,7 +1669,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 /*
  * Iterate over the files in the loose-object parts of the object
diff --git a/packfile.c b/packfile.c
index fa90b643e..3543b37b8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -949,3 +949,29 @@ unsigned long get_size_from_delta(struct packed_git *p,
/* Read the result size */
return get_delta_hdr_size(, delta_head+sizeof(delta_head));
 }
+
+int unpack_object_header(struct packed_git *p,
+struct pack_window **w_curs,
+off_t *curpos,
+unsigned long *sizep)
+{
+   unsigned char *base;
+   size_t left;
+   size_t used;
+   enum object_type type;
+
+   /* use_pack() assures us we have [base, base + 20) available
+* as a range that we can look at.  (Its actually the hash
+* size that is assured.)  With our object header encoding
+* the maximum deflated object size is 2^137, which is just
+* insane, so we know won't exceed what we have been given.
+*/
+   base = use_pack(p, w_curs, *curpos, );
+   used = unpack_object_header_buffer(base, left, , sizep);
+   if (!used) {
+   type = OBJ_BAD;
+   } else
+   *curpos += used;
+
+   return type;
+}
diff --git a/packfile.h b/packfile.h
index 9c3bce6b2..d22a528b5 100644
--- a/packfile.h
+++ b/packfile.h
@@ -64,6 +64,7 @@ extern struct packed_git *add_packed_git(const char *path, 
size_t path_len, int
 
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
+extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 extern void release_pack_memory(size_t);
 
diff --git a/sha1_file.c b/sha1_file.c
index 5d016ad6b..681dcf1c0 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1171,32 +1171,6 @@ static const unsigned char *get_delta_base_sha1(struct 
packed_git *p,
return NULL;
 }
 
-int unpack_object_header(struct packed_git *p,
-struct pack_window **w_curs,
-off_t *curpos,
-unsigned long *sizep)
-{
-   unsigned char *base;
-   size_t left;
-   size_t used;
-   enum object_type type;
-
-   /* use_pack() assures us we have [base, base + 20) available
-* as a range that we can look at.  (Its actually the hash
-* size that is assured.)  With our object header encoding
-* the maximum deflated object size is 2^137, which is just
-* insane, so we know won't exceed what we have been given.
-*/
-   base = use_pack(p, w_curs, *curpos, );
-   used = unpack_object_header_buffer(base, left, , sizep);
-   if (!used) {
-   type = OBJ_BAD;
-   } else
-   *curpos += used;
-
-   return type;
-}
-
 static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset)
 {
int type;
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 15/23] pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry()

2017-08-18 Thread Jonathan Tan

Both sha1_file.c and packfile.c now need read_object(), so a copy of
read_object() was created in packfile.c.

This patch makes both mark_bad_packed_object() and has_packed_and_bad()
global. Unlike most of the other patches in this series, these 2
functions need to remain global.

Signed-off-by: Jonathan Tan 
---
 cache.h |   7 -
 packfile.c  | 661 ++
 packfile.h  |  10 +
 sha1_file.c | 677 ++--
 4 files changed, 685 insertions(+), 670 deletions(-)

diff --git a/cache.h b/cache.h
index 7adbc587d..11aa18e6a 100644
--- a/cache.h
+++ b/cache.h
@@ -1194,9 +1194,6 @@ extern void *map_sha1_file(const unsigned char *sha1, 
unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
 
-/* global flag to enable extra checks when accessing packed objects */
-extern int do_check_packed_object_crc;
-
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned 
long size, const char *type);
 
 extern int finalize_object_file(const char *tmpfile, const char *filename);
@@ -1629,8 +1626,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern void clear_delta_base_cache(void);
-
 /*
  * Make sure that a pointer access into an mmap'd index file is within bounds,
  * and can provide at least 8 bytes of data.
@@ -1668,7 +1663,6 @@ extern off_t nth_packed_object_offset(const struct 
packed_git *, uint32_t n);
 extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git 
*);
 
 extern int is_pack_valid(struct packed_git *);
-extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
 
 /*
  * Iterate over the files in the loose-object parts of the object
@@ -1779,7 +1773,6 @@ struct object_info {
 /* Do not retry packed storage after checking packed and loose storage */
 #define OBJECT_INFO_QUICK 8
 extern int sha1_object_info_extended(const unsigned char *, struct object_info 
*, unsigned flags);
-extern int packed_object_info(struct packed_git *pack, off_t offset, struct 
object_info *);
 
 /* Dumb servers support */
 extern int update_server_info(int);
diff --git a/packfile.c b/packfile.c
index 3543b37b8..624cc109e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -5,6 +5,8 @@
 #include "mergesort.h"
 #include "packfile.h"
 #include "delta.h"
+#include "list.h"
+#include "streaming.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -975,3 +977,662 @@ int unpack_object_header(struct packed_git *p,
 
return type;
 }
+
+void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1)
+{
+   unsigned i;
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i))
+   return;
+   p->bad_object_sha1 = xrealloc(p->bad_object_sha1,
+ st_mult(GIT_MAX_RAWSZ,
+ st_add(p->num_bad_objects, 1)));
+   hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1);
+   p->num_bad_objects++;
+}
+
+const struct packed_git *has_packed_and_bad(const unsigned char *sha1)
+{
+   struct packed_git *p;
+   unsigned i;
+
+   for (p = packed_git; p; p = p->next)
+   for (i = 0; i < p->num_bad_objects; i++)
+   if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i))
+   return p;
+   return NULL;
+}
+
+static off_t get_delta_base(struct packed_git *p,
+   struct pack_window **w_curs,
+   off_t *curpos,
+   enum object_type type,
+   off_t delta_obj_offset)
+{
+   unsigned char *base_info = use_pack(p, w_curs, *curpos, NULL);
+   off_t base_offset;
+
+   /* use_pack() assured us we have [base_info, base_info + 20)
+* as a range that we can look at without walking off the
+* end of the mapped window.  Its actually the hash size
+* that is assured.  An OFS_DELTA longer than the hash size
+* is stupid, as then a REF_DELTA would be smaller to store.
+*/
+   if (type == OBJ_OFS_DELTA) {
+   unsigned used = 0;
+   unsigned char c = base_info[used++];
+   base_offset = c & 127;
+   while (c & 128) {
+   base_offset += 1;
+   if (!base_offset || MSB(base_offset, 7))
+   return 0;  /* overflow */
+   c = base_info[used++];
+   base_offset = (base_offset << 7) + (c

[PATCH v3 22/23] pack: move has_pack_index()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 2 --
 packfile.c  | 8 
 packfile.h  | 2 ++
 sha1_file.c | 8 
 4 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/cache.h b/cache.h
index 286891df4..dcbe37a3f 100644
--- a/cache.h
+++ b/cache.h
@@ -1233,8 +1233,6 @@ extern int has_object_file_with_flags(const struct 
object_id *oid, int flags);
  */
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
 
-extern int has_pack_index(const unsigned char *sha1);
-
 extern void assert_sha1_type(const unsigned char *sha1, enum object_type 
expect);
 
 /* Helper to check and "touch" a file */
diff --git a/packfile.c b/packfile.c
index 7472ab816..7e293761b 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1846,3 +1846,11 @@ int has_sha1_pack(const unsigned char *sha1)
struct pack_entry e;
return find_pack_entry(sha1, );
 }
+
+int has_pack_index(const unsigned char *sha1)
+{
+   struct stat st;
+   if (stat(sha1_pack_index_name(sha1), ))
+   return 0;
+   return 1;
+}
diff --git a/packfile.h b/packfile.h
index 4945a1505..1b6ea832c 100644
--- a/packfile.h
+++ b/packfile.h
@@ -122,4 +122,6 @@ extern int find_pack_entry(const unsigned char *sha1, 
struct pack_entry *e);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 
+extern int has_pack_index(const unsigned char *sha1);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index e4975e0ae..fa422435f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1622,14 +1622,6 @@ int force_object_loose(const unsigned char *sha1, time_t 
mtime)
return ret;
 }
 
-int has_pack_index(const unsigned char *sha1)
-{
-   struct stat st;
-   if (stat(sha1_pack_index_name(sha1), ))
-   return 0;
-   return 1;
-}
-
 int has_sha1_file_with_flags(const unsigned char *sha1, int flags)
 {
if (!startup_info->have_repository)
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 17/23] pack: move check_pack_index_ptr(), nth_packed_object_offset()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 16 
 packfile.c  | 33 +
 packfile.h  | 16 
 sha1_file.c | 33 -
 4 files changed, 49 insertions(+), 49 deletions(-)

diff --git a/cache.h b/cache.h
index 83aa3cc62..ee75a4949 100644
--- a/cache.h
+++ b/cache.h
@@ -1626,22 +1626,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * Make sure that a pointer access into an mmap'd index file is within bounds,
- * and can provide at least 8 bytes of data.
- *
- * Note that this is only necessary for variable-length segments of the file
- * (like the 64-bit extended offset table), as we compare the size to the
- * fixed-length parts when we open the file.
- */
-extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
-
-/*
- * Return the offset of the nth object within the specified packfile.
- * The index must already be opened.
- */
-extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
-
 /*
  * If the object named sha1 is present in the specified packfile,
  * return its offset within the packfile; otherwise, return 0.
diff --git a/packfile.c b/packfile.c
index e9b16da94..e914422e9 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1667,3 +1667,36 @@ const struct object_id *nth_packed_object_oid(struct 
object_id *oid,
hashcpy(oid->hash, hash);
return oid;
 }
+
+void check_pack_index_ptr(const struct packed_git *p, const void *vptr)
+{
+   const unsigned char *ptr = vptr;
+   const unsigned char *start = p->index_data;
+   const unsigned char *end = start + p->index_size;
+   if (ptr < start)
+   die(_("offset before start of pack index for %s (corrupt 
index?)"),
+   p->pack_name);
+   /* No need to check for underflow; .idx files must be at least 8 bytes 
*/
+   if (ptr >= end - 8)
+   die(_("offset beyond end of pack index for %s (truncated 
index?)"),
+   p->pack_name);
+}
+
+off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n)
+{
+   const unsigned char *index = p->index_data;
+   index += 4 * 256;
+   if (p->index_version == 1) {
+   return ntohl(*((uint32_t *)(index + 24 * n)));
+   } else {
+   uint32_t off;
+   index += 8 + p->num_objects * (20 + 4);
+   off = ntohl(*((uint32_t *)(index + 4 * n)));
+   if (!(off & 0x8000))
+   return off;
+   index += p->num_objects * 4 + (off & 0x7fff) * 8;
+   check_pack_index_ptr(p, index);
+   return (((uint64_t)ntohl(*((uint32_t *)(index + 0 << 32) |
+  ntohl(*((uint32_t *)(index + 4)));
+   }
+}
diff --git a/packfile.h b/packfile.h
index 56d70caa0..8deb84bd1 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,16 @@ extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+/*
+ * Make sure that a pointer access into an mmap'd index file is within bounds,
+ * and can provide at least 8 bytes of data.
+ *
+ * Note that this is only necessary for variable-length segments of the file
+ * (like the 64-bit extended offset table), as we compare the size to the
+ * fixed-length parts when we open the file.
+ */
+extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
+
 /*
  * Return the SHA-1 of the nth object within the specified packfile.
  * Open the index if it is not already open.  The return value points
@@ -77,6 +87,11 @@ extern const unsigned char *nth_packed_object_sha1(struct 
packed_git *, uint32_t
  */
 extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
 
+/*
+ * Return the offset of the nth object within the specified packfile.
+ * The index must already be opened.
+ */
+extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n);
 
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
@@ -94,4 +109,5 @@ extern int packed_object_info(struct packed_git *pack, off_t 
offset, struct obje
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
 extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 34fbe8e51..2d22bc228 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1074,39 +1074,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-void

[PATCH v3 13/23] pack: move get_size_from_delta()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h |  1 -
 packfile.c  | 40 
 packfile.h  |  1 +
 sha1_file.c | 39 ---
 4 files changed, 41 insertions(+), 40 deletions(-)

diff --git a/cache.h b/cache.h
index 75cc0c497..87f65aeea 100644
--- a/cache.h
+++ b/cache.h
@@ -1669,7 +1669,6 @@ extern off_t find_pack_entry_one(const unsigned char 
*sha1, struct packed_git *)
 
 extern int is_pack_valid(struct packed_git *);
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
-extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 extern int unpack_object_header(struct packed_git *, struct pack_window **, 
off_t *, unsigned long *);
 
 /*
diff --git a/packfile.c b/packfile.c
index 43b708812..fa90b643e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -4,6 +4,7 @@
 #include "dir.h"
 #include "mergesort.h"
 #include "packfile.h"
+#include "delta.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -909,3 +910,42 @@ unsigned long unpack_object_header_buffer(const unsigned 
char *buf,
*sizep = size;
return used;
 }
+
+unsigned long get_size_from_delta(struct packed_git *p,
+ struct pack_window **w_curs,
+ off_t curpos)
+{
+   const unsigned char *data;
+   unsigned char delta_head[20], *in;
+   git_zstream stream;
+   int st;
+
+   memset(, 0, sizeof(stream));
+   stream.next_out = delta_head;
+   stream.avail_out = sizeof(delta_head);
+
+   git_inflate_init();
+   do {
+   in = use_pack(p, w_curs, curpos, _in);
+   stream.next_in = in;
+   st = git_inflate(, Z_FINISH);
+   curpos += stream.next_in - in;
+   } while ((st == Z_OK || st == Z_BUF_ERROR) &&
+stream.total_out < sizeof(delta_head));
+   git_inflate_end();
+   if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
+   error("delta data unpack-initial failed");
+   return 0;
+   }
+
+   /* Examine the initial part of the delta to figure out
+* the result size.
+*/
+   data = delta_head;
+
+   /* ignore base size */
+   get_delta_hdr_size(, delta_head+sizeof(delta_head));
+
+   /* Read the result size */
+   return get_delta_hdr_size(, delta_head+sizeof(delta_head));
+}
diff --git a/packfile.h b/packfile.h
index 9f36e0112..9c3bce6b2 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,7 @@ extern void unuse_pack(struct pack_window **);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
+extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
 
 extern void release_pack_memory(size_t);
 
diff --git a/sha1_file.c b/sha1_file.c
index b57b0..5d016ad6b 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1100,45 +1100,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-unsigned long get_size_from_delta(struct packed_git *p,
- struct pack_window **w_curs,
- off_t curpos)
-{
-   const unsigned char *data;
-   unsigned char delta_head[20], *in;
-   git_zstream stream;
-   int st;
-
-   memset(, 0, sizeof(stream));
-   stream.next_out = delta_head;
-   stream.avail_out = sizeof(delta_head);
-
-   git_inflate_init();
-   do {
-   in = use_pack(p, w_curs, curpos, _in);
-   stream.next_in = in;
-   st = git_inflate(, Z_FINISH);
-   curpos += stream.next_in - in;
-   } while ((st == Z_OK || st == Z_BUF_ERROR) &&
-stream.total_out < sizeof(delta_head));
-   git_inflate_end();
-   if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) {
-   error("delta data unpack-initial failed");
-   return 0;
-   }
-
-   /* Examine the initial part of the delta to figure out
-* the result size.
-*/
-   data = delta_head;
-
-   /* ignore base size */
-   get_delta_hdr_size(, delta_head+sizeof(delta_head));
-
-   /* Read the result size */
-   return get_delta_hdr_size(, delta_head+sizeof(delta_head));
-}
-
 static off_t get_delta_base(struct packed_git *p,
struct pack_window **w_curs,
off_t *curpos,
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 16/23] pack: move nth_packed_object_{sha1,oid}

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 cache.h | 14 --
 packfile.c  | 31 +++
 packfile.h  | 16 +++-
 sha1_file.c | 31 ---
 4 files changed, 46 insertions(+), 46 deletions(-)

diff --git a/cache.h b/cache.h
index 11aa18e6a..83aa3cc62 100644
--- a/cache.h
+++ b/cache.h
@@ -1636,20 +1636,6 @@ extern int odb_pack_keep(const char *name);
  */
 extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr);
 
-/*
- * Return the SHA-1 of the nth object within the specified packfile.
- * Open the index if it is not already open.  The return value points
- * at the SHA-1 within the mmapped index.  Return NULL if there is an
- * error.
- */
-extern const unsigned char *nth_packed_object_sha1(struct packed_git *, 
uint32_t n);
-/*
- * Like nth_packed_object_sha1, but write the data into the object specified by
- * the the first argument.  Returns the first argument on success, and NULL on
- * error.
- */
-extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
-
 /*
  * Return the offset of the nth object within the specified packfile.
  * The index must already be opened.
diff --git a/packfile.c b/packfile.c
index 624cc109e..e9b16da94 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1636,3 +1636,34 @@ void *unpack_entry(struct packed_git *p, off_t 
obj_offset,
 
return data;
 }
+
+const unsigned char *nth_packed_object_sha1(struct packed_git *p,
+   uint32_t n)
+{
+   const unsigned char *index = p->index_data;
+   if (!index) {
+   if (open_pack_index(p))
+   return NULL;
+   index = p->index_data;
+   }
+   if (n >= p->num_objects)
+   return NULL;
+   index += 4 * 256;
+   if (p->index_version == 1) {
+   return index + 24 * n + 4;
+   } else {
+   index += 8;
+   return index + 20 * n;
+   }
+}
+
+const struct object_id *nth_packed_object_oid(struct object_id *oid,
+ struct packed_git *p,
+ uint32_t n)
+{
+   const unsigned char *hash = nth_packed_object_sha1(p, n);
+   if (!hash)
+   return NULL;
+   hashcpy(oid->hash, hash);
+   return oid;
+}
diff --git a/packfile.h b/packfile.h
index c28eaccc6..56d70caa0 100644
--- a/packfile.h
+++ b/packfile.h
@@ -63,6 +63,21 @@ extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
 
+/*
+ * Return the SHA-1 of the nth object within the specified packfile.
+ * Open the index if it is not already open.  The return value points
+ * at the SHA-1 within the mmapped index.  Return NULL if there is an
+ * error.
+ */
+extern const unsigned char *nth_packed_object_sha1(struct packed_git *, 
uint32_t n);
+/*
+ * Like nth_packed_object_sha1, but write the data into the object specified by
+ * the the first argument.  Returns the first argument on success, and NULL on
+ * error.
+ */
+extern const struct object_id *nth_packed_object_oid(struct object_id *, 
struct packed_git *, uint32_t n);
+
+
 extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, 
unsigned long *);
 extern unsigned long unpack_object_header_buffer(const unsigned char *buf, 
unsigned long len, enum object_type *type, unsigned long *sizep);
 extern unsigned long get_size_from_delta(struct packed_git *, struct 
pack_window **, off_t);
@@ -79,5 +94,4 @@ extern int packed_object_info(struct packed_git *pack, off_t 
offset, struct obje
 
 extern void mark_bad_packed_object(struct packed_git *p, const unsigned char 
*sha1);
 extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1);
-
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index e537ba089..34fbe8e51 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1074,37 +1074,6 @@ int parse_sha1_header(const char *hdr, unsigned long 
*sizep)
return parse_sha1_header_extended(hdr, , 0);
 }
 
-const unsigned char *nth_packed_object_sha1(struct packed_git *p,
-   uint32_t n)
-{
-   const unsigned char *index = p->index_data;
-   if (!index) {
-   if (open_pack_index(p))
-   return NULL;
-   index = p->index_data;
-   }
-   if (n >= p->num_objects)
-   return NULL;
-   index += 4 * 256;
-   if (p->index_version == 1) {
-   return index + 24 * n + 4;
-   } else {
-   index += 8;
-   return index + 20 * n;
-   }
-}
-
-const struct object_id *nth_packed_object_oid(struct object_id *oid,
- struct packed_git *p,
- uint32_t

[PATCH v3 23/23] pack: move for_each_packed_object()

2017-08-18 Thread Jonathan Tan

Signed-off-by: Jonathan Tan 
---
 builtin/cat-file.c |  1 +
 cache.h|  7 +--
 packfile.c | 40 
 packfile.h | 11 +++
 reachable.c|  1 +
 sha1_file.c| 40 
 6 files changed, 54 insertions(+), 46 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 96b786e48..be5936017 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -12,6 +12,7 @@
 #include "streaming.h"
 #include "tree-walk.h"
 #include "sha1-array.h"
+#include "packfile.h"
 
 struct batch_options {
int enabled;
diff --git a/cache.h b/cache.h
index dcbe37a3f..2eeb21b02 100644
--- a/cache.h
+++ b/cache.h
@@ -1668,17 +1668,12 @@ int for_each_loose_file_in_objdir_buf(struct strbuf 
*path,
  void *data);
 
 /*
- * Iterate over loose and packed objects in both the local
+ * Iterate over loose objects in both the local
  * repository and any alternates repositories (unless the
  * LOCAL_ONLY flag is set).
  */
 #define FOR_EACH_OBJECT_LOCAL_ONLY 0x1
-typedef int each_packed_object_fn(const struct object_id *oid,
- struct packed_git *pack,
- uint32_t pos,
- void *data);
 extern int for_each_loose_object(each_loose_object_fn, void *, unsigned flags);
-extern int for_each_packed_object(each_packed_object_fn, void *, unsigned 
flags);
 
 struct object_info {
/* Request */
diff --git a/packfile.c b/packfile.c
index 7e293761b..1f11ef5b8 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1854,3 +1854,43 @@ int has_pack_index(const unsigned char *sha1)
return 0;
return 1;
 }
+
+static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn 
cb, void *data)
+{
+   uint32_t i;
+   int r = 0;
+
+   for (i = 0; i < p->num_objects; i++) {
+   struct object_id oid;
+
+   if (!nth_packed_object_oid(, p, i))
+   return error("unable to get sha1 of object %u in %s",
+i, p->pack_name);
+
+   r = cb(, p, i, data);
+   if (r)
+   break;
+   }
+   return r;
+}
+
+int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned 
flags)
+{
+   struct packed_git *p;
+   int r = 0;
+   int pack_errors = 0;
+
+   prepare_packed_git();
+   for (p = packed_git; p; p = p->next) {
+   if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local)
+   continue;
+   if (open_pack_index(p)) {
+   pack_errors = 1;
+   continue;
+   }
+   r = for_each_object_in_pack(p, cb, data);
+   if (r)
+   break;
+   }
+   return r ? r : pack_errors;
+}
diff --git a/packfile.h b/packfile.h
index 1b6ea832c..ca4cc3b97 100644
--- a/packfile.h
+++ b/packfile.h
@@ -124,4 +124,15 @@ extern int has_sha1_pack(const unsigned char *sha1);
 
 extern int has_pack_index(const unsigned char *sha1);
 
+/*
+ * Iterate over packed objects in both the local
+ * repository and any alternates repositories (unless the
+ * FOR_EACH_OBJECT_LOCAL_ONLY flag, defined in cache.h, is set).
+ */
+typedef int each_packed_object_fn(const struct object_id *oid,
+ struct packed_git *pack,
+ uint32_t pos,
+ void *data);
+extern int for_each_packed_object(each_packed_object_fn, void *, unsigned 
flags);
+
 #endif
diff --git a/reachable.c b/reachable.c
index c62efbfd4..d1ac5d97e 100644
--- a/reachable.c
+++ b/reachable.c
@@ -9,6 +9,7 @@
 #include "cache-tree.h"
 #include "progress.h"
 #include "list-objects.h"
+#include "packfile.h"
 
 struct connectivity_progress {
struct progress *progress;
diff --git a/sha1_file.c b/sha1_file.c
index fa422435f..0bb2343f8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2014,46 +2014,6 @@ int for_each_loose_object(each_loose_object_fn cb, void 
*data, unsigned flags)
return foreach_alt_odb(loose_from_alt_odb, );
 }
 
-static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn 
cb, void *data)
-{
-   uint32_t i;
-   int r = 0;
-
-   for (i = 0; i < p->num_objects; i++) {
-   struct object_id oid;
-
-   if (!nth_packed_object_oid(, p, i))
-   return error("unable to get sha1 of object %u in %s",
-i, p->pack_name);
-
-   r = cb(, p, i, data);
-   if (r)
-   break;
-   }
-   return r;
-}
-
-int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned 
flags)
-{
-   struct packed_git *p;
-   int r = 0;
-   int pack_errors = 0;
-
-

[PATCH v3 02/23] pack: move static state variables

2017-08-18 Thread Jonathan Tan

sha1_file.c declares some static variables that store packfile-related
state. Move them to packfile.c.

They are temporarily made global, but subsequent commits will restore
their scope back to static.

Signed-off-by: Jonathan Tan 
---
 packfile.c  | 14 ++
 packfile.h  |  9 +
 sha1_file.c | 13 -
 3 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/packfile.c b/packfile.c
index 0d191dfd6..0f46e0617 100644
--- a/packfile.c
+++ b/packfile.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "mru.h"
 
 char *odb_pack_name(struct strbuf *buf,
const unsigned char *sha1,
@@ -21,3 +22,16 @@ char *sha1_pack_index_name(const unsigned char *sha1)
static struct strbuf buf = STRBUF_INIT;
return odb_pack_name(, sha1, "idx");
 }
+
+unsigned int pack_used_ctr;
+unsigned int pack_mmap_calls;
+unsigned int peak_pack_open_windows;
+unsigned int pack_open_windows;
+unsigned int pack_open_fds;
+unsigned int pack_max_fds;
+size_t peak_pack_mapped;
+size_t pack_mapped;
+struct packed_git *packed_git;
+
+static struct mru packed_git_mru_storage;
+struct mru *packed_git_mru = _git_mru_storage;
diff --git a/packfile.h b/packfile.h
index 3c4a0dbd7..a76bb7cec 100644
--- a/packfile.h
+++ b/packfile.h
@@ -24,4 +24,13 @@ extern char *sha1_pack_name(const unsigned char *sha1);
  */
 extern char *sha1_pack_index_name(const unsigned char *sha1);
 
+extern unsigned int pack_used_ctr;
+extern unsigned int pack_mmap_calls;
+extern unsigned int peak_pack_open_windows;
+extern unsigned int pack_open_windows;
+extern unsigned int pack_open_fds;
+extern unsigned int pack_max_fds;
+extern size_t peak_pack_mapped;
+extern size_t pack_mapped;
+
 #endif
diff --git a/sha1_file.c b/sha1_file.c
index 6e7a20b52..2b5ce9959 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -683,19 +683,6 @@ static int has_loose_object(const unsigned char *sha1)
return check_and_freshen(sha1, 0);
 }
 
-static unsigned int pack_used_ctr;
-static unsigned int pack_mmap_calls;
-static unsigned int peak_pack_open_windows;
-static unsigned int pack_open_windows;
-static unsigned int pack_open_fds;
-static unsigned int pack_max_fds;
-static size_t peak_pack_mapped;
-static size_t pack_mapped;
-struct packed_git *packed_git;
-
-static struct mru packed_git_mru_storage;
-struct mru *packed_git_mru = _git_mru_storage;
-
 void pack_report(void)
 {
fprintf(stderr,
-- 
2.14.1.480.gb18f417b89-goog

[PATCH v3 06/23] pack: move pack-closing functions

2017-08-18 Thread Jonathan Tan

The function close_pack_fd() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 builtin/am.c   |  1 +
 builtin/clone.c|  1 +
 builtin/fetch.c|  1 +
 builtin/merge.c|  1 +
 builtin/receive-pack.c |  1 +
 cache.h|  8 
 packfile.c | 54 +
 packfile.h | 11 ++
 sha1_file.c| 55 --
 9 files changed, 70 insertions(+), 63 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c973bd96d..b9d9ff203 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -31,6 +31,7 @@
 #include "mailinfo.h"
 #include "apply.h"
 #include "string-list.h"
+#include "packfile.h"
 
 /**
  * Returns 1 if the file is empty or does not exist, 0 otherwise.
diff --git a/builtin/clone.c b/builtin/clone.c
index 08b5cc433..13abb075a 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -25,6 +25,7 @@
 #include "remote.h"
 #include "run-command.h"
 #include "connected.h"
+#include "packfile.h"
 
 /*
  * Overall FIXMEs:
diff --git a/builtin/fetch.c b/builtin/fetch.c
index c87e59f3b..c86c36f37 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -17,6 +17,7 @@
 #include "connected.h"
 #include "argv-array.h"
 #include "utf8.h"
+#include "packfile.h"
 
 static const char * const builtin_fetch_usage[] = {
N_("git fetch [] [ [...]]"),
diff --git a/builtin/merge.c b/builtin/merge.c
index 900bafdb4..45e673dcc 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -32,6 +32,7 @@
 #include "gpg-interface.h"
 #include "sequencer.h"
 #include "string-list.h"
+#include "packfile.h"
 
 #define DEFAULT_TWOHEAD (1<<0)
 #define DEFAULT_OCTOPUS (1<<1)
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index cabdc55e0..0019a484f 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -23,6 +23,7 @@
 #include "fsck.h"
 #include "tmp-objdir.h"
 #include "oidset.h"
+#include "packfile.h"
 
 static const char * const receive_pack_usage[] = {
N_("git receive-pack "),
diff --git a/cache.h b/cache.h
index f271033db..bbc56566e 100644
--- a/cache.h
+++ b/cache.h
@@ -1645,15 +1645,7 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-/*
- * munmap the index file for the specified packfile (if it is
- * currently mmapped).
- */
-extern void close_pack_index(struct packed_git *);
-
 extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, size_t *);
-extern void close_pack_windows(struct packed_git *);
-extern void close_all_packs(void);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/packfile.c b/packfile.c
index 8daa74ad1..c8e2dbdee 100644
--- a/packfile.c
+++ b/packfile.c
@@ -257,3 +257,57 @@ void release_pack_memory(size_t need)
while (need >= (cur - pack_mapped) && unuse_one_window(NULL))
; /* nothing */
 }
+
+void close_pack_windows(struct packed_git *p)
+{
+   while (p->windows) {
+   struct pack_window *w = p->windows;
+
+   if (w->inuse_cnt)
+   die("pack '%s' still has open windows to it",
+   p->pack_name);
+   munmap(w->base, w->len);
+   pack_mapped -= w->len;
+   pack_open_windows--;
+   p->windows = w->next;
+   free(w);
+   }
+}
+
+int close_pack_fd(struct packed_git *p)
+{
+   if (p->pack_fd < 0)
+   return 0;
+
+   close(p->pack_fd);
+   pack_open_fds--;
+   p->pack_fd = -1;
+
+   return 1;
+}
+
+void close_pack_index(struct packed_git *p)
+{
+   if (p->index_data) {
+   munmap((void *)p->index_data, p->index_size);
+   p->index_data = NULL;
+   }
+}
+
+static void close_pack(struct packed_git *p)
+{
+   close_pack_windows(p);
+   close_pack_fd(p);
+   close_pack_index(p);
+}
+
+void close_all_packs(void)
+{
+   struct packed_git *p;
+
+   for (p = packed_git; p; p = p->next)
+   if (p->do_not_close)
+   die("BUG: want to close pack marked 'do-not-close'");
+   else
+   close_pack(p);
+}
diff --git a/packfile.h b/packfile.h
index f6fe1c741..c6a07de62 100644
--- a/packfile.h
+++ b/packfile.h
@@ -43,6 +43,17 @@ extern void pack_report(void);
  */
 extern int open_pack_index(struct packed_git *);
 
+/*
+ * munmap the index file for the specified packfile (if it is
+ * currently mmapped).
+ */
+extern void close_pack_index(struct packed_git *);
+
+extern void close_pack_windows(struct packed_git *);
+extern void close_all_packs(void);
+
+extern int close_pack_fd(struct packed_git *);
+

[PATCH v3 07/23] pack: move use_pack()

2017-08-18 Thread Jonathan Tan

The function open_packed_git() needs to be temporarily made global. Its
scope will be restored to static in a subsequent commit.

Signed-off-by: Jonathan Tan 
---
 cache.h |   1 -
 packfile.c  | 303 ++--
 packfile.h  |  14 +--
 sha1_file.c | 285 
 streaming.c |   1 +
 5 files changed, 298 insertions(+), 306 deletions(-)

diff --git a/cache.h b/cache.h
index bbc56566e..a27018210 100644
--- a/cache.h
+++ b/cache.h
@@ -1645,7 +1645,6 @@ extern int odb_mkstemp(struct strbuf *template, const 
char *pattern);
  */
 extern int odb_pack_keep(const char *name);
 
-extern unsigned char *use_pack(struct packed_git *, struct pack_window **, 
off_t, size_t *);
 extern void unuse_pack(struct pack_window **);
 extern void clear_delta_base_cache(void);
 extern struct packed_git *add_packed_git(const char *path, size_t path_len, 
int local);
diff --git a/packfile.c b/packfile.c
index c8e2dbdee..ea451d27e 100644
--- a/packfile.c
+++ b/packfile.c
@@ -24,14 +24,14 @@ char *sha1_pack_index_name(const unsigned char *sha1)
return odb_pack_name(, sha1, "idx");
 }
 
-unsigned int pack_used_ctr;
-unsigned int pack_mmap_calls;
-unsigned int peak_pack_open_windows;
-unsigned int pack_open_windows;
+static unsigned int pack_used_ctr;
+static unsigned int pack_mmap_calls;
+static unsigned int peak_pack_open_windows;
+static unsigned int pack_open_windows;
 unsigned int pack_open_fds;
-unsigned int pack_max_fds;
-size_t peak_pack_mapped;
-size_t pack_mapped;
+static unsigned int pack_max_fds;
+static size_t peak_pack_mapped;
+static size_t pack_mapped;
 struct packed_git *packed_git;
 
 static struct mru packed_git_mru_storage;
@@ -228,7 +228,7 @@ static void scan_windows(struct packed_git *p,
}
 }
 
-int unuse_one_window(struct packed_git *current)
+static int unuse_one_window(struct packed_git *current)
 {
struct packed_git *p, *lru_p = NULL;
struct pack_window *lru_w = NULL, *lru_l = NULL;
@@ -274,7 +274,7 @@ void close_pack_windows(struct packed_git *p)
}
 }
 
-int close_pack_fd(struct packed_git *p)
+static int close_pack_fd(struct packed_git *p)
 {
if (p->pack_fd < 0)
return 0;
@@ -311,3 +311,288 @@ void close_all_packs(void)
else
close_pack(p);
 }
+
+/*
+ * The LRU pack is the one with the oldest MRU window, preferring packs
+ * with no used windows, or the oldest mtime if it has no windows allocated.
+ */
+static void find_lru_pack(struct packed_git *p, struct packed_git **lru_p, 
struct pack_window **mru_w, int *accept_windows_inuse)
+{
+   struct pack_window *w, *this_mru_w;
+   int has_windows_inuse = 0;
+
+   /*
+* Reject this pack if it has windows and the previously selected
+* one does not.  If this pack does not have windows, reject
+* it if the pack file is newer than the previously selected one.
+*/
+   if (*lru_p && !*mru_w && (p->windows || p->mtime > (*lru_p)->mtime))
+   return;
+
+   for (w = this_mru_w = p->windows; w; w = w->next) {
+   /*
+* Reject this pack if any of its windows are in use,
+* but the previously selected pack did not have any
+* inuse windows.  Otherwise, record that this pack
+* has windows in use.
+*/
+   if (w->inuse_cnt) {
+   if (*accept_windows_inuse)
+   has_windows_inuse = 1;
+   else
+   return;
+   }
+
+   if (w->last_used > this_mru_w->last_used)
+   this_mru_w = w;
+
+   /*
+* Reject this pack if it has windows that have been
+* used more recently than the previously selected pack.
+* If the previously selected pack had windows inuse and
+* we have not encountered a window in this pack that is
+* inuse, skip this check since we prefer a pack with no
+* inuse windows to one that has inuse windows.
+*/
+   if (*mru_w && *accept_windows_inuse == has_windows_inuse &&
+   this_mru_w->last_used > (*mru_w)->last_used)
+   return;
+   }
+
+   /*
+* Select this pack.
+*/
+   *mru_w = this_mru_w;
+   *lru_p = p;
+   *accept_windows_inuse = has_windows_inuse;
+}
+
+static int close_one_pack(void)
+{
+   struct packed_git *p, *lru_p = NULL;
+   struct pack_window *mru_w = NULL;
+   int accept_windows_inuse = 1;
+
+   for (p = packed_git; p; p = p->next) {
+   if (p->pack_fd == -1)
+   continue;
+   find_lru_pack(p, _p, _w, _windows_inuse);
+   }
+
+   if (lru_p)
+

[PATCH v3 01/23] pack: move pack name-related functions

2017-08-18 Thread Jonathan Tan

Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan 
---
 Makefile |  1 +
 builtin/index-pack.c |  1 +
 builtin/pack-redundant.c |  1 +
 cache.h  | 23 ---
 fast-import.c|  1 +
 http.c   |  1 +
 outgoing/packfile.h  |  0
 packfile.c   | 23 +++
 packfile.h   | 27 +++
 sha1_file.c  | 23 +--
 10 files changed, 56 insertions(+), 45 deletions(-)
 create mode 100644 outgoing/packfile.h
 create mode 100644 packfile.c
 create mode 100644 packfile.h

diff --git a/Makefile b/Makefile
index 461c845d3..5cdecaa17 100644
--- a/Makefile
+++ b/Makefile
@@ -816,6 +816,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += oidset.o
+LIB_OBJS += packfile.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
 LIB_OBJS += pack-check.o
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 26828c1d8..f2be145e1 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -12,6 +12,7 @@
 #include "exec_cmd.h"
 #include "streaming.h"
 #include "thread-utils.h"
+#include "packfile.h"
 
 static const char index_pack_usage[] =
 "git index-pack [-v] [-o ] [--keep | --keep=] [--verify] 
[--strict] ( | --stdin [--fix-thin] [])";
diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c
index cb1df1c76..aaa813632 100644
--- a/builtin/pack-redundant.c
+++ b/builtin/pack-redundant.c
@@ -7,6 +7,7 @@
 */
 
 #include "builtin.h"
+#include "packfile.h"
 
 #define BLKSIZE 512
 
diff --git a/cache.h b/cache.h
index fcba87a69..aa2b4d390 100644
--- a/cache.h
+++ b/cache.h
@@ -902,20 +902,6 @@ extern void check_repository_format(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
-/*
- * Return the name of the (local) packfile with the specified sha1 in
- * its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_name(const unsigned char *sha1);
-
-/*
- * Return the name of the (local) pack index file with the specified
- * sha1 in its name.  The return value is a pointer to memory that is
- * overwritten each time this function is called.
- */
-extern char *sha1_pack_index_name(const unsigned char *sha1);
-
 /*
  * Return an abbreviated sha1 unique within this repository's object database.
  * The result will be at least `len` characters long, and will be NUL
@@ -1656,15 +1642,6 @@ extern void pack_report(void);
  */
 extern int odb_mkstemp(struct strbuf *template, const char *pattern);
 
-/*
- * Generate the filename to be used for a pack file with checksum "sha1" and
- * extension "ext". The result is written into the strbuf "buf", overwriting
- * any existing contents. A pointer to buf->buf is returned as a convenience.
- *
- * Example: odb_pack_name(out, sha1, "idx") => 
".git/objects/pack/pack-1234..idx"
- */
-extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, 
const char *ext);
-
 /*
  * Create a pack .keep file named "name" (which should generally be the output
  * of odb_pack_name). Returns a file descriptor opened for writing, or -1 on
diff --git a/fast-import.c b/fast-import.c
index a959161b4..49516d60e 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -167,6 +167,7 @@ Format of STDIN stream:
 #include "quote.h"
 #include "dir.h"
 #include "run-command.h"
+#include "packfile.h"
 
 #define PACK_ID_BITS 16
 #define MAX_PACK_ID ((1<= 0x070a08
diff --git a/outgoing/packfile.h b/outgoing/packfile.h
new file mode 100644
index 0..e69de29bb
diff --git a/packfile.c b/packfile.c
new file mode 100644
index 0..0d191dfd6
--- /dev/null
+++ b/packfile.c
@@ -0,0 +1,23 @@
+#include "cache.h"
+
+char *odb_pack_name(struct strbuf *buf,
+   const unsigned char *sha1,
+   const char *ext)
+{
+   strbuf_reset(buf);
+   strbuf_addf(buf, "%s/pack/pack-%s.%s", get_object_directory(),
+   sha1_to_hex(sha1), ext);
+   return buf->buf;
+}
+
+char *sha1_pack_name(const unsigned char *sha1)
+{
+   static struct strbuf buf = STRBUF_INIT;
+   return odb_pack_name(, sha1, "pack");
+}
+

[PATCH v3 00/23] Move exported packfile funcs to its own file

2017-08-18 Thread Jonathan Tan

> You'd need to double check, but I think the topics that cause
> trouble are rs/find-apck-entry-bisection and jk/drop-sha1-entry-pos;
> you can start from v2.14.1 and merge these topics on top and then
> build your change on top.  That would allow you to start cooking
> before both of them graduate to 'master', as I expect they are both
> quick-to-next material.  There might be other topics that interfere
> with what you are doing, but you can easily find out what they are
> if you do a trial merge to 'next' and 'pu' yourself.

OK - in addition to the 2 you mentioned, I have found some others
(likely added after you wrote that). The complete list is:
 - rs/find-pack-entry-bisection
 - jk/drop-sha1-entry-pos
 - jt/sha1-file-cleanup (formerly part of this set)
 - mk/use-size-t-in-zlib
 - rs/unpack-entry-leakfix

I have merged all of these and rebased my patches on top.

Other changes:
 - Used packfile.h instead of pack.h (following most people's
   preference)
 - Ensured that I added functions to packfile.h retaining the order they
   were originally in, so that if you run "git diff  --color-moved
   --patience", there are much fewer zebra stripes

The merge base commit can be accessed online [1], if you need it.

[1] https://github.com/jonathantanmy/git/commits/packmigrate

Jonathan Tan (23):
  pack: move pack name-related functions
  pack: move static state variables
  pack: move pack_report()
  pack: move open_pack_index(), parse_pack_index()
  pack: move release_pack_memory()
  pack: move pack-closing functions
  pack: move use_pack()
  pack: move unuse_pack()
  pack: move add_packed_git()
  pack: move install_packed_git()
  pack: move {,re}prepare_packed_git and approximate_object_count
  pack: move unpack_object_header_buffer()
  pack: move get_size_from_delta()
  pack: move unpack_object_header()
  pack: move clear_delta_base_cache(), packed_object_info(),
unpack_entry()
  pack: move nth_packed_object_{sha1,oid}
  pack: move check_pack_index_ptr(), nth_packed_object_offset()
  pack: move find_pack_entry_one(), is_pack_valid()
  pack: move find_sha1_pack()
  pack: move find_pack_entry() and make it global
  pack: move has_sha1_pack()
  pack: move has_pack_index()
  pack: move for_each_packed_object()

 Makefile |1 +
 builtin/am.c |1 +
 builtin/cat-file.c   |1 +
 builtin/clone.c  |1 +
 builtin/count-objects.c  |1 +
 builtin/fetch.c  |1 +
 builtin/fsck.c   |1 +
 builtin/gc.c |1 +
 builtin/index-pack.c |1 +
 builtin/merge.c  |1 +
 builtin/pack-objects.c   |1 +
 builtin/pack-redundant.c |1 +
 builtin/prune-packed.c   |1 +
 builtin/receive-pack.c   |1 +
 bulk-checkin.c   |1 +
 cache.h  |  122 +--
 connected.c  |1 +
 diff.c   |1 +
 fast-import.c|1 +
 fetch-pack.c |1 +
 git-compat-util.h|2 -
 http-backend.c   |1 +
 http-push.c  |1 +
 http-walker.c|1 +
 http.c   |1 +
 outgoing/packfile.h  |0
 pack-bitmap.c|1 +
 pack-check.c |1 +
 packfile.c   | 1896 +++
 packfile.h   |  138 +++
 path.c   |1 +
 reachable.c  |1 +
 revision.c   |1 +
 server-info.c|1 +
 sha1_file.c  | 2452 ++
 sha1_name.c  |1 +
 streaming.c  |1 +
 37 files changed, 2354 insertions(+), 2287 deletions(-)
 create mode 100644 outgoing/packfile.h
 create mode 100644 packfile.c
 create mode 100644 packfile.h

-- 
2.14.1.480.gb18f417b89-goog

Re: Revision resolution for remote-helpers?

2017-08-18 Thread Mike Hommey

On Fri, Aug 18, 2017 at 03:06:37PM -0700, Jonathan Nieder wrote:
> Hi,
> 
> Mike Hommey wrote:
> 
> > My thought is that a string like :: could be used
> > wherever a committish is expected. That would call some helper
> > and request to resolve revision, and the helper would provide a git
> > commit as a response.
> 
> I like this idea.
> 
> > The reason for the :: prefix is that it matches the ::
> > prefix used for remote helpers.
> >
> > Now, there are a few caveats:
> > - , for e.g. svn, pretty much would depend on the remote.
> >   OTOH, in mercurial, it doesn't. I think it would be fair for the
> >   helper to have to deal with making what appears after :: relevant
> >   to find the right revision, by possibly including a remote name.
> > - msys likes to completely fuck up command lines when they contain ::.
> >   For remote helpers, the alternative that works is
> >   :///etc.
> 
> Hm --- is there a bug already open about this (e.g. in the Git for
> Windows project or in msys) where I can read more?

It's entirely an msys problem. Msys has weird rules to translate between
unix paths and windows paths on the command line, and botches everything
as a consequence. That's by "design".

http://www.mingw.org/wiki/Posix_path_conversion

(Particularly, see the last two entries)

That only happens when calling native Windows programs from a msys
shell.

> > Which leads me to think some "virtual" ref namespace could be a solution
> > to the problem. So instead of ::, the prefix would be /.
> > For e.g. svn, svn/$remote/$rev would be a natural way to specify the
> > revision for a given remote. For mercurial, hg/$sha1.
> 
> I see.  My naive assumption would be that a string like r12345 would be
> the most natural way for a user to want to specify a Subversion
> revision, but you're right that those only have meaning scoped to a
> particular server.  That makes the svn/ prefix more tolerable.
> 
> > Potentially, this could be a sort of pluggable ref stores, which could
> > be used for extensions such as the currently discussed reftable.
> >
> > On the opposite end of the problem, I'm also thinking about git log
> > --decorate= displaying the mercurial revisions where branch
> > decorations would normally go.
> >
> > I have no patch to show for it. Those are ideas that I first want to
> > discuss before implementing anything.
> 
> One additional thought: unlike refs, these are not necessarily
> enumerable (and I wouldn't expect "git show-ref" to show them) and
> they do not affect "git prune"'s reachability computation.
> 
> So internally I don't think refs is the right concept to map these to.
> I think the right layer is revision syntax handling (revision.c).

Makes sense.

Mike

Re: Revision resolution for remote-helpers?

2017-08-18 Thread Jonathan Nieder

Hi,

Mike Hommey wrote:

> My thought is that a string like :: could be used
> wherever a committish is expected. That would call some helper
> and request to resolve revision, and the helper would provide a git
> commit as a response.

I like this idea.

> The reason for the :: prefix is that it matches the ::
> prefix used for remote helpers.
>
> Now, there are a few caveats:
> - , for e.g. svn, pretty much would depend on the remote.
>   OTOH, in mercurial, it doesn't. I think it would be fair for the
>   helper to have to deal with making what appears after :: relevant
>   to find the right revision, by possibly including a remote name.
> - msys likes to completely fuck up command lines when they contain ::.
>   For remote helpers, the alternative that works is
>   :///etc.

Hm --- is there a bug already open about this (e.g. in the Git for
Windows project or in msys) where I can read more?

> Which leads me to think some "virtual" ref namespace could be a solution
> to the problem. So instead of ::, the prefix would be /.
> For e.g. svn, svn/$remote/$rev would be a natural way to specify the
> revision for a given remote. For mercurial, hg/$sha1.

I see.  My naive assumption would be that a string like r12345 would be
the most natural way for a user to want to specify a Subversion
revision, but you're right that those only have meaning scoped to a
particular server.  That makes the svn/ prefix more tolerable.

> Potentially, this could be a sort of pluggable ref stores, which could
> be used for extensions such as the currently discussed reftable.
>
> On the opposite end of the problem, I'm also thinking about git log
> --decorate= displaying the mercurial revisions where branch
> decorations would normally go.
>
> I have no patch to show for it. Those are ideas that I first want to
> discuss before implementing anything.

One additional thought: unlike refs, these are not necessarily
enumerable (and I wouldn't expect "git show-ref" to show them) and
they do not affect "git prune"'s reachability computation.

So internally I don't think refs is the right concept to map these to.
I think the right layer is revision syntax handling (revision.c).

Thanks,
Jonathan

Re: [PATCH] pull: respect submodule update configuration

2017-08-18 Thread Stefan Beller

On Fri, Aug 18, 2017 at 3:04 PM, Stefan Beller  wrote:
> From: Lars Schneider 

eh, that is what I get for amending to Lars patch.

[PATCH] pull: respect submodule update configuration

2017-08-18 Thread Stefan Beller

From: Lars Schneider 

Do not override the submodule configuration in the call to update
the submodules, but give a weaker default.

Reported-by: Lars Schneider 
Signed-off-by: Stefan Beller 
---
  
Personally I dislike this patch, but I have no better idea for the time
being.

Thanks,
Stefan 

 builtin/pull.c |  6 --
 git-submodule.sh   |  7 ++-
 t/t7400-submodule-basic.sh | 22 ++
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/builtin/pull.c b/builtin/pull.c
index 9b86e519b1..be4f74d764 100644
--- a/builtin/pull.c
+++ b/builtin/pull.c
@@ -553,7 +553,8 @@ static int rebase_submodules(void)
cp.git_cmd = 1;
cp.no_stdin = 1;
argv_array_pushl(, "submodule", "update",
-  "--recursive", "--rebase", NULL);
+  "--recursive", "--default-update",
+  "rebase", NULL);
 
return run_command();
 }
@@ -565,7 +566,8 @@ static int update_submodules(void)
cp.git_cmd = 1;
cp.no_stdin = 1;
argv_array_pushl(, "submodule", "update",
-  "--recursive", "--checkout", NULL);
+  "--recursive", "--default-update",
+  "checkout", NULL);
 
return run_command();
 }
diff --git a/git-submodule.sh b/git-submodule.sh
index e131760eec..6dbc32e686 100755
--- a/git-submodule.sh
+++ b/git-submodule.sh
@@ -511,6 +511,7 @@ fetch_in_submodule () (
 cmd_update()
 {
# parse $args after "submodule ... update".
+   default_update="checkout"
while test $# -ne 0
do
case "$1" in
@@ -552,6 +553,10 @@ cmd_update()
--checkout)
update="checkout"
;;
+   --default-update)
+   default_update="$2"
+   shift
+   ;;
--recommend-shallow)
recommend_shallow="--recommend-shallow"
;;
@@ -619,7 +624,7 @@ cmd_update()
update_module=$(git config submodule."$name".update)
if test -z "$update_module"
then
-   update_module="checkout"
+   update_module="$default_update"
fi
fi
 
diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
index dcac364c5f..ff64bf8528 100755
--- a/t/t7400-submodule-basic.sh
+++ b/t/t7400-submodule-basic.sh
@@ -1289,4 +1289,26 @@ test_expect_success 'init properly sets the config' '
test_must_fail git -C multisuper_clone config --get 
submodule.sub1.active
 '
 
+test_expect_success 'submodule update and git pull with disabled submodule' '
+   test_when_finished "rm -rf multisuper_clone" &&
+   pwd=$(pwd) &&
+   cat <<-\EOF >expect &&
+   -sub0
+sub1 (test2)
+sub2 (test2)
+sub3 (test2)
+sub4 (test2)
+sub5 (test2)
+   EOF
+   git clone file://"$pwd"/multisuper multisuper_clone &&
+   (
+   cd multisuper_clone &&
+   git config --local submodule.sub0.update none &&
+   git submodule update --init --recursive &&
+   git pull --recurse-submodules &&
+   git submodule status | cut -c 1,43- >actual
+   ) &&
+   test_cmp expect multisuper_clone/actual
+'
+
 test_done
-- 
2.14.0.rc0.3.g6c2e499285

Re: Revision resolution for remote-helpers?

2017-08-18 Thread Mike Hommey

On Fri, Aug 18, 2017 at 08:15:09AM -0400, Jeff King wrote:
> On Fri, Aug 18, 2017 at 03:42:08PM +0900, Mike Hommey wrote:
> 
> > I was thinking it could be useful to have a special syntax for revisions
> > that would query a helper program. The helper program could use a
> > similar protocol to that of the remote helpers.
> 
> That sounds like a reasonable thing to want.
> 
> > My thought is that a string like :: could be used
> > wherever a committish is expected. That would call some helper
> > and request to resolve revision, and the helper would provide a git
> > commit as a response.
> 
> So I'm guessing this would look something like:
> 
>   git log svn::12345
> 
> I think even without Git support, you could do something like:
> 
>   git log $(git svn map 12345)

That's what I do, but subshells and all is extra cumbersome.

> which is similarly complex in terms of concepts, and not too many more
> characters. That would be a little more awkward outside of a shell,
> though.
> 
> But it did get me wondering if we could do _better_ with something built
> into Git. For example, could we have an external resolution helper that
> resolves names to object ids as a fallback after internal resolution has
> failed. And then you could do:
> 
>  git log 12345
> 
> and it would just work. Efficiency shouldn't be a big problem, because
> we'd hit the helper only in the error case.
> 
> I'd be more concerned about awkward ambiguities, though. If mercurial is
> also using sha1s, then there's nothing syntactic to differentiate the
> two. For that matter, 12345 has the same problem, since it could be a
> partial sha1.

For something as short, the likelihood of hitting an actual existing
abbreviated sha1 is quite high, too.

> It might work to actually check if we have the object and then bail
> to the remote resolver only if we don't. But that's actually conflating
> name resolution with object lookup, which our internals typically keep
> separate.
> 
> So maybe this is a bad direction to go in. I'm mostly just thinking out
> loud here.
> 
> > Which leads me to think some "virtual" ref namespace could be a solution
> > to the problem. So instead of ::, the prefix would be /.
> > For e.g. svn, svn/$remote/$rev would be a natural way to specify the
> > revision for a given remote. For mercurial, hg/$sha1.
> 
> Interesting. I do like the general idea of having external helpers to
> fill in bits of the virtual namespace. But it may also open many cans of
> worms. :)
> 
> > Potentially, this could be a sort of pluggable ref stores, which could
> > be used for extensions such as the currently discussed reftable.
> 
> The current pluggable code is definitely geared for build-time
> pluggability, not runtime. But I think you could have a builtin
> pluggable store that does the overlay, and then chains to another
> backend. I.e., configure something like:
> 
>   [extensions]
>   refBackend = externalOverlay
> 
>   [externalOverlay "svn"]
>   namespace = refs/svn
>   command = my-svn-mapper
> 
>   [externalOverlay]
>   chain = reftable
> 
> That would allow the externalOverlay thing to develop independent of the
> core of Git's refs code.

That's a lot of configuration, but it's definitely an interesting
proposition.

> > On the opposite end of the problem, I'm also thinking about git log
> > --decorate= displaying the mercurial revisions where branch
> > decorations would normally go.
> 
> Interesting thought. I'm not sure if that would be a good thing or a bad
> thing. But one of the virtual methods for pluggable backends is
> "enumerate all refs". If you're mapping every mercurial revision, that's
> going to be a lot of refs (and potentially a lot of overhead for certain
> operations).
> 
> I think the decorate code just looks at a few parts of the refs
> namespace right now (so a "refs/svn" would probably get ignored by
> default).

I think decorate would need its own special entry to the "ref query" API
to answer the question "what ref points to " instead of scanning
the whole namespace.

Mike

Re: [bug] Git submodule command interprets switch as argument and switch

2017-08-18 Thread R0b0t1

On Fri, Aug 18, 2017 at 3:38 PM, Jonathan Nieder  wrote:
> Hi,
>
> R0b0t1 wrote:
>
>> The issue is as follows:
>>
>> R0b0t1@host:~/devel/project$ git submodule add
>> https://github.com/user/project -f
>> Cloning into '/home/R0b0t1/devel/project/-f'...
>
> Thanks for reporting.  Confusingly, I think this is intended behavior.
> "git help submodule" explains:
>
> add [-b ] [-f|--force] [--name ]
> [--reference ] [--depth ] [--]
>  []
>
> Add the given repository as a submodule at the given
> path [etc]
>
> Since the -f comes after , it is a .
>

Not to comment on every response to this bug, but I understand. What
is confusing is that the command was failing without being forced, and
without thinking I added "-f" at the end. The command succeeded as if
I supplied the force flag, but per my experience with other tools, and
with other Git commands, "-f" should not be a flag. It should be a
path, as you say. However it appears to be both.

To reproduce my situation exactly, add "*" to your .gitignore. The
directory "gerrit" should then be ignored, and Git will warn you that
the submodule will not be tracked (I may have another issue to report
related to this, but I'm still trying to figure out what is going on).
However, if you name the directory something else, like "-f", it will
still be matched by the .gitignore rule and should not succeed. This
is why I think the path is also being interpreted as a flag.

Something else may be happening, but either way the behavior does not
seem to be expected nor consistent with other parts of Git.

> That said, there are a few related things wrong here.
>
> The usage string above says I can put "--" before the  to
> make things extra unambiguous.  But when I try that, I get the following
> result:
>
> $ git submodule add -- https://gerrit.googlesource.com/gerrit -f
> Cloning into '/tmp/t/test/-f'...
> [...]
> Resolving deltas: 100% (215796/215796), done.
> /usr/lib/git-core/git-submodule: line 261: cd: -f: invalid option
> cd: usage: cd [-L|[-P [-e]] [-@]] [dir]
> Unable to checkout submodule '-f'
>
> If I try to put the "--" between  and , I get another
> confusing result:
>
> $ git submodule add https://gerrit.googlesource.com/gerrit -- -f
> '--' already exists in the index
>
> "git help cli" is supposed to give advice about this kind of thing as
> well --- e.g., it gives some sound advice about what form of flags
> scripts should use (e.g., to always use the 'stuck' form --name=
> instead of --name name).  But it doesn't mention this issue of flags
> belonging before other arguments.
>
> Thoughts?
>

I have experienced issues with -- before. It has been long enough I
have forgotten. Sorry for not being able to provide anything concrete.

What's cooking in git.git (Aug 2017, #04; Fri, 18)

2017-08-18 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

The top part of the draft release notes for the next cycle, which I
tentatively called Git 2.15, declares that "git add ''" will still
be supported up to this release but will become illegal after that.
Given that a typical release cycle is 8-12 weeks, that will happen
sometime early next year.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* cc/subprocess-handshake-missing-capabilities (2017-08-16) 1 commit
 - sub-process: print the cmd when a capability is unsupported

 When handshake with a subprocess filter notices that the process
 asked for an unknown capability, Git did not report what program
 the offending subprocess was running.  This has been corrected.

 Will merge to 'next'.

 We may want a follow-up fix to tighten the error checking, though.


* tb/apply-with-crlf (2017-08-17) 3 commits
 - SQUASH???
 - apply: file commited with CRLF should roundtrip diff and apply
 - convert: add SAFE_CRLF_KEEP_CRLF
 (this branch is tangled with jc/apply-with-crlf.)

 "git apply" that is used as a better "patch -p1" failed to apply a
 taken from a file with CRLF line endings to a file with CRLF line
 endings.  The root cause was because it misused convert_to_git()
 that tried to do "safe-crlf" processing by looking at the index
 entry at the same path, which is a nonsense---in that mode, "apply"
 is not working on the data in (or derived from) the index at all.
 This has been fixed.

 Will merge to 'next' after squashing the fix in.


* rs/t1002-do-not-use-sum (2017-08-15) 1 commit
 - t1002: stop using sum(1)

 Test simplification.

 Will merge to 'next'.


* sb/sha1-file-cleanup (2017-08-15) 1 commit
 - sha1_file: make read_info_alternates static

 Code clean-up.

 Will merge to 'next'.


* as/grep-quiet-no-match-exit-code-fix (2017-08-17) 1 commit
 - git-grep: correct exit code with --quiet and -L

 "git grep -L" and "git grep --quiet -L" reported different exit
 codes; this has been corrected.

 Will merge to 'next'.


* hv/t5526-andand-chain-fix (2017-08-17) 1 commit
 - t5526: fix some broken && chains

 Test fix.

 Will merge to 'next'.


* jc/diff-sane-truncate-no-more (2017-08-17) 1 commit
 - diff: retire sane_truncate_fn

 Code clean-up.

 Will merge to 'next'.


* ks/branch-set-upstream (2017-08-17) 3 commits
 - branch: quote branch/ref names to improve readability
 - builtin/branch: stop supporting the "--set-upstream" option
 - t3200: cleanup cruft of a test

 "branch --set-upstream" that has been deprecated in Git 1.8 has
 finally been retired.

 Will merge to 'next'.


* mg/format-ref-doc-fix (2017-08-18) 2 commits
 - Documentation/git-for-each-ref: clarify peeling of tags for --format
 - Documentation: use proper wording for ref format strings

 Doc fix.

 Will merge to 'next'.


* po/read-graft-line (2017-08-18) 4 commits
 - commit: rewrite read_graft_line
 - commit: allocate array using object_id size
 - commit: replace the raw buffer with strbuf in read_graft_line
 - sha1_file: fix definition of null_sha1

 Conversion from uchar[20] to struct object_id continues; this is to
 ensure that we do not assume sizeof(struct object_id) is the same
 as the length of SHA-1 hash (or length of longest hash we support).

 Will merge to 'next'.


* sb/submodule-parallel-update (2017-08-17) 1 commit
 - submodule.sh: remove unused variable

 Code clean-up.

 Will merge to 'next'.


* jc/apply-with-crlf (2017-08-16) 6 commits
 . apply: clarify read_old_data() is about no-index case
 . apply: localize the CRLF business to read_old_data()
 . apply: only pay attention to CRLF in the preimage
 . apply: remove unused field apply_state.flags
 . apply: file commited with CRLF should roundtrip diff and apply
 - convert: add SAFE_CRLF_KEEP_CRLF
 (this branch is tangled with tb/apply-with-crlf.)

 Will discard as it now is part of the tb/apply-with-crlf topic.

--
[Stalled]

* mg/status-in-progress-info (2017-05-10) 2 commits
 - status --short --inprogress: spell it as --in-progress
 - status: show in-progress info for short status

 "git status" learns an option to report various operations
 (e.g. "merging") that the user is in the middle of.

 cf. 


* nd/worktree-move (2017-04-20) 6 commits
 - worktree remove: new command
 - worktree move: refuse to move worktrees with submodules
 - worktree move: accept destination as directory
 - worktree move: new command
 - worktree.c: add update_worktree_location()
 - worktree.c: add validate_worktree()

 "git worktree" learned move and remove

Re: [bug] Git submodule command interprets switch as argument and switch

2017-08-18 Thread Jonathan Nieder

Hi,

R0b0t1 wrote:

> The issue is as follows:
>
> R0b0t1@host:~/devel/project$ git submodule add
> https://github.com/user/project -f
> Cloning into '/home/R0b0t1/devel/project/-f'...

Thanks for reporting.  Confusingly, I think this is intended behavior.
"git help submodule" explains:

add [-b ] [-f|--force] [--name ]
[--reference ] [--depth ] [--]
 []

Add the given repository as a submodule at the given
path [etc]

Since the -f comes after , it is a .

That said, there are a few related things wrong here.

The usage string above says I can put "--" before the  to
make things extra unambiguous.  But when I try that, I get the following
result:

$ git submodule add -- https://gerrit.googlesource.com/gerrit -f
Cloning into '/tmp/t/test/-f'...
[...]
Resolving deltas: 100% (215796/215796), done.
/usr/lib/git-core/git-submodule: line 261: cd: -f: invalid option
cd: usage: cd [-L|[-P [-e]] [-@]] [dir]
Unable to checkout submodule '-f'

If I try to put the "--" between  and , I get another
confusing result:

$ git submodule add https://gerrit.googlesource.com/gerrit -- -f
'--' already exists in the index

"git help cli" is supposed to give advice about this kind of thing as
well --- e.g., it gives some sound advice about what form of flags
scripts should use (e.g., to always use the 'stuck' form --name=
instead of --name name).  But it doesn't mention this issue of flags
belonging before other arguments.

Thoughts?

Thanks,
Jonathan

Re: [PATCH v4 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Junio C Hamano

Patryk Obara  writes:

> Actually, I don't think I needed to remove free(graft) line, but I don't
> know if freeing NULL is considered ok in git code. Let me know if I
> should bring it back, please.

Either free(graft) or assert(!graft) is fine, but we should have one
of them there.  I'll add assert(!graft) there while queuing, at
least for now.

In the current code, when the control reaches the bad_graft_data
label, 'graft' must be NULL, or there is a bug in our code.  Because
we are parsing exactly the same input using the same helper routines
in both passes, we should see failure during the first pass before
'graft' points to an allocated piece of memory.  So it may be a good
idea to have assert(!graft) there than free(graft); the latter would
sweep a potential bug under the carpet.

If this were a part of the system whose design is still fluid (it is
not), it is not implausible that we would later want to add new test
that jumps to the bad_graft_data label.  For example, after the
"runs exactly twice" loop, we may add a new test that iterates over
the graft->parents[] to ensure that there is no duplicate and jumps
to bad_graft_data when we find one.

If we add assert(!graft) there today, and if such an enhancement
forgets to replace it with free(graft), the assert() will catch the
mistake.  If we have neither, it makes it more likely that such an
enhancement leaves a possible memory leak in its error codepath.

Thanks.

Re:

2017-08-18 Thread Jessy

Hello,


I wish to seek for your assistance in a deal that will be of mutual benefit for 
the both of us from Camp Stanley in Uijeongbu. Please contact me for details, 
God bless you.

Re: [PATCH v4 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Patryk Obara

Ok, so that's an option - in this instance free is not actually needed
because it can be triggered only in phase 0, but it would add a bit of
robustness.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

Re: [bug] Git submodule command interprets switch as argument and switch

2017-08-18 Thread Stefan Beller

On Thu, Aug 17, 2017 at 10:16 PM, R0b0t1  wrote:
> The issue is as follows:
>
> R0b0t1@host:~/devel/project$ git submodule add
> https://github.com/user/project -f
> Cloning into '/home/R0b0t1/devel/project/-f'...
>
> My .gitignore's first line is *, and then I explicitly allow things.
> Despite the presence of "project/" in the .gitignore the submodule
> command says it is ignored.

That might indicate that another submodule command doesn't
cope with submodule names that look like a common flag.

> The "force" flag is interpreted as a flag
> and also as the destination directory.
>
> It is possible the argument parsing code for other commands exhibits this 
> error.

Yes, though these other commands are in C, not in shell.
Note that Prathamesh is currently porting the "git submodule"
command to C, which would allow us to fix this bug easily.

Also note that the -f is ambigious, what if the user meant
to have the submodule at path "-f" ? (This issue comes
up in many other commands, for example when a path
and a branch name is accepted, the path of a potentially
deleted file.

To solve this git accepts a double dash, which signals git
that anything after the double dash there are arguments not
to be interpreted as a command line flag.

>
> R0b0t1.

Re: [PATCH v4 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Junio C Hamano

Patryk Obara  writes:

> Actually, I don't think I needed to remove free(graft) line, but I don't
> know if freeing NULL is considered ok in git code. Let me know if I
> should bring it back, please.

Calling free(var) when var may or may not be NULL is perfectly fine.

We even discourage people from writing:

if (var)
free(var);

because an unconditional call to free(var) is sufficient.

Re: Submodule regression in 2.14?

2017-08-18 Thread Junio C Hamano

Stefan Beller  writes:

>> In the past "submodule..update=none" was an easy way
>> to selectively disable certain Submodules.
>>
>> How would I do this with Git 2.14?
>
> submodule..active = false
>
>> My gut feeling is that all commands should respect the
>> "submodule..update=none" setting.
>
> Well my gut feeling was that the "update" part of the name
> reponds to the subcommand, not the generic action.
>
> For example when you set update=none, git-status,
> recursive git-diff still reported the submodule.

Both status and diff are read-only operations, so this smells like a
bit bogus argument made by comparing apples and oranges.

I think Lars is more interested in operations that actually affects
the state of submodules by updating them---"submodule update" may be
a prime example as it goes down to run fetch, pull and/or checkout.
It may have been the only thing that affected the state of
submodules before the "--recurse-submodules" option was added to
commands that affect the state of the (super)project, but I would
think that it is not so wrong to expect that these state-affecting
operations running in the "recurse into submodules" mode to honor
"do not update this submodule" that used to be honored only by
"submodule update".

Re: Submodule regression in 2.14?

2017-08-18 Thread Stefan Beller

On Fri, Aug 18, 2017 at 9:50 AM, Junio C Hamano  wrote:
> I am not sure if I follow.  Submodules are not trees and one of the
> reasons people may want to separate things into different modules is
> so that they can treat them differently.  If submodules allow you
> a richer set of operations than a tree that is part of a monolithic
> project, is that necessarily a bad thing?

It is not a bad thing on its own, but we have to consider which
additional actions are useful.

Jonathan brought up the following very long term vision:
Eventually the everyday git commands do not treat submodules
any special than trees, even the submodules git directory
may be non existent (everything is absorbed into the superproject);
so it really feels like a monorepo.
When you want to work on a submodule individually, you have to
make a new working tree.

Re: [PATCH] add test for bug in git-mv with nested submodules

2017-08-18 Thread Stefan Beller

> I just copied the shortcut that they were adding themselfes as submodule
> in 'setup submodule'. The whole setup of submodules in this test is like
> this. This way we already had a nested submodule structure which I could
> just add.
>
> I agree that this is unrealistic so I can change that in the test I am
> adding. But from what I have seen, this shortcut is taken in quite some
> places when dealing with submodules.

Please do not make it worse.
Once upon a time (late '16 IIRC) I had a series floating on the
list removing all occurrences, but there were issues with the
series and it did not land.

Re: [PATCH v4 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Patryk Obara

Actually, I don't think I needed to remove free(graft) line, but I don't
know if freeing NULL is considered ok in git code. Let me know if I
should bring it back, please.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

[PATCH v4 0/4] Modernize read_graft_line implementation

2017-08-18 Thread Patryk Obara

Changes since v3:

- Commit replacing raw buffer does not store temporary pointer to
  strbuf internals any more.

- Commit message of patch 4 explains all alternative approaches
  considered so far.

- Patch 4 uses two-phases to parse graft line, without code repetition.

I have my reservations about patch 4 from readability standpoint
(it's not immediately clear why parsing code can skip freeing of graft
in phase 2), but this implementation seems to address every issue raised
in review so far. If you'll prefer me to go back to impementation
from v3, I have it prepared ;)


Patryk Obara (4):
  sha1_file: fix definition of null_sha1
  commit: replace the raw buffer with strbuf in read_graft_line
  commit: allocate array using object_id size
  commit: rewrite read_graft_line

 builtin/blame.c |  2 +-
 commit.c| 45 +
 commit.h|  2 +-
 sha1_file.c |  2 +-
 4 files changed, 28 insertions(+), 23 deletions(-)

-- 
2.9.5

[PATCH v4 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Patryk Obara

Old implementation determined number of hashes by dividing length of
line by length of hash, which works only if all hash representations
have same length.

New graft line parser works in two phases:

  1. In first phase line is scanned to verify correctness and compute
 number of hashes, then graft struct is allocated.

  2. In second phase line is scanned again to fill up already allocated
 graft struct.

This way graft parsing code can support different sizes of hashes
without any further code adaptations.

A number of alternative implementations were considered and discarded:

  - Modifying graft structure to store oid_array instead of FLEXI_ARRAY
indicates undesirable usage of struct to readers.

  - Parsing into temporary string_list or oid_array complicates code
by adding more return paths, as these structures needs to be
cleared before returning from function.

  - Determining number of hashes by counting separators might cause
maintenance issues, if this function needs to be modified in future
again.

Signed-off-by: Patryk Obara 
---
 commit.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/commit.c b/commit.c
index 436eb34..3eefd9d 100644
--- a/commit.c
+++ b/commit.c
@@ -137,32 +137,37 @@ int register_commit_graft(struct commit_graft *graft, int 
ignore_dups)
 struct commit_graft *read_graft_line(struct strbuf *line)
 {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   int i;
+   int i, phase;
+   const char *tail = NULL;
struct commit_graft *graft = NULL;
-   const int entry_size = GIT_SHA1_HEXSZ + 1;
+   struct object_id dummy_oid, *oid;
 
strbuf_rtrim(line);
if (!line->len || line->buf[0] == '#')
return NULL;
-   if ((line->len + 1) % entry_size)
-   goto bad_graft_data;
-   i = (line->len + 1) / entry_size - 1;
-   graft = xmalloc(st_add(sizeof(*graft),
-  st_mult(sizeof(struct object_id), i)));
-   graft->nr_parent = i;
-   if (get_oid_hex(line->buf, >oid))
-   goto bad_graft_data;
-   for (i = GIT_SHA1_HEXSZ; i < line->len; i += entry_size) {
-   if (line->buf[i] != ' ')
-   goto bad_graft_data;
-   if (get_sha1_hex(line->buf + i + 1, 
graft->parent[i/entry_size].hash))
+   /*
+* phase 0 verifies line, counts hashes in line and allocates graft
+* phase 1 fills graft
+*/
+   for (phase = 0; phase < 2; phase++) {
+   oid = graft ? >oid : _oid;
+   if (parse_oid_hex(line->buf, oid, ))
goto bad_graft_data;
+   for (i = 0; *tail != '\0'; i++) {
+   oid = graft ? >parent[i] : _oid;
+   if (!isspace(*tail++) || parse_oid_hex(tail, oid, 
))
+   goto bad_graft_data;
+   }
+   if (!graft) {
+   graft = xmalloc(st_add(sizeof(*graft),
+  st_mult(sizeof(struct 
object_id), i)));
+   graft->nr_parent = i;
+   }
}
return graft;
 
 bad_graft_data:
error("bad graft data: %s", line->buf);
-   free(graft);
return NULL;
 }
 
-- 
2.9.5

[PATCH v4 1/4] sha1_file: fix definition of null_sha1

2017-08-18 Thread Patryk Obara

The array is declared in cache.h as:

  extern const unsigned char null_sha1[GIT_MAX_RAWSZ];

Definition in sha1_file.c must match.

Signed-off-by: Patryk Obara 
---
 sha1_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index b60ae15..f5b5bec 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -32,7 +32,7 @@
 #define SZ_FMT PRIuMAX
 static inline uintmax_t sz_fmt(size_t s) { return s; }
 
-const unsigned char null_sha1[20];
+const unsigned char null_sha1[GIT_MAX_RAWSZ];
 const struct object_id null_oid;
 const struct object_id empty_tree_oid = {
EMPTY_TREE_SHA1_BIN_LITERAL
-- 
2.9.5

[PATCH v4 3/4] commit: allocate array using object_id size

2017-08-18 Thread Patryk Obara

struct commit_graft aggregates an array of object_id's, which have
size >= GIT_MAX_RAWSZ bytes. This change prevents memory allocation
error when size of object_id is larger than GIT_SHA1_RAWSZ.

Signed-off-by: Patryk Obara 
---
 commit.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 1a0a9f2..436eb34 100644
--- a/commit.c
+++ b/commit.c
@@ -147,7 +147,8 @@ struct commit_graft *read_graft_line(struct strbuf *line)
if ((line->len + 1) % entry_size)
goto bad_graft_data;
i = (line->len + 1) / entry_size - 1;
-   graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
+   graft = xmalloc(st_add(sizeof(*graft),
+  st_mult(sizeof(struct object_id), i)));
graft->nr_parent = i;
if (get_oid_hex(line->buf, >oid))
goto bad_graft_data;
-- 
2.9.5

[PATCH v4 2/4] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-18 Thread Patryk Obara

This simplifies function declaration and allows for use of strbuf_rtrim
instead of modifying buffer directly.

Signed-off-by: Patryk Obara 
---
 builtin/blame.c |  2 +-
 commit.c| 23 +++
 commit.h|  2 +-
 3 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index bda1a78..d4472e9 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file)
return -1;
while (!strbuf_getwholeline(, fp, '\n')) {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+   struct commit_graft *graft = read_graft_line();
if (graft)
register_commit_graft(graft, 0);
}
diff --git a/commit.c b/commit.c
index 8b28415..1a0a9f2 100644
--- a/commit.c
+++ b/commit.c
@@ -134,34 +134,33 @@ int register_commit_graft(struct commit_graft *graft, int 
ignore_dups)
return 0;
 }
 
-struct commit_graft *read_graft_line(char *buf, int len)
+struct commit_graft *read_graft_line(struct strbuf *line)
 {
/* The format is just "Commit Parent1 Parent2 ...\n" */
int i;
struct commit_graft *graft = NULL;
const int entry_size = GIT_SHA1_HEXSZ + 1;
 
-   while (len && isspace(buf[len-1]))
-   buf[--len] = '\0';
-   if (buf[0] == '#' || buf[0] == '\0')
+   strbuf_rtrim(line);
+   if (!line->len || line->buf[0] == '#')
return NULL;
-   if ((len + 1) % entry_size)
+   if ((line->len + 1) % entry_size)
goto bad_graft_data;
-   i = (len + 1) / entry_size - 1;
+   i = (line->len + 1) / entry_size - 1;
graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i)));
graft->nr_parent = i;
-   if (get_oid_hex(buf, >oid))
+   if (get_oid_hex(line->buf, >oid))
goto bad_graft_data;
-   for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) {
-   if (buf[i] != ' ')
+   for (i = GIT_SHA1_HEXSZ; i < line->len; i += entry_size) {
+   if (line->buf[i] != ' ')
goto bad_graft_data;
-   if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash))
+   if (get_sha1_hex(line->buf + i + 1, 
graft->parent[i/entry_size].hash))
goto bad_graft_data;
}
return graft;
 
 bad_graft_data:
-   error("bad graft data: %s", buf);
+   error("bad graft data: %s", line->buf);
free(graft);
return NULL;
 }
@@ -174,7 +173,7 @@ static int read_graft_file(const char *graft_file)
return -1;
while (!strbuf_getwholeline(, fp, '\n')) {
/* The format is just "Commit Parent1 Parent2 ...\n" */
-   struct commit_graft *graft = read_graft_line(buf.buf, buf.len);
+   struct commit_graft *graft = read_graft_line();
if (!graft)
continue;
if (register_commit_graft(graft, 1))
diff --git a/commit.h b/commit.h
index 6d857f0..baecc0a 100644
--- a/commit.h
+++ b/commit.h
@@ -247,7 +247,7 @@ struct commit_graft {
 };
 typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *);
 
-struct commit_graft *read_graft_line(char *buf, int len);
+struct commit_graft *read_graft_line(struct strbuf *line);
 int register_commit_graft(struct commit_graft *, int);
 struct commit_graft *lookup_commit_graft(const struct object_id *oid);
 
-- 
2.9.5

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Junio C Hamano

Patryk Obara  writes:

> Ah! I presumed two separate loops, one throwing away oids and second
> one actually filling a table - this makes more sense. I was just about
> to send v4, but will rewrite the last patch and we'll see how it looks
> like.

Yeah, it is understandable if you missed my "a loop that runs
exactly twice", as that pattern, while we do use it in a few places
in our codebase, is of limited applicability in general---the cost
of discarded computation in the first pass need to be low enough for
the improved maintainability to make sense.

Re: [PATCH] add test for bug in git-mv with nested submodules

2017-08-18 Thread Heiko Voigt

On Thu, Aug 17, 2017 at 12:05:56PM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:34 AM, Heiko Voigt  wrote:
> > When using git-mv with a submodule it will detect that and update the
> > paths for its configurations (.gitmodules, worktree and gitfile). This
> > does not work for nested submodules where a user renames the root
> > submodule.
> >
> > We discovered this fact when working on on-demand fetch for renamed
> > submodules. Lets add a test to document.
> >
> > Signed-off-by: Heiko Voigt 
> > ---
> >  t/t7001-mv.sh | 9 +
> >  1 file changed, 9 insertions(+)
> >
> > diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh
> > index e365d1f..39f8aed 100755
> > --- a/t/t7001-mv.sh
> > +++ b/t/t7001-mv.sh
> > @@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested 
> > directories' '
> > test_cmp actual expect
> >  '
> >
> > +test_expect_failure 'moving nested submodules' '
> > +   git commit -am "cleanup commit" &&
> > +   git submodule add ./. sub_nested &&
> 
> If possible, I would avoid adding the repo itself
> as a submodule as it is unrealistic in the wild.
> 
> While it may be ok for the test here, later down the road
> other tests making use of it it may become an issue with
> the URL of the submodule.

I just copied the shortcut that they were adding themselfes as submodule
in 'setup submodule'. The whole setup of submodules in this test is like
this. This way we already had a nested submodule structure which I could
just add.

I agree that this is unrealistic so I can change that in the test I am
adding. But from what I have seen, this shortcut is taken in quite some
places when dealing with submodules.

Cheers Heiko

Re: Submodule regression in 2.14?

2017-08-18 Thread Stefan Beller

> In the past "submodule..update=none" was an easy way
> to selectively disable certain Submodules.
>
> How would I do this with Git 2.14?

submodule..active = false

> My gut feeling is that all commands should respect the
> "submodule..update=none" setting.

Well my gut feeling was that the "update" part of the name
reponds to the subcommand, not the generic action.

For example when you set update=none, git-status,
recursive git-diff still reported the submodule.

>
> - Lars

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Patryk Obara

Ah! I presumed two separate loops, one throwing away oids and second
one actually filling a table - this makes more sense. I was just about
to send v4, but will rewrite the last patch and we'll see how it looks
like.
-- 
| ← Ceci n'est pas une pipe
Patryk Obara

Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-18 Thread Martin Ågren

On 18 August 2017 at 18:30, Junio C Hamano  wrote:
> Kaartic Sivaraam  writes:
>
>> On Friday 18 August 2017 01:25 AM, Junio C Hamano wrote:
>>> Martin Ågren  writes:
>>>
 On 17 August 2017 at 04:54, Kaartic Sivaraam
  wrote:
> Helped-by: Martin Ågren ,  Junio C Hamano 
> 
> Signed-off-by: Kaartic Sivaraam 
 I didn't expect a "Helped-by", all I did was to give some random
 comments. :-) I'm not so sure about the comma-separation, that seems to
 be a first in the project.
>>> I didn't either ;-)
>>>
>>> The line looks odd so I'll remove it while queuing.
>>>
>>> Thanks for noticing.
>> I should have been better with my wordings :) How about converting that
>> line into two 'Suggestions-by:' or 'Reviewed-by:' ?
>
> I personally do not think either is needed for those small things we
> saw in the discussion.
>
> Unless Martin feels strongly about it, that is.

No, no strong feelings. Thanks.

Re: [PATCH 2/2] Documentation/git-for-each-ref: clarify peeling of tags for --format

2017-08-18 Thread Junio C Hamano

Michael J Gruber  writes:

> `*` in format strings means peeling of tag objects so that object field
> names refer to the object that the tag object points at, instead of the
> tag object itself.
>
> Currently, this is documented using grammar that is clearly inspired by
> classical latin, though missing more than an article in order to be
> classical english.

;-)

Thanks, both patches look good to me.

>
> Try and straighten that explanation out a bit.
>
> Signed-off-by: Michael J Gruber 
> ---
>  Documentation/git-for-each-ref.txt | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/git-for-each-ref.txt 
> b/Documentation/git-for-each-ref.txt
> index dac9138fab..bb370c9c7b 100644
> --- a/Documentation/git-for-each-ref.txt
> +++ b/Documentation/git-for-each-ref.txt
> @@ -41,8 +41,9 @@ OPTIONS
>   A string that interpolates `%(fieldname)` from a ref being shown
>   and the object it points at.  If `fieldname`
>   is prefixed with an asterisk (`*`) and the ref points
> - at a tag object, the value for the field in the object
> - tag refers is used.  When unspecified, defaults to
> + at a tag object, use the value for the field in the object
> + which the tag object refers to (instead of the field in the tag object).
> + When unspecified, `` defaults to
>   `%(objectname) SPC %(objecttype) TAB %(refname)`.
>   It also interpolates `%%` to `%`, and `%xx` where `xx`
>   are hex digits interpolates to character with hex code

Re: Submodule regression in 2.14?

2017-08-18 Thread Junio C Hamano

Stefan Beller  writes:

> On Thu, Aug 17, 2017 at 7:13 PM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> Are you saying this might be a design mistake and
>>> the .update ought to be respected by all the other
>>> commands? For example
>>> git reset --recurse-submodules
>>> should ignore the .update= none?
>>
>> I have been under the impression that that has been the traditional
>> desire of what .update ought to mean.  I personally do not have a
>> strong opinion---at least not yet.
>
> In this context note v2.14.0-rc1-34-g7463e2ec3
> (bw/submodule-config-cleanup~7, "unpack-trees:
> don't respect submodule.update") that is going opposite of
> your impression.

Exactly.  We are in agreement that recent developments seem to go
against the traditional desire and it is understandable Lars sees
this as a regression.  I still do not have a strong opinion either
way, if this is a regression or a progress.

> Maybe, I'll think about it. However there is no such
> equivalent for trees (and AFAICT never came up) to
> treat a specific directory other than the rest in worktree
> operations.

I am not sure if I follow.  Submodules are not trees and one of the
reasons people may want to separate things into different modules is
so that they can treat them differently.  If submodules allow you
a richer set of operations than a tree that is part of a monolithic
project, is that necessarily a bad thing?

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Junio C Hamano

Patryk Obara  writes:

> Junio C Hamano  wrote:
>>
>> If I were doing the two-pass thing, I'd probably write a for loop
>> that runs exactly twice, where the first iteration parses into a
>> single throw-away oid struct only to count, and the second iteration
>> parses the same input into the allocated array of oid struct.  That
>> way, you do not have to worry about two phrases going out of sync.
>
> Two passes would still differ in error handling due to xmalloc between them…

I am not sure if I follow.  What I meant was something along these
lines:

struct commit_graft *graft = NULL;
char *line = ... what you read from the file ...;
int phase; /* phase #0 counts, phase #1 fills */

for (phase = 0; phase < 2; phase++) {
int count;
char *scan;
strucut object_id dummy_oid, *oid;

for (scan = line, count = 0;
 *scan;
 count++) {
oid = graft ? >parent[count] : _oid;
if (parse_oid_hex(scan, oid, ))
return error(...);
switch (*scan) {
case ' ':
scan++;
continue; /* there are more */
case '\0':
break; /* we are done */
default:
return error(...);
}
}

if (!graft)
graft = xmalloc(... with 'count' parentes ...);
}

/* now we have graft with parent[count] all filled */
return graft;

The inner for() loop will do the same parsing for both passes,
leaving little chance for programming errors to make the two passes
decide there are different number of fake parents.  I suspect I may
have botched some details in that loop, but both passes will even
share the same buggy counting when the code is structured that
way ;-)

That is what I meant by "not have to worry about two phases going
out of sync".

Re: [RFC PATCH 1/2] implement fetching of moved submodules

2017-08-18 Thread Heiko Voigt

On Thu, Aug 17, 2017 at 10:20:13AM -0700, Stefan Beller wrote:
> On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigt  wrote:
> > We store the changed submodules paths to calculate which submodule needs
> > fetching. This does not work for moved submodules since their paths do
> > not stay the same in case of a moved submodules. In case of new
> > submodules we do not have a path in the current checkout, since they
> > just appeared in this fetch.
> >
> > It is more general to collect the submodule names for changes instead of
> > their paths to include the above cases.
> >
> > With the change described above we implement 'on-demand' fetching of
> > changes in moved submodules.
> 
> This sounds as if this would also enable fetching new submodules
> eventually?

Yes that was the goal when starting with these changes back then. But it
took more time than I had back then. So instead of letting these changes
sit bitrot again lets see if we can get them integrated.

For new submodules we need to change the iteration somehow. Currently we
are iterating through the index. But new submodules obviously do not
have an index entry (otherwise they would not be new). So instead of the
index we will need to create another list that contains "all"
submodules. Maybe something like: all submodules from the index plus all
submodules that changed / are new? We could also go further and inspect
all submodules from all ref tips to handle submodules on other branches
configured to 'yes'. But I think we should leave that for later if need
arises.

Some merge of index and additional submodules is needed, because for
--recurse-submodules=yes or submodule..fetchRecurseSubmodules=yes
we always need to run fetch inside the submodule. That would break if we
only looked at submodules that are collected as changed.

> > Note: This does only work when repositories have a .gitmodules file. In
> > other words: It breaks if we do not get a name for a repository.
> > IIRC, consensus was that this is a requirement to get nice submodule
> > handling these days?
> 
> I think that should have been the consensus since ~1.7.8 (since the
> submodules git dir can live inside the superprojects
> /module/).

I agree but since we started without it, we kind of have a mixed state.

> A gitlink entry without corresponding .gitmodules entry is just a gitlink.
> If we happen to have a repository at that path of the gitlink, we can
> be nice and pretend like it is a functional submodule, but it really is
> not. It's just another repo inside the superproject that happen to live
> at the path of a gitlink.

Yeah but at the moment we are handling 'on-demand' fetches and stuff for
such just gitlink submodules. If we were firm on that requirement we
would just skip those but that is not the case with the current
implementation.

> > Signed-off-by: Heiko Voigt 
> > ---
> >
> > I updated the leftover code from my series implementing recursive fetch
> > for moved submodules[1] to the current master.
> >
> > This breaks t5531 and t5545 because they do not use a .gitmodules file.
> >
> > I also have some code leftover that does fallback on paths in case no
> > submodule names can be found. But I do not really like it. The question
> > here is how far do we support not using .gitmodules. Is it e.g.
> > reasonable to say: "For --recurse-submodules=on-demand you need a
> > .gitmodules file?"
> 
> I would not intentionally break users here, but any new functionality can
> safely assume (a) we have a proper .gitmodules entry or (b) it is not a
> submodule, so do nothing/be extra careful.
> 
> For example in recursive diff sort of makes sense to also handle
> non-submodule gitlinks, but fetch is harder to tell.

Well we have a few different cases for gitlinks without .gitmodule
entry:

 1. New gitlink: We can not handle since we do not know where to clone
 from.

 2. Removed gitlink: No need to do anything in fetch

 3. Changed (but same name) gitlink: We can / and currently do run fetch
 in it

 4. Renamed: We currently skip those. We could probably do something to
 track the rename and run fetch in case of gitlink changes.
 In my current approach only the ones with a name are
 handled.

So I guess I will add a fallback to paths for 3. so we do not
unnecessarily break users using the current implementation.

> (just last night I was rereading
> https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/
> which I think is a super cute application of gitlinks. If you happen
> to checkout such
> a tree, you don't want to fetch all of the fake submodules)
> 
> >
> > [1] 
> > https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/
> 
> Oha, that is from way back in the time. :)

Yeah this code did go through some proper bitrotting :)

> >  submodule.c | 92 
>

Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-18 Thread Junio C Hamano

Kaartic Sivaraam  writes:

> On Monday 14 August 2017 02:20 PM, Kaartic Sivaraam wrote:
>>
>> On Wednesday 09 August 2017 12:03 AM, Martin Ågren wrote:
>>>
>>> Maybe the final note could be removed? Someone who is looking up
>>> --set-upstream because Git just "crashed" on them will only want to know
>>> what they should do instead. Our thoughts about the future are perhaps
>>> not that interesting.
>>
>> I thought it's better to document it to avoid people from getting 
> surprised
>> when the options *starts working* again.
>
> I hope that explains the reason.

That is something we can say we _actually_ repurpose the option.
Until then, it is merely noise that distracts the users.

Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option

2017-08-18 Thread Junio C Hamano

Kaartic Sivaraam  writes:

> On Friday 18 August 2017 01:25 AM, Junio C Hamano wrote:
>> Martin Ågren  writes:
>>
>>> On 17 August 2017 at 04:54, Kaartic Sivaraam
>>>  wrote:
 Helped-by: Martin Ågren ,  Junio C Hamano 
 
 Signed-off-by: Kaartic Sivaraam 
>>> I didn't expect a "Helped-by", all I did was to give some random
>>> comments. :-) I'm not so sure about the comma-separation, that seems to
>>> be a first in the project.
>> I didn't either ;-)
>>
>> The line looks odd so I'll remove it while queuing.
>>
>> Thanks for noticing.
> I should have been better with my wordings :) How about converting that
> line into two 'Suggestions-by:' or 'Reviewed-by:' ?

I personally do not think either is needed for those small things we
saw in the discussion.

Unless Martin feels strongly about it, that is.

Thanks.

Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch

2017-08-18 Thread Heiko Voigt

On Thu, Aug 17, 2017 at 10:50:07AM -0700, Brandon Williams wrote:
> On 08/17, Stefan Beller wrote:
> > On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigt  wrote:
> > > To make extending this logic later easier.
> > >
> > > Signed-off-by: Heiko Voigt 
> > > ---
> > > I am quite sure I replicated the same logic but a few more eyes would be
> > > appreciated.
> > 
> > A code cleanup is appreciated!
> > 
> > I thought Brandon had a series in flight doing a very similar cleanup here,
> > but in master..pu there is nothing to be found.
> 
> Yeah there are 2 series in flight which will probably conflict here.
> bw/grep-recurse-submodules and bw/submodule-config-cleanup

Ok then I will wait until those are in and then see if I can base the
cleanup on top. I think it is only necessary as a preparation for the
fully fledged fetch configuration logic mess we will get into once we
get to the full recursive submodule fetch implementation. Not
necessarily needed for the moved submodules.

> > 
> > The code looks good to me.

Thanks.

Cheers Heiko

[PATCH 2/2] Documentation/git-for-each-ref: clarify peeling of tags for --format

2017-08-18 Thread Michael J Gruber

`*` in format strings means peeling of tag objects so that object field
names refer to the object that the tag object points at, instead of the
tag object itself.

Currently, this is documented using grammar that is clearly inspired by
classical latin, though missing more than an article in order to be
classical english.

Try and straighten that explanation out a bit.

Signed-off-by: Michael J Gruber 
---
 Documentation/git-for-each-ref.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index dac9138fab..bb370c9c7b 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -41,8 +41,9 @@ OPTIONS
A string that interpolates `%(fieldname)` from a ref being shown
and the object it points at.  If `fieldname`
is prefixed with an asterisk (`*`) and the ref points
-   at a tag object, the value for the field in the object
-   tag refers is used.  When unspecified, defaults to
+   at a tag object, use the value for the field in the object
+   which the tag object refers to (instead of the field in the tag object).
+   When unspecified, `` defaults to
`%(objectname) SPC %(objecttype) TAB %(refname)`.
It also interpolates `%%` to `%`, and `%xx` where `xx`
are hex digits interpolates to character with hex code
-- 
2.14.1.224.g15ee91971c

[PATCH 1/2] Documentation: use proper wording for ref format strings

2017-08-18 Thread Michael J Gruber

Various commands list refs and allow to use a format string for the
output that interpolates from the ref as well as the object it points
at (for-each-ref; branch and tag in list mode).

Currently, the documentation talks about interpolating from the object.
This is confusing because a ref points to an object but not vice versa,
so the object cannot possible know %(refname), for example. Thus, this is
wrong independent of refs being objects (one day, maybe) or not.

Change the wording to make this clearer (and distinguish it from formats
for the log family).

Signed-off-by: Michael J Gruber 
---
 Documentation/git-branch.txt   | 4 ++--
 Documentation/git-for-each-ref.txt | 4 ++--
 Documentation/git-tag.txt  | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index 81bd0a7b77..d0b3358771 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -267,8 +267,8 @@ start-point is either a local or remote-tracking branch.
Only list branches of the given object.
 
 --format ::
-   A string that interpolates `%(fieldname)` from the object
-   pointed at by a ref being shown.  The format is the same as
+   A string that interpolates `%(fieldname)` from a branch ref being shown
+   and the object it points at.  The format is the same as
that of linkgit:git-for-each-ref[1].
 
 Examples
diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index cc42c12832..dac9138fab 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -38,8 +38,8 @@ OPTIONS
key.
 
 ::
-   A string that interpolates `%(fieldname)` from the
-   object pointed at by a ref being shown.  If `fieldname`
+   A string that interpolates `%(fieldname)` from a ref being shown
+   and the object it points at.  If `fieldname`
is prefixed with an asterisk (`*`) and the ref points
at a tag object, the value for the field in the object
tag refers is used.  When unspecified, defaults to
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index d97aad3439..543fb425ee 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -188,8 +188,8 @@ This option is only applicable when listing tags without 
annotation lines.
Defaults to HEAD.
 
 ::
-   A string that interpolates `%(fieldname)` from the object
-   pointed at by a ref being shown.  The format is the same as
+   A string that interpolates `%(fieldname)` from a tag ref being shown
+   and the object it points at.  The format is the same as
that of linkgit:git-for-each-ref[1].  When unspecified,
defaults to `%(refname:strip=2)`.
 
-- 
2.14.1.224.g15ee91971c

Re: [RFC PATCH] Updated "imported object" design

2017-08-18 Thread Ben Peart




On 8/17/2017 5:39 PM, Jonathan Tan wrote:

Thanks for your comments. I'll reply to both your e-mails in this one
e-mail.


This illustrates another place we need to resolve the
naming/vocabulary.  We should at least be consistent to make it easier
to discuss/explain.  We obviously went with "virtual" when building
GVFS but I'm OK with "lazy" as long as we're consistent.  Some
examples of how the naming can clarify or confuse:

'Promise-enable your repo by setting the "extensions.lazyObject" flag'

'Enable your repo to lazily fetch objects by setting the
"extensions.lazyObject"'

'Virtualize your repo by setting the "extensions.virtualize" flag'

We may want to carry the same name into the filename we use to mark
the (virtualized/lazy/promised/imported) objects.

(This reminds me that there are only 2 hard problems in computer
science...) ;)


Good point about the name. Maybe the 2nd one is the best? (Mainly
because I would expect a "virtualized" repo to have virtual refs too.)

But if there was a good way to refer to the "anti-projection" in a
virtualized system (that is, the "real" thing or "object" behind the
"virtual" thing or "image"), then maybe the "virtualized" language is
the best. (And I would gladly change - I'm having a hard time coming up
with a name for the "anti-projection" in the "lazy" language.)



The most common "anti-virtual" language I'm familiar with is "physical." 
 Virtual machine <-> physical machine. Virtual world <-> physical 
world. Virtual repo, commit, tree, blob - physical repo, commit, tree, 
blob. I'm not thrilled but I think it works...



Also, I should probably standardize on "lazily fetch" instead of "lazily
load". I didn't want to overlap with the existing fetching, but after
some thought, it's probably better to do that. The explanation would
thus be that you can either use the built-in Git fetcher (to be built,
although I have an old version here [1]) or supply a custom fetcher.

[1] https://github.com/jonathantanmy/git/commits/partialclone


I think this all works and would meet the requirements we've been
discussing.  The big trade off here vs what we first discussed with
promises is that we are generating the list of promises on the fly
when they are needed rather than downloading and maintaining a list
locally.

My biggest concern with this model is the cost of opening and parsing
every imported object (loose and pack for local and alternates) to
build the oidset of promises.

In fsck this probably won't be an issue as it already focuses on
correctness at the expense of speed.  I'm more worried about when we
add the same/similar logic into check_connected.  That impacts fetch,
clone, and receive_pack.

I guess the only way we can know for sure it to do a perf test and
measure the impact.


As for fetching from the main repo, the connectivity check does not need
to be performed at all because all objects are "imported", so the
performance of the connectivity check does not matter. Same for cloning.



Very good point! I got stuck on connectivity check in general forgetting 
that we really only need to prevent sharing a corrupt repo.


This is not true if you're fetching from another repo 


This isn't a case we've explicitly dealt with (multiple remotes into a 
virtualized repo).  Our behavior today would be that once you set the 
"virtual repo" flag on the repo (this happens at clone for us), all 
remotes are treated as virtual as well (ie we don't differentiate 
behavior based on which remote was used).  Our "custom fetcher" always 
uses "origin" and some custom settings for a cache-server saved in the 
.git/config file when asked to fetch missing objects.


This is probably a good model to stick with at least initially as trying 
to solve multiple possible "virtual" remotes as well as mingling 
virtualized and non-virtualized remotes and all the mixed cases that can 
come up makes my head hurt.  We should probably address that in a 
different thread. :)



or if you're using
receive-pack, but (1) I think these are not used as much in such a
situation, and (2) if you do use them, the slowness only "kicks in" if
you do not have the objects referred to (whether non-"imported" or
"imported") and thus have to check the references in all "imported"
objects.



Is there any case where receive-pack is used on the client side?  I'm 
only aware of it being used on the server side to receive packs pushed 
from the client.  If it is not used in a virtualized client, then we 
would not need to do anything different for receive-pack.



I think this topic should continue to move forward so that we can
provide reasonable connectivity tests for fsck and check_connected in
the face of partial clones.  I'm not sure the prototype implementation
of reading/parsing all imported objects to build the promised oidset is
the most performant model but we can continue to investigate the best
options.


Agreed - I think the most important thing here is settling on the API
(name of

Re: Submodule regression in 2.14?

2017-08-18 Thread Lars Schneider


> On 17 Aug 2017, at 23:55, Stefan Beller  wrote:
> 
> On Thu, Aug 17, 2017 at 2:21 PM, Lars Schneider
>  wrote:
>> 
>>> Oh, wait.
>>> $ git log --oneline v2.13.0..v2.14.1 -- builtin/pull.c
>>> c9c63ee558 Merge branch 'sb/pull-rebase-submodule'
>>> a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes 
>>> only)
>>> could also be a culprit. Do you have pull.rebase set?
>> 
>> I bisected the problem today and "a6d7eb2c7a pull: optionally rebase 
>> submodules
>> (remote submodule changes only)" is indeed the culprit.
>> 
>> The commit seems to break the following test case.
>> 
>> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
>> index dcac364c5f..24f9729015 100755
>> --- a/t/t7400-submodule-basic.sh
>> +++ b/t/t7400-submodule-basic.sh
>> @@ -1289,4 +1289,19 @@ test_expect_success 'init properly sets the config' '
>>test_must_fail git -C multisuper_clone config --get 
>> submodule.sub1.active
>> '
>> 
>> +test_expect_success 'submodule update and git pull with disabled submodule' 
>> '
>> +   test_when_finished "rm -rf multisuper_clone" &&
>> +   pwd=$(pwd) &&
>> +   git clone file://"$pwd"/multisuper multisuper_clone &&
>> +   (
>> +   cd multisuper_clone &&
>> +   git config --local submodule.sub0.update none &&
>> +   git submodule update --init --recursive &&
>> +   git pull --recurse-submodules &&
>> +   git submodule status | cut -c 1,43- >actual
>> +   ) &&
>> +   ls &&
>> +   test_cmp expect multisuper_clone/actual
>> +'
> 
> Thanks for providing this test.
> 
> cd trash directory.t7400-submodule-basic/multisuper_clone
> cat .git/config
> [submodule "sub0"]
>  update = none
>  active = true
>  url = file:///.../t/trash directory.t7400-submodule-basic/sub1
> 
> 
> submodule..update
>The default update procedure for a submodule.
>This variable is populated by git submodule init
>from the gitmodules(5) file. See description of
>update command in git-submodule(1).
> 
> The first sentence of .update is misleading IMHO as the
> these settings should strictly apply to the "submodule update"
> command. So "pull --recurse-submodules" ought to ignore it,
> instead the pull can do whatever it wants, namely treat the
> submodule roughly like a tree and either merge/rebase
> inside the submodule as well. The user *asked* for recursive
> pull after all.
> 
> Are you saying this might be a design mistake and
> the .update ought to be respected by all the other
> commands? For example
>git reset --recurse-submodules
> should ignore the .update= none?
> 
> When designing these new recursive submodule functionality
> outside the "submodule" command, I'd want submodules
> to behave as much as possible like trees.

In the past "submodule..update=none" was an easy way
to selectively disable certain Submodules.

How would I do this with Git 2.14?

My gut feeling is that all commands should respect the
"submodule..update=none" setting.

- Lars

Re: Revision resolution for remote-helpers?

2017-08-18 Thread Jeff King

On Fri, Aug 18, 2017 at 03:42:08PM +0900, Mike Hommey wrote:

> I was thinking it could be useful to have a special syntax for revisions
> that would query a helper program. The helper program could use a
> similar protocol to that of the remote helpers.

That sounds like a reasonable thing to want.

> My thought is that a string like :: could be used
> wherever a committish is expected. That would call some helper
> and request to resolve revision, and the helper would provide a git
> commit as a response.

So I'm guessing this would look something like:

  git log svn::12345

I think even without Git support, you could do something like:

  git log $(git svn map 12345)

which is similarly complex in terms of concepts, and not too many more
characters. That would be a little more awkward outside of a shell,
though.

But it did get me wondering if we could do _better_ with something built
into Git. For example, could we have an external resolution helper that
resolves names to object ids as a fallback after internal resolution has
failed. And then you could do:

 git log 12345

and it would just work. Efficiency shouldn't be a big problem, because
we'd hit the helper only in the error case.

I'd be more concerned about awkward ambiguities, though. If mercurial is
also using sha1s, then there's nothing syntactic to differentiate the
two. For that matter, 12345 has the same problem, since it could be a
partial sha1.

It might work to actually check if we have the object and then bail
to the remote resolver only if we don't. But that's actually conflating
name resolution with object lookup, which our internals typically keep
separate.

So maybe this is a bad direction to go in. I'm mostly just thinking out
loud here.

> Which leads me to think some "virtual" ref namespace could be a solution
> to the problem. So instead of ::, the prefix would be /.
> For e.g. svn, svn/$remote/$rev would be a natural way to specify the
> revision for a given remote. For mercurial, hg/$sha1.

Interesting. I do like the general idea of having external helpers to
fill in bits of the virtual namespace. But it may also open many cans of
worms. :)

> Potentially, this could be a sort of pluggable ref stores, which could
> be used for extensions such as the currently discussed reftable.

The current pluggable code is definitely geared for build-time
pluggability, not runtime. But I think you could have a builtin
pluggable store that does the overlay, and then chains to another
backend. I.e., configure something like:

  [extensions]
  refBackend = externalOverlay

  [externalOverlay "svn"]
  namespace = refs/svn
  command = my-svn-mapper

  [externalOverlay]
  chain = reftable

That would allow the externalOverlay thing to develop independent of the
core of Git's refs code.

> On the opposite end of the problem, I'm also thinking about git log
> --decorate= displaying the mercurial revisions where branch
> decorations would normally go.

Interesting thought. I'm not sure if that would be a good thing or a bad
thing. But one of the virtual methods for pluggable backends is
"enumerate all refs". If you're mapping every mercurial revision, that's
going to be a lot of refs (and potentially a lot of overhead for certain
operations).

I think the decorate code just looks at a few parts of the refs
namespace right now (so a "refs/svn" would probably get ignored by
default).

-Peff

Re: Weirdness with git change detection

2017-08-18 Thread Jeff King

On Fri, Aug 18, 2017 at 08:56:03AM +0200, Michael J Gruber wrote:

> > The idea being that users could run "git lint" if they suspect something
> > funny is going on. I dunno. It may be a dead-end. Most such
> > oddities are better detected and handled during actual git operations if
> > we can. So this would really just be for things that are too expensive
> > to detect in normal operations.
> 
> Typically, that problem arises when you turn a filter on or off at some
> point in your history. Since "attributes" can come from various sources,
> especially the versioned ".gitattributes" file, unversioned per-repo
> .git/info/attributes, and global attributes, "git diff" may apply
> different attributes depending on what you diff (versioned blob, workdir
> file, out-of-tree file).
> [...]

Yeah, I agree that we cannot catch every problematic case (for all the
reason you give here). It does seem like a large number of problems
people report have to do with checkout of in-tree .gitattributes,
though. So my thinking was if we could cover that case, it might help
people (even if we leave many problems unnoticed).

But...

> I've found that when I decide to use a filter like that, the best
> approach is to either apply it retroactively (filter-branch,
> unversionsed attributes, that is clean all stored blobs) or make a
> commit where I specifically note the switch (versioned .gitattributes
> plus affected blob changes) and what config should go along with it.

One problem is that people need to know to run the lint command. And if
they know enough that this is a problem worthy of checking via a linter,
then they could perhaps just as easily do the in-tree blob changes.

I say "perhaps" because I don't think it's as easy as running a single
"git fix-my-stale-blobs". I wonder if rather than a linter, we ought to
just have an option to "git checkout" or something to ignore stat data
and re-checkout any entries for which convert_to_working_tree() isn't a
noop.

That serves both as a repair tool and as a linter (since running "git
diff" on the result would show you what needs to be fixed). It wouldn't
solve the user-education problem, but at least it would give a simple
solution that could be passed along to users.

I dunno. I don't do line ending conversion, so I don't really run into
this issue myself.

-Peff

Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-18 Thread Jeff King

On Fri, Aug 18, 2017 at 12:12:37PM +0200, Patryk Obara wrote:

> Jeff King  wrote:
> 
> > AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest
> > of the function. But I think we should just make that change (you
> > already did in some of the spots). And IMHO we should do the same for
> > line->len. When there are two names for the same value, it increases the
> > chances of a bug where the two end up diverging.
> 
> My motivation was rather to keep patch(es) as small as possible because every
> line using buf will be replaced in a later patch in series. But it will make
> commit better (it will stand on its own), so why not to do it? :)

Ah, I didn't notice those lines went away. That does make it less bad,
but I do think it's easier to review if each commit stands on its own.

In some cases, if it's really painful to do the intermediate cleanup, I
might say something in the commit message like "this leaves X that is
not ideal, but we'll be getting rid of it soon anyway". But in this case
I think just creating that intermediate state is simple enough.

> Ah, I only replaced comparison to NULL terminator with length check because
> I thought it better shows intention of the code and I didn't notice, that
> reversing order will result in better code overall.
> 
> I will include both changes in v4.

Thanks.

-Peff

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Jeff King

On Fri, Aug 18, 2017 at 01:30:23PM +0200, Patryk Obara wrote:

> > We'd reject such an input totally (though as an interesting side effect,
> > you can convince the parser to allocate 20x as much RAM as you send it;
> > one oid for each space).
> 
> Grafts are not populated during clone operation, so it really would be user
> making his life miserable. I could allocate FLEXI_ARRAY of size
> min(n, line->len / (GIT_*MIN*_HEXSZ+1)) instead… but I think it's not even
> worth the cost of making the code more complicated (and I don't want
> to reintroduce these size macros in here.
> 
> We _could_ put an artificial limit on graft parents, though (e.g. 10) and
> display an error message urging the user to stop using grafts?

Yeah, sorry, I should have made more clear that this is fine. I always
try to read parsing code with my paranoid hat on, but I agree that
grafts aren't really exposed to untrusted entities.

In general I'd prefer to avoid artificial limits unless there's a need
for them. There are already spots (like receive-pack) where you can ask
Git to store bytes in RAM as fast as you can send them. What I found
interesting about this one was the 20:1 amplification. :)

> Before sending v3 I tried two other alternative implementations (perhaps I
> should've listed them in the v3 cover letter):

It might even be worth listing them in the commit message. Somebody
finding your commit 3 years from via "git log -S" or "git blame" might
say "yes, but why didn't they just do it like...". You can respond to
them preemptively. :)

-Peff

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Patryk Obara

Jeff King  wrote:
>
> So we're probably fine. The two parsing passes are right next to each
> other and are sufficiently simple and strict that we don't have to
> worry about them diverging.

That was my conclusion as well. I added comment before the first pass and
avoided any "cleverness" to make it perfectly clear to a reader.

> We'd reject such an input totally (though as an interesting side effect,
> you can convince the parser to allocate 20x as much RAM as you send it;
> one oid for each space).

Grafts are not populated during clone operation, so it really would be user
making his life miserable. I could allocate FLEXI_ARRAY of size
min(n, line->len / (GIT_*MIN*_HEXSZ+1)) instead… but I think it's not even
worth the cost of making the code more complicated (and I don't want
to reintroduce these size macros in here.

We _could_ put an artificial limit on graft parents, though (e.g. 10) and
display an error message urging the user to stop using grafts?

> The single-pass alternative would probably be to read into a dynamic
> structure like an oid_array, and then copy the result into the flex
> structure.

Before sending v3 I tried two other alternative implementations (perhaps I
should've listed them in the v3 cover letter):

  1. Using string_list_split_in_place. I resigned from this approach as soon
 as I noticed, that line->buf needs to be preserved for possible
 error message. string_list_split would have no benefits over using
 oid_array.

  2. Parsing into temporary oid_array and then copying memory to FLEXI_ARRAY.
 Throw-away oid_array still needs to be cleaned, which means we have
 new/different return path (one before xmalloc and one after xmalloc),
 which means "bad_graft_data" label needs to be changed into "cleanup"
 label (or removed), which means error description needs be conditionally
 put in earlier code… and at this point, I decided these changes are not
 making code cleaner nor more readable at all :)

Junio C Hamano  wrote:
>
> If I were doing the two-pass thing, I'd probably write a for loop
> that runs exactly twice, where the first iteration parses into a
> single throw-away oid struct only to count, and the second iteration
> parses the same input into the allocated array of oid struct.  That
> way, you do not have to worry about two phrases going out of sync.

Two passes would still differ in error handling due to xmalloc between them…

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-18 Thread Patryk Obara

Jeff King  wrote:

> AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest
> of the function. But I think we should just make that change (you
> already did in some of the spots). And IMHO we should do the same for
> line->len. When there are two names for the same value, it increases the
> chances of a bug where the two end up diverging.

My motivation was rather to keep patch(es) as small as possible because every
line using buf will be replaced in a later patch in series. But it will make
commit better (it will stand on its own), so why not to do it? :)

> (…) I think short-circuiting like:
>
>   if (!line->len || line->buf[0] == '#')
>
> is better (I also think "!" instead of "== 0" is our usual style, but
> that's much less important).

Ah, I only replaced comparison to NULL terminator with length check because
I thought it better shows intention of the code and I didn't notice, that
reversing order will result in better code overall.

I will include both changes in v4.

-- 
| ← Ceci n'est pas une pipe
Patryk Obara

Re: reftable [v6]: new ref storage format

2017-08-18 Thread Michael Haggerty

On Wed, Aug 16, 2017 at 12:47 AM, Shawn Pearce  wrote:
> On Mon, Aug 14, 2017 at 5:13 AM, Michael Haggerty  
> wrote:
>> On 08/07/2017 03:47 AM, Shawn Pearce wrote:
>>> 6th iteration of the reftable storage format.
> [...]
>>>  index record
>>>
>>> An index record describes the last entry in another block.
>>> Index records are written as:
>>>
>>> varint( prefix_length )
>>> varint( (suffix_length << 3) | 0 )
>>> suffix
>>> varint( block_position )
>>>
>>> Index records use prefix compression exactly like `ref_record`.
>>>
>>> Index records store `block_position` after the suffix, specifying the
>>> absolute position in bytes (from the start of the file) of the block
>>> that ends with this reference.
>>
>> Is there a reason that the index lists the *last* refname that is
>> contained in a block rather than the *first* refname? I can't think of a
>> reason to choose one vs. the other, but your choice was initially
>> surprising. I don't think it matters either way; I was just curious.
>
> Yes, there is a reason. When a reader is searching the index block and
> discovers a key that is greater than their search needle, they are now
> sitting on a record with the block_position for that greater key. By
> using the *last* refname the current block_position is the one to seek
> to.
>
> If instead we used *first* refname, the reader would now have to
> backtrack to the prior index record to get the block_position out of
> that record. Or it has to keep a running "prior_position" local
> variable.
>
> Using last simplifies the reader's code.

Ah, OK. I was thinking of this as being a binary search, in which case
you *must* see both bracketing records before you are done, and the
chances are 50-50 which one you see first. But this search is a little
bit different, because the index records within a restart block have
to be scanned linearly. So it is much more likely that you see the
"before" record followed by the "after" record.

Thanks for the explanation.

Michael

Re: [PATCH] check-ref-format: require a repository for --branch

2017-08-18 Thread Junio C Hamano

Jeff King  writes:

> I _do_ think it's a misfeature to put it in check-ref-format. It should
> be part of rev-parse. Which admittedly is a kitchen sink, but this kind
> of resolving is one of the main things it should be doing. And in fact
> you can already do:
>
>   git rev-parse --symbolic-full-name @{-1}
>
> But I stopped short of suggesting we remove the feature entirely. It
> would obviously require a deprecation period.

Yeah, I realize that "nonsense" was a bit too strong.  I do agree
that it was a misfeature to place in "check" ref-format, though.

Thanks.

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Junio C Hamano

Jeff King  writes:

> So we're probably fine. The two parsing passes are right next to each
> other and are sufficiently simple and strict that we don't have to
> worry about them diverging.

If I were doing the two-pass thing, I'd probably write a for loop
that runs exactly twice, where the first iteration parses into a
single throw-away oid struct only to count, and the second iteration
parses the same input into the allocated array of oid struct.  That
way, you do not have to worry about two phrases going out of sync.

Re: [Patch size_t V3 00/19] use size_t

2017-08-18 Thread Martin Koegler

On Thu, Aug 17, 2017 at 10:35:54PM +0200, Torsten Bögershausen wrote:
> On Wed, Aug 16, 2017 at 10:16:12PM +0200, Martin Koegler wrote:
> > From: Martin Koegler 
> > 
> > This patchset is for next [24db08a6e8fed761d3bace7f2d5997806e20b9f7].
> 
> I applied it succesfully, and run the test suite on a 32 bit system.
> The Laptop reported one brekage in t0040:
> t0040-parse-options.sh   not ok 19 - OPT_MAGNITUDE() 3giga

I missed test-parse-options.c - I'll post an updated versions considering all 
comments.

parse-options takes the variable as void* - so the compilers also fails to 
recognize the usage
of incompatible types.
 
> Beside some t5561-http-backend.sh (which are most probably not caused
> by this patch.
> 
> The raspi had 2 deadlocks, like "git push --signed dst noop ff +noff"
> or
> trash directory.t5541-http-push-smart/gpghome --sign -u commit...@example.com
> Even this most probably not caused by the patch. (and the test is still 
> running)
> 
> Well done, and on which platforms did you test ?

Linux 64. 

Regards,
Martin

Revision resolution for remote-helpers?

2017-08-18 Thread Mike Hommey

Hi,

As you might remember, I'm maintaining a remote helper that allows to
talk directly to mercurial servers with git: git-cinnabar.

When dealing with "foreign (non-git) repositories", it is often
necessary to refer to revisions with their native name. With mercurial,
that's a sha1, with svn it's a revision number, etc.

Git-cinnabar does provide a helper command that gives back the git
commit from the mercurial revision (and vice versa), but that's
cumbersome to use in commands.

I was thinking it could be useful to have a special syntax for revisions
that would query a helper program. The helper program could use a
similar protocol to that of the remote helpers.

My thought is that a string like :: could be used
wherever a committish is expected. That would call some helper
and request to resolve revision, and the helper would provide a git
commit as a response.

The reason for the :: prefix is that it matches the ::
prefix used for remote helpers.

Now, there are a few caveats:
- , for e.g. svn, pretty much would depend on the remote.
  OTOH, in mercurial, it doesn't. I think it would be fair for the
  helper to have to deal with making what appears after :: relevant
  to find the right revision, by possibly including a remote name.
- msys likes to completely fuck up command lines when they contain ::.
  For remote helpers, the alternative that works is
  :///etc.

Which leads me to think some "virtual" ref namespace could be a solution
to the problem. So instead of ::, the prefix would be /.
For e.g. svn, svn/$remote/$rev would be a natural way to specify the
revision for a given remote. For mercurial, hg/$sha1.

Potentially, this could be a sort of pluggable ref stores, which could
be used for extensions such as the currently discussed reftable.

On the opposite end of the problem, I'm also thinking about git log
--decorate= displaying the mercurial revisions where branch
decorations would normally go.

I have no patch to show for it. Those are ideas that I first want to
discuss before implementing anything.

Thoughts?

Mike

Re: Weirdness with git change detection

2017-08-18 Thread Michael J Gruber

Jeff King venit, vidit, dixit 11.07.2017 10:24:
> On Tue, Jul 11, 2017 at 10:20:43AM +0200, Torsten Bögershausen wrote:
> 
>>> No problem. I actually think it would be interesting if Git could
>>> somehow detect and warn about this situation. But the obvious way to do
>>> that would be to re-run the clean filter directly after checkout. And
>>> doing that all the time is expensive.
>>
>> Would it be possible to limit the round-trip-check to "git reset --hard" ?
>> If yes, possibly many users are willing to pay the price, if Git tells
>> the "your filters don't round-trip". (Including CRLF conversions)
> 
> Anything's possible, I suppose. But I don't think I'd want that feature
> turned on myself.
> 
>>> Perhaps some kind of "lint" program would be interesting to warn of
>>> possible misconfigurations. Of course people would have to run it for it
>>> to be useful. :)
>>
>> What do you have in mind here ?
>> Don't we need to run some content through the filter(s)?
> 
> I was thinking of a tool that could run a series of checks on the
> repository and nag about potential problems. One of them could be doing
> a round-trip repo->clean->smudge for each file.
> 
> Another one might be warning about files that differ only in case.
> 
> The idea being that users could run "git lint" if they suspect something
> funny is going on. I dunno. It may be a dead-end. Most such
> oddities are better detected and handled during actual git operations if
> we can. So this would really just be for things that are too expensive
> to detect in normal operations.
> 
> -Peff
> 

Typically, that problem arises when you turn a filter on or off at some
point in your history. Since "attributes" can come from various sources,
especially the versioned ".gitattributes" file, unversioned per-repo
.git/info/attributes, and global attributes, "git diff" may apply
different attributes depending on what you diff (versioned blob, workdir
file, out-of-tree file).

This is not made easier by the fact that unversioned config (per repo,
per user, global) defines the filter action, and that even upgrades of
your filter tools may change the output. So, "filter off/on" is by no
means the only possible source of discrepancies.

I've found that when I decide to use a filter like that, the best
approach is to either apply it retroactively (filter-branch,
unversionsed attributes, that is clean all stored blobs) or make a
commit where I specifically note the switch (versioned .gitattributes
plus affected blob changes) and what config should go along with it.

All of this is difficult to check or correct automatically, since it
depends on user decisions.

About the only thing we could do is checking that
"clean(smudge(foo))=clean(foo)" at a specific "point in time"
(attributes, config) for specific foo, but that wouldn't catch the case
above, even if we iterated over all commits which affect files that the
filter (currently) applies to.

Keep in mind that filters are a killer feature, so if you shoot yourself
in the foot: it could have come worse ;)

Michael

Re: ignoring extra bitmap file?

2017-08-18 Thread Jeff King

On Thu, Aug 17, 2017 at 09:24:36PM +0200, Andreas Krey wrote:

> I'm seeing the message
> 
>remote: warning: ignoring extra bitmap file: 
> ./objects/pack/pack-2943dc24pack
> 
> and indeed, there is such a thing (two, actually):

Only one is the extra. :) The other is doing something useful.

Basically, the bitmap code was written to a handle a single bitmap file.
It would be possible to handle multiple, but it simplified the
implementation greatly to only handle one. And in practice, since a
bitmap can only be made for a pack which contains all of the reachable
objects, you'd have only one bitmap per repo, for the one big "main"
pack.

> But it looks like something went wrong in that repack cycle (that
> pack-2943dc247702 is the full repo), and it won't get removed later
> in the next repack in the evening.

Yes, it looks like you got a full repack that failed to remove the old
pack. Or more likely you had two full repacks racing with each other,
each creating a new big pack.

So the extra bitmap here is harmless. Both of them contain more or less
the same data, and whichever one we use will be fine (and remember that
the .bitmap files are purely an optimization, so that "more or less"
will only make a minor impact on the speed of operations, not on the
output).

> Question: Can I safely remove the .bitmap file, and repack will then
> clean up the .pack and .idx files as will?

Yes, it's always safe to remove a .bitmap file (though if you remove the
last one, you may expect performance to drop for some operations).

Whether there's a .bitmap doesn't impact whether .pack and .idx files
are deleted. The next full repack would pack everything into a new big
pack, and then delete any existing files, including .pack, .idx, and
.bitmap.

-Peff

Re: [PATCH v3 4/4] commit: rewrite read_graft_line

2017-08-18 Thread Jeff King

On Fri, Aug 18, 2017 at 03:59:38AM +0200, Patryk Obara wrote:

> Determine the number of object_id's to parse in a single graft line by
> counting separators (whitespace characters) instead of dividing by
> length of hash representation.
> 
> This way graft parsing code can support different sizes of hashes
> without any further code adaptations.

Sounds like a reasonable approach, though I wonder what happens if our
counting pass differs in its behavior from the actual parse.

E.g., here:

> + /* count number of blanks to determine size of array to allocate */
> + for (i = 0, n = 0; i < line->len; i++)
> + if (isspace(line->buf[i]))
> + n++;

If we see multiple spaces like "1234abcd 5678abcd" we'll allocate a
slot for each. I think that's OK because here:

> + for (i = 0; i < graft->nr_parent; i++)
> + if (!isspace(*tail++) || parse_oid_hex(tail, >parent[i], 
> ))
>   goto bad_graft_data;

We'd reject such an input totally (though as an interesting side effect,
you can convince the parser to allocate 20x as much RAM as you send it;
one oid for each space).

So we're probably fine. The two parsing passes are right next to each
other and are sufficiently simple and strict that we don't have to
worry about them diverging.

The single-pass alternative would probably be to read into a dynamic
structure like an oid_array, and then copy the result into the flex
structure.

Or of course to stop using a flex structure, as your original pass did.
I agree with Junio that the use of object_id's is orthogonal to using a
FLEX_ARRAY. But I could also see an argument that the complexity the
flex array adds here isn't worth the savings. The main benefits of a
flex array are:

  1. Less memory used. But we don't expect to see a large enough number
 of grafts for this to matter.

  2. A more compact memory representation, which can be faster. But
 accessing the parent list of a graft isn't going to be the hot code
 path. It probably only happens once in a program run when we
 rewrite the parents (versus checking the grafted commit, which is
 looked up once per commit we access).

  3. It's easier to free the struct and its associated resources in a
 single free(). But we never free the graft list.

Reading your original, my thought was "why _not_ keep doing it as a
FLEX_ARRAY, as it saves a little memory, which can't hurt". But seeing
this pre-counting phase, I think it does make the code a little more
complicated. But I'd be OK with doing it either way.

-Peff

Dear Talented

2017-08-18 Thread Kim Sharma

Dear Talented,

I am Talent Scout For BLUE SKY FILM STUDIO, Present Blue sky Studio a
Film Corporation Located in the United State, is Soliciting for the
Right to use Your Photo/Face and Personality as One of the Semi -Major
Role/ Character in our Upcoming ANIMATED Stereoscope 3D Movie-The Story
of Anubis (Anubis 2018) The Movie is Currently Filming (In
Production) Please Note That There Will Be No Auditions, Traveling or
Any Special / Professional Acting Skills, Since the Production of This
Movie Will Be Done with our State of Art Computer -Generating Imagery
Equipment. We Are Prepared to Pay the Total Sum of $620,000.00 USD. For
More Information/Understanding, Please Write us on the E-Mail Below.
CONTACT EMAIL: bluesky.filmstu...@usa.com
All Reply to: bluesky.filmstu...@usa.com
Note: Only the Response send to this mail will be Given a Prior
Consideration.


Talent Scout
Kim Sharma

Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line

2017-08-18 Thread Jeff King

On Fri, Aug 18, 2017 at 03:59:36AM +0200, Patryk Obara wrote:

> diff --git a/commit.c b/commit.c
> index 8b28415..019e733 100644
> --- a/commit.c
> +++ b/commit.c
> @@ -134,17 +134,18 @@ int register_commit_graft(struct commit_graft *graft, 
> int ignore_dups)
>   return 0;
>  }
>  
> -struct commit_graft *read_graft_line(char *buf, int len)
> +struct commit_graft *read_graft_line(struct strbuf *line)
>  {
>   /* The format is just "Commit Parent1 Parent2 ...\n" */
> - int i;
> + int i, len;
> + char *buf = line->buf;

Copying a pointer to a strbuf's buffer is a dangerous habit. The strbuf
is free to re-allocate the buffer under the hood during any operation it
likes, potentially leaving you pointing to freed memory.

In this case it's OK because the only function you call is
strbuf_rtrim(), which never reallocates. But I feel like this is setting
up a maintenance trap for the next person to touch the function.

AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest
of the function. But I think we should just make that change (you
already did in some of the spots). And IMHO we should do the same for
line->len. When there are two names for the same value, it increases the
chances of a bug where the two end up diverging.

> - while (len && isspace(buf[len-1]))
> - buf[--len] = '\0';
> - if (buf[0] == '#' || buf[0] == '\0')
> + strbuf_rtrim(line);
> + if (line->buf[0] == '#' || line->len == 0)
>   return NULL;

I find it funny to look at line->buf[0] before line->len, because it
means we're reading pas the end of the buffer. It's OK here because we
know there's a NUL terminator, but I think short-circuiting like:

  if (!line->len || line->buf[0] == '#')

is better (I also think "!" instead of "== 0" is our usual style, but
that's much less important).

-Peff

Re: [PATCH] check-ref-format: require a repository for --branch

2017-08-18 Thread Jeff King

On Thu, Aug 17, 2017 at 02:30:53PM -0700, Junio C Hamano wrote:

> > I'm not sure I buy that. What does "turning it into a branch name" even
> > mean when you are not in a repository? Clearly @{-1} must fail. And
> > everything else is just going to output the exact input you provided.
> 
> This "just going to output the exact input" is not entirely correct;
> there is just one use case for it.
> 
> "git check-ref-format --branch a..b" would fail with a helpful error
> message, while the command run with "a.b" would tell you the name is
> safe.

Well, yes. It's checking the syntax, as well. But you don't need
--branch for that.

> Use of 'check-ref-format --branch "@{-1}"' *IS* a nonsense, whether
> it is inside or outside a repository; it is OK to fail it outside a
> repository and I would say it is even OK to fail it inside a
> repository.  After all "check-ref-format" is about checking if the
> string is a sensible name to use.
> 
> I think calling interpret_branch_name() in the codepath is a mistake
> and we should instead report if "refs/heads/@{-1}" literally is a
> valid refname we could create instead.

I don't think it's nonsense. It's the reason the feature was added in
the first place. See your own a31dca0393 (check-ref-format --branch:
give Porcelain a way to grok branch shorthand, 2009-03-21). Without
that interpretation, it does nothing that you could not equally well do
as:

  git check-ref-format refs/heads/$name

I _do_ think it's a misfeature to put it in check-ref-format. It should
be part of rev-parse. Which admittedly is a kitchen sink, but this kind
of resolving is one of the main things it should be doing. And in fact
you can already do:

  git rev-parse --symbolic-full-name @{-1}

But I stopped short of suggesting we remove the feature entirely. It
would obviously require a deprecation period.

-Peff

88 matches

Mail list logo