Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs

2014-05-24 Thread Jens Lindström
On Fri, May 23, 2014 at 7:09 PM, Junio C Hamano  wrote:
> Jens Lindström  writes:
>> One additional change was required in
>> builtin/remote.c:remove_branches().  It used to pass in the expected
>> SHA-1 of the ref to delete_ref(), which only works if the ref exists.
>> If repack_without_refs() is called first, most refs won't exist,...
>
> Why?  A ref, before you start pruning or removing a remote, could be
> in one of these three states.
>
>  (a) only loose refs/remotes/them/frotz exists
>  (b) both loose and packed refs/remotes/them/nitfol exist
>  (c) only packed refs/remotes/them/xyzzy exists
>
> And then you repack packed-refs file without these three refs.  When
> you do so, you know that you would need to remove frotz and nitfol,
> and also you know you do not want to call delete_ref for xyzzy, no?
>
> In other words, the problem you are describing in remove_branches(),
> that it wants to make sure that the ref-to-be-removed still points
> at the expected object, does not sound like it is doing anything
> inherently wrong---rather, it sounds like you didn't make necessary
> changes to the caller to make sure that you do not call delete_ref()
> on something you know you removed already.
>
> Puzzled

There is one reason why one would want to call delete_ref() even if
the ref itself was already fully deleted by repack_without_refs()
(because it was only packed): delete_ref() also removes the ref log,
if there is one.  We could refactor the deletion into

  1) repack_without_refs() on all refs
  2) delete_ref() on still existing (loose) refs
  3) delete_ref_log() on all refs

to let us only call delete_ref() on existing refs, and then keep the
current value check.
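
Roughly, in code, that refactored removal could look like the sketch
below.  This is only a sketch: delete_ref_log() would be the new
helper that step 3 needs, and the rest uses the existing functions as
in builtin/remote.c (which already includes "cache.h" and "refs.h").

static int remove_branches_refactored(struct string_list *branches)
{
    const char **branch_names;
    int i, result = 0;

    branch_names = xmalloc(branches->nr * sizeof(*branch_names));
    for (i = 0; i < branches->nr; i++)
        branch_names[i] = branches->items[i].string;

    /* 1) rewrite packed-refs once, without any of the refs */
    result |= repack_without_refs(branch_names, branches->nr);
    free(branch_names);

    for (i = 0; i < branches->nr; i++) {
        const char *refname = branches->items[i].string;
        unsigned char *sha1 = branches->items[i].util;

        /* 2) delete only the refs that still exist as loose refs,
         *    keeping the expected-SHA-1 check */
        if (ref_exists(refname) && delete_ref(refname, sha1, 0))
            result |= error(_("Could not remove branch %s"), refname);

        /* 3) remove the ref log for every ref, whether it was loose
         *    or only packed (delete_ref_log() is the hypothetical
         *    new helper) */
        delete_ref_log(refname);
    }
    return result;
}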

Personally I don't feel that we're solving an important problem here.
Making sure not to overwrite a ref that has been updated since we read
its value is of course of great value for 'receive-pack' and 'commit'
and such, but in this case we're removing a remote and all its
remote-tracking branches and configuration.  If a remote-tracking
branch is updated concurrently, the current value check would fail,
and the remote configuration and that one branch would remain.  But if
the update had happened just an instant earlier, just before we
started removing the remote, we would have happily deleted that
remote-tracking branch as well, throwing away whatever information the
update stored in it.

/ Jens


[PATCH v2 3/3] remote prune: optimize "dangling symref" check/warning

2014-05-23 Thread Jens Lindström
When 'git remote prune' was used to delete many refs in a repository
with many refs, a lot of time was spent checking for (now) dangling
symbolic refs pointing to the deleted ref, since warn_dangling_symref()
was called once per deleted ref to check all other refs in the repository.

Avoid this by using the new warn_dangling_symrefs() function, which
makes one pass over all refs and checks for all the deleted refs in
one go, after they have all been deleted.

Signed-off-by: Jens Lindström 
---
 builtin/remote.c |  7 ++-
 refs.c   | 19 ++-
 refs.h   |  1 +
 3 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index d33abe6..9b3e368 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1313,6 +1313,7 @@ static int prune_remote(const char *remote, int dry_run)
 {
int result = 0, i;
struct ref_states states;
+   struct string_list delete_refs_list = STRING_LIST_INIT_NODUP;
const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
@@ -1339,6 +1340,8 @@ static int prune_remote(const char *remote, int dry_run)
for (i = 0; i < states.stale.nr; i++) {
const char *refname = states.stale.items[i].util;
 
+   string_list_insert(&delete_refs_list, refname);
+
if (!dry_run)
result |= delete_ref(refname, NULL, 0);
 
@@ -1348,9 +1351,11 @@ static int prune_remote(const char *remote, int dry_run)
else
printf_ln(_(" * [pruned] %s"),
   abbrev_ref(refname, "refs/remotes/"));
-   warn_dangling_symref(stdout, dangling_msg, refname);
}
 
+   warn_dangling_symrefs(stdout, dangling_msg, &delete_refs_list);
+   string_list_clear(&delete_refs_list, 0);
+
free_remote_ref_states(&states);
return result;
 }
diff --git a/refs.c b/refs.c
index 262c1c2..59fb700 100644
--- a/refs.c
+++ b/refs.c
@@ -1611,6 +1611,7 @@ int peel_ref(const char *refname, unsigned char *sha1)
 struct warn_if_dangling_data {
FILE *fp;
const char *refname;
+   const struct string_list *refnames;
const char *msg_fmt;
 };
 
@@ -1625,8 +1626,12 @@ static int warn_if_dangling_symref(const char *refname, const unsigned char *sha
return 0;
 
resolves_to = resolve_ref_unsafe(refname, junk, 0, NULL);
-   if (!resolves_to || strcmp(resolves_to, d->refname))
+   if (!resolves_to
+   || (d->refname
+   ? strcmp(resolves_to, d->refname)
+   : !string_list_has_string(d->refnames, resolves_to))) {
return 0;
+   }
 
fprintf(d->fp, d->msg_fmt, refname);
fputc('\n', d->fp);
@@ -1639,6 +1644,18 @@ void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname)
 
data.fp = fp;
data.refname = refname;
+   data.refnames = NULL;
+   data.msg_fmt = msg_fmt;
+   for_each_rawref(warn_if_dangling_symref, &data);
+}
+
+void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames)
+{
+   struct warn_if_dangling_data data;
+
+   data.fp = fp;
+   data.refname = NULL;
+   data.refnames = refnames;
data.msg_fmt = msg_fmt;
for_each_rawref(warn_if_dangling_symref, &data);
 }
diff --git a/refs.h b/refs.h
index f287c7a..1440acc 100644
--- a/refs.h
+++ b/refs.h
@@ -89,6 +89,7 @@ static inline const char *has_glob_specials(const char *pattern)
 extern int for_each_rawref(each_ref_fn, void *);
 
 extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
+extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list* refnames);
 
 /*
  * Lock the packed-refs file for writing.  Flags is passed to
-- 
1.9.1


[PATCH v2 2/3] remote: repack packed-refs once when deleting multiple refs

2014-05-23 Thread Jens Lindström
When 'git remote rm' or 'git remote prune' were used in a repository
with many refs, and needed to delete many remote-tracking refs, a lot
of time was spent deleting those refs since for each deleted ref,
repack_without_refs() was called to rewrite packed-refs without just
that deleted ref.

To avoid this, call repack_without_refs() first to repack without all
the refs that will be deleted, before calling delete_ref() to delete
each one completely.  The call to repack_without_ref() in delete_ref()
then becomes a no-op, since packed-refs already won't contain any of
the deleted refs.

Signed-off-by: Jens Lindström 
---
Note: remove_branches() no longer passes the expected SHA-1 to
delete_ref(), so it no longer checks that the remote-tracking
branches it deletes still point at the right object.  This change was
required since all packed refs have already been deleted by the time
we call delete_ref(), which would make delete_ref() fail if it were
given an expected SHA-1 to check.  'remote prune' already behaved
this way.

 builtin/remote.c | 20 ++--
 refs.c   |  2 +-
 refs.h   |  2 ++
 3 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index 84802cd..d33abe6 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -749,15 +749,23 @@ static int mv(int argc, const char **argv)
 
 static int remove_branches(struct string_list *branches)
 {
+   const char **branch_names;
int i, result = 0;
+
+   branch_names = xmalloc(branches->nr * sizeof(*branch_names));
+   for (i = 0; i < branches->nr; i++)
+   branch_names[i] = branches->items[i].string;
+   result |= repack_without_refs(branch_names, branches->nr);
+   free(branch_names);
+
for (i = 0; i < branches->nr; i++) {
struct string_list_item *item = branches->items + i;
const char *refname = item->string;
-   unsigned char *sha1 = item->util;
 
-   if (delete_ref(refname, sha1, 0))
+   if (delete_ref(refname, NULL, 0))
result |= error(_("Could not remove branch %s"), refname);
}
+
return result;
 }
 
@@ -1305,6 +1313,7 @@ static int prune_remote(const char *remote, int dry_run)
 {
int result = 0, i;
struct ref_states states;
+   const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
: _(" %s has become dangling!");
@@ -1318,6 +1327,13 @@ static int prune_remote(const char *remote, int dry_run)
   states.remote->url_nr
   ? states.remote->url[0]
   : _("(no URL)"));
+
+   delete_refs = xmalloc(states.stale.nr * sizeof(*delete_refs));
+   for (i = 0; i < states.stale.nr; i++)
+   delete_refs[i] = states.stale.items[i].util;
+   if (!dry_run)
+   result |= repack_without_refs(delete_refs, states.stale.nr);
+   free(delete_refs);
}
 
for (i = 0; i < states.stale.nr; i++) {
diff --git a/refs.c b/refs.c
index 28d5eca..262c1c2 100644
--- a/refs.c
+++ b/refs.c
@@ -2431,7 +2431,7 @@ static int curate_packed_ref_fn(struct ref_entry *entry, void *cb_data)
return 0;
 }
 
-static int repack_without_refs(const char **refnames, int n)
+int repack_without_refs(const char **refnames, int n)
 {
struct ref_dir *packed;
struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
diff --git a/refs.h b/refs.h
index 87a1a79..f287c7a 100644
--- a/refs.h
+++ b/refs.h
@@ -132,6 +132,8 @@ extern void rollback_packed_refs(void);
  */
 int pack_refs(unsigned int flags);
 
+extern int repack_without_refs(const char **refnames, int n);
+
 extern int ref_exists(const char *);
 
 /*
-- 
1.9.1


[PATCH v2 1/3] remote rm: delete remote configuration as the last step

2014-05-23 Thread Jens Lindström
When removing a remote, delete the remote-tracking branches before
deleting the remote configuration.  This way, if the operation fails or
is aborted while deleting the remote-tracking branches, the command can
be rerun to complete the operation.

Signed-off-by: Jens Lindström 
---
 builtin/remote.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index b3ab4cf..84802cd 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -789,10 +789,6 @@ static int rm(int argc, const char **argv)
known_remotes.to_delete = remote;
for_each_remote(add_known_remote, &known_remotes);
 
-   strbuf_addf(&buf, "remote.%s", remote->name);
-   if (git_config_rename_section(buf.buf, NULL) < 1)
-   return error(_("Could not remove config section '%s'"), buf.buf);
-
read_branches();
for (i = 0; i < branch_list.nr; i++) {
struct string_list_item *item = branch_list.items + i;
@@ -837,6 +833,12 @@ static int rm(int argc, const char **argv)
}
string_list_clear(&skipped, 0);
 
+   if (!result) {
+   strbuf_addf(&buf, "remote.%s", remote->name);
+   if (git_config_rename_section(buf.buf, NULL) < 1)
+   return error(_("Could not remove config section '%s'"), buf.buf);
+   }
+
return result;
 }
 
-- 
1.9.1


[PATCH v2 0/3] remote: optimize rm/prune ref deletion

2014-05-23 Thread Jens Lindström
Changes since previous version:

 * Additionally change the order in which 'remote rm' does things so
   that it removes the remote configuration as the last step, and only
   if the other steps succeeded.

 * Change the packed-refs repacking patch to repack before deleting refs
   instead of after.  This makes the patch simpler, since delete_ref()
   no longer needs to be told not to repack.

Jens Lindstrom (3):
  remote rm: delete remote configuration as the last step
  remote: repack packed-refs once when deleting multiple refs
  remote prune: optimize "dangling symref" check/warning

 builtin/remote.c | 37 ++---
 refs.c   | 21 +++--
 refs.h   |  3 +++
 3 files changed, 52 insertions(+), 9 deletions(-)

--
1.9.1


Re: [PATCH 1/2] remote: defer repacking packed-refs when deleting refs

2014-05-23 Thread Jens Lindström
On Tue, May 20, 2014 at 10:29 PM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>> A bit safer way to organize might be to first create a list of the
>> refs to be removed in-core, update packed-refs without these refs to
>> be removed, and then finally remove the loose ones, but I haven't
>> thought things through.
>
> Perhaps a removal of remote can go in this order to be resistant
> against an abort-in-the-middle.
>
>  * update packed-refs without the refs that came from the remote.
>
>- when interrupted before the new temporary file is renamed to
>  the final, it would be a no-op.
>
>- when interrupted after the rename, only some refs that came
>  from the remote may disappear.
>
>  * remove the loose refs that came from the remote.
>
>  * finally, remove the configuration related to the remote.
>
> This order would let you interrupt "remote rm" without leaving the
> repository in a broken state.  Before the final state, it may appear
> that you have some but not all remote-tracking refs from the remote
> in your repository, but you would not have any ref that point at an
> obsolete object.  Running "remote rm" again, once it finishes, will
> give you the desired result.

Removing the remote configuration (I assume you mean the
"remote." section from .git/config) last in 'remote rm' would be
a bit better, I think.  Especially with the currently very slow
removal of branches, an impatient user would likely abort the command
if there were many remote-tracking branches; rerunning would then
fail since the remote configuration was already gone, and there would
be no obvious way left to get rid of the remaining remote-tracking
branches.

Doing the repacking first and then running through and deleting loose
refs and ref logs leads to a smaller and nicer patch as well; there is
no need to tell delete_ref() not to repack then, since
repack_without_refs() will just find that the ref is no longer in
packed-refs and do nothing.

One additional change was required in
builtin/remote.c:remove_branches().  It used to pass in the expected
SHA-1 of the ref to delete_ref(), which only works if the ref exists.
If repack_without_refs() is called first, most refs won't exist, and
delete_ref() would fail.  Branch removal from 'remote prune' didn't
have this problem, since it called delete_ref() with a NULL SHA-1.


> A longer-term goal might be to have ref-transaction infrastructure
> clever enough to coalesce the "repack-without-these-refs" requests
> into one automatically without forcing the callers you are fixing
> care about these things, though.  And such a transaction semantics
> may have to also cover the updating of configuration files.

Yes, some kind of more advanced transaction mechanism would be great,
and would likely solve this type of performance issue by design.

/ Jens


[PATCH 2/2] remote prune: optimize "dangling symref" check/warning

2014-05-20 Thread Jens Lindström
When 'git remote prune' was used to delete many refs in a repository
with many refs, a lot of time was spent checking for (now) dangling
symbolic refs pointing to the deleted ref, since warn_dangling_symref()
was called once per deleted ref to check all other refs in the repository.

Avoid this by using the new warn_dangling_symrefs() function, which
makes one pass over all refs and checks for all the deleted refs in
one go, after they have all been deleted.

Signed-off-by: Jens Lindström 
---
 builtin/remote.c |  6 +-
 refs.c   | 19 ++-
 refs.h   |  1 +
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index ce60a30..5e4a8dd 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -1306,6 +1306,7 @@ static int prune_remote(const char *remote, int dry_run)
 {
int result = 0, i;
struct ref_states states;
+   struct string_list delete_refs_list = STRING_LIST_INIT_NODUP;
const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
@@ -1327,6 +1328,7 @@ static int prune_remote(const char *remote, int dry_run)
const char *refname = states.stale.items[i].util;
 
delete_refs[i] = refname;
+   string_list_insert(&delete_refs_list, refname);
 
if (!dry_run)
result |= delete_ref(refname, NULL, REF_DEFERREPACK);
@@ -1337,9 +1339,11 @@ static int prune_remote(const char *remote, int dry_run)
else
printf_ln(_(" * [pruned] %s"),
   abbrev_ref(refname, "refs/remotes/"));
-   warn_dangling_symref(stdout, dangling_msg, refname);
}
 
+   warn_dangling_symrefs(stdout, dangling_msg, &delete_refs_list);
+   string_list_clear(&delete_refs_list, 0);
+
if (states.stale.nr) {
if (!dry_run)
result |= repack_without_refs(delete_refs, states.stale.nr);
diff --git a/refs.c b/refs.c
index 3b62aca..fdd8b74 100644
--- a/refs.c
+++ b/refs.c
@@ -1611,6 +1611,7 @@ int peel_ref(const char *refname, unsigned char *sha1)
 struct warn_if_dangling_data {
FILE *fp;
const char *refname;
+   const struct string_list *refnames;
const char *msg_fmt;
 };
 
@@ -1625,8 +1626,12 @@ static int warn_if_dangling_symref(const char *refname, const unsigned char *sha
return 0;
 
resolves_to = resolve_ref_unsafe(refname, junk, 0, NULL);
-   if (!resolves_to || strcmp(resolves_to, d->refname))
+   if (!resolves_to
+   || (d->refname
+   ? strcmp(resolves_to, d->refname)
+   : !string_list_has_string(d->refnames, resolves_to))) {
return 0;
+   }
 
fprintf(d->fp, d->msg_fmt, refname);
fputc('\n', d->fp);
@@ -1639,6 +1644,18 @@ void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname)
 
data.fp = fp;
data.refname = refname;
+   data.refnames = NULL;
+   data.msg_fmt = msg_fmt;
+   for_each_rawref(warn_if_dangling_symref, &data);
+}
+
+void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list *refnames)
+{
+   struct warn_if_dangling_data data;
+
+   data.fp = fp;
+   data.refname = NULL;
+   data.refnames = refnames;
data.msg_fmt = msg_fmt;
for_each_rawref(warn_if_dangling_symref, &data);
 }
diff --git a/refs.h b/refs.h
index 0db5584..cd4710d 100644
--- a/refs.h
+++ b/refs.h
@@ -89,7 +89,7 @@ static inline const char *has_glob_specials(const char *pattern)
 extern int for_each_rawref(each_ref_fn, void *);
 
 extern void warn_dangling_symref(FILE *fp, const char *msg_fmt, const char *refname);
+extern void warn_dangling_symrefs(FILE *fp, const char *msg_fmt, const struct string_list* refnames);
 
 /*
  * Lock the packed-refs file for writing.  Flags is passed to
--
1.9.1


[PATCH 1/2] remote: defer repacking packed-refs when deleting refs

2014-05-20 Thread Jens Lindström
When 'git remote rm' or 'git remote prune' were used in a repository
with many refs, and needed to delete many refs, a lot of time was spent
deleting those refs since for each deleted ref, repack_without_refs()
was called to rewrite packed-refs without just that deleted ref.

To avoid this, defer the repacking until after all refs have been
deleted (by delete_ref()), and then call repack_without_refs() once to
repack without all the deleted refs.

Signed-off-by: Jens Lindström 
---
This patch changes behavior when the operation is aborted in the
middle, so that loose refs and ref logs might have been deleted, but
not the corresponding entries in packed-refs, since packed-refs is now
only updated at the end.  This is a bit unfortunate, and may
"resurrect" an obsolete packed-refs entry by deleting the loose ref
that had overridden it.

Mitigating factors are that this only affects remote-tracking
branches that we were about to delete anyway, and that, in the case
of 'git remote prune', they apparently did not actually match any ref
in the remote anymore.

Also, it is a lot harder to interrupt the operation in the middle when
it takes a few seconds compared to when it takes many minutes to
complete.  :-)

 builtin/remote.c | 19 ---
 refs.c   | 15 +--
 refs.h   |  3 +++
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/builtin/remote.c b/builtin/remote.c
index b3ab4cf..ce60a30 100644
--- a/builtin/remote.c
+++ b/builtin/remote.c
@@ -749,15 +749,18 @@ static int mv(int argc, const char **argv)
 
 static int remove_branches(struct string_list *branches)
 {
+   const char **branch_names = xmalloc(branches->nr * sizeof(*branch_names));
int i, result = 0;
for (i = 0; i < branches->nr; i++) {
struct string_list_item *item = branches->items + i;
-   const char *refname = item->string;
+   const char *refname = branch_names[i] = item->string;
unsigned char *sha1 = item->util;
 
-   if (delete_ref(refname, sha1, 0))
+   if (delete_ref(refname, sha1, REF_DEFERREPACK))
result |= error(_("Could not remove branch %s"), refname);
}
+   result |= repack_without_refs(branch_names, branches->nr);
+   free(branch_names);
return result;
 }
 
@@ -1303,6 +1306,7 @@ static int prune_remote(const char *remote, int dry_run)
 {
int result = 0, i;
struct ref_states states;
+   const char **delete_refs;
const char *dangling_msg = dry_run
? _(" %s will become dangling!")
: _(" %s has become dangling!");
@@ -1316,13 +1320,16 @@ static int prune_remote(const char *remote, int dry_run)
   states.remote->url_nr
   ? states.remote->url[0]
   : _("(no URL)"));
+   delete_refs = xmalloc(states.stale.nr * sizeof(*delete_refs));
}
 
for (i = 0; i < states.stale.nr; i++) {
const char *refname = states.stale.items[i].util;
 
+   delete_refs[i] = refname;
+
if (!dry_run)
-   result |= delete_ref(refname, NULL, 0);
+   result |= delete_ref(refname, NULL, REF_DEFERREPACK);
 
if (dry_run)
printf_ln(_(" * [would prune] %s"),
@@ -1333,6 +1340,12 @@ static int prune_remote(const char *remote, int dry_run)
warn_dangling_symref(stdout, dangling_msg, refname);
}
 
+   if (states.stale.nr) {
+   if (!dry_run)
+   result |= repack_without_refs(delete_refs, states.stale.nr);
+   free(delete_refs);
+   }
+
free_remote_ref_states(&states);
return result;
 }
diff --git a/refs.c b/refs.c
index 28d5eca..3b62aca 100644
--- a/refs.c
+++ b/refs.c
@@ -2431,7 +2431,7 @@ static int curate_packed_ref_fn(struct ref_entry *entry, void *cb_data)
return 0;
 }
 
-static int repack_without_refs(const char **refnames, int n)
+int repack_without_refs(const char **refnames, int n)
 {
struct ref_dir *packed;
struct string_list refs_to_delete = STRING_LIST_INIT_DUP;
@@ -2507,11 +2507,14 @@ int delete_ref(const char *refname, const unsigned char *sha1, int delopt)
return 1;
ret |= delete_ref_loose(lock, flag);
 
-   /* removing the loose one could have resurrected an earlier
-* packed one.  Also, if it was not loose we need to repack
-* without it.
-*/
-   ret |= repack_without_ref(lock->ref_name);
+   if (!(delopt & REF_DEFERREPACK))
+   {
+   /* removing the loose one could have resurrected an earlier
+* packed one.  Also, if it was not loose we need to repack
+* 

[PATCH 0/2] remote: optimize rm/prune ref deletion

2014-05-20 Thread Jens Lindström
At work, we have some shared repositories with far too many refs in
them, which causes various issues, performance and otherwise.  We plan
to move most of the refs out of them, but for that to help users who
have already fetched all the refs into their local repositories, those
users will want to run 'git remote prune'.

It turns out that 'git remote prune' (and also 'git remote rm') has
its own rather severe performance issues relating to removing the
obsolete refs from the local repository, which I've addressed in the
following patches.  The performance improves from many CPU minutes
(long enough that I could never be bothered to let it run to
completion) to a few seconds when removing ~15000 refs.

Jens Lindström (2):
  remote: defer repacking packed-refs when deleting refs
  remote prune: optimize "dangling symref" check/warning

 builtin/remote.c | 25 +
 refs.c   | 34 +++---
 refs.h   |  4 
 3 files changed, 52 insertions(+), 11 deletions(-)

--
1.9.1


Re: [PATCH] Clear fd after closing to avoid double-close error

2013-10-23 Thread Jens Lindström
On Tue, Oct 22, 2013 at 8:42 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:

>> Life would have been simpler if fd[1] was _always_ closed by
>> send_pack(), like in c20181e (start_command(), if .in/.out > 0, closes
>> file descriptors, not the callers - 2008-02-21).
>
> Yeah, that was also my first reaction when I saw the above three
> lines after reading the discussion that led to the diagnosis.

If send_pack() always closed fd[1], then I believe "git send-pack
--stateless-rpc --helper-status" would die in print_helper_status(),
which is called after send_pack(): fd[1] would be 1 (stdout), which
print_helper_status() will then try to write to after it has been
closed.  (I don't know what either --stateless-rpc or --helper-status
means, other than what's obvious from the code, or whether the
combination of them makes any sense.)

I don't really have any more time to spend on this issue, so if a more
thorough fix is required, I'm afraid someone else will have to work on
it.

/ Jens


[PATCH v2] Clear fd after closing to avoid double-close error

2013-10-22 Thread Jens Lindström
From: Jens Lindstrom 

In send_pack(), clear the fd passed to pack_objects() by setting
it to -1, since pack_objects() closes the fd (via a call to
run_command()).  Likewise, in get_pack(), clear the fd passed to
run_command().

Not doing so risks having git_transport_push(), caller of
send_pack(), closing the fd again, possibly incorrectly closing
some other open file; or similarly with fetch_refs_from_pack(),
indirect caller of get_pack().

Signed-off-by: Jens Lindström 
---
 fetch-pack.c | 4 
 send-pack.c  | 4 
 2 files changed, 8 insertions(+)

diff --git a/fetch-pack.c b/fetch-pack.c
index f5d99c1..29b711a 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -776,6 +776,10 @@ static int get_pack(struct fetch_pack_args *args,
close(cmd.out);
}
 
+   if (!use_sideband)
+   /* Closed by start_command() */
+   xd[0] = -1;
+
ret = finish_command(&cmd);
if (!ret || (args->check_self_contained_and_connected && ret == 1))
args->self_contained_and_connected =
diff --git a/send-pack.c b/send-pack.c
index 7d172ef..edbfd07 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -300,8 +300,12 @@ int send_pack(struct send_pack_args *args,
shutdown(fd[0], SHUT_WR);
if (use_sideband)
finish_async(&demux);
+   fd[1] = -1;
return -1;
}
+   if (!args->stateless_rpc)
+   /* Closed by pack_objects() via start_command() */
+   fd[1] = -1;
}
if (args->stateless_rpc && cmds_sent)
packet_flush(out);
-- 
1.8.1.2



Re: [PATCH] Clear fd after closing to avoid double-close error

2013-10-22 Thread Jens Lindström
On Tue, Oct 22, 2013 at 2:39 PM, Duy Nguyen  wrote:

> Not your itch. But if you have time you may want to fix fetch-pack
> too. It has the same problem. fetch-pack.c:get_pack() with
> use_sideband == 0 passes fd[0] to start_command(), then later its
> caller transport.c:fetch_refs_via_pack() closes the handle again.

I'll update the patch to clear that fd as well.


>> Signed-off-by: Jens Lindström 
>> ---
>>  send-pack.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/send-pack.c b/send-pack.c
>> index 7d172ef..7def2af 100644
>> --- a/send-pack.c
>> +++ b/send-pack.c
>> @@ -302,6 +302,9 @@ int send_pack(struct send_pack_args *args,
>> finish_async(&demux);
>> return -1;
>
> In this code block, there is "close(out);", we may need to set fd[1] = -1 too.

This block closes the fd unconditionally, I think: either via
pack_objects() (if !args->stateless_rpc) or directly (otherwise).  So
I guess it should always clear the fd before returning, to be safe.
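
For what it's worth, one generic way to make that convention hard to
get wrong would be a tiny helper along these lines (not an existing
git function, just a sketch of the "clear after close" idea):

#include <unistd.h>

/* Close *fd exactly once and poison it, so that a later call on the
 * same variable is a harmless no-op instead of closing whatever file
 * happens to own that descriptor number by then. */
static void close_and_clear(int *fd)
{
    if (*fd >= 0) {
        close(*fd);
        *fd = -1;
    }
}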


>> }
>> +   if (!args->stateless_rpc)
>> +   /* Closed by pack_objects() via start_command() */
>> +   fd[1] = -1;
>> }
>> if (args->stateless_rpc && cmds_sent)
>> packet_flush(out);
>
> I was puzzled by this packet_flush(out) for a while because I thought
> "out" was already closed. Turns out when stateless_rpc is true, a new
> pipe is created in pack_objects() and "out" is not closed. So
> everything is still good (and messy).
>
> Life would have been simpler if fd[1] was _always_ closed by
> send_pack(), like in c20181e (start_command(), if .in/.out > 0, closes
> file descriptors, not the callers - 2008-02-21).

It did strike me as a bit unclear who exactly "owned" these file
descriptors. But I'm of course wholly unfamiliar with this code.

/ Jens


[PATCH] Clear fd after closing to avoid double-close error

2013-10-22 Thread Jens Lindström
From: Jens Lindstrom 

In send_pack(), clear the fd passed to pack_objects() by setting
it to -1, since pack_objects() closes the fd (via a call to
run_command()).

Not doing so risks having git_transport_push(), caller of
send_pack(), closing the fd again, possibly incorrectly closing
some other open file.

Signed-off-by: Jens Lindström 
---
 send-pack.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/send-pack.c b/send-pack.c
index 7d172ef..7def2af 100644
--- a/send-pack.c
+++ b/send-pack.c
@@ -302,6 +302,9 @@ int send_pack(struct send_pack_args *args,
finish_async(&demux);
return -1;
}
+   if (!args->stateless_rpc)
+   /* Closed by pack_objects() via start_command() */
+   fd[1] = -1;
}
if (args->stateless_rpc && cmds_sent)
packet_flush(out);
-- 
1.8.1.2



Bug: git-push crash due to double-close of file descriptor

2013-10-22 Thread Jens Lindström
In a repository, I have a repeatable crash when pushing a ref to a
remote.  The cause seems very simple; what is less clear to me is why
this doesn't happen more often.

The cause, as I understand it:

git_transport_push() calls send_pack(), which calls pack_objects(),
which calls start_command(), which closes the output file descriptor
(known in start_command() as "cmd->out" and in git_transport_push() as
"data->fd[1]").

git_transport_push(), immediately following the call to send_pack(),
also closes the file descriptor.  Adding error handling to this
close() call shows that it fails with EBADF for a normal push (one
that doesn't crash).

In my crashing scenario, the file descriptor has been reused between
the first close() and the second incorrect close(), for opening a pack
file. So the second close() closes the pack file, leaving a 'struct
packed_git' object with a "dangling" pack_fd member. Later on, mmap()
is called to map the contents of the pack file, but at this point the
file descriptor has been reused yet again for a zero-length lock file,
and so mmap() succeeds but returns no accessible memory, and we crash
when accessing it.
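
To illustrate the mechanism outside of git, here is a minimal
standalone sketch (nothing git-specific; the "pack file" is just
whatever file happens to be opened next):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/dev/null", O_WRONLY);

    close(fd);                      /* first, legitimate close */

    /* POSIX hands out the lowest free descriptor, so this "pack
     * file" very likely reuses the same number */
    int pack_fd = open("/dev/zero", O_RDONLY);

    if (close(fd) == 0 && fd == pack_fd)
        fprintf(stderr,
                "second close() silently closed the new file\n");

    /* any later read() or mmap() on pack_fd now fails, or operates
     * on some other file opened with that number in the meantime */
    return 0;
}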

It would be trivial to fix this by simply removing the close() call in
git_transport_push(), but I imagine this might cause a file descriptor
leak in other cases instead.

Thoughts on this?

/ Jens


Re: [PATCH] clone: Skip pack-*.keep files when cloning locally

2013-07-03 Thread Jens Lindström
On Wed, Jul 3, 2013 at 7:26 PM, Junio C Hamano  wrote:
> Jens Lindström  writes:
>
>> Not always.  The Linux kernel can at least be configured not to allow
>> it.  It seems this is enabled by default in at least Debian.
>
> You learn a new thing every day, I guess.  I am on Debian, I do not
> think I did any customization in that area, and I can hardlink just
> fine.

To configure it, write "1" (on) or "0" (off) to
/proc/sys/fs/protected_{hard,sym}links.  I can't remember (or imagine)
that I enabled it on any of my systems.  One of my systems is Debian
Squeeze with a 2.6.32 kernel, and it doesn't have those files, so I
guess it might have been added in some more recent kernel version.

>> This restriction had me a bit confused when I was testing variations
>> here; I expected all "access denied" failures to be because of .keep
>> files, but in fact creating hardlinks to other files (.idx and .pack)
>> failed too, even though they were readable.
>
> Is it possible that you are tripping cross-device link?  The reason
> why we have "attempt to hardlink but fall back to copy" is exactly
> because it is fairly common that people try local-cheap clone without
> realizing the source and the destination may be on separate filesystems.

No, I was certainly cloning within a single file system, and I can
confirm that a plain "ln src dest" command fails unless the user can
both read and write the source file.  So if the cloning user has
read-only access to the repository, copying will work and linking
won't (depending on the kernel), in which case it is of course
excellent that git falls back to copying instead of linking.
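
For reference, this is easy to check with a few lines of C; a
standalone sketch, assuming a Linux system with fs.protected_hardlinks
enabled and a source file that is readable but not writable (and not
owned) by the current user:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s <source> <dest>\n", argv[0]);
        return 1;
    }
    if (link(argv[1], argv[2]) < 0) {
        /* with protected_hardlinks on, a source we can read but not
         * write fails here with EPERM, even though copying the same
         * file would succeed */
        fprintf(stderr, "link: %s\n", strerror(errno));
        return 1;
    }
    return 0;
}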


Re: [PATCH] clone: Skip pack-*.keep files when cloning locally

2013-07-03 Thread Jens Lindström
On Mon, Jul 1, 2013 at 6:20 PM, Junio C Hamano  wrote:
> I am not sure if we should care that deeply about them in the first
> place.

Fine by me; I don't really have a strong opinion on the matter.

> Besides, I think you can make a hardlink to a file that you cannot
> read.

Not always.  The Linux kernel can at least be configured not to allow
it.  It seems this is enabled by default in at least Debian.

This restriction had me a bit confused when I was testing variations
here; I expected all "access denied" failures to be because of .keep
files, but in fact creating hardlinks to other files (.idx and .pack)
failed too, even though they were readable.  This caused "git clone
--local" to fail, while "git clone" succeeded after falling back to
copying instead of linking.


Re: [PATCH] clone: Skip pack-*.keep files when cloning locally

2013-07-01 Thread Jens Lindström
On Fri, Jun 28, 2013 at 8:38 PM, Junio C Hamano  wrote:

>> The pack-*.keep files are temporary, and serve no purpose in the
>> clone.
>
> They are not temporary, actually. A user can deliberatey create a
> "keep" marker after packing with a good set of parameters, so that
> the resulting pack will be kept, instead of letting a later repack
> (with faster set of parameters) remove and replace it with less
> optimal results.

Ah, I see.  Was (obviously) not aware of that.  It would perhaps be a
good idea to be able to differentiate between such permanent keep
files and the temporary ones created by built-in commands.

Also, even if some keep files are permanent in the source repository,
is it always a good idea to copy them over to the clone?  This would
only happen for some types of clones, anyway.


On Fri, Jun 28, 2013 at 10:38 PM, Junio C Hamano  wrote:

> That is, something like this, perhaps?

Comments:

With this patch, it still fails with --local, when the link() call
fails.  This seems a bit odd, in particular in the cases where --local
is implied.  IOW, one would not expect that adding --local would
change behavior, but here adding it causes the operation to fail.

Also, since failing to link() once implicitly enables --no-hardlinks,
it would copy the rest of the repository without trying to use link(),
which might make the whole operation much more expensive.

Applying the exception for inaccessible .keep files for link() as well
would seem a better solution to me.