date:20131024

The extra semi-colon is harmless, since we really do want
the while loop to do nothing. But it does trigger a warning
from clang.

Signed-off-by: Jeff King p...@peff.net
---
 date.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/date.c b/date.c
index 29f1540..83b4166 100644
--- a/date.c
+++ b/date.c
@@ -907,7 +907,7 @@ static const char *approxidate_alpha(const char *date, 
struct tm *tm, struct tm
const char *end = date;
int i;
 
-   while (isalpha(*++end));
+   while (isalpha(*++end))
;
 
for (i = 0; i  12; i++) {
-- 
1.8.4.1.898.g8bf8a41.dirty
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] silence gcc array-bounds warning

In shorten_unambiguous_ref, we build and cache a reverse-map of the
rev-parse rules like this:

  static char **scanf_fmts;
  static int nr_rules;
  if (!nr_rules) {
  for (; ref_rev_parse_rules[nr_rules]; nr_rules++)
  ... generate scanf_fmts ...
  }

where ref_rev_parse_rules is terminated with a NULL pointer.
Compiling with gcc -O2 -Wall does not cause any problems, but
compiling with -O3 -Wall generates:

  $ make CFLAGS='-O3 -Wall' refs.o
  refs.c: In function ‘shorten_unambiguous_ref’:
  refs.c:3379:29: warning: array subscript is above array bounds 
[-Warray-bounds]
 for (; ref_rev_parse_rules[nr_rules]; nr_rules++)

Curiously, we can silence this by explicitly nr_rules to 0
in the beginning of the loop, even though the compiler
should be able to tell that we follow this code path only
when nr_rules is already 0.

Signed-off-by: Jeff King p...@peff.net
---
This is a repost of:

  http://article.gmane.org/gmane.comp.version-control.git/235703

which contains a little more cover-letter discussion.

 refs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/refs.c b/refs.c
index 3710748..0c0e963 100644
--- a/refs.c
+++ b/refs.c
@@ -3376,7 +3376,7 @@ char *shorten_unambiguous_ref(const char *refname, int 
strict)
size_t total_len = 0;
 
/* the rule list is NULL terminated, count them first */
-   for (; ref_rev_parse_rules[nr_rules]; nr_rules++)
+   for (nr_rules = 0; ref_rev_parse_rules[nr_rules]; nr_rules++)
/* no +1 because strlen(%s)  strlen(%.*s) */
total_len += strlen(ref_rev_parse_rules[nr_rules]);
 
-- 
1.8.4.1.898.g8bf8a41.dirty
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] for-each-ref: avoid loading objects to print %(objectname)

If you ask for-each-ref to print each ref and its object,
like:

  git for-each-ref --format='%(objectname) %(refname)'

this should involve little more work than looking at the ref
files (and packed-refs) themselves. However, for-each-ref
will actually load each object from disk just to print its
sha1. For most repositories, this isn't a big deal, but it
can be noticeable if you have a large number of refs to
print. Here are best-of-five timings for the command above
on a repo with ~10K refs:

  [before]
  real0m0.112s
  user0m0.092s
  sys 0m0.016s

  [after]
  real0m0.014s
  user0m0.012s
  sys 0m0.000s

This patch checks for %(objectname) and %(objectname:short)
before we actually parse the object (and the rest of the
code is smart enough to avoid parsing if we have filled all
of our placeholders).

Note that we can't simply move the objectname parsing code
into the early loop. If the deref form %(*objectname) is
used, then we do need to parse the object in order to peel
the tag. So instead of moving the code, we factor it out
into a separate function that can be called for both cases.

While we're at it, we add some basic tests for the
dereferenced placeholders, which were not tested at all
before. This helps ensure we didn't regress that case.

Signed-off-by: Jeff King p...@peff.net
---
This is a repost of:

  http://article.gmane.org/gmane.comp.version-control.git/235704

I think the original was just overlooked.

 builtin/for-each-ref.c  | 29 -
 t/t6300-for-each-ref.sh |  4 
 2 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
index 1d4083c..d096051 100644
--- a/builtin/for-each-ref.c
+++ b/builtin/for-each-ref.c
@@ -205,6 +205,22 @@ static void *get_obj(const unsigned char *sha1, struct 
object **obj, unsigned lo
return buf;
 }
 
+static int grab_objectname(const char *name, const unsigned char *sha1,
+   struct atom_value *v)
+{
+   if (!strcmp(name, objectname)) {
+   char *s = xmalloc(41);
+   strcpy(s, sha1_to_hex(sha1));
+   v-s = s;
+   return 1;
+   }
+   if (!strcmp(name, objectname:short)) {
+   v-s = xstrdup(find_unique_abbrev(sha1, DEFAULT_ABBREV));
+   return 1;
+   }
+   return 0;
+}
+
 /* See grab_values */
 static void grab_common_values(struct atom_value *val, int deref, struct 
object *obj, void *buf, unsigned long sz)
 {
@@ -225,15 +241,8 @@ static void grab_common_values(struct atom_value *val, int 
deref, struct object
v-ul = sz;
v-s = s;
}
-   else if (!strcmp(name, objectname)) {
-   char *s = xmalloc(41);
-   strcpy(s, sha1_to_hex(obj-sha1));
-   v-s = s;
-   }
-   else if (!strcmp(name, objectname:short)) {
-   v-s = xstrdup(find_unique_abbrev(obj-sha1,
- DEFAULT_ABBREV));
-   }
+   else if (deref)
+   grab_objectname(name, obj-sha1, v);
}
 }
 
@@ -676,6 +685,8 @@ static void populate_value(struct refinfo *ref)
}
continue;
}
+   else if (!deref  grab_objectname(name, ref-objectname, v))
+   continue;
else
continue;
 
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 752f5cb..2b4b9a9 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -58,6 +58,8 @@ test_atom head parent ''
 test_atom head numparent 0
 test_atom head object ''
 test_atom head type ''
+test_atom head *objectname ''
+test_atom head *objecttype ''
 test_atom head author 'A U Thor aut...@example.com 1151939924 +0200'
 test_atom head authorname 'A U Thor'
 test_atom head authoremail 'aut...@example.com'
@@ -91,6 +93,8 @@ test_atom tag parent ''
 test_atom tag numparent ''
 test_atom tag object '67a36f10722846e891fbada1ba48ed035de75581'
 test_atom tag type 'commit'
+test_atom tag *objectname '67a36f10722846e891fbada1ba48ed035de75581'
+test_atom tag *objecttype 'commit'
 test_atom tag author ''
 test_atom tag authorname ''
 test_atom tag authoremail ''
-- 
1.8.4.1.898.g8bf8a41.dirty
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/6] fix some parse_commit segfaults

This is a repost of the series here:

  http://thread.gmane.org/gmane.comp.version-control.git/235735/focus=235826

(the original was a single patch, followed by a 5-patch series, which
I've combined here). It's mostly a cleanup, since parse_commit will only
fail in corrupted repos, but I did run into it in a real (corrupted)
repo.

  [1/6]: log_tree_diff: die when we fail to parse a commit
  [2/6]: assume parse_commit checks commit-object.parsed
  [3/6]: assume parse_commit checks for NULL commit
  [4/6]: use parse_commit_or_die instead of segfaulting
  [5/6]: use parse_commit_or_die instead of custom message
  [6/6]: checkout: do not die when leaving broken detached HEAD

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/6] assume parse_commit checks for NULL commit

The parse_commit function will check whether it was passed a
NULL commit pointer, and if so, return an error. There is no
need for callers to check this separately.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/branch.c | 2 +-
 builtin/commit.c | 4 ++--
 commit.c | 2 +-
 notes-utils.c| 2 +-
 sha1_name.c  | 2 --
 5 files changed, 5 insertions(+), 7 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index ad0f86d..491090f 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -496,7 +496,7 @@ static void add_verbose_info(struct strbuf *out, struct 
ref_item *item,
const char *sub = _(  invalid ref );
struct commit *commit = item-commit;
 
-   if (commit  !parse_commit(commit)) {
+   if (!parse_commit(commit)) {
pp_commit_easy(CMIT_FMT_ONELINE, commit, subject);
sub = subject.buf;
}
diff --git a/builtin/commit.c b/builtin/commit.c
index 6ab4605..e89c519 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1338,7 +1338,7 @@ static void print_summary(const char *prefix, const 
unsigned char *sha1,
commit = lookup_commit(sha1);
if (!commit)
die(_(couldn't look up newly created commit));
-   if (!commit || parse_commit(commit))
+   if (parse_commit(commit))
die(_(could not parse newly created commit));
 
strbuf_addstr(format, format:%h] %s);
@@ -1525,7 +1525,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
current_head = NULL;
else {
current_head = lookup_commit_or_die(sha1, HEAD);
-   if (!current_head || parse_commit(current_head))
+   if (parse_commit(current_head))
die(_(could not parse HEAD commit));
}
argc = parse_and_validate_options(argc, argv, builtin_commit_options,
diff --git a/commit.c b/commit.c
index 51a9bbc..11509ff 100644
--- a/commit.c
+++ b/commit.c
@@ -79,7 +79,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name)
if (get_sha1_committish(name, sha1))
return NULL;
commit = lookup_commit_reference(sha1);
-   if (!commit || parse_commit(commit))
+   if (parse_commit(commit))
return NULL;
return commit;
 }
diff --git a/notes-utils.c b/notes-utils.c
index 9107c37..7bb3473 100644
--- a/notes-utils.c
+++ b/notes-utils.c
@@ -18,7 +18,7 @@ void create_notes_commit(struct notes_tree *t, struct 
commit_list *parents,
unsigned char parent_sha1[20];
if (!read_ref(t-ref, parent_sha1)) {
struct commit *parent = lookup_commit(parent_sha1);
-   if (!parent || parse_commit(parent))
+   if (parse_commit(parent))
die(Failed to find/parse commit %s, t-ref);
commit_list_insert(parent, parents);
}
diff --git a/sha1_name.c b/sha1_name.c
index 0e5fe7f..1dfc401 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -582,8 +582,6 @@ static int get_parent(const char *name, int len,
if (ret)
return ret;
commit = lookup_commit_reference(sha1);
-   if (!commit)
-   return -1;
if (parse_commit(commit))
return -1;
if (!idx) {
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/6] log_tree_diff: die when we fail to parse a commit

We currently call parse_commit and then assume we can
dereference the resulting tree struct field. If parsing
failed, however, that field is NULL and we end up
segfaulting.

Instead of a segfault, let's print an error message and die
a little more gracefully.

Note that this should never happen in practice, but may
happen in a corrupt repository.

Signed-off-by: Jeff King p...@peff.net
---
 commit.c   | 7 +++
 commit.h   | 1 +
 log-tree.c | 6 +++---
 3 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/commit.c b/commit.c
index de16a3c..51a9bbc 100644
--- a/commit.c
+++ b/commit.c
@@ -341,6 +341,13 @@ int parse_commit(struct commit *item)
return ret;
 }
 
+void parse_commit_or_die(struct commit *item)
+{
+   if (parse_commit(item))
+   die(unable to parse commit %s,
+   item ? sha1_to_hex(item-object.sha1) : (null));
+}
+
 int find_commit_subject(const char *commit_buffer, const char **subject)
 {
const char *eol;
diff --git a/commit.h b/commit.h
index bd841f4..934af88 100644
--- a/commit.h
+++ b/commit.h
@@ -49,6 +49,7 @@ struct commit *lookup_commit_or_die(const unsigned char 
*sha1, const char *ref_n
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size);
 int parse_commit(struct commit *item);
+void parse_commit_or_die(struct commit *item);
 
 /* Find beginning and length of commit subject. */
 int find_commit_subject(const char *commit_buffer, const char **subject);
diff --git a/log-tree.c b/log-tree.c
index 8534d91..e958d07 100644
--- a/log-tree.c
+++ b/log-tree.c
@@ -734,7 +734,7 @@ static int log_tree_diff(struct rev_info *opt, struct 
commit *commit, struct log
if (!opt-diff  !DIFF_OPT_TST(opt-diffopt, EXIT_WITH_STATUS))
return 0;
 
-   parse_commit(commit);
+   parse_commit_or_die(commit);
sha1 = commit-tree-object.sha1;
 
/* Root commit? */
@@ -759,7 +759,7 @@ static int log_tree_diff(struct rev_info *opt, struct 
commit *commit, struct log
 * parent, showing summary diff of the others
 * we merged _in_.
 */
-   parse_commit(parents-item);
+   parse_commit_or_die(parents-item);
diff_tree_sha1(parents-item-tree-object.sha1,
   sha1, , opt-diffopt);
log_tree_diff_flush(opt);
@@ -774,7 +774,7 @@ static int log_tree_diff(struct rev_info *opt, struct 
commit *commit, struct log
for (;;) {
struct commit *parent = parents-item;
 
-   parse_commit(parent);
+   parse_commit_or_die(parent);
diff_tree_sha1(parent-tree-object.sha1,
   sha1, , opt-diffopt);
log_tree_diff_flush(opt);
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/6] assume parse_commit checks commit-object.parsed

The parse_commit function will check the parsed flag of
the object and do nothing if it is set. There is no need
for callers to check the flag themselves, and doing so only
clutters the code.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/blame.c   | 3 +--
 builtin/name-rev.c| 3 +--
 builtin/show-branch.c | 3 +--
 fetch-pack.c  | 8 +++-
 4 files changed, 6 insertions(+), 11 deletions(-)

diff --git a/builtin/blame.c b/builtin/blame.c
index 6da7233..5f1cb09 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -1549,8 +1549,7 @@ static void assign_blame(struct scoreboard *sb, int opt)
 */
origin_incref(suspect);
commit = suspect-commit;
-   if (!commit-object.parsed)
-   parse_commit(commit);
+   parse_commit(commit);
if (reverse ||
(!(commit-object.flags  UNINTERESTING) 
 !(revs-max_age != -1  commit-date  revs-max_age)))
diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 20fcf8c..23daaa7 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -27,8 +27,7 @@ static void name_rev(struct commit *commit,
struct commit_list *parents;
int parent_number = 1;
 
-   if (!commit-object.parsed)
-   parse_commit(commit);
+   parse_commit(commit);
 
if (commit-date  cutoff)
return;
diff --git a/builtin/show-branch.c b/builtin/show-branch.c
index 001f29c..46902c3 100644
--- a/builtin/show-branch.c
+++ b/builtin/show-branch.c
@@ -227,8 +227,7 @@ static void join_revs(struct commit_list **list_p,
parents = parents-next;
if ((this_flag  flags) == flags)
continue;
-   if (!p-object.parsed)
-   parse_commit(p);
+   parse_commit(p);
if (mark_seen(p, seen_p)  !still_interesting)
extra--;
p-object.flags |= flags;
diff --git a/fetch-pack.c b/fetch-pack.c
index a0e0350..a141eb4 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -47,9 +47,8 @@ static void rev_list_push(struct commit *commit, int mark)
if (!(commit-object.flags  mark)) {
commit-object.flags |= mark;
 
-   if (!(commit-object.parsed))
-   if (parse_commit(commit))
-   return;
+   if (parse_commit(commit))
+   return;
 
prio_queue_put(rev_list, commit);
 
@@ -128,8 +127,7 @@ static const unsigned char *get_rev(void)
return NULL;
 
commit = prio_queue_get(rev_list);
-   if (!commit-object.parsed)
-   parse_commit(commit);
+   parse_commit(commit);
parents = commit-parents;
 
commit-object.flags |= POPPED;
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/6] use parse_commit_or_die instead of custom message

Many calls to parse_commit detect errors and die. In some
cases, the custom error messages are more useful than what
parse_commit_or_die could produce, because they give some
context, like which ref the commit came from. Some, however,
just say invalid commit. Let's convert the latter to use
parse_commit_or_die; its message is slightly more informative,
and it makes the error more consistent throughout git.

Signed-off-by: Jeff King p...@peff.net
---
 shallow.c | 3 +--
 upload-pack.c | 3 +--
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/shallow.c b/shallow.c
index cdf37d6..961cf6f 100644
--- a/shallow.c
+++ b/shallow.c
@@ -90,8 +90,7 @@ struct commit_list *get_shallow_commits(struct object_array 
*heads, int depth,
cur_depth = *(int *)commit-util;
}
}
-   if (parse_commit(commit))
-   die(invalid commit);
+   parse_commit_or_die(commit);
cur_depth++;
if (cur_depth = depth) {
commit_list_insert(commit, result);
diff --git a/upload-pack.c b/upload-pack.c
index a6c54e0..abe6636 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -649,8 +649,7 @@ static void receive_needs(void)
/* make sure the real parents are parsed */
unregister_shallow(object-sha1);
object-parsed = 0;
-   if (parse_commit((struct commit *)object))
-   die(invalid commit);
+   parse_commit_or_die((struct commit *)object);
parents = ((struct commit *)object)-parents;
while (parents) {
add_object_array(parents-item-object,
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 4/6] use parse_commit_or_die instead of segfaulting

Some unchecked calls to parse_commit should obviously die on
error, because their next step is to start looking at the
parsed fields, which will cause a segfault. These are
obvious candidates for parse_commit_or_die, which will be a
strict improvement in behavior.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/checkout.c| 4 ++--
 builtin/fast-export.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 0f57397..34a2e43 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -789,7 +789,7 @@ static int switch_branches(const struct checkout_opts *opts,
new-commit = old.commit;
if (!new-commit)
die(_(You are on a branch yet to be born));
-   parse_commit(new-commit);
+   parse_commit_or_die(new-commit);
}
 
ret = merge_working_tree(opts, old, new, writeout_error);
@@ -952,7 +952,7 @@ static int parse_branchname_arg(int argc, const char **argv,
/* not a commit */
*source_tree = parse_tree_indirect(rev);
} else {
-   parse_commit(new-commit);
+   parse_commit_or_die(new-commit);
*source_tree = new-commit-tree;
}
 
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 78250ea..ea63052 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -287,7 +287,7 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev)
 
rev-diffopt.output_format = DIFF_FORMAT_CALLBACK;
 
-   parse_commit(commit);
+   parse_commit_or_die(commit);
author = strstr(commit-buffer, \nauthor );
if (!author)
die (Could not find author in commit %s,
@@ -308,7 +308,7 @@ static void handle_commit(struct commit *commit, struct 
rev_info *rev)
if (commit-parents 
get_object_mark(commit-parents-item-object) != 0 
!full_tree) {
-   parse_commit(commit-parents-item);
+   parse_commit_or_die(commit-parents-item);
diff_tree_sha1(commit-parents-item-tree-object.sha1,
   commit-tree-object.sha1, , rev-diffopt);
}
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/6] checkout: do not die when leaving broken detached HEAD

If we move away from a detached HEAD that has broken or
corrupted commits, we might die in two places:

  1. Printing the old HEAD was... message.

  2. Printing the list of orphaned commits.

In both cases, we ignore the return value of parse_commit
and feed the resulting commit to the pretty-print machinery,
which will die() upon failing to read the commit object
itself.

Since both cases are ancillary to the real operation being
performed, let's be more robust and keep going. This lets
users more easily checkout away from broken history.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/checkout.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 34a2e43..1df55c0 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -380,8 +380,8 @@ static void show_local_changes(struct object *head,
 static void describe_detached_head(const char *msg, struct commit *commit)
 {
struct strbuf sb = STRBUF_INIT;
-   parse_commit(commit);
-   pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
+   if (!parse_commit(commit))
+   pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
fprintf(stderr, %s %s... %s\n, msg,
find_unique_abbrev(commit-object.sha1, DEFAULT_ABBREV), 
sb.buf);
strbuf_release(sb);
@@ -677,12 +677,12 @@ static int add_pending_uninteresting_ref(const char 
*refname,
 
 static void describe_one_orphan(struct strbuf *sb, struct commit *commit)
 {
-   parse_commit(commit);
strbuf_addstr(sb,   );
strbuf_addstr(sb,
find_unique_abbrev(commit-object.sha1, DEFAULT_ABBREV));
strbuf_addch(sb, ' ');
-   pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
+   if (!parse_commit(commit))
+   pp_commit_easy(CMIT_FMT_ONELINE, commit, sb);
strbuf_addch(sb, '\n');
 }
 
-- 
1.8.4.1.898.g8bf8a41.dirty
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: RFE: support change-id generation natively

2013-10-24 Thread james . moger

 That said, I don't think that --change-id option that the user must not
 forget to use is any better than a hook that the user must not forget to
 install.

Having a --change-id option, to my mind, simplifies use of the patch
workflow as it does not require downloading, copying and setting
executable a hook script per-repository or globally.  I agree that
forgetting to add it on the command-line is a potential problem.  This
could be improved by honoring the gerrit.createchangeid value (or
whatever setting name is appropriate).  Of course that still requires
configuring the repo after clone, but it's clean and straight-forward
since it is all plain git config.

-J
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Bug report: using git commit-tree with -F - adds trailing newlines

2013-10-24 Thread Mickey Killianey

I believe I've stumbled across an inconsistency in how git commit-tree
reads messages from stdin.  I would expect the behavior of these two
commands to be identical, and that neither would actually change any
commits:

git filter-branch --commit-filter 'git commit-tree $@'
git filter-branch --commit-filter 'git commit-tree -F - $@'

(If that assumption is untrue, please feel free to correct me and
disregard the rest of this message.)

What I'm observing is that if I add a commit with a message that
doesn't end in a trailing newline, like this:

git merge `echo -n No trailing newline | git commit-tree
HEAD^{tree} -p HEAD`

Then I get different behavior in different versions of Git on
different platforms.  On git 1.8.4 on Ubuntu 12.04 (build from
https://launchpad.net/~git-core/+archive/ppa) under GNU bash, version
4.2.25(1)-release (x86_64-pc-linux-gnu), using -F - seems to add a
linefeed to the body:

$ git log -n 1 --format=:%H:%B:
: 4a11052c110c3daea46c89ae1118b1a2c59cc01b:No trailing newline:
$ git filter-branch --commit-filter 'git commit-tree $@'
Rewrite 4a11052c110c3daea46c89ae1118b1a2c59cc01b (2/2)
WARNING: Ref 'refs/heads/master' is unchanged
$ git filter-branch --commit-filter 'git commit-tree -F - $@'
Rewrite 4a11052c110c3daea46c89ae1118b1a2c59cc01b (2/2)
Ref 'refs/heads/master' was rewritten
$ git log -n 1 --format=:%H:%B:
:5ecba0ff0ca1290f2a5e3a599622e2a59e311f67:No trailing newline
:

On git 1.7.12.4 (Apple Git-37) on Mac OS X 10.8.5 using GNU bash,
version 3.2.48(1)-release (x86_64-apple-darwin12), both commands (with
and without the -F - add newlines to the body of the commits.

Thanks for your attention!

Mick
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: RFE: support change-id generation natively

2013-10-24 Thread Thomas Koch

On Thursday, October 24, 2013 02:11:05 PM james.mo...@gitblit.com wrote:
  That said, I don't think that --change-id option that the user must not
  forget to use is any better than a hook that the user must not forget to
  install.

I'm a bit paranoid. (e.g. I do all my development in a virtual machine and my 
host machine only runs binaries from debian stable.)

A command line option is a big improvement over having to download a random 
script from some potentially untrusted place and executing it probably even 
with the same user that also has access to my GPG key that signs my code and 
my SSH key that has access to the repository.

Regards, Thomas Koch
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: RFE: support change-id generation natively

On Thu, Oct 24, 2013 at 7:11 PM,  james.mo...@gitblit.com wrote:
 That said, I don't think that --change-id option that the user must not
 forget to use is any better than a hook that the user must not forget to
 install.

 Having a --change-id option, to my mind, simplifies use of the patch
 workflow as it does not require downloading, copying and setting
 executable a hook script per-repository or globally.

This could be solved by shipping the hook with git. So all you need to
do is git init --template=gerrit. --template requires full path, but
I think we can change it to accept a name and look for $datadir/$name.
A more interesting case is removing the hook. I admit I haven't found
any neat way to do it without messing in $GIT_DIR/hooks manually.

 I agree that
 forgetting to add it on the command-line is a potential problem.  This
 could be improved by honoring the gerrit.createchangeid value (or
 whatever setting name is appropriate).  Of course that still requires
 configuring the repo after clone, but it's clean and straight-forward
 since it is all plain git config.

 -J



-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Finding the repository

On Thu, Oct 24, 2013 at 2:49 PM, Perry Hutchison per...@pluto.rain.com wrote:
 Duy Nguyen pclo...@gmail.com wrote:

 ... it's not easy to determine ambiguity here, especially when the
 repo finding code does not know anything about bar/barz.c (is it
 a pathname or an argument to an option?).

 IOW, the code that finds the repository is called too early?

Yes. First it tries to find and setup the repository. Read per-config
repo file and add aliases to the database if any. Resolve alias. Run
the actual command. Then the command parses the command line.

 One way to solve that to that would be to proceed, even if the
 repository has to be left as unknown until it actually needs to
 be consulted -- by which time the subcommand would presumably have
 parsed all of the options and pathnames and so would know which is
 which.  Then, use the pathname(s) to identify the repository(ies).
 Yes, if there's more than one repository involved, the subcommand
 has to do a for each repository loop.  The code to do all this
 could go in a module shared among the subcommands.

That breaks the normal process I described above (i.e. lots of code
changes). People would scream at the code changes for a minor
convenience. We could refactor the parsing code so that the startup
code could do a dry-run command line parsing pass first. May work
for the majority of commands that do not need special parsing
callbacks.

Even then we still need more changes because git does not work well if
cwd is outside the worktree. Paths like 'bar/baz.c need to cut bar/
out. Then if baz.c is to printed out again, bar/ needs to be
prepended back..

 There are more cases to consider, like what if you do
 git rm bar/baz.c and rab/zab.c where bar and rab are
 two different repositories..

 So we remove baz.c from bar and zab.c from rab.  It's not clear
 to me that there's anything wrong with that -- it's exactly what
 I would expect to have happen (and also what the hackish script
 I posted will do).

For git rm, maybe. Many other commands need repo information and it
would not make sense to have paths from two different repositories.
For example, commit, rev-list or log. And it may break more things as
most of current commands are designed to work on one repo from a to z.
Some may support multi-repo operations if they're part of submodule
support.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 01/19] sha1write: make buffer const-correct

We are passed a void * and write it out without ever
touching it; let's indicate that by using const.

Signed-off-by: Jeff King p...@peff.net
---
 csum-file.c | 6 +++---
 csum-file.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/csum-file.c b/csum-file.c
index 53f5375..465971c 100644
--- a/csum-file.c
+++ b/csum-file.c
@@ -11,7 +11,7 @@
 #include progress.h
 #include csum-file.h
 
-static void flush(struct sha1file *f, void *buf, unsigned int count)
+static void flush(struct sha1file *f, const void *buf, unsigned int count)
 {
if (0 = f-check_fd  count)  {
unsigned char check_buffer[8192];
@@ -86,13 +86,13 @@ int sha1close(struct sha1file *f, unsigned char *result, 
unsigned int flags)
return fd;
 }
 
-int sha1write(struct sha1file *f, void *buf, unsigned int count)
+int sha1write(struct sha1file *f, const void *buf, unsigned int count)
 {
while (count) {
unsigned offset = f-offset;
unsigned left = sizeof(f-buffer) - offset;
unsigned nr = count  left ? left : count;
-   void *data;
+   const void *data;
 
if (f-do_crc)
f-crc32 = crc32(f-crc32, buf, nr);
diff --git a/csum-file.h b/csum-file.h
index 3b540bd..9dedb03 100644
--- a/csum-file.h
+++ b/csum-file.h
@@ -34,7 +34,7 @@ extern struct sha1file *sha1fd(int fd, const char *name);
 extern struct sha1file *sha1fd_check(const char *name);
 extern struct sha1file *sha1fd_throughput(int fd, const char *name, struct 
progress *tp);
 extern int sha1close(struct sha1file *, unsigned char *, unsigned int);
-extern int sha1write(struct sha1file *, void *, unsigned int);
+extern int sha1write(struct sha1file *, const void *, unsigned int);
 extern void sha1flush(struct sha1file *f);
 extern void crc32_begin(struct sha1file *);
 extern uint32_t crc32_end(struct sha1file *);
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/19] pack bitmaps

This series implements JGit-style pack bitmaps to speed up fetching and
cloning. For example, here is a simulation of the server side of a clone
of a fully-packed kernel repo (measuring actual clones is harder,
because the client does a lot of work on resolving deltas):

   [before]
   $ time git pack-objects --all --stdout /dev/null /dev/null
   Counting objects: 3237103, done.
   Compressing objects: 100% (508752/508752), done.
   Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

   real0m44.111s
   user0m42.396s
   sys 0m3.544s


   [after]
   $ time git pack-objects --all --stdout /dev/null /dev/null
   Reusing existing pack: 3237103, done.
   Total 3237103 (delta 0), reused 0 (delta 0)

   real0m1.636s
   user0m1.460s
   sys 0m0.172s


This helps eliminate load on the server side, but it also means that we
actually start transferring objects way faster, which means the clones
finish faster. If you look at current clones of torvalds/linux from
kernel.org, it's almost two minutes before they actually start sending
you any data, during which time the client is twiddling its thumbs.

The bitmaps implemented here are compatible with those produced by JGit.
We can read JGit-produced bitmaps, and JGit can read ours. The one
exception is the final patch, which adds an optional name-hash cache.
It's added in such a way that existing implementations can ignore it,
and is marked with a flag in the header. However, JGit is very picky
about the flags field; it will reject any bitmap index with a flag it
does not know about.

The patches are:

  [01/19]: sha1write: make buffer const-correct
  [02/19]: revindex: Export new APIs
  [03/19]: pack-objects: Refactor the packing list
  [04/19]: pack-objects: factor out name_hash
  [05/19]: revision: allow setting custom limiter function
  [06/19]: sha1_file: export `git_open_noatime`
  [07/19]: compat: add endianness helpers
  [08/19]: ewah: compressed bitmap implementation

Refactoring and support for the rest of the series.

  [09/19]: documentation: add documentation for the bitmap format
  [10/19]: pack-bitmap: add support for bitmap indexes
  [11/19]: pack-objects: use bitmaps when packing objects
  [12/19]: rev-list: add bitmap mode to speed up object lists

Bitmap reading (you can test it against JGit at this point by
running jgit debug-gc, and then cloning or running rev-list).

  [13/19]: pack-objects: implement bitmap writing
  [14/19]: repack: stop using magic number for ARRAY_SIZE(exts)
  [15/19]: repack: turn exts array into array-of-struct
  [16/19]: repack: handle optional files created by pack-objects
  [17/19]: repack: consider bitmaps when performing repacks

Bitmap writing (you can test against JGit by running
git repack -adb, and then running jgit daemon to
serve the result).

  [18/19]: t: add basic bitmap functionality tests

With reading and writing, we can do our own tests.

  [19/19]: pack-bitmap: implement optional name_hash cache

And this is our extension.

A similar series has been running on github.com for the past couple of
months, though not every repository has had bitmaps turned on (but some
very busy ones have).  We've hopefully squeezed out all of the bugs and
corner cases over that time. However, I did rebase this on a more modern
version of master; among other conflicts, this required porting the
git-repack changes from shell to C. So it's entirely possible I've
introduced new bugs. :)

The idea and original implementation for bitmaps comes from Shawn and
Colby, of course. The hard work in this series was done by Vicent Marti,
and he is credited as the author in most of the patches. I've added some
window dressing and helped a little with debugging and review. But along
with Vicent, I should be able to help with answering questions for
review, and as time goes on, I'm familiar enough with the code to deal
with bugs and reviewing future changes.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 02/19] revindex: Export new APIs

From: Vicent Marti tan...@gmail.com

Allow users to efficiently lookup consecutive entries that are expected
to be found on the same revindex by exporting `find_revindex_position`:
this function takes a pointer to revindex itself, instead of looking up
the proper revindex for a given packfile on each call.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 pack-revindex.c | 38 +-
 pack-revindex.h |  8 
 2 files changed, 33 insertions(+), 13 deletions(-)

diff --git a/pack-revindex.c b/pack-revindex.c
index b4d2b35..0bb13b1 100644
--- a/pack-revindex.c
+++ b/pack-revindex.c
@@ -16,11 +16,6 @@
  * get the object sha1 from the main index.
  */
 
-struct pack_revindex {
-   struct packed_git *p;
-   struct revindex_entry *revindex;
-};
-
 static struct pack_revindex *pack_revindex;
 static int pack_revindex_hashsz;
 
@@ -201,15 +196,14 @@ static void create_pack_revindex(struct pack_revindex 
*rix)
sort_revindex(rix-revindex, num_ent, p-pack_size);
 }
 
-struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+struct pack_revindex *revindex_for_pack(struct packed_git *p)
 {
int num;
-   unsigned lo, hi;
struct pack_revindex *rix;
-   struct revindex_entry *revindex;
 
if (!pack_revindex_hashsz)
init_pack_revindex();
+
num = pack_revindex_ix(p);
if (num  0)
die(internal error: pack revindex fubar);
@@ -217,21 +211,39 @@ struct revindex_entry *find_pack_revindex(struct 
packed_git *p, off_t ofs)
rix = pack_revindex[num];
if (!rix-revindex)
create_pack_revindex(rix);
-   revindex = rix-revindex;
 
-   lo = 0;
-   hi = p-num_objects + 1;
+   return rix;
+}
+
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs)
+{
+   int lo = 0;
+   int hi = pridx-p-num_objects + 1;
+   struct revindex_entry *revindex = pridx-revindex;
+
do {
unsigned mi = lo + (hi - lo) / 2;
if (revindex[mi].offset == ofs) {
-   return revindex + mi;
+   return mi;
} else if (ofs  revindex[mi].offset)
hi = mi;
else
lo = mi + 1;
} while (lo  hi);
+
error(bad offset for revindex);
-   return NULL;
+   return -1;
+}
+
+struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs)
+{
+   struct pack_revindex *pridx = revindex_for_pack(p);
+   int pos = find_revindex_position(pridx, ofs);
+
+   if (pos  0)
+   return NULL;
+
+   return pridx-revindex + pos;
 }
 
 void discard_revindex(void)
diff --git a/pack-revindex.h b/pack-revindex.h
index 8d5027a..866ca9c 100644
--- a/pack-revindex.h
+++ b/pack-revindex.h
@@ -6,6 +6,14 @@ struct revindex_entry {
unsigned int nr;
 };
 
+struct pack_revindex {
+   struct packed_git *p;
+   struct revindex_entry *revindex;
+};
+
+struct pack_revindex *revindex_for_pack(struct packed_git *p);
+int find_revindex_position(struct pack_revindex *pridx, off_t ofs);
+
 struct revindex_entry *find_pack_revindex(struct packed_git *p, off_t ofs);
 void discard_revindex(void);
 
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/19] pack-objects: Refactor the packing list

From: Vicent Marti tan...@gmail.com

The hash table that stores the packing list for a given `pack-objects`
run was tightly coupled to the pack-objects code.

In this commit, we refactor the hash table and the underlying storage
array into a `packing_data` struct. The functionality for accessing and
adding entries to the packing list is hence accessible from other parts
of Git besides the `pack-objects` builtin.

This refactoring is a requirement for further patches in this series
that will require accessing the commit packing list from outside of
`pack-objects`.

The hash table implementation has been minimally altered: we now
use table sizes which are always a power of two, to ensure a uniform
index distribution in the array.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile   |   2 +
 builtin/pack-objects.c | 175 +++--
 pack-objects.c | 111 +++
 pack-objects.h |  47 +
 4 files changed, 200 insertions(+), 135 deletions(-)
 create mode 100644 pack-objects.c
 create mode 100644 pack-objects.h

diff --git a/Makefile b/Makefile
index af847f8..48ff0bd 100644
--- a/Makefile
+++ b/Makefile
@@ -694,6 +694,7 @@ LIB_H += notes-merge.h
 LIB_H += notes-utils.h
 LIB_H += notes.h
 LIB_H += object.h
+LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
 LIB_H += parse-options.h
@@ -831,6 +832,7 @@ LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
 LIB_OBJS += pack-check.o
+LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
 LIB_OBJS += pack-write.o
 LIB_OBJS += pager.o
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 36273dd..f3f0cf9 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -14,6 +14,7 @@
 #include diff.h
 #include revision.h
 #include list-objects.h
+#include pack-objects.h
 #include progress.h
 #include refs.h
 #include streaming.h
@@ -25,42 +26,15 @@ static const char *pack_usage[] = {
NULL
 };
 
-struct object_entry {
-   struct pack_idx_entry idx;
-   unsigned long size; /* uncompressed size */
-   struct packed_git *in_pack; /* already in pack */
-   off_t in_pack_offset;
-   struct object_entry *delta; /* delta base object */
-   struct object_entry *delta_child; /* deltified objects who bases me */
-   struct object_entry *delta_sibling; /* other deltified objects who
-* uses the same base as me
-*/
-   void *delta_data;   /* cached delta (uncompressed) */
-   unsigned long delta_size;   /* delta data size (uncompressed) */
-   unsigned long z_delta_size; /* delta data size (compressed) */
-   enum object_type type;
-   enum object_type in_pack_type;  /* could be delta */
-   uint32_t hash;  /* name hint hash */
-   unsigned char in_pack_header_size;
-   unsigned preferred_base:1; /*
-   * we do not pack this, but is available
-   * to be used as the base object to delta
-   * objects against.
-   */
-   unsigned no_try_delta:1;
-   unsigned tagged:1; /* near the very tip of refs */
-   unsigned filled:1; /* assigned write-order */
-};
-
 /*
- * Objects we are going to pack are collected in objects array (dynamically
- * expanded).  nr_objects  nr_alloc controls this array.  They are stored
- * in the order we see -- typically rev-list --objects order that gives us
- * nice minimum seek order.
+ * Objects we are going to pack are collected in the `to_pack` structure.
+ * It contains an array (dynamically expanded) of the object data, and a map
+ * that can resolve SHA1s to their position in the array.
  */
-static struct object_entry *objects;
+static struct packing_data to_pack;
+
 static struct pack_idx_entry **written_list;
-static uint32_t nr_objects, nr_alloc, nr_result, nr_written;
+static uint32_t nr_result, nr_written;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -90,21 +64,11 @@ static unsigned long cache_max_small_delta_size = 1000;
 static unsigned long window_memory_limit = 0;
 
 /*
- * The object names in objects array are hashed with this hashtable,
- * to help looking up the entry by object name.
- * This hashtable is built after all the objects are seen.
- */
-static int *object_ix;
-static int object_ix_hashsz;
-static struct object_entry *locate_object_entry(const unsigned char *sha1);
-
-/*
  * stats
  */
 static uint32_t written, written_delta;
 static uint32_t reused, reused_delta;
 
-
 static void *get_delta(struct object_entry *entry)
 {
unsigned long size, base_size, delta_size;
@@ -553,12 +517,12 @@ static int mark_tagged(const char *path, const unsigned 
char *sha1, int

[PATCH 04/19] pack-objects: factor out name_hash

From: Vicent Marti tan...@gmail.com

As the pack-objects system grows beyond the single
pack-objects.c file, more parts (like the soon-to-exist
bitmap code) will need to compute hashes for matching
deltas. Factor out name_hash to make it available to other
files.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 builtin/pack-objects.c | 24 ++--
 pack-objects.h | 20 
 2 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f3f0cf9..faf746b 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -778,26 +778,6 @@ static void write_pack_file(void)
written, nr_result);
 }
 
-static uint32_t name_hash(const char *name)
-{
-   uint32_t c, hash = 0;
-
-   if (!name)
-   return 0;
-
-   /*
-* This effectively just creates a sortable number from the
-* last sixteen non-whitespace characters. Last characters
-* count most, so things that end in .c sort together.
-*/
-   while ((c = *name++) != 0) {
-   if (isspace(c))
-   continue;
-   hash = (hash  2) + (c  24);
-   }
-   return hash;
-}
-
 static void setup_delta_attr_check(struct git_attr_check *check)
 {
static struct git_attr *attr_delta;
@@ -826,7 +806,7 @@ static int add_object_entry(const unsigned char *sha1, enum 
object_type type,
struct object_entry *entry;
struct packed_git *p, *found_pack = NULL;
off_t found_offset = 0;
-   uint32_t hash = name_hash(name);
+   uint32_t hash = pack_name_hash(name);
uint32_t index_pos;
 
entry = packlist_find(to_pack, sha1, index_pos);
@@ -1082,7 +1062,7 @@ static void add_preferred_base_object(const char *name)
 {
struct pbase_tree *it;
int cmplen;
-   unsigned hash = name_hash(name);
+   unsigned hash = pack_name_hash(name);
 
if (!num_preferred_base || check_pbase_path(hash))
return;
diff --git a/pack-objects.h b/pack-objects.h
index f528215..90ad0a8 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -44,4 +44,24 @@ struct object_entry *packlist_find(struct packing_data 
*pdata,
   const unsigned char *sha1,
   uint32_t *index_pos);
 
+static inline uint32_t pack_name_hash(const char *name)
+{
+   uint32_t c, hash = 0;
+
+   if (!name)
+   return 0;
+
+   /*
+* This effectively just creates a sortable number from the
+* last sixteen non-whitespace characters. Last characters
+* count most, so things that end in .c sort together.
+*/
+   while ((c = *name++) != 0) {
+   if (isspace(c))
+   continue;
+   hash = (hash  2) + (c  24);
+   }
+   return hash;
+}
+
 #endif
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 05/19] revision: allow setting custom limiter function

From: Vicent Marti tan...@gmail.com

This commit enables users of `struct rev_info` to peform custom limiting
during a revision walk (i.e. `get_revision`).

If the field `include_check` has been set to a callback, this callback
will be issued once for each commit before it is added to the pending
list of the revwalk. If the include check returns 0, the commit will be
marked as added but won't be pushed to the pending list, effectively
limiting the walk.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 revision.c | 4 
 revision.h | 2 ++
 2 files changed, 6 insertions(+)

diff --git a/revision.c b/revision.c
index 0173e01..cddd605 100644
--- a/revision.c
+++ b/revision.c
@@ -779,6 +779,10 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
return 0;
commit-object.flags |= ADDED;
 
+   if (revs-include_check 
+   !revs-include_check(commit, revs-include_check_data))
+   return 0;
+
/*
 * If the commit is uninteresting, don't try to
 * prune parents - we want the maximal uninteresting
diff --git a/revision.h b/revision.h
index e7f1d21..9957f3c 100644
--- a/revision.h
+++ b/revision.h
@@ -168,6 +168,8 @@ struct rev_info {
unsigned long min_age;
int min_parents;
int max_parents;
+   int (*include_check)(struct commit *, void *);
+   void *include_check_data;
 
/* diff info for patches and for paths limiting */
struct diff_options diffopt;
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/19] sha1_file: export `git_open_noatime`

From: Vicent Marti tan...@gmail.com

The `git_open_noatime` helper can be of general interest for other
consumers of git's different on-disk formats.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 cache.h | 1 +
 sha1_file.c | 4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 5e3fc72..f2e5aa7 100644
--- a/cache.h
+++ b/cache.h
@@ -780,6 +780,7 @@ extern int hash_sha1_file(const void *buf, unsigned long 
len, const char *type,
 extern int write_sha1_file(const void *buf, unsigned long len, const char 
*type, unsigned char *return_sha1);
 extern int pretend_sha1_file(void *, unsigned long, enum object_type, unsigned 
char *);
 extern int force_object_loose(const unsigned char *sha1, time_t mtime);
+extern int git_open_noatime(const char *name);
 extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size);
 extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, 
unsigned long mapsize, void *buffer, unsigned long bufsiz);
 extern int parse_sha1_header(const char *hdr, unsigned long *sizep);
diff --git a/sha1_file.c b/sha1_file.c
index f80bbe4..4714bd8 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -239,8 +239,6 @@ char *sha1_pack_index_name(const unsigned char *sha1)
 struct alternate_object_database *alt_odb_list;
 static struct alternate_object_database **alt_odb_tail;
 
-static int git_open_noatime(const char *name);
-
 /*
  * Prepare alternate object database registry.
  *
@@ -1357,7 +1355,7 @@ int check_sha1_signature(const unsigned char *sha1, void 
*map,
return hashcmp(sha1, real_sha1) ? -1 : 0;
 }
 
-static int git_open_noatime(const char *name)
+int git_open_noatime(const char *name)
 {
static int sha1_file_open_flag = O_NOATIME;
 
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 07/19] compat: add endianness helpers

From: Vicent Marti tan...@gmail.com

The POSIX standard doesn't currently define a `nothll`/`htonll`
function pair to perform network-to-host and host-to-network
swaps of 64-bit data. These 64-bit swaps are necessary for the on-disk
storage of EWAH bitmaps if they are not in native byte order.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 compat/bswap.h | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/compat/bswap.h b/compat/bswap.h
index 5061214..ea1a9ed 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -47,4 +47,39 @@ static inline uint32_t git_bswap32(uint32_t x)
 #define ntohl(x) bswap32(x)
 #define htonl(x) bswap32(x)
 
+#ifndef __BYTE_ORDER
+#  if defined(BYTE_ORDER)  defined(LITTLE_ENDIAN)  defined(BIG_ENDIAN)
+#  define __BYTE_ORDER BYTE_ORDER
+#  define __LITTLE_ENDIAN LITTLE_ENDIAN
+#  define __BIG_ENDIAN BIG_ENDIAN
+#  else
+#  error Cannot determine endianness
+#  endif
+#endif
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+# define ntohll(n) (n)
+# define htonll(n) (n)
+#elif __BYTE_ORDER == __LITTLE_ENDIAN
+#  if defined(__GNUC__)  defined(__GLIBC__)
+#  include byteswap.h
+#  else /* GNUC  GLIBC */
+static inline uint64_t bswap_64(uint64_t val)
+{
+   return ((val  (uint64_t)0x00ffULL)  56)
+   | ((val  (uint64_t)0xff00ULL)  40)
+   | ((val  (uint64_t)0x00ffULL)  24)
+   | ((val  (uint64_t)0xff00ULL)   8)
+   | ((val  (uint64_t)0x00ffULL)   8)
+   | ((val  (uint64_t)0xff00ULL)  24)
+   | ((val  (uint64_t)0x00ffULL)  40)
+   | ((val  (uint64_t)0xff00ULL)  56);
+}
+#  endif /* GNUC  GLIBC */
+#  define ntohll(n) bswap_64(n)
+#  define htonll(n) bswap_64(n)
+#else /* __BYTE_ORDER */
+#  error Can't define htonll or ntohll!
+#endif
+
 #endif
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/19] ewah: compressed bitmap implementation

From: Vicent Marti tan...@gmail.com

EWAH is a word-aligned compressed variant of a bitset (i.e. a data
structure that acts as a 0-indexed boolean array for many entries).

It uses a 64-bit run-length encoding (RLE) compression scheme,
trading some compression for better processing speed.

The goal of this word-aligned implementation is not to achieve
the best compression, but rather to improve query processing time.
As it stands right now, this EWAH implementation will always be more
efficient storage-wise than its uncompressed alternative.

EWAH arrays will be used as the on-disk format to store reachability
bitmaps for all objects in a repository while keeping reasonable sizes,
in the same way that JGit does.

This EWAH implementation is a mostly straightforward port of the
original `javaewah` library that JGit currently uses. The library is
self-contained and has been embedded whole (4 files) inside the `ewah`
folder to ease redistribution.

The library is re-licensed under the GPLv2 with the permission of Daniel
Lemire, the original author. The source code for the C version can
be found on GitHub:

https://github.com/vmg/libewok

The original Java implementation can also be found on GitHub:

https://github.com/lemire/javaewah

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile   |   6 +
 ewah/bitmap.c  | 226 +
 ewah/ewah_bitmap.c | 701 +
 ewah/ewah_io.c | 192 +++
 ewah/ewah_rlw.c| 120 +
 ewah/ewok.h| 223 +
 ewah/ewok_rlw.h| 114 +
 7 files changed, 1582 insertions(+)
 create mode 100644 ewah/bitmap.c
 create mode 100644 ewah/ewah_bitmap.c
 create mode 100644 ewah/ewah_io.c
 create mode 100644 ewah/ewah_rlw.c
 create mode 100644 ewah/ewok.h
 create mode 100644 ewah/ewok_rlw.h

diff --git a/Makefile b/Makefile
index 48ff0bd..eacb940 100644
--- a/Makefile
+++ b/Makefile
@@ -667,6 +667,8 @@ LIB_H += diff.h
 LIB_H += diffcore.h
 LIB_H += dir.h
 LIB_H += exec_cmd.h
+LIB_H += ewah/ewok.h
+LIB_H += ewah/ewok_rlw.h
 LIB_H += fetch-pack.h
 LIB_H += fmt-merge-msg.h
 LIB_H += fsck.h
@@ -800,6 +802,10 @@ LIB_OBJS += dir.o
 LIB_OBJS += editor.o
 LIB_OBJS += entry.o
 LIB_OBJS += environment.o
+LIB_OBJS += ewah/bitmap.o
+LIB_OBJS += ewah/ewah_bitmap.o
+LIB_OBJS += ewah/ewah_io.o
+LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
diff --git a/ewah/bitmap.c b/ewah/bitmap.c
new file mode 100644
index 000..d248577
--- /dev/null
+++ b/ewah/bitmap.c
@@ -0,0 +1,226 @@
+/**
+ * Copyright 2013, GitHub, Inc
+ * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
+ * David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, 
USA.
+ */
+#include assert.h
+#include stdlib.h
+#include string.h
+
+#include ewok.h
+
+#define MASK(x) ((eword_t)1  (x % BITS_IN_WORD))
+#define BLOCK(x) (x / BITS_IN_WORD)
+
+struct bitmap *bitmap_new(void)
+{
+   struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap));
+   bitmap-words = ewah_calloc(32, sizeof(eword_t));
+   bitmap-word_alloc = 32;
+   return bitmap;
+}
+
+void bitmap_set(struct bitmap *self, size_t pos)
+{
+   size_t block = BLOCK(pos);
+
+   if (block = self-word_alloc) {
+   size_t old_size = self-word_alloc;
+   self-word_alloc = block * 2;
+   self-words = ewah_realloc(self-words, self-word_alloc * 
sizeof(eword_t));
+
+   memset(self-words + old_size, 0x0,
+   (self-word_alloc - old_size) * sizeof(eword_t));
+   }
+
+   self-words[block] |= MASK(pos);
+}
+
+void bitmap_clear(struct bitmap *self, size_t pos)
+{
+   size_t block = BLOCK(pos);
+
+   if (block  self-word_alloc)
+   self-words[block] = ~MASK(pos);
+}
+
+int bitmap_get(struct bitmap *self, size_t pos)
+{
+   size_t block = BLOCK(pos);
+   return block  self-word_alloc  (self-words[block]  MASK(pos)) != 
0;
+}
+
+struct ewah_bitmap *bitmap_to_ewah(struct bitmap *bitmap)
+{
+   struct ewah_bitmap *ewah = ewah_new();
+   size_t i, running_empty_words = 0;
+   eword_t last_word = 0;
+
+   for (i = 0; i

[PATCH 11/19] pack-objects: use bitmaps when packing objects

From: Vicent Marti tan...@gmail.com

In this patch, we use the bitmap API to perform the `Counting Objects`
phase in pack-objects, rather than a traditional walk through the object
graph. For a reasonably-packed large repo, the time to fetch and clone
is often dominated by the full-object revision walk during the Counting
Objects phase. Using bitmaps can reduce the CPU time required on the
server (and therefore start sending the actual pack data with less
delay).

For bitmaps to be used, the following must be true:

  1. We must be packing to stdout (as a normal `pack-objects` from
 `upload-pack` would do).

  2. There must be a .bitmap index containing at least one of the
 have objects that the client is asking for.

  3. Bitmaps must be enabled (they are enabled by default, but can be
 disabled by setting `pack.usebitmaps` to false, or by using
 `--no-use-bitmap-index` on the command-line).

If any of these is not true, we fall back to doing a normal walk of the
object graph.

Here are some sample timings from a full pack of `torvalds/linux` (i.e.
something very similar to what would be generated for a clone of the
repository) that show the speedup produced by various
methods:

[existing graph traversal]
$ time git pack-objects --all --stdout --no-use-bitmap-index \
/dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m44.111s
user0m42.396s
sys 0m3.544s

[bitmaps only, without partial pack reuse; note that
 pack reuse is automatic, so timing this required a
 patch to disable it]
$ time git pack-objects --all --stdout /dev/null /dev/null
Counting objects: 3237103, done.
Compressing objects: 100% (508752/508752), done.
Total 3237103 (delta 2699584), reused 3237103 (delta 2699584)

real0m5.413s
user0m5.604s
sys 0m1.804s

[bitmaps with pack reuse (what you get with this patch)]
$ time git pack-objects --all --stdout /dev/null /dev/null
Reusing existing pack: 3237103, done.
Total 3237103 (delta 0), reused 0 (delta 0)

real0m1.636s
user0m1.460s
sys 0m0.172s

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 builtin/pack-objects.c | 169 +
 1 file changed, 144 insertions(+), 25 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index faf746b..d15b9db 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -19,6 +19,7 @@
 #include refs.h
 #include streaming.h
 #include thread-utils.h
+#include pack-bitmap.h
 
 static const char *pack_usage[] = {
N_(git pack-objects --stdout [options...] [ ref-list |  
object-list]),
@@ -57,12 +58,23 @@ static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
 
+static struct packed_git *reuse_packfile;
+static uint32_t reuse_packfile_objects;
+static off_t reuse_packfile_offset;
+
+static int use_bitmap_index = 1;
+
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 256 * 1024 * 1024;
 static unsigned long cache_max_small_delta_size = 1000;
 
 static unsigned long window_memory_limit = 0;
 
+enum {
+   OBJECT_ENTRY_EXCLUDE = (1  0),
+   OBJECT_ENTRY_NO_TRY_DELTA = (1  1)
+};
+
 /*
  * stats
  */
@@ -678,6 +690,49 @@ static struct object_entry **compute_write_order(void)
return wo;
 }
 
+static off_t write_reused_pack(struct sha1file *f)
+{
+   uint8_t buffer[8192];
+   off_t to_write;
+   int fd;
+
+   if (!is_pack_valid(reuse_packfile))
+   return 0;
+
+   fd = git_open_noatime(reuse_packfile-pack_name);
+   if (fd  0)
+   return 0;
+
+   if (lseek(fd, sizeof(struct pack_header), SEEK_SET) == -1) {
+   close(fd);
+   return 0;
+   }
+
+   if (reuse_packfile_offset  0)
+   reuse_packfile_offset = reuse_packfile-pack_size - 20;
+
+   to_write = reuse_packfile_offset - sizeof(struct pack_header);
+
+   while (to_write) {
+   int read_pack = xread(fd, buffer, sizeof(buffer));
+
+   if (read_pack = 0) {
+   close(fd);
+   return 0;
+   }
+
+   if (read_pack  to_write)
+   read_pack = to_write;
+
+   sha1write(f, buffer, read_pack);
+   to_write -= read_pack;
+   }
+
+   close(fd);
+   written += reuse_packfile_objects;
+   return reuse_packfile_offset - sizeof(struct pack_header);
+}
+
 static void write_pack_file(void)
 {
uint32_t i = 0, j;
@@ -704,6 +759,18 @@ static void write_pack_file(void)
offset = write_pack_header(f, nr_remaining);
if

[PATCH 09/19] documentation: add documentation for the bitmap format

From: Vicent Marti tan...@gmail.com

This is the technical documentation for the JGit-compatible Bitmap v1
on-disk format.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/bitmap-format.txt | 131 ++
 1 file changed, 131 insertions(+)
 create mode 100644 Documentation/technical/bitmap-format.txt

diff --git a/Documentation/technical/bitmap-format.txt 
b/Documentation/technical/bitmap-format.txt
new file mode 100644
index 000..c686dd1
--- /dev/null
+++ b/Documentation/technical/bitmap-format.txt
@@ -0,0 +1,131 @@
+GIT bitmap v1 format
+
+
+   - A header appears at the beginning:
+
+   4-byte signature: {'B', 'I', 'T', 'M'}
+
+   2-byte version number (network byte order)
+   The current implementation only supports version 1
+   of the bitmap index (the same one as JGit).
+
+   2-byte flags (network byte order)
+
+   The following flags are supported:
+
+   - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
+   This flag must always be present. It implies that the 
bitmap
+   index has been generated for a packfile with full 
closure
+   (i.e. where every single object in the packfile can find
+its parent links inside the same packfile). This is a
+   requirement for the bitmap index format, also present 
in JGit,
+   that greatly reduces the complexity of the 
implementation.
+
+   4-byte entry count (network byte order)
+
+   The total count of entries (bitmapped commits) in this 
bitmap index.
+
+   20-byte checksum
+
+   The SHA1 checksum of the pack this bitmap index belongs 
to.
+
+   - 4 EWAH bitmaps that act as type indexes
+
+   Type indexes are serialized after the hash cache in the shape
+   of four EWAH bitmaps stored consecutively (see Appendix A for
+   the serialization format of an EWAH bitmap).
+
+   There is a bitmap for each Git object type, stored in the 
following
+   order:
+
+   - Commits
+   - Trees
+   - Blobs
+   - Tags
+
+   In each bitmap, the `n`th bit is set to true if the `n`th object
+   in the packfile is of that type.
+
+   The obvious consequence is that the OR of all 4 bitmaps will 
result
+   in a full set (all bits set), and the AND of all 4 bitmaps will
+   result in an empty bitmap (no bits set).
+
+   - N entries with compressed bitmaps, one for each indexed commit
+
+   Where `N` is the total amount of entries in this bitmap index.
+   Each entry contains the following:
+
+   - 4-byte object position (network byte order)
+   The position **in the index for the packfile** where the
+   bitmap for this commit is found.
+
+   - 1-byte XOR-offset
+   The xor offset used to compress this bitmap. For an 
entry
+   in position `x`, a XOR offset of `y` means that the 
actual
+   bitmap representing this commit is composed by XORing 
the
+   bitmap for this entry with the bitmap in entry `x-y` 
(i.e.
+   the bitmap `y` entries before this one).
+
+   Note that this compression can be recursive. In order to
+   XOR this entry with a previous one, the previous entry 
needs
+   to be decompressed first, and so on.
+
+   The hard-limit for this offset is 160 (an entry can 
only be
+   xor'ed against one of the 160 entries preceding it). 
This
+   number is always positive, and hence entries are always 
xor'ed
+   with **previous** bitmaps, not bitmaps that will come 
afterwards
+   in the index.
+
+   - 1-byte flags for this bitmap
+   At the moment the only available flag is `0x1`, which 
hints
+   that this bitmap can be re-used when rebuilding bitmap 
indexes
+   for the repository.
+
+   - The compressed bitmap itself, see Appendix A.
+
+== Appendix A: Serialization format for an EWAH bitmap
+
+Ewah bitmaps are serialized in the protocol as the JAVAEWAH
+library, making them backwards compatible with the JGit
+implementation:
+
+   - 4-byte number of bits of the resulting UNCOMPRESSED bitmap
+
+   - 4-byte number of words of the COMPRESSED bitmap, when stored
+
+   - N x 8-byte words, as specified by the previous field
+
+

[PATCH 13/19] pack-objects: implement bitmap writing

From: Vicent Marti tan...@gmail.com

This commit extends more the functionality of `pack-objects` by allowing
it to write out a `.bitmap` index next to any written packs, together
with the `.idx` index that currently gets written.

If bitmap writing is enabled for a given repository (either by calling
`pack-objects` with the `--write-bitmap-index` flag or by having
`pack.writebitmaps` set to `true` in the config) and pack-objects is
writing a packfile that would normally be indexed (i.e. not piping to
stdout), we will attempt to write the corresponding bitmap index for the
packfile.

Bitmap index writing happens after the packfile and its index has been
successfully written to disk (`finish_tmp_packfile`). The process is
performed in several steps:

1. `bitmap_writer_set_checksum`: this call stores the partial
   checksum for the packfile being written; the checksum will be
   written in the resulting bitmap index to verify its integrity

2. `bitmap_writer_build_type_index`: this call uses the array of
   `struct object_entry` that has just been sorted when writing out
   the actual packfile index to disk to generate 4 type-index bitmaps
   (one for each object type).

   These bitmaps have their nth bit set if the given object is of
   the bitmap's type. E.g. the nth bit of the Commits bitmap will be
   1 if the nth object in the packfile index is a commit.

   This is a very cheap operation because the bitmap writing code has
   access to the metadata stored in the `struct object_entry` array,
   and hence the real type for each object in the packfile.

3. `bitmap_writer_reuse_bitmaps`: if there exists an existing bitmap
   index for one of the packfiles we're trying to repack, this call
   will efficiently rebuild the existing bitmaps so they can be
   reused on the new index. All the existing bitmaps will be stored
   in a `reuse` hash table, and the commit selection phase will
   prioritize these when selecting, as they can be written directly
   to the new index without having to perform a revision walk to
   fill the bitmap. This can greatly speed up the repack of a
   repository that already has bitmaps.

4. `bitmap_writer_select_commits`: if bitmap writing is enabled for
   a given `pack-objects` run, the sequence of commits generated
   during the Counting Objects phase will be stored in an array.

   We then use that array to build up the list of selected commits.
   Writing a bitmap in the index for each object in the repository
   would be cost-prohibitive, so we use a simple heuristic to pick
   the commits that will be indexed with bitmaps.

   The current heuristics are a simplified version of JGit's
   original implementation. We select a higher density of commits
   depending on their age: the 100 most recent commits are always
   selected, after that we pick 1 commit of each 100, and the gap
   increases as the commits grow older. On top of that, we make sure
   that every single branch that has not been merged (all the tips
   that would be required from a clone) gets their own bitmap, and
   when selecting commits between a gap, we tend to prioritize the
   commit with the most parents.

   Do note that there is no right/wrong way to perform commit
   selection; different selection algorithms will result in
   different commits being selected, but there's no such thing as
   missing a commit. The bitmap walker algorithm implemented in
   `prepare_bitmap_walk` is able to adapt to missing bitmaps by
   performing manual walks that complete the bitmap: the ideal
   selection algorithm, however, would select the commits that are
   more likely to be used as roots for a walk in the future (e.g.
   the tips of each branch, and so on) to ensure a bitmap for them
   is always available.

5. `bitmap_writer_build`: this is the computationally expensive part
   of bitmap generation. Based on the list of commits that were
   selected in the previous step, we perform several incremental
   walks to generate the bitmap for each commit.

   The walks begin from the oldest commit, and are built up
   incrementally for each branch. E.g. consider this dag where A, B,
   C, D, E, F are the selected commits, and a, b, c, e are a chunk
   of simplified history that will not receive bitmaps.

A---a---B--b--C--c--D
 \
  E--e--F

   We start by building the bitmap for A, using A as the root for a
   revision walk and marking all the objects that are reachable
   until the walk is over. Once this bitmap is stored, we reuse the
   bitmap walker to perform the walk for B, assuming that once we
   reach A again, the walk will be terminated because A has already
   been SEEN on the previous walk.

   This process

Fwd: Errors when diff'ing arbitrary upstream remotes are not intuitive if git pull --all hasn't been done

2013-10-24 Thread yaneurabeya .

Hi!
I added an arbitrary upstream remote thinking that I could just
git diff the upstream remote's master. Turns out I needed to run git
pull --all in order to be able to diff the file (I forgot that step).
Could this error message be improved for interactive commands by
first checking to see whether or not the path starts with a remote,
then recommend that the remote be pulled? If this seems sane, I could
whip up a patch and post it in a github pull request.
Thanks!
-Garrett

$ git remote add pkohut [...]
$ git diff upstream/master -- conf/ | wc -l
16
$ git diff pkohut/master -- conf/ | wc -l
fatal: bad revision 'pkohut/master'
0
$ git pull --all
Fetching origin
Fetching upstream
Fetching pkohut
...
$ git diff pkohut/master -- conf/ | wc -l
46
$ git --version; uname -a
git version 1.7.9
CYGWIN_NT-6.1 ZL00757 1.7.25(0.270/5/3) 2013-08-31 20:37 x86_64 Cygwin
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/19] repack: stop using magic number for ARRAY_SIZE(exts)

We have a static array of extensions, but hardcode the size
of the array in our loops. Let's pull out this magic number,
which will make it easier to change.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a0ff5c7..2e88975 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -115,7 +115,7 @@ static void remove_redundant_pack(const char *dir_name, 
const char *base_name)
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-   const char *exts[2] = {.pack, .idx};
+   const char *exts[] = {.pack, .idx};
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -258,7 +258,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 */
failed = 0;
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup(%s/%s%s, packdir,
item-string, exts[ext]);
@@ -315,7 +315,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
/* Now the ones with the same name are out of the way... */
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
fname = mkpathdup(%s/pack-%s%s,
@@ -335,7 +335,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 
/* Remove the old- files */
for_each_string_list_item(item, names) {
-   for (ext = 0; ext  2; ext++) {
+   for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname;
fname = mkpath(%s/old-pack-%s%s,
packdir,
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/19] repack: turn exts array into array-of-struct

This is slightly more verbose, but will let us annotate the
extensions with further options in future commits.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 2e88975..a176de2 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -115,7 +115,12 @@ static void remove_redundant_pack(const char *dir_name, 
const char *base_name)
 
 int cmd_repack(int argc, const char **argv, const char *prefix)
 {
-   const char *exts[] = {.pack, .idx};
+   struct {
+   const char *name;
+   } exts[] = {
+   {.pack},
+   {.idx},
+   };
struct child_process cmd;
struct string_list_item *item;
struct argv_array cmd_args = ARGV_ARRAY_INIT;
@@ -261,14 +266,14 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
fname = mkpathdup(%s/%s%s, packdir,
-   item-string, exts[ext]);
+   item-string, exts[ext].name);
if (!file_exists(fname)) {
free(fname);
continue;
}
 
fname_old = mkpath(%s/old-%s%s, packdir,
-   item-string, exts[ext]);
+   item-string, exts[ext].name);
if (file_exists(fname_old))
if (unlink(fname_old))
failed = 1;
@@ -319,9 +324,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
char *fname, *fname_old;
struct stat statbuffer;
fname = mkpathdup(%s/pack-%s%s,
-   packdir, item-string, exts[ext]);
+   packdir, item-string, exts[ext].name);
fname_old = mkpathdup(%s-%s%s,
-   packtmp, item-string, exts[ext]);
+   packtmp, item-string, exts[ext].name);
if (!stat(fname_old, statbuffer)) {
statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | 
S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
@@ -340,7 +345,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
fname = mkpath(%s/old-pack-%s%s,
packdir,
item-string,
-   exts[ext]);
+   exts[ext].name);
if (remove_path(fname))
warning(_(removing '%s' failed), fname);
}
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 16/19] repack: handle optional files created by pack-objects

We ask pack-objects to pack to a set of temporary files, and
then rename them into place. Some files that pack-objects
creates may be optional (like a .bitmap file), in which case
we would not want to call rename(). We already call stat()
and make the chmod optional if the file cannot be accessed.
We could simply skip the rename step in this case, but that
would be a minor regression in noticing problems with
non-optional files (like the .pack and .idx files).

Instead, we can now annotate extensions as optional, and
skip them if they don't exist (and otherwise rely on
rename() to barf).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index a176de2..8b7dfd0 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -117,6 +117,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
 {
struct {
const char *name;
+   unsigned optional:1;
} exts[] = {
{.pack},
{.idx},
@@ -323,6 +324,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
for (ext = 0; ext  ARRAY_SIZE(exts); ext++) {
char *fname, *fname_old;
struct stat statbuffer;
+   int exists = 0;
fname = mkpathdup(%s/pack-%s%s,
packdir, item-string, exts[ext].name);
fname_old = mkpathdup(%s-%s%s,
@@ -330,9 +332,12 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
if (!stat(fname_old, statbuffer)) {
statbuffer.st_mode = ~(S_IWUSR | S_IWGRP | 
S_IWOTH);
chmod(fname_old, statbuffer.st_mode);
+   exists = 1;
+   }
+   if (exists || !exts[ext].optional) {
+   if (rename(fname_old, fname))
+   die_errno(_(renaming '%s' failed), 
fname_old);
}
-   if (rename(fname_old, fname))
-   die_errno(_(renaming '%s' failed), fname_old);
free(fname);
free(fname_old);
}
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 18/19] t: add basic bitmap functionality tests

Now that we can read and write bitmaps, we can exercise them
with some basic functionality tests. These tests aren't
particularly useful for seeing the benefit, as the test
repo is too small for it to make a difference. However, we
can at least check that using bitmaps does not break anything.

Signed-off-by: Jeff King p...@peff.net
---
 t/t5310-pack-bitmaps.sh | 114 
 1 file changed, 114 insertions(+)
 create mode 100755 t/t5310-pack-bitmaps.sh

diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
new file mode 100755
index 000..0868725
--- /dev/null
+++ b/t/t5310-pack-bitmaps.sh
@@ -0,0 +1,114 @@
+#!/bin/sh
+
+test_description='exercise basic bitmap functionality'
+. ./test-lib.sh
+
+test_expect_success 'setup repo with moderate-sized history' '
+   for i in $(test_seq 1 10); do
+   test_commit $i
+   done 
+   git checkout -b other HEAD~5 
+   for i in `test_seq 1 10`; do
+   test_commit side-$i
+   done 
+   git checkout master 
+   blob=$(echo tagged-blob | git hash-object -w --stdin) 
+   git tag tagged-blob $blob 
+   git config pack.writebitmaps true
+'
+
+test_expect_success 'full repack creates bitmaps' '
+   git repack -ad 
+   ls .git/objects/pack/ | grep bitmap output 
+   test_line_count = 1 output
+'
+
+test_expect_success 'rev-list --test-bitmap verifies bitmaps' '
+   git rev-list --test-bitmap HEAD
+'
+
+rev_list_tests() {
+   state=$1
+
+   test_expect_success counting commits via bitmap ($state) '
+   git rev-list --count HEAD expect 
+   git rev-list --use-bitmap-index --count HEAD actual 
+   test_cmp expect actual
+   '
+
+   test_expect_success counting partial commits via bitmap ($state) '
+   git rev-list --count HEAD~5..HEAD expect 
+   git rev-list --use-bitmap-index --count HEAD~5..HEAD actual 
+   test_cmp expect actual
+   '
+
+   test_expect_success counting non-linear history ($state) '
+   git rev-list --count other...master expect 
+   git rev-list --use-bitmap-index --count other...master actual 

+   test_cmp expect actual
+   '
+
+   test_expect_success enumerate --objects ($state) '
+   git rev-list --objects --use-bitmap-index HEAD tmp 
+   cut -d  -f1 tmp tmp2 
+   sort tmp2 actual 
+   git rev-list --objects HEAD tmp 
+   cut -d  -f1 tmp tmp2 
+   sort tmp2 expect 
+   test_cmp expect actual
+   '
+
+   test_expect_success bitmap --objects handles non-commit objects 
($state) '
+   git rev-list --objects --use-bitmap-index HEAD tagged-blob 
actual 
+   grep $blob actual
+   '
+}
+
+rev_list_tests 'full bitmap'
+
+test_expect_success 'clone from bitmapped repository' '
+   git clone --no-local --bare . clone.git 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'setup further non-bitmapped commits' '
+   for i in `test_seq 1 10`; do
+   test_commit further-$i
+   done
+'
+
+rev_list_tests 'partial bitmap'
+
+test_expect_success 'fetch (partial bitmap)' '
+   git --git-dir=clone.git fetch origin master:master 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_expect_success 'incremental repack cannot create bitmaps' '
+   test_commit more-1 
+   test_must_fail git repack -d
+'
+
+test_expect_success 'incremental repack can disable bitmaps' '
+   test_commit more-2 
+   git repack -d --no-write-bitmap-index
+'
+
+test_expect_success 'full repack, reusing previous bitmaps' '
+   git repack -ad 
+   ls .git/objects/pack/ | grep bitmap output 
+   test_line_count = 1 output
+'
+
+test_expect_success 'fetch (full bitmap)' '
+   git --git-dir=clone.git fetch origin master:master 
+   git rev-parse HEAD expect 
+   git --git-dir=clone.git rev-parse HEAD actual 
+   test_cmp expect actual
+'
+
+test_done
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 17/19] repack: consider bitmaps when performing repacks

From: Vicent Marti tan...@gmail.com

Since `pack-objects` will write a `.bitmap` file next to the `.pack` and
`.idx` files, this commit teaches `git-repack` to consider the new
bitmap indexes (if they exist) when performing repack operations.

This implies moving old bitmap indexes out of the way if we are
repacking a repository that already has them, and moving the newly
generated bitmap indexes into the `objects/pack` directory, next to
their corresponding packfiles.

Since `git repack` is now capable of handling these `.bitmap` files,
a normal `git gc` run on a repository that has `pack.writebitmaps` set
to true in its config file will generate bitmap indexes as part of the
garbage collection process.

Alternatively, `git repack` can be called with the `-b` switch to
explicitly generate bitmap indexes if you are experimenting
and don't want them on all the time.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 builtin/repack.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/builtin/repack.c b/builtin/repack.c
index 8b7dfd0..239f278 100644
--- a/builtin/repack.c
+++ b/builtin/repack.c
@@ -94,7 +94,7 @@ static void get_non_kept_pack_filenames(struct string_list 
*fname_list)
 
 static void remove_redundant_pack(const char *dir_name, const char *base_name)
 {
-   const char *exts[] = {.pack, .idx, .keep};
+   const char *exts[] = {.pack, .idx, .keep, .bitmap};
int i;
struct strbuf buf = STRBUF_INIT;
size_t plen;
@@ -121,6 +121,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
} exts[] = {
{.pack},
{.idx},
+   {.bitmap, 1},
};
struct child_process cmd;
struct string_list_item *item;
@@ -143,6 +144,7 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
int no_update_server_info = 0;
int quiet = 0;
int local = 0;
+   int write_bitmap = -1;
 
struct option builtin_repack_options[] = {
OPT_BIT('a', NULL, pack_everything,
@@ -161,6 +163,8 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
OPT__QUIET(quiet, N_(be quiet)),
OPT_BOOL('l', local, local,
N_(pass --local to git-pack-objects)),
+   OPT_BOOL('b', write-bitmap-index, write_bitmap,
+   N_(write bitmap index)),
OPT_STRING(0, unpack-unreachable, unpack_unreachable, 
N_(approxidate),
N_(with -A, do not loosen objects older than 
this)),
OPT_INTEGER(0, window, window,
@@ -202,6 +206,9 @@ int cmd_repack(int argc, const char **argv, const char 
*prefix)
argv_array_pushf(cmd_args, --no-reuse-delta);
if (no_reuse_object)
argv_array_pushf(cmd_args, --no-reuse-object);
+   if (write_bitmap = 0)
+   argv_array_pushf(cmd_args, --%swrite-bitmap-index,
+write_bitmap ?  : no-);
 
if (pack_everything  ALL_INTO_ONE) {
get_non_kept_pack_filenames(existing_packs);
-- 
1.8.4.1.898.g8bf8a41.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 19/19] pack-bitmap: implement optional name_hash cache

From: Vicent Marti tan...@gmail.com

When we use pack bitmaps rather than walking the object
graph, we end up with the list of objects to include in the
packfile, but we do not know the path at which any tree or
blob objects would be found.

In a recently packed repository, this is fine. A fetch would
use the paths only as a heuristic in the delta compression
phase, and a fully packed repository should not need to do
much delta compression.

As time passes, though, we may acquire more objects on top
of our large bitmapped pack. If clients fetch frequently,
then they never even look at the bitmapped history, and all
works as usual. However, a client who has not fetched since
the last bitmap repack will have have tips in the
bitmapped history, but want newer objects.

The bitmaps themselves degrade gracefully in this
circumstance. We manually walk the more recent bits of
history, and then use bitmaps when we hit them.

But we would also like to perform delta compression between
the newer objects and the bitmapped objects (both to delta
against what we know the user already has, but also between
new and old objects that the user is fetching). The lack
of pathnames makes our delta heuristics much less effective.

This patch adds an optional cache of the 32-bit name_hash
values to the end of the bitmap file. If present, a reader
can use it to match bitmapped and non-bitmapped names during
delta compression.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Documentation/technical/bitmap-format.txt | 33 +++
 pack-bitmap-write.c   | 18 -
 pack-bitmap.c | 11 +++
 pack-bitmap.h |  1 +
 4 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/Documentation/technical/bitmap-format.txt 
b/Documentation/technical/bitmap-format.txt
index c686dd1..36a511c 100644
--- a/Documentation/technical/bitmap-format.txt
+++ b/Documentation/technical/bitmap-format.txt
@@ -21,6 +21,12 @@ GIT bitmap v1 format
requirement for the bitmap index format, also present 
in JGit,
that greatly reduces the complexity of the 
implementation.
 
+   - BITMAP_OPT_HASH_CACHE (0x4)
+   If present, the end of the bitmap file contains
+   `N` 32-bit name-hash values, one per object in the
+   pack. The format and meaning of the name-hash is
+   described below.
+
4-byte entry count (network byte order)
 
The total count of entries (bitmapped commits) in this 
bitmap index.
@@ -129,3 +135,30 @@ The bitstream represented by the above chunk is then:
 The next word after `L_M` (if any) must again be a RLW, for the next
 chunk.  For efficient appending to the bitstream, the EWAH stores a
 pointer to the last RLW in the stream.
+
+
+== Appendix B: Optional Bitmap Sections
+
+These sections may or may not be present in the `.bitmap` file; their
+presence is indicated by the header flags section described above.
+
+Name-hash cache
+---
+
+If the BITMAP_OPT_HASH_CACHE flag is set, the end of the bitmap contains
+a cache of 32-bit values, one per object in the pack. The value at
+position `i` is the hash of the pathname at which the `i`th object
+(counting in index order) in the pack can be found.  This can be fed
+into the delta heuristics to compare objects with similar pathnames.
+
+The hash algorithm used is:
+
+hash = 0;
+while ((c = *name++))
+   if (!isspace(c))
+   hash = (hash  2) + (c  24);
+
+Note that this hashing scheme is tied to the BITMAP_OPT_HASH_CACHE flag.
+If implementations want to choose a different hashing scheme, they are
+free to do so, but MUST allocate a new header flag (because comparing
+hashes made under two different schemes would be pointless).
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 6a589c3..c44874a 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -492,6 +492,19 @@ static void write_selected_commits_v1(struct sha1file *f,
}
 }
 
+static void write_hash_cache(struct sha1file *f,
+struct pack_idx_entry **index,
+uint32_t index_nr)
+{
+   uint32_t i;
+
+   for (i = 0; i  index_nr; ++i) {
+   struct object_entry *entry = (struct object_entry *)index[i];
+   uint32_t hash_value = htonl(entry-hash);
+   sha1write(f, hash_value, sizeof(hash_value));
+   }
+}
+
 void bitmap_writer_set_checksum(unsigned char *sha1)
 {
hashcpy(writer.pack_checksum, sha1);
@@ -503,7 +516,7 @@ void bitmap_writer_finish(struct pack_idx_entry **index,
 {
static char tmp_file[PATH_MAX];
static uint16_t default_version = 1;
-   static uint16_t flags = BITMAP_OPT_FULL_DAG;
+   static

[PATCH 12/19] rev-list: add bitmap mode to speed up object lists

From: Vicent Marti tan...@gmail.com

The bitmap reachability index used to speed up the counting objects
phase during `pack-objects` can also be used to optimize a normal
rev-list if the only thing required are the SHA1s of the objects during
the list (i.e., not the path names at which trees and blobs were found).

Calling `git rev-list --objects --use-bitmap-index [committish]` will
perform an object iteration based on a bitmap result instead of actually
walking the object graph.

These are some example timings for `torvalds/linux` (warm cache,
best-of-five):

$ time git rev-list --objects master  /dev/null

real0m34.191s
user0m33.904s
sys 0m0.268s

$ time git rev-list --objects --use-bitmap-index master  /dev/null

real0m1.041s
user0m0.976s
sys 0m0.064s

Likewise, using `git rev-list --count --use-bitmap-index` will speed up
the counting operation by building the resulting bitmap and performing a
fast popcount (number of bits set on the bitmap) on the result.

Here are some sample timings of different ways to count commits in
`torvalds/linux`:

$ time git rev-list master | wc -l
399882

real0m6.524s
user0m6.060s
sys 0m3.284s

$ time git rev-list --count master
399882

real0m4.318s
user0m4.236s
sys 0m0.076s

$ time git rev-list --use-bitmap-index --count master
399882

real0m0.217s
user0m0.176s
sys 0m0.040s

This also respects negative refs, so you can use it to count
a slice of history:

$ time git rev-list --count v3.0..master
144843

real0m1.971s
user0m1.932s
sys 0m0.036s

$ time git rev-list --use-bitmap-index --count v3.0..master
real0m0.280s
user0m0.220s
sys 0m0.056s

Though note that the closer the endpoints, the less it helps. In the
traversal case, we have fewer commits to cross, so we take less time.
But the bitmap time is dominated by generating the pack revindex, which
is constant with respect to the refs given.

Note that you cannot yet get a fast --left-right count of a symmetric
difference (e.g., --count --left-right master...topic). The slow part
of that walk actually happens during the merge-base determination when
we parse master...topic. Even though a count does not actually need to
know the real merge base (it only needs to take the symmetric difference
of the bitmaps), the revision code would require some refactoring to
handle this case.

Additionally, a `--test-bitmap` flag has been added that will perform
the same rev-list manually (i.e. using a normal revwalk) and using
bitmaps, and verify that the results are the same. This can be used to
exercise the bitmap code, and also to verify that the contents of the
.bitmap file are sane.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
Note that most of the time we spend for --count invocations is on
generating the pack revindex. It may be worth storing a revidx (either
in a separate file, as part of the .idx, or as an optional section in
the .bitmap file).

 builtin/rev-list.c | 39 +++
 pack-bitmap.c  |  2 +-
 2 files changed, 40 insertions(+), 1 deletion(-)

diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 4fc1616..5209255 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -3,6 +3,8 @@
 #include diff.h
 #include revision.h
 #include list-objects.h
+#include pack.h
+#include pack-bitmap.h
 #include builtin.h
 #include log-tree.h
 #include graph.h
@@ -257,6 +259,18 @@ static int show_bisect_vars(struct rev_list_info *info, 
int reaches, int all)
return 0;
 }
 
+static int show_object_fast(
+   const unsigned char *sha1,
+   enum object_type type,
+   int exclude,
+   uint32_t name_hash,
+   struct packed_git *found_pack,
+   off_t found_offset)
+{
+   fprintf(stdout, %s\n, sha1_to_hex(sha1));
+   return 1;
+}
+
 int cmd_rev_list(int argc, const char **argv, const char *prefix)
 {
struct rev_info revs;
@@ -265,6 +279,7 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
int bisect_list = 0;
int bisect_show_vars = 0;
int bisect_find_all = 0;
+   int use_bitmap_index = 0;
 
git_config(git_default_config, NULL);
init_revisions(revs, prefix);
@@ -306,6 +321,14 @@ int cmd_rev_list(int argc, const char **argv, const char 
*prefix)
bisect_show_vars = 1;
continue;
}
+   if (!strcmp(arg, --use-bitmap-index)) {
+   use_bitmap_index = 1;
+   continue;
+   }
+   if (!strcmp(arg, --test-bitmap)) {
+   test_bitmap_walk(revs);
+   return 0;
+   }

[PATCH 10/19] pack-bitmap: add support for bitmap indexes

From: Vicent Marti tan...@gmail.com

A bitmap index is a `.bitmap` file that can be found inside
`$GIT_DIR/objects/pack/`, next to its corresponding packfile, and
contains precalculated reachability information for selected commits.
The full specification of the format for these bitmap indexes can be found
in `Documentation/technical/bitmap-format.txt`.

For a given commit SHA1, if it happens to be available in the bitmap
index, its bitmap will represent every single object that is reachable
from the commit itself. The nth bit in the bitmap is the nth object in
the packfile; if it's set to 1, the object is reachable.

By using the bitmaps available in the index, this commit implements
several new functions:

- `prepare_bitmap_git`
- `prepare_bitmap_walk`
- `traverse_bitmap_commit_list`
- `reuse_partial_packfile_from_bitmap`

The `prepare_bitmap_walk` function tries to build a bitmap of all the
objects that can be reached from the commit roots of a given `rev_info`
struct by using the following algorithm:

- If all the interesting commits for a revision walk are available in
the index, the resulting reachability bitmap is the bitwise OR of all
the individual bitmaps.

- When the full set of WANTs is not available in the index, we perform a
partial revision walk using the commits that don't have bitmaps as
roots, and limiting the revision walk as soon as we reach a commit that
has a corresponding bitmap. The earlier OR'ed bitmap with all the
indexed commits can now be completed as this walk progresses, so the end
result is the full reachability list.

- For revision walks with a HAVEs set (a set of commits that are deemed
uninteresting), first we perform the same method as for the WANTs, but
using our HAVEs as roots, in order to obtain a full reachability bitmap
of all the uninteresting commits. This bitmap then can be used to:

a) limit the subsequent walk when building the WANTs bitmap
b) finding the final set of interesting commits by performing an
   AND-NOT of the WANTs and the HAVEs.

If `prepare_bitmap_walk` runs successfully, the resulting bitmap is
stored and the equivalent of a `traverse_commit_list` call can be
performed by using `traverse_bitmap_commit_list`; the bitmap version
of this call yields the objects straight from the packfile index
(without having to look them up or parse them) and hence is several
orders of magnitude faster.

As an extra optimization, when `prepare_bitmap_walk` succeeds, the
`reuse_partial_packfile_from_bitmap` call can be attempted: it will find
the amount of objects at the beginning of the on-disk packfile that can
be reused as-is, and return an offset into the packfile. The source
packfile can then be loaded and the bytes up to `offset` can be written
directly to the result without having to consider the entires inside the
packfile individually.

If the `prepare_bitmap_walk` call fails (e.g. because no bitmap files
are available), the `rev_info` struct is left untouched, and can be used
to perform a manual rev-walk using `traverse_commit_list`.

Hence, this new set of functions are a generic API that allows to
perform the equivalent of

git rev-list --objects [roots...] [^uninteresting...]

for any set of commits, even if they don't have specific bitmaps
generated for them.

In further patches, we'll use this bitmap traversal optimization to
speed up the `pack-objects` and `rev-list` commands.

Signed-off-by: Vicent Marti tan...@gmail.com
Signed-off-by: Jeff King p...@peff.net
---
 Makefile  |   2 +
 khash.h   | 342 +
 pack-bitmap.c | 965 ++
 pack-bitmap.h |  46 +++
 4 files changed, 1355 insertions(+)
 create mode 100644 khash.h
 create mode 100644 pack-bitmap.c
 create mode 100644 pack-bitmap.h

diff --git a/Makefile b/Makefile
index eacb940..737766c 100644
--- a/Makefile
+++ b/Makefile
@@ -699,6 +699,7 @@ LIB_H += object.h
 LIB_H += pack-objects.h
 LIB_H += pack-revindex.h
 LIB_H += pack.h
+LIB_H += pack-bitmap.h
 LIB_H += parse-options.h
 LIB_H += patch-ids.h
 LIB_H += pathspec.h
@@ -837,6 +838,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-check.o
 LIB_OBJS += pack-objects.o
 LIB_OBJS += pack-revindex.o
diff --git a/khash.h b/khash.h
new file mode 100644
index 000..0fdf39d
--- /dev/null
+++ b/khash.h
@@ -0,0 +1,342 @@
+/* The MIT License
+
+   Copyright (c) 2008, 2009, 2011 by Attractive Chaos attrac...@live.co.uk
+
+   Permission is hereby granted, free of charge, to any person obtaining
+   a copy of this software and associated documentation files (the
+   Software), to deal in the Software without restriction, including
+   without limitation the rights to use, copy, modify, merge, publish,
+   distribute, sublicense, and/or sell copies of the Software, and to
+   permit persons to whom the Software is

Re: [PATCH] rebase: use reflog to find common base with upstream

Martin von Zweigbergk martinv...@gmail.com writes:

 I think

   git merge-base HEAD $(git rev-list -g $upstream_name)

 is roughly correct and hopefully fast enough. That can lead to too
 long a command line, so I was planning on teaching merge-base a
 --stdin option, but never got around to it.

Sorry for coming in late.

I think the above with s/HEAD/$curr_branch/ is a good way to compute
what the whole for reflog in $(git rev-list -g $remoteref loop
computes when one of the historic tips recorded in the reflog was
where $curr_branch forked from, i.e. the loop actually finds at
least one ancestor in the reflog and breaks out after setting
oldremoteref.  But it would give a completely different commit if
none of the reflog entries is a fork point.

A two patch series forthcoming.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/2] finding the fork point from reflog entries

The first one is a clean-up of the code to parse command line
options to git merge-base.  Options such as --independent,
--is-ancestor and --octopus are mutually exclusive and it is
better expressed in terms of the recently introduced OPT_CMDMODE.

The second one implements the entire logic of the for loop we see in
git pull --rebase directly using get_merge_bases_many() and
postprocessing the result.

Junio C Hamano (2):
  merge-base: use OPT_CMDMODE and clarify the command line parsing
  merge-base: --reflog mode finds fork point from reflog entries

 builtin/merge-base.c  | 115 +++---
 t/t6010-merge-base.sh |  27 
 2 files changed, 126 insertions(+), 16 deletions(-)

-- 
1.8.4.1-799-g1c32b8d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/2] merge-base: use OPT_CMDMODE and clarify the command line parsing

The --octopus, --independent and --is-ancestor are mutually
exclusive command modes (in addition to not giving any of these
options), so represent them as such using the recent OPT_CMDMODE
facility available since 11588263 (parse-options: add OPT_CMDMODE(),
2013-07-30), which is in v1.8.4-82-g366b80b.  --all is compatible
only with plain vanilla mode and --octopus mode, and the minimum
number of arguments the command takes depends on the command modes,
so these are now separately checked in each command mode.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/merge-base.c | 38 ++
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index e88eb93..d39c910 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -90,32 +90,38 @@ int cmd_merge_base(int argc, const char **argv, const char 
*prefix)
struct commit **rev;
int rev_nr = 0;
int show_all = 0;
-   int octopus = 0;
-   int reduce = 0;
-   int is_ancestor = 0;
+   int cmdmode = 0;
 
struct option options[] = {
OPT_BOOL('a', all, show_all, N_(output all common 
ancestors)),
-   OPT_BOOL(0, octopus, octopus, N_(find ancestors for a 
single n-way merge)),
-   OPT_BOOL(0, independent, reduce, N_(list revs not reachable 
from others)),
-   OPT_BOOL(0, is-ancestor, is_ancestor,
-N_(is the first one ancestor of the other?)),
+   OPT_CMDMODE(0, octopus, cmdmode,
+   N_(find ancestors for a single n-way merge), 'o'),
+   OPT_CMDMODE(0, independent, cmdmode,
+   N_(list revs not reachable from others), 'r'),
+   OPT_CMDMODE(0, is-ancestor, cmdmode,
+   N_(is the first one ancestor of the other?), 'a'),
OPT_END()
};
 
git_config(git_default_config, NULL);
argc = parse_options(argc, argv, prefix, options, merge_base_usage, 0);
-   if (!octopus  !reduce  argc  2)
-   usage_with_options(merge_base_usage, options);
-   if (is_ancestor  (show_all || octopus || reduce))
-   die(--is-ancestor cannot be used with other options);
-   if (is_ancestor)
+
+   if (cmdmode == 'a') {
+   if (argc  2)
+   usage_with_options(merge_base_usage, options);
+   if (show_all)
+   die(--is-ancestor cannot be used with --all);
return handle_is_ancestor(argc, argv);
-   if (reduce  (show_all || octopus))
-   die(--independent cannot be used with other options);
+   }
 
-   if (octopus || reduce)
-   return handle_octopus(argc, argv, reduce, show_all);
+   if (cmdmode == 'r'  show_all)
+   die(--independent cannot be used with --all);
+
+   if (cmdmode == 'r' || cmdmode == 'o')
+   return handle_octopus(argc, argv, cmdmode == 'r', show_all);
+
+   if (argc  2)
+   usage_with_options(merge_base_usage, options);
 
rev = xmalloc(argc * sizeof(*rev));
while (argc--  0)
-- 
1.8.4.1-799-g1c32b8d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] merge-base: --reflog mode finds fork point from reflog entries

The git pull --rebase command computes the fork point of the
branch being rebased using the reflog entries of the base branch
(typically a remote-tracking branch) the branch's work was based on,
in order to cope with the case in which the base branch has been
rewound and rebuilt.  For example, if the history looked like this:

 o---B1
/
---o---o---B2--o---o---o---Base
\
 B3
  \
   Derived

where the current tip of the base branch is at Base, but earlier
fetch observed that its tip used to be B3 and then B2 and then B1
before getting to the current commit, and the branch being rebased
on top of the latest base is based on commit B3, it tries to find
B3 by going through the output of git rev-list --reflog base (i.e.
Base, B1, B2, B3) until it finds a commit that is an ancestor of the
current tip Derived.

Internally, we have get_merge_bases_many() that can compute this
with one-go.  We would want a merge-base between Derived and a
fictitious merge commit that would result by merging all the
historical tips of base.  When such a commit exist, we should get
a single result, which exactly match one of the reflog entries of
base.

Teach git merge-base a new mode, --reflog, to compute exactly
that.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/merge-base.c  | 77 +++
 t/t6010-merge-base.sh | 27 ++
 2 files changed, 104 insertions(+)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index d39c910..7b9bc15 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -1,6 +1,7 @@
 #include builtin.h
 #include cache.h
 #include commit.h
+#include refs.h
 #include parse-options.h
 
 static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
@@ -27,6 +28,7 @@ static const char * const merge_base_usage[] = {
N_(git merge-base [-a|--all] --octopus commit...),
N_(git merge-base --independent commit...),
N_(git merge-base --is-ancestor commit commit),
+   N_(git merge-base --reflog ref [commit]),
NULL
 };
 
@@ -85,6 +87,73 @@ static int handle_is_ancestor(int argc, const char **argv)
return 1;
 }
 
+struct rev_collect {
+   struct commit **commit;
+   int nr;
+   int alloc;
+};
+
+static int collect_one_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
+ const char *ident, unsigned long timestamp,
+ int tz, const char *message, void *cbdata_)
+{
+   struct rev_collect *revs = cbdata_;
+   struct commit *commit = lookup_commit(nsha1);
+   if (commit) {
+   ALLOC_GROW(revs-commit, revs-nr + 1, revs-alloc);
+   revs-commit[revs-nr++] = commit;
+   }
+   return 0;
+}
+
+static int handle_reflog(int argc, const char **argv)
+{
+   unsigned char sha1[20];
+   char *refname;
+   const char *commitname;
+   struct rev_collect revs;
+   struct commit *derived;
+   struct commit_list *bases;
+   int i;
+
+   switch (dwim_ref(argv[0], strlen(argv[0]), sha1, refname)) {
+   case 0:
+   die(No such ref: '%s', argv[0]);
+   case 1:
+   break; /* good */
+   default:
+   die(Ambiguous refname: '%s', argv[0]);
+   }
+
+   commitname = (argc == 2) ? argv[1] : HEAD;
+   if (get_sha1(commitname, sha1))
+   die(Not a valid object name: '%s', commitname);
+
+   derived = lookup_commit_reference(sha1);
+   memset(revs, 0, sizeof(revs));
+   for_each_reflog_ent(refname, collect_one_reflog_ent, revs);
+
+   bases = get_merge_bases_many(derived, revs.nr, revs.commit, 0);
+
+   /*
+* There should be one and only one merge base, when we found
+* a common ancestor among reflog entries.
+*/
+   if (!bases || bases-next)
+   return 1;
+
+   /* And the found one must be one of the reflog entries */
+   for (i = 0; i  revs.nr; i++)
+   if (bases-item-object == revs.commit[i]-object)
+   break; /* found */
+   if (revs.nr = i)
+   return 1; /* not found */
+
+   printf(%s\n, sha1_to_hex(bases-item-object.sha1));
+   free_commit_list(bases);
+   return 0;
+}
+
 int cmd_merge_base(int argc, const char **argv, const char *prefix)
 {
struct commit **rev;
@@ -100,6 +169,8 @@ int cmd_merge_base(int argc, const char **argv, const char 
*prefix)
N_(list revs not reachable from others), 'r'),
OPT_CMDMODE(0, is-ancestor, cmdmode,
N_(is the first one ancestor of the other?), 'a'),
+   OPT_CMDMODE(0, reflog, cmdmode,
+   N_(find where commit forked from reflog of 
ref), 'g'),
OPT_END()
};
 
@@ -120,6 +191,12 @@ int cmd_merge_base(int

Re: git grep: search whole tree by default?

Jeff King p...@peff.net writes:

 That would also provide people who do not like the change of default an
 escape hatch to keep the current behavior. And I do not think scripted
 use will be inconvenienced; they will already have to use . or :/ to
 be explicit (if they care) since the behavior is changing.

There is a big difference between scripted use will have an escape
hatch and scripted use will not be inconvenienced.  We *know*
scripts will be inconvenienced with or without such a configuration
variable, as they *have* to be updated if they rely on the current
behaviour of git grep that limits its search to the current
directory when fed no pathspec (and if their users want to keep the
current behaviour of such scripts).  Anything short of a warning (or
even erroring out) that is designed to annoy the users during the
transition period will help ease the pain of transition of scripts.

An annoying warning still can only *ease*, but cannot eliminate, the
pain of transition. The scripts need to be updated to adjust to the
new behaviour; there is no getting around to it.

Even if we ignore the helping your colleague at her terminal, cf.

http://thread.gmane.org/gmane.comp.version-control.git/133570/focus=133683

issue for now, adding a new configuration variable from day one
makes the transition of scripts somewhat worse, I am afraid.  Doing
so robs us a way to add such an annoying warning to help people
foresee problems in their existing scripts before the default
changes (the configuration presumably will disable the this command
line will behave differently after the default changes warning).

As I said, I think we can train people without an annoying warning,
as hits outside their current directory will serve as an annoyance
already, and people who set such a configuration in their repository
(or $HOME/.gitconfig), get used to the chosen behaviour too much,
and get surprised when they get to use a vanilla intallation of Git
(either helping colleague or setting up a new work environment) have
only themselves to blame, so it may not be too big a deal.

But I do not think the same reasoning extends to scripted uses X-.


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] entry.c: convert write_entry to use strbuf

Duy Nguyen pclo...@gmail.com writes:

 On Thu, Oct 24, 2013 at 12:52 AM, Junio C Hamano gits...@pobox.com wrote:
 Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 The strcpy call in open_output_fd() implies that the output buffer
 must be at least 25 chars long.

 Hmph, where does that 25 come from?

 [snipped]

 Much better. Thanks.

So where does that 25 come from?

We strcpy .merge_link_XX or .merge_file_XX into path[]
and run mkstemp() on it, and these templates are 18 bytes long, so I
am puzzled.

Is 25 just a small random number that is surely longer than these
templates--did not bother to count how long the templates are?
That's fine by me; I am just trying to make sure I am not missing
anything that turns these templates into a longer filename.

Thanks.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 02/15] t5510: prepare test refs more straightforwardly

Michael Haggerty mhag...@alum.mit.edu writes:

 As long as you have checked that our local 'master' should be at the
 same commit as the origin's 'master' at this point, I think this
 change is OK.

 It doesn't matter what the reference points at; the only point of these
 tests is to check whether branches that look like stale remote-tracking
 branches are pruned or not by the later command.

OK, thanks.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: RFE: support change-id generation natively

Johannes Sixt j.s...@viscovery.net writes:

Am 10/24/2013 7:25, schrieb Duy Nguyen:
On Thu, Oct 24, 2013 at 11:11 AM, Nasser Grainawi nas...@codeaurora.org
wrote:
It is not clear to me how you envision to make it work.

I don't have the source code.

Now you do:
https://gerrit.googlesource.com/gerrit/+/master/gerrit-server/src/main/resources/com/google/gerrit/server/tools/root/hooks/commit-msg

Thanks. So you do have tree sha-1 by running git write-tree. But at
that point I'm not sure if cache-tree is written down to disk yet, so
write-tree could be more expensive than necessary (one good point for
building --change-id in).

Consider that I make a commit with a change-id. Then I rewrite the commit,
but keep the change-id. Then I push the rewritten commit to Gerrit. Gerrit
does not have the objects that the change-id is based on; the change-id is
just a random number and has no other significance. Right?

Why do you go all the length in computing a change-id instead of just
pulling 20 bytes from /dev/random?

Very good point.

The quoted script does not necessarily give the right commit object
name at least under three scenarios:

- when we would need to add encoding header, etc.;

- when we are recording merges (perhaps merges will not get rebased
in Gerrit workflow and it does not matter what random garbage
this script added to them).

- when we record the commit after 1-sec boundary since _gen_ChangeIdInput
in the script was called.

I wouldn't call the script buggy, but I tend to agree with you
that it is an unnecessarily more complex way to spell grab 20
random bytes ;-)

That said, I don't think that --change-id option that the user must not
forget to use is any better than a hook that the user must not forget to
install.

That is why I said this in my first response to this thread:

... We may even want to
introduce commit.changeId boolean configuration variable if we did
so.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH] http.c: Spell the null pointer as NULL

2013-10-24 Thread Ramsay Jones


Commit 1bbcc224 (http: refactor options to http_get_*, 28-09-2013)
changed the type of final 'options' argument of the http_get_file()
function from an int to an 'struct http_get_options' pointer.
However, it neglected to update the (single) call site. Since this
call was passing '0' to that argument, it was (correctly) being
interpreted as a null pointer. Change to argument to NULL.

Noticed by sparse. (Using plain integer as NULL pointer)

Signed-off-by: Ramsay Jones ram...@ramsay1.demon.co.uk
---

Hi Junio,

This is a repost of:

  http://article.gmane.org/gmane.comp.version-control.git/236201

I suspect that this simply fell through the cracks ... (if not,
please let me know ;-)

ATB,
Ramsay Jones

 http.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/http.c b/http.c
index 96d7578..b133ffd 100644
--- a/http.c
+++ b/http.c
@@ -1045,7 +1045,7 @@ static char *fetch_pack_index(unsigned char *sha1, const 
char *base_url)
strbuf_addf(buf, %s.temp, sha1_pack_index_name(sha1));
tmp = strbuf_detach(buf, NULL);
 
-   if (http_get_file(url, tmp, 0) != HTTP_OK) {
+   if (http_get_file(url, tmp, NULL) != HTTP_OK) {
error(Unable to get pack index %s, url);
free(tmp);
tmp = NULL;
-- 
1.8.4
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/19] pack bitmaps

Jeff King p...@peff.net writes:

 A similar series has been running on github.com for the past couple of
 months, though not every repository has had bitmaps turned on (but some
 very busy ones have).  We've hopefully squeezed out all of the bugs and
 corner cases over that time. However, I did rebase this on a more modern
 version of master; among other conflicts, this required porting the
 git-repack changes from shell to C. So it's entirely possible I've
 introduced new bugs. :)

 The idea and original implementation for bitmaps comes from Shawn and
 Colby, of course. The hard work in this series was done by Vicent Marti,
 and he is credited as the author in most of the patches. I've added some
 window dressing and helped a little with debugging and review. But along
 with Vicent, I should be able to help with answering questions for
 review, and as time goes on, I'm familiar enough with the code to deal
 with bugs and reviewing future changes.

Woo-hoo.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] rebase: use reflog to find common base with upstream

On Mon, Oct 21, 2013 at 10:40:22PM -0700, Martin von Zweigbergk wrote:
 On Wed, Oct 16, 2013 at 11:53 AM, John Keeping j...@keeping.me.uk wrote:
  Commit 15a147e (rebase: use @{upstream} if no upstream specified,
  2011-02-09) says:
 
  Make it default to 'git rebase @{upstream}'. That is also what
  'git pull [--rebase]' defaults to, so it only makes sense that
  'git rebase' defaults to the same thing.
 
  but that isn't actually the case.  Since commit d44e712 (pull: support
  rebased upstream + fetch + pull --rebase, 2009-07-19), pull has actually
  chosen the most recent reflog entry which is an ancestor of the current
  branch if it can find one.
 
  Change rebase so that it uses the same logic.
 
  Signed-off-by: John Keeping j...@keeping.me.uk
  ---
   git-rebase.sh | 8 
   t/t3400-rebase.sh | 6 --
   2 files changed, 12 insertions(+), 2 deletions(-)
 
  diff --git a/git-rebase.sh b/git-rebase.sh
  index 226752f..fd36cf7 100755
  --- a/git-rebase.sh
  +++ b/git-rebase.sh
  @@ -437,6 +437,14 @@ then
  error_on_missing_default_upstream rebase rebase 
  \
  against git rebase branch
  fi
  +   for reflog in $(git rev-list -g $upstream_name 
  2/dev/null)
  +   do
  +   if test $reflog = $(git merge-base $reflog 
  HEAD)
  +   then
  +   upstream_name=$reflog
  +   break
  +   fi
  +   done
  ;;
  *)  upstream_name=$1
  shift
 
 A little later, onto_name gets assigned like so:
 
   onto_name=${onto-$upstream_name}
 
 So if upstream_name was set above, then onto would get the same value,
 which is not what we want, right? It seems like this block of code
 should come a bit later.
 
 I also think it not be run only when rebase was run without a given
 upstream. If the configured upstream is origin/master, it seems like
 it would be surprising to get different behavior from git rebase and
 git rebase origin/master.

Hmm... I think you're right.  I originally didn't want to change the
behaviour when the user specifies a branch, but in this case we want to
look for a common ancestor in the reflog of the upstream regardless of
where the ref came from.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/15] fetch --tags: fetch tags in addition to other stuff

Michael Haggerty mhag...@alum.mit.edu writes:

 Previously, fetch's --tags option was considered equivalent to
 specifying the refspec refs/tags/*:refs/tags/* on the command line;
 in particular, it caused the remote.name.refspec configuration to be
 ignored.

 But it is not very useful to fetch tags without also fetching other
 references, whereas it *is* quite useful to be able to fetch tags *in
 addition to* other references.

True but when fetching other references, tags relevant to the
history being fetched by default should automatically follow, so the
above explains why fetch --tags is not a useful thing to do daily.

 So change the semantics of this option
 to do the latter.

And I am not opposed to that change in the semantics; it makes an
operation that is not so useful less annoying.

 If a user wants to fetch *only* tags, then it is still possible to
 specifying an explicit refspec:

 git fetch remote 'refs/tags/*:refs/tags/*'

 Please note that the documentation prior to 1.8.0.3 was ambiguous
 about this aspect of fetch --tags behavior.  Commit

 f0cb2f137c 2012-12-14 fetch --tags: clarify documentation

 made the documentation match the old behavior.  This commit changes
 the documentation to match the new behavior.

The old behaviour as designed.  The documentation change was not
about refusing to fix a bug but the above makes it sound as if it
were a such commit.

 The change to builtin/fetch.c:get_ref_map() has the side effect of
 changing the order that the (struct ref) objects are listed in
 ref_map.  It seems to me that this could probably only matter in the
 case of duplicates.  But later ref_remove_duplicates() is called, so
 duplicates are eliminated.  However, ref_remove_duplicates() always
 retains the first instance of duplicates and discards the rest.  It is
 conceivable that the boolean flags (which are not inspected by
 ref_remove_duplicates()) could differ among the duplicates, and that
 therefore changing the order of the items in the original list has the
 effect of changing the flags on the items that are retained.

 I don't know this code well enough to judge whether this might be a
 problem.

 If it is, then the correct approach is probably either to teach
 ref_remove_duplicates() to ensure that the flags are also consistent
 across duplicates, or to somehow combine the flags from all duplicates
 into the one that is retained.  Please let me know if this is needed.

I do not think either of these two is sufficient if you want to fix
it; ref_remove_duplicates() needs to be taught to retain the first
one it encounters among the duplicates, not ensure the flags are
consistent by erroring out when they are not among duplicates, nor
somehow combine flags from later one to the first one.

But I suspect that, if this behaviour change were a problem, such a
configuration was a problem before this change to most people; the
order of duplicated [remote foo] section would not be under
control of those who only use git remote without using an editor
to tweak .git/config file anyway. So I do not think this regression
is too big a deal; it is something you can fix later on top.

 diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
 index ba1fe49..0e6d2ac 100644
 --- a/Documentation/fetch-options.txt
 +++ b/Documentation/fetch-options.txt
 @@ -61,11 +61,9 @@ endif::git-pull[]
  ifndef::git-pull[]
  -t::
  --tags::
 - This is a short-hand for giving `refs/tags/*:refs/tags/*`
 - refspec from the command line, to ask all tags to be fetched
 - and stored locally.  Because this acts as an explicit
 - refspec, the default refspecs (configured with the
 - remote.$name.fetch variable) are overridden and not used.
 + This is a short-hand requesting that all tags be fetched from
 + the remote in addition to whatever else is being fetched.  It
 + is similar to using the refspec `refs/tags/*:refs/tags/*`.

This is no longer a short-hand, is it?  There is no other way to ask
fetch the usual stuff, and then refs/tags/*:refs/tags/* as well.

It should be sufficient for me to locally do:

s/This is a short-hand requesting/Request/;

I think.

 diff --git a/git-pull.sh b/git-pull.sh
 index b946fd9..dac7e1c 100755
 --- a/git-pull.sh
 +++ b/git-pull.sh
 @@ -172,7 +172,7 @@ error_on_no_merge_candidates () {
   do
   case $opt in
   -t|--t|--ta|--tag|--tags)
 - echo Fetching tags only, you probably meant:
 + echo It doesn't make sense to pull tags; you probably 
 meant:

s/pull tags/pull all tags/; perhaps?

   echo   git fetch --tags
   exit 1
   esac
 diff --git a/t/t5510-fetch.sh b/t/t5510-fetch.sh
 index 8328be1..02e5901 100755
 --- a/t/t5510-fetch.sh
 +++ b/t/t5510-fetch.sh
 @@ -113,7 +113,7 @@ test_expect_success 'fetch --prune with a namespace keeps 
 other namespaces' '
   git rev-parse origin/master
  '

Re: [PATCH 0/2] finding the fork point from reflog entries

On Thu, Oct 24, 2013 at 12:11:22PM -0700, Junio C Hamano wrote:
 The first one is a clean-up of the code to parse command line
 options to git merge-base.  Options such as --independent,
 --is-ancestor and --octopus are mutually exclusive and it is
 better expressed in terms of the recently introduced OPT_CMDMODE.
 
 The second one implements the entire logic of the for loop we see in
 git pull --rebase directly using get_merge_bases_many() and
 postprocessing the result.

Nice!  I tried this in the case where the target commit happens to be
the 63rd reflog entry:

$ time sh -c 'for rev in $(git rev-list -g origin/master 2/dev/null)
do
git merge-base --is-ancestor $rev b2edae0  break
done
'

real0m3.772s
user0m3.338s
sys 0m0.440s

$ time git merge-base --reflog origin/master b2edae0

real0m0.156s
user0m0.138s
sys 0m0.018s


 Junio C Hamano (2):
   merge-base: use OPT_CMDMODE and clarify the command line parsing
   merge-base: --reflog mode finds fork point from reflog entries
 
  builtin/merge-base.c  | 115 
 +++---
  t/t6010-merge-base.sh |  27 
  2 files changed, 126 insertions(+), 16 deletions(-)

Should there be a change to Documentation/git-merge-base.txt in here?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] http.c: Spell the null pointer as NULL

2013-10-24 Thread Jonathan Nieder

Ramsay Jones wrote:

 Commit 1bbcc224 (http: refactor options to http_get_*, 28-09-2013)
 changed the type of final 'options' argument of the http_get_file()
 function from an int to an 'struct http_get_options' pointer.
 However, it neglected to update the (single) call site. Since this
 call was passing '0' to that argument, it was (correctly) being
 interpreted as a null pointer. Change to argument to NULL.

Jeff King wrote:

 Thanks, patch is obviously correct (and the cause was just oversight on
 my part).

For what it's worth, against the jk/http-auth-redirects branch,

Reviewed-by: Jonathan Nieder jrnie...@gmail.com
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] merge-base: --reflog mode finds fork point from reflog entries

2013-10-24 Thread Eric Sunshine

On Thu, Oct 24, 2013 at 3:11 PM, Junio C Hamano gits...@pobox.com wrote:
 diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
 index f80bba8..3a1abee 100755
 --- a/t/t6010-merge-base.sh
 +++ b/t/t6010-merge-base.sh
 @@ -230,4 +230,31 @@ test_expect_success 'criss-cross merge-base for 
 octopus-step' '
 test_cmp expected.sorted actual.sorted
  '

 +test_expect_success 'using reflog to find the fork point' '
 +   git reset --hard 
 +   git checkout -b base $E 
 +   (
 +   for count in 1 2 3 4 5
 +   do
 +   git commit --allow-empty -m Base commit #$count 
 +   git rev-parse HEAD expect$count 
 +   git checkout -B derived 
 +   git commit --allow-empty -m Derived #$count 
 +   git rev-parse HEAD derived$count 
 +   git checkout base 
 +   count=$(( $count + 1 )) || exit 1
 +   done

Did you want  here?

 +
 +   for count in 1 2 3 4 5
 +   do
 +   git merge-base --reflog base $(cat derived$count) 
 actual 
 +   test_cmp expect$count actual || exit 1
 +   done

And here?

 +
 +   # check defaulting to HEAD
 +   git merge-base --reflog base actual 
 +   test_cmp expect5 actual
 +   )
 +'
 +
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 11/15] fetch --prune: prune only based on explicit refspecs

Michael Haggerty mhag...@alum.mit.edu writes:

 ...
 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu

Everything in the proposed log message made sense to me.

 diff --git a/Documentation/config.txt b/Documentation/config.txt
 index d4d93c9..83c1700 100644
 --- a/Documentation/config.txt
 +++ b/Documentation/config.txt
 @@ -2086,7 +2086,7 @@ remote.name.vcs::
  remote.name.prune::
   When set to true, fetching from this remote by default will also
   remove any remote-tracking branches which no longer exist on the
 - remote (as if the `--prune` option was give on the command line).
 + remote (as if the `--prune` option was given on the command line).

Shouldn't we stop saying branches here?

 diff --git a/Documentation/fetch-options.txt b/Documentation/fetch-options.txt
 index 0e6d2ac..5d12219 100644
 --- a/Documentation/fetch-options.txt
 +++ b/Documentation/fetch-options.txt
 @@ -41,8 +41,14 @@ ifndef::git-pull[]
  
  -p::
  --prune::
 - After fetching, remove any remote-tracking branches which
 - no longer exist on the remote.
 + After fetching, remove any remote-tracking branches that

Likewise.  This is a lot more important than the one in
remote.name.prune documentation, as the next sentence Tags are
not subject to ... implies that they fall into the same category as
what gets pruned here, i.e. remote-tracking branches in the above
sentence, but nobody calls refs/tags/v1.0.0 a remote-tracking
branch even if it came from your 'origin'.

 + no longer exist on the remote.  Tags are not subject to
 + pruning in the usual case that they are fetched because of the
 + --tags option or remote.name.tagopt.  

We should mention the most usual case tags are fetched, before
mentioning the case the unusual option --tags was used from the
command line or .tagopt configuration was used.  Namely, when the
tags are automatically followed.

 + However, if tags are
 + fetched due to an explicit refspec (either on the command line
 + or in the remote configuration, for example if the remote was
 + cloned with the --mirror option), then they are also subject
 + to pruning.

Very good.

 @@ -63,7 +69,10 @@ ifndef::git-pull[]
  --tags::
   This is a short-hand requesting that all tags be fetched from
   the remote in addition to whatever else is being fetched.  It
 - is similar to using the refspec `refs/tags/*:refs/tags/*`.
 + is similar to using the refspec `refs/tags/*:refs/tags/*`,
 + except that it doesn't subject tags to pruning, regardless of
 + a --prune option or the configuration settings of fetch.prune
 + or remote.name.prune.

Using --tags is not similar to using refs/tags/*:refs/tags/* after
the previous patch already; git fetch origin --tags and git fetch
origin refs/tags/*:refs/tags/* are vastly different and that was
the whole point of the previous step.  And that calling something
not so similar similar needs to be fixed further here to clarify
that they are not similar in yet another way.

We should just lose It is similar to using from 10/15 and start
over, perhaps?  Add the first paragraph of the below in 10/15 and
add the rest in 11/15, or something.

--tags::
Request that all tags be fetched from the remote
under the same name (i.e. `refs/tags/X` is created in
our repository by copying their `refs/tags/X`), in
addition to whatever is fetched by the same `git
fetch` command without this option on the command
line.
+
When `refs/tags/*` hierarchy from the remote is copied only
because this option was given, they are not subject to be
pruned when `--prune` option (or configuration variables
like `fetch.prune` or `remote.name.prune`) is in effect.

That would make it clear that they are subject to pruning when --mirror
or an explicit refspec refs/tags/*:refs/tags/* is given, as tags are
not fetched only because of --tags in such cases.

 diff --git a/builtin/fetch.c b/builtin/fetch.c
 index 7edb1ea..47b63a7 100644
 --- a/builtin/fetch.c
 +++ b/builtin/fetch.c
 @@ -829,38 +829,17 @@ static int do_fetch(struct transport *transport,
   goto cleanup;
   }
   if (prune) {
 - struct refspec *prune_refspecs;
 - int prune_refspec_count;
 -
 + /*
 +  * We only prune based on refspecs specified
 +  * explicitly (via command line or configuration); we
 +  * don't care whether --tags was specified.
 +  */
   if (ref_count) {
 - prune_refspecs = refs;
 - prune_refspec_count = ref_count;
 - } else {
 - prune_refspecs = transport-remote-fetch;
 - prune_refspec_count = 
 transport-remote-fetch_refspec_nr;
 - }
 -
 - if (tags == TAGS_SET) {
 -

Re: [PATCH 0/2] finding the fork point from reflog entries

John Keeping j...@keeping.me.uk writes:

 On Thu, Oct 24, 2013 at 12:11:22PM -0700, Junio C Hamano wrote:
 The first one is a clean-up of the code to parse command line
 options to git merge-base.  Options such as --independent,
 --is-ancestor and --octopus are mutually exclusive and it is
 better expressed in terms of the recently introduced OPT_CMDMODE.
 
 The second one implements the entire logic of the for loop we see in
 git pull --rebase directly using get_merge_bases_many() and
 postprocessing the result.

 Nice!  I tried this in the case where the target commit happens to be
 the 63rd reflog entry:

 $ time sh -c 'for rev in $(git rev-list -g origin/master 2/dev/null)
 do
 git merge-base --is-ancestor $rev b2edae0  break
 done
 '

 real0m3.772s
 user0m3.338s
 sys 0m0.440s

 $ time git merge-base --reflog origin/master b2edae0

 real0m0.156s
 user0m0.138s
 sys 0m0.018s

The real question is if the C code computes the same as the shell
loop.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 15/15] fetch, remote: properly convey --no-prune options to subprocesses

I finished reading thru the remainder and all looked good (again
except minor nits I sent reviews for separately).

Thanks, will queue.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] merge-base: --reflog mode finds fork point from reflog entries

Eric Sunshine sunsh...@sunshineco.com writes:

 On Thu, Oct 24, 2013 at 3:11 PM, Junio C Hamano gits...@pobox.com wrote:
 diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
 index f80bba8..3a1abee 100755
 --- a/t/t6010-merge-base.sh
 +++ b/t/t6010-merge-base.sh
 @@ -230,4 +230,31 @@ test_expect_success 'criss-cross merge-base for 
 octopus-step' '
 test_cmp expected.sorted actual.sorted
  '

 +test_expect_success 'using reflog to find the fork point' '
 +   git reset --hard 
 +   git checkout -b base $E 
 +   (
 +   for count in 1 2 3 4 5
 +   do
 +   git commit --allow-empty -m Base commit #$count 
 +   git rev-parse HEAD expect$count 
 +   git checkout -B derived 
 +   git commit --allow-empty -m Derived #$count 
 +   git rev-parse HEAD derived$count 
 +   git checkout base 
 +   count=$(( $count + 1 )) || exit 1
 +   done

 Did you want  here?

No, I did not.  Can't you tell from the fact that I didn't put one
there ;-)?

It does not hurt to have one there, but it is not necessary.

Because the loop itself does not -cascade from anything else, the
only case anything after done  would be skipped and making the
whole thing fail would be when anything inside the loop fails, but
we already exit 1 to terminate the whole subprocess in that case,
so we will not continue past the loop.

 +
 +   for count in 1 2 3 4 5
 +   do
 +   git merge-base --reflog base $(cat derived$count) 
 actual 
 +   test_cmp expect$count actual || exit 1
 +   done

 And here?

Likewise.

Thanks.

 +
 +   # check defaulting to HEAD
 +   git merge-base --reflog base actual 
 +   test_cmp expect5 actual
 +   )
 +'
 +
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] finding the fork point from reflog entries

On Thu, Oct 24, 2013 at 02:20:29PM -0700, Junio C Hamano wrote:
 John Keeping j...@keeping.me.uk writes:
 
  On Thu, Oct 24, 2013 at 12:11:22PM -0700, Junio C Hamano wrote:
  The first one is a clean-up of the code to parse command line
  options to git merge-base.  Options such as --independent,
  --is-ancestor and --octopus are mutually exclusive and it is
  better expressed in terms of the recently introduced OPT_CMDMODE.
  
  The second one implements the entire logic of the for loop we see in
  git pull --rebase directly using get_merge_bases_many() and
  postprocessing the result.
 
  Nice!  I tried this in the case where the target commit happens to be
  the 63rd reflog entry:
 
  $ time sh -c 'for rev in $(git rev-list -g origin/master 2/dev/null)
  do
  git merge-base --is-ancestor $rev b2edae0  break
  done
  '
 
  real0m3.772s
  user0m3.338s
  sys 0m0.440s
 
  $ time git merge-base --reflog origin/master b2edae0
 
  real0m0.156s
  user0m0.138s
  sys 0m0.018s
 
 The real question is if the C code computes the same as the shell
 loop.

And in fact it doesn't - if you replace the break with echo $rev the
shell version prints b2edae0... but the C version prints nothing (and
exists with status 1).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Feature: support for file permissions

2013-10-24 Thread Renich Bon Ciric

Hello,

I think file permissions are really important for source code as well.

For example, in web development, you want configuration files to be
read only; specially if you're deploying with git.

Also, you would want some executable file to not be writable ; but
executable by world.

Permission support would become really handy when using GIT_WORK_TREE
on a bare repo.

Please, consider supporting them.

-- 
It's hard to be free... but I love to struggle. Love isn't asked for;
it's just given. Respect isn't asked for; it's earned!
Renich Bon Ciric

http://www.woralelandia.com/
http://www.introbella.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] finding the fork point from reflog entries

On Thu, Oct 24, 2013 at 10:31:35PM +0100, John Keeping wrote:
 On Thu, Oct 24, 2013 at 02:20:29PM -0700, Junio C Hamano wrote:
  John Keeping j...@keeping.me.uk writes:
  
   On Thu, Oct 24, 2013 at 12:11:22PM -0700, Junio C Hamano wrote:
   The first one is a clean-up of the code to parse command line
   options to git merge-base.  Options such as --independent,
   --is-ancestor and --octopus are mutually exclusive and it is
   better expressed in terms of the recently introduced OPT_CMDMODE.
   
   The second one implements the entire logic of the for loop we see in
   git pull --rebase directly using get_merge_bases_many() and
   postprocessing the result.
  
   Nice!  I tried this in the case where the target commit happens to be
   the 63rd reflog entry:
  
   $ time sh -c 'for rev in $(git rev-list -g origin/master 2/dev/null)
   do
   git merge-base --is-ancestor $rev b2edae0  break
   done
   '
  
   real0m3.772s
   user0m3.338s
   sys 0m0.440s
  
   $ time git merge-base --reflog origin/master b2edae0
  
   real0m0.156s
   user0m0.138s
   sys 0m0.018s
  
  The real question is if the C code computes the same as the shell
  loop.
 
 And in fact it doesn't - if you replace the break with echo $rev the
 shell version prints b2edae0... but the C version prints nothing (and
 exists with status 1).

To clarify: the particular commit in the calls above happens to be the
oldest entry in the reflog, if I pick a newer entry then it works.

It seems that for_each_reflog_ent isn't returning the oldest entry;
revs.nr is 62 whereas git rev-list -g origin/master | wc -l gives 63.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] merge-base: --reflog mode finds fork point from reflog entries

2013-10-24 Thread Eric Sunshine

On Thu, Oct 24, 2013 at 5:26 PM, Junio C Hamano gits...@pobox.com wrote:
 Eric Sunshine sunsh...@sunshineco.com writes:

 On Thu, Oct 24, 2013 at 3:11 PM, Junio C Hamano gits...@pobox.com wrote:
 diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
 index f80bba8..3a1abee 100755
 --- a/t/t6010-merge-base.sh
 +++ b/t/t6010-merge-base.sh
 @@ -230,4 +230,31 @@ test_expect_success 'criss-cross merge-base for 
 octopus-step' '
 test_cmp expected.sorted actual.sorted
  '

 +test_expect_success 'using reflog to find the fork point' '
 +   git reset --hard 
 +   git checkout -b base $E 
 +   (
 +   for count in 1 2 3 4 5
 +   do
 +   git commit --allow-empty -m Base commit #$count 
 +   git rev-parse HEAD expect$count 
 +   git checkout -B derived 
 +   git commit --allow-empty -m Derived #$count 
 +   git rev-parse HEAD derived$count 
 +   git checkout base 
 +   count=$(( $count + 1 )) || exit 1
 +   done

 Did you want  here?

 No, I did not.  Can't you tell from the fact that I didn't put one
 there ;-)?

 It does not hurt to have one there, but it is not necessary.

 Because the loop itself does not -cascade from anything else, the
 only case anything after done  would be skipped and making the
 whole thing fail would be when anything inside the loop fails, but
 we already exit 1 to terminate the whole subprocess in that case,
 so we will not continue past the loop.

I didn't read inside the loop closely enough to see that. Sorry for the noise.


 +
 +   for count in 1 2 3 4 5
 +   do
 +   git merge-base --reflog base $(cat derived$count) 
 actual 
 +   test_cmp expect$count actual || exit 1
 +   done

 And here?

 Likewise.

 Thanks.

 +
 +   # check defaulting to HEAD
 +   git merge-base --reflog base actual 
 +   test_cmp expect5 actual
 +   )
 +'
 +
  test_done
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] http.c: Spell the null pointer as NULL

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] finding the fork point from reflog entries

On Thu, Oct 24, 2013 at 10:40:07PM +0100, John Keeping wrote:
 On Thu, Oct 24, 2013 at 10:31:35PM +0100, John Keeping wrote:
  On Thu, Oct 24, 2013 at 02:20:29PM -0700, Junio C Hamano wrote:
   John Keeping j...@keeping.me.uk writes:
   
On Thu, Oct 24, 2013 at 12:11:22PM -0700, Junio C Hamano wrote:
The first one is a clean-up of the code to parse command line
options to git merge-base.  Options such as --independent,
--is-ancestor and --octopus are mutually exclusive and it is
better expressed in terms of the recently introduced OPT_CMDMODE.

The second one implements the entire logic of the for loop we see in
git pull --rebase directly using get_merge_bases_many() and
postprocessing the result.
   
Nice!  I tried this in the case where the target commit happens to be
the 63rd reflog entry:
   
$ time sh -c 'for rev in $(git rev-list -g origin/master 2/dev/null)
do
git merge-base --is-ancestor $rev b2edae0  break
done
'
   
real0m3.772s
user0m3.338s
sys 0m0.440s
   
$ time git merge-base --reflog origin/master b2edae0
   
real0m0.156s
user0m0.138s
sys 0m0.018s
   
   The real question is if the C code computes the same as the shell
   loop.
  
  And in fact it doesn't - if you replace the break with echo $rev the
  shell version prints b2edae0... but the C version prints nothing (and
  exists with status 1).
 
 To clarify: the particular commit in the calls above happens to be the
 oldest entry in the reflog, if I pick a newer entry then it works.
 
 It seems that for_each_reflog_ent isn't returning the oldest entry;
 revs.nr is 62 whereas git rev-list -g origin/master | wc -l gives 63.

The following patch on top fixes it, but I'm sure it can be done in a
neater way.

-- 8 --
diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index 7b9bc15..f6f1f14 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -98,7 +98,17 @@ static int collect_one_reflog_ent(unsigned char *osha1, 
unsigned char *nsha1,
  int tz, const char *message, void *cbdata_)
 {
struct rev_collect *revs = cbdata_;
-   struct commit *commit = lookup_commit(nsha1);
+   struct commit *commit;
+
+   if (!revs-nr) {
+   commit = lookup_commit(osha1);
+   if (commit) {
+   ALLOC_GROW(revs-commit, revs-nr + 1, revs-alloc);
+   revs-commit[revs-nr++] = commit;
+   }
+   }
+
+   commit = lookup_commit(nsha1);
if (commit) {
ALLOC_GROW(revs-commit, revs-nr + 1, revs-alloc);
revs-commit[revs-nr++] = commit;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] merge-base: --reflog mode finds fork point from reflog entries

Eric Sunshine sunsh...@sunshineco.com writes:

 On Thu, Oct 24, 2013 at 5:26 PM, Junio C Hamano gits...@pobox.com wrote:
 Eric Sunshine sunsh...@sunshineco.com writes:

 On Thu, Oct 24, 2013 at 3:11 PM, Junio C Hamano gits...@pobox.com wrote:
 diff --git a/t/t6010-merge-base.sh b/t/t6010-merge-base.sh
 index f80bba8..3a1abee 100755
 --- a/t/t6010-merge-base.sh
 +++ b/t/t6010-merge-base.sh
 @@ -230,4 +230,31 @@ test_expect_success 'criss-cross merge-base for 
 octopus-step' '
 test_cmp expected.sorted actual.sorted
  '

 +test_expect_success 'using reflog to find the fork point' '
 +   git reset --hard 
 +   git checkout -b base $E 
 +   (
 +   for count in 1 2 3 4 5
 +   do
 +   git commit --allow-empty -m Base commit #$count 
 
 +   git rev-parse HEAD expect$count 
 +   git checkout -B derived 
 +   git commit --allow-empty -m Derived #$count 
 +   git rev-parse HEAD derived$count 
 +   git checkout base 
 +   count=$(( $count + 1 )) || exit 1
 +   done

 Did you want  here?

 No, I did not.  Can't you tell from the fact that I didn't put one
 there ;-)?

 It does not hurt to have one there, but it is not necessary.

 Because the loop itself does not -cascade from anything else, the
 only case anything after done  would be skipped and making the
 whole thing fail would be when anything inside the loop fails, but
 we already exit 1 to terminate the whole subprocess in that case,
 so we will not continue past the loop.

 I didn't read inside the loop closely enough to see that. Sorry
 for the noise.

Heh, you helped me realize that the above was suboptimal, and it
wasn't a noise ;-)

We could do it this way without the subshell and the exit, I would
think.

test_expect_success 'using reflog to find the fork point' '
git reset --hard 
git checkout -b base $E 
for count in 1 2 3 4 5
do
git commit --allow-empty -m Base commit #$count 
git rev-parse HEAD expect$count 
git checkout -B derived 
git commit --allow-empty -m Derived #$count 
git rev-parse HEAD derived$count 
git checkout base 
count=$(( $count + 1 )) || break
done 

for count in 1 2 3 4 5
do
git merge-base --reflog base $(cat derived$count) actual 
test_cmp expect$count actual || break
done 

# check defaulting to HEAD
git merge-base --reflog base actual 
test_cmp expect5 actual
'

To the earlier code, somebody could add

(
+   more setup code 
+   yet more setup code 
for count in 1 2 3 4 5

inside the subshell and we would fail to notice fairlure from the
new setup code if we didn't have  after the first done.  There
is much less risk of that kind of breakage if we did the loop
without subshell and exit and instead with the usual -cascade.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 2/2] merge-base: --reflog mode finds fork point from reflog entries

The git pull --rebase command computes the fork point of the
branch being rebased using the reflog entries of the base branch
(typically a remote-tracking branch) the branch's work was based on,
in order to cope with the case in which the base branch has been
rewound and rebuilt.  For example, if the history looked like this:

 o---B1
/
---o---o---B2--o---o---o---Base
\
 B3
  \
   Derived

where the current tip of the base branch is at Base, but earlier
fetch observed that its tip used to be B3 and then B2 and then B1
before getting to the current commit, and the branch being rebased
on top of the latest base is based on commit B3, it tries to find
B3 by going through the output of git rev-list --reflog base (i.e.
Base, B1, B2, B3) until it finds a commit that is an ancestor of the
current tip Derived.

Internally, we have get_merge_bases_many() that can compute this
with one-go.  We would want a merge-base between Derived and a
fictitious merge commit that would result by merging all the
historical tips of base.  When such a commit exist, we should get
a single result, which exactly match one of the reflog entries of
base.

Teach git merge-base a new mode, --reflog, to compute exactly
that.

Helped-by: Eric Sunshine sunsh...@sunshineco.com
Signed-off-by: Junio C Hamano gits...@pobox.com
---

 * With updated tests, based on conversation with Eric Sunshine

 builtin/merge-base.c  | 77 +++
 t/t6010-merge-base.sh | 25 +
 2 files changed, 102 insertions(+)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index d39c910..7b9bc15 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -1,6 +1,7 @@
 #include builtin.h
 #include cache.h
 #include commit.h
+#include refs.h
 #include parse-options.h
 
 static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
@@ -27,6 +28,7 @@ static const char * const merge_base_usage[] = {
N_(git merge-base [-a|--all] --octopus commit...),
N_(git merge-base --independent commit...),
N_(git merge-base --is-ancestor commit commit),
+   N_(git merge-base --reflog ref [commit]),
NULL
 };
 
@@ -85,6 +87,73 @@ static int handle_is_ancestor(int argc, const char **argv)
return 1;
 }
 
+struct rev_collect {
+   struct commit **commit;
+   int nr;
+   int alloc;
+};
+
+static int collect_one_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
+ const char *ident, unsigned long timestamp,
+ int tz, const char *message, void *cbdata_)
+{
+   struct rev_collect *revs = cbdata_;
+   struct commit *commit = lookup_commit(nsha1);
+   if (commit) {
+   ALLOC_GROW(revs-commit, revs-nr + 1, revs-alloc);
+   revs-commit[revs-nr++] = commit;
+   }
+   return 0;
+}
+
+static int handle_reflog(int argc, const char **argv)
+{
+   unsigned char sha1[20];
+   char *refname;
+   const char *commitname;
+   struct rev_collect revs;
+   struct commit *derived;
+   struct commit_list *bases;
+   int i;
+
+   switch (dwim_ref(argv[0], strlen(argv[0]), sha1, refname)) {
+   case 0:
+   die(No such ref: '%s', argv[0]);
+   case 1:
+   break; /* good */
+   default:
+   die(Ambiguous refname: '%s', argv[0]);
+   }
+
+   commitname = (argc == 2) ? argv[1] : HEAD;
+   if (get_sha1(commitname, sha1))
+   die(Not a valid object name: '%s', commitname);
+
+   derived = lookup_commit_reference(sha1);
+   memset(revs, 0, sizeof(revs));
+   for_each_reflog_ent(refname, collect_one_reflog_ent, revs);
+
+   bases = get_merge_bases_many(derived, revs.nr, revs.commit, 0);
+
+   /*
+* There should be one and only one merge base, when we found
+* a common ancestor among reflog entries.
+*/
+   if (!bases || bases-next)
+   return 1;
+
+   /* And the found one must be one of the reflog entries */
+   for (i = 0; i  revs.nr; i++)
+   if (bases-item-object == revs.commit[i]-object)
+   break; /* found */
+   if (revs.nr = i)
+   return 1; /* not found */
+
+   printf(%s\n, sha1_to_hex(bases-item-object.sha1));
+   free_commit_list(bases);
+   return 0;
+}
+
 int cmd_merge_base(int argc, const char **argv, const char *prefix)
 {
struct commit **rev;
@@ -100,6 +169,8 @@ int cmd_merge_base(int argc, const char **argv, const char 
*prefix)
N_(list revs not reachable from others), 'r'),
OPT_CMDMODE(0, is-ancestor, cmdmode,
N_(is the first one ancestor of the other?), 'a'),
+   OPT_CMDMODE(0, reflog, cmdmode,
+   N_(find where commit

Re: Feature: support for file permissions

Renich Bon Ciric ren...@woralelandia.com writes:

 I think file permissions are really important for source code as well.

 For example, in web development, you want configuration files to be
 read only; specially if you're deploying with git.

That is not _source code_; you are talking about deployed set of
files, and as you said, Git is about source code and is not a tool
for deployment.

 Please, consider supporting them.

See the ancient discussion around this:

http://thread.gmane.org/gmane.comp.version-control.git/375/focus=435

A short answer is that it is not likely to happen.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Fwd: Bug report: reset -p HEAD

2013-10-24 Thread Maarten de Vries

Hi,

I noticed that reset -p HEAD is inconsistent with checkout -p HEAD.
When running checkout -p you are asked to discard hunks from the index
and worktree, but when running reset -p you are asked to apply hunks
to the index. It would make more sense if reset -p asked to discard
(reversed) hunks from the index.

Digging a bit further, it looks like reset -p is actually intended to
show hunks to discard when resetting to HEAD. The
git-add--interactive.perl script has different cases for resetting to
the head and for resetting to anything else. However, builtin/reset.c
always passes a hash to run_add_interactive, even if HEAD is provided
explicitly on the command line or no revision is given. As a result,
the special case for resetting to the HEAD is never triggered and
git-add--interactive.perl always asks to apply hunks rather than
discard the reverse hunks.

The offending part in builtin/reset.c is on line 307. It's the bit
with sha1_to_hex(sha1):
 if (patch_mode) {
 if (reset_type != NONE)
 die(_(--patch is incompatible with --{hard,mixed,soft}));
 return run_add_interactive(sha1_to_hex(sha1), --patch=reset, 
 pathspec);
 }

I'm not familiar enough with the git source, but it's probably a
fairly trivial fix for someone who is.


Kind regards,
Maarten de Vries


P.S.

This bit in git-add--interactive.perl convinced me that resetting to
HEAD interactively should be handled separately:
 'reset_head' = {
 DIFF = 'diff-index -p --cached',
 APPLY = sub { apply_patch 'apply -R --cached', @_; },
 APPLY_CHECK = 'apply -R --cached',
 VERB = 'Unstage',
 TARGET = '',
 PARTICIPLE = 'unstaging',
 FILTER = 'index-only',
 IS_REVERSE = 1,
 },
 'reset_nothead' = {
 DIFF = 'diff-index -R -p --cached',
 APPLY = sub { apply_patch 'apply --cached', @_; },
 APPLY_CHECK = 'apply --cached',
 VERB = 'Apply',
 TARGET = ' to index',
 PARTICIPLE = 'applying',
 FILTER = 'index-only',
 IS_REVERSE = 0,
 },
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Bug report: reset -p HEAD

2013-10-24 Thread Maarten de Vries

Some more info: It used to work as intended. Using a bisect shows it
has been broken by commit 166ec2e9.

Kinds regards,
Maarten de Vries

On 25 October 2013 01:05, Maarten de Vries maar...@de-vri.es wrote:
 Hi,

 I noticed that reset -p HEAD is inconsistent with checkout -p HEAD.
 When running checkout -p you are asked to discard hunks from the index
 and worktree, but when running reset -p you are asked to apply hunks
 to the index. It would make more sense if reset -p asked to discard
 (reversed) hunks from the index.

 Digging a bit further, it looks like reset -p is actually intended to
 show hunks to discard when resetting to HEAD. The
 git-add--interactive.perl script has different cases for resetting to
 the head and for resetting to anything else. However, builtin/reset.c
 always passes a hash to run_add_interactive, even if HEAD is provided
 explicitly on the command line or no revision is given. As a result,
 the special case for resetting to the HEAD is never triggered and
 git-add--interactive.perl always asks to apply hunks rather than
 discard the reverse hunks.

 The offending part in builtin/reset.c is on line 307. It's the bit
 with sha1_to_hex(sha1):
 if (patch_mode) {
 if (reset_type != NONE)
 die(_(--patch is incompatible with --{hard,mixed,soft}));
 return run_add_interactive(sha1_to_hex(sha1), --patch=reset, 
 pathspec);
 }

 I'm not familiar enough with the git source, but it's probably a
 fairly trivial fix for someone who is.


 Kind regards,
 Maarten de Vries


 P.S.

 This bit in git-add--interactive.perl convinced me that resetting to
 HEAD interactively should be handled separately:
 'reset_head' = {
 DIFF = 'diff-index -p --cached',
 APPLY = sub { apply_patch 'apply -R --cached', @_; },
 APPLY_CHECK = 'apply -R --cached',
 VERB = 'Unstage',
 TARGET = '',
 PARTICIPLE = 'unstaging',
 FILTER = 'index-only',
 IS_REVERSE = 1,
 },
 'reset_nothead' = {
 DIFF = 'diff-index -R -p --cached',
 APPLY = sub { apply_patch 'apply --cached', @_; },
 APPLY_CHECK = 'apply --cached',
 VERB = 'Apply',
 TARGET = ' to index',
 PARTICIPLE = 'applying',
 FILTER = 'index-only',
 IS_REVERSE = 0,
 },
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/19] ewah: compressed bitmap implementation

Jeff King p...@peff.net writes:

 From: Vicent Marti tan...@gmail.com

 EWAH is a word-aligned compressed variant of a bitset (i.e. a data
 structure that acts as a 0-indexed boolean array for many entries).

I think I've already pointed out some when the original series was
posted, but this does not seem to compile with -Wdecl-after-stmt
(possibly other violations may exist; I haven't looked very closely
yet).


 It uses a 64-bit run-length encoding (RLE) compression scheme,
 trading some compression for better processing speed.

 The goal of this word-aligned implementation is not to achieve
 the best compression, but rather to improve query processing time.
 As it stands right now, this EWAH implementation will always be more
 efficient storage-wise than its uncompressed alternative.

 EWAH arrays will be used as the on-disk format to store reachability
 bitmaps for all objects in a repository while keeping reasonable sizes,
 in the same way that JGit does.

 This EWAH implementation is a mostly straightforward port of the
 original `javaewah` library that JGit currently uses. The library is
 self-contained and has been embedded whole (4 files) inside the `ewah`
 folder to ease redistribution.

 The library is re-licensed under the GPLv2 with the permission of Daniel
 Lemire, the original author. The source code for the C version can
 be found on GitHub:

   https://github.com/vmg/libewok

 The original Java implementation can also be found on GitHub:

   https://github.com/lemire/javaewah

 Signed-off-by: Vicent Marti tan...@gmail.com
 Signed-off-by: Jeff King p...@peff.net
 ---
  Makefile   |   6 +
  ewah/bitmap.c  | 226 +
  ewah/ewah_bitmap.c | 701 
 +
  ewah/ewah_io.c | 192 +++
  ewah/ewah_rlw.c| 120 +
  ewah/ewok.h| 223 +
  ewah/ewok_rlw.h| 114 +
  7 files changed, 1582 insertions(+)
  create mode 100644 ewah/bitmap.c
  create mode 100644 ewah/ewah_bitmap.c
  create mode 100644 ewah/ewah_io.c
  create mode 100644 ewah/ewah_rlw.c
  create mode 100644 ewah/ewok.h
  create mode 100644 ewah/ewok_rlw.h

 diff --git a/Makefile b/Makefile
 index 48ff0bd..eacb940 100644
 --- a/Makefile
 +++ b/Makefile
 @@ -667,6 +667,8 @@ LIB_H += diff.h
  LIB_H += diffcore.h
  LIB_H += dir.h
  LIB_H += exec_cmd.h
 +LIB_H += ewah/ewok.h
 +LIB_H += ewah/ewok_rlw.h
  LIB_H += fetch-pack.h
  LIB_H += fmt-merge-msg.h
  LIB_H += fsck.h
 @@ -800,6 +802,10 @@ LIB_OBJS += dir.o
  LIB_OBJS += editor.o
  LIB_OBJS += entry.o
  LIB_OBJS += environment.o
 +LIB_OBJS += ewah/bitmap.o
 +LIB_OBJS += ewah/ewah_bitmap.o
 +LIB_OBJS += ewah/ewah_io.o
 +LIB_OBJS += ewah/ewah_rlw.o
  LIB_OBJS += exec_cmd.o
  LIB_OBJS += fetch-pack.o
  LIB_OBJS += fsck.o
 diff --git a/ewah/bitmap.c b/ewah/bitmap.c
 new file mode 100644
 index 000..d248577
 --- /dev/null
 +++ b/ewah/bitmap.c
 @@ -0,0 +1,226 @@
 +/**
 + * Copyright 2013, GitHub, Inc
 + * Copyright 2009-2013, Daniel Lemire, Cliff Moon,
 + *   David McIntosh, Robert Becho, Google Inc. and Veronika Zenz
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License
 + * as published by the Free Software Foundation; either version 2
 + * of the License, or (at your option) any later version.
 + *
 + * This program is distributed in the hope that it will be useful,
 + * but WITHOUT ANY WARRANTY; without even the implied warranty of
 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 + * GNU General Public License for more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program; if not, write to the Free Software
 + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  
 02110-1301, USA.
 + */
 +#include assert.h
 +#include stdlib.h
 +#include string.h
 +
 +#include ewok.h
 +
 +#define MASK(x) ((eword_t)1  (x % BITS_IN_WORD))
 +#define BLOCK(x) (x / BITS_IN_WORD)
 +
 +struct bitmap *bitmap_new(void)
 +{
 + struct bitmap *bitmap = ewah_malloc(sizeof(struct bitmap));
 + bitmap-words = ewah_calloc(32, sizeof(eword_t));
 + bitmap-word_alloc = 32;
 + return bitmap;
 +}
 +
 +void bitmap_set(struct bitmap *self, size_t pos)
 +{
 + size_t block = BLOCK(pos);
 +
 + if (block = self-word_alloc) {
 + size_t old_size = self-word_alloc;
 + self-word_alloc = block * 2;
 + self-words = ewah_realloc(self-words, self-word_alloc * 
 sizeof(eword_t));
 +
 + memset(self-words + old_size, 0x0,
 + (self-word_alloc - old_size) * sizeof(eword_t));
 + }
 +
 + self-words[block] |= MASK(pos);
 +}
 +
 +void bitmap_clear(struct bitmap *self, size_t pos)
 +{
 + size_t block = BLOCK(pos);
 +
 + if (block  self-word_alloc)
 + self-words[block] = ~MASK(pos);
 +}
 +
 +int bitmap_get(struct bitmap *self, size_t

Re: [PATCH 2/2] entry.c: convert write_entry to use strbuf

On Fri, Oct 25, 2013 at 2:49 AM, Junio C Hamano gits...@pobox.com wrote:
 Duy Nguyen pclo...@gmail.com writes:

 On Thu, Oct 24, 2013 at 12:52 AM, Junio C Hamano gits...@pobox.com wrote:
 Nguyễn Thái Ngọc Duy pclo...@gmail.com writes:

 The strcpy call in open_output_fd() implies that the output buffer
 must be at least 25 chars long.

 Hmph, where does that 25 come from?

 [snipped]

 Much better. Thanks.

 So where does that 25 come from?

 We strcpy .merge_link_XX or .merge_file_XX into path[]
 and run mkstemp() on it, and these templates are 18 bytes long, so I
 am puzzled.

 Is 25 just a small random number that is surely longer than these
 templates--did not bother to count how long the templates are?

Yes. I was too lazy to subtract precisely the column number from
between the quotes, so I just made sure the number is large enough to
cover the columns..

 That's fine by me; I am just trying to make sure I am not missing
 anything that turns these templates into a longer filename.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 09/19] documentation: add documentation for the bitmap format

On Fri, Oct 25, 2013 at 1:03 AM, Jeff King p...@peff.net wrote:
 diff --git a/Documentation/technical/bitmap-format.txt 
 b/Documentation/technical/bitmap-format.txt
 new file mode 100644
 index 000..c686dd1
 --- /dev/null
 +++ b/Documentation/technical/bitmap-format.txt
 @@ -0,0 +1,131 @@
 +GIT bitmap v1 format
 +
 +
 +   - A header appears at the beginning:
 +
 +   4-byte signature: {'B', 'I', 'T', 'M'}
 +
 +   2-byte version number (network byte order)
 +   The current implementation only supports version 1
 +   of the bitmap index (the same one as JGit).

I suppose this means if I want to extend pack bitmaps to be used on
shallow clones, I need to step version to 2 before adding the shallow
points in .bitmap file as there's no chance of modifying v1 anymore,
correct?

 +
 +   2-byte flags (network byte order)
 +
 +   The following flags are supported:
 +
 +   - BITMAP_OPT_FULL_DAG (0x1) REQUIRED
 +   This flag must always be present. It implies that the 
 bitmap
 +   index has been generated for a packfile with full 
 closure
 +   (i.e. where every single object in the packfile can 
 find
 +its parent links inside the same packfile). This is a
 +   requirement for the bitmap index format, also present 
 in JGit,
 +   that greatly reduces the complexity of the 
 implementation.
 +
 +   4-byte entry count (network byte order)
 +
 +   The total count of entries (bitmapped commits) in 
 this bitmap index.
 +
 +   20-byte checksum
 +
 +   The SHA1 checksum of the pack this bitmap index 
 belongs to.
 +
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 13/19] pack-objects: implement bitmap writing

On Fri, Oct 25, 2013 at 1:06 AM, Jeff King p...@peff.net wrote:
 From: Vicent Marti tan...@gmail.com

 This commit extends more the functionality of `pack-objects` by allowing
 it to write out a `.bitmap` index next to any written packs, together
 with the `.idx` index that currently gets written.

 If bitmap writing is enabled for a given repository (either by calling
 `pack-objects` with the `--write-bitmap-index` flag or by having
 `pack.writebitmaps` set to `true` in the config) and pack-objects is
 writing a packfile that would normally be indexed (i.e. not piping to
 stdout), we will attempt to write the corresponding bitmap index for the
 packfile.

I haven't read the actual patch yet, but the diffstat says user
documentation is missing..
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git grep: search whole tree by default?

2013-10-24 Thread David Aguilar

On Thu, Oct 24, 2013 at 12:40 PM, Junio C Hamano gits...@pobox.com wrote:
 Jeff King p...@peff.net writes:

 That would also provide people who do not like the change of default an
 escape hatch to keep the current behavior. And I do not think scripted
 use will be inconvenienced; they will already have to use . or :/ to
 be explicit (if they care) since the behavior is changing.

 There is a big difference between scripted use will have an escape
 hatch and scripted use will not be inconvenienced.  We *know*
 scripts will be inconvenienced with or without such a configuration
 variable, as they *have* to be updated if they rely on the current
 behaviour of git grep that limits its search to the current
 directory when fed no pathspec (and if their users want to keep the
 current behaviour of such scripts).  Anything short of a warning (or
 even erroring out) that is designed to annoy the users during the
 transition period will help ease the pain of transition of scripts.

 An annoying warning still can only *ease*, but cannot eliminate, the
 pain of transition. The scripts need to be updated to adjust to the
 new behaviour; there is no getting around to it.

 Even if we ignore the helping your colleague at her terminal, cf.

 http://thread.gmane.org/gmane.comp.version-control.git/133570/focus=133683

 issue for now, adding a new configuration variable from day one
 makes the transition of scripts somewhat worse, I am afraid.  Doing
 so robs us a way to add such an annoying warning to help people
 foresee problems in their existing scripts before the default
 changes (the configuration presumably will disable the this command
 line will behave differently after the default changes warning).

 As I said, I think we can train people without an annoying warning,
 as hits outside their current directory will serve as an annoyance
 already, and people who set such a configuration in their repository
 (or $HOME/.gitconfig), get used to the chosen behaviour too much,
 and get surprised when they get to use a vanilla intallation of Git
 (either helping colleague or setting up a new work environment) have
 only themselves to blame, so it may not be too big a deal.

 But I do not think the same reasoning extends to scripted uses X-.

The set of people that script git grep may in fact be pretty low /
almost non-existent so it may be a non-issue, but here's my one data
point:

For git-cola, this change in behavior would not make any difference.
It already jumps to the top-level during startup so its grep feature
is unaffected.

It'd be good to hear from other script writers but that's my $.02.
-- 
David
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/2] finding the fork point from reflog entries

John Keeping j...@keeping.me.uk writes:

 To clarify: the particular commit in the calls above happens to be the
 oldest entry in the reflog, if I pick a newer entry then it works.

 It seems that for_each_reflog_ent isn't returning the oldest entry;
 revs.nr is 62 whereas git rev-list -g origin/master | wc -l gives 63.

Perhaps this on top.

One thing I somehow feel dirty about this change is that we have to
include diff.h only to include revision.h only to get the definition
of TMP_MARK. Perhaps object.h should be the header that defines the
bit assignment for objects.flags bit-field.

 builtin/merge-base.c | 28 
 1 file changed, 24 insertions(+), 4 deletions(-)

diff --git a/builtin/merge-base.c b/builtin/merge-base.c
index 7b9bc15..7fdc3de 100644
--- a/builtin/merge-base.c
+++ b/builtin/merge-base.c
@@ -2,6 +2,8 @@
 #include cache.h
 #include commit.h
 #include refs.h
+#include diff.h
+#include revision.h
 #include parse-options.h
 
 static int show_merge_base(struct commit **rev, int rev_nr, int show_all)
@@ -91,18 +93,32 @@ struct rev_collect {
struct commit **commit;
int nr;
int alloc;
+   unsigned int initial : 1;
 };
 
+static void add_one_commit(unsigned char *sha1, struct rev_collect *revs)
+{
+   struct commit *commit = lookup_commit(sha1);
+
+   if (!commit || (commit-object.flags  TMP_MARK))
+   return;
+
+   ALLOC_GROW(revs-commit, revs-nr + 1, revs-alloc);
+   revs-commit[revs-nr++] = commit;
+   commit-object.flags |= TMP_MARK;
+}
+
 static int collect_one_reflog_ent(unsigned char *osha1, unsigned char *nsha1,
  const char *ident, unsigned long timestamp,
  int tz, const char *message, void *cbdata_)
 {
struct rev_collect *revs = cbdata_;
-   struct commit *commit = lookup_commit(nsha1);
-   if (commit) {
-   ALLOC_GROW(revs-commit, revs-nr + 1, revs-alloc);
-   revs-commit[revs-nr++] = commit;
+
+   if (revs-initial) {
+   add_one_commit(osha1, revs);
+   revs-initial = 0;
}
+   add_one_commit(nsha1, revs);
return 0;
 }
 
@@ -131,8 +147,12 @@ static int handle_reflog(int argc, const char **argv)
 
derived = lookup_commit_reference(sha1);
memset(revs, 0, sizeof(revs));
+   revs.initial = 1;
for_each_reflog_ent(refname, collect_one_reflog_ent, revs);
 
+   for (i = 0; i  revs.nr; i++)
+   revs.commit[i]-object.flags = ~TMP_MARK;
+
bases = get_merge_bases_many(derived, revs.nr, revs.commit, 0);
 
/*
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/19] pack bitmaps

This is only to tentatively work-around the compilation breakages;
the fixes need to be split into the respective patches that
introduce breakages when the series is rerolled (the one I sent for
pack-bitmap.c separately is also included in this message).

Thanks.

 ewah/ewah_bitmap.c  | 22 --
 ewah/ewah_io.c  | 44 ++--
 pack-bitmap-write.c |  2 --
 pack-bitmap.c   | 13 ++---
 4 files changed, 48 insertions(+), 33 deletions(-)

diff --git a/ewah/ewah_bitmap.c b/ewah/ewah_bitmap.c
index b74a1eb..7986720 100644
--- a/ewah/ewah_bitmap.c
+++ b/ewah/ewah_bitmap.c
@@ -65,6 +65,8 @@ static void buffer_push_rlw(struct ewah_bitmap *self, eword_t 
value)
 
 static size_t add_empty_words(struct ewah_bitmap *self, int v, size_t number)
 {
+   eword_t runlen;
+   eword_t can_add;
size_t added = 0;
 
if (rlw_get_run_bit(self-rlw) != v  rlw_size(self-rlw) == 0) {
@@ -76,8 +78,8 @@ static size_t add_empty_words(struct ewah_bitmap *self, int 
v, size_t number)
added++;
}
 
-   eword_t runlen = rlw_get_running_len(self-rlw);
-   eword_t can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
+   runlen = rlw_get_running_len(self-rlw);
+   can_add = min_size(number, RLW_LARGEST_RUNNING_COUNT - runlen);
 
rlw_set_running_len(self-rlw, runlen + can_add);
number -= can_add;
@@ -426,6 +428,8 @@ void ewah_xor(
rlwit_init(rlw_j, ewah_j);
 
while (rlwit_word_size(rlw_i)  0  rlwit_word_size(rlw_j)  0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len  0 || rlw_j.rlw.running_len  0) {
struct rlw_iterator *prey, *predator;
size_t index;
@@ -446,7 +450,7 @@ void ewah_xor(
rlwit_discard_first_words(predator, 
predator-rlw.running_len);
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
@@ -484,6 +488,8 @@ void ewah_and(
rlwit_init(rlw_j, ewah_j);
 
while (rlwit_word_size(rlw_i)  0  rlwit_word_size(rlw_j)  0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len  0 || rlw_j.rlw.running_len  0) {
struct rlw_iterator *prey, *predator;
 
@@ -507,7 +513,7 @@ void ewah_and(
}
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
@@ -545,6 +551,8 @@ void ewah_and_not(
rlwit_init(rlw_j, ewah_j);
 
while (rlwit_word_size(rlw_i)  0  rlwit_word_size(rlw_j)  0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len  0 || rlw_j.rlw.running_len  0) {
struct rlw_iterator *prey, *predator;
 
@@ -572,7 +580,7 @@ void ewah_and_not(
}
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
@@ -610,6 +618,8 @@ void ewah_or(
rlwit_init(rlw_j, ewah_j);
 
while (rlwit_word_size(rlw_i)  0  rlwit_word_size(rlw_j)  0) {
+   size_t literals;
+
while (rlw_i.rlw.running_len  0 || rlw_j.rlw.running_len  0) {
struct rlw_iterator *prey, *predator;
 
@@ -634,7 +644,7 @@ void ewah_or(
}
}
 
-   size_t literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
+   literals = min_size(rlw_i.rlw.literal_words, 
rlw_j.rlw.literal_words);
 
if (literals) {
size_t k;
diff --git a/ewah/ewah_io.c b/ewah/ewah_io.c
index db6c062..05c51d9 100644
--- a/ewah/ewah_io.c
+++ b/ewah/ewah_io.c
@@ -58,19 +58,26 @@ int ewah_serialize_to(struct ewah_bitmap *self,
eword_t dump[2048];
const size_t words_per_dump = sizeof(dump) / sizeof(eword_t);
 
-   /* 32 bit -- bit size fr the map */
-   uint32_t bitsize =  htonl((uint32_t)self-bit_size);
+   /* 32 bit -- bit size for the map */
+   uint32_t bitsize;
+   /* 32 bit -- number of compressed 64-bit words */
+   uint32_t word_count;
+   /* 64 bit x N -- compressed words */
+   const eword_t *buffer = self-buffer;
+   size_t words_left;
+
+   /* 32 bit -- position for the RLW */
+   uint32_t rlw_pos;
+
+   bitsize =  htonl((uint32_t)self-bit_size);
if (write_fun(data, bitsize, 4) != 4)

Re: Finding the repository

2013-10-24 Thread Phil Hord

On Thu, Oct 24, 2013 at 9:46 AM, Duy Nguyen pclo...@gmail.com wrote:
 On Thu, Oct 24, 2013 at 2:49 PM, Perry Hutchison per...@pluto.rain.com 
 wrote:
 Duy Nguyen pclo...@gmail.com wrote:

 ... it's not easy to determine ambiguity here, especially when the
 repo finding code does not know anything about bar/barz.c (is it
 a pathname or an argument to an option?).
...
 There are more cases to consider, like what if you do
 git rm bar/baz.c and rab/zab.c where bar and rab are
 two different repositories..

 So we remove baz.c from bar and zab.c from rab.  It's not clear
 to me that there's anything wrong with that -- it's exactly what
 I would expect to have happen (and also what the hackish script
 I posted will do).

 For git rm, maybe. Many other commands need repo information and it
 would not make sense to have paths from two different repositories.
 For example, commit, rev-list or log. And it may break more things as
 most of current commands are designed to work on one repo from a to z.
 Some may support multi-repo operations if they're part of submodule
 support.

I've done some preliminary work on extending this sort of behavior to
submodule commands.  For example,

  git grep --recurse-submodules foo

which would look in the current project path and also any submodules
encountered.  This usage also begs for this extension:

  git grep --recurse-submodules foo path/to/sub/bar.c

Where 'path/to/sub' is a submodule, and therefore a foreign git repo
to this one.  Solving this is a little bit easier than your case
because git is already running inside a repo. Extending the reach to
submodules only requires more odb's than our first one to be
considered.  Along the way, I have considered your case, but I haven't
focused on it. Lately I haven't had time to focus on my case either,
though.  :-\

Phil
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 08/19] ewah: compressed bitmap implementation

On Thu, Oct 24, 2013 at 04:34:48PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  From: Vicent Marti tan...@gmail.com
 
  EWAH is a word-aligned compressed variant of a bitset (i.e. a data
  structure that acts as a 0-indexed boolean array for many entries).
 
 I think I've already pointed out some when the original series was
 posted, but this does not seem to compile with -Wdecl-after-stmt
 (possibly other violations may exist; I haven't looked very closely
 yet).

Sorry about that. I typically compile with -Wall -Werror which does
not catch this, and I didn't carefully go over the ewah code for style,
as I considered it mostly a black-box import (though certainly it is
worth fixing this case, as it's not just a style issue).

It looks like there is also some void pointer arithmetic...I _thought_
that was handled by -Wall, but I suppose not. Maybe I need to beef up my
warning settings.

Note that there is also some use of cast-mmap-file-to-struct, only one
instance of which uses __attribute__(packed). However, I notice that the
regular pack code is also guilty of this (e.g., see check_packed_git_idx),
so I don't know how careful we want to be (and whether we should be
using the packed attribute when needed, or if it is sufficiently
non-portable that we want to do it by hand in such cases).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 09/19] documentation: add documentation for the bitmap format

[+cc spearce; sorry, I really should have cc'd the whole series to you
 in the first place]

On Fri, Oct 25, 2013 at 08:16:18AM +0700, Nguyen Thai Ngoc Duy wrote:

  +   - A header appears at the beginning:
  +
  +   4-byte signature: {'B', 'I', 'T', 'M'}
  +
  +   2-byte version number (network byte order)
  +   The current implementation only supports version 1
  +   of the bitmap index (the same one as JGit).
 
 I suppose this means if I want to extend pack bitmaps to be used on
 shallow clones, I need to step version to 2 before adding the shallow
 points in .bitmap file as there's no chance of modifying v1 anymore,
 correct?

It depends on how you want to change it. And what Shawn says. :)

This is an attempt to document the JGit format. My feeling is that there
should be a version bump if an existing implementation would barf on it,
and I assume Shawn would agree.  But if you want to simply add extra
data that would be ignored by an existing implementation, it would be OK
to add the data and mark it with a flag:

  +   2-byte flags (network byte order)
  +
  +   The following flags are supported:
 [...]

That's how we added the name-hash cache in the final patch.

That being said, JGit is _not_ happy with that, and complains when any
flags besides OPT_FULL are used.  Whether that is because I am
misinterpreting the intent of the flags field, or because JGit is being
overly strict is up for debate.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 13/19] pack-objects: implement bitmap writing

On Fri, Oct 25, 2013 at 08:21:12AM +0700, Nguyen Thai Ngoc Duy wrote:

  If bitmap writing is enabled for a given repository (either by calling
  `pack-objects` with the `--write-bitmap-index` flag or by having
  `pack.writebitmaps` set to `true` in the config) and pack-objects is
  writing a packfile that would normally be indexed (i.e. not piping to
  stdout), we will attempt to write the corresponding bitmap index for the
  packfile.
 
 I haven't read the actual patch yet, but the diffstat says user
 documentation is missing..

I'll work on that for the re-roll.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 09/19] documentation: add documentation for the bitmap format

On Fri, Oct 25, 2013 at 10:21 AM, Jeff King p...@peff.net wrote:
 But if you want to simply add extra
 data that would be ignored by an existing implementation, it would be OK
 to add the data and mark it with a flag:

  +   2-byte flags (network byte order)
  +
  +   The following flags are supported:
 [...]

 That's how we added the name-hash cache in the final patch.

 That being said, JGit is _not_ happy with that, and complains when any
 flags besides OPT_FULL are used.  Whether that is because I am
 misinterpreting the intent of the flags field, or because JGit is being
 overly strict is up for debate.

Might be a good idea to support two classes of flags like how
extensions are handled in the index: ignore unrecognized uppercase
extension names, barf on unrecongized lowercase names. We could
partition the flag bit space more or less the same way.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re* Bug report: reset -p HEAD

Maarten de Vries maar...@de-vri.es writes:

 Some more info: It used to work as intended. Using a bisect shows it
 has been broken by commit 166ec2e9.

Thanks.

A knee-jerk change without thinking what side-effect it has for you
to try out.

 builtin/reset.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index f2f9d55..a3088d9 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -304,7 +304,10 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
if (patch_mode) {
if (reset_type != NONE)
die(_(--patch is incompatible with 
--{hard,mixed,soft}));
-   return run_add_interactive(sha1_to_hex(sha1), --patch=reset, 
pathspec);
+   return run_add_interactive(
+   (unborn || strcmp(rev, HEAD))
+   ? sha1_to_hex(sha1)
+   : HEAD, --patch=reset, pathspec);
}
 
/* git reset tree [--] paths... can be used to
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[BUG] git --literal-pathspecs blame broken in master

There seems to be a bad interaction with --literal-pathspecs and the
pathspec magic that is in master. Here's an example:

  [setup]
  $ git init
  $ echo content ':(foo)bar'
  $ git add .  git commit -m foo

  [with git v1.8.4]
  $ git blame ':(foo)bar'
  ^6b07eb4 (Jeff King 2013-10-24 22:59:02 -0400 1) content
  $ git --literal-pathspecs blame ':(foo)bar'
  ^6b07eb4 (Jeff King 2013-10-24 22:59:02 -0400 1) content

So originally blame did not handle pathspec magic at all, and
--literal-pathspecs did nothing. So the former is arguably buggy, but
the latter did the right thing (if only by accident :) ).

Then along came 9a08727 (remove init_pathspec() in favor of
parse_pathspec(), 2013-07-14), and we get:

  $ git blame ':(foo)bar'
  fatal: Invalid pathspec magic 'foo' in ':(foo)bar'
  $ git --literal-pathspecs blame ':(foo)bar'
  fatal: Invalid pathspec magic 'foo' in ':(foo)bar'

The first is an improvement, because we are now consistent with other
pathspec handling in git. But the latter is a regression; we are not
respecting the literal pathspec flag.

We get another change with a16bf9d (pathspec: make --literal-pathspecs
disable pathspec magic, 2013-07-14), which I would think would fix
things, but doesn't.

  $ git blame ':(foo)bar'
  fatal: Invalid pathspec magic 'foo' in ':(foo)bar'
  $ git --literal-pathspecs blame ':(foo)bar'
  fatal: :(foo)bar: pathspec magic not supported by this command: 'literal'

The first one remains good, but the second one is still broken. I
haven't dug further yet, but I thought it might be a bit more obvious to
you.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] git --literal-pathspecs blame broken in master

On Thu, Oct 24, 2013 at 11:49:47PM -0400, Jeff King wrote:

 We get another change with a16bf9d (pathspec: make --literal-pathspecs
 disable pathspec magic, 2013-07-14), which I would think would fix
 things, but doesn't.
 
   $ git blame ':(foo)bar'
   fatal: Invalid pathspec magic 'foo' in ':(foo)bar'
   $ git --literal-pathspecs blame ':(foo)bar'
   fatal: :(foo)bar: pathspec magic not supported by this command: 'literal'
 
 The first one remains good, but the second one is still broken. I
 haven't dug further yet, but I thought it might be a bit more obvious to
 you.

Hmm. Is the fix as simple as:

diff --git a/builtin/blame.c b/builtin/blame.c
index 56e3d6b..1c2b303 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -408,7 +408,7 @@ static struct origin *find_origin(struct scoreboard *sb,
paths[0] = origin-path;
paths[1] = NULL;
 
-   parse_pathspec(diff_opts.pathspec, PATHSPEC_ALL_MAGIC, 0, , paths);
+   parse_pathspec(diff_opts.pathspec, PATHSPEC_ALL_MAGIC  
~PATHSPEC_LITERAL, 0, , paths);
diff_setup_done(diff_opts);
 
if (is_null_sha1(origin-commit-object.sha1))

All of the GUARD_PATHSPEC calls indicate that everybody understands
PATHSPEC_LITERAL. It is not technically true that git-blame understands
the literal pathspec magic:

  $ git blame -- ':(literal)revision.c'
  fatal: no such path ':(literal)revision.c' in HEAD

but that is a separate bug (that blame considers the argument as a path
first before feeding it to the pathspec machinery). The patch above does
not fix that, but AFAICT it does not make anything worse.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] git --literal-pathspecs blame broken in master

On Fri, Oct 25, 2013 at 10:49 AM, Jeff King p...@peff.net wrote:
   $ git --literal-pathspecs blame ':(foo)bar'
   fatal: :(foo)bar: pathspec magic not supported by this command: 'literal'

 The first one remains good, but the second one is still broken. I
 haven't dug further yet, but I thought it might be a bit more obvious to
 you.

I checked it too strict. Something like this should fix it. I'll post
a patch with tests later

diff --git a/pathspec.c b/pathspec.c
index ad1a9f5..69e4fdb 100644
--- a/pathspec.c
+++ b/pathspec.c
@@ -405,7 +405,7 @@ void parse_pathspec(struct pathspec *pathspec,
item[i].magic = prefix_pathspec(item + i, short_magic,
argv + i, flags,
prefix, prefixlen, entry);
-   if (item[i].magic  magic_mask)
+   if (item[i].magic  (magic_mask  ~PATHSPEC_LITERAL))
unsupported_magic(entry,
  item[i].magic  magic_mask,
  short_magic);

-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [BUG] git --literal-pathspecs blame broken in master

On Fri, Oct 25, 2013 at 11:16:08AM +0700, Nguyen Thai Ngoc Duy wrote:

  All of the GUARD_PATHSPEC calls indicate that everybody understands
  PATHSPEC_LITERAL. It is not technically true that git-blame understands
  the literal pathspec magic:
 
$ git blame -- ':(literal)revision.c'
fatal: no such path ':(literal)revision.c' in HEAD
 
  but that is a separate bug (that blame considers the argument as a path
  first before feeding it to the pathspec machinery). The patch above does
  not fix that, but AFAICT it does not make anything worse.
 
 I did consider this change but dropped it because there are more
 parse_pathspec() calls with PATHSPEC_ALL_MAGIC as mask. Thanks for
 bringing up :(literal).  I guess we need to change prefix_pathspec()
 to set PATHSPEC_LITERAL only when :(literal) is present, not when
 --literal-pathspecs is used.

I considered suggesting that, too, but it means that everywhere that
checks for PATHSPEC_LITERAL must _also_ check for literal_global (e.g.,
if they were deciding to feed the result to fnmatch). Whereas if we
catch it at the parse_pathspec layer, then the consumers of the pathspec
just need to check the one flag.

I dunno. I haven't kept up very well with your work in this area, so you
probably have a better sense than I do of what would be the most
elegant.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Re* Bug report: reset -p HEAD

On Thu, Oct 24, 2013 at 08:40:13PM -0700, Junio C Hamano wrote:

 Maarten de Vries maar...@de-vri.es writes:
 
  Some more info: It used to work as intended. Using a bisect shows it
  has been broken by commit 166ec2e9.
 
 Thanks.
 
 A knee-jerk change without thinking what side-effect it has for you
 to try out.
 
  builtin/reset.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)
 
 diff --git a/builtin/reset.c b/builtin/reset.c
 index f2f9d55..a3088d9 100644
 --- a/builtin/reset.c
 +++ b/builtin/reset.c
 @@ -304,7 +304,10 @@ int cmd_reset(int argc, const char **argv, const char 
 *prefix)
   if (patch_mode) {
   if (reset_type != NONE)
   die(_(--patch is incompatible with 
 --{hard,mixed,soft}));
 - return run_add_interactive(sha1_to_hex(sha1), --patch=reset, 
 pathspec);
 + return run_add_interactive(
 + (unborn || strcmp(rev, HEAD))
 + ? sha1_to_hex(sha1)
 + : HEAD, --patch=reset, pathspec);
   }

I think that's the correct fix for the regression.  You are restoring
the original, pre-166ec2e9 behavior for just the HEAD case. I do not
think add--interactive does any other magic between a symbolic rev and
its sha1, except for recognizing HEAD specially. However, if you wanted
to minimize the potential impact of 166ec2e9, you could pass the sha1
_only_ in the unborn case, like this:

diff --git a/builtin/reset.c b/builtin/reset.c
index f2f9d55..bfdd8d3 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -283,6 +283,7 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
if (unborn) {
/* reset on unborn branch: treat as reset to empty tree */
hashcpy(sha1, EMPTY_TREE_SHA1_BIN);
+   rev = EMPTY_TREE_SHA1_HEX;
} else if (!pathspec.nr) {
struct commit *commit;
if (get_sha1_committish(rev, sha1))
@@ -304,7 +305,7 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
if (patch_mode) {
if (reset_type != NONE)
die(_(--patch is incompatible with 
--{hard,mixed,soft}));
-   return run_add_interactive(sha1_to_hex(sha1), --patch=reset, 
pathspec);
+   return run_add_interactive(rev, --patch=reset, pathspec);
}
 
/* git reset tree [--] paths... can be used to

That fixes any possible regression from add--interactive treating the
two cases differently. On an unborn branch, we will still say apply
this hunk rather than unstage this hunk. That's not a regression,
because it simply didn't work before, but it's not ideal. To fix that,
we need to somehow tell add--interactive this is HEAD, but use the
empty tree because it's unborn. I can think of a few simple-ish ways:

  1. Pass the head/not-head flag as a separate option.

  2. Pass HEAD even in the unborn case; teach add--interactive to
 convert an unborn HEAD to the empty tree.

  3. Teach add--interactive to recognize the empty tree sha1 as an
 unstage path.

I kind of like (3). At first glance, it is wrong; we will also treat
git reset -p $(git hash-object -t tree /dev/null) as if HEAD had
been passed. But if you are explicitly passing the empty tree like that,
I think saying unstage makes a lot of sense.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/14] Officially start moving to the term 'staging area'

2013-10-24 Thread Felipe Contreras

Karsten Blees wrote:
 (2) Index
 
 An index, as in a library, maps almost perfectly to what the git index is
 _and_ what we do with it.

Not really. An index in the context of a library, and in any other context, is
a tool that indicates where something is, in order to find it quickly.

That is not how the Git index is used, nor what it is.

 (3b) Staging area (other meanings)
 
 I don't see how a stage (as in a theater) is in any way related to the git
 index.
 
 Data staging (as in loading a datawarehouse or web-server) fits to some
 extent, as its also about copying information, not moving physical things.

A stage in theater, and in any other context, is a special place, a standing
place, I don't see what is so different from the git staging area.

  Even 'native' speakers don't have a single consistent term for the
  concept. Terms are stolen from many varied industries and activities
  that have to prepare and package items (Ships, Trains, Theaters)
  (see http://en.wikipedia.org/wiki/Shipping_list, for a shortish list, which 
  doesn't mention an Index)
 
 All true, but we don't need to steal terms from unrelated fields if
 information science provides us with the terms we need.

But it doesn't.

-- 
Felipe Contreras
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git grep: search whole tree by default?

On Thu, Oct 24, 2013 at 12:40:44PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  That would also provide people who do not like the change of default an
  escape hatch to keep the current behavior. And I do not think scripted
  use will be inconvenienced; they will already have to use . or :/ to
  be explicit (if they care) since the behavior is changing.
 
 There is a big difference between scripted use will have an escape
 hatch and scripted use will not be inconvenienced.

I think my communication may have been muddled in transit. What I meant
regarding inconvenienced was not any more so than by simply changing
the behavior in the first place, since scripts already will need to
start becoming explicit due to the behavior change.

And for the escape hatch, I did not mean for scripts. I actually meant
for users who do not like the extra typing and complain stupid git, I
always want '.'; you used to do what I want and now you do not.

 Even if we ignore the helping your colleague at her terminal, cf.
 
 http://thread.gmane.org/gmane.comp.version-control.git/133570/focus=133683

FWIW, I have never agreed with that line of reasoning. I was going to
explain why, but I see that I already did in response to the article you
linked. :)

 issue for now, adding a new configuration variable from day one
 makes the transition of scripts somewhat worse, I am afraid.  Doing
 so robs us a way to add such an annoying warning to help people
 foresee problems in their existing scripts before the default
 changes (the configuration presumably will disable the this command
 line will behave differently after the default changes warning).

If you want to have an annoying warning, why not consider the config a
tristate? Do X or do Y, or if unset, do X with an annoying warning
(which will switch to Y in the future). That does not help a user who
sets the variable after seeing the warning the first time, then later
runs a script that silently chooses the wrong behavior.

But neither does a warning that is squelched by advice.*, which the user
will also set soon after seeing it.

The only way to hit those scripts is to yell at the user anytime the
appropriate command-line override is not selected, with no way to turn
it off. That's what we're doing now with git add. I think people find
it a little annoying. But perhaps it is the least of all evils.


Anyway, I have said my piece, and I think we are on the same page with
the tradeoffs (what they are, though we may value them differently).  I
do not care that strongly about the config option these days; as I said,
it was something I would have used in certain workflows, but I do not
foresee myself even setting it these days. So I am willing to forego it
if there are concerns it will make things worse.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git grep: search whole tree by default?

On Fri, Oct 25, 2013 at 11:37 AM, Jeff King p...@peff.net wrote:
 On Thu, Oct 24, 2013 at 12:40:44PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:

  That would also provide people who do not like the change of default an
  escape hatch to keep the current behavior. And I do not think scripted
  use will be inconvenienced; they will already have to use . or :/ to
  be explicit (if they care) since the behavior is changing.

 There is a big difference between scripted use will have an escape
 hatch and scripted use will not be inconvenienced.

 I think my communication may have been muddled in transit. What I meant
 regarding inconvenienced was not any more so than by simply changing
 the behavior in the first place, since scripts already will need to
 start becoming explicit due to the behavior change.

 And for the escape hatch, I did not mean for scripts. I actually meant
 for users who do not like the extra typing and complain stupid git, I
 always want '.'; you used to do what I want and now you do not.

Such an escape hatch may be better done as an alias than a config key
(an alias is a config key anyway). I know it won't be easy to add '.'
if no pathspecs are given, using shell script. But that's something
we could improve, hopefully. An option is we could just export
PATHSPEC_PREFER_* flags via a command line (like --literal-pathspecs).
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Re* Bug report: reset -p HEAD

2013-10-24 Thread Martin von Zweigbergk

Sorry about the regression and thanks for report and fixes.

On Thu, Oct 24, 2013 at 9:24 PM, Jeff King p...@peff.net wrote:
 On Thu, Oct 24, 2013 at 08:40:13PM -0700, Junio C Hamano wrote:

 Maarten de Vries maar...@de-vri.es writes:

  Some more info: It used to work as intended. Using a bisect shows it
  has been broken by commit 166ec2e9.

 Thanks.

 A knee-jerk change without thinking what side-effect it has for you
 to try out.

  builtin/reset.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)

 diff --git a/builtin/reset.c b/builtin/reset.c
 index f2f9d55..a3088d9 100644
 --- a/builtin/reset.c
 +++ b/builtin/reset.c
 @@ -304,7 +304,10 @@ int cmd_reset(int argc, const char **argv, const char 
 *prefix)
   if (patch_mode) {
   if (reset_type != NONE)
   die(_(--patch is incompatible with 
 --{hard,mixed,soft}));
 - return run_add_interactive(sha1_to_hex(sha1), --patch=reset, 
 pathspec);
 + return run_add_interactive(
 + (unborn || strcmp(rev, HEAD))
 + ? sha1_to_hex(sha1)
 + : HEAD, --patch=reset, pathspec);
   }

 I think that's the correct fix for the regression.  You are restoring
 the original, pre-166ec2e9 behavior for just the HEAD case. I do not
 think add--interactive does any other magic between a symbolic rev and
 its sha1, except for recognizing HEAD specially. However, if you wanted
 to minimize the potential impact of 166ec2e9, you could pass the sha1
 _only_ in the unborn case, like this:

Plus, the end result is more readable, IMHO.

 diff --git a/builtin/reset.c b/builtin/reset.c
 index f2f9d55..bfdd8d3 100644
 --- a/builtin/reset.c
 +++ b/builtin/reset.c
 @@ -283,6 +283,7 @@ int cmd_reset(int argc, const char **argv, const char 
 *prefix)
 if (unborn) {
 /* reset on unborn branch: treat as reset to empty tree */
 hashcpy(sha1, EMPTY_TREE_SHA1_BIN);
 +   rev = EMPTY_TREE_SHA1_HEX;
 } else if (!pathspec.nr) {
 struct commit *commit;
 if (get_sha1_committish(rev, sha1))
 @@ -304,7 +305,7 @@ int cmd_reset(int argc, const char **argv, const char 
 *prefix)
 if (patch_mode) {
 if (reset_type != NONE)
 die(_(--patch is incompatible with 
 --{hard,mixed,soft}));
 -   return run_add_interactive(sha1_to_hex(sha1), 
 --patch=reset, pathspec);
 +   return run_add_interactive(rev, --patch=reset, pathspec);
 }

 /* git reset tree [--] paths... can be used to

 That fixes any possible regression from add--interactive treating the
 two cases differently. On an unborn branch, we will still say apply
 this hunk rather than unstage this hunk. That's not a regression,
 because it simply didn't work before, but it's not ideal. To fix that,
 we need to somehow tell add--interactive this is HEAD, but use the
 empty tree because it's unborn. I can think of a few simple-ish ways:

   1. Pass the head/not-head flag as a separate option.

   2. Pass HEAD even in the unborn case; teach add--interactive to
  convert an unborn HEAD to the empty tree.

   3. Teach add--interactive to recognize the empty tree sha1 as an
  unstage path.

 I kind of like (3). At first glance, it is wrong; we will also treat
 git reset -p $(git hash-object -t tree /dev/null) as if HEAD had
 been passed. But if you are explicitly passing the empty tree like that,
 I think saying unstage makes a lot of sense.

Makes sense to me. I'm sure others can implement that much faster than
I can, but I feel a little guilty, so I'm happy to do it if no one
else wants to, as long as we agree this is the way we want to go.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCHv2 0/19] pack bitmaps

On Thu, Oct 24, 2013 at 01:59:15PM -0400, Jeff King wrote:

 This series implements JGit-style pack bitmaps to speed up fetching and
 cloning.

Here is a re-roll fixing the comments so far. I know that nobody is
likely to have had a chance to read through it carefully yet, but given
the compiler warnings from the initial version, it makes sense to me to
get a clean baseline in front of people before they start reading the
old one. Given the size, I'll let this one collect more in-depth
comments for a bit before sending out another re-roll.

Besides the warning fixups, the ewah/* files have some style fixups to
better match git, and I added documentation for some config and
command-line options. I didn't add documentation for items I do not
expect people to actually use (e.g., rev-list --test-bitmaps), as they
clutter the documentation.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 01/19] sha1write: make buffer const-correct