date:20160822

Re: git-svn bridge and line endings

2016-08-22 Thread Eric Wong

Alfred Perlstein  wrote:
> I hadn't anticipated there be to translation between svn props and
> .gitattributes, it sounds a bit messy but possible, that said, is it
> not possible to commit .gitattribute files to the svn repo?  Even in
> FreeBSD land such small token files are permitted.

I'm not sure if an automatic translation is necessary or
desired (because of a corruption risk).

Perhaps Lucian can clarify the situation for his repo.

> As far as documenting svn-properties, most of the properties are
> used on the Subversion side either by subversion itself, or by
> scripts in the subversion repository.  Perhaps a blurb "see the
> subversion documentation and/or your local subversion
> administrator's guide for properties and their uses." would suffice?

Yes, perhaps with a workable example Lucian can use today with
any git v2.3.0 or later.

Thanks for the quick response!

> Opinions?  Happy to look into it.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-svn bridge and line endings

2016-08-22 Thread Alfred Perlstein




On 8/22/16 8:07 PM, Eric Wong wrote:

Adding Alfred to the Cc:, more below...

Lucian Smith  wrote:

I'm attempting to use the git-svn bridge, and am having problems with
line endings on Windows.

The setup is that we have a git repository on github, and I've checked
out a branch on my Windows machine using Tortoise svn.  I make
changes, commit them, and the branch is updated.  In general, this
works fine.

If this was just SVN, I could set the 'eol-style' for files to
'native' to let it know to expect Windows/linux/mac line endings for
particular files.  This seems to be handled in git by using the
'.gitattributes' file instead.  Unfortunately, the git/svn bridge
doesn't seem to be translate the information in the .gitattributes
file to appropriate eol-style settings in SVN.  Checking out a file
using SVN on Windows leaves me with a file without CRLF's, and if I
check in a CRLF file, that's the way it goes into the repository.
Differences in CRLF alone show up as 'real' differences that can be
checked in, and, if this happens, this causes problems with other
people's repositories.

Am I doing something wrong; is there another way to handle this; or
can I file this as a bug report/feature request?

Thank you!

-Lucian Smith

Lucian: Which version of git are you using?

As of git v2.3.0 and thanks to Alfred, the "git svn propset"
command exists for setting new props via git; however it
only got documented recently in v2.9.1.

It also seems to support setting props via "svn-properties"
in the .gitattributes file based on reading the code;
but I'm not familiar with this area, so I defer to Alfred.

Alfred: "svn-properties" isn't documented anywhere, yet;
is this something that should/could be documented?

Thanks.

cf. commit 83c9433e679635f8fbf8961081ea3581c93ca778
 ("git-svn: support for git-svn propset")
 https://bogomips.org/git-svn.git/commit/?id=83c9433e6796

 commit 19a7f24b6f8aa89ea5899c928c2fa350f4b1521e
 ("git-svn: document the 'git svn propset' command")
 https://bogomips.org/git-svn.git/commit/?id=19a7f24b6f8a

 https://public-inbox.org/git/?q=s:%22git-svn+propset%22

Anyways I've never used SVN props much myself, so don't much
have experience in this area; nor do I have much experience
with git-side gitattributes.

Thank you.  I'm going to need some time to look into this.  The addition 
of svn-properties support was mostly to facilitate those that needed 
attributes set for other svn consumers or "repo meister" (the person 
that runs VCS).  An example being FreeBSD's 'Keyword expansion' on 
checkout facility, that says whether or not to expand $FreeBSD$.


I hadn't anticipated there be to translation between svn props and 
.gitattributes, it sounds a bit messy but possible, that said, is it not 
possible to commit .gitattribute files to the svn repo?  Even in FreeBSD 
land such small token files are permitted.


As far as documenting svn-properties, most of the properties are used on 
the Subversion side either by subversion itself, or by scripts in the 
subversion repository.  Perhaps a blurb "see the subversion 
documentation and/or your local subversion administrator's guide for 
properties and their uses." would suffice?


Opinions?  Happy to look into it.

-Alfred






--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] diff-highlight: add some tests.

2016-08-22 Thread Jeff King

On Mon, Aug 22, 2016 at 08:55:39AM -0700, Brian Henderson wrote:

> Jeff, I love your idea. how's this looking?

Much more readable, IMHO.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-svn bridge and line endings

2016-08-22 Thread Eric Wong

Adding Alfred to the Cc:, more below...

Lucian Smith  wrote:
> I'm attempting to use the git-svn bridge, and am having problems with
> line endings on Windows.
> 
> The setup is that we have a git repository on github, and I've checked
> out a branch on my Windows machine using Tortoise svn.  I make
> changes, commit them, and the branch is updated.  In general, this
> works fine.
> 
> If this was just SVN, I could set the 'eol-style' for files to
> 'native' to let it know to expect Windows/linux/mac line endings for
> particular files.  This seems to be handled in git by using the
> '.gitattributes' file instead.  Unfortunately, the git/svn bridge
> doesn't seem to be translate the information in the .gitattributes
> file to appropriate eol-style settings in SVN.  Checking out a file
> using SVN on Windows leaves me with a file without CRLF's, and if I
> check in a CRLF file, that's the way it goes into the repository.
> Differences in CRLF alone show up as 'real' differences that can be
> checked in, and, if this happens, this causes problems with other
> people's repositories.
> 
> Am I doing something wrong; is there another way to handle this; or
> can I file this as a bug report/feature request?
> 
> Thank you!
> 
> -Lucian Smith

Lucian: Which version of git are you using?

As of git v2.3.0 and thanks to Alfred, the "git svn propset"
command exists for setting new props via git; however it
only got documented recently in v2.9.1.

It also seems to support setting props via "svn-properties"
in the .gitattributes file based on reading the code;
but I'm not familiar with this area, so I defer to Alfred.

Alfred: "svn-properties" isn't documented anywhere, yet;
is this something that should/could be documented?

Thanks.

cf. commit 83c9433e679635f8fbf8961081ea3581c93ca778
("git-svn: support for git-svn propset")
https://bogomips.org/git-svn.git/commit/?id=83c9433e6796

commit 19a7f24b6f8aa89ea5899c928c2fa350f4b1521e
("git-svn: document the 'git svn propset' command")
https://bogomips.org/git-svn.git/commit/?id=19a7f24b6f8a

https://public-inbox.org/git/?q=s:%22git-svn+propset%22

Anyways I've never used SVN props much myself, so don't much
have experience in this area; nor do I have much experience
with git-side gitattributes.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v10 0/9] submodule inline diff format

2016-08-22 Thread Stefan Beller

On Mon, Aug 22, 2016 at 4:43 PM, Jacob Keller  wrote:
> From: Jacob Keller 
>
> A few suggestions from Stefan in regards to falling back to
> .git/modules/ being a bad idea. I've chosen I think to avoid using
> die() as we just stick with the current path if we can't find its name.

Which makes the existing bug more subtle :(

> I think this should be safe since we already do this today.

It's a bug today already. Thanks for spotting!

> The new flow
> only changes if we are able to lookup the submodule, so I don't think
> it's worth adding a die() call.

Well this series improves the buggy-ness as it is only buggy when the name
is not found, and we fall back on the path.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/7] delta_base_cache: use list.h for LRU

2016-08-22 Thread Jeff King

On Mon, Aug 22, 2016 at 11:18:06PM +, Eric Wong wrote:

> > As a bonus, the list_entry() macro lets us place the lru
> > pointers anywhere inside the delta_base_cache_entry struct
> > (as opposed to just casting the pointer, which requires it
> > at the front of the struct). This will be useful in later
> > patches when we need to place other items at the front of
> > the struct (e.g., our hashmap implementation requires this).
> 
> On a side note, I think we should s/list_entry/container_of/ and
> use container_of for hashmap.
> 
> Linux kernel defines list_entry to use container_of,
> but I'd rather avoid introducing the duality entirely.

That does make it slightly more annoying to use, since container_of()
requires naming the original type again (because of our lack of typeof),
whereas now you can freely cast between "struct hashmap_entry" and the
actual item. But if you are proposing to actually fix the hashmap to
eliminate the requirement to put the entry at the front, that might have
value (I'm not sure how easy it is to do, though; a lot of those casts
happen inside the hashmap code which is type-agnostic).

> > Signed-off-by: Jeff King 
> > ---
> > I think the result is much nicer, but I found list_entry() a little
> > disappointing, because we lack typeof(). So you are stuck writing:
> > 
> >   struct delta_base_cache_entry *f =
> > list_entry(p, struct delta_base_cache_entry, lru);
> > 
> > I waffled on adding something like:
> > 
> >   LIST_ITEM(struct delta_bas_cache_entry, f, p, lru);
> > 
> > to declare "f" as above. But it's getting rather magical and un-C-like.
> 
> Right.  I'd rather keep the list_entry/container_of usage
> identical to what developers familiar using userspace-rcu,
> ccan/list, or the Linux kernel expect.

Makes sense. I am not familiar with those APIs myself, but I do not see
any reason to deviate from them for no good reason.

> >  sha1_file.c | 38 --
> >  1 file changed, 16 insertions(+), 22 deletions(-)
> 
> Good to see the code reduction.

Yep. I realized the "hashmap must go at the front of the struct" thing
later. My initial reason for converting was actually to make the
"separate blob LRU" patch easier (since without this patch, it literally
doubles the amount of pointer manipulation required).

For reference, in case anybody wants to duplicate my experiments from
patch 5, that patch looks like this:

diff --git a/sha1_file.c b/sha1_file.c
index 2cd1b79..e82be30 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2076,7 +2076,8 @@ static void *unpack_compressed_entry(struct packed_git *p,
 
 static size_t delta_base_cached;
 
-static LIST_HEAD(delta_base_cache_lru);
+static LIST_HEAD(delta_base_cache_lru_blobs);
+static LIST_HEAD(delta_base_cache_lru_other);
 
 static struct delta_base_cache_entry {
struct list_head lru;
@@ -2170,15 +2171,14 @@ static void add_delta_base_cache(struct packed_git *p, 
off_t base_offset,
release_delta_base_cache(ent);
delta_base_cached += base_size;
 
-   list_for_each(lru, _base_cache_lru) {
+   list_for_each(lru, _base_cache_lru_blobs) {
struct delta_base_cache_entry *f =
list_entry(lru, struct delta_base_cache_entry, lru);
if (delta_base_cached <= delta_base_cache_limit)
break;
-   if (f->type == OBJ_BLOB)
-   release_delta_base_cache(f);
+   release_delta_base_cache(f);
}
-   list_for_each(lru, _base_cache_lru) {
+   list_for_each(lru, _base_cache_lru_other) {
struct delta_base_cache_entry *f =
list_entry(lru, struct delta_base_cache_entry, lru);
if (delta_base_cached <= delta_base_cache_limit)
@@ -2191,7 +2191,10 @@ static void add_delta_base_cache(struct packed_git *p, 
off_t base_offset,
ent->type = type;
ent->data = base;
ent->size = base_size;
-   list_add_tail(>lru, _base_cache_lru);
+   if (type == OBJ_BLOB)
+   list_add_tail(>lru, _base_cache_lru_blobs);
+   else
+   list_add_tail(>lru, _base_cache_lru_other);
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v10 1/9] Git 2.10-rc1

2016-08-22 Thread Stefan Beller

On Mon, Aug 22, 2016 at 4:43 PM, Jacob Keller  wrote:

Bad rebase?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-svn bridge and line endings

2016-08-22 Thread Lucian Smith

I'm attempting to use the git-svn bridge, and am having problems with
line endings on Windows.

The setup is that we have a git repository on github, and I've checked
out a branch on my Windows machine using Tortoise svn.  I make
changes, commit them, and the branch is updated.  In general, this
works fine.

If this was just SVN, I could set the 'eol-style' for files to
'native' to let it know to expect Windows/linux/mac line endings for
particular files.  This seems to be handled in git by using the
'.gitattributes' file instead.  Unfortunately, the git/svn bridge
doesn't seem to be translate the information in the .gitattributes
file to appropriate eol-style settings in SVN.  Checking out a file
using SVN on Windows leaves me with a file without CRLF's, and if I
check in a CRLF file, that's the way it goes into the repository.
Differences in CRLF alone show up as 'real' differences that can be
checked in, and, if this happens, this causes problems with other
people's repositories.

Am I doing something wrong; is there another way to handle this; or
can I file this as a bug report/feature request?

Thank you!

-Lucian Smith
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 5/9] diff: prepare for additional submodule formats

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

A future patch will add a new format for displaying the difference of
a submodule. Make it easier by changing how we store the current
selected format. Replace the DIFF_OPT flag with an enumeration, as each
format will be mutually exclusive.

Signed-off-by: Jacob Keller 
---
 diff.c | 12 ++--
 diff.h |  7 ++-
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/diff.c b/diff.c
index e57cf39ad109..d6b321da3d1d 100644
--- a/diff.c
+++ b/diff.c
@@ -132,9 +132,9 @@ static int parse_dirstat_params(struct diff_options 
*options, const char *params
 static int parse_submodule_params(struct diff_options *options, const char 
*value)
 {
if (!strcmp(value, "log"))
-   DIFF_OPT_SET(options, SUBMODULE_LOG);
+   options->submodule_format = DIFF_SUBMODULE_LOG;
else if (!strcmp(value, "short"))
-   DIFF_OPT_CLR(options, SUBMODULE_LOG);
+   options->submodule_format = DIFF_SUBMODULE_SHORT;
else
return -1;
return 0;
@@ -2300,9 +2300,9 @@ static void builtin_diff(const char *name_a,
struct strbuf header = STRBUF_INIT;
const char *line_prefix = diff_line_prefix(o);
 
-   if (DIFF_OPT_TST(o, SUBMODULE_LOG) &&
-   (!one->mode || S_ISGITLINK(one->mode)) &&
-   (!two->mode || S_ISGITLINK(two->mode))) {
+   if (o->submodule_format == DIFF_SUBMODULE_LOG &&
+   (!one->mode || S_ISGITLINK(one->mode)) &&
+   (!two->mode || S_ISGITLINK(two->mode))) {
const char *del = diff_get_color_opt(o, DIFF_FILE_OLD);
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
show_submodule_summary(o->file, one->path ? one->path : 
two->path,
@@ -3916,7 +3916,7 @@ int diff_opt_parse(struct diff_options *options,
DIFF_OPT_SET(options, OVERRIDE_SUBMODULE_CONFIG);
handle_ignore_submodules_arg(options, arg);
} else if (!strcmp(arg, "--submodule"))
-   DIFF_OPT_SET(options, SUBMODULE_LOG);
+   options->submodule_format = DIFF_SUBMODULE_LOG;
else if (skip_prefix(arg, "--submodule=", ))
return parse_submodule_opt(options, arg);
else if (skip_prefix(arg, "--ws-error-highlight=", ))
diff --git a/diff.h b/diff.h
index 1f57aad25c71..43b353aea091 100644
--- a/diff.h
+++ b/diff.h
@@ -83,7 +83,6 @@ typedef struct strbuf *(*diff_prefix_fn_t)(struct 
diff_options *opt, void *data)
 #define DIFF_OPT_DIRSTAT_BY_FILE (1 << 20)
 #define DIFF_OPT_ALLOW_TEXTCONV  (1 << 21)
 #define DIFF_OPT_DIFF_FROM_CONTENTS  (1 << 22)
-#define DIFF_OPT_SUBMODULE_LOG   (1 << 23)
 #define DIFF_OPT_DIRTY_SUBMODULES(1 << 24)
 #define DIFF_OPT_IGNORE_UNTRACKED_IN_SUBMODULES (1 << 25)
 #define DIFF_OPT_IGNORE_DIRTY_SUBMODULES (1 << 26)
@@ -110,6 +109,11 @@ enum diff_words_type {
DIFF_WORDS_COLOR
 };
 
+enum diff_submodule_format {
+   DIFF_SUBMODULE_SHORT = 0,
+   DIFF_SUBMODULE_LOG
+};
+
 struct diff_options {
const char *orderfile;
const char *pickaxe;
@@ -157,6 +161,7 @@ struct diff_options {
int stat_count;
const char *word_regex;
enum diff_words_type word_diff;
+   enum diff_submodule_format submodule_format;
 
/* this is set by diffcore for DIFF_FORMAT_PATCH */
int found_changes;
-- 
2.10.0.rc0.259.g83512d9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 7/9] submodule: convert show_submodule_summary to use struct object_id *

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

Since we're going to be changing this function in a future patch, lets
go ahead and convert this to use object_id now.

Signed-off-by: Jacob Keller 
---
 diff.c  |  2 +-
 submodule.c | 16 
 submodule.h |  2 +-
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/diff.c b/diff.c
index d6b321da3d1d..16253b191f53 100644
--- a/diff.c
+++ b/diff.c
@@ -2307,7 +2307,7 @@ static void builtin_diff(const char *name_a,
const char *add = diff_get_color_opt(o, DIFF_FILE_NEW);
show_submodule_summary(o->file, one->path ? one->path : 
two->path,
line_prefix,
-   one->oid.hash, two->oid.hash,
+   >oid, >oid,
two->dirty_submodule,
meta, del, add, reset);
return;
diff --git a/submodule.c b/submodule.c
index e1a51b7506ff..422353ccf6cc 100644
--- a/submodule.c
+++ b/submodule.c
@@ -335,7 +335,7 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
 
 void show_submodule_summary(FILE *f, const char *path,
const char *line_prefix,
-   unsigned char one[20], unsigned char two[20],
+   struct object_id *one, struct object_id *two,
unsigned dirty_submodule, const char *meta,
const char *del, const char *add, const char *reset)
 {
@@ -345,14 +345,14 @@ void show_submodule_summary(FILE *f, const char *path,
struct strbuf sb = STRBUF_INIT;
int fast_forward = 0, fast_backward = 0;
 
-   if (is_null_sha1(two))
+   if (is_null_oid(two))
message = "(submodule deleted)";
else if (add_submodule_odb(path))
message = "(not initialized)";
-   else if (is_null_sha1(one))
+   else if (is_null_oid(one))
message = "(new submodule)";
-   else if (!(left = lookup_commit_reference(one)) ||
-!(right = lookup_commit_reference(two)))
+   else if (!(left = lookup_commit_reference(one->hash)) ||
+!(right = lookup_commit_reference(two->hash)))
message = "(commits not present)";
else if (prepare_submodule_summary(, path, left, right,
   _forward, _backward))
@@ -365,16 +365,16 @@ void show_submodule_summary(FILE *f, const char *path,
fprintf(f, "%sSubmodule %s contains modified content\n",
line_prefix, path);
 
-   if (!hashcmp(one, two)) {
+   if (!oidcmp(one, two)) {
strbuf_release();
return;
}
 
strbuf_addf(, "%s%sSubmodule %s %s..", line_prefix, meta, path,
-   find_unique_abbrev(one, DEFAULT_ABBREV));
+   find_unique_abbrev(one->hash, DEFAULT_ABBREV));
if (!fast_backward && !fast_forward)
strbuf_addch(, '.');
-   strbuf_addf(, "%s", find_unique_abbrev(two, DEFAULT_ABBREV));
+   strbuf_addf(, "%s", find_unique_abbrev(two->hash, DEFAULT_ABBREV));
if (message)
strbuf_addf(, " %s%s\n", message, reset);
else
diff --git a/submodule.h b/submodule.h
index 2af939099819..d83df57e24ff 100644
--- a/submodule.h
+++ b/submodule.h
@@ -43,7 +43,7 @@ const char *submodule_strategy_to_string(const struct 
submodule_update_strategy
 void handle_ignore_submodules_arg(struct diff_options *diffopt, const char *);
 void show_submodule_summary(FILE *f, const char *path,
const char *line_prefix,
-   unsigned char one[20], unsigned char two[20],
+   struct object_id *one, struct object_id *two,
unsigned dirty_submodule, const char *meta,
const char *del, const char *add, const char *reset);
 void set_config_fetch_recurse_submodules(int value);
-- 
2.10.0.rc0.259.g83512d9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 6/9] allow do_submodule_path to work even if submodule isn't checked out

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

Currently, do_submodule_path will attempt locating the .git directory by
using read_gitfile on /.git. If this fails it just assumes the
/.git is actually a git directory.

This is good because it allows for handling submodules which were cloned
in a regular manner first before being added to the parent project.

Unfortunately this fails if the  is not actually checked out any
longer, such as by removing the directory.

Fix this by checking if the directory we found is actually a gitdir. In
the case it is not, attempt to lookup the submodule configuration and
find the name of where it is stored in the .git/modules/ folder of the
parent project.

If we can't find this, just leave it alone and let the caller of
do_submodule_path handle a missing directory normally.

It might be worth adding a die() call here, but that seems a bit
overkill.

Because this change fixes add_submodule_odb to work even if the
submodule is not checked out, update the wording of the submodule log
diff format to correctly display that the submodule is "not initialized"
instead of "not checked out"

Add tests to ensure this change works as expected.

Signed-off-by: Jacob Keller 
---
 path.c|  11 +++
 submodule.c   |   2 +-
 t/t4059-diff-submodule-not-initialized.sh | 127 ++
 3 files changed, 139 insertions(+), 1 deletion(-)
 create mode 100755 t/t4059-diff-submodule-not-initialized.sh

diff --git a/path.c b/path.c
index fe3c4d96c6d8..07dd0f62eb82 100644
--- a/path.c
+++ b/path.c
@@ -6,6 +6,7 @@
 #include "string-list.h"
 #include "dir.h"
 #include "worktree.h"
+#include "submodule-config.h"
 
 static int get_st_mode_bits(const char *path, int *mode)
 {
@@ -474,6 +475,7 @@ static void do_submodule_path(struct strbuf *buf, const 
char *path,
const char *git_dir;
struct strbuf git_submodule_common_dir = STRBUF_INIT;
struct strbuf git_submodule_dir = STRBUF_INIT;
+   const struct submodule *sub;
 
strbuf_addstr(buf, path);
strbuf_complete(buf, '/');
@@ -484,6 +486,15 @@ static void do_submodule_path(struct strbuf *buf, const 
char *path,
strbuf_reset(buf);
strbuf_addstr(buf, git_dir);
}
+   if (!is_git_directory(buf->buf)) {
+   sub = submodule_from_path(null_sha1, path);
+   if (sub) {
+   strbuf_reset(buf);
+   strbuf_git_path(buf, "%s/%s", "modules",
+   sub->name);
+   }
+   }
+
strbuf_addch(buf, '/');
strbuf_addbuf(_submodule_dir, buf);
 
diff --git a/submodule.c b/submodule.c
index 1b5cdfb7e784..e1a51b7506ff 100644
--- a/submodule.c
+++ b/submodule.c
@@ -348,7 +348,7 @@ void show_submodule_summary(FILE *f, const char *path,
if (is_null_sha1(two))
message = "(submodule deleted)";
else if (add_submodule_odb(path))
-   message = "(not checked out)";
+   message = "(not initialized)";
else if (is_null_sha1(one))
message = "(new submodule)";
else if (!(left = lookup_commit_reference(one)) ||
diff --git a/t/t4059-diff-submodule-not-initialized.sh 
b/t/t4059-diff-submodule-not-initialized.sh
new file mode 100755
index ..c8775854d3c2
--- /dev/null
+++ b/t/t4059-diff-submodule-not-initialized.sh
@@ -0,0 +1,127 @@
+#!/bin/sh
+#
+# Copyright (c) 2016 Jacob Keller, based on t4041 by Jens Lehmann
+#
+
+test_description='Test for submodule diff on non-checked out submodule
+
+This test tries to verify that add_submodule_odb works when the submodule was
+initialized previously but the checkout has since been removed.
+'
+
+. ./test-lib.sh
+
+# Tested non-UTF-8 encoding
+test_encoding="ISO8859-1"
+
+# String "added" in German (translated with Google Translate), encoded in 
UTF-8,
+# used in sample commit log messages in add_file() function below.
+added=$(printf "hinzugef\303\274gt")
+
+add_file () {
+   (
+   cd "$1" &&
+   shift &&
+   for name
+   do
+   echo "$name" >"$name" &&
+   git add "$name" &&
+   test_tick &&
+   # "git commit -m" would break MinGW, as Windows refuse 
to pass
+   # $test_encoding encoded parameter to git.
+   echo "Add $name ($added $name)" | iconv -f utf-8 -t 
$test_encoding |
+   git -c "i18n.commitEncoding=$test_encoding" commit -F -
+   done >/dev/null &&
+   git rev-parse --short --verify HEAD
+   )
+}
+
+commit_file () {
+   test_tick &&
+   git commit "$@" -m "Commit $*" >/dev/null
+}
+
+test_expect_success 'setup - submodules' '
+   test_create_repo sm2 &&
+   add_file . foo &&
+   add_file sm2 foo1 foo2

[PATCH v10 9/9] diff: teach diff to display submodule difference with an inline diff

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

Teach git-diff and friends a new format for displaying the difference
of a submodule. The new format is an inline diff of the contents of the
submodule between the commit range of the update. This allows the user
to see the actual code change caused by a submodule update.

Add tests for the new format and option.

Signed-off-by: Jacob Keller 
---
 Documentation/diff-config.txt|   9 +-
 Documentation/diff-options.txt   |  17 +-
 diff.c   |  31 +-
 diff.h   |   3 +-
 submodule.c  |  69 +++
 submodule.h  |   6 +
 t/t4060-diff-submodule-option-diff-format.sh | 749 +++
 7 files changed, 863 insertions(+), 21 deletions(-)
 create mode 100755 t/t4060-diff-submodule-option-diff-format.sh

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index d5a5b17d5088..0eded24034b5 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -122,10 +122,11 @@ diff.suppressBlankEmpty::
 
 diff.submodule::
Specify the format in which differences in submodules are
-   shown.  The "log" format lists the commits in the range like
-   linkgit:git-submodule[1] `summary` does.  The "short" format
-   format just shows the names of the commits at the beginning
-   and end of the range.  Defaults to short.
+   shown.  The "short" format just shows the names of the commits
+   at the beginning and end of the range. The "log" format lists
+   the commits in the range like linkgit:git-submodule[1] `summary`
+   does. The "diff" format shows an inline diff of the changed
+   contents of the submodule. Defaults to "short".
 
 diff.wordRegex::
A POSIX Extended Regular Expression used to determine what is a "word"
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index cc4342e2034d..7805a0ccadf2 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -210,13 +210,16 @@ any of those replacements occurred.
of the `--diff-filter` option on what the status letters mean.
 
 --submodule[=]::
-   Specify how differences in submodules are shown.  When `--submodule`
-   or `--submodule=log` is given, the 'log' format is used.  This format 
lists
-   the commits in the range like linkgit:git-submodule[1] `summary` does.
-   Omitting the `--submodule` option or specifying `--submodule=short`,
-   uses the 'short' format. This format just shows the names of the commits
-   at the beginning and end of the range.  Can be tweaked via the
-   `diff.submodule` configuration variable.
+   Specify how differences in submodules are shown.  When specifying
+   `--submodule=short` the 'short' format is used.  This format just
+   shows the names of the commits at the beginning and end of the range.
+   When `--submodule` or `--submodule=log` is specified, the 'log'
+   format is used.  This format lists the commits in the range like
+   linkgit:git-submodule[1] `summary` does.  When `--submodule=diff`
+   is specified, the 'diff' format is used.  This format shows an
+   inline diff of the changes in the submodule contents between the
+   commit range.  Defaults to `diff.submodule` or the 'short' format
+   if the config option is unset.
 
 --color[=]::
Show colored diff.
diff --git a/diff.c b/diff.c
index 16253b191f53..b38d95eb249c 100644
--- a/diff.c
+++ b/diff.c
@@ -135,6 +135,8 @@ static int parse_submodule_params(struct diff_options 
*options, const char *valu
options->submodule_format = DIFF_SUBMODULE_LOG;
else if (!strcmp(value, "short"))
options->submodule_format = DIFF_SUBMODULE_SHORT;
+   else if (!strcmp(value, "diff"))
+   options->submodule_format = DIFF_SUBMODULE_INLINE_DIFF;
else
return -1;
return 0;
@@ -2300,6 +2302,15 @@ static void builtin_diff(const char *name_a,
struct strbuf header = STRBUF_INIT;
const char *line_prefix = diff_line_prefix(o);
 
+   diff_set_mnemonic_prefix(o, "a/", "b/");
+   if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
+   a_prefix = o->b_prefix;
+   b_prefix = o->a_prefix;
+   } else {
+   a_prefix = o->a_prefix;
+   b_prefix = o->b_prefix;
+   }
+
if (o->submodule_format == DIFF_SUBMODULE_LOG &&
(!one->mode || S_ISGITLINK(one->mode)) &&
(!two->mode || S_ISGITLINK(two->mode))) {
@@ -2311,6 +2322,17 @@ static void builtin_diff(const char *name_a,
two->dirty_submodule,
meta, del, add, reset);
return;
+   } else if (o->submodule_format == DIFF_SUBMODULE_INLINE_DIFF &&
+

[PATCH v10 2/9] cache: add empty_tree_oid object and helper function

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

Similar to is_null_oid(), and is_empty_blob_sha1() add an
empty_tree_oid along with helper function is_empty_tree_oid(). For
completeness, also add an "is_empty_tree_sha1()",
"is_empty_blob_sha1()", "is_empty_tree_oid()" and "is_empty_blob_oid()"
helpers.

To ensure we only get one singleton, implement EMPTY_BLOB_SHA1_BIN as
simply getting the hash of empty_blob_oid structure.

Signed-off-by: Jacob Keller 
---
 cache.h | 25 +
 sha1_file.c |  6 ++
 2 files changed, 27 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index f30a4417efdf..70428e92d7ed 100644
--- a/cache.h
+++ b/cache.h
@@ -953,22 +953,39 @@ static inline void oidclr(struct object_id *oid)
 #define EMPTY_TREE_SHA1_BIN_LITERAL \
 "\x4b\x82\x5d\xc6\x42\xcb\x6e\xb9\xa0\x60" \
 "\xe5\x4b\xf8\xd6\x92\x88\xfb\xee\x49\x04"
-#define EMPTY_TREE_SHA1_BIN \
-((const unsigned char *) EMPTY_TREE_SHA1_BIN_LITERAL)
+extern const struct object_id empty_tree_oid;
+#define EMPTY_TREE_SHA1_BIN (empty_tree_oid.hash)
 
 #define EMPTY_BLOB_SHA1_HEX \
"e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
 #define EMPTY_BLOB_SHA1_BIN_LITERAL \
"\xe6\x9d\xe2\x9b\xb2\xd1\xd6\x43\x4b\x8b" \
"\x29\xae\x77\x5a\xd8\xc2\xe4\x8c\x53\x91"
-#define EMPTY_BLOB_SHA1_BIN \
-   ((const unsigned char *) EMPTY_BLOB_SHA1_BIN_LITERAL)
+extern const struct object_id empty_blob_oid;
+#define EMPTY_BLOB_SHA1_BIN (empty_blob_oid.hash)
+
 
 static inline int is_empty_blob_sha1(const unsigned char *sha1)
 {
return !hashcmp(sha1, EMPTY_BLOB_SHA1_BIN);
 }
 
+static inline int is_empty_blob_oid(const struct object_id *oid)
+{
+   return !hashcmp(oid->hash, EMPTY_BLOB_SHA1_BIN);
+}
+
+static inline int is_empty_tree_sha1(const unsigned char *sha1)
+{
+   return !hashcmp(sha1, EMPTY_TREE_SHA1_BIN);
+}
+
+static inline int is_empty_tree_oid(const struct object_id *oid)
+{
+   return !hashcmp(oid->hash, EMPTY_TREE_SHA1_BIN);
+}
+
+
 int git_mkstemp(char *path, size_t n, const char *template);
 
 /* set default permissions by passing mode arguments to open(2) */
diff --git a/sha1_file.c b/sha1_file.c
index 1e23fc186a02..21cf923bcf1f 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -38,6 +38,12 @@ static inline uintmax_t sz_fmt(size_t s) { return s; }
 
 const unsigned char null_sha1[20];
 const struct object_id null_oid;
+const struct object_id empty_tree_oid = {
+   EMPTY_TREE_SHA1_BIN_LITERAL
+};
+const struct object_id empty_blob_oid = {
+   EMPTY_BLOB_SHA1_BIN_LITERAL
+};
 
 /*
  * This is meant to hold a *small* number of objects that you would
-- 
2.10.0.rc0.259.g83512d9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 4/9] graph: add support for --line-prefix on all graph-aware output

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

Add an extension to git-diff and git-log (and any other graph-aware
displayable output) such that "--line-prefix=" will print the
additional line-prefix on every line of output.

To make this work, we have to fix a few bugs in the graph API that force
graph_show_commit_msg to be used only when you have a valid graph.
Additionally, we extend the default_diff_output_prefix handler to work
even when no graph is enabled.

This is somewhat of a hack on top of the graph API, but I think it
should be acceptable here.

This will be used by a future extension of submodule display which
displays the submodule diff as the actual diff between the pre and post
commit in the submodule project.

Add some tests for both git-log and git-diff to ensure that the prefix
is honored correctly.

Signed-off-by: Jacob Keller 
---
 Documentation/diff-options.txt |   3 +
 builtin/rev-list.c |  70 ++---
 diff.c |   7 +
 diff.h |   2 +
 graph.c|  98 ---
 graph.h|  22 +-
 log-tree.c |   5 +-
 t/t4013-diff-various.sh|   6 +
 ...diff.diff_--line-prefix=abc_master_master^_side |  29 ++
 t/t4013/diff.diff_--line-prefix_--cached_--_file0  |  15 +
 t/t4202-log.sh | 323 +
 11 files changed, 502 insertions(+), 78 deletions(-)
 create mode 100644 t/t4013/diff.diff_--line-prefix=abc_master_master^_side
 create mode 100644 t/t4013/diff.diff_--line-prefix_--cached_--_file0

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 705a87394200..cc4342e2034d 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -569,5 +569,8 @@ endif::git-format-patch[]
 --no-prefix::
Do not show any source or destination prefix.
 
+--line-prefix=::
+   Prepend an additional prefix to every line of output.
+
 For more detailed explanation on these common options, see also
 linkgit:gitdiffcore[7].
diff --git a/builtin/rev-list.c b/builtin/rev-list.c
index 0ba82b1635b6..8479f6ed28aa 100644
--- a/builtin/rev-list.c
+++ b/builtin/rev-list.c
@@ -122,48 +122,40 @@ static void show_commit(struct commit *commit, void *data)
ctx.fmt = revs->commit_format;
ctx.output_encoding = get_log_output_encoding();
pretty_print_commit(, commit, );
-   if (revs->graph) {
-   if (buf.len) {
-   if (revs->commit_format != CMIT_FMT_ONELINE)
-   graph_show_oneline(revs->graph);
+   if (buf.len) {
+   if (revs->commit_format != CMIT_FMT_ONELINE)
+   graph_show_oneline(revs->graph);
 
-   graph_show_commit_msg(revs->graph, );
+   graph_show_commit_msg(revs->graph, stdout, );
 
-   /*
-* Add a newline after the commit message.
-*
-* Usually, this newline produces a blank
-* padding line between entries, in which case
-* we need to add graph padding on this line.
-*
-* However, the commit message may not end in a
-* newline.  In this case the newline simply
-* ends the last line of the commit message,
-* and we don't need any graph output.  (This
-* always happens with CMIT_FMT_ONELINE, and it
-* happens with CMIT_FMT_USERFORMAT when the
-* format doesn't explicitly end in a newline.)
-*/
-   if (buf.len && buf.buf[buf.len - 1] == '\n')
-   graph_show_padding(revs->graph);
-   putchar('\n');
-   } else {
-   /*
-* If the message buffer is empty, just show
-* the rest of the graph output for this
-* commit.
-*/
-   if (graph_show_remainder(revs->graph))
-   putchar('\n');
-   if (revs->commit_format == CMIT_FMT_ONELINE)
-   putchar('\n');
-   }
+   /*
+

[PATCH v10 3/9] diff.c: remove output_prefix_length field

2016-08-22 Thread Jacob Keller

From: Junio C Hamano 

"diff/log --stat" has a logic that determines the display columns
available for the diffstat part of the output and apportions it for
pathnames and diffstat graph automatically.

5e71a84a (Add output_prefix_length to diff_options, 2012-04-16)
added the output_prefix_length field to diff_options structure to
allow this logic to subtract the display columns used for the
history graph part from the total "terminal width"; this matters
when the "git log --graph -p" option is in use.

The field must be set to the number of display columns needed to
show the output from the output_prefix() callback, which is error
prone.  As there is only one user of the field, and the user has the
actual value of the prefix string, let's get rid of the field and
have the user count the display width itself.

Signed-off-by: Junio C Hamano 
---
 diff.c  | 2 +-
 diff.h  | 1 -
 graph.c | 2 --
 3 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/diff.c b/diff.c
index 534c12e28ea8..50bef1f07658 100644
--- a/diff.c
+++ b/diff.c
@@ -1625,7 +1625,7 @@ static void show_stats(struct diffstat_t *data, struct 
diff_options *options)
 */
 
if (options->stat_width == -1)
-   width = term_columns() - options->output_prefix_length;
+   width = term_columns() - strlen(line_prefix);
else
width = options->stat_width ? options->stat_width : 80;
number_width = decimal_width(max_change) > number_width ?
diff --git a/diff.h b/diff.h
index 7883729edf10..747a204d75a4 100644
--- a/diff.h
+++ b/diff.h
@@ -174,7 +174,6 @@ struct diff_options {
diff_format_fn_t format_callback;
void *format_callback_data;
diff_prefix_fn_t output_prefix;
-   int output_prefix_length;
void *output_prefix_data;
 
int diff_path_counter;
diff --git a/graph.c b/graph.c
index dd1720148dc5..a46803840511 100644
--- a/graph.c
+++ b/graph.c
@@ -197,7 +197,6 @@ static struct strbuf *diff_output_prefix_callback(struct 
diff_options *opt, void
assert(opt);
assert(graph);
 
-   opt->output_prefix_length = graph->width;
strbuf_reset();
graph_padding_line(graph, );
return 
@@ -245,7 +244,6 @@ struct git_graph *graph_init(struct rev_info *opt)
 */
opt->diffopt.output_prefix = diff_output_prefix_callback;
opt->diffopt.output_prefix_data = graph;
-   opt->diffopt.output_prefix_length = 0;
 
return graph;
 }
-- 
2.10.0.rc0.259.g83512d9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 8/9] submodule: refactor show_submodule_summary with helper function

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

A future patch is going to add a new submodule diff format which
displays an inline diff of the submodule changes. To make this easier,
and to ensure that both submodule diff formats use the same initial
header, factor out show_submodule_header() function which will print the
current submodule header line, and then leave the show_submodule_summary
function to lookup and print the submodule log format.

This does create one format change in that "(revision walker failed)"
will now be displayed on its own line rather than as part of the message
because we no longer perform this step directly in the header display
flow. However, this is a rare case as most causes of the failure will be
due to a missing commit which we already check for and avoid previously.
flow. However, this is a rare case and shouldn't impact much.

Signed-off-by: Jacob Keller 
---
 submodule.c | 115 +++-
 1 file changed, 82 insertions(+), 33 deletions(-)

diff --git a/submodule.c b/submodule.c
index 422353ccf6cc..1fb753af3066 100644
--- a/submodule.c
+++ b/submodule.c
@@ -278,9 +278,9 @@ void handle_ignore_submodules_arg(struct diff_options 
*diffopt,
 
 static int prepare_submodule_summary(struct rev_info *rev, const char *path,
struct commit *left, struct commit *right,
-   int *fast_forward, int *fast_backward)
+   struct commit_list *merge_bases)
 {
-   struct commit_list *merge_bases, *list;
+   struct commit_list *list;
 
init_revisions(rev, NULL);
setup_revisions(0, NULL, rev, NULL);
@@ -289,13 +289,6 @@ static int prepare_submodule_summary(struct rev_info *rev, 
const char *path,
left->object.flags |= SYMMETRIC_LEFT;
add_pending_object(rev, >object, path);
add_pending_object(rev, >object, path);
-   merge_bases = get_merge_bases(left, right);
-   if (merge_bases) {
-   if (merge_bases->item == left)
-   *fast_forward = 1;
-   else if (merge_bases->item == right)
-   *fast_backward = 1;
-   }
for (list = merge_bases; list; list = list->next) {
list->item->object.flags |= UNINTERESTING;
add_pending_object(rev, >item->object,
@@ -333,31 +326,23 @@ static void print_submodule_summary(struct rev_info *rev, 
FILE *f,
strbuf_release();
 }
 
-void show_submodule_summary(FILE *f, const char *path,
+/* Helper function to display the submodule header line prior to the full
+ * summary output. If it can locate the submodule objects directory it will
+ * attempt to lookup both the left and right commits and put them into the
+ * left and right pointers.
+ */
+static void show_submodule_header(FILE *f, const char *path,
const char *line_prefix,
struct object_id *one, struct object_id *two,
unsigned dirty_submodule, const char *meta,
-   const char *del, const char *add, const char *reset)
+   const char *reset,
+   struct commit **left, struct commit **right,
+   struct commit_list **merge_bases)
 {
-   struct rev_info rev;
-   struct commit *left = NULL, *right = NULL;
const char *message = NULL;
struct strbuf sb = STRBUF_INIT;
int fast_forward = 0, fast_backward = 0;
 
-   if (is_null_oid(two))
-   message = "(submodule deleted)";
-   else if (add_submodule_odb(path))
-   message = "(not initialized)";
-   else if (is_null_oid(one))
-   message = "(new submodule)";
-   else if (!(left = lookup_commit_reference(one->hash)) ||
-!(right = lookup_commit_reference(two->hash)))
-   message = "(commits not present)";
-   else if (prepare_submodule_summary(, path, left, right,
-  _forward, _backward))
-   message = "(revision walker failed)";
-
if (dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
fprintf(f, "%sSubmodule %s contains untracked content\n",
line_prefix, path);
@@ -365,11 +350,46 @@ void show_submodule_summary(FILE *f, const char *path,
fprintf(f, "%sSubmodule %s contains modified content\n",
line_prefix, path);
 
+   if (is_null_oid(one))
+   message = "(new submodule)";
+   else if (is_null_oid(two))
+   message = "(submodule deleted)";
+
+   if (add_submodule_odb(path)) {
+   if (!message)
+   message = "(not initialized)";
+   goto output_header;
+   }
+
+   /*
+* Attempt to lookup the commit references, and determine if this is
+* a fast forward or fast backwards update.
+*/
+   *left = lookup_commit_reference(one->hash);
+   *right =

[PATCH v10 0/9] submodule inline diff format

2016-08-22 Thread Jacob Keller

From: Jacob Keller 

A few suggestions from Stefan in regards to falling back to
.git/modules/ being a bad idea. I've chosen I think to avoid using
die() as we just stick with the current path if we can't find its name.
I think this should be safe since we already do this today. The new flow
only changes if we are able to lookup the submodule, so I don't think
it's worth adding a die() call.

interdiff between v9 and v10 is pretty small:


diff --git c/Documentation/RelNotes/2.10.0.txt 
w/Documentation/RelNotes/2.10.0.txt
index 4f7460be3914..0ef70fd9b1eb 100644
--- c/Documentation/RelNotes/2.10.0.txt
+++ w/Documentation/RelNotes/2.10.0.txt
@@ -118,6 +118,15 @@ UI, Workflows & Features
"git branch --delete/--move [--remote]".
(merge 2703c22 vs/completion-branch-fully-spelled-d-m-r later to maint).
 
+ * "git rev-parse --git-path hooks/" learned to take
+   core.hooksPath configuration variable (introduced during 2.9 cycle)
+   into account.
+   (merge 9445b49 ab/hooks later to maint).
+
+ * "git log --show-signature" and other commands that display the
+   verification status of PGP signature now shows the longer key-id,
+   as 32-bit key-id is so last century.
+
 
 Performance, Internal Implementation, Development Support etc.
 
@@ -600,6 +609,28 @@ notes for details).
arises).
(merge c2cafd3 js/test-lint-pathname later to maint).
 
+ * When "git merge-recursive" works on history with many criss-cross
+   merges in "verbose" mode, the names the command assigns to the
+   virtual merge bases could have overwritten each other by unintended
+   reuse of the same piece of memory.
+   (merge 5447a76 rs/pull-signed-tag later to maint).
+
+ * "git checkout --detach " used to give the same advice
+   message as that is issued when "git checkout " (or anything
+   that is not a branch name) is given, but asking with "--detach" is
+   an explicit enough sign that the user knows what is going on.  The
+   advice message has been squelched in this case.
+   (merge 779b88a sb/checkout-explit-detach-no-advice later to maint).
+
+ * "git difftool" by default ignores the error exit from the backend
+   commands it spawns, because often they signal that they found
+   differences by exiting with a non-zero status code just like "diff"
+   does; the exit status codes 126 and above however are special in
+   that they are used to signal that the command is not executable,
+   does not exist, or killed by a signal.  "git difftool" has been
+   taught to notice these exit status codes.
+   (merge 45a4f5d jk/difftool-command-not-found later to maint).
+
  * Other minor clean-ups and documentation updates
(merge 02a8cfa rs/merge-add-strategies-simplification later to maint).
(merge af4941d rs/merge-recursive-string-list-init later to maint).
diff --git c/GIT-VERSION-GEN w/GIT-VERSION-GEN
index eea85c340454..702c067a78c0 100755
--- c/GIT-VERSION-GEN
+++ w/GIT-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=GIT-VERSION-FILE
-DEF_VER=v2.10.0-rc0
+DEF_VER=v2.10.0-rc1
 
 LF='
 '
diff --git c/path.c w/path.c
index 081a22c1163c..07dd0f62eb82 100644
--- c/path.c
+++ w/path.c
@@ -475,7 +475,7 @@ static void do_submodule_path(struct strbuf *buf, const 
char *path,
const char *git_dir;
struct strbuf git_submodule_common_dir = STRBUF_INIT;
struct strbuf git_submodule_dir = STRBUF_INIT;
-   const struct submodule *submodule_config;
+   const struct submodule *sub;
 
strbuf_addstr(buf, path);
strbuf_complete(buf, '/');
@@ -487,17 +487,12 @@ static void do_submodule_path(struct strbuf *buf, const 
char *path,
strbuf_addstr(buf, git_dir);
}
if (!is_git_directory(buf->buf)) {
-   strbuf_reset(buf);
-   /*
-* Lookup the submodule name from the config. If that fails
-* fall back to assuming the path is the name.
-*/
-   submodule_config = submodule_from_path(null_sha1, path);
-   if (submodule_config)
+   sub = submodule_from_path(null_sha1, path);
+   if (sub) {
+   strbuf_reset(buf);
strbuf_git_path(buf, "%s/%s", "modules",
-   submodule_config->name);
-   else
-   strbuf_git_path(buf, "%s/%s", "modules", path);
+   sub->name);
+   }
}
 
strbuf_addch(buf, '/');

>8

Jacob Keller (7):
  cache: add empty_tree_oid object and helper function
  graph: add support for --line-prefix on all graph-aware output
  diff: prepare for additional submodule formats
  allow do_submodule_path to work even if submodule isn't checked out
  submodule: convert show_submodule_summary to use struct object_id *
  submodule: refactor show_submodule_summary with helper function
  diff: teach diff to display submodule difference with an inline diff

[PATCH v10 1/9] Git 2.10-rc1

2016-08-22 Thread Jacob Keller

From: Junio C Hamano 

Signed-off-by: Junio C Hamano 
---
 Documentation/RelNotes/2.10.0.txt | 31 +++
 GIT-VERSION-GEN   |  2 +-
 2 files changed, 32 insertions(+), 1 deletion(-)

diff --git a/Documentation/RelNotes/2.10.0.txt 
b/Documentation/RelNotes/2.10.0.txt
index 4f7460be3914..0ef70fd9b1eb 100644
--- a/Documentation/RelNotes/2.10.0.txt
+++ b/Documentation/RelNotes/2.10.0.txt
@@ -118,6 +118,15 @@ UI, Workflows & Features
"git branch --delete/--move [--remote]".
(merge 2703c22 vs/completion-branch-fully-spelled-d-m-r later to maint).
 
+ * "git rev-parse --git-path hooks/" learned to take
+   core.hooksPath configuration variable (introduced during 2.9 cycle)
+   into account.
+   (merge 9445b49 ab/hooks later to maint).
+
+ * "git log --show-signature" and other commands that display the
+   verification status of PGP signature now shows the longer key-id,
+   as 32-bit key-id is so last century.
+
 
 Performance, Internal Implementation, Development Support etc.
 
@@ -600,6 +609,28 @@ notes for details).
arises).
(merge c2cafd3 js/test-lint-pathname later to maint).
 
+ * When "git merge-recursive" works on history with many criss-cross
+   merges in "verbose" mode, the names the command assigns to the
+   virtual merge bases could have overwritten each other by unintended
+   reuse of the same piece of memory.
+   (merge 5447a76 rs/pull-signed-tag later to maint).
+
+ * "git checkout --detach " used to give the same advice
+   message as that is issued when "git checkout " (or anything
+   that is not a branch name) is given, but asking with "--detach" is
+   an explicit enough sign that the user knows what is going on.  The
+   advice message has been squelched in this case.
+   (merge 779b88a sb/checkout-explit-detach-no-advice later to maint).
+
+ * "git difftool" by default ignores the error exit from the backend
+   commands it spawns, because often they signal that they found
+   differences by exiting with a non-zero status code just like "diff"
+   does; the exit status codes 126 and above however are special in
+   that they are used to signal that the command is not executable,
+   does not exist, or killed by a signal.  "git difftool" has been
+   taught to notice these exit status codes.
+   (merge 45a4f5d jk/difftool-command-not-found later to maint).
+
  * Other minor clean-ups and documentation updates
(merge 02a8cfa rs/merge-add-strategies-simplification later to maint).
(merge af4941d rs/merge-recursive-string-list-init later to maint).
diff --git a/GIT-VERSION-GEN b/GIT-VERSION-GEN
index eea85c340454..702c067a78c0 100755
--- a/GIT-VERSION-GEN
+++ b/GIT-VERSION-GEN
@@ -1,7 +1,7 @@
 #!/bin/sh
 
 GVF=GIT-VERSION-FILE
-DEF_VER=v2.10.0-rc0
+DEF_VER=v2.10.0-rc1
 
 LF='
 '
-- 
2.10.0.rc0.259.g83512d9

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 4/7] delta_base_cache: use list.h for LRU

2016-08-22 Thread Eric Wong

Jeff King  wrote:
> We keep an LRU list of entries for when we need to drop
> something from an over-full cache. The list is implemented
> as a circular doubly-linked list, which is exactly what
> list.h provides. We can save a few lines by using the list.h
> macros and functions. More importantly, this makes the code
> easier to follow, as the reader sees explicit concepts like
> "list_add_tail()" instead of pointer manipulation.

Yay!

> As a bonus, the list_entry() macro lets us place the lru
> pointers anywhere inside the delta_base_cache_entry struct
> (as opposed to just casting the pointer, which requires it
> at the front of the struct). This will be useful in later
> patches when we need to place other items at the front of
> the struct (e.g., our hashmap implementation requires this).

On a side note, I think we should s/list_entry/container_of/ and
use container_of for hashmap.

Linux kernel defines list_entry to use container_of,
but I'd rather avoid introducing the duality entirely.

> Signed-off-by: Jeff King 
> ---
> I think the result is much nicer, but I found list_entry() a little
> disappointing, because we lack typeof(). So you are stuck writing:
> 
>   struct delta_base_cache_entry *f =
> list_entry(p, struct delta_base_cache_entry, lru);
> 
> I waffled on adding something like:
> 
>   LIST_ITEM(struct delta_bas_cache_entry, f, p, lru);
> 
> to declare "f" as above. But it's getting rather magical and un-C-like.

Right.  I'd rather keep the list_entry/container_of usage
identical to what developers familiar using userspace-rcu,
ccan/list, or the Linux kernel expect.

>  sha1_file.c | 38 --
>  1 file changed, 16 insertions(+), 22 deletions(-)

Good to see the code reduction.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Eric Wong

Johannes Schindelin  wrote:
> On Fri, 19 Aug 2016, Eric Wong wrote:
> > Johannes Schindelin  wrote:
> > > On Thu, 18 Aug 2016, Eric Wong wrote:
> > > > Johannes Schindelin  wrote:
> > > >
> > > > > Old dogs claim the mail list-approach works for them. Nope.
> > > > > Doesn't.  Else you would not have written all those custom
> > > > > scripts.
> > > > 
> > > > git and cogito started as a bunch of custom scripts, too.
> > > 
> > > The difference is that neither git nor cogito were opinionated. Those
> > > custom scripts are. They are for one particular workflow, with one
> > > particular mail client, with a strong bias to a Unix-y environment.



> > I guess this is a fundamental difference between *nix and Windows
> > culture.
> 
> I do not understand how you get from "I wish to make it fun to contribute
> to Git" to "there is a fundamental difference between *nix and Windows
> culture".

Sorry, I over-quoted by 3 lines.



> > > We do not even have a section on Outlook in our SubmittingPatches.
> > > 
> > > Okay, if not the most popular mail client, then web mail? Nope, nope,
> > > nope. No piping *at all* to external commands from there.
> > > 
> > > So you basically slam the door shut on the vast majority of email users.
> > 
> > Users have a choice to use a more scriptable mail client
> > (but I guess the OS nudges users towards monolithic tools)
> 
> You call that choice. Are you serious?
> 
> > > That is not leaving much choice to the users in my book.
> > 
> > Users of alpine, gnus, mutt, sylpheed, thunderbird, kmail,
> > roundcube, squirelmail, etc. can all download the source, hack,
> > fix and customize things.  It's easier with smaller software,
> > of course:  git-send-email does not even require learning
> > the build process or separate download.
> 
> Now I am getting upset. This is a BS argument. Sure, I can hack the source
> of these tools.
>
> But why on earth do I *have* to? Why can't we use or create an open
> contribution process *that works without having to work so hard to be able
> to contribute*?

The process we have is already open.  It may be *nix-centric,
and *nix may be picky about it's friends, but it is open:

Anybody can still contribute today without any sort of
registration, credentialism, or terms-of-service(*).

I am looking beyond git.

I hate signing up for websites.  For many years, I have used
Debian as a proxy for other projects with less open contribution
processes:

apt-get source ...; ; reportbug ...

Of course, going through Debian maintainers is not always
reliable or efficient.

I foolishly hoped git-svn would put an end to all the
registration-required bug tracker instances so I could just
send my changes directly to upstream maintainers without any
sort of registration.  Did not happen :<

> So unfortunately this thread has devolved. Which is sad. Because all I
> wanted is to have a change in Git's submission process that would not
> exclude *so many* developers. That is really all I care about. Not about
> tools. Not about open vs proprietary, or standards.
> 
> I just want developers who are already familiar with Git, and come up with
> an improvement to Git itself, to be able to contribute it without having
> to pull out their hair in despair.

We want the same thing.  I just want to go farther and get
people familiar with (federated|decentralized) tools instead of
proprietary and centralized ones.



(*) I wish git could get rid of the DCO, even.  But at least
it's far better than the "papers, please" policy for some
GNU projects.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 7/7] t/perf: add basic perf tests for delta base cache

2016-08-22 Thread Jeff King

This just shows off the improvements done by the last few
patches, and gives us a baseline for noticing regressions in
the future. Here are the results with linux.git as the perf
"large repo":

TestoriginHEAD
---
0003.1: log --raw   43.41(40.36+2.69) 33.86(30.96+2.41) -22.0%
0003.2: log -S  313.61(309.74+3.78)   298.75(295.58+3.00) -4.7%

(for a large repo, the "log -S" improvements are greater if
you bump the delta base cache limit, but I think it makes
sense to test the "stock" behavior, since that is what most
people will see).

Signed-off-by: Jeff King 
---
I also got good times for "log -S" when measuring git.git (but _not_
when measuring "log --raw", because of its size). So I guess perhaps the
script could measure both. I'm not sure if the perf framework is ready
for using both test_default_repo and test_large_repo, though.

 t/perf/p0003-delta-base-cache.sh | 31 +++
 1 file changed, 31 insertions(+)
 create mode 100755 t/perf/p0003-delta-base-cache.sh

diff --git a/t/perf/p0003-delta-base-cache.sh b/t/perf/p0003-delta-base-cache.sh
new file mode 100755
index 000..62369ea
--- /dev/null
+++ b/t/perf/p0003-delta-base-cache.sh
@@ -0,0 +1,31 @@
+#!/bin/sh
+
+test_description='Test operations that emphasize the delta base cache.
+
+We look at both "log --raw", which should put only trees into the delta cache,
+and "log -Sfoo --raw", which should look at both trees and blobs.
+
+Any effects will be emphasized if the test repository is fully packed (loose
+objects obviously do not use the delta base cache at all). It is also
+emphasized if the pack has long delta chains (e.g., as produced by "gc
+--aggressive"), though cache is still quite noticeable even with the default
+depth of 50.
+
+The setting of core.deltaBaseCacheLimit in the source repository is also
+relevant (depending on the size of your test repo), so be sure it is consistent
+between runs.
+'
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+# puts mostly trees into the delta base cache
+test_perf 'log --raw' '
+   git log --raw >/dev/null
+'
+
+test_perf 'log -S' '
+   git log --raw -Sfoo >/dev/null
+'
+
+test_done
-- 
2.10.0.rc1.118.ge2299eb
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Git and SHA-1 security (again)

2016-08-22 Thread Philip Oakley


Sorry if I'm dropping in at the wrong point (this is one I'd bookmarked)..



From: "Duy Nguyen" 
Sent: Wednesday, July 20, 2016 3:44 PM

On Wed, Jul 20, 2016 at 2:28 PM, Johannes Schindelin
 wrote:

But that strategy *still* ignores the distributed nature of Git. Just
because *you* make that merge at a certain point does not necessarily 
mean

that I make it at that point, too.

Any approach that tries to have one single point of conversion will most
likely fall short of a solution.


OK I see the difference in our views now. To me an sha256 repo would
see an sha1 repo as a _foreign_ DVCS, pretty much like git sees
mercurial now. So a transition from sha1 to sha256 is not that
different from cvs -> svn -> a dvcs bubble -> git.



I think that within Git, that it is possible to have inter-workability (for 
those parts that negotiate) between instances with different views about the 
availability of two hash types. Fetch/push negotiation is a normal part of 
working with a remote.


To be honest, I am less concerned about the GPG-signed commits (after 
all,

after switching to a more secure hash algorithm, a maintainer could
cross-sign all signed commits, or only the branch tips or tags, as new
tags, to reinstitute trust).

I am much more concerned about references to commits, both inside and
outside the repository. That is, if I read anywhere on the internet about
Git having added support for `git add --chmod=+x ` in 4e55ed3 (add:
add --chmod=+x / --chmod=-x options, 2016-05-31), I want to find that
commit by that reference.

And I am of course concerned what should happen if a user wants to fetch
from, or push to, a SHA-1-hashed remote repository into, or from, a
SHA-256-hashed local one.


to follow the above, in my view, interaction with sha1 repos go
through some conversion bridges like what we have with hg and svn. I
don't know if we are going this route. It's certainly simpler and
people already have experiences (from previous migration) to prepare
for it.
--


The main thought was that rather than worrying about which advanced hash to 
pick (with all the arguments that entails), rather it is worth reducing the 
problem space to create a 'toy problem', to look at the interaction issues.


For the toy problem view we'd keep the current oid length (so that the 
transmission formats don't change size), however we swap the old-new to make 
sha1 the new hash and use an older shorter hash (e.g. md5) to investigate 
the transition from a short to long hash.


Keeping it as a 'toy problem' avoids folks having too much invested in the 
new hash choice, rather the interworking can be more easily sorted, and some 
issue can be punted on (e.g. the choice of salt to extend the md5 to the 
sha1, and collisions therein).


--
Philip

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 6/7] delta_base_cache: use hashmap.h

2016-08-22 Thread Jeff King

The fundamental data structure of the delta base cache is a
hash table mapping pairs of "(packfile, offset)" into
structs containing the actual object data. The hash table
implementation dates back to e5e0161 (Implement a simple
delta_base cache, 2007-03-17), and uses a fixed-size table.
The current size is a hard-coded 256 entries.

Because we need to be able to remove objects from the hash
table, entry lookup does not do any kind of probing to
handle collisions. Colliding items simply replace whatever
is in their slot.  As a result, we have fewer usable slots
than even the 256 we allocate. At half full, each new item
has a 50% chance of displacing another one. Or another way
to think about it: every item has a 1/256 chance of being
ejected due to hash collision, without regard to our LRU
strategy.

So it would be interesting to see the effect of increasing
the cache size on the runtime for some common operations. As
with the previous patch, we'll measure "git log --raw" for
tree-only operations, and "git log -Sfoo --raw" for
operations that touch trees and blobs. All times are
wall-clock best-of-3, done against fully packed repos with
--depth=50, and the default core.deltaBaseCacheLimit of
96MB.

Here are timings for various values of MAX_DELTA_CACHE
against git.git (the asterisk marks the minimum time for
each operation):

MAX_DELTA_CACHElog-raw  log-S
---   -   -
256   0m02.227s   0m12.821s
512   0m02.143s   0m10.602s
   1024   0m02.127s   0m08.642s
   2048   0m02.148s   0m07.123s
   4096   0m02.194s   0m06.448s*
   8192   0m02.239s   0m06.504s
  16384   0m02.144s*  0m06.502s
  32768   0m02.202s   0m06.622s
  65536   0m02.230s   0m06.677s

The log-raw case isn't changed much at all here (probably
because our trees just aren't that big in the first place,
or possibly because we have so _few_ trees in git.git that
the 256-entry cache is enough). But once we start putting
blobs in the cache, too, we see a big improvement (almost
50%). The curve levels off around 4096, which means that we
can hold about that many entries before hitting the 96MB
memory limit (or possibly that the workload is small enough
that there is simply no more work to be optimized out by
caching more).

(As a side note, I initially timed my existing git.git pack,
which was a base of --aggressive combined with some pulls on
top. So it had quite a few deeper delta chains. The
256-cache case was more like 15s, and it still dropped to
~6.5s in the same way).

Here are the timings for linux.git:

MAX_DELTA_CACHElog-raw  log-S
---   -   -
256   0m41.661s   5m12.410s
512   0m39.547s   5m07.920s
   1024   0m37.054s   4m54.666s
   2048   0m35.871s   4m41.194s*
   4096   0m34.646s   4m51.648s
   8192   0m33.881s   4m55.342s
  16384   0m35.190s   5m00.122s
  32768   0m35.060s   4m58.851s
  65536   0m33.311s*  4m51.420s

As we grow we see a nice 20% speedup in the tree traversal,
and more modest 10% in the log-S. This is probably an
indication that we are bound less by the number of entries,
and more by the memory limit (more on that below). What is
interesting is that the numbers bounce around a bit;
increasing the number of entries isn't always a strict
improvement.

Partially this is due to noise in the measurement. But it
may also be an indication that our LRU ejection scheme is
not optimal. The smaller cache sizes introduce some
randomness into the ejection (due to collisions), which may
sometimes work in our favor (and sometimes not!).

So what is the optimal setting of MAX_DELTA_CACHE? The
"bouncing" in the linux.git log-S numbers notwithstanding,
it mostly seems like bigger is better. And even if we were
to try to find a "sweet spot", these are just two
repositories, that are not necessarily representative. The
shape of history, the size of trees and blobs, the memory
limit configuration, etc, all will affect the outcome.

Rather than trying to find the "right" number, another
strategy is to just switch to a hash table that can actually
store collisions: namely our hashmap.h implementation.

Here are numbers for that compared to the "best" we saw from
adjusting MAX_DELTA_CACHE:

|   log-raw|   log-S
| best   hashmap   | best   hashmap
| -  - | -  -
  git   | 0m02.144s  0m02.144s | 0m06.448s  0m06.688s
  linux | 0m33.311s  0m33.092s | 4m41.194s  4m57.172s

We can see the results are similar in most cases, which is
what we'd expect. We're not ejecting due to collisions at
all, so this is purely representing the LRU. So really, we'd
expect this to model most closely the larger values of the
static MAX_DELTA_CACHE limit. And that does seem to be

[PATCH 4/7] delta_base_cache: use list.h for LRU

2016-08-22 Thread Jeff King

We keep an LRU list of entries for when we need to drop
something from an over-full cache. The list is implemented
as a circular doubly-linked list, which is exactly what
list.h provides. We can save a few lines by using the list.h
macros and functions. More importantly, this makes the code
easier to follow, as the reader sees explicit concepts like
"list_add_tail()" instead of pointer manipulation.

As a bonus, the list_entry() macro lets us place the lru
pointers anywhere inside the delta_base_cache_entry struct
(as opposed to just casting the pointer, which requires it
at the front of the struct). This will be useful in later
patches when we need to place other items at the front of
the struct (e.g., our hashmap implementation requires this).

Signed-off-by: Jeff King 
---
I think the result is much nicer, but I found list_entry() a little
disappointing, because we lack typeof(). So you are stuck writing:

  struct delta_base_cache_entry *f =
list_entry(p, struct delta_base_cache_entry, lru);

I waffled on adding something like:

  LIST_ITEM(struct delta_bas_cache_entry, f, p, lru);

to declare "f" as above. But it's getting rather magical and un-C-like.

 sha1_file.c | 38 --
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 8264b39..c02e785 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -24,6 +24,7 @@
 #include "streaming.h"
 #include "dir.h"
 #include "mru.h"
+#include "list.h"
 
 #ifndef O_NOATIME
 #if defined(__linux__) && (defined(__i386__) || defined(__PPC__))
@@ -2077,13 +2078,10 @@ static void *unpack_compressed_entry(struct packed_git 
*p,
 
 static size_t delta_base_cached;
 
-static struct delta_base_cache_lru_list {
-   struct delta_base_cache_lru_list *prev;
-   struct delta_base_cache_lru_list *next;
-} delta_base_cache_lru = { _base_cache_lru, _base_cache_lru };
+static LIST_HEAD(delta_base_cache_lru);
 
 static struct delta_base_cache_entry {
-   struct delta_base_cache_lru_list lru;
+   struct list_head lru;
void *data;
struct packed_git *p;
off_t base_offset;
@@ -2128,8 +2126,7 @@ static int in_delta_base_cache(struct packed_git *p, 
off_t base_offset)
 static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 {
ent->data = NULL;
-   ent->lru.next->prev = ent->lru.prev;
-   ent->lru.prev->next = ent->lru.next;
+   list_del(>lru);
delta_base_cached -= ent->size;
 }
 
@@ -2168,24 +2165,24 @@ static void add_delta_base_cache(struct packed_git *p, 
off_t base_offset,
 {
unsigned long hash = pack_entry_hash(p, base_offset);
struct delta_base_cache_entry *ent = delta_base_cache + hash;
-   struct delta_base_cache_lru_list *lru;
+   struct list_head *lru;
 
release_delta_base_cache(ent);
delta_base_cached += base_size;
 
-   for (lru = delta_base_cache_lru.next;
-delta_base_cached > delta_base_cache_limit
-&& lru != _base_cache_lru;
-lru = lru->next) {
-   struct delta_base_cache_entry *f = (void *)lru;
+   list_for_each(lru, _base_cache_lru) {
+   struct delta_base_cache_entry *f =
+   list_entry(lru, struct delta_base_cache_entry, lru);
+   if (delta_base_cached <= delta_base_cache_limit)
+   break;
if (f->type == OBJ_BLOB)
release_delta_base_cache(f);
}
-   for (lru = delta_base_cache_lru.next;
-delta_base_cached > delta_base_cache_limit
-&& lru != _base_cache_lru;
-lru = lru->next) {
-   struct delta_base_cache_entry *f = (void *)lru;
+   list_for_each(lru, _base_cache_lru) {
+   struct delta_base_cache_entry *f =
+   list_entry(lru, struct delta_base_cache_entry, lru);
+   if (delta_base_cached <= delta_base_cache_limit)
+   break;
release_delta_base_cache(f);
}
 
@@ -2194,10 +2191,7 @@ static void add_delta_base_cache(struct packed_git *p, 
off_t base_offset,
ent->type = type;
ent->data = base;
ent->size = base_size;
-   ent->lru.next = _base_cache_lru;
-   ent->lru.prev = delta_base_cache_lru.prev;
-   delta_base_cache_lru.prev->next = >lru;
-   delta_base_cache_lru.prev = >lru;
+   list_add_tail(>lru, _base_cache_lru);
 }
 
 static void *read_object(const unsigned char *sha1, enum object_type *type,
-- 
2.10.0.rc1.118.ge2299eb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 5/7] delta_base_cache: drop special treatment of blobs

2016-08-22 Thread Jeff King

When the delta base cache runs out of allowed memory, it has
to drop entries. It does so by walking an LRU list, dropping
objects until we are under the memory limit. But we actually
walk the list twice: once to drop blobs, and then again to
drop other objects (which are generally trees). This comes
from 18bdec1 (Limit the size of the new delta_base_cache,
2007-03-19).

This performs poorly as the number of entries grows, because
any time dropping blobs does not satisfy the limit, we have
to walk the _entire_ list, trees included, looking for blobs
to drop, before starting to drop any trees.

It's not generally a problem now, as the cache is limited to
only 256 entries. But as we could benefit from increasing
that in a future patch, it's worth looking at how it
performs as the cache size grows. And the answer is "not
well".

The table below shows times for various operations with
different values of MAX_DELTA_CACHE (which is not a run-time
knob; I recompiled with -DMAX_DELTA_CACHE=$n for each).

I chose "git log --raw" ("log-raw" in the table) because it
will access all of the trees, but no blobs at all (so in a
sense it is a worst case for this problem, because we will
always walk over the entire list of trees once before
realizing there are no blobs to drop). This is also
representative of other tree-only operations like "rev-list
--objects" and "git log -- ".

I also timed "git log -Sfoo --raw" ("log-S" in the table).
It similarly accesses all of the trees, but also the blobs
for each commit. It's representative of "git log -p", though
it emphasizes the cost of blob access more, as "-S" is
cheaper than computing an actual blob diff.

All timings are best-of-3 wall-clock times (though they all
were CPU bound, so the user CPU times are similar). The
repositories were fully packed with --depth=50, and the
default core.deltaBaseCacheLimit of 96M was in effect.  The
current value of MAX_DELTA_CACHE is 256, so I started there
and worked up by factors of 2.

First, here are values for git.git (the asterisk signals the
fastest run for each operation):

MAX_DELTA_CACHElog-raw   log-S
---   --
256   0m02.212s0m12.634s
512   0m02.136s*   0m10.614s
   1024   0m02.156s0m08.614s
   2048   0m02.208s0m07.062s
   4096   0m02.190s0m06.484s*
   8192   0m02.176s0m07.635s
  16384   0m02.913s0m19.845s
  32768   0m03.617s1m05.507s
  65536   0m04.031s1m18.488s

You can see that for the tree-only log-raw case, we don't
actually benefit that much as the cache grows (all the
differences up through 8192 are basically just noise; this
is probably because we don't actually have that many
distinct trees in git.git). But for log-S, we get a definite
speed improvement as the cache grows, but the improvements
are lost as cache size grows and the linear LRU management
starts to dominate.

Here's the same thing run against linux.git:

MAX_DELTA_CACHElog-raw   log-S
---   ---
256   0m40.987s 5m13.216s
512   0m37.949s 5m03.243s
   1024   0m35.977s 4m50.580s
   2048   0m33.855s 4m39.818s
   4096   0m32.913s 4m47.299s*
   8192   0m32.176s*5m14.650s
  16384   0m32.185s 6m31.625s
  32768   0m38.056s 9m31.136s
  65536   1m30.518s17m38.549s

The pattern is similar, though the effect in log-raw is more
pronounced here. The times dip down in the middle, and then
go back up as we keep growing.

So we know there's a problem. What's the solution?

The obvious one is to improve the data structure to avoid
walking over tree entries during the looking-for-blobs
traversal. We can do this by keeping _two_ LRU lists: one
for blobs, and one for other objects. We drop items from the
blob LRU first, and then from the tree LRU (if necessary).

Here's git.git using that strategy:

MAX_DELTA_CACHElog-raw  log-S
---   -   --
256   0m02.264s   0m12.830s
512   0m02.201s   0m10.771s
   1024   0m02.181s   0m08.593s
   2048   0m02.205s   0m07.116s
   4096   0m02.158s   0m06.537s*
   8192   0m02.213s   0m07.246s
  16384   0m02.155s*  0m10.975s
  32768   0m02.159s   0m16.047s
  65536   0m02.181s   0m16.992s

The upswing on log-raw is gone completely. But log-S still
has it (albeit much better than without this strategy).
Let's see what linux.git shows:

MAX_DELTA_CACHElog-raw   log-S
---   --
256   0m42.519s5m14.654s
512   0m39.106s5m04.708s
   1024   0m36.802s4m51.454s
   2048   0m34.685s4m39.378s*

[PATCH 2/7] clear_delta_base_cache_entry: use a more descriptive name

2016-08-22 Thread Jeff King

The delta base cache entries are stored in a fixed-length
hash table. So the way to remove an entry is to "clear" the
slot in the table, and that is what this function does.

However, the name is a leaky abstraction. If we were to
change the hash table implementation, it would no longer be
about "clearing". We should name it after _what_ it does,
not _how_ it does it. I.e., something like "remove" instead
of "clear".

But that does not tell the whole story, either. The subtle
thing about this function is that it removes the entry, but
does not free the entry data. So a more descriptive name is
"detach"; we give ownership of the data buffer to the
caller, and remove any other resources.

This patch uses the name detach_delta_base_cache_entry().
We could further model this after functions like
strbuf_detach(), which pass back all of the detached
information. However, since there are so many bits of
information in the struct (the data, the size, the type),
and so few callers (only one), it's not worth that
awkwardness. The name change and a comment can make the
intent clear.

Signed-off-by: Jeff King 
---
 sha1_file.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 2333911..1d0810c 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2120,7 +2120,12 @@ static int in_delta_base_cache(struct packed_git *p, 
off_t base_offset)
return eq_delta_base_cache_entry(ent, p, base_offset);
 }
 
-static void clear_delta_base_cache_entry(struct delta_base_cache_entry *ent)
+/*
+ * Remove the entry from the cache, but do _not_ free the associated
+ * entry data. The caller takes ownership of the "data" buffer, and
+ * should copy out any fields it wants before detaching.
+ */
+static void detach_delta_base_cache_entry(struct delta_base_cache_entry *ent)
 {
ent->data = NULL;
ent->lru.next->prev = ent->lru.prev;
@@ -2243,7 +2248,7 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset,
type = ent->type;
data = ent->data;
size = ent->size;
-   clear_delta_base_cache_entry(ent);
+   detach_delta_base_cache_entry(ent);
base_from_cache = 1;
break;
}
-- 
2.10.0.rc1.118.ge2299eb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/7] release_delta_base_cache: reuse existing detach function

2016-08-22 Thread Jeff King

This function drops an entry entirely from the cache,
meaning that aside from the freeing of the buffer, it is
exactly equivalent to detach_delta_base_cache_entry(). Let's
build on top of the detach function, which shortens the code
and will make it simpler when we change out the underlying
storage in future patches.

Signed-off-by: Jeff King 
---
 sha1_file.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 1d0810c..8264b39 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2152,10 +2152,7 @@ static inline void release_delta_base_cache(struct 
delta_base_cache_entry *ent)
 {
if (ent->data) {
free(ent->data);
-   ent->data = NULL;
-   ent->lru.next->prev = ent->lru.prev;
-   ent->lru.prev->next = ent->lru.next;
-   delta_base_cached -= ent->size;
+   detach_delta_base_cache_entry(ent);
}
 }
 
-- 
2.10.0.rc1.118.ge2299eb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/7] cache_or_unpack_entry: drop keep_cache parameter

2016-08-22 Thread Jeff King

There is only one caller of cache_or_unpack_entry() and it
always passes 1 for the keep_cache parameter. We can
simplify it by dropping the "!keep_cache" case.

Another call, which did pass 0, was dropped in abe601b
(sha1_file: remove recursion in unpack_entry, 2013-03-27),
as unpack_entry() now does more complicated things than a
simple unpack when there is a cache miss.

Signed-off-by: Jeff King 
---
 sha1_file.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 3045aea..2333911 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2129,25 +2129,18 @@ static void clear_delta_base_cache_entry(struct 
delta_base_cache_entry *ent)
 }
 
 static void *cache_or_unpack_entry(struct packed_git *p, off_t base_offset,
-   unsigned long *base_size, enum object_type *type, int keep_cache)
+   unsigned long *base_size, enum object_type *type)
 {
struct delta_base_cache_entry *ent;
-   void *ret;
 
ent = get_delta_base_cache_entry(p, base_offset);
 
if (!eq_delta_base_cache_entry(ent, p, base_offset))
return unpack_entry(p, base_offset, type, base_size);
 
-   ret = ent->data;
-
-   if (!keep_cache)
-   clear_delta_base_cache_entry(ent);
-   else
-   ret = xmemdupz(ent->data, ent->size);
*type = ent->type;
*base_size = ent->size;
-   return ret;
+   return xmemdupz(ent->data, ent->size);
 }
 
 static inline void release_delta_base_cache(struct delta_base_cache_entry *ent)
@@ -2755,7 +2748,7 @@ static void *read_packed_sha1(const unsigned char *sha1,
 
if (!find_pack_entry(sha1, ))
return NULL;
-   data = cache_or_unpack_entry(e.p, e.offset, size, type, 1);
+   data = cache_or_unpack_entry(e.p, e.offset, size, type);
if (!data) {
/*
 * We're probably in deep shit, but let's try to fetch
-- 
2.10.0.rc1.118.ge2299eb

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/7] tweaking the delta base cache

2016-08-22 Thread Jeff King

After the experiments I did with --depth=50 recently, I noticed there
seemed to be a lot of room for improvement in the delta-base-cache (and
in particular, there seemed to be a lack of actual numbers).

So I tried a series of experiments, and these are the tweaks I came up
with. There are a lot of numbers and analysis in the commit messages
themselves. The most dramatic effect I got was that before this patch,
bumping core.deltaBaseCacheLimit for the kernel gets you basically
nothing, whereas with it, I get:

  core.deltaBaseCacheLimittime to run git log -Sfoo --raw
  ---
  128m4m56.486s
  256m4m33.769s
  512m4m12.968s
 1024m3m32.623s

Note that I don't actually propose bumping the memory limit in this
series. That's a bit more contentious, as it's really using more
resources to do a space/time tradeoff, and people may not want to spend
the RAM. Whereas this series just adjusts the actual data structures to
let us use the RAM we've already been allocated more efficiently.

The interesting changes are really patches 5 and 6, which adjust the LRU
management and the underlying hash structure.

There are a few ideas I thought of or saw in past threads but didn't
explore. I don't plan on digging further on them right now, so if
anybody else wants to do so, be my guest:

  - limiting the size of items entering the cache (e.g., to avoid a
single giant blob blowing out all of the other entries)

  - something more clever than LRU, like weighting by a mix of size and
recency

  - I didn't look at the criteria for adding entries to the cache at all

  - we seem to drop cache entries as we use them in unpack_entry(); I'm
not sure if we would do better to retain them and let them leave via
LRU expiration

So there may be more work, but I think these improvements stand on their
own.

  [1/7]: cache_or_unpack_entry: drop keep_cache parameter
  [2/7]: clear_delta_base_cache_entry: use a more descriptive name
  [3/7]: release_delta_base_cache: reuse existing detach function
  [4/7]: delta_base_cache: use list.h for LRU
  [5/7]: delta_base_cache: drop special treatment of blobs
  [6/7]: delta_base_cache: use hashmap.h
  [7/7]: t/perf: add basic perf tests for delta base cache

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Philip Oakley


From: "Duy Nguyen" 

On Mon, Aug 22, 2016 at 8:06 PM, Johannes Schindelin
 wrote:

My point stands. We are way more uninviting to contributors than
necessary. And a huge part of the problem is that we require contributors
to send their patches inlined into whitespace-preserving mails.


We probably can settle this in the next git survey with a new
question: what's stopping you from contributing to git?
--

One has to be careful that some of this is preaching to the choir.

Those who have difficulty won't be anywhere near the survey. Those that are 
near the survey will have enough nouce to contribute to the level they 
manage.


Also the argument here has started to be at cross purposes about the issues 
vs the solutions.


I do note that dscho's patches now have the extra footer (below the three 
dashes) e.g.


Published-As: https://github.com/dscho/git/releases/tag/cat-file-filters-v1
Fetch-It-Via: git fetch https://github.com/dscho/git cat-file-filters-v1

If say I used that, and sent my patch series via Outlook Express (), 
with it's white space damage, would those footers help once the content has 
been reviewed (rather than white spacing style) in the applying the patch?


Even, Is there a way of accepting email that has HTML embedded by the 
client, rather than [vger, etc.] simply deleting the email as 'not 
acceptable'. Unless users can get past these sort of (to them) petty and 
awkward restrictions, it's going to continue to be difficult to encourage 
participation.


--

Philip

PS Sudden though about the 'Published-As: Fetch-It-Via: ' stuff. I don't 
think we have a method of 'Fetch-As-A-Series' so that what is received from 
the server (e.g. GitHub, as a git command equivalent) is just those 
patches/commits in the series, and the recipient can then see if they apply 
cleanly at the point they want to apply them? This would almost be a 'pack 
file' that is all commit deltas, that can be given to say 'rebase' (or 
'apply'). It could also be expanded back into an mbox format if required...




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Adding more namespace support to git

2016-08-22 Thread Richard

On 22 August 2016 at 20:16, Josh Triplett  wrote:
> On Mon, Aug 22, 2016 at 07:36:31PM +0100, Richard wrote:
>> On 21 Aug 2016 15:07, "Josh Triplett"  wrote:
>> > I'd like to see it work more automatically than that.  Perhaps a
>> > separate environment variable to set the client-side namespace?
>>
>> How about a config option? That could be set globally, per repository, in
>> the environment or on the command line.
>
> That might work, though you wouldn't normally want to set it globally or
> per-repository (since it affects access to a repository and you'd
> typically want to use multiple different values or it wouldn't have much
> point).

Globally is a bit contrived, but could be used to keep the top-level
namespace clean
so you might opt to default to fetching into a namespace called "main"
so that if you need to temporarily fetch into a different namespace it
wouldn't be problematic.
Perhaps it's a kernel tree from a vendor with a messy branch naming scheme
so you don't want to fetch it into your primary namespace and make it
difficult to find your branches,
but you don't know which of their branches you need until you've got them all.
So you fetch into the different namespace rather than a fresh clone
to avoid re-fetching everything (numerous alternative solutions exist)
Then once you've found out which branch you need,
you make a note, switch back to the "main" namespace and re-fetch just
that branch.

A per repository default namespace could also be useful
if an upstream repository has multiple namespaces (code vs
documentation maybe) you could fetch them all
and then switch between them when you need to work on different parts,
and if it's config rather than an environment variable it will persist
between shell sessions easier.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Jeff King

On Fri, Aug 19, 2016 at 09:55:54AM -0700, Stefan Beller wrote:

> It was not my intend to start this discussion again with my initial email.
> I rather wanted to point out how I make progress in doing my own
> tooling.
> 
> I mean if email works well for Junio (both as a maintainer as
> well as a contributor) and Jeff as a contributor, then I can adapt
> my workflow to that, because these two have brought in
> 8300 of 33000 non merge patches. (i.e. they had 25% of the
> patches over the life time of the project and are with the project
> longer than others IIUC). So why would I demand they change
> their style just to accommodate one newcomer like me?

Even though I do really like the mail-based workflow, I think this is a
dangerous line of reasoning. If the project were just me and Junio,
working as efficiently as two people possibly can, then sure, asking us
to change from what works for us would be silly.

But it's not. We have to make sure that the project community thrives.
That includes catering to some degree to occasional contributors, and
doing things to attract new members to the community as old ones drift
away.

You'll notice that I hand-waved away "to some degree" there. There is
definitely a balance to be found in managing the time of the maintainer
and the reviewers, versus making things easier for new contributors. As
a reductio ad absurdum, the simplest thing for contributors would be to
make a vague bug report and have the maintainer produce a polished
patch. That obviously does not scale. :)

Likewise, it is not just a matter of time spent, but workflows impact
_who_ will join[1]. Contributing to git is very friendly to a certain
niche of Unix die-hards, and that impacts who bothers to do so, and
consequently, what contributions we see (both to code and to
discussion). There's value in diversity of opinions[2], and we should be
wary of becoming an obsolete and out-of-touch mono-culture.

So I say "dangerous" because that is one way that open source projects
can die: the number of contributors dwindles, development slows, there
are no new ideas in the community, etc.

I don't think git would ever die off completely; there are too many
users. But there have been projects that seem to ossify for many years,
and are rejuvenated only when they shake up some elements of the
community or workflow (e.g., mutt is recently seeing such a resurgence;
sometimes it even takes the form of a follow-on project, like CVS->SVN,
with new people).

I don't think we're at that point with git. But it is something to be
mindful of. It's not clear to me if mutt-loving luddites like myself are
the last of a dying breed, or if there will always be enough of us to
churn out contributions to projects like git.

-Peff

[1] I think Dscho feels this much more acutely on Git for Windows than
we do on the regular git mailing list, because the "who" audience
for GfW is much different than the Unix world.

[2] I also think there's such as a thing as "too many opinions" in a
project. If we started rewriting bits of git in Haskell (just to
pick on a random pretty-far-from-C language), things would get very
complex very quickly. So again, it's about finding a balance.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Adding more namespace support to git

2016-08-22 Thread Josh Triplett

On Mon, Aug 22, 2016 at 07:36:31PM +0100, Richard wrote:
> On 21 Aug 2016 15:07, "Josh Triplett"  wrote:
> > I'd like to see it work more automatically than that.  Perhaps a
> > separate environment variable to set the client-side namespace?
> 
> How about a config option? That could be set globally, per repository, in
> the environment or on the command line.

That might work, though you wouldn't normally want to set it globally or
per-repository (since it affects access to a repository and you'd
typically want to use multiple different values or it wouldn't have much
point).

- Josh Triplett
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Editing a typo in the message given to "git commit"

2016-08-22 Thread Jakub Narębski

W dniu 21.08.2016 o 17:19, n...@dad.org pisze:

> I am learning how to use git. I would like to know:
> 
> How can I correct a typo in the message I gave to an old "git commit"? I see
> that the typo occurs in exactly two files in .git:
> 
> .git/logs/refs/heads/master
> .git/logs/HEAD
> 
> /usr/bin/file says that they are both ASCII English text. So I could just
> hand edit them. But that seems somehow sacrilegious and might break git.

As the pathname suggests, those two files are only *logs*, to be more
exact these store so called reflogs, which allow for example use @{1}
for previous position of current branch in your local repository, or
"git checkout -" (or "git checkout @{-1}") to go back to previous branch.
You can edit them (just take care to not touch the rest of line / file),
but it wouldn't change what is in your history, what 'git log' would show.

If the typo was in the last commit you have created, the simplest solution
is to use 'git commit --amend' (assuming that you didn't 'git add' any files
in meantime).

If it was something few commits back, you need to use 'git rebase 
--interactive',
starting from the commit before the one you want to change.  Then you need
to change 'pick' to 'reword', as described in instruction sheet for interactive
rebase. 


P.S. Good source of finding answers is StackOverflow[1], and new (and in beta)
StackOverflow Documentation

[1]: http://stackoverflow.com/questions/tagged/git 
[2]: http://stackoverflow.com/documentation/git

There is also #git channel on FreeNode[3]

[3]: ircs://chat.freenode.net:6697/git
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: git-config(1) should mention `git -c`

2016-08-22 Thread Jeff King

On Mon, Aug 22, 2016 at 11:42:52AM -0700, David Glasser wrote:

> In addition to describing the `git config` command, git-config(1) also
> appears to be where the algorithm for determining the active
> configuration values is documented, in the FILES and ENVIRONMENT
> sections.  (There is minimal documentation of these files and
> environment variables in git(1).)
> 
> However, `git -c name=value` is not documented in this manpage. So to
> understand the full set of ways to affect the configuration of a git
> command, you need to know to read both git(1) and git-config(1).

Yeah, I agree it is probably worth mentioning in git-config(1).

> I'd like to add a reference to `git -c` to git-config(1). I would
> happily send a patch but I'm not sure of the best place to put it —
> maybe in the FILES section?

That seems like the most sensible place, as that is where we should
cover the order of reading and precedence. Perhaps FILES should be
renamed to SOURCES or something (though I do not recall if we are
restricted to "usual" manpage section names or not).

Arguably this is not about git-config the program at all, but the
general concept of "configuration for git", because the precedence rules
apply equally to all of the git programs that read config. So it _could_
be pulled out into its own gitconfig(7) page. That's a lot more work,
though, and I don't blame you if you want to start with a more minor
update.

(I also think that having both git-config(1) and gitconfig(7) may be
confusing; "git help revisions" knows to find gitrevisions(7), but "git
help config" would always end up in git-config(1), I think).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

git-config(1) should mention `git -c`

2016-08-22 Thread David Glasser

In addition to describing the `git config` command, git-config(1) also
appears to be where the algorithm for determining the active
configuration values is documented, in the FILES and ENVIRONMENT
sections.  (There is minimal documentation of these files and
environment variables in git(1).)

However, `git -c name=value` is not documented in this manpage. So to
understand the full set of ways to affect the configuration of a git
command, you need to know to read both git(1) and git-config(1).

I'd like to add a reference to `git -c` to git-config(1). I would
happily send a patch but I'm not sure of the best place to put it —
maybe in the FILES section?

--dave

-- 
glas...@davidglasser.net | langtonlabs.org | flickr.com/photos/glasser/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Extending "extended SHA1" syntax to traverse through gitlinks?

2016-08-22 Thread Jakub Narębski

W dniu 21.08.2016 o 16:26, Josh Triplett pisze:
> On Sun, Aug 21, 2016 at 03:46:36PM +0200, Jakub Narębski wrote:
>> W dniu 21.08.2016 o 00:50, Josh Triplett pisze:
>>> Currently, if you have a branch "somebranch" that contains a gitlink
>>> "somecommit", you can write "somebranch:somecommit" to refer to the
>>> commit, just like a tree or blob.  ("man git-rev-parse" defines this
>>> syntax in the "SPECIFYING REVISIONS" section.)  You can use this
>>> anywhere you can use a committish, including "git show
>>> somebranch:somecommit", "git log somebranch:somecommit..anotherbranch",
>>> or even "git format-patch -1 somebranch:somecommit".
>>>
>>> However, you cannot traverse *through* the gitlink to look at files
>>> inside its own tree, or to look at other commits relative to that
>>> commit.  For instance, "somebranch:somecommit:somefile" and
>>> "somebranch:somecommit~3" do not work.
>>
>> Note that there is the same problem traversing through trees:
>> while 'git cat-file -p HEAD:subdir/file' works, the 'HEAD:subdir:file'
>> doesn't:
>>
>>   $ git cat-file -p HEAD:subdir:file
>>   fatal: Not a valid object name HEAD:subdir:file
> 
> Interesting point; if extending this syntax anyway, any treeish ought to
> work, not just a committish.

Actually, because you can use simply "HEAD:subdir/file" I'd rather
it didn't work (no two ways of access), unless we can get it for free.

>> Though you can do resolve step manually
>>
>>   $ git cat-file -p $(git rev-parse HEAD:subdir):file
>>
>> This works.
> 
> True, but that seems quite inconvenient.

Especially that for submodules you need:

$ git --git-dir=subdir/.git cat-file -p $(git rev-parse HEAD:subdir):file

(or something like that), assuming that you start in supermodule.
 
>>> I'd love to have a syntax that allows traversing through the gitlink to
>>> other files or commits.  Ideally, I'd suggest the syntax above, as a
>>> natural extension of the existing extended syntax.
>>
>> And with the above manual resolving, you can see the problem with
>> implementing it: the git-cat-file (in submodule) and git-rev-parse
>> (in supermodule) are across repository boundary.
> 
> Only if the gitlink points to a commit that doesn't exist in the same
> repository.  A gitlink can point to a commit you already have.

The idea of submodules is that tree object in superproject includes
link to commit of subproject (so called gitlink).  Tree object is
in superproject repository, while gitlinked commit is in submodule
repository.

True, with modern Git the submodule repository is embedded in .git
area of superproject, with '.git' in submodule being gitling file,
but by design those objects are in different repositories, in different
object databases.

>> Also the problem with proposed syntax is that is not very visible.
>> But perhaps it is all right.  Maybe :/ as separator would be better,
>> or using parentheses or braces?
> 
> It seems as visible as the standard commit:path syntax; the second colon
> seems just as visible as the first.  :/ already has a different meaning
> (text search), so that would introduce inconsistency.

Actually ":/" has a special meaning only if it is at beginning:
 - :/ for first matching commit from any ref
 - :/   is 'top directory' pathspec (equivalent to ':(top)')

But perhaps '//' would be better.

>>> (That syntax would potentially introduce ambiguity if you had a file
>>> named "somecommit:somefile" or "somecommit~3".  That doesn't seem like a
>>> problem, though; the existing syntax already doesn't support accessing a
>>> file named "x..y" or "x...y", so scripts already can't expect to access
>>> arbitrary filenames with that syntax without some kind of quoting, which
>>> we also don't have.)
>>
>> Errr... what?
>>
>>   $ echo A..B >A..B
>>   $ git add A..B
>>   $ git commit -m 'A..B added'
>>   [master 2d69af9] A..B added
>>1 file changed, 1 insertion(+), 1 deletion(-)
>>create mode 100644 A..B
>>   $ git show HEAD:A..B
>>   A..B
> 
> I stand corrected; I didn't find that.  I thought rev parsing worked
> independently from the repository, and didn't have any automagic
> detection based on the contents of the repository?

It probably depends on whether command expects range (like git-log),
supports range-like notation (like git-diff), or expects single or
multiple things (like git-show).

> This seems ambiguous, and (AFAICT) not documented.  If HEAD:A and B both
> refer to a commit, in addition to the blob A..B, which will HEAD:A..B
> refer to?  I did test the HEAD:gitlink..anotherbranch case, and it does
> parse as a range.

Well, it is ambiguous.

We would probably want to support some kind of quoting, for example
HEAD:"A..B" (where everything inside "..." is c-quoted, but can use utf-8).

-- 
Jakub Narębski 


 

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Most recent revision that contains a string

2016-08-22 Thread Nikolaus Rath

On Aug 21 2016, Eric Wong  wrote:
> Nikolaus Rath  wrote:
>> What's the easiest way to find the most recent revision (of any file in
>> the repository, including those that have been deleted in the current
>> HEAD) that contains a given string?
>
> I normally do something like:
>
>   git log -r --raw -p -SSTRING
>   git log -r --raw -p -GREGEXP
>
> You can also add --diff-filter=D to filter only on deletes.

Great, thanks!

> Btw, please don't set these headers on kernel.org lists:
>
>   Mail-Copies-To: never
>   Mail-Followup-To: git@vger.kernel.org
>
> Like any mail server, vger fails from time-to-time and
> reply-to-all prevents it from being a single point of failure.

Huh? If the list-server fails only I will receive the message so it's
still lost for everyone else. But I am more than happy to take this
little risk if it saves me from the nuisance of getting duplicate
responses.

Best,
-Nikolaus

-- 
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

 »Time flies like an arrow, fruit flies like a Banana.«
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Jakub Narębski

W dniu 22.08.2016 o 15:18, Johannes Schindelin pisze:

> So unfortunately this thread has devolved. Which is sad. Because all I
> wanted is to have a change in Git's submission process that would not
> exclude *so many* developers. That is really all I care about. Not about
> tools. Not about open vs proprietary, or standards.
> 
> I just want developers who are already familiar with Git, and come up with
> an improvement to Git itself, to be able to contribute it without having
> to pull out their hair in despair.

What is lacking in using submitGit tool for those that have problems
with sending patches via email?

Submitting changes in Git comes in three phases:
 - submit email with patches
 - review and discuss patch
 - apply patches from email

Pull request via GitHub / Bitbucket / GitLab is easier than sending
patches via email (pity that GitHub et al. do not have such submitGit-like
automation built-in).  But I think email, with threaded view, careful
trimming of quoted contents, multi-level quotes is superior to exiting
web-based solutions.

Regards,
-- 
Jakub Narębski

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 2/2] mingw: ensure temporary file handles are not inherited by child processes

2016-08-22 Thread Eric Wong

Johannes Schindelin  wrote:
> As Eric Wong pointed out, we need to be careful to handle the case where
> the Linux headers used to compile Git support O_CLOEXEC but the Linux
> kernel used to run Git does not: it returns an EINVAL.

> +++ b/git-compat-util.h
> @@ -667,6 +667,10 @@ void *gitmemmem(const void *haystack, size_t haystacklen,
>  #define getpagesize() sysconf(_SC_PAGESIZE)
>  #endif
>  
> +#ifndef O_CLOEXEC
> +#define O_CLOEXEC 0
> +#endif

> +++ b/tempfile.c
> @@ -120,7 +120,12 @@ int create_tempfile(struct tempfile *tempfile, const 
> char *path)
>   prepare_tempfile_object(tempfile);
>  
>   strbuf_add_absolute_path(>filename, path);
> - tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL, 
> 0666);
> + tempfile->fd = open(tempfile->filename.buf,
> + O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0666);
> + if (O_CLOEXEC && tempfile->fd < 0 && errno == EINVAL)
> + /* Try again w/o O_CLOEXEC: the kernel might not support it */
> + tempfile->fd = open(tempfile->filename.buf,
> + O_RDWR | O_CREAT | O_EXCL, 0666);
>   if (tempfile->fd < 0) {
>   strbuf_reset(>filename);
>   return -1;

Looks good to me from a Linux standpoint.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v9 0/8] submodule show inline diff

2016-08-22 Thread Stefan Beller

On Fri, Aug 19, 2016 at 11:16 PM, Jacob Keller  wrote:
>>
>>> +   if (submodule_config)
>>> +   strbuf_git_path(buf, "%s/%s", "modules",
>>> +   submodule_config->name);
>>> +   else
>>
>> I do not think we want to assume the path as the name for the fallback, 
>> though.
>>
>
> I couldn't think of anything better to do There is no error return
> flow, so just... leave it as is maybe?
>
>> If `submodule_config == NULL` then
>> a) the path/name doesn't exist in the given version.
>> (If null_sha1 is given, HEAD + working tree is assumed, whereas
>> you could also check for a specific commit of the superproject
>> with another sha1)
>
> I can't check for a specific version here because there is no
> mechanism to pass in the value we'd check it in... Maybe I need a
> separate function for that to check the specific sha1 instead of using
> nullsha1? But.. no I don't think that makes sense, we use the current
> working tree to get the submodule and then lookup old references in
> it... but if we checked an old tree it might be in the wrong path
> which does us no good because the name no longer matches? Hmmm

I agree, null_sha1 is the way to go.

>
>>
>> b) or the submodule config cache was not initialized
>>(missing call to gitmodules_config())
>>
>> c) There is no c) [at least I never came across another reason for a
>> NULL return here]
>>
>> Using the path as the fallback is errorprone (e.g. to b. in the future
>> and then you get the wrong submodule repository which is shaded by
>> assuming the path and it is hard to debug later or write forward looking
>> tests for that now)
>
> Sure, but if it doesn't exist we just fail.. so what should I put in
> the string? just leave it empty? The function doesn't have an error
> return at the moment.

I thought a die(...) would be better instead.
Looking at the callers of do_submodule_path
(which is wrapped via strbuf_git_path_submodule and
git_pathdup_submodule), we end up in

./refs/files-backend.c:
get_packed_ref_cache
resolve_gitlink_ref_recursive
read_loose_refs

other submodule related code:
module_clone
add_submodule_odb

The refs code doesn't have error handling code at the places where
we do the call to submodule path handling, so I think a die(..)
is still ok, as these cases would only happen if your super project
is hosed. e.g. I think to get into this state you'd roughly do this:

git submodule update --init
# make path and name different:
git mv sub-foo sub-bar

# (I think) two ways to hose a repository now:

# a) delete the actual submodule repository
rm -rf .git/modules/sub-foo

# or b) rename the submodule name to break the
# assumption of .git/modules/ as the sub repo path.
git config -f .gitmodules --rename-section submodule.sub-foo
submodule.sub-void

# now exercise the refs code:
git status

A wrong path would do no harm in this case; in fact it is better to go
with a non existent wrong name as it then falls back to the error handling
of the refs code.

However, assume the path would exist. (As that happens when you
"swap" two submodules in their path, i.e. when a submodule path becomes
another submodules name.

In that case I think "bad things" may happen to the other submodule?

So after typing all this out, I think a call to die(..) is still the way to go.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2] for-each-ref: add %(upstream:gone) to mark missing refs

2016-08-22 Thread Øystein Walle

git branch -vv will show "gone" next to a remote tracking branch if it
does not exist. for-each-ref is suitable for parsing but had no way of
showing this information.

This introduces "%(upstream:gone)" to display "gone" in the formatted
output if the ref does not exist or an empty string otherwise, analogous
to git branch -vv.

Signed-off-by: Øystein Walle 
---
I took the liberty of sending in a v2 on my own. Removed the last argument to
stat_tracking_info() and used test_config instead of test_when_finished.

 Documentation/git-for-each-ref.txt |  5 +++--
 ref-filter.c   |  9 -
 t/t6300-for-each-ref.sh| 11 +++
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-for-each-ref.txt 
b/Documentation/git-for-each-ref.txt
index f57e69b..039a86b 100644
--- a/Documentation/git-for-each-ref.txt
+++ b/Documentation/git-for-each-ref.txt
@@ -114,8 +114,9 @@ upstream::
`refname` above.  Additionally respects `:track` to show
"[ahead N, behind M]" and `:trackshort` to show the terse
version: ">" (ahead), "<" (behind), "<>" (ahead and behind),
-   or "=" (in sync).  Has no effect if the ref does not have
-   tracking information associated with it.
+   or "=" (in sync) and `:gone` to show "gone" if the remote ref
+   does not exist, or an empty string if it does. Has no effect if
+   the ref does not have tracking information associated with it.
 
 push::
The name of a local ref which represents the `@{push}` location
diff --git a/ref-filter.c b/ref-filter.c
index bc551a7..757f473 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -37,7 +37,7 @@ static struct used_atom {
union {
char color[COLOR_MAXLEN];
struct align align;
-   enum { RR_NORMAL, RR_SHORTEN, RR_TRACK, RR_TRACKSHORT }
+   enum { RR_NORMAL, RR_SHORTEN, RR_TRACK, RR_TRACKSHORT, RR_GONE }
remote_ref;
struct {
enum { C_BARE, C_BODY, C_BODY_DEP, C_LINES, C_SIG, 
C_SUB } option;
@@ -67,6 +67,8 @@ static void remote_ref_atom_parser(struct used_atom *atom, 
const char *arg)
atom->u.remote_ref = RR_TRACK;
else if (!strcmp(arg, "trackshort"))
atom->u.remote_ref = RR_TRACKSHORT;
+   else if (!strcmp(arg, "gone"))
+   atom->u.remote_ref = RR_GONE;
else
die(_("unrecognized format: %%(%s)"), atom->name);
 }
@@ -923,6 +925,11 @@ static void fill_remote_ref_details(struct used_atom 
*atom, const char *refname,
*s = ">";
else
*s = "<>";
+   } else if (atom->u.remote_ref == RR_GONE) {
+   if (stat_tracking_info(branch, _ours, _theirs, NULL) < 
0)
+   *s = "gone";
+   else
+   *s = "";
} else /* RR_NORMAL */
*s = refname;
 }
diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 19a2823..f99bfd0 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -383,6 +383,17 @@ test_expect_success 'Check that :track[short] works when 
upstream is invalid' '
test_cmp expected actual
 '
 
+test_expect_success 'Check that :gone produces expected results' '
+   cat >expected <<-\EOF &&
+gone
+   EOF
+   test_config branch.master.merge refs/heads/does-not-exist &&
+   git for-each-ref \
+   --format="%(upstream:gone)" \
+   refs/heads >actual &&
+   test_cmp expected actual
+'
+
 test_expect_success 'Check for invalid refname format' '
test_must_fail git for-each-ref --format="%(refname:INVALID)"
 '
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/4] cat-file: introduce the --filters option

2016-08-22 Thread Torsten Bögershausen

On 19.08.16 17:00, Johannes Schindelin wrote:
> Hi Torsten,
>
> On Fri, 19 Aug 2016, Torsten Bögershausen wrote:
>
>> On Thu, Aug 18, 2016 at 02:46:17PM +0200, Johannes Schindelin wrote:
>>
>>> +--filters::
>>> +   Show the content as transformed by the filters configured in
>> Minor comment:
>> s/transformed/converted/ ?
> Sure.
>
>> Does it make sense to be more specific here:
>> The order of conversion is
>> - ident
>> - CRLF
>> - smudge
> I do not think it makes sense to complexify the documentation in that
> manner. The filters should always be applied in the same order, methinks,
> and it would only clutter the man page to repeat that order here.
Can we can shorten the description and have something like this:

--filters::
+   Show the content converted by the filters configured in
+   the current working tree for the given . 
+has to be of the form : or :.
+


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] diff-highlight: add some tests.

2016-08-22 Thread Brian Henderson

Jeff, I love your idea. how's this looking?

Junio, I wasn't meaning to be stubborn, although definitely a fault of mine. I
understand a lot better now, thanks for your patience.

---
 contrib/diff-highlight/Makefile  |   5 +
 contrib/diff-highlight/t/Makefile|  22 +++
 contrib/diff-highlight/t/t9400-diff-highlight.sh | 163 +++
 3 files changed, 190 insertions(+)
 create mode 100644 contrib/diff-highlight/Makefile
 create mode 100644 contrib/diff-highlight/t/Makefile
 create mode 100755 contrib/diff-highlight/t/t9400-diff-highlight.sh

diff --git a/contrib/diff-highlight/Makefile b/contrib/diff-highlight/Makefile
new file mode 100644
index 000..9018724
--- /dev/null
+++ b/contrib/diff-highlight/Makefile
@@ -0,0 +1,5 @@
+# nothing to build
+all:
+
+test:
+   $(MAKE) -C t
diff --git a/contrib/diff-highlight/t/Makefile 
b/contrib/diff-highlight/t/Makefile
new file mode 100644
index 000..5ff5275
--- /dev/null
+++ b/contrib/diff-highlight/t/Makefile
@@ -0,0 +1,22 @@
+-include ../../../config.mak.autogen
+-include ../../../config.mak
+
+# copied from ../../t/Makefile
+SHELL_PATH ?= $(SHELL)
+SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
+T = $(wildcard t[0-9][0-9][0-9][0-9]-*.sh)
+
+all: test
+test: $(T)
+
+.PHONY: help clean all test $(T)
+
+help:
+   @echo 'Run "$(MAKE) test" to launch test scripts'
+   @echo 'Run "$(MAKE) clean" to remove trash folders'
+
+$(T):
+   @echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)
+
+clean:
+   $(RM) -r 'trash directory'.*
diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh 
b/contrib/diff-highlight/t/t9400-diff-highlight.sh
new file mode 100755
index 000..7c303f7
--- /dev/null
+++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
@@ -0,0 +1,163 @@
+#!/bin/sh
+
+test_description='Test diff-highlight'
+
+CURR_DIR=$(pwd)
+TEST_OUTPUT_DIRECTORY=$(pwd)
+TEST_DIRECTORY="$CURR_DIR"/../../../t
+DIFF_HIGHLIGHT="$CURR_DIR"/../diff-highlight
+
+CW="$(printf "\033[7m")"   # white
+CR="$(printf "\033[27m")"  # reset
+
+. "$TEST_DIRECTORY"/test-lib.sh
+
+if ! test_have_prereq PERL
+then
+   skip_all='skipping diff-highlight tests; perl not available'
+   test_done
+fi
+
+# dh_test is a test helper function which takes 3 file names as parameters. The
+# first 2 files are used to generate diff and commit output, which is then
+# piped through diff-highlight. The 3rd file should contain the expected output
+# of diff-highlight (minus the diff/commit header, ie. everything after and
+# including the first @@ line).
+dh_test () {
+   a="$1" b="$2" &&
+
+   cat >patch.exp &&
+
+   {
+   cat "$a" >file &&
+   git add file &&
+   git commit -m "Add a file" &&
+
+   cat "$b" >file &&
+   git diff file >diff.raw &&
+   git commit -am "Update a file" &&
+   git show >commit.raw
+   } >/dev/null &&
+
+   "$DIFF_HIGHLIGHT" diff.act &&
+   "$DIFF_HIGHLIGHT" commit.act &&
+   test_cmp patch.exp diff.act &&
+   test_cmp patch.exp commit.act
+}
+
+test_strip_patch_header () {
+   sed -n '/^@@/,$p' $*
+}
+
+test_expect_success 'diff-highlight highlights the beginning of a line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   0bb
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -${CW}b${CR}bb
+   +${CW}0${CR}bb
+ccc
+   EOF
+'
+
+test_expect_success 'diff-highlight highlights the end of a line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   bb0
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -bb${CW}b${CR}
+   +bb${CW}0${CR}
+ccc
+   EOF
+'
+
+test_expect_success 'diff-highlight highlights the middle of a line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   b0b
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -b${CW}b${CR}b
+   +b${CW}0${CR}b
+ccc
+   EOF
+'
+
+test_expect_success 'diff-highlight does not highlight whole line' '
+   cat >a <<-\EOF &&
+   aaa
+   bbb
+   ccc
+   EOF
+
+   cat >b <<-\EOF &&
+   aaa
+   000
+   ccc
+   EOF
+
+   dh_test a b <<-EOF
+   @@ -1,3 +1,3 @@
+aaa
+   -bbb
+   +000
+ccc
+   EOF
+'
+

Re: [git-for-windows] Re: [ANNOUNCE] Git for Windows 2.9.3

2016-08-22 Thread Duy Nguyen

On Thu, Aug 18, 2016 at 3:37 PM, Johannes Schindelin
 wrote:
> Hi Junio,
>
> On Wed, 17 Aug 2016, Junio C Hamano wrote:
>
>> Johannes Schindelin  writes:
>>
>> >> And then your "git cat-file" patch can be upstreamed with the option
>> >> renamed to (or with an additional synonym) "--filters", which would make
>> >> things consistent.
>> >
>> > Right. I would like to ask for a `--smudge` synonym nevertheless, just
>> > because I already use this. On the other hand, it is early enough to tell
>> > everybody who knows about this feature to change their invocation (anybody
>> > who would know about `--smudge` would be in that 1% of users that have
>> > read the release notes, so most likely would read the next release notes,
>> > too).
>>
>> It is OK if it were your private edition, but you end up hurting
>> your users if you need to redo the feature differently.
>
> Unfortunately, this is the situation of Git for Windows from its
> beginning: there has not been a single time that Git for Windows could
> live with unpatched upstream Git's source code.
>
> Business as usual, though.

Bug fixes is one thing, features is completely different. Should we
just acknowledge git-for-windows as a long-living fork and rename it
to something else? Because if somebody comes here with a "git" problem
on Windows, I would look at git.git source code, not gfw. I'd rather
recognize it a special fork (by name) right away and ignore. You could
have the same policy distros have: all bugs go to distros (i.e. gfw),
some bugs may be forwarded upstream (git.git).
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Johannes Schindelin

Hi Stefan,

On Fri, 19 Aug 2016, Stefan Beller wrote:

> >> I see a choice of mail client as no different than a choice of text
> >> editor.  Neither my mail client or text editor is heavily customized.
> >> The key feature I rely on from both tools is piping data to external
> >> commands.
> >
> > There you go. That key feature seems to be unavailable in the most
> > wide-spread email client: Outlook. So by not restricting the choice
> > you should make it possible to use that mail client, too, right?
> 
> Well I think this data piping is essential to any workflow.

Data piping can go through Git, with convenient commands.

> Even if were to abandon email completely and roll our own communications
> protocol, one of the first things added would be an API to "use your own
> text editor".
> 
> In my case git-send-email works well for the sending direction without a
> lot of custom tooling (read: none apart from the initial configuration).

Sending those mails is but the tiny, first step of contributing patches.
You know as well as I do that many a times contributors have to work
through many iterations to get their work accepted.

So while send-email helps with one direction, everything after that is
hard, manual work.

> > We do not even have a section on Outlook in our SubmittingPatches.
> 
> "You can write one? Pretty please?" would be the canonical answer. ;)

Sure. And my answer to that is: I cannot write it. Why? Because I cannot
get it the heck to work. Because Outlook supports writing mails, i.e.
messages from humans for humans. You can change the font, insert a nice
photo from your vacations, left-justify, right-justify, center the text.
You can do all kinds of nice things that you need to do when talking to
humans.

You can even paste a diff, for a human to read. Humans are very good at
not even seeing that there is no space at the beginning of the line.
Humans are also very good at understanding that those 8 spaces are the
same as the tab in the source code.

Outlook can also keep excellent track of who was Cc:ed, of mail threads,
filtering mails based on a plethora of criteria, integrating with a
calendar, etc.

So Outlook does really an excellent job.

What it does *not* to well is something mail was not designed for.

> Maybe we should invent a patch format that copes with broken whitespace?

Or maybe we do not even have to go that far, but maybe we can teach `git
apply` a mode where it is much smarter about whitespace changes and
wrapped text in the patch it receives?

That would probably go a long way further to making the patch submission
process we use more friendly to human beings.

It still would not make it easy to go from replies containing suggestions
how to improve the code to the corresponding file/revision.

> >> Users ought to be able to pick, choose, and replace tools as they
> >> wish as long as an interchange format remains stable and
> >> widely-supported.
> >
> > Right. Let's talk about the interchange format called mails, for the
> > data called patches. Is it stable and widely-supported?
> 
> It is stable as it has been around for years and you can choose whether
> you use git apply or the patch utility.

You seem to assume that mail clients have an easy time with this supposed
"stable" format.

They don't.

> It is widely supported as it is raw text so it can be used across
> different platforms. However it doesn't cope well with email, as email
> modifies text sometimes such as mangling white spaces.

I "mangle" whitespace all the time when I respond to mails. You will note
that I re-wrap quoted text to 76 columns/row.

So I am as guilty as any mail client of your charge.

Sorry.

> > Can users really pick and choose the tools they like most to send patches
> > to the Git project? Like, the Outlook client? Or the GMail client?
> 
> Of course, see[1] ;)
> [1] 
> https://public-inbox.org/git/CA+55aFy2AEe7ew5Px=2Uit6hraGV9zFr=jz57rsyxwmq4nm...@mail.gmail.com/

You speak in riddles. That link leads to Linus' mail talking about
committerdates and generation numbers.

It does not help me, not in the slightest, to send a patch via Outlook or
the web interface of GoogleMail without risking to get yelled at for
corrupting the patch.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] config: add conditional include

2016-08-22 Thread Duy Nguyen

On Mon, Aug 22, 2016 at 8:22 PM, Matthieu Moy
 wrote:
>>> I think the syntax should be design to allow arbitrary boolean
>>> expression later if needed.
>>
>> I would be against that. We may extend it more in future, but it
>> should be under control, not full boolean expressions.
>
> Why?
>
> I'm not saying we absolutely need it, but if we allow several kinds of
> conditions (gitdir-is:... and others in the future), then it's natural
> to allow combining them, and arbitrary boolean expression is both simple
> and powerful (operators and/or/not and parenthesis and you have
> everything you'll ever need).

For starter, we don't want another debate "python vs ruby vs lua vs
..." as the scripting language :) (for the record I vote Scheme! maybe
with infix syntax)

Seriously though, for things that are evaluated in the background
every single time a command is executed and things that come from many
different places (/etc, $HOME, $XDG, $GIT_DIR) I think absolute
flexibility just makes it harder to pinpoint when things go wrong.
Combination in this case would be a bad thing, not a good one. By
judging case by case before introducing a new condition type, we
probably can see what sort of combination would be and whether we
could accept that.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Johannes Schindelin

Hi Eric,

On Fri, 19 Aug 2016, Eric Wong wrote:

> Johannes Schindelin  wrote:
> > On Thu, 18 Aug 2016, Eric Wong wrote:
> > > Johannes Schindelin  wrote:
> > >
> > > > Old dogs claim the mail list-approach works for them. Nope.
> > > > Doesn't.  Else you would not have written all those custom
> > > > scripts.
> > > 
> > > git and cogito started as a bunch of custom scripts, too.
> > 
> > The difference is that neither git nor cogito were opinionated. Those
> > custom scripts are. They are for one particular workflow, with one
> > particular mail client, with a strong bias to a Unix-y environment.
> > 
> > I work really hard to make Git for Windows as easy and fun to use as
> > possible. I just wish that we were working together to make it as easy
> > and fun to contribute to Git, too.
> 
> I guess this is a fundamental difference between *nix and Windows
> culture.

I do not understand how you get from "I wish to make it fun to contribute
to Git" to "there is a fundamental difference between *nix and Windows
culture".

> I know Windows users have major performance problems with
> shell scripts;

That's because shell scripting is not native to Windows. I wish Linux had
a Powershell, allowing for decent scripting that does not try to smoosh
everything into a line-based text format. (Of course, since last week,
Linux does have a Powershell.)

Powershell is blazing fast, by the way, and not as ridiculously limited in
its expressibility as shell scripting.

But all of this is digressing from the original topic. I do not think this
is a productive.

> > We do not even have a section on Outlook in our SubmittingPatches.
> > 
> > Okay, if not the most popular mail client, then web mail? Nope, nope,
> > nope. No piping *at all* to external commands from there.
> > 
> > So you basically slam the door shut on the vast majority of email users.
> 
> Users have a choice to use a more scriptable mail client
> (but I guess the OS nudges users towards monolithic tools)

You call that choice. Are you serious?

> > That is not leaving much choice to the users in my book.
> 
> Users of alpine, gnus, mutt, sylpheed, thunderbird, kmail,
> roundcube, squirelmail, etc. can all download the source, hack,
> fix and customize things.  It's easier with smaller software,
> of course:  git-send-email does not even require learning
> the build process or separate download.

Now I am getting upset. This is a BS argument. Sure, I can hack the source
of these tools.

But why on earth do I *have* to? Why can't we use or create an open
contribution process *that works without having to work so hard to be able
to contribute*?

So unfortunately this thread has devolved. Which is sad. Because all I
wanted is to have a change in Git's submission process that would not
exclude *so many* developers. That is really all I care about. Not about
tools. Not about open vs proprietary, or standards.

I just want developers who are already familiar with Git, and come up with
an improvement to Git itself, to be able to contribute it without having
to pull out their hair in despair.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Duy Nguyen

On Mon, Aug 22, 2016 at 8:06 PM, Johannes Schindelin
 wrote:
> My point stands. We are way more uninviting to contributors than
> necessary. And a huge part of the problem is that we require contributors
> to send their patches inlined into whitespace-preserving mails.

We probably can settle this in the next git survey with a new
question: what's stopping you from contributing to git?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] config: add conditional include

2016-08-22 Thread Matthieu Moy

Duy Nguyen  writes:

> On Mon, Aug 22, 2016 at 7:59 PM, Matthieu Moy
>  wrote:
>> Duy Nguyen  writes:
>>
>>> On Sun, Aug 21, 2016 at 4:08 AM, Jakub Narębski  wrote:
 W dniu 19.08.2016 o 15:54, Jeff King pisze:
> On Sat, Aug 13, 2016 at 03:40:59PM +0700, Duy Nguyen wrote:
>
>> Ping..
>
> There was some discussion after v4. I think the open issues are:
>
>   - the commit message is rather terse (it should describe motivation,
> and can refer to the docs for the "how")
>
>   - the syntax might be more clear as:
>
>[include-if "gitdir:..."]
>
> or
>
>[include "gitdir-is:..."]

 Or

  [include "if-gitdir:..."]
>>>
>>> I like this one. I can re-roll to address the first two bullet point,
>>> if the last one, the open question, will not become a blocker later
>>> on.
>>
>> I think the syntax should be design to allow arbitrary boolean
>> expression later if needed.
>
> I would be against that. We may extend it more in future, but it
> should be under control, not full boolean expressions.

Why?

I'm not saying we absolutely need it, but if we allow several kinds of
conditions (gitdir-is:... and others in the future), then it's natural
to allow combining them, and arbitrary boolean expression is both simple
and powerful (operators and/or/not and parenthesis and you have
everything you'll ever need).

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] config: add conditional include

2016-08-22 Thread Duy Nguyen

On Mon, Aug 22, 2016 at 7:59 PM, Matthieu Moy
 wrote:
> Duy Nguyen  writes:
>
>> On Sun, Aug 21, 2016 at 4:08 AM, Jakub Narębski  wrote:
>>> W dniu 19.08.2016 o 15:54, Jeff King pisze:
 On Sat, Aug 13, 2016 at 03:40:59PM +0700, Duy Nguyen wrote:

> Ping..

 There was some discussion after v4. I think the open issues are:

   - the commit message is rather terse (it should describe motivation,
 and can refer to the docs for the "how")

   - the syntax might be more clear as:

[include-if "gitdir:..."]

 or

[include "gitdir-is:..."]
>>>
>>> Or
>>>
>>>  [include "if-gitdir:..."]
>>
>> I like this one. I can re-roll to address the first two bullet point,
>> if the last one, the open question, will not become a blocker later
>> on.
>
> I think the syntax should be design to allow arbitrary boolean
> expression later if needed.

I would be against that. We may extend it more in future, but it
should be under control, not full boolean expressions.

> Then, I prefer one of
>
>   [include-if "gitdir-is:..."]
>   [include"gitdir-is:..."]
>
> because it may later be extended to e.g.
>
>   [include-if "not(gitdir-is:...)"]
>   [include-if "gitdir-matches:regex"]
>   [include-if "gitdir-is:... and git-version-greater:2.9"]
>   ...
>
> I actually already use "conditional include on version number" because I
> use push.default=upstream which makes older versions of Git crash, but
> fortunately these versions of Git also ignore the "include" directive so
> having this push.default=upstream in an included file works. It's a
> hack, it worked once but it won't work again.

We probably have a way to stop old git from reading new git's
features, including ones in config files: the config group
extensions.*. Assuming that "[include "blah"]" is new stuff, git can
be taught to accept that only when extensions.blah is present (which
old git would bail). It discourages adding too many fancy features
(because extensions.* in your config file would be a mess), which is
IMO a good point.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Working with public-inbox.org [Was: [PATCH] rev-parse: respect core.hooksPath in --git-path]

2016-08-22 Thread Johannes Schindelin

Hi Peff,

On Fri, 19 Aug 2016, Jeff King wrote:

> On Thu, Aug 18, 2016 at 02:42:34PM +0200, Johannes Schindelin wrote:
> 
> > BTW I take this thread as yet another proof that people are unhappy
> > with mail list-based review: if you have to build *that much* tooling
> > around it (and Peff & Junio certainly have a megaton of advanced and
> > sophisticated tooling around it, holy cow!) it is really incorrect to
> > state that the mail list-driven approach works for you. It is much
> > closer to the truth to say that the
> > mail-list-plus-loads-of-custom-tools-driven approach works for you.
> > 
> > I am really not a fan of this.
> > 
> > The theory "it's inclusive because everyone has access to mail" falls
> > on its face, badly, when even old timers have to build entire
> > infrastructures around it just to begin to be able to use it
> > efficiently.
> > 
> > It reminds me of an old software developer I met long ago, who claimed
> > CVS works for him. He had written tens of thousands of lines of shell
> > scripts, is what allowed "CVS" to work for him.
> > 
> > Same here. Old dogs claim the mail list-approach works for them. Nope.
> > Doesn't. Else you would not have written all those custom scripts.
> 
> I read this over, didn't agree, waited a whole day for perspective, and
> still just can't agree. So now I'm responding. :)

Thank you for your constructive feedback.

Obviously I just can't agree with what you wrote, at least not completely.
So after waiting an entire weekend for perspective, here are my thoughts
on your comments:

> There is nothing wrong with building tooling around your workflow. If we
> had a GitHub-based workflow, I'd build tooling around that, too.

Sure. You, Junio and myself seem to be kings of scripting. Automating
common tasks through scripting is something that allows me to not go
completely bunkers with the workload I have. I imagine it is similar on
your side.

> One of the things I _like_ about a mail-based workflow is how easy it is
> to build that tooling, and to get it to integrate with other existing
> tools.

Here I disagree violently. What I would have agreed to would be a
statement similar to "It is easy to integrate scripting with mutt".

To be quite frank: we are talking about very different things when we talk
about mail-based workflows. Heck, we even talk about gazillions of
different things when we talk about an email! Just think about this here
email: you might read it written in a font that makes it easy to discern
"1" from "l" from "I". For a vast number of people, this is not even true!

So what we are talking about here are apples and oranges and apple cider
vinegar. Email clients are *so vastly different* from one another, it
would be ludicrious to assume that they have anything in common when it
comes to "mail-based workflows".

As a matter of fact, the thing that you pointed out as the most important
("how easy it is [...] to integrate with other existing tools") does *not*
apply for the *vast* majority of email clients, most prominently Outlook,
GMail, Apple Mail and Thunderbird.

And that mere fact is very, very important to keep in mind. We build Git,
which is very, very successful because it fills a need of developers using
means that are accessible to many, directly or indirectly (the
command-line).

Yet, when it comes to contributing to Git's source code, we deviate from
the common path and require a means that is *not* accessible to many. We
require them to use something different than their regular email client.

> It's the major reason I'm resistant to moving development to GitHub. Not
> only would I have to throw away all of my tools, but I'm not sure I
> could easily build equivalent ones.

I am sympathetic to your reasoning (even if I vividly disagree with your
assessment that it would be difficult to build tools around the GitHub
API, I made quite a couple of such tools myself, and it is quite easy, you
can even script it on the command-line using cURL).

However, I have to point out that the Git project is really uninviting to
contributors, and that this resistance is part of what makes it so.

> Now, I am perfectly open to the idea that more of the tooling should be
> standardized, so people do not have to build their own. But the major
> problem there is that so much of what I've built is about gluing things
> together for the rest of _my_ tools. I've shared my techniques and
> scripts for using mutt several times, but they don't do somebody a lick
> of good if they are using gmail or thunderbird.

It is nice of you to share those tools, of course, and you are correct
that the specificity of your tools limits their being useful. I, for one,
cannot use your mutt-based tools.

(I gladly use git-jump now, though, but it is still very limited by its
being specific to vim. In other words, both your mutt-based scripts and
your git-jump script are limited in their audience just by being so
opinionated. This is distinctly different from Git's

Re: [PATCH v4] config: add conditional include

2016-08-22 Thread Matthieu Moy

Duy Nguyen  writes:

> On Sun, Aug 21, 2016 at 4:08 AM, Jakub Narębski  wrote:
>> W dniu 19.08.2016 o 15:54, Jeff King pisze:
>>> On Sat, Aug 13, 2016 at 03:40:59PM +0700, Duy Nguyen wrote:
>>>
 Ping..
>>>
>>> There was some discussion after v4. I think the open issues are:
>>>
>>>   - the commit message is rather terse (it should describe motivation,
>>> and can refer to the docs for the "how")
>>>
>>>   - the syntax might be more clear as:
>>>
>>>[include-if "gitdir:..."]
>>>
>>> or
>>>
>>>[include "gitdir-is:..."]
>>
>> Or
>>
>>  [include "if-gitdir:..."]
>
> I like this one. I can re-roll to address the first two bullet point,
> if the last one, the open question, will not become a blocker later
> on.

I think the syntax should be design to allow arbitrary boolean
expression later if needed. Then, I prefer one of

  [include-if "gitdir-is:..."]
  [include"gitdir-is:..."]

because it may later be extended to e.g.

  [include-if "not(gitdir-is:...)"]
  [include-if "gitdir-matches:regex"]
  [include-if "gitdir-is:... and git-version-greater:2.9"]
  ...

I actually already use "conditional include on version number" because I
use push.default=upstream which makes older versions of Git crash, but
fortunately these versions of Git also ignore the "include" directive so
having this push.default=upstream in an included file works. It's a
hack, it worked once but it won't work again.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 1/2] t6026-merge-attr: child processes must not inherit index.lock handles

2016-08-22 Thread Johannes Schindelin

From: Ben Wijen 

On Windows, a file cannot be removed unless all file handles to it have
been released. Hence it is particularly important to close handles when
spawning children (which would probably not even know that they hold on
to those handles).

The example chosen for this test is a custom merge driver that indeed
has no idea that it blocks the deletion of index.lock. The full use case
is a daemon that lives on after the merge, with subsequent invocations
handing off to the daemon, thereby avoiding hefty start-up costs. We
simulate this behavior by simply sleeping one second.

Note that the test only fails on Windows, due to the file locking issue.
Since we have no way to say "expect failure with MINGW, success
otherwise", we simply skip this test on Windows for now.

Signed-off-by: Ben Wijen 
Signed-off-by: Johannes Schindelin 
---
 t/t6026-merge-attr.sh | 13 +
 1 file changed, 13 insertions(+)

diff --git a/t/t6026-merge-attr.sh b/t/t6026-merge-attr.sh
index ef0cbce..3d28c78 100755
--- a/t/t6026-merge-attr.sh
+++ b/t/t6026-merge-attr.sh
@@ -181,4 +181,17 @@ test_expect_success 'up-to-date merge without common 
ancestor' '
)
 '
 
+test_expect_success !MINGW 'custom merge does not lock index' '
+   git reset --hard anchor &&
+   write_script sleep-one-second.sh <<-\EOF &&
+   sleep 1 &
+   EOF
+
+   test_write_lines >.gitattributes \
+   "* merge=ours" "text merge=sleep-one-second" &&
+   test_config merge.ours.driver true &&
+   test_config merge.sleep-one-second.driver ./sleep-one-second.sh &&
+   git merge master
+'
+
 test_done
-- 
2.10.0.rc0.115.ged054c0


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] mingw: ensure temporary file handles are not inherited by child processes

2016-08-22 Thread Johannes Schindelin

Hi,

On Fri, 19 Aug 2016, Junio C Hamano wrote:

> Eric Wong  writes:
> 
> > I'd be more comfortable keeping the EINVAL check that got
> > snipped in your reply.  O_CLOEXEC can be defined to non-zero in
> > new userspace headers, but an older kernel chokes on it with
> > EINVAL.
> 
> Good point.  Thanks.

I tried to accomodate both of your suggestions by defining O_CLOEXEC
in git-compat-util.h unless defined earlier, and by handling EINVAL via
dropping O_CLOEXEC in a second call to open(). Please inspect the
interdiff of the upcoming iteration.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3 2/2] mingw: ensure temporary file handles are not inherited by child processes

2016-08-22 Thread Johannes Schindelin

From: Ben Wijen 

When the index is locked and child processes inherit the handle to
said lock and the parent process wants to remove the lock before the
child process exits, on Windows there is a problem: it won't work
because files cannot be deleted if a process holds a handle on them.
The symptom:

Rename from 'xxx/.git/index.lock' to 'xxx/.git/index' failed.
Should I try again? (y/n)

Spawning child processes with bInheritHandles==FALSE would not work
because no file handles would be inherited, not even the hStdXxx
handles in STARTUPINFO (stdin/stdout/stderr).

Opening every file with O_NOINHERIT does not work, either, as e.g.
git-upload-pack expects inherited file handles.

This leaves us with the only way out: creating temp files with the
O_NOINHERIT flag. This flag is Windows-specific, however. For our
purposes, it is equivalent to O_CLOEXEC (which does not exist on
Windows), so let's just open temporary files with the O_CLOEXEC flag and
map that flag to O_NOINHERIT on Windows.

As Eric Wong pointed out, we need to be careful to handle the case where
the Linux headers used to compile Git support O_CLOEXEC but the Linux
kernel used to run Git does not: it returns an EINVAL.

This fixes the test that we just introduced to demonstrate the problem.

Signed-off-by: Ben Wijen 
Signed-off-by: Johannes Schindelin 
---
 compat/mingw.h| 4 
 git-compat-util.h | 4 
 lockfile.h| 4 
 t/t6026-merge-attr.sh | 2 +-
 tempfile.c| 7 ++-
 tempfile.h| 4 
 6 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/compat/mingw.h b/compat/mingw.h
index 95e128f..753e641 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -67,6 +67,10 @@ typedef int pid_t;
 #define F_SETFD 2
 #define FD_CLOEXEC 0x1
 
+#if !defined O_CLOEXEC && defined O_NOINHERIT
+#define O_CLOEXEC  O_NOINHERIT
+#endif
+
 #ifndef EAFNOSUPPORT
 #define EAFNOSUPPORT WSAEAFNOSUPPORT
 #endif
diff --git a/git-compat-util.h b/git-compat-util.h
index f52e00b..db89ba7 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -667,6 +667,10 @@ void *gitmemmem(const void *haystack, size_t haystacklen,
 #define getpagesize() sysconf(_SC_PAGESIZE)
 #endif
 
+#ifndef O_CLOEXEC
+#define O_CLOEXEC 0
+#endif
+
 #ifdef FREAD_READS_DIRECTORIES
 #ifdef fopen
 #undef fopen
diff --git a/lockfile.h b/lockfile.h
index 3d30193..d26ad27 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -55,6 +55,10 @@
  *   * calling `fdopen_lock_file()` to get a `FILE` pointer for the
  * open file and writing to the file using stdio.
  *
+ *   Note that the file descriptor returned by hold_lock_file_for_update()
+ *   is marked O_CLOEXEC, so the new contents must be written by the
+ *   current process, not a spawned one.
+ *
  * When finished writing, the caller can:
  *
  * * Close the file descriptor and rename the lockfile to its final
diff --git a/t/t6026-merge-attr.sh b/t/t6026-merge-attr.sh
index 3d28c78..dd8f88d 100755
--- a/t/t6026-merge-attr.sh
+++ b/t/t6026-merge-attr.sh
@@ -181,7 +181,7 @@ test_expect_success 'up-to-date merge without common 
ancestor' '
)
 '
 
-test_expect_success !MINGW 'custom merge does not lock index' '
+test_expect_success 'custom merge does not lock index' '
git reset --hard anchor &&
write_script sleep-one-second.sh <<-\EOF &&
sleep 1 &
diff --git a/tempfile.c b/tempfile.c
index 0af7ebf..2990c92 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -120,7 +120,12 @@ int create_tempfile(struct tempfile *tempfile, const char 
*path)
prepare_tempfile_object(tempfile);
 
strbuf_add_absolute_path(>filename, path);
-   tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL, 
0666);
+   tempfile->fd = open(tempfile->filename.buf,
+   O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0666);
+   if (O_CLOEXEC && tempfile->fd < 0 && errno == EINVAL)
+   /* Try again w/o O_CLOEXEC: the kernel might not support it */
+   tempfile->fd = open(tempfile->filename.buf,
+   O_RDWR | O_CREAT | O_EXCL, 0666);
if (tempfile->fd < 0) {
strbuf_reset(>filename);
return -1;
diff --git a/tempfile.h b/tempfile.h
index 4219fe4..2f0038d 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -33,6 +33,10 @@
  *   * calling `fdopen_tempfile()` to get a `FILE` pointer for the
  * open file and writing to the file using stdio.
  *
+ *   Note that the file descriptor returned by create_tempfile()
+ *   is marked O_CLOEXEC, so the new contents must be written by
+ *   the current process, not any spawned one.
+ *
  * When finished writing, the caller can:
  *
  * * Close the file descriptor and remove the temporary file by
-- 
2.10.0.rc0.115.ged054c0
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo

[PATCH v3 0/2] Do not lock temporary files via child processes on Windows

2016-08-22 Thread Johannes Schindelin

This issue was originally reported and fixed in
https://github.com/git-for-windows/git/pull/755

The problem is that file handles to temporary files (such as
index.lock) were inherited by spawned processes. If those spawned
processes do not exit before the parent process wants to delete or
rename them, we are in big trouble.

The original use case triggering the bug is a merge driver that does
not quit, but listen to subsequent merge requests.

However, the same issue turned up in Lars Schneider's work on making
clean/smudge filters batchable (i.e. more efficient by avoiding
possibly thousands of child processes, one per file).

Changes since v2:

- O_CLOEXEC is defined in git-compat-util.h unless already defined
- we now handle EINVAL by trying again without O_CLOEXEC


Ben Wijen (2):
  t6026-merge-attr: child processes must not inherit index.lock handles
  mingw: ensure temporary file handles are not inherited by child
processes

 compat/mingw.h|  4 
 git-compat-util.h |  4 
 lockfile.h|  4 
 t/t6026-merge-attr.sh | 13 +
 tempfile.c|  7 ++-
 tempfile.h|  4 
 6 files changed, 35 insertions(+), 1 deletion(-)

Published-As: https://github.com/dscho/git/releases/tag/mingw-index-lock-v3
Fetch-It-Via: git fetch https://github.com/dscho/git mingw-index-lock-v3

Interdiff vs v2:

 diff --git a/git-compat-util.h b/git-compat-util.h
 index f52e00b..db89ba7 100644
 --- a/git-compat-util.h
 +++ b/git-compat-util.h
 @@ -667,6 +667,10 @@ void *gitmemmem(const void *haystack, size_t haystacklen,
  #define getpagesize() sysconf(_SC_PAGESIZE)
  #endif
  
 +#ifndef O_CLOEXEC
 +#define O_CLOEXEC 0
 +#endif
 +
  #ifdef FREAD_READS_DIRECTORIES
  #ifdef fopen
  #undef fopen
 diff --git a/tempfile.c b/tempfile.c
 index db3981d..2990c92 100644
 --- a/tempfile.c
 +++ b/tempfile.c
 @@ -120,7 +120,12 @@ int create_tempfile(struct tempfile *tempfile, const char 
*path)
prepare_tempfile_object(tempfile);
  
strbuf_add_absolute_path(>filename, path);
 -  tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL | 
O_CLOEXEC, 0666);
 +  tempfile->fd = open(tempfile->filename.buf,
 +  O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0666);
 +  if (O_CLOEXEC && tempfile->fd < 0 && errno == EINVAL)
 +  /* Try again w/o O_CLOEXEC: the kernel might not support it */
 +  tempfile->fd = open(tempfile->filename.buf,
 +  O_RDWR | O_CREAT | O_EXCL, 0666);
if (tempfile->fd < 0) {
strbuf_reset(>filename);
return -1;

-- 
2.10.0.rc0.115.ged054c0

base-commit: 2632c897f74b1cc9b5533f467da459b9ec725538
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v4] config: add conditional include

2016-08-22 Thread Duy Nguyen

On Sun, Aug 21, 2016 at 4:08 AM, Jakub Narębski  wrote:
> W dniu 19.08.2016 o 15:54, Jeff King pisze:
>> On Sat, Aug 13, 2016 at 03:40:59PM +0700, Duy Nguyen wrote:
>>
>>> Ping..
>>
>> There was some discussion after v4. I think the open issues are:
>>
>>   - the commit message is rather terse (it should describe motivation,
>> and can refer to the docs for the "how")
>>
>>   - the syntax might be more clear as:
>>
>>[include-if "gitdir:..."]
>>
>> or
>>
>>[include "gitdir-is:..."]
>
> Or
>
>  [include "if-gitdir:..."]

I like this one. I can re-roll to address the first two bullet point,
if the last one, the open question, will not become a blocker later
on.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] checkout: swap the order of ambiguity check for :/ syntax

2016-08-22 Thread Nguyễn Thái Ngọc Duy

This should speed up "git checkout :/long/path/taken/from/diffstat"
because we don't have to walk through the commit graph to see if
"long/path/taken/from/diffstat" exists in any commit message.

This is in the the same spirit as 4db86e8 (Update :/abc ambiguity check
- 2013-01-21), but instead of considering this case ("abc" in ":/abc"
exists on worktree) ambiguous and shouting, we just assume the user
means "pathspec" not "ref".

It's not wonderful, but it's in line with how git-checkout stops caring
about ambiguity after the first argument can be resolved as a ref
(there's even a test for it, t2010.6). In other words, we assume the
user means "ref" not "pathspec" when that happens. Here we simply swap
the order of checking specifically for :/ on practical ground.

Last note, to be pedantic, we should check if "abc" from ":/abc" exists
as an _index_ entry, not on disk. But chances are they exist in both
places anyway...

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/checkout.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..6f016db 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -973,6 +973,22 @@ static int parse_branchname_arg(int argc, const char 
**argv,
if (!strcmp(arg, "-"))
arg = "@{-1}";
 
+   if (dash_dash_pos < 0 && starts_with(arg, ":/") &&
+   check_filename(opts->prefix, arg)) {
+   /*
+* Normally if the first argument is ambiguous, we
+* choose to believe the user specifies an extended
+* SHA-1 syntax unless it turns out not true, then we
+* see if it's a pathspec.
+*
+* :/ here is an exception because resolving :/abc may
+* involve walking through the entire commit
+* graph. Expensive and slow. If :/abc points to an
+* existing file, ignore ambiguity and go with
+* pathspec (i.e. skip get_sha1_mb()).
+*/
+   return 0;   /* case (2) */
+   }
if (get_sha1_mb(arg, rev)) {
/*
 * Either case (3) or (4), with  not being
-- 
2.8.2.524.g6ff3d78

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] po/vi.po: add a missing space

2016-08-22 Thread Nguyễn Thái Ngọc Duy

This space is important because the %s could be ":/abc", which makes
the final output "tham chiếu không phải là một cây::/abc".
"cây::/abc" looks like a new fancy form of extended SHA-1 syntax. But
it's not.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 po/vi.po | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/po/vi.po b/po/vi.po
index f048244..e24e214 100644
--- a/po/vi.po
+++ b/po/vi.po
@@ -4626,7 +4626,7 @@ msgstr "tham chiếu không hợp lệ: %s"
 #: builtin/checkout.c:1037
 #, c-format
 msgid "reference is not a tree: %s"
-msgstr "tham chiếu không phải là một cây:%s"
+msgstr "tham chiếu không phải là một cây: %s"
 
 #: builtin/checkout.c:1076
 msgid "paths cannot be used with switching branches"
-- 
2.8.2.524.g6ff3d78

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 7/7] blame: actually use the diff opts parsed from the command line

2016-08-22 Thread Michael Haggerty

"git blame" already parsed generic diff options from the command line
via diff_opt_parse(), but instead of passing the resulting xdl_opts to
xdi_diff(), it sent its own xdl_opts, which only reflected the values of
the self-parsed options "-w" and "--minimal". Instead, rely on
diff_opt_parse() to parse all of the diff options, including "-w" and
"--minimal", and pass the resulting xdl_opts to xdi_diff().

Signed-off-by: Michael Haggerty 
---
Somebody who knows more about how diff operations are configured
should please review this. I'm not certain that the change as
implemented won't have other unwanted side-effects, though of course
I checked that the test suite runs correctly.

 builtin/blame.c|  11 ++--
 t/t4059-diff-indent.sh | 160 +
 2 files changed, 165 insertions(+), 6 deletions(-)
 create mode 100755 t/t4059-diff-indent.sh

diff --git a/builtin/blame.c b/builtin/blame.c
index 7ec7823..cde2d15 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -48,11 +48,12 @@ static int show_root;
 static int reverse;
 static int blank_boundary;
 static int incremental;
-static int xdl_opts;
 static int abbrev = -1;
 static int no_whole_file_rename;
 static int show_progress;
 
+static struct rev_info revs;
+
 static struct date_mode blame_date_mode = { DATE_ISO8601 };
 static size_t blame_date_width;
 
@@ -137,11 +138,12 @@ struct progress_info {
 static int diff_hunks(mmfile_t *file_a, mmfile_t *file_b,
  xdl_emit_hunk_consume_func_t hunk_func, void *cb_data)
 {
-   xpparam_t xpp = {0};
+   xpparam_t xpp;
xdemitconf_t xecfg = {0};
xdemitcb_t ecb = {NULL};
 
-   xpp.flags = xdl_opts;
+   memset(, 0, sizeof(xpp));
+   xpp.flags = revs.diffopt.xdl_opts;
xecfg.hunk_func = hunk_func;
ecb.priv = cb_data;
return xdi_diff(file_a, file_b, , , );
@@ -2517,7 +2519,6 @@ static int blame_move_callback(const struct option 
*option, const char *arg, int
 
 int cmd_blame(int argc, const char **argv, const char *prefix)
 {
-   struct rev_info revs;
const char *path;
struct scoreboard sb;
struct origin *o;
@@ -2548,8 +2549,6 @@ int cmd_blame(int argc, const char **argv, const char 
*prefix)
OPT_BIT('l', NULL, _option, N_("Show long commit SHA1 
(Default: off)"), OUTPUT_LONG_OBJECT_NAME),
OPT_BIT('s', NULL, _option, N_("Suppress author name and 
timestamp (Default: off)"), OUTPUT_NO_AUTHOR),
OPT_BIT('e', "show-email", _option, N_("Show author 
email instead of name (Default: off)"), OUTPUT_SHOW_EMAIL),
-   OPT_BIT('w', NULL, _opts, N_("Ignore whitespace 
differences"), XDF_IGNORE_WHITESPACE),
-   OPT_BIT(0, "minimal", _opts, N_("Spend extra cycles to find 
better match"), XDF_NEED_MINIMAL),
OPT_STRING('S', NULL, _file, N_("file"), N_("Use revisions 
from  instead of calling git-rev-list")),
OPT_STRING(0, "contents", _from, N_("file"), N_("Use 
's contents as the final image")),
{ OPTION_CALLBACK, 'C', NULL, , N_("score"), N_("Find line 
copies within and across files"), PARSE_OPT_OPTARG, blame_copy_callback },
diff --git a/t/t4059-diff-indent.sh b/t/t4059-diff-indent.sh
new file mode 100755
index 000..8b6c7ef
--- /dev/null
+++ b/t/t4059-diff-indent.sh
@@ -0,0 +1,160 @@
+#!/bin/sh
+
+test_description='Test diff indent heuristic.
+
+'
+. ./test-lib.sh
+. "$TEST_DIRECTORY"/diff-lib.sh
+
+# Compare two diff outputs. Ignore "index" lines, because we don't
+# care about SHA-1s or file modes.
+compare_diff () {
+   sed -e "/^index /d" <"$1" >.tmp-1
+   sed -e "/^index /d" <"$2" >.tmp-2
+   test_cmp .tmp-1 .tmp-2 && rm -f .tmp-1 .tmp-2
+}
+
+test_expect_success 'diff: favor trailing blank lines' '
+   cat <<-\EOF >old &&
+   1
+   2
+   a
+
+   b
+   3
+   4
+   EOF
+
+   cat <<-\EOF >new &&
+   1
+   2
+   a
+
+   b
+   a
+
+   b
+   3
+   4
+   EOF
+
+   tr "_" " " <<-\EOF >expect &&
+   diff --git a/old b/new
+   --- a/old
+   +++ b/new
+   @@ -3,5 +3,8 @@
+a
+   _
+b
+   +a
+   +
+   +b
+3
+4
+   EOF
+
+   tr "_" " " <<-\EOF >expect-compacted &&
+   diff --git a/old b/new
+   --- a/old
+   +++ b/new
+   @@ -2,6 +2,9 @@
+2
+a
+   _
+   +b
+   +a
+   +
+b
+3
+4
+   EOF
+
+   test_must_fail git diff --no-index old new >out &&
+   compare_diff expect out &&
+
+   test_must_fail git diff --no-index --indent-heuristic old new 
>out-compacted &&
+   compare_diff expect-compacted out-compacted &&
+
+   test_must_fail git -c diff.indentHeuristic=true diff --no-index old new 
>out-compacted2 &&
+   compare_diff expect-compacted out-compacted2 &&
+
+   test_must_fail git diff

[PATCH v2 4/7] recs_match(): take two xrecord_t pointers as arguments

2016-08-22 Thread Michael Haggerty

There is no reason for it to take an array and two indexes as argument,
as it only accesses two elements of the array.

Signed-off-by: Michael Haggerty 
---
 xdiff/xdiffi.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index ed2df64..8a5832a 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -405,11 +405,11 @@ static int is_blank_line(xrecord_t *rec, long flags)
return xdl_blankline(rec->ptr, rec->size, flags);
 }
 
-static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
+static int recs_match(xrecord_t *rec1, xrecord_t *rec2, long flags)
 {
-   return (recs[ixs]->ha == recs[ix]->ha &&
-   xdl_recmatch(recs[ixs]->ptr, recs[ixs]->size,
-recs[ix]->ptr, recs[ix]->size,
+   return (rec1->ha == rec2->ha &&
+   xdl_recmatch(rec1->ptr, rec1->size,
+rec2->ptr, rec2->size,
 flags));
 }
 
@@ -457,7 +457,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the last line of the current change group, shift 
backward
 * the group.
 */
-   while (ixs > 0 && recs_match(recs, ixs - 1, ix - 1, 
flags)) {
+   while (ixs > 0 && recs_match(recs[ixs - 1], recs[ix - 
1], flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
 
@@ -484,7 +484,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the line next of the current change group, shift 
forward
 * the group.
 */
-   while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
+   while (ix < nrec && recs_match(recs[ixs], recs[ix], 
flags)) {
blank_lines += is_blank_line(recs[ix], flags);
 
rchg[ixs++] = 0;
@@ -525,7 +525,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 */
while (ixs > 0 &&
   !is_blank_line(recs[ix - 1], flags) &&
-  recs_match(recs, ixs - 1, ix - 1, flags)) {
+  recs_match(recs[ixs - 1], recs[ix - 1], flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
while (rchgo[--ixo]);
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 5/7] xdl_change_compact(): introduce the concept of a change group

2016-08-22 Thread Michael Haggerty

The idea of xdl_change_compact() is fairly simple:

* Proceed through groups of changed lines in the file to be compacted,
  keeping track of the corresponding location in the "other" file.

* If possible, slide the group up and down to try to give the most
  aesthetically pleasing diff. Whenever it is slid, the current location
  in the other file needs to be adjusted.

But these simple concepts are obfuscated by a lot of index handling that
is written in terse, subtle, and varied patterns. I found it very hard
to convince myself that the function was correct.

So introduce a "struct group" that represents a group of changed lines
in a file. Add some functions that perform elementary operations on
groups:

* Initialize a group to the first group in a file
* Move to the next or previous group in a file
* Slide a group up or down

Even though the resulting code is longer, I think it is easier to
understand and review. Its performance is not changed
appreciably (though it would be if `group_next()` and `group_previous()`
were not inlined).

...and in fact, the rewriting helped me discover another bug in the
--compaction-heuristic code: The update of blank_lines was never done
for the highest possible position of the group. This means that it could
fail to slide the group to its highest possible position, even if that
position had a blank line as its last line. So for example, it yielded
the following diff:

$ git diff --no-index --compaction-heuristic a.txt b.txt
diff --git a/a.txt b/b.txt
index e53969f..0d60c5fe 100644
--- a/a.txt
+++ b/b.txt
@@ -1,3 +1,7 @@
 1
 A
+
+B
+
+A
 2

when in fact the following diff is better (according to the rules of
--compaction-heuristic):

$ git diff --no-index --compaction-heuristic a.txt b.txt
diff --git a/a.txt b/b.txt
index e53969f..0d60c5fe 100644
--- a/a.txt
+++ b/b.txt
@@ -1,3 +1,7 @@
 1
+A
+
+B
+
 A
 2

The new code gives the bottom answer.

Signed-off-by: Michael Haggerty 
---
 xdiff/xdiffi.c | 293 +++--
 1 file changed, 203 insertions(+), 90 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 8a5832a..44fded6 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -413,126 +413,239 @@ static int recs_match(xrecord_t *rec1, xrecord_t *rec2, 
long flags)
 flags));
 }
 
-int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long flags) {
-   long ix, ixo, ixs, ixref, grpsiz, nrec = xdf->nrec;
-   char *rchg = xdf->rchg, *rchgo = xdfo->rchg;
-   unsigned int blank_lines;
-   xrecord_t **recs = xdf->recs;
+/*
+ * Represent a group of changed lines in an xdfile_t (i.e., a contiguous group
+ * of lines that was inserted or deleted from the corresponding version of the
+ * file). We consider there to be such a group at the beginning of the file, at
+ * the end of the file, and between any two unchanged lines, though most such
+ * groups will usually be empty.
+ *
+ * If the first line in a group is equal to the line following the group, then
+ * the group can be slid down. Similarly, if the last line in a group is equal
+ * to the line preceding the group, then the group can be slid up. See
+ * group_slide_down() and group_slide_up().
+ *
+ * Note that loops that are testing for changed lines in xdf->rchg do not need
+ * index bounding since the array is prepared with a zero at position -1 and N.
+ */
+struct group {
+   /*
+* The index of the first changed line in the group, or the index of
+* the unchanged line above which the (empty) group is located.
+*/
+   long start;
 
/*
-* This is the same of what GNU diff does. Move back and forward
-* change groups for a consistent and pretty diff output. This also
-* helps in finding joinable change groups and reduce the diff size.
+* The index of the first unchanged line after the group. For an empty
+* group, end is equal to start.
 */
-   for (ix = ixo = 0;;) {
-   /*
-* Find the first changed line in the to-be-compacted file.
-* We need to keep track of both indexes, so if we find a
-* changed lines group on the other file, while scanning the
-* to-be-compacted file, we need to skip it properly. Note
-* that loops that are testing for changed lines on rchg* do
-* not need index bounding since the array is prepared with
-* a zero at position -1 and N.
-*/
-   for (; ix < nrec && !rchg[ix]; ix++)
-   while (rchgo[ixo++]);
-   if (ix == nrec)
-   break;
+   long end;
+};
+
+/*
+ * Initialize g to point at the first group in xdf.
+ */
+static void group_init(xdfile_t *xdf, struct group *g)
+{
+

[PATCH v2 6/7] diff: improve positioning of add/delete blocks in diffs

2016-08-22 Thread Michael Haggerty

Some groups of added/deleted lines in diffs can be slid up or down,
because lines at the edges of the group are not unique. Picking good
shifts for such groups is not a matter of correctness but definitely has
a big effect on aesthetics. For example, consider the following two
diffs. The first is what standard Git emits:

--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -231,6 +231,9 @@ if (!defined $initial_reply_to && $prompting) {
 }

 if (!$smtp_server) {
+   $smtp_server = $repo->config('sendemail.smtpserver');
+}
+if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {
$smtp_server = $_;

The following diff is equivalent, but is obviously preferable from an
aesthetic point of view:

--- a/9c572b21dd090a1e5c5bb397053bf8043ffe7fb4:git-send-email.perl
+++ b/6dcfa306f2b67b733a7eb2d7ded1bc9987809edb:git-send-email.perl
@@ -230,6 +230,9 @@ if (!defined $initial_reply_to && $prompting) {
$initial_reply_to =~ s/(^\s+|\s+$)//g;
 }

+if (!$smtp_server) {
+   $smtp_server = $repo->config('sendemail.smtpserver');
+}
 if (!$smtp_server) {
foreach (qw( /usr/sbin/sendmail /usr/lib/sendmail )) {
if (-x $_) {

This patch teaches Git to pick better positions for such "diff sliders"
using heuristics that take the positions of nearby blank lines and the
indentation of nearby lines into account.

The existing Git code basically always shifts such "sliders" as far down
in the file as possible. The only exception is when the slider can be
aligned with a group of changed lines in the other file, in which case
Git favors depicting the change as one add+delete block rather than one
add and a slightly offset delete block. This naive algorithm often
yields ugly diffs.

Commit d634d61ed6 improved the situation somewhat by preferring to
position add/delete groups to make their last line a blank line, when
that is possible. This heuristic does more good than harm, but (1) it
can only help if there are blank lines in the right places, and (2)
always picks the last blank line, even if there are others that might be
better. The end result is that it makes perhaps 1/3 as many errors as
the default Git algorithm, but that still leaves a lot of ugly diffs.

This commit implements a new and much better heuristic for picking
optimal "slider" positions using the following approach: First observe
that each hypothetical positioning of a diff slider introduces two
splits: one between the context lines preceding the group and the first
added/deleted line, and the other between the last added/deleted line
and the first line of context following it. It tries to find the
positioning that creates the least bad splits.

Splits are evaluated based only on the presence and locations of nearby
blank lines, and the indentation of lines near the split. Basically, it
prefers to introduce splits adjacent to blank lines, between lines that
are indented less, and between lines with the same level of indentation.
In more detail:

1. It measures the following characteristics of a proposed splitting
   position in a `struct split_measurement`:

   * the number of blank lines above the proposed split
   * whether the line directly after the split is blank
   * the number of blank lines following that line
   * the indentation of the nearest non-blank line above the split
   * the indentation of the line directly below the split
   * the indentation of the nearest non-blank line after that line

2. It combines the measured attributes using a bunch of
   empirically-optimized weighting factors to derive a `struct
   split_score` that measures the "badness" of splitting the text at
   that position.

3. It combines the `split_score` for the top and the bottom of the
   slider at each of its possible positions, and selects the position
   that has the best `split_score`.

I determined the initial set of weighting factors by collecting a corpus
of Git histories from 29 open-source software projects in various
programming languages. I generated many diffs from this corpus, and
determined the best positioning "by eye" for about 6600 diff sliders. I
used about half of the repositories in the corpus (corresponding to
about 2/3 of the sliders) as a training set, and optimized the weights
against this corpus using a crude automated search of the parameter
space to get the best agreement with the manually-determined values.
Then I tested the resulting heuristic against the full corpus. The
results are summarized in the following table, in column `indent-1`:

| repository| count |  Git 2.9.0 | compaction | 
compaction-fixed |   indent-1 |   indent-2 |
| - | - | -- | -- | 
 | -- |

[PATCH v2 3/7] is_blank_line(): take a single xrecord_t as argument

2016-08-22 Thread Michael Haggerty

There is no reason for it to take an array and index as argument, as it
only accesses a single element of the array.

Signed-off-by: Michael Haggerty 
---
 xdiff/xdiffi.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 61deed8..ed2df64 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -400,9 +400,9 @@ static xdchange_t *xdl_add_change(xdchange_t *xscr, long 
i1, long i2, long chg1,
 }
 
 
-static int is_blank_line(xrecord_t **recs, long ix, long flags)
+static int is_blank_line(xrecord_t *rec, long flags)
 {
-   return xdl_blankline(recs[ix]->ptr, recs[ix]->size, flags);
+   return xdl_blankline(rec->ptr, rec->size, flags);
 }
 
 static int recs_match(xrecord_t **recs, long ixs, long ix, long flags)
@@ -485,7 +485,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * the group.
 */
while (ix < nrec && recs_match(recs, ixs, ix, flags)) {
-   blank_lines += is_blank_line(recs, ix, flags);
+   blank_lines += is_blank_line(recs[ix], flags);
 
rchg[ixs++] = 0;
rchg[ix++] = 1;
@@ -524,7 +524,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
 * back only if at all.
 */
while (ixs > 0 &&
-  !is_blank_line(recs, ix - 1, flags) &&
+  !is_blank_line(recs[ix - 1], flags) &&
   recs_match(recs, ixs - 1, ix - 1, flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 1/7] xdl_change_compact(): fix compaction heuristic to adjust ixo

2016-08-22 Thread Michael Haggerty

The code branch used for the compaction heuristic forgot to keep ixo in
sync while the group was shifted. This is certainly wrong, as it causes
the two counters to get out of sync.

I *think* that this bug could also have caused the function to read past
the end of the rchgo array, though I haven't done the work to prove it
for sure. Here is my reasoning:

If ixo is not decremented correctly during one iteration of the outer
while loop, then it will loose sync with the ix counter. In particular,
ixo will be too large.

Suppose that the next iterations of the outer while loop (i.e.,
processing the next block of add/delete lines) don't have any sliders.
Then the ixo counter would be incremented by the number of non-changed
lines in xdf, which is the same as the number of non-changed lines in
xdfo that *should have* followed the group that experienced the
malfunction. But since ixo was too large at the end of that iteration,
it will be incremented past the end of the xdfo->rchg array, and will
try to read that memory illegally.

Signed-off-by: Michael Haggerty 
---
 xdiff/xdiffi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index b3c6848..95b037e 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -528,6 +528,7 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, long 
flags) {
   recs_match(recs, ixs - 1, ix - 1, flags)) {
rchg[--ixs] = 1;
rchg[--ix] = 0;
+   while (rchgo[--ixo]);
}
}
}
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 0/7] Better heuristics make prettier diffs

2016-08-22 Thread Michael Haggerty

This is v2 of a patch series to improve the heuristics that `git diff`
and related commands use to position ambiguous blocks of added/deleted
lines. Thanks to Stefan, Jacob, Peff, Junio, and Jakub for their
comments about v1 [1]. This patch series is also available from my
GitHub account [2] as branch `diff-indent-heuristic`.

This version started out as a light touch-up of v1, but as I was
working I realized that it needed more changes. Among other things,
the final heuristic is now improved relative to the one proposed in
v1. Summary of changes:

* I changed the `--compaction-heuristic` code such that if a group of
  added/deleted lines can be aligned with a group of deleted/added
  lines in the other file, then that should be done rather than try to
  slide to a blank line. In the existing code, the compaction
  heuristic was attempted, incorrectly, *after* trying to align
  groups, which doesn't make sense.

* I changed `recs_match()` similarly to `is_blank_line()`: now it
  takes two `xrecord_t *` as arguments rather than an array of
  `xrecord_t` and two indexes.

* I refactored the old `xdl_change_compact()` more thoroughly. All of
  the index manipulation was pretty confusing, and I was having
  trouble convincing myself that even the old code was working
  correctly. So I introduced a `struct group`, which is like a cursor
  that keeps track of a (possibly empty) group of changed lines in the
  old or new version of a file. I added functions to initialize a
  group to the first change in a file, to move to the next or previous
  groups, and to slide a group forward or backward if possible (i.e.,
  if the group is a "slider").

* In the course of that refactoring, I found (and fixed) another bug
  in the `--compaction-heuristic` code: it didn't notice if the top
  possible position of a slider had a blank line as its last line. See
  the commit message of patch [5/5] for more details.

* I realized that the behavior of the indent heuristic from v1,
  because it multiplied weighting factors times indent widths, would
  behave differently if a project changed its convention for how many
  spaces to use per level of indentation. That doesn't make sense, so
  I changed the handling of indentation:

  Now, the sum of the indentation width at the top and bottom of the
  slider are added, *but those sums are only compared* between slider
  positions, and the weight is multipled by -1, 0, or +1 depending on
  whether the first indent sum is less than, equal, or greater than
  the other. This should make the behavior relatively independent of
  the project's indentation convention, and thus make the heuristic
  work more consistently across projects.

  The resulting heuristic works significantly better than the one
  proposed in v1:

  | repository| count |  Git 2.9.0 | v1 |   
  v2 |
  | - | - | -- | -- | 
-- |
  | afnetworking  |   109 |89  (81.7%) | 3   (2.8%) | 2   
(1.8%) |
  | alamofire |30 |18  (60.0%) | 2   (6.7%) | 0   
(0.0%) |
  | angular   |   184 |   127  (69.0%) |15   (8.2%) | 5   
(2.7%) |
  | animate   |   313 | 2   (0.6%) | 2   (0.6%) | 2   
(0.6%) |
  | ant   |   380 |   356  (93.7%) |22   (5.8%) |15   
(3.9%) |
  | bugzilla  |   306 |   263  (85.9%) |24   (7.8%) |15   
(4.9%) |
  | corefx|   126 |91  (72.2%) |15  (11.9%) | 6   
(4.8%) |
  | couchdb   |78 |44  (56.4%) |17  (21.8%) | 6   
(7.7%) |
  | cpython   |   937 |   158  (16.9%) |26   (2.8%) | 5   
(0.5%) |
  | discourse |   160 |95  (59.4%) |24  (15.0%) |13   
(8.1%) |
  | docker|   307 |   194  (63.2%) |16   (5.2%) | 8   
(2.6%) |
  | electron  |   163 |   132  (81.0%) |15   (9.2%) | 6   
(3.7%) |
  | git   |   536 |   470  (87.7%) |18   (3.4%) |16   
(3.0%) |
  | gitflow   |   127 | 0   (0.0%) | 0   (0.0%) | 0   
(0.0%) |
  | ionic |   133 |89  (66.9%) | 6   (4.5%) | 1   
(0.8%) |
  | ipython   |   482 |   362  (75.1%) |45   (9.3%) |11   
(2.3%) |
  | junit |   161 |   147  (91.3%) | 4   (2.5%) | 1   
(0.6%) |
  | lighttable|15 | 5  (33.3%) | 2  (13.3%) | 0   
(0.0%) |
  | magit |88 |75  (85.2%) | 5   (5.7%) | 0   
(0.0%) |
  | neural-style  |28 | 0   (0.0%) | 0   (0.0%) | 0   
(0.0%) |
  | nodejs|   781 |   649  (83.1%) |98  (12.5%) | 5   
(0.6%) |
  | phpmyadmin|   491 |   481  (98.0%) | 7   (1.4%) | 2   
(0.4%) |
  | react-native  |   168 |   130  (77.4%) | 5   (3.0%) | 0   
(0.0%) |
  | rust

[PATCH v2 2/7] xdl_change_compact(): only use heuristic if group can't be matched

2016-08-22 Thread Michael Haggerty

If the changed group of lines can be matched to a group in the other
file, then that positioning should take precedence over the compaction
heuristic.

The old code tried the heuristic unconditionally, which cost redundant
effort and also was broken if the matching code had already shifted the
group higher than the blank line.

Signed-off-by: Michael Haggerty 
---
 xdiff/xdiffi.c | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
index 95b037e..61deed8 100644
--- a/xdiff/xdiffi.c
+++ b/xdiff/xdiffi.c
@@ -504,25 +504,25 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
long flags) {
}
} while (grpsiz != ix - ixs);
 
-   /*
-* Try to move back the possibly merged group of changes, to 
match
-* the recorded position in the other file.
-*/
-   while (ixref < ix) {
-   rchg[--ixs] = 1;
-   rchg[--ix] = 0;
-   while (rchgo[--ixo]);
-   }
-
-   /*
-* If a group can be moved back and forth, see if there is a
-* blank line in the moving space. If there is a blank line,
-* make sure the last blank line is the end of the group.
-*
-* As we already shifted the group forward as far as possible
-* in the earlier loop, we need to shift it back only if at all.
-*/
-   if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
+   if (ixref < ix) {
+   /*
+* Try to move back the possibly merged group of 
changes, to match
+* the recorded position in the other file.
+*/
+   while (ixref < ix) {
+   rchg[--ixs] = 1;
+   rchg[--ix] = 0;
+   while (rchgo[--ixo]);
+   }
+   } else if ((flags & XDF_COMPACTION_HEURISTIC) && blank_lines) {
+   /*
+* The group can be slid up to make its last line a
+* blank line. Do so.
+*
+* As we already shifted the group forward as far as
+* possible in the earlier loop, we need to shift it
+* back only if at all.
+*/
while (ixs > 0 &&
   !is_blank_line(recs, ix - 1, flags) &&
   recs_match(recs, ixs - 1, ix - 1, flags)) {
-- 
2.9.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

68 matches

Mail list logo