Git Bash and Git GUI freezing just after opening

2016-09-08 Thread André Marcondes
I have a fresh git installation on my windows 10 machine, but I am
unable to use the Git Bash or the Git GUI because the program freezes
just after opening.
The only difference I have made on my machine was to update Windows 10
Pro to it's latest version (build 14393.105) and change the default
Documents folder to another partition.
I really need to fix this. How can I get some help?


[PATCH] Allow stashes to be referenced by index only

2016-09-08 Thread Aaron M Watson
Instead of referencing "stash@{n}" explicitly, it can simply be
referenced as "n".  Most users only reference stashes by their position
in the stash stask (what I refer to as the "index"). The syntax for the
typical stash (stash@{n}) is slightly annoying and easy to forget, and
sometimes difficult to escape properly in a script. Because of this the
capability to do things with the stash by simply referencing the index
is desirable.

This patch includes the superior implementation provided by Øsse Walle
(thanks for that), with a slight change to fix a broken test in the test
suite. I also merged the test scripts as suggested by Jeff King, and
un-wrapped the documentation as suggested by Junio Hamano.

Signed-off-by: Aaron M Watson 
---
 Documentation/git-stash.txt |  3 ++-
 git-stash.sh| 17 +++--
 t/t3903-stash.sh| 35 +++
 3 files changed, 52 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 92df596..2e9cef0 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -39,7 +39,8 @@ The latest stash you created is stored in `refs/stash`; older
 stashes are found in the reflog of this reference and can be named using
 the usual reflog syntax (e.g. `stash@{0}` is the most recently
 created stash, `stash@{1}` is the one before it, `stash@{2.hours.ago}`
-is also possible).
+is also possible). Stashes may also be referenced by specifying just the
+stash index (e.g. the integer `n` is equivalent to `stash@{n}`).
 
 OPTIONS
 ---
diff --git a/git-stash.sh b/git-stash.sh
index 826af18..d8d3b8d 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -384,9 +384,10 @@ parse_flags_and_rev()
i_tree=
u_tree=
 
-   REV=$(git rev-parse --no-flags --symbolic --sq "$@") || exit 1
+   REV=$(git rev-parse --no-flags --symbolic --sq "$@" 2> /dev/null)
 
FLAGS=
+   ARGV=
for opt
do
case "$opt" in
@@ -404,10 +405,13 @@ parse_flags_and_rev()
die "$(eval_gettext "unknown option: 
\$opt")"
FLAGS="${FLAGS}${FLAGS:+ }$opt"
;;
+   *)
+   ARGV="${ARGV}${ARGV:+ }'$opt'"
+   ;;
esac
done
 
-   eval set -- $REV
+   eval set -- $ARGV
 
case $# in
0)
@@ -422,6 +426,15 @@ parse_flags_and_rev()
;;
esac
 
+   case "$1" in
+   *[!0-9]*)
+   :
+   ;;
+   *)
+   set -- "${ref_stash}@{$1}"
+   ;;
+   esac
+
REV=$(git rev-parse --symbolic --verify --quiet "$1") || {
reference="$1"
die "$(eval_gettext "\$reference is not a valid reference")"
diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
index 2142c1f..f82a8c4 100755
--- a/t/t3903-stash.sh
+++ b/t/t3903-stash.sh
@@ -131,6 +131,26 @@ test_expect_success 'drop middle stash' '
test 1 = $(git show HEAD:file)
 '
 
+test_expect_success 'drop middle stash by index' '
+   git reset --hard &&
+   echo 8 > file &&
+   git stash &&
+   echo 9 > file &&
+   git stash &&
+   git stash drop 1 &&
+   test 2 = $(git stash list | wc -l) &&
+   git stash apply &&
+   test 9 = $(cat file) &&
+   test 1 = $(git show :file) &&
+   test 1 = $(git show HEAD:file) &&
+   git reset --hard &&
+   git stash drop &&
+   git stash apply &&
+   test 3 = $(cat file) &&
+   test 1 = $(git show :file) &&
+   test 1 = $(git show HEAD:file)
+'
+
 test_expect_success 'stash pop' '
git reset --hard &&
git stash pop &&
@@ -604,6 +624,21 @@ test_expect_success 'invalid ref of the form stash@{n}, n 
>= N' '
git stash drop
 '
 
+test_expect_success 'invalid ref of the form "n", n >= N' '
+   git stash clear &&
+   test_must_fail git stash drop 0 &&
+   echo bar5 > file &&
+   echo bar6 > file2 &&
+   git add file2 &&
+   git stash &&
+   test_must_fail git stash drop 1 &&
+   test_must_fail git stash pop 1 &&
+   test_must_fail git stash apply 1 &&
+   test_must_fail git stash show 1 &&
+   test_must_fail git stash branch tmp 1 &&
+   git stash drop
+'
+
 test_expect_success 'stash branch should not drop the stash if the branch 
exists' '
git stash clear &&
echo foo >file &&
-- 
2.7.4



What's cooking in git.git (Sep 2016, #02; Thu, 8)

2016-09-08 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

There are a few more topics in flight that may be ready to be picked
up but I haven't, and other topics in flight that may not be quite
ready.  I'll start merging topics that have been cooking in 'next'
to 'master', rewind and rebuild 'next', and merge those that have
been waiting in 'pu' to 'next' before picking them up.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* bc/object-id (2016-09-07) 20 commits
 - builtin/reset: convert to use struct object_id
 - builtin/commit-tree: convert to struct object_id
 - builtin/am: convert to struct object_id
 - refs: add an update_ref_oid function.
 - sha1_name: convert get_sha1_mb to struct object_id
 - builtin/update-index: convert file to struct object_id
 - notes: convert init_notes to use struct object_id
 - builtin/rm: convert to use struct object_id
 - builtin/blame: convert file to use struct object_id
 - Convert read_mmblob to take struct object_id.
 - notes-merge: convert struct notes_merge_pair to struct object_id
 - builtin/checkout: convert some static functions to struct object_id
 - streaming: make stream_blob_to_fd take struct object_id
 - builtin: convert textconv_object to use struct object_id
 - builtin/cat-file: convert some static functions to struct object_id
 - builtin/cat-file: convert struct expand_data to use struct object_id
 - builtin/log: convert some static functions to use struct object_id
 - builtin/blame: convert struct origin to use struct object_id
 - builtin/apply: convert static functions to struct object_id
 - cache: convert struct cache_entry to use struct object_id

 The "unsigned char sha1[20]" to "struct object_id" conversion
 continues.  Notable changes in this round includes that ce->sha1,
 i.e. the object name recorded in the cache_entry, turns into an
 object_id.

 It had merge conflicts with a few topics in flight (Christian's
 "apply.c split", Dscho's "cat-file --filters" and Jeff Hostetler's
 "status --porcelain-v2").  Extra sets of eyes double-checking for
 mismerges are highly appreciated.


* ep/use-git-trace-curl-in-tests (2016-09-07) 4 commits
 - t5551-http-fetch-smart.sh: use the GIT_TRACE_CURL environment var
 - t5550-http-fetch-dumb.sh: use the GIT_TRACE_CURL environment var
 - test-lib.sh: preserve GIT_TRACE_CURL from the environment
 - t5541-http-push-smart.sh: use the GIT_TRACE_CURL environment var

 Update a few tests that used to use GIT_CURL_VERBOSE to use the
 newer GIT_TRACE_CURL.

 Will merge to 'next'.


* jk/pack-tag-of-tag (2016-09-07) 5 commits
 - pack-objects: walk tag chains for --include-tag
 - t5305: simplify packname handling
 - t5305: use "git -C"
 - t5305: drop "dry-run" of unpack-objects
 - t5305: move cleanup into test block

 "git pack-objects --include-tag" was taught that when we know that
 we are sending an object C, we want a tag B that directly points at
 C but also a tag A that points at the tag B.  We used to miss the
 intermediate tag B in some cases.


* js/t6026-clean-up (2016-09-07) 1 commit
 - t6026-merge-attr: clean up background process at end of test case

 A test spawned a short-lived background process, which sometimes
 prevented the test directory from getting removed at the end of the
 script on some platforms.

 Will merge to 'next'.


* js/t9903-chaining (2016-09-07) 1 commit
 - t9903: fix broken && chain

 Will merge to 'next'.


* jt/accept-capability-advertisement-when-fetching-from-void (2016-09-07) 2 
commits
 - connect: advertized capability is not a ref
 - tests: move test_lazy_prereq JGIT to test-lib.sh

 JGit can show a fake ref "capabilities^{}" to "git fetch" when it
 does not advertise any refs, but "git fetch" was not prepared to
 see such an advertisement.

 Waiting for a reroll.
 Rewording the log, and avoiding making it overly loose are needed.
 cf. <20160908013431.gc25...@google.com>


* rs/compat-strdup (2016-09-07) 1 commit
 - compat: move strdup(3) replacement to its own file

 Will merge to 'next'.


* rs/hex2chr (2016-09-07) 1 commit
 - introduce hex2chr() for converting two hexadecimal digits to a character

 Will merge to 'next'.


* rt/rebase-i-broken-insn-advise (2016-09-07) 1 commit
 - rebase -i: improve advice on bad instruction lines

 When "git rebase -i" is given a broken instruction, it told the
 user to fix it with "--edit-todo", but didn't say what the step
 after that was (i.e. "--continue").

 Will hold.
 Dscho's "rebase -i" hopefully will become available in 'pu', by
 which time an equivalent of this fix would be ported to C.  This is
 queued merely as a reminder.


* 

Re: [PATCH v7 08/10] convert: modernize tests

2016-09-08 Thread Stefan Beller
On Thu, Sep 8, 2016 at 11:21 AM,   wrote:
> From: Lars Schneider 
>
> Use `test_config` to set the config, check that files are empty with
> `test_must_be_empty`, compare files with `test_cmp`, and remove spaces
> after ">" and "<".
>
> Please note that the "rot13" filter configured in "setup" keeps using
> `git config` instead of `test_config` because subsequent tests might
> depend on it.
>
> Signed-off-by: Lars Schneider 
> ---

Makes sense & Reviewed-by "Stefan Beller "

Thanks,
Stefan


Re: [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams

2016-09-08 Thread Stefan Beller
On Thu, Sep 8, 2016 at 11:21 AM,   wrote:
> From: Lars Schneider 
>
> write_packetized_from_fd() and write_packetized_from_buf() write a
> stream of packets. All content packets use the maximal packet size
> except for the last one. After the last content packet a `flush` control
> packet is written.

I presume we need both write_* things in a later patch; can you clarify why
we need both of them?


> +   if (paket_len < 0) {
> +   if (oldalloc == 0)
> +   strbuf_release(sb_out);

So if old alloc is 0, we release it, which is documented as
/**
 * Release a string buffer and the memory it used. You should not use the
 * string buffer after using this function, unless you initialize it again.
 */

> +   else
> +   strbuf_setlen(sb_out, oldlen);

Otherwise we just set the length back, such that it looks like before.

So as a caller the strbuf is in a different state in case of error
depending whether
the strbuf already had some data in it. I think it would be better if
we only did
`strbuf_setlen(sb_out, oldlen);` here, such that the caller can
strbuf_release it
unconditionally.

Or to make things more confusing, you could use strbuf_reset in case of 0,
as that is a strbuf_setlen internally. ;)

> @@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
>   */
>  char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
>
> +/*
> + * Reads a stream of variable sized packets until a flush packet is detected.

Strictly speaking we read until a packet of size 0 appears, but as per
the implementation
of packet_read we cannot distinguish between "" and "0004", i.e.
an empty non-flush
packet. So I think we're fine both in the implementation as well as
the documentation here.


Re: [PATCH v2 38/38] refs: implement iteration over only per-worktree refs

2016-09-08 Thread David Turner
Other than the duplicated sign-offs, this series looks good to me
("Don't act surprised, you guys, cuz I wrote 'em").

Kind of a funny place to cut it off, but I guess it makes sense.

On Sun, 2016-09-04 at 18:08 +0200, Michael Haggerty wrote:
> From: David Turner 
> 
> Alternate refs backends might still use files to store per-worktree
> refs. So provide a way to iterate over only the per-worktree references
> in a ref_store. The other backend can set up a files ref_store and
> iterate using the new DO_FOR_EACH_PER_WORKTREE_ONLY flag when iterating.




Bug: git-p4 can generate duplicate commits when syncing changes that span multiple depot paths

2016-09-08 Thread James Farwell
Reproduction Steps:

1. Have a git repo cloned from a perforce repo using multiple depot paths (e.g. 
//depot/foo and //depot/bar).
2. Submit a single change to the perforce repo that makes changes in both 
//depot/foo and //depot/bar.
3. Run "git p4 sync" to sync the change from #2.


Expected Behavior:

Change should be synced as a single commit to the git repo.


Actual Behavior:

Change is synced as multiple commits, one for each depot path that was affected.


Best Guess:

I believe this is happening because the command syntax "p4 changes 
//depot/foo/...@123,456 //depot/bar/...@123,456", which git-p4 uses to get the 
list of changes to sync, will return the same change number multiple times if 
the change was present in multiple depot paths. This is expected behavior as 
per the p4 changes documentation: "If p4 changes is called with multiple file 
arguments, the sets of changelists that affect each argument are evaluated 
individually. The final output is neither combined nor sorted; the effect is 
the same as calling p4 changes multiple times, once for each file argument." 
git-p4 is handling the sorting itself, but it is not handling the combining.

I would imagine this is fixable in the p4ChangesForPaths() method by dropping 
non-unique elements of the list before or after sorting. Rudimentary testing in 
the python interpreter would suggest that something like "changes = 
sorted(set(changes))" should do the trick, but I am no python expert so there 
may be a better way.


Thanks!
- James


Re: [PATCH] checkout: eliminate unnecessary merge for trivial checkout

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 02:22:16PM -0700, Junio C Hamano wrote:

> > +   /*
> > +* Optimize the performance of checkout when the current and
> > +* new branch have the same OID and avoid the trivial merge.
> > +* For example, a "git checkout -b foo" just needs to create
> > +* the new ref and report the stats.
> > +*/
> > +   if (!old.commit || !new->commit
> > +   || oidcmp(>object.oid, >commit->object.oid)
> > +   || !opts->new_branch || opts->new_branch_force || 
> > opts->new_orphan_branch
> > +   || opts->patch_mode || opts->merge || opts->force || 
> > opts->force_detach
> > +   || opts->writeout_stage || !opts->overwrite_ignore
> > +   || opts->ignore_skipworktree || opts->ignore_other_worktrees
> > +   || opts->new_branch_log || opts->branch_exists || opts->prefix
> > +   || opts->source_tree) {
> 
> ... this is a maintenance nightmare in that any new option we will
> add later will need to consider what this "optimization" is trying
> (not) to skip.  The first two lines (i.e. we need a real checkout if
> we cannot positively say that old and new commits are the same
> object) are clear, but no explanation was given for all the other
> random conditions this if condition checks.  What if opts->something
> was not listed (or "listed" for that matter) in the list above--it
> is totally unclear if it was missed by mistake (or "added by
> mistake") or deliberately excluded (or "deliberately added").
> 
> For example, why is opts->prefix there?  If
> 
>   git checkout -b new-branch HEAD
> 
> should be able to omit the two-way merge, shouldn't
> 
>   cd t && git checkout -b new-branch HEAD
> 
> also be able to?

I was just writing another reply, but I think our complaints may have
dovetailed.

My issue is that the condition above is an unreadable mass.  It would be
really nice to pull it out into a helper function, and then all of the
items could be split out and commented independently, like:

  static int needs_working_tree_merge(const struct checkout_opts *opts,
  const struct branch_info *old,
  const struct branch_info *new)
  {
/*
 * We must do the merge if we are actually moving to a new
 * commit.
 */
if (!old->commit || !new->commit ||
oidcmp(>object.oid, >commit->object.oid))
return 1;

/* Option "foo" is not compatible because of... */
if (opts->foo)
return 1;

... etc ...
  }

That still leaves your "what if opts->something is not listed" question
open, but at least it makes it easier to comment on it in the code.

-Peff

PS I didn't think hard on whether the conditions above make _sense_. My
   first goal would be to get more communication about them individually,
   and then we can evaluate them.



Re: [PATCH] gpg-interface: reflect stderr to stderr

2016-09-08 Thread Junio C Hamano
Jeff King  writes:

>> Even though this patch is fixing only one of the two issues, I am
>> tempted to say that we should queue it for now, as it does so
>> without breaking a bigger gain made by the original, i.e. we learn
>> the status of verification in a way the authors of GPG wants us to,
>> while somebody figuires out what the best way is to show the prompt
>> to the console on Windows.
>
> That's OK by me, but I don't know if we can put off the "best way to
> show the prompt" fix. It seems like a pretty serious regression for
> people on Windows.

Yes, I am not saying that it is OK to keep Windows users broken.

As I understand what Dscho said correctly, his users are covered by
a reversion of the "read the GPG status correctly" patch, i.e. with
a different trade-off between the correctness of GPG status vs
usability of the prompt, he will ship with Git for Windows, and that
stop-gap measure will last only until developers who can do Windows
(which excludes you, me, and Michael it seems) comes up with a
solution that satisfies both.

I consider that an approach that is perfectly fine.




Re: [PATCH 4/5] versioncmp: pass full tagnames to swap_prereleases()

2016-09-08 Thread Junio C Hamano
SZEDER Gábor  writes:

> I'm not sure about the relevancy of this pararaph, or the relevancy of
> the original version for that matter.  I mean, there is a different
> character for sure, so it's really rather obvious that it can't
> possibly be the same suffix in both, isn't it?  So I don't think it
> adds much value, and don't mind deleting it in the reroll.

Concurred.  Let's lose this confusing statement.

Thanks.


Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Junio C Hamano
Jeff King  writes:

>> commit f96e5673 ("grep: use REG_STARTEND for all matching if available",
>> 22-05-2010) introduced this test and expects ".. NUL characters themselves
>> are not matched in any way". With the native library on cygwin they are
>> matched, with the compat/regex they are not. Indeed, if you use the system
>> 'grep' command (rather than 'git grep'), then it will also not match ... :-D
>> 
>> Slightly off topic, but ...
>
> Hmm. So it sounds like the "regmatch" in grep.c could go away in favor
> of Johannes's regexec_buf(), and cygwin ought to be using NO_REGEX.

Sounds like a plan.


Re: [PATCH v7 05/10] pkt-line: add packet_write_gently()

2016-09-08 Thread Stefan Beller
On Thu, Sep 8, 2016 at 11:21 AM,   wrote:
> From: Lars Schneider 
>
> packet_write_fmt_gently() uses format_packet() which lets the caller
> only send string data via "%s". That means it cannot be used for
> arbitrary data that may contain NULs.

Makes sense.

>
> Add packet_write_gently() which writes arbitrary data and returns `0`
> for success and `-1` for an error.

I think documenting the return code is better done in either the header file
or in a commend preceding the implementation instead of the commit message?

Maybe just a generic comment for *_gently is good enough, maybe even no
comment. So the commit is fine, too. I dunno.


> This function is used by other
> pkt-line functions in a subsequent patch.

That's what I figured. Do we also need to mention that in the preceding patch
for packet_flush_gently ?

>
> Signed-off-by: Lars Schneider 
> ---
>  pkt-line.c | 19 +++
>  1 file changed, 19 insertions(+)
>
> diff --git a/pkt-line.c b/pkt-line.c
> index 37345ca..1d3d725 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -181,6 +181,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
> return status;
>  }
>
> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +{
> +   static char packet_write_buffer[LARGE_PACKET_MAX];
> +
> +   if (size > sizeof(packet_write_buffer) - 4) {
> +   error("packet write failed");
> +   return -1;
> +   }
> +   packet_trace(buf, size, 1);
> +   size += 4;
> +   set_packet_header(packet_write_buffer, size);
> +   memcpy(packet_write_buffer + 4, buf, size - 4);
> +   if (write_in_full(fd_out, packet_write_buffer, size) == size)
> +   return 0;
> +
> +   error("packet write failed");
> +   return -1;
> +}
> +
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
>  {
> va_list args;
> --
> 2.10.0
>


Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-08 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Sep 08, 2016 at 11:54:08AM -0700, Josh Triplett wrote:
>
>> > your problem description
>> > looks perfect.  I am still not sure if the code does a reasonable
>> > thing in MIME case, though.
>> 
>> It *looks* correct to me.
> 
> Hmm. It looks correct to me, too; ...
> ...
> So this is actually fixing a bug,...

Yes, I actually wanted to hear that from Josh and have that in the
proposed log message ;-).


Re: [PATCH] checkout: eliminate unnecessary merge for trivial checkout

2016-09-08 Thread Junio C Hamano
Ben Peart  writes:

> Teach git to avoid unnecessary merge during trivial checkout.  When
> running 'git checkout -b foo' git follows a common code path through
> the expensive merge_working_tree even when it is unnecessary.

I would be lying if I said I am not sympathetic to the cause, but...

> + /*
> +  * Optimize the performance of checkout when the current and
> +  * new branch have the same OID and avoid the trivial merge.
> +  * For example, a "git checkout -b foo" just needs to create
> +  * the new ref and report the stats.
> +  */
> + if (!old.commit || !new->commit
> + || oidcmp(>object.oid, >commit->object.oid)
> + || !opts->new_branch || opts->new_branch_force || 
> opts->new_orphan_branch
> + || opts->patch_mode || opts->merge || opts->force || 
> opts->force_detach
> + || opts->writeout_stage || !opts->overwrite_ignore
> + || opts->ignore_skipworktree || opts->ignore_other_worktrees
> + || opts->new_branch_log || opts->branch_exists || opts->prefix
> + || opts->source_tree) {

... this is a maintenance nightmare in that any new option we will
add later will need to consider what this "optimization" is trying
(not) to skip.  The first two lines (i.e. we need a real checkout if
we cannot positively say that old and new commits are the same
object) are clear, but no explanation was given for all the other
random conditions this if condition checks.  What if opts->something
was not listed (or "listed" for that matter) in the list above--it
is totally unclear if it was missed by mistake (or "added by
mistake") or deliberately excluded (or "deliberately added").

For example, why is opts->prefix there?  If

git checkout -b new-branch HEAD

should be able to omit the two-way merge, shouldn't

cd t && git checkout -b new-branch HEAD

also be able to?

Even the main condition is unclear.  It wants to see that old and
new have exactly the same commit, but shouldn't the "the result of
the two-way merge is known to be no-op" logic equally apply if the
old and two trees are the same?






Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()

2016-09-08 Thread Stefan Beller
On Thu, Sep 8, 2016 at 11:21 AM,   wrote:

> +static int packet_write_fmt_1(int fd, int gently,
> +  const char *fmt, va_list args)
> +{
> +   struct strbuf buf = STRBUF_INIT;
> +   size_t count;
> +
> +   format_packet(, fmt, args);
> +   count = write_in_full(fd, buf.buf, buf.len);
> +   if (count == buf.len)
> +   return 0;
> +
> +   if (!gently) {

call check_pipe from write_or_die here instead of
reproducing that function?

> +   if (errno == EPIPE) {
> +   if (in_async())
> +   async_exit(141);
> +
> +   signal(SIGPIPE, SIG_DFL);
> +   raise(SIGPIPE);
> +   /* Should never happen, but just in case... */
> +   exit(141);
> +   }
> +   die_errno("packet write error");
> +   }
> +   error("packet write failed");
> +   return -1;

I think the more idiomatic way is to

return error(...);

as error always return -1.


Re: [PATCH 3/3] diff: remove dead code

2016-09-08 Thread Junio C Hamano
Stefan Beller  writes:

> When `len < 1`, len has to be 0 or negative, emit_line will then remove the
> first character and by then `len` would be negative. As this doesn't
> happen, it is safe to assume it is dead code.
>
> This continues to simplify the code, which was started in b8d9c1a66b
> (2009-09-03,  diff.c: the builtin_diff() deals with only two-file
> comparison).

We look at line[0] to see if it is '@' before this check, which
would have been wrong if "len < 1" were ever true.

>
> Signed-off-by: Stefan Beller 
> ---
>  diff.c | 8 
>  1 file changed, 8 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 79ad91d..c143019 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1251,14 +1251,6 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
>   return;
>   }
>  
> - if (len < 1) {
> - emit_line(o, reset, reset, line, len);
> - if (ecbdata->diff_words
> - && ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN)
> - fputs("~\n", o->file);
> - return;
> - }
> -
>   if (ecbdata->diff_words) {
>   if (line[0] == '-') {
>   diff_words_append(line, len,


Re: [PATCH 2/3] diff: omit found pointer from emit_callback

2016-09-08 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/diff.c b/diff.c
> index 4a6501c..79ad91d 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -354,7 +354,6 @@ struct emit_callback {
>   const char **label_path;
>   struct diff_words_data *diff_words;
>   struct diff_options *opt;
> - int *found_changesp;
>   struct strbuf *header;
>  };

I briefly wondered if we have some callsites that do not want
o->found_changes to be modified (hence pointing this field at
elsewhere), but the fact that you can _remove_ this field means that
there is no such use case, which is good.

> @@ -722,7 +721,6 @@ static void emit_rewrite_diff(const char *name_a,
>  
>   memset(, 0, sizeof(ecbdata));
>   ecbdata.color_diff = want_color(o->use_color);
> - ecbdata.found_changesp = >found_changes;
>   ecbdata.ws_rule = whitespace_rule(name_b);
>   ecbdata.opt = o;
>   if (ecbdata.ws_rule & WS_BLANK_AT_EOF) {
> @@ -1215,13 +1213,13 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
>   const char *reset = diff_get_color(ecbdata->color_diff, DIFF_RESET);
>   struct diff_options *o = ecbdata->opt;
>   const char *line_prefix = diff_line_prefix(o);
> + o->found_changes = 1;
>  
>   if (ecbdata->header) {
>   fprintf(o->file, "%s", ecbdata->header->buf);
>   strbuf_reset(ecbdata->header);
>   ecbdata->header = NULL;
>   }
> - *(ecbdata->found_changesp) = 1;

Is there a good reason to move the assignment up?  "The fact that
this function was called even once means we found some change" is
probably a good argument, but then I'd prefer to have a blank before
it to separate it (the first statement) from the block of decls.

No need to resend.  Thanks.


Re: [PATCH 01/13] i18n: apply: mark plural string for translation

2016-09-08 Thread Junio C Hamano
Thanks.

I'll skip 01-03/13 and queue the remainder for now, as I'd want to
see Christian's "split builtin/apply.c into two, moving bulk to
apply.c at the top-level to be reused" merged to 'next' sooner and
to 'master' hopefully during this cycle.







[PATCH] checkout: eliminate unnecessary merge for trivial checkout

2016-09-08 Thread Ben Peart
Teach git to avoid unnecessary merge during trivial checkout.  When
running 'git checkout -b foo' git follows a common code path through
the expensive merge_working_tree even when it is unnecessary.  As a
result, 95% of the time is spent in merge_working_tree doing the 2-way
merge between the new and old commit trees that is unneeded.

The time breakdown is as follows:

merge_working_tree <-- 95%
unpack_trees <-- 80%
traverse_trees <-- 50%
cache_tree_update <-- 17%
mark_new_skip_worktree <-- 10%

With a large repo, this cost is pronounced.  Using "git checkout -b r"
to create and switch to a new branch costs 166 seconds (all times worst
case with a cold file system cache).

git.c:406   trace: built-in: git 'checkout' '-b' 'r'
read-cache.c:1667   performance: 17.442926555 s: read_index_from
name-hash.c:128 performance: 2.912145231 s: lazy_init_name_hash
read-cache.c:2208   performance: 4.387713335 s: write_locked_index
trace.c:420 performance: 166.458921289 s: git command:
'c:\Users\benpeart\bin\git.exe' 
'checkout' '-b' 'r'
Switched to a new branch 'r'

By adding a test to skip the unnecessary call to merge_working_tree in
this case reduces the cost to 16 seconds.

git.c:406   trace: built-in: git 'checkout' '-b' 's'
read-cache.c:1667   performance: 16.100742476 s: read_index_from
trace.c:420 performance: 16.461547867 s: git command: 
'c:\Users\benpeart\bin\git.exe' 'checkout' '-b' 's'
Switched to a new branch 's'

Signed-off-by: Ben Peart 
---
 builtin/checkout.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 8672d07..595d64b 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -827,10 +827,25 @@ static int switch_branches(const struct checkout_opts 
*opts,
parse_commit_or_die(new->commit);
}
 
-   ret = merge_working_tree(opts, , new, _error);
-   if (ret) {
-   free(path_to_free);
-   return ret;
+   /*
+* Optimize the performance of checkout when the current and
+* new branch have the same OID and avoid the trivial merge.
+* For example, a "git checkout -b foo" just needs to create
+* the new ref and report the stats.
+*/
+   if (!old.commit || !new->commit
+   || oidcmp(>object.oid, >commit->object.oid)
+   || !opts->new_branch || opts->new_branch_force || 
opts->new_orphan_branch
+   || opts->patch_mode || opts->merge || opts->force || 
opts->force_detach
+   || opts->writeout_stage || !opts->overwrite_ignore
+   || opts->ignore_skipworktree || opts->ignore_other_worktrees
+   || opts->new_branch_log || opts->branch_exists || opts->prefix
+   || opts->source_tree) {
+   ret = merge_working_tree(opts, , new, _error);
+   if (ret) {
+   free(path_to_free);
+   return ret;
+   }
}
 
if (!opts->quiet && !old.path && old.commit && new->commit != 
old.commit)
-- 
2.10.0.windows.1



Re: [PATCH 4/5] versioncmp: pass full tagnames to swap_prereleases()

2016-09-08 Thread SZEDER Gábor


Quoting Junio C Hamano :


SZEDER Gábor  writes:


- * Note that we don't have to deal with the situation when both p1 and
- * p2 start with the same suffix because the common part is already
+ * Note that we don't have to deal with the situation when both s1 and
+ * s2 contain the same suffix because the common part is already
  * consumed by the caller.


"The common part is already consumed" was relevant while the
function was fed p1 and p2, i.e. the first difference, but the whole
point of passing the original s1 and s2 with ofs is so that the
function can look behind ofs as necessary.  Is "already consumed"
still correct (or relevant) with s/p/s/ you did to its calling
convention?


Well, it's still correct in the sense that we don't have to worry about
finding the same suffix in both strings.  However, "consume" is not the
right word to use here, as incrementing an offset until it points past
the common part doesn't count as "consumption", so more rewording would
be necessary.

I'm not sure about the relevancy of this pararaph, or the relevancy of
the original version for that matter.  I mean, there is a different
character for sure, so it's really rather obvious that it can't
possibly be the same suffix in both, isn't it?  So I don't think it
adds much value, and don't mind deleting it in the reroll.


Best,
Gábor



Re: [PATCH v2] t/Makefile: add a rule to re-run previously-failed tests

2016-09-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Thu, 1 Sep 2016, Junio C Hamano wrote:
>
>> Hopefully that [patch removing the - suffix] would help making
>> Dscho's "what are the failed tests?" logic simpler.
>
> Of course.
>
> It also makes sure that those 2 hours I spent on writing and perfecting
> the sed magic were spent in vain... ;-)

Well it is either

 * the sed magic is so arcane that you'd need to spend a long time,
   comparable to 2 hours you already spent, if you ever need to look
   at it and figure out what it does next time you need to change
   something in it.

or

 * you are not familiar with the sed magic and you would be able to
   write the same thing in 2 minutes next time if you need to adjust
   it when we add -pid back later.

Either way, those 2 hours are not wasted.

I personally fall into the former category.  Any sed script that
needs G, h, and x together I need to spend at least 15 minutes just
to warm myself up, as I do not work with the language that often.

Thanks ;-)



Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 11:54:08AM -0700, Josh Triplett wrote:

> > your problem description
> > looks perfect.  I am still not sure if the code does a reasonable
> > thing in MIME case, though.
> 
> It *looks* correct to me.

Hmm. It looks correct to me, too; we stick it just after the patch, so
with "--attach" it is part of the text/x-patch, which is reasonable.

But looking at the results of "--attach" from _before_ your patch, it
looks totally broken. The "base" information comes _after the final
delimiter of the multipart/mixed. Most mailers would just throw it away
when decoding the multipart, I think.

So this is actually fixing a bug, and you could probably add a test
(though I am not sure we have anything in git that actually parses
multipart messages _or_ that carefully consumes the base-commit info, so
it might be hard to test in practice).

-Peff


Re: [PATCH 3/3] checkout: fix ambiguity check in subdir

2016-09-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> The two functions in parse_branchname_arg(), verify_non_filename and
> check_filename, need correct prefix in order to reconstruct the paths
> and check for their existence. With NULL prefix, they just check paths
> at top dir instead.

Good eyes.  Will queue.

>
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/checkout.c|  4 ++--
>  t/t2010-checkout-ambiguous.sh |  9 +
>  t/t2024-checkout-dwim.sh  | 12 
>  3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 1f71d06..53c7284 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -985,7 +985,7 @@ static int parse_branchname_arg(int argc, const char 
> **argv,
>   int recover_with_dwim = dwim_new_local_branch_ok;
>  
>   if (!has_dash_dash &&
> - (check_filename(NULL, arg) || !no_wildcard(arg)))
> + (check_filename(opts->prefix, arg) || !no_wildcard(arg)))
>   recover_with_dwim = 0;
>   /*
>* Accept "git checkout foo" and "git checkout foo --"
> @@ -1046,7 +1046,7 @@ static int parse_branchname_arg(int argc, const char 
> **argv,
>* it would be extremely annoying.
>*/
>   if (argc)
> - verify_non_filename(NULL, arg);
> + verify_non_filename(opts->prefix, arg);
>   } else {
>   argcount++;
>   argv++;
> diff --git a/t/t2010-checkout-ambiguous.sh b/t/t2010-checkout-ambiguous.sh
> index e76e84a..2e47fe0 100755
> --- a/t/t2010-checkout-ambiguous.sh
> +++ b/t/t2010-checkout-ambiguous.sh
> @@ -41,6 +41,15 @@ test_expect_success 'check ambiguity' '
>   test_must_fail git checkout world all
>  '
>  
> +test_expect_success 'check ambiguity in subdir' '
> + mkdir sub &&
> + # not ambiguous because sub/world does not exist
> + git -C sub checkout world ../all &&
> + echo hello >sub/world &&
> + # ambiguous because sub/world does exist
> + test_must_fail git -C sub checkout world ../all
> +'
> +
>  test_expect_success 'disambiguate checking out from a tree-ish' '
>   echo bye > world &&
>   git checkout world -- world &&
> diff --git a/t/t2024-checkout-dwim.sh b/t/t2024-checkout-dwim.sh
> index 468a000..3e5ac81 100755
> --- a/t/t2024-checkout-dwim.sh
> +++ b/t/t2024-checkout-dwim.sh
> @@ -174,6 +174,18 @@ test_expect_success 'checkout of branch with a file 
> having the same name fails'
>   test_branch master
>  '
>  
> +test_expect_success 'checkout of branch with a file in subdir having the 
> same name fails' '
> + git checkout -B master &&
> + test_might_fail git branch -D spam &&
> +
> + >spam &&
> + mkdir sub &&
> + mv spam sub/spam &&
> + test_must_fail git -C sub checkout spam &&
> + test_must_fail git rev-parse --verify refs/heads/spam &&
> + test_branch master
> +'
> +
>  test_expect_success 'checkout  -- succeeds, even if a file with the 
> same name exists' '
>   git checkout -B master &&
>   test_might_fail git branch -D spam &&


Re: [PATCH] gpg-interface: reflect stderr to stderr

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 11:20:09AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Wed, Sep 07, 2016 at 10:27:34AM +0200, Michael J Gruber wrote:
> >
> >> Now, I can't reproduce C on Linux[*], so there is more involved. It
> >> could be that my patch just exposes a problem in our start_command()
> >> etc.: run-command.c contains a lot of ifdefing, so possibly quite
> >> different code is run on different platforms.
> >
> > Maybe, though my blind guess is that it is simply that on Linux we can
> > open /dev/tty directly, and console-IO on Windows is a bit more
> > complicated.
> 
> True.
> 
> Even though this patch is fixing only one of the two issues, I am
> tempted to say that we should queue it for now, as it does so
> without breaking a bigger gain made by the original, i.e. we learn
> the status of verification in a way the authors of GPG wants us to,
> while somebody figuires out what the best way is to show the prompt
> to the console on Windows.

That's OK by me, but I don't know if we can put off the "best way to
show the prompt" fix. It seems like a pretty serious regression for
people on Windows.

-Peff


Re: [PATCH 2/3] checkout.txt: document a common case that ignores ambiguation rules

2016-09-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Normally we err on the safe side: if something can be seen as both an
> SHA1 and a pathspec, we stop and scream. In checkout, there is one
> exception added in 859fdab (git-checkout: improve error messages, detect
> ambiguities. - 2008-07-23), to allow the common case "git checkout
> branch". Let's document this exception.

Good idea, but...

> +ARGUMENT AMBIGUATION
> +
> +
> +When there is only one argument given and it is not `--` (e.g. "git
> +checkout abc"), "abc" could be seen as either a `` or a
> +``, but Git will assume the argument is a ``, which is
> +a common case for switching branches. Use `git checkout -- `
> +form if you mean it to be a pathspec.

... this is far from reasonable.  I'd read "but Git will assume the
argument is a tree-ish" to mean "git checkout Makefile" would
attempt to checkout the Makefile branch and fail.

When there is only one argument given and it is not `--` (e.g. "git
checkout abc"), and when the argument is both a valid ``
(e.g. a branch "abc" exists) and a valid `` (e.g. a file
or a directory whose name is "abc" exists), Git would usually ask
you to disambiguate.  Because checking out a branch is so common an
operation, however, "git checkout abc" takes "abc" as a ``
in such situation.  Use `git checkout -- ` if you want to
checkout these paths out of the index.

or something like that?


Re: [PATCH 2/3] t0001: work around the bug that reads config file before repo setup

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 08:47:18PM +0700, Nguyễn Thái Ngọc Duy wrote:

> git-init somehow reads '.git/config' at current directory and sets
> log_all_ref_updates based on this file. Because log_all_ref_updates is
> not unspecified (-1) any more. It will not be written to the new repo's
> config file (see create_default_files() function).
> 
> This will affect our tests in the next patch as we will compare the
> config file and expect that core.logallrefupdates is already set to true
> by "git init main-worktree".

This is a bug for more than worktrees, and is something I'm working on
fixing (what I'd like to do is kill off the lazy fallback to ".git/" as
the repo name, which is almost always the wrong thing to do).

I'm not opposed to your workaround, but just FYI.

-Peff


Re: [PATCH 3/3] init: do not set core.worktree more often than necessary

2016-09-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> +/*
> + * Return the first ".git" that we have encountered.
> + * FIXME this function for not entirely correct because
> + * setup_git_directory() and enter_repo() do not update first_git_dir
> + * when they follow .git files. The function in its current state is
> + * only suitable for "git init".
> + */

Would it be possible to move this to "init-db.c" then?

The very first thing cmd_init_db() does to what is in the
environment.c is to call set_git_dir() via set_git_dir_init() to
tell it where the ".git" thing is, no?  Can't that code remember the
location itself, instead of adding code that is known not to be
usable by other callers?  That would help avoiding the future
confusion.

> +const char *get_first_git_dir(void)
> +{
> + return first_git_dir ? first_git_dir : git_dir;
> +}
> +
>  const char *get_git_common_dir(void)
>  {
>   return git_common_dir;
> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index 393c940..d59669a 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -393,9 +393,11 @@ test_expect_success 're-init from a linked worktree' '
>   test_commit first &&
>   git worktree add ../linked-worktree &&
>   mv .git/info/exclude expected-exclude &&
> + cp .git/config expected-config &&
>   find .git/worktrees -print | sort >expected &&
>   git -C ../linked-worktree init &&
>   test_cmp expected-exclude .git/info/exclude &&
> + test_cmp expected-config .git/config &&
>   find .git/worktrees -print | sort >actual &&
>   test_cmp expected actual
>   )


Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 08:06:35PM +0100, Ramsay Jones wrote:

> > Actually, I take it back again. Your test case doesn't have an embedded
> > NUL in it (so we check that git finds it, but aside from the lack of
> > segfault, stock git would already find it).
> 
> This reminds me ... despite the native cygwin regex library no longer
> having the 'regex bug' (ie t0070.5 now passes), I still have NO_REGEX
> set on cygwin. This is because, when building with the native library,
> we have an "unexpected pass" for t7008.12, which looks like:
> 
> test_expect_failure 'git grep .fi a' '
> git grep .fi a
> '
> [where the file a is set up earlier by: echo 'binaryQfile' | q_to_nul >a]
> 
> commit f96e5673 ("grep: use REG_STARTEND for all matching if available",
> 22-05-2010) introduced this test and expects ".. NUL characters themselves
> are not matched in any way". With the native library on cygwin they are
> matched, with the compat/regex they are not. Indeed, if you use the system
> 'grep' command (rather than 'git grep'), then it will also not match ... :-D
> 
> Slightly off topic, but ...

Hmm. So it sounds like the "regmatch" in grep.c could go away in favor
of Johannes's regexec_buf(), and cygwin ought to be using NO_REGEX.

-Peff


Re: [PATCH 2/3] t0001: work around the bug that reads config file before repo setup

2016-09-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> git-init somehow reads '.git/config' at current directory and sets
> log_all_ref_updates based on this file. Because log_all_ref_updates is
> not unspecified (-1) any more. It will not be written to the new repo's
> config file (see create_default_files() function).

I vaguely recall this was discussed recently somewhere.  Is this
related to <20160902091130.jxckdeomw35ee...@sigill.intra.peff.net>?


> This will affect our tests in the next patch as we will compare the
> config file and expect that core.logallrefupdates is already set to true
> by "git init main-worktree".

Ugh.  The "mv" workaround is fine to demonstrate that you have fixed
one of two problems, but as you said, it would be nice to have another
that expects failure until both of them gets fixed.

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  t/t0001-init.sh | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index d64e5e3..393c940 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -385,7 +385,9 @@ test_expect_success MINGW 'bare git dir not hidden' '
>  '
>  
>  test_expect_success 're-init from a linked worktree' '
> + mv .git/config work-around-init-reading-wrong-file &&
>   git init main-worktree &&
> + mv work-around-init-reading-wrong-file .git/config &&
>   (
>   cd main-worktree &&
>   test_commit first &&


Re: [PATCH 1/3] init: correct re-initialization from a linked worktree

2016-09-08 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> When 'git init' is called from a linked worktree, '.git' dir as the main
> '.git' (i.e. $GIT_COMMON_DIR) and populate the whole repository skeleton
> in there. It does not harm anything (*) but it is still wrong.

-ECANNOTPARSE.  Did you mean "... worktree, we treat '.git' dir as
if it is the main '.git' ..." or something entirely different?

> diff --git a/builtin/init-db.c b/builtin/init-db.c
> index 3a45f0b..6d9552e 100644
> --- a/builtin/init-db.c
> +++ b/builtin/init-db.c
> @@ -138,7 +138,7 @@ static void copy_templates(const char *template_dir)
>   goto close_free_return;
>   }
>  
> - strbuf_addstr(, get_git_dir());
> + strbuf_addstr(, get_git_common_dir());
>   strbuf_complete(, '/');
>   copy_templates_1(, _path, dir);
>  close_free_return:
> diff --git a/t/t0001-init.sh b/t/t0001-init.sh
> index a6fdd5e..d64e5e3 100755
> --- a/t/t0001-init.sh
> +++ b/t/t0001-init.sh
> @@ -384,4 +384,19 @@ test_expect_success MINGW 'bare git dir not hidden' '
>   ! is_hidden newdir
>  '
>  
> +test_expect_success 're-init from a linked worktree' '
> + git init main-worktree &&
> + (
> + cd main-worktree &&
> + test_commit first &&
> + git worktree add ../linked-worktree &&
> + mv .git/info/exclude expected-exclude &&
> + find .git/worktrees -print | sort >expected &&
> + git -C ../linked-worktree init &&
> + test_cmp expected-exclude .git/info/exclude &&
> + find .git/worktrees -print | sort >actual &&
> + test_cmp expected actual
> + )
> +'
> +
>  test_done


Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-08 Thread Junio C Hamano
Josh Triplett  writes:

> If any other change ends up being necessary, I'll split the patch in v2.

Thanks. I do not see anything else offhand myself, but other people
watching the topic from the sideline may spot something we missed.



Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Ramsay Jones


On 08/09/16 09:35, Jeff King wrote:
> On Thu, Sep 08, 2016 at 04:14:46AM -0400, Jeff King wrote:
> 
>> On Thu, Sep 08, 2016 at 04:10:24AM -0400, Jeff King wrote:
>>
>>> On Thu, Sep 08, 2016 at 09:54:51AM +0200, Johannes Schindelin wrote:
>>>
>  diff.c |  3 ++-
>  diffcore-pickaxe.c | 18 --
>  xdiff-interface.c  | 13 -
>  3 files changed, 14 insertions(+), 20 deletions(-)

 I just realized that this should switch the test_expect_failure from 1/3
 to a test_expect_success.
>>>
>>> Yep. I wonder if we also would want to test that we correctly find
>>> regexes inside binary files.
>>>
>>> E.g., given a mixed binary/text file like:
>>>
>>>   printf 'binary\0text' >file &&
>>>   git add file &&
>>>   git commit -m file
>>>
>>> then "git log -Stext" will find that file, but "--pickaxe-regex" will
>>> not (using stock git). Ditto for "-Gtext".
>>>
>>> Your patch should fix that.
>>
>> Of course if I had actually _looked carefully_ at your patch, I would
>> have seen that your test doesn't just check that we don't segfault, but
>> actually confirms that we find the entry.
>>
>> Sorry for the noise.
> 
> Actually, I take it back again. Your test case doesn't have an embedded
> NUL in it (so we check that git finds it, but aside from the lack of
> segfault, stock git would already find it).

This reminds me ... despite the native cygwin regex library no longer
having the 'regex bug' (ie t0070.5 now passes), I still have NO_REGEX
set on cygwin. This is because, when building with the native library,
we have an "unexpected pass" for t7008.12, which looks like:

test_expect_failure 'git grep .fi a' '
git grep .fi a
'
[where the file a is set up earlier by: echo 'binaryQfile' | q_to_nul >a]

commit f96e5673 ("grep: use REG_STARTEND for all matching if available",
22-05-2010) introduced this test and expects ".. NUL characters themselves
are not matched in any way". With the native library on cygwin they are
matched, with the compat/regex they are not. Indeed, if you use the system
'grep' command (rather than 'git grep'), then it will also not match ... :-D

Slightly off topic, but ...

ATB,
Ramsay Jones




Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-08 Thread Josh Triplett
On Thu, Sep 08, 2016 at 11:34:15AM -0700, Junio C Hamano wrote:
> Josh Triplett  writes:
> 
> > Any text below the "-- " for the email signature gets treated as part of
> > the signature, and many mail clients will trim it from the quoted text
> > for a reply.  Move it above the signature, so people can reply to it
> > more easily.
> >
> > Add tests for the exact format of the email signature, and add tests to
> > ensure the email signature appears last.
> >
> > (Patch by Junio Hamano; tests by Josh Triplett.)
> > Signed-off-by: Josh Triplett 
> > ---
> >
> > Does the above seem reasonable, for a patch that incorporates the
> > proposed patch from Message-Id
> > xmqqh99rpud4@gitster.mtv.corp.google.com and adds tests?
> 
> Other than that I'd probably retitle it,

Ah, true, I should have titled it "format-patch: move base commit ...".

> your problem description
> looks perfect.  I am still not sure if the code does a reasonable
> thing in MIME case, though.

It *looks* correct to me.

> Thanks for tying the loose ends anyway.
> 
> > diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> > index b0579dd..a4af275 100755
> > --- a/t/t4014-format-patch.sh
> > +++ b/t/t4014-format-patch.sh
> > @@ -754,9 +754,22 @@ test_expect_success 'format-patch 
> > --ignore-if-in-upstream HEAD' '
> > git format-patch --ignore-if-in-upstream HEAD
> >  '
> >  
> > +git_version="$(git --version | sed "s/.* //")"
> > +
> > +signature() {
> > +   printf "%s\n%s\n\n" "-- " "${1:-$git_version}"
> > +}
> 
> Hmph.  I would actually have expected that you would force a fixed
> and an easily noticeable string via format.signature for the purpose
> of the test,

One of the git tests already did that.  I just modified that test to
test the exact signature format and that it appears at the end, rather
than just grepping to check that the signature string appears somewhere.
Then when doing so, I realized that I should check the default case too
(at which point that test change probably should have gone in a separate
patch).

> but I guess this test covers a lot more than what the
> purpose of the main part of the patch does (i.e. enforces that the
> default signature must be made from the version string of Git).  It
> is not a bad thing to test, but it probably does not belong to this
> change.  If you _were_ to split the patch in two, that is where I
> probably would split, i.e. "we didn't test what the default signature
> looks like, or we didn't make sure --signature option overrides the
> default signature, so let's test it" as the preliminary preparation,
> followed by "having base info after sig is inconvenient, let's move
> it and make sure base info stays before sig with additional test" as
> the second (and primary) patch.
>
> But a single patch is fine.
> 
> Thanks.

If any other change ends up being necessary, I'll split the patch in v2.

- Josh Triplett


Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 09:57:43AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Between the two options for regexec_buf(), I think you have convinced me
> > that REG_STARTEND is better than just using compat/regex everywhere. I
> > do think the fallback for platforms like musl should be "use
> > compat/regex" and not doing an expensive copy (which in most cases is
> > not even necessary).
> 
> I agree with you that it would be the best approach to build
> regexec_buf() that unconditionally uses REG_STARTEND and tell people
> without REG_STARTEND to use compat/regex instead of their platform
> regex library.
> 
> The description in Makefile may want to be rephrased to clarify.
> 
> -# Define NO_REGEX if you have no or inferior regex support in your C library.
> +# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
> +# feature.
> 
> The word "inferior" is not giving any useful information there.

Yeah, very much agreed (and the "#error" when we lack REG_STARTEND I
mentioned earlier may be a good hint).

-Peff


Re: "fatal error in commit_refs" from pushing to github

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 06:03:33PM +0700, Duy Nguyen wrote:

> > So this is really an internal failure at the ref-update stage. There
> > _should_ be a reasonable error message, but I think "fatal error in
> > commit_refs" is the generic last-ditch fallback. I'll pass this along to
> > people in charge of that code, as we should be generating a more useful
> > error message.
> 
> Hmm.. I'm interested in this because the "fix" is from client side. I
> did "git gc" and "git fetch" and the problem was gone.

It may also have been a transient error inside GitHub that resolved
itself between your two pushes.

-Peff


Re: [PATCH] Move format-patch base commit and prerequisites before email signature

2016-09-08 Thread Junio C Hamano
Josh Triplett  writes:

> Any text below the "-- " for the email signature gets treated as part of
> the signature, and many mail clients will trim it from the quoted text
> for a reply.  Move it above the signature, so people can reply to it
> more easily.
>
> Add tests for the exact format of the email signature, and add tests to
> ensure the email signature appears last.
>
> (Patch by Junio Hamano; tests by Josh Triplett.)
> Signed-off-by: Josh Triplett 
> ---
>
> Does the above seem reasonable, for a patch that incorporates the
> proposed patch from Message-Id
> xmqqh99rpud4@gitster.mtv.corp.google.com and adds tests?

Other than that I'd probably retitle it, your problem description
looks perfect.  I am still not sure if the code does a reasonable
thing in MIME case, though.

Thanks for tying the loose ends anyway.

> diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> index b0579dd..a4af275 100755
> --- a/t/t4014-format-patch.sh
> +++ b/t/t4014-format-patch.sh
> @@ -754,9 +754,22 @@ test_expect_success 'format-patch 
> --ignore-if-in-upstream HEAD' '
>   git format-patch --ignore-if-in-upstream HEAD
>  '
>  
> +git_version="$(git --version | sed "s/.* //")"
> +
> +signature() {
> + printf "%s\n%s\n\n" "-- " "${1:-$git_version}"
> +}

Hmph.  I would actually have expected that you would force a fixed
and an easily noticeable string via format.signature for the purpose
of the test, but I guess this test covers a lot more than what the
purpose of the main part of the patch does (i.e. enforces that the
default signature must be made from the version string of Git).  It
is not a bad thing to test, but it probably does not belong to this
change.  If you _were_ to split the patch in two, that is where I
probably would split, i.e. "we didn't test what the default signature
looks like, or we didn't make sure --signature option overrides the
default signature, so let's test it" as the preliminary preparation,
followed by "having base info after sig is inconvenient, let's move
it and make sure base info stays before sig with additional test" as
the second (and primary) patch.

But a single patch is fine.

Thanks.


Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers

2016-09-08 Thread Johannes Schindelin
Hi Peff & Junio,

On Thu, 8 Sep 2016, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Between the two options for regexec_buf(), I think you have convinced me
> > that REG_STARTEND is better than just using compat/regex everywhere. I
> > do think the fallback for platforms like musl should be "use
> > compat/regex" and not doing an expensive copy (which in most cases is
> > not even necessary).
> 
> I agree with you that it would be the best approach to build
> regexec_buf() that unconditionally uses REG_STARTEND and tell people
> without REG_STARTEND to use compat/regex instead of their platform
> regex library.
> 
> The description in Makefile may want to be rephrased to clarify.
> 
> -# Define NO_REGEX if you have no or inferior regex support in your C library.
> +# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
> +# feature.

Okay, I will make that happen. But first I need to sleep.

Ciao,
Dscho


[PATCH v7 10/10] convert: add filter..process option

2016-09-08 Thread larsxschneider
From: Lars Schneider 

Git's clean/smudge mechanism invokes an external filter process for
every single blob that is affected by a filter. If Git filters a lot of
blobs then the startup time of the external filter processes can become
a significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge
filter written in golang to filter 12,000 files. This process took 364s
with the existing filter mechanism and 5s with the new mechanism. See
details here: https://github.com/github/git-lfs/pull/1382

This patch adds the `filter..process` string option which, if
used, keeps the external filter process running and processes all blobs
with the packet format (pkt-line) based protocol over standard input and
standard output. The full protocol is explained in detail in
`Documentation/gitattributes.txt`.

A few key decisions:

* The long running filter process is referred to as filter protocol
  version 2 because the existing single shot filter invocation is
  considered version 1.
* Git sends a welcome message and expects a response right after the
  external filter process has started. This ensures that Git will not
  hang if a version 1 filter is incorrectly used with the
  filter..process option for version 2 filters. In addition,
  Git can detect this kind of error and warn the user.
* The status of a filter operation (e.g. "success" or "error) is set
  before the actual response and (if necessary!) re-set after the
  response. The advantage of this two step status response is that if
  the filter detects an error early, then the filter can communicate
  this and Git does not even need to create structures to read the
  response.
* All status responses are pkt-line lists terminated with a flush
  packet. This allows us to send other status fields with the same
  protocol in the future.

Helped-by: Martin-Louis Bright 
Reviewed-by: Jakub Narebski 
Signed-off-by: Lars Schneider 
---
 Documentation/gitattributes.txt| 154 +-
 contrib/long-running-filter/example.pl | 111 ++
 convert.c  | 349 ---
 pkt-line.h |   1 +
 t/t0021-conversion.sh  | 365 -
 t/t0021/rot13-filter.pl| 179 
 unpack-trees.c |   1 +
 7 files changed, 1129 insertions(+), 31 deletions(-)
 create mode 100755 contrib/long-running-filter/example.pl
 create mode 100755 t/t0021/rot13-filter.pl

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index 7aff940..ac000ea 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the 
command is
 fed the blob object from its standard input, and its standard
 output is used to update the worktree file.  Similarly, the
 `clean` command is used to convert the contents of worktree file
-upon checkin.
+upon checkin. By default these commands process only a single
+blob and terminate.  If a long running `process` filter is used
+in place of `clean` and/or `smudge` filters, then Git can process
+all blobs with a single filter command invocation for the entire
+life of a single Git command, for example `git add --all`.  See
+section below for the description of the protocol used to
+communicate with a `process` filter.
 
 One use of the content filtering is to massage the content into a shape
 that is more convenient for the platform, filesystem, and the user to use.
@@ -373,6 +379,152 @@ not exist, or may have different contents. So, smudge and 
clean commands
 should not try to access the file on disk, but only act as filters on the
 content provided to them on standard input.
 
+Long Running Filter Process
+^^^
+
+If the filter command (a string value) is defined via
+`filter..process` then Git can process all blobs with a
+single filter invocation for the entire life of a single Git
+command. This is achieved by using the following packet format
+(pkt-line, see technical/protocol-common.txt) based protocol over
+standard input and standard output.
+
+Git starts the filter when it encounters the first file
+that needs to be cleaned or smudged. After the filter started
+Git sends a welcome message ("git-filter-client"), a list of
+supported protocol version numbers, and a flush packet. Git expects
+to read a welcome response message ("git-filter-server") and exactly
+one protocol version number from the previously sent list. All further
+communication will be based on the selected version. The remaining
+protocol description below documents "version=2". Please note that
+"version=42" in the example below does not exist and is only there
+to illustrate how the protocol would look like with more than one
+version.
+

[PATCH v7 09/10] convert: make apply_filter() adhere to standard Git error handling

2016-09-08 Thread larsxschneider
From: Lars Schneider 

apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors whereas `1` denoted success and `0`
failure. This is unusual for the Git source where `0` denotes success.

Rename the variable and flip its value to make the function easier
readable for Git developers.

Signed-off-by: Lars Schneider 
---
 convert.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 986c239..a068df7 100644
--- a/convert.c
+++ b/convert.c
@@ -451,7 +451,7 @@ static int apply_filter(const char *path, const char *src, 
size_t len, int fd,
 *
 * (child --> cmd) --> us
 */
-   int ret = 1;
+   int err = 0;
struct strbuf nbuf = STRBUF_INIT;
struct async async;
struct filter_params params;
@@ -478,22 +478,22 @@ static int apply_filter(const char *path, const char 
*src, size_t len, int fd,
 
if (strbuf_read(, async.out, len) < 0) {
error("read from external filter '%s' failed", cmd);
-   ret = 0;
+   err = -1;
}
if (close(async.out)) {
error("read from external filter '%s' failed", cmd);
-   ret = 0;
+   err = -1;
}
if (finish_async()) {
error("external filter '%s' failed", cmd);
-   ret = 0;
+   err = -1;
}
 
-   if (ret) {
+   if (!err) {
strbuf_swap(dst, );
}
strbuf_release();
-   return ret;
+   return !err;
 }
 
 static struct convert_driver {
-- 
2.10.0



[PATCH v7 04/10] pkt-line: add packet_flush_gently()

2016-09-08 Thread larsxschneider
From: Lars Schneider 

packet_flush() would die in case of a write error even though for some
callers an error would be acceptable. Add packet_flush_gently() which
writes a pkt-line flush packet and returns `0` for success and `-1` for
failure.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 9 +
 pkt-line.h | 1 +
 2 files changed, 10 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 3824d05..37345ca 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -91,6 +91,15 @@ void packet_flush(int fd)
write_or_die(fd, "", 4);
 }
 
+int packet_flush_gently(int fd)
+{
+   packet_trace("", 4, 1);
+   if (write_in_full(fd, "", 4) == 4)
+   return 0;
+   error("flush packet write failed");
+   return -1;
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
packet_trace("", 4, 1);
diff --git a/pkt-line.h b/pkt-line.h
index 3caea77..3fa0899 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format 
(printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 
 /*
-- 
2.10.0



[PATCH v7 07/10] convert: quote filter names in error messages

2016-09-08 Thread larsxschneider
From: Lars Schneider 

Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to
read in error messages. Quote them to improve the readability.

Signed-off-by: Lars Schneider 
---
 convert.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 077f5e6..986c239 100644
--- a/convert.c
+++ b/convert.c
@@ -412,7 +412,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
child_process.out = out;
 
if (start_command(_process))
-   return error("cannot fork to run external filter %s", 
params->cmd);
+   return error("cannot fork to run external filter '%s'", 
params->cmd);
 
sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -430,13 +430,13 @@ static int filter_buffer_or_fd(int in, int out, void 
*data)
if (close(child_process.in))
write_err = 1;
if (write_err)
-   error("cannot feed the input to external filter %s", 
params->cmd);
+   error("cannot feed the input to external filter '%s'", 
params->cmd);
 
sigchain_pop(SIGPIPE);
 
status = finish_command(_process);
if (status)
-   error("external filter %s failed %d", params->cmd, status);
+   error("external filter '%s' failed %d", params->cmd, status);
 
strbuf_release();
return (write_err || status);
@@ -477,15 +477,15 @@ static int apply_filter(const char *path, const char 
*src, size_t len, int fd,
return 0;   /* error was already reported */
 
if (strbuf_read(, async.out, len) < 0) {
-   error("read from external filter %s failed", cmd);
+   error("read from external filter '%s' failed", cmd);
ret = 0;
}
if (close(async.out)) {
-   error("read from external filter %s failed", cmd);
+   error("read from external filter '%s' failed", cmd);
ret = 0;
}
if (finish_async()) {
-   error("external filter %s failed", cmd);
+   error("external filter '%s' failed", cmd);
ret = 0;
}
 
-- 
2.10.0



[PATCH v7 05/10] pkt-line: add packet_write_gently()

2016-09-08 Thread larsxschneider
From: Lars Schneider 

packet_write_fmt_gently() uses format_packet() which lets the caller
only send string data via "%s". That means it cannot be used for
arbitrary data that may contain NULs.

Add packet_write_gently() which writes arbitrary data and returns `0`
for success and `-1` for an error. This function is used by other
pkt-line functions in a subsequent patch.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 37345ca..1d3d725 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -181,6 +181,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
return status;
 }
 
+int packet_write_gently(const int fd_out, const char *buf, size_t size)
+{
+   static char packet_write_buffer[LARGE_PACKET_MAX];
+
+   if (size > sizeof(packet_write_buffer) - 4) {
+   error("packet write failed");
+   return -1;
+   }
+   packet_trace(buf, size, 1);
+   size += 4;
+   set_packet_header(packet_write_buffer, size);
+   memcpy(packet_write_buffer + 4, buf, size - 4);
+   if (write_in_full(fd_out, packet_write_buffer, size) == size)
+   return 0;
+
+   error("packet write failed");
+   return -1;
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
va_list args;
-- 
2.10.0



[PATCH v7 08/10] convert: modernize tests

2016-09-08 Thread larsxschneider
From: Lars Schneider 

Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".

Please note that the "rot13" filter configured in "setup" keeps using
`git config` instead of `test_config` because subsequent tests might
depend on it.

Signed-off-by: Lars Schneider 
---
 t/t0021-conversion.sh | 58 +--
 1 file changed, 29 insertions(+), 29 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index e799e59..dc50938 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -38,8 +38,8 @@ script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
 
 test_expect_success check '
 
-   cmp test.o test &&
-   cmp test.o test.t &&
+   test_cmp test.o test &&
+   test_cmp test.o test.t &&
 
# ident should be stripped in the repository
git diff --raw --exit-code :test :test.i &&
@@ -47,10 +47,10 @@ test_expect_success check '
embedded=$(sed -ne "$script" test.i) &&
test "z$id" = "z$embedded" &&
 
-   git cat-file blob :test.t > test.r &&
+   git cat-file blob :test.t >test.r &&
 
-   ./rot13.sh < test.o > test.t &&
-   cmp test.r test.t
+   ./rot13.sh test.t &&
+   test_cmp test.r test.t
 '
 
 # If an expanded ident ever gets into the repository, we want to make sure that
@@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
 
# delete the files and check them out again, using a smudge filter
# that will count the args and echo the command-line back to us
-   git config filter.argc.smudge "sh ./argc.sh %f" &&
+   test_config filter.argc.smudge "sh ./argc.sh %f" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
 
@@ -141,7 +141,7 @@ test_expect_success 'filter shell-escaped filenames' '
test_cmp expect "$special" &&
 
# do the same thing, but with more args in the filter expression
-   git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
+   test_config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
 
@@ -154,9 +154,9 @@ test_expect_success 'filter shell-escaped filenames' '
 '
 
 test_expect_success 'required filter should filter data' '
-   git config filter.required.smudge ./rot13.sh &&
-   git config filter.required.clean ./rot13.sh &&
-   git config filter.required.required true &&
+   test_config filter.required.smudge ./rot13.sh &&
+   test_config filter.required.clean ./rot13.sh &&
+   test_config filter.required.required true &&
 
echo "*.r filter=required" >.gitattributes &&
 
@@ -165,17 +165,17 @@ test_expect_success 'required filter should filter data' '
 
rm -f test.r &&
git checkout -- test.r &&
-   cmp test.o test.r &&
+   test_cmp test.o test.r &&
 
./rot13.sh expected &&
git cat-file blob :test.r >actual &&
-   cmp expected actual
+   test_cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
-   git config filter.failsmudge.smudge false &&
-   git config filter.failsmudge.clean cat &&
-   git config filter.failsmudge.required true &&
+   test_config filter.failsmudge.smudge false &&
+   test_config filter.failsmudge.clean cat &&
+   test_config filter.failsmudge.required true &&
 
echo "*.fs filter=failsmudge" >.gitattributes &&
 
@@ -186,9 +186,9 @@ test_expect_success 'required filter smudge failure' '
 '
 
 test_expect_success 'required filter clean failure' '
-   git config filter.failclean.smudge cat &&
-   git config filter.failclean.clean false &&
-   git config filter.failclean.required true &&
+   test_config filter.failclean.smudge cat &&
+   test_config filter.failclean.clean false &&
+   test_config filter.failclean.required true &&
 
echo "*.fc filter=failclean" >.gitattributes &&
 
@@ -197,8 +197,8 @@ test_expect_success 'required filter clean failure' '
 '
 
 test_expect_success 'filtering large input to small output should use little 
memory' '
-   git config filter.devnull.clean "cat >/dev/null" &&
-   git config filter.devnull.required true &&
+   test_config filter.devnull.clean "cat >/dev/null" &&
+   test_config filter.devnull.required true &&
for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
echo "30MB filter=devnull" >.gitattributes &&
GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
@@ -207,7 +207,7 @@ test_expect_success 'filtering large input to small output 
should use little mem
 test_expect_success 'filter that does not read is fine' '
test-genrandom foo $((128 * 1024 + 1)) >big &&
echo "big filter=epipe" >.gitattributes &&
-   git config 

[PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams

2016-09-08 Thread larsxschneider
From: Lars Schneider 

write_packetized_from_fd() and write_packetized_from_buf() write a
stream of packets. All content packets use the maximal packet size
except for the last one. After the last content packet a `flush` control
packet is written.

read_packetized_to_buf() reads arbitrary sized packets until it detects
a `flush` packet.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 68 ++
 pkt-line.h |  7 +++
 2 files changed, 75 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 1d3d725..5001a07 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -209,6 +209,47 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, 
...)
va_end(args);
 }
 
+int write_packetized_from_fd(int fd_in, int fd_out)
+{
+   static char buf[PKTLINE_DATA_MAXLEN];
+   int err = 0;
+   ssize_t bytes_to_write;
+
+   while (!err) {
+   bytes_to_write = xread(fd_in, buf, sizeof(buf));
+   if (bytes_to_write < 0)
+   return COPY_READ_ERROR;
+   if (bytes_to_write == 0)
+   break;
+   err = packet_write_gently(fd_out, buf, bytes_to_write);
+   }
+   if (!err)
+   err = packet_flush_gently(fd_out);
+   return err;
+}
+
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
+{
+   static char buf[PKTLINE_DATA_MAXLEN];
+   int err = 0;
+   size_t bytes_written = 0;
+   size_t bytes_to_write;
+
+   while (!err) {
+   if ((len - bytes_written) > sizeof(buf))
+   bytes_to_write = sizeof(buf);
+   else
+   bytes_to_write = len - bytes_written;
+   if (bytes_to_write == 0)
+   break;
+   err = packet_write_gently(fd_out, src_in + bytes_written, 
bytes_to_write);
+   bytes_written += bytes_to_write;
+   }
+   if (!err)
+   err = packet_flush_gently(fd_out);
+   return err;
+}
+
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
   void *dst, unsigned size, int options)
 {
@@ -318,3 +359,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, 
int *dst_len)
 {
return packet_read_line_generic(-1, src, src_len, dst_len);
 }
+
+ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out)
+{
+   int paket_len;
+   int options = PACKET_READ_GENTLE_ON_EOF;
+
+   size_t oldlen = sb_out->len;
+   size_t oldalloc = sb_out->alloc;
+
+   for (;;) {
+   strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
+   paket_len = packet_read(fd_in, NULL, NULL,
+   sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
options);
+   if (paket_len <= 0)
+   break;
+   sb_out->len += paket_len;
+   }
+
+   if (paket_len < 0) {
+   if (oldalloc == 0)
+   strbuf_release(sb_out);
+   else
+   strbuf_setlen(sb_out, oldlen);
+   return paket_len;
+   }
+   return sb_out->len - oldlen;
+}
diff --git a/pkt-line.h b/pkt-line.h
index 3fa0899..6df8449 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int write_packetized_from_fd(int fd_in, int fd_out);
+int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 
+/*
+ * Reads a stream of variable sized packets until a flush packet is detected.
+ */
+ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out);
+
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
 extern char packet_buffer[LARGE_PACKET_MAX];
-- 
2.10.0



[PATCH v7 02/10] pkt-line: extract set_packet_header()

2016-09-08 Thread larsxschneider
From: Lars Schneider 

set_packet_header() converts an integer to a 4 byte hex string. Make
this function locally available so that other pkt-line functions can
use it.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 0a9b61c..e8adc0f 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -97,10 +97,20 @@ void packet_buf_flush(struct strbuf *buf)
strbuf_add(buf, "", 4);
 }
 
-#define hex(a) (hexchar[(a) & 15])
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static void set_packet_header(char *buf, const int size)
 {
static char hexchar[] = "0123456789abcdef";
+
+   #define hex(a) (hexchar[(a) & 15])
+   buf[0] = hex(size >> 12);
+   buf[1] = hex(size >> 8);
+   buf[2] = hex(size >> 4);
+   buf[3] = hex(size);
+   #undef hex
+}
+
+static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+{
size_t orig_len, n;
 
orig_len = out->len;
@@ -111,10 +121,7 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
if (n > LARGE_PACKET_MAX)
die("protocol error: impossibly long line");
 
-   out->buf[orig_len + 0] = hex(n >> 12);
-   out->buf[orig_len + 1] = hex(n >> 8);
-   out->buf[orig_len + 2] = hex(n >> 4);
-   out->buf[orig_len + 3] = hex(n);
+   set_packet_header(>buf[orig_len], n);
packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-- 
2.10.0



[PATCH v7 01/10] pkt-line: rename packet_write() to packet_write_fmt()

2016-09-08 Thread larsxschneider
From: Lars Schneider 

packet_write() should be called packet_write_fmt() as the string
parameter can be formatted.

Suggested-by: Junio C Hamano 
Signed-off-by: Lars Schneider 
---
 builtin/archive.c|  4 ++--
 builtin/receive-pack.c   |  4 ++--
 builtin/remote-ext.c |  4 ++--
 builtin/upload-archive.c |  4 ++--
 connect.c|  2 +-
 daemon.c |  2 +-
 http-backend.c   |  2 +-
 pkt-line.c   |  2 +-
 pkt-line.h   |  2 +-
 shallow.c|  2 +-
 upload-pack.c| 30 +++---
 11 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index a1e3b94..49f4914 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -47,10 +47,10 @@ static int run_remote_archiver(int argc, const char **argv,
if (name_hint) {
const char *format = archive_format_from_filename(name_hint);
if (format)
-   packet_write(fd[1], "argument --format=%s\n", format);
+   packet_write_fmt(fd[1], "argument --format=%s\n", 
format);
}
for (i = 1; i < argc; i++)
-   packet_write(fd[1], "argument %s\n", argv[i]);
+   packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
buf = packet_read_line(fd[0], NULL);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 011db00..1ce7682 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -218,7 +218,7 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
 static void show_ref(const char *path, const unsigned char *sha1)
 {
if (sent_capabilities) {
-   packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
+   packet_write_fmt(1, "%s %s\n", sha1_to_hex(sha1), path);
} else {
struct strbuf cap = STRBUF_INIT;
 
@@ -233,7 +233,7 @@ static void show_ref(const char *path, const unsigned char 
*sha1)
if (advertise_push_options)
strbuf_addstr(, " push-options");
strbuf_addf(, " agent=%s", git_user_agent_sanitized());
-   packet_write(1, "%s %s%c%s\n",
+   packet_write_fmt(1, "%s %s%c%s\n",
 sha1_to_hex(sha1), path, 0, cap.buf);
strbuf_release();
sent_capabilities = 1;
diff --git a/builtin/remote-ext.c b/builtin/remote-ext.c
index 88eb8f9..11b48bf 100644
--- a/builtin/remote-ext.c
+++ b/builtin/remote-ext.c
@@ -128,9 +128,9 @@ static void send_git_request(int stdin_fd, const char 
*serv, const char *repo,
const char *vhost)
 {
if (!vhost)
-   packet_write(stdin_fd, "%s %s%c", serv, repo, 0);
+   packet_write_fmt(stdin_fd, "%s %s%c", serv, repo, 0);
else
-   packet_write(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
+   packet_write_fmt(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
 vhost, 0);
 }
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 2caedf1..dc872f6 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -88,11 +88,11 @@ int cmd_upload_archive(int argc, const char **argv, const 
char *prefix)
writer.git_cmd = 1;
if (start_command()) {
int err = errno;
-   packet_write(1, "NACK unable to spawn subprocess\n");
+   packet_write_fmt(1, "NACK unable to spawn subprocess\n");
die("upload-archive: %s", strerror(err));
}
 
-   packet_write(1, "ACK\n");
+   packet_write_fmt(1, "ACK\n");
packet_flush(1);
 
while (1) {
diff --git a/connect.c b/connect.c
index 722dc3f..5330d9c 100644
--- a/connect.c
+++ b/connect.c
@@ -730,7 +730,7 @@ struct child_process *git_connect(int fd[2], const char 
*url,
 * Note: Do not add any other headers here!  Doing so
 * will cause older git-daemon servers to crash.
 */
-   packet_write(fd[1],
+   packet_write_fmt(fd[1],
 "%s %s%chost=%s%c",
 prog, path, 0,
 target_host, 0);
diff --git a/daemon.c b/daemon.c
index 425aad0..afce1b9 100644
--- a/daemon.c
+++ b/daemon.c
@@ -281,7 +281,7 @@ static int daemon_error(const char *dir, const char *msg)
 {
if (!informative_errors)
msg = "access denied or repository not exported";
-   packet_write(1, "ERR %s: %s", msg, dir);
+   packet_write_fmt(1, "ERR %s: %s", msg, dir);
return -1;
 }
 
diff --git a/http-backend.c b/http-backend.c
index adc8c8c..eef0a36 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -464,7 +464,7 @@ static void get_info_refs(struct strbuf *hdr, char *arg)
  

[PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()

2016-09-08 Thread larsxschneider
From: Lars Schneider 

packet_write_fmt() would die in case of a write error even though for
some callers an error would be acceptable. Add packet_write_fmt_gently()
which writes a formatted pkt-line and returns `0` for success and `-1`
for an error.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 43 +++
 pkt-line.h |  1 +
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index e8adc0f..3824d05 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -125,16 +125,51 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
+static int packet_write_fmt_1(int fd, int gently,
+  const char *fmt, va_list args)
+{
+   struct strbuf buf = STRBUF_INIT;
+   size_t count;
+
+   format_packet(, fmt, args);
+   count = write_in_full(fd, buf.buf, buf.len);
+   if (count == buf.len)
+   return 0;
+
+   if (!gently) {
+   if (errno == EPIPE) {
+   if (in_async())
+   async_exit(141);
+
+   signal(SIGPIPE, SIG_DFL);
+   raise(SIGPIPE);
+   /* Should never happen, but just in case... */
+   exit(141);
+   }
+   die_errno("packet write error");
+   }
+   error("packet write failed");
+   return -1;
+}
+
 void packet_write_fmt(int fd, const char *fmt, ...)
 {
-   static struct strbuf buf = STRBUF_INIT;
va_list args;
 
-   strbuf_reset();
va_start(args, fmt);
-   format_packet(, fmt, args);
+   packet_write_fmt_1(fd, 0, fmt, args);
+   va_end(args);
+}
+
+int packet_write_fmt_gently(int fd, const char *fmt, ...)
+{
+   int status;
+   va_list args;
+
+   va_start(args, fmt);
+   status = packet_write_fmt_1(fd, 1, fmt, args);
va_end(args);
-   write_or_die(fd, buf.buf, buf.len);
+   return status;
 }
 
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
diff --git a/pkt-line.h b/pkt-line.h
index 1902fb3..3caea77 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format 
(printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
-- 
2.10.0



[PATCH v7 00/10] Git filter protocol

2016-09-08 Thread larsxschneider
From: Lars Schneider 

The goal of this series is to avoid launching a new clean/smudge filter
process for each file that is filtered.

A short summary about v1 to v5 can be found here:
https://git.github.io/rev_news/2016/08/17/edition-18/

This series is also published on web:
https://github.com/larsxschneider/git/pull/11

Thanks a lot to
  Stefan, Junio, Jakub (Narebski and Keller), Peff, and Torsten
for very helpful reviews,
Lars



## Major changes since v6

* series rebased to Git v2.10.0
* simplified and improved multi-packet test
* v2 filter takes precedence over v1 filter
* rename "error-all" to "abort"
* patch 07/13 "pack-protocol: fix maximum pkt-line size" submitted separately:
  http://public-inbox.org/git/20160829175509.69050-1-larsxschnei...@gmail.com/
* removed "10/13 convert: generate large test files only once"
* patch 13/13 "read-cache: make sure file handles are not inherited by child
  processes" moved to an own series:
  http://public-inbox.org/git/2016090521.72956-1-larsxschnei...@gmail.com/



## All changes since v6

### Stefan

* 
http://public-inbox.org/git/cagz79kajn-yf28+ls2jyqss6jt-oj2jw-saxqn-oe5xaxsy...@mail.gmail.com/
* submit patch "pack-protocol: fix maximum pkt-line size" separately:
  
http://public-inbox.org/git/20160829175509.69050-1-larsxschnei...@gmail.com/

* 
http://public-inbox.org/git/cagz79kbeoyvm_+mwkajt2phn1okzxate5d+dv_usj7dypgx...@mail.gmail.com/
* trace failures in new *_gently() pkt-line functions


### Junio

* http://public-inbox.org/git/xmqq8tvkle6o@gitster.mtv.corp.google.com/
* share code between packet_write_fmt() and packet_write_fmt_gently()

* http://public-inbox.org/git/xmqq4m68ldrg@gitster.mtv.corp.google.com/
* remove "Second, it will..." from "pkt-line: add packet_write_gently()"
  commit message
* use separate buffer in write_packetized_from_fd()
* fix write order in packet_write_gently()

* http://public-inbox.org/git/xmqqzio0jxh7@gitster.mtv.corp.google.com/
* rename read/write flush terminated packet streams functions
* remove meaningless "bytes_to_write > ..." check
* reuse packet_read() function in read_packetized_to_buf()

* http://public-inbox.org/git/xmqqk2f3fgbd@gitster.mtv.corp.google.com/
* revert test_config replacement in test setup

* http://public-inbox.org/git/xmqq37lncvj6@gitster.mtv.corp.google.com/
* improve multi-packet test

* http://public-inbox.org/git/xmqqzinv6wtg@gitster.mtv.corp.google.com/
* give v2 filter precedence over v1 filter

* http://public-inbox.org/git/xmqqmvjt3nht@gitster.mtv.corp.google.com/
* explain that "error" behavior mimics v1 filter
* explain that Git closes the pipe on exit to allow the filter a graceful
  exit


### Jakub

* http://public-inbox.org/git/7fbd02a1-ad62-7391-df2a-835aae8a6...@gmail.com/
* rename "error-all" to "abort"
* add a simple Perl v2 filter example in contrib



## Interdiff (v6..v7)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index df059e9..ac000ea 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -470,20 +470,11 @@ packet:  git< status=error\n
 packet:  git< 
 

-In case the filter cannot or does not want to process the content
-as well as any future content for the lifetime of the Git process,
-it is expected to respond with an "error-all" status. Depending on
-the `filter..required` flag Git will interpret that as error
-but it will not stop or restart the filter process.
-
-packet:  git< status=error-all\n
-packet:  git< 
-
-
 If the filter experiences an error during processing, then it can
-send the status "error". Depending on the `filter..required`
-flag Git will interpret that as error but it will not stop or restart
-the filter process.
+send the status "error" after the content was (partially or
+completely) sent. Depending on the `filter..required` flag
+Git will interpret that as error but it will not stop or restart the
+filter process.
 
 packet:  git< status=success\n
 packet:  git< 
@@ -495,21 +486,38 @@ packet:  git< 

 If the filter dies during the communication or does not adhere to
 the protocol then Git will stop the filter process and restart it
-with the next file that needs to be processed.
+with the next file that needs to be processed. Depending on the
+`filter..required` flag Git will interpret that as error.
+
+The error handling for all cases above mimic the behavior of
+the `filter..clean` / `filter..smudge` error
+handling.
+
+In case the filter cannot or does not want to process the content
+as well as any future content for the lifetime of the Git process,
+it is expected to respond with an "abort" status. Depending on
+the `filter..required` flag Git will interpret that as 

Re: [PATCH v3 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> @@ -33,11 +32,8 @@ static void diffgrep_consume(void *priv, char *line, 
> unsigned long len)
>* caller early.
>*/
>   return;
> - /* Yuck -- line ought to be "const char *"! */
> - hold = line[len];
> - line[len] = '\0';
> - data->hit = !regexec(data->regexp, line + 1, 1, , 0);
> - line[len] = hold;
> + data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
> +  , 0);

This is an unexpected happy surprise.  It really feels good to see
that "Yuck" line go.

> @@ -228,18 +227,16 @@ static long ff_regexp(const char *line, long len,
>   len--;
>   }
>  
> - line_buffer = xstrndup(line, len); /* make NUL terminated */
> -
>   for (i = 0; i < regs->nr; i++) {
>   struct ff_reg *reg = regs->array + i;
> - if (!regexec(>re, line_buffer, 2, pmatch, 0)) {
> + if (!regexec_buf(>re, line, len, 2, pmatch, 0)) {

So is this hunk.  Removing unnecessary copying is a very good thing.

Please give these three patches a common prefix, e.g.

regex: -G feeds a non NUL-terminated string to regexec() and 
fails
regex: add regexec_buf() that can work on a non NUL-terminated string
regex: use regexec_buf()

or something like that.

Also I agree with Peff that a test with an embedded NUL would be a
good thing.

This round is so close to perfect.


Re: [PATCH v3 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers

2016-09-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> We just introduced a test that demonstrates that our sloppy use of
> regexec() on a mmap()ed area can result in incorrect results or even
> hard crashes.
>
> So what we need to fix this is a function that calls regexec() on a
> length-delimited, rather than a NUL-terminated, string.
>
> Happily, there is an extension to regexec() introduced by the NetBSD
> project and present in all major regex implementation including
> Linux', MacOSX' and the one Git includes in compat/regex/: by using
> the (non-POSIX) REG_STARTEND flag, it is possible to tell the
> regexec() function that it should only look at the offsets between
> pmatch[0].rm_so and pmatch[0].rm_eo.
>
> That is exactly what we need.

Yes, that is good.

> Since support for REG_STARTEND is so widespread by now, let's just
> introduce a helper function that uses it, and fall back to allocating
> and constructing a NUL-terminated when REG_STARTEND is not available.

I do not think this fallback is good; we do ship a compat/ fallback
that does support REG_STARTEND and you'd want to use that.  Not
having the copying fallback means you do not even have to worry
about the size+1 overflow and fix it with xmallocz() ;-)


Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers

2016-09-08 Thread Junio C Hamano
Jeff King  writes:

> Between the two options for regexec_buf(), I think you have convinced me
> that REG_STARTEND is better than just using compat/regex everywhere. I
> do think the fallback for platforms like musl should be "use
> compat/regex" and not doing an expensive copy (which in most cases is
> not even necessary).

I agree with you that it would be the best approach to build
regexec_buf() that unconditionally uses REG_STARTEND and tell people
without REG_STARTEND to use compat/regex instead of their platform
regex library.

The description in Makefile may want to be rephrased to clarify.

-# Define NO_REGEX if you have no or inferior regex support in your C library.
+# Define NO_REGEX if your C library lacks regex support with REG_STARTEND
+# feature.

The word "inferior" is not giving any useful information there.



Re: git-gui, was Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-09-08 Thread Junio C Hamano
Johannes Schindelin  writes:

> On Wed, 7 Sep 2016, Junio C Hamano wrote:
>
>> Pat, we haven't heard from you for a long time.
>
> Indeed. There are a couple of git-gui patches in Git for Windows that have
> been contributed a long time ago and not been picked up.
>
> Maybe it is time to just accept git-gui patches directly into Git, after
> some other Tcl/Tk savvy people ACKed them?

Yes, that was my thinking, though I'd prefer to see somebodyto be
acting as a central point of contact so that I do not have to pay
any attention to the part of the system other than responding to a
pull request.  The subsystem maintainer does not have to forever be
Pat, of course, and can be handed over to somebody else, but if that
is what is going to happen, I'd prefer to see a volunteer or two to
step up and Pat to bless them.




Re: [PATCH] connect: tighten check for unexpected early hang up (Re: [PATCH v3 2/2] connect: advertized capability is not a ref)

2016-09-08 Thread Junio C Hamano
Jonathan Nieder  writes:

> Jonathan Nieder wrote:
>
>> Subject: connect: tighten check for unexpected early hang up
> [...]
>> @@ -131,7 +131,7 @@ struct ref **get_remote_heads(int in, char *src_buf, 
>> size_t src_len,
>>PACKET_READ_GENTLE_ON_EOF |
>>PACKET_READ_CHOMP_NEWLINE);
>>  if (len < 0)
>> -die_initial_contact(got_at_least_one_head);
>> +die_initial_contact(first_line);
>
> This should say !first_line.
>
> I'll add tests if the patch seems like a good idea.

I tried to write one-paragraph comment for the die_initial_contact()
function, like so:

+/*
+ * A remote end that is unwilling to talk to us would not give
+ * any response to us before hanging up.  After seeing some
+ * response, we know the hang-up is unexpected.
+ */
+static void die_initial_contact(int saw_any_response)

but then I got stuck.

We may know that after seeing any response (not necessarily a ref,
but .have or shallow) the other end is willing to talk to us, but
the reverse is not necessarily true (it may be willing to talk to
us, but the network between us may have prevented it from doing so).
For that reason, the above comment is inappropriate for a function
that takes a bool and gives an "unexpected hung-up" or an
"unreachable, possible ACL or problems" message.

So my second attempt was to comment on the variable that keeps track
of the status of the conversation, which turned out to be better
(attached).

I think I fixed your "oops, the bool needs polarity flip".  A test
may be a good idea, but I am not sure how you plan to produce a
failure after sending some response.

-- >8 --
From: Jonathan Nieder 
Date: Wed, 7 Sep 2016 18:45:55 -0700
Subject: [PATCH] connect: tighten check for unexpected early hang up

A server hanging up immediately to mark access being denied does not
send any .have refs, shallow lines, or anything else before hanging
up.  If the server has sent anything, then the hangup is unexpected.

That is, if the server hangs up after a shallow line but before sending
any refs, then git should tell me so:

fatal: The remote end hung up upon initial contact

instead of suggesting an access control problem:

fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.

Noticed while examining this code.  This case isn't likely to come up
in practice but tightening the check makes the code easier to read and
manipulate.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Junio C Hamano 
---
 connect.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/connect.c b/connect.c
index c53f3f1..067cf40 100644
--- a/connect.c
+++ b/connect.c
@@ -43,9 +43,9 @@ int check_ref_type(const struct ref *ref, int flags)
return check_ref(ref->name, flags);
 }
 
-static void die_initial_contact(int got_at_least_one_head)
+static void die_initial_contact(int unexpected)
 {
-   if (got_at_least_one_head)
+   if (unexpected)
die("The remote end hung up upon initial contact");
else
die("Could not read from remote repository.\n\n"
@@ -115,10 +115,17 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
  struct sha1_array *shallow_points)
 {
struct ref **orig_list = list;
-   int got_at_least_one_head = 0;
+
+   /*
+* A hang-up after seeing some response from the other end
+* means that it is unexpected, as we know the other end is
+* willing to talk to us.  A hang-up before seeing any
+* response does not necessarily mean an ACL problem, though.
+*/
+   int saw_response;
 
*list = NULL;
-   for (;;) {
+   for (saw_response = 0; ; saw_response = 1) {
struct ref *ref;
struct object_id old_oid;
char *name;
@@ -131,7 +138,7 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t 
src_len,
  PACKET_READ_GENTLE_ON_EOF |
  PACKET_READ_CHOMP_NEWLINE);
if (len < 0)
-   die_initial_contact(got_at_least_one_head);
+   die_initial_contact(saw_response);
 
if (!len)
break;
@@ -171,7 +178,6 @@ struct ref **get_remote_heads(int in, char *src_buf, size_t 
src_len,
oidcpy(>old_oid, _oid);
*list = ref;
list = >next;
-   got_at_least_one_head = 1;
}
 
annotate_refs_with_symref_info(*orig_list);
-- 
2.10.0-267-g7db2ae3



Re: [PATCH] connect: tighten check for unexpected early hang up (Re: [PATCH v3 2/2] connect: advertized capability is not a ref)

2016-09-08 Thread Stefan Beller
On Wed, Sep 7, 2016 at 6:45 PM, Jonathan Nieder  wrote:
> (+cc: Heiko)
> Jonathan Nieder wrote:
>
>> 'die_initial_contact' uses got_at_least_one_head to determine whether
>> it was on the first line but code paths added later that use
>> 'continue' don't populate it properly (see b06dcd7d, 40c155ff, and
>> 1a7141ff).  We could do
>>
>>   int first_line = 1;
>>
>>   for (;; first_line = 0) {
>>   ...
>>   }
>>
>> and use !first_line instead of got_at_least_one_head (removing
>> got_at_least_one_head in the process since it has no other purpose).
>
> I got the history wrong.  It looks like this was always confused
> by the 'continue' cases.  Unless I'm missing something subtle ---
> thoughts?

I was a bit confused by the line

for (;; first_line = 0) {

at first, but the explanation of 'continue's make sense for this pattern.
However I'd rather prefer if we'd have

int first_line = 1;
for(;;) {
... // stuff with no continue here
if (len < 0)
die_initial_contact(!first_line);
first_line = 0;
... // here we may have some continues, but that doesn't matter
// w.r.t. first_line
}


Re: [PATCH v3 2/2] connect: advertized capability is not a ref

2016-09-08 Thread Junio C Hamano
Jonathan Nieder  writes:

> I think we can make this stricter.  The capabilities^{} line is supposed
> to be the first advertised ref, before any 'shallow' lines or .have
> extra refs.

"The first", or "the first and only"?  I thought that it would be
the latter.


[PATCH 0/3] Fix git-init in linked worktrees

2016-09-08 Thread Nguyễn Thái Ngọc Duy
My ASAP is not so ASAP. Sorry about that but I think I have fixed it.

Side note about 2/3. I've known this problem (in general) for years
(accidentally reading .git config file before .git is searched) and
could not do anything about it. And because test_expect_failure should
only be there if someone will eventually fix it, and I don't see
myself doing it, and I don't see it super important that other people
would bother, so I decided to simply sweep it under the rug. I could
make a test_expect_failure if people think otherwise.

Side note about 3/3. Yes there's a FIXME in there. It will take a lot
more time to remove that FIXME (because setup_git_directory is not as
simple). For now it should be ok to leave it there. When we find a use
for get_first_git_dir outside git-init, we can fix it then.

Nguyễn Thái Ngọc Duy (3):
  init: correct re-initialization from a linked worktree
  t0001: work around the bug that reads config file before repo setup
  init: do not set core.worktree more often than necessary

 builtin/init-db.c |  4 ++--
 cache.h   |  1 +
 environment.c | 16 +++-
 t/t0001-init.sh   | 19 +++
 4 files changed, 37 insertions(+), 3 deletions(-)

-- 
2.8.2.524.g6ff3d78



[PATCH 3/3] init: do not set core.worktree more often than necessary

2016-09-08 Thread Nguyễn Thái Ngọc Duy
When "git init" is called with GIT_WORK_TREE environment set, we want to
keep this worktree's location in core.worktree so the user does not have
to set the environment again and again. See ef6f0af (git-init: set
core.worktree if GIT_WORK_TREE is specified - 2007-07-04)

We detect that by this logic (in needs_work_tree_config): normally
worktree's top dir would contains ".git" directory, if this is not true,
worktree is probably set to elsewhere by the user.

Unfortunately when it calls get_git_dir() it does not take ".git" files
into account. When we find a .git file, we immediately follow the file
until we find the real ".git" directory. The location of this first
".git" file is lost.

The .git file would satisfy the logic above and not create
core.worktree (correct). But because the final .git's location is used,
needs_work_tree_config() is misled and creates core.worktree anyway.

This would not be a huge deal normally. But if this happens in a
multiple worktree setup it becomes a real problem because up until now,
core.worktree will be applied to the main worktree only. If you
accidentally do "git init" from a linked worktree, you set
core.worktree (for the main repo) pointing to the _linked_ worktree.
After that point, may you live in interesting times.

Record the .git file location and use it here.

Noticed-by: Max Nordlund 
Helped-by: Michael J Gruber 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/init-db.c |  2 +-
 cache.h   |  1 +
 environment.c | 16 +++-
 t/t0001-init.sh   |  2 ++
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 6d9552e..36255f2 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -254,7 +254,7 @@ static int create_default_files(const char *template_path)
/* allow template config file to override the default */
if (log_all_ref_updates == -1)
git_config_set("core.logallrefupdates", "true");
-   if (needs_work_tree_config(get_git_dir(), work_tree))
+   if (needs_work_tree_config(get_first_git_dir(), work_tree))
git_config_set("core.worktree", work_tree);
}
 
diff --git a/cache.h b/cache.h
index b780a91..e6c05f8 100644
--- a/cache.h
+++ b/cache.h
@@ -460,6 +460,7 @@ extern char *git_work_tree_cfg;
 extern int is_inside_work_tree(void);
 extern const char *get_git_dir(void);
 extern const char *get_git_common_dir(void);
+extern const char *get_first_git_dir(void);
 extern char *get_object_directory(void);
 extern char *get_index_file(void);
 extern char *get_graft_file(void);
diff --git a/environment.c b/environment.c
index ca72464..8cfb8f3 100644
--- a/environment.c
+++ b/environment.c
@@ -100,7 +100,7 @@ static char *work_tree;
 static const char *namespace;
 static size_t namespace_len;
 
-static const char *git_dir, *git_common_dir;
+static const char *git_dir, *git_common_dir, *first_git_dir;
 static char *git_object_dir, *git_index_file, *git_graft_file;
 int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
 
@@ -168,6 +168,8 @@ static void setup_git_env(void)
if (!git_dir)
git_dir = DEFAULT_GIT_DIR_ENVIRONMENT;
gitfile = read_gitfile(git_dir);
+   if (gitfile && !first_git_dir)
+   first_git_dir = xstrdup(git_dir);
git_dir = xstrdup(gitfile ? gitfile : git_dir);
if (get_common_dir(, git_dir))
git_common_dir_env = 1;
@@ -203,6 +205,18 @@ const char *get_git_dir(void)
return git_dir;
 }
 
+/*
+ * Return the first ".git" that we have encountered.
+ * FIXME this function for not entirely correct because
+ * setup_git_directory() and enter_repo() do not update first_git_dir
+ * when they follow .git files. The function in its current state is
+ * only suitable for "git init".
+ */
+const char *get_first_git_dir(void)
+{
+   return first_git_dir ? first_git_dir : git_dir;
+}
+
 const char *get_git_common_dir(void)
 {
return git_common_dir;
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index 393c940..d59669a 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -393,9 +393,11 @@ test_expect_success 're-init from a linked worktree' '
test_commit first &&
git worktree add ../linked-worktree &&
mv .git/info/exclude expected-exclude &&
+   cp .git/config expected-config &&
find .git/worktrees -print | sort >expected &&
git -C ../linked-worktree init &&
test_cmp expected-exclude .git/info/exclude &&
+   test_cmp expected-config .git/config &&
find .git/worktrees -print | sort >actual &&
test_cmp expected actual
)
-- 
2.8.2.524.g6ff3d78



[PATCH 1/3] init: correct re-initialization from a linked worktree

2016-09-08 Thread Nguyễn Thái Ngọc Duy
When 'git init' is called from a linked worktree, '.git' dir as the main
'.git' (i.e. $GIT_COMMON_DIR) and populate the whole repository skeleton
in there. It does not harm anything (*) but it is still wrong.

Since 'git init' calls set_git_dir() at preparation time, which
indirectly calls get_common_dir() and correctly detects multiple
worktree setup, all git_path_buf() calls in create_default_files() will
return correct paths in both single and multiple worktree setups. The
only thing left is copy_templates(), which targets $GIT_DIR, not
$GIT_COMMON_DIR.

Fix that with get_git_common_dir(). This function will return $GIT_DIR
in single-worktree setup, so we don't have to make a special case for
multiple-worktree here.

(*) It does in fact, thanks to another bug. More on that later.

Noticed-by: Max Nordlund 
Helped-by: Michael J Gruber 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/init-db.c |  2 +-
 t/t0001-init.sh   | 15 +++
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/builtin/init-db.c b/builtin/init-db.c
index 3a45f0b..6d9552e 100644
--- a/builtin/init-db.c
+++ b/builtin/init-db.c
@@ -138,7 +138,7 @@ static void copy_templates(const char *template_dir)
goto close_free_return;
}
 
-   strbuf_addstr(, get_git_dir());
+   strbuf_addstr(, get_git_common_dir());
strbuf_complete(, '/');
copy_templates_1(, _path, dir);
 close_free_return:
diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index a6fdd5e..d64e5e3 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -384,4 +384,19 @@ test_expect_success MINGW 'bare git dir not hidden' '
! is_hidden newdir
 '
 
+test_expect_success 're-init from a linked worktree' '
+   git init main-worktree &&
+   (
+   cd main-worktree &&
+   test_commit first &&
+   git worktree add ../linked-worktree &&
+   mv .git/info/exclude expected-exclude &&
+   find .git/worktrees -print | sort >expected &&
+   git -C ../linked-worktree init &&
+   test_cmp expected-exclude .git/info/exclude &&
+   find .git/worktrees -print | sort >actual &&
+   test_cmp expected actual
+   )
+'
+
 test_done
-- 
2.8.2.524.g6ff3d78



[PATCH 2/3] t0001: work around the bug that reads config file before repo setup

2016-09-08 Thread Nguyễn Thái Ngọc Duy
git-init somehow reads '.git/config' at current directory and sets
log_all_ref_updates based on this file. Because log_all_ref_updates is
not unspecified (-1) any more. It will not be written to the new repo's
config file (see create_default_files() function).

This will affect our tests in the next patch as we will compare the
config file and expect that core.logallrefupdates is already set to true
by "git init main-worktree".

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 t/t0001-init.sh | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/t/t0001-init.sh b/t/t0001-init.sh
index d64e5e3..393c940 100755
--- a/t/t0001-init.sh
+++ b/t/t0001-init.sh
@@ -385,7 +385,9 @@ test_expect_success MINGW 'bare git dir not hidden' '
 '
 
 test_expect_success 're-init from a linked worktree' '
+   mv .git/config work-around-init-reading-wrong-file &&
git init main-worktree &&
+   mv work-around-init-reading-wrong-file .git/config &&
(
cd main-worktree &&
test_commit first &&
-- 
2.8.2.524.g6ff3d78



Re: How to simulate a real checkout to test a new smudge filter?

2016-09-08 Thread Jakub Narębski
W dniu 06.09.2016 o 23:01, john smith pisze:

> I'd prefer smudge/clean filters instead of `make' scripts etc. to
> convert template dotfiles into something usable and back because
> filters:
> 
> 1. could be run automatically
> 
> 2. do not modify files as shown by `git show HEAD:' and
> therefore no files are reported as modified by git status and also
> there are not conflicts when merging master into work/home branch.
> 
> I have problems because with point 1 because apparently smudge filter
> is not run automatically every time when branch is changed if files
> listed in .gitattributes do not change. As the last resort I could
> force smudge/clean filter to run just to keep advantage specified in
> point 2.

Couldn't you use post-checkout hook plus clean filter instead of
clean/smudge filter pair, if the smudge part depends on the branch?

Or make post-checkout hook invoke smudge filter... though
`git cat-file --filters` is not in any released version, I think...

Best,
-- 
Jakub Narębski



Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-09-08 Thread Satoshi Yasushima

On Wed, 07 Sep 2016 10:35:22 -0700 Junio C Hamano wrote:

Since I received the patch directly bypassing vger, I queued it on
gitgui-0.20.0 from Pat and tentatively merged it to my 'pu'.


wow, thanks so much.


Re: "fatal error in commit_refs" from pushing to github

2016-09-08 Thread Duy Nguyen
On Thu, Sep 8, 2016 at 8:25 AM, Jeff King  wrote:
> On Thu, Sep 08, 2016 at 07:49:12AM +0700, Duy Nguyen wrote:
>
>> I got the message in the subject when pushing to github today. Yes I
>> know it's github, not git. But according to stackoveflow [1] it's a
>> local problem. Which makes me think, if we know exactly what this is
>> (or at least roughly the problem area), maybe we could improve git to
>> catch it locally in the first place (and because other git servers may
>> not have the same protection as github).  Jeff maybe you can reveal
>> something about this "fatal error in commit_refs"? I'm sure it's not
>> in git code. But I would understand if the answer is "no".
>
> The short answer is that it's nothing to do with Git or the client; it's
> GitHub-specific code running on the server that is outside of Git
> entirely.
>
> The long answer is that pushes to GitHub don't hit Git directly these
> days. They hit a proxy layer that speaks just enough of the Git protocol
> to relay to N separate receives spread across N replica servers[1]. Those
> receive-packs take in the pack and verify it, but don't actually update
> any refs[2]. Then the proxy layer runs its own set of policy hooks, and
> speaks a commit-protocol to each of the replicas so that they all agree
> on the new ref state. That last step is called "commit_refs" internally.
>
> So this is really an internal failure at the ref-update stage. There
> _should_ be a reasonable error message, but I think "fatal error in
> commit_refs" is the generic last-ditch fallback. I'll pass this along to
> people in charge of that code, as we should be generating a more useful
> error message.

Hmm.. I'm interested in this because the "fix" is from client side. I
did "git gc" and "git fetch" and the problem was gone. From this
description, I suppose C Git sends a good pack (phew!), but probably
with some stale ref or something that upsets this this last stage.
It's hard to make a connection back to either gc or fetch. Maybe gc
does ref trimming or something (that should probably be done by
git-push as well). Oh well.. maybe next time I see it, I'll get a nice
and clear message :)
-- 
Duy


[no subject]

2016-09-08 Thread Jacobs, Steven



I am Jacobs  Steven  $ 2 Million Has Been Donated To You this is Real,For More 
Info Contact Mr James Stocklas with this Email ; >>   jamesstocklas...@gmail.com

Send Your Response To >>  jamesstocklas...@gmail.com

__
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
__


Re: [ANNOUNCE] Git for Windows 2.10.0

2016-09-08 Thread stefan.naewe
Am 03.09.2016 um 15:17 schrieb Johannes Schindelin:
> Dear Git users,
> 
> It is my pleasure to announce that Git for Windows 2.10.0 is available.
> This time, I even blogged about it, primarily because I am so excited
> about the speed improvements of rebase -i:
> 
> https://blogs.msdn.microsoft.com/visualstudioalm/2016/09/03/whats-new-in-git-for-windows-2-10/
> 
> As always, you can download it from: https://git-for-windows.github.io/
> 
> Changes since Git for Windows v2.9.3(2) (August 25th 2016)
> 
> New Features
> 
>   • Comes with Git v2.10.0.
>   • The git rebase -i command was made faster by reimplementing large
> parts in C.

I finally had the chance to do a "bigger" rebase and what shall I say...
F***k, has this thing become fast, or what!

Thank you so much for doing this

Stefan
-- 

/dev/random says: A Cat's courage is as strong as a dog's chain
python -c "print 
'73746566616e2e6e616577654061746c61732d656c656b74726f6e696b2e636f6d'.decode('hex')"
 
GPG Key fingerprint = 2DF5 E01B 09C3 7501 BCA9  9666 829B 49C5 9221 27AF

Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 04:14:46AM -0400, Jeff King wrote:

> On Thu, Sep 08, 2016 at 04:10:24AM -0400, Jeff King wrote:
> 
> > On Thu, Sep 08, 2016 at 09:54:51AM +0200, Johannes Schindelin wrote:
> > 
> > > >  diff.c |  3 ++-
> > > >  diffcore-pickaxe.c | 18 --
> > > >  xdiff-interface.c  | 13 -
> > > >  3 files changed, 14 insertions(+), 20 deletions(-)
> > > 
> > > I just realized that this should switch the test_expect_failure from 1/3
> > > to a test_expect_success.
> > 
> > Yep. I wonder if we also would want to test that we correctly find
> > regexes inside binary files.
> > 
> > E.g., given a mixed binary/text file like:
> > 
> >   printf 'binary\0text' >file &&
> >   git add file &&
> >   git commit -m file
> > 
> > then "git log -Stext" will find that file, but "--pickaxe-regex" will
> > not (using stock git). Ditto for "-Gtext".
> > 
> > Your patch should fix that.
> 
> Of course if I had actually _looked carefully_ at your patch, I would
> have seen that your test doesn't just check that we don't segfault, but
> actually confirms that we find the entry.
> 
> Sorry for the noise.

Actually, I take it back again. Your test case doesn't have an embedded
NUL in it (so we check that git finds it, but aside from the lack of
segfault, stock git would already find it).

Sorry for the double-noise.

-Peff


Re: [PATCH] t6026-merge-attr: wait for process to release trash directory

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 10:05:58AM +0200, Johannes Schindelin wrote:

> > > Is fifo safe on Windows, though?
> > 
> > No clue. We seem to use mkfifo unconditionally in lib-daemon, but
> > perhaps people do not run that test on Windows. Other invocations seem
> > to be protected by the PIPE prerequisite. But...
> 
> AFAICT we do not use mkfifo on Windows. Let's see what t/test-lib.sh has
> to say about the matter:
> 
>   test_lazy_prereq PIPE '
>   # test whether the filesystem supports FIFOs
>   case $(uname -s) in
>   CYGWIN*|MINGW*)
>   false
>   ;;
>   *)
>   rm -f testfifo && mkfifo testfifo
>   ;;
>   esac
>   '
> 
> So there you go.
> 
> The reason it is disabled is that Cygwin/MSYS2 do have a concept of a
> FIFO. But `git.exe` won't be able to access such a FIFO because it is
> emulated by the POSIX emulation layer, which Git cannot access.

Regarding my "unconditionally" above: coincidentally, I happened to be
looking in lib-git-daemon.sh about an hour ago and noticed that we do
indeed check "test_have_prereq PIPE" (just not near the mkfifo, of
course, because we are not in a test block).

It seems to have been added by a "Johannes Schindelin". Any relation?

So yeah. It definitely would be a bad idea to use a fifo in a test that
is already Windows-specific.

-Peff


Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 09:49:38AM +0200, Johannes Schindelin wrote:

> > > diff --git a/diff.c b/diff.c
> > > index 534c12e..2c5a360 100644
> > > --- a/diff.c
> > > +++ b/diff.c
> > > @@ -951,7 +951,13 @@ static int find_word_boundaries(mmfile_t *buffer,
> > > regex_t *word_regex,
> > >  {
> > >   if (word_regex && *begin < buffer->size) {
> > >   regmatch_t match[1];
> > > - if (!regexec(word_regex, buffer->ptr + *begin, 1, match,
> > >   0)) {
> > > + int f = 0;
> > > +#ifdef REG_STARTEND
> > > + match[0].rm_so = 0;
> > > + match[0].rm_eo = *end - *begin;
> > > + f = REG_STARTEND;
> > > +#endif
> > > + if (!regexec(word_regex, buffer->ptr + *begin, 1, match,
> > > f)) {
> 
> Heh. You introduced the same bug I did. Or maybe you just fetched my
> mmap-regexec branch and looked at an intermediate iteration?

I do not think I introduced anything.  The quoted text is what you
sent. Which is perhaps why it has your bug. :)

> > But I much prefer this approach to copying the data just to add a NUL.
> 
> I think it is not worth the burden. The only regex implementation in
> semi-widespread use that do not support REG_STARTEND seems to be musl.
> 
> I'd rather not spend *so much* effort just to support an obscure platform.
> Not when the users of that obscure platform could spend that effort
> themselves. And probably won't, because we only copy data to add a NUL on
> those platforms when regexec() is called on an mmfile_t.

I'm confused about what you think I'm proposing. I was saying I _like_
something like regexec_buf() instead of copying the data. Which seems
like the simpler thing to me (and presumably to you). Or do you mean
using compat/regex to build on re_search() consistently? I do not think
that is all that complex; the question is only whether people really
want to use their own regex libraries.

Between the two options for regexec_buf(), I think you have convinced me
that REG_STARTEND is better than just using compat/regex everywhere. I
do think the fallback for platforms like musl should be "use
compat/regex" and not doing an expensive copy (which in most cases is
not even necessary).

-Peff


Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 04:10:24AM -0400, Jeff King wrote:

> On Thu, Sep 08, 2016 at 09:54:51AM +0200, Johannes Schindelin wrote:
> 
> > >  diff.c |  3 ++-
> > >  diffcore-pickaxe.c | 18 --
> > >  xdiff-interface.c  | 13 -
> > >  3 files changed, 14 insertions(+), 20 deletions(-)
> > 
> > I just realized that this should switch the test_expect_failure from 1/3
> > to a test_expect_success.
> 
> Yep. I wonder if we also would want to test that we correctly find
> regexes inside binary files.
> 
> E.g., given a mixed binary/text file like:
> 
>   printf 'binary\0text' >file &&
>   git add file &&
>   git commit -m file
> 
> then "git log -Stext" will find that file, but "--pickaxe-regex" will
> not (using stock git). Ditto for "-Gtext".
> 
> Your patch should fix that.

Of course if I had actually _looked carefully_ at your patch, I would
have seen that your test doesn't just check that we don't segfault, but
actually confirms that we find the entry.

Sorry for the noise.

-Peff


Re: [PATCH v2 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 09:33:29AM +0200, Johannes Schindelin wrote:

> On Thu, 8 Sep 2016, Johannes Schindelin wrote:
> 
> > We solve this by introducing a helper, regexec_buf(), that takes a
> > pointer and a length instead of a NUL-terminated string.
> 
> BTW I should have clarified why I decided on another name than regexecn()
> (I had considered this even before reading Peff's proposed patch): the 
> in string functions suggest a limiting of NUL-terminated strings. In other
> words, if n = 100 and the provided pointer points to a NUL-terminated
> string of length 3, the *n function will treat it as a string of length 3.
> 
> That is not what regexec_buf() does: it ignores the NUL. Hence the
> different name.

I agree that is a better name (this was the exact thing I wondered about
with REG_STARTEND, but certainly what we _want_ is true "_buf"
semantics).

I guess an argument that REG_STARTEND does what we want everywhere is
that the GNU implementation does what we want, and since it has been
around for over a decade presumably _somebody_ would have complained if
it did not match the NetBSD behavior.

-Peff


Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 09:54:51AM +0200, Johannes Schindelin wrote:

> >  diff.c |  3 ++-
> >  diffcore-pickaxe.c | 18 --
> >  xdiff-interface.c  | 13 -
> >  3 files changed, 14 insertions(+), 20 deletions(-)
> 
> I just realized that this should switch the test_expect_failure from 1/3
> to a test_expect_success.

Yep. I wonder if we also would want to test that we correctly find
regexes inside binary files.

E.g., given a mixed binary/text file like:

  printf 'binary\0text' >file &&
  git add file &&
  git commit -m file

then "git log -Stext" will find that file, but "--pickaxe-regex" will
not (using stock git). Ditto for "-Gtext".

Your patch should fix that.

-Peff


Re: [PATCH] t6026-merge-attr: wait for process to release trash directory

2016-09-08 Thread Johannes Schindelin
Hi Peff & Junio,

On Wed, 7 Sep 2016, Jeff King wrote:

> On Wed, Sep 07, 2016 at 11:39:57AM -0700, Junio C Hamano wrote:
> 
> > > Can we do some signaling with fifos to tell the hook when it is safe to
> > > exit? Then we would just need to `wait` for its parent process.
> > 
> > Is fifo safe on Windows, though?
> 
> No clue. We seem to use mkfifo unconditionally in lib-daemon, but
> perhaps people do not run that test on Windows. Other invocations seem
> to be protected by the PIPE prerequisite. But...

AFAICT we do not use mkfifo on Windows. Let's see what t/test-lib.sh has
to say about the matter:

test_lazy_prereq PIPE '
# test whether the filesystem supports FIFOs
case $(uname -s) in
CYGWIN*|MINGW*)
false
;;
*)
rm -f testfifo && mkfifo testfifo
;;
esac
'

So there you go.

The reason it is disabled is that Cygwin/MSYS2 do have a concept of a
FIFO. But `git.exe` won't be able to access such a FIFO because it is
emulated by the POSIX emulation layer, which Git cannot access.

> > With v2 that explicitly kills, I guess we can make the sleep longer
> > without slowing down in the optimistic case?
> 
> Yeah, I think the v2 one is non-racy (I thought at first we might race
> with the "echo", but it should be synchronous; the hook will not exit
> until we have written the pid file, and git will not exit until the hook
> is done running).

Please note that Hannes and I discussed this (as I originally suggested to
increase it to 10 seconds, and Hannes rightfully pointed out that we would
have to change the script name, too, as it says sleep-one-second.sh, and
that would have made the patch less readable) and we came to the same
conclusion: it's not necessary.

Ciao,
Dscho


Re: [PATCH v2 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 09:31:11AM +0200, Johannes Schindelin wrote:

> diff --git a/git-compat-util.h b/git-compat-util.h
> index db89ba7..19128b3 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -965,6 +965,27 @@ void git_qsort(void *base, size_t nmemb, size_t size,
>  #define qsort git_qsort
>  #endif
>  
> +static inline int regexec_buf(const regex_t *preg, const char *buf, size_t 
> size,
> +   size_t nmatch, regmatch_t pmatch[], int eflags)
> +{
> +#ifdef REG_STARTEND
> + assert(nmatch > 0 && pmatch);
> + pmatch[0].rm_so = 0;
> + pmatch[0].rm_eo = size;
> + return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
> +#else
> + char *buf2 = xmalloc(size + 1);
> + int ret;
> +
> + memcpy(buf2, buf, size);
> + buf2[size] = '\0';

I mentioned elsewhere that I'd prefer we just push people into using
compat/regex if they don't have REG_STARTEND. But if we _do_ keep this
fallback, note that the above has a buffer overflow (think what happens
when "size" is the maximum value for a size_t).  You can avoid it by
using xmallocz().

-Peff


Re: [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 09:29:58AM +0200, Johannes Schindelin wrote:

> sorry for the late answer, I was really busy trying to come up with a new
> and improved version of the patch series, and while hunting a bug I
> introduced got bogged down with other tasks.

No problem. I am not in a hurry.

> > I always assumed the _point_ of re_search taking a ptr/len pair was
> > exactly to handle this case. The documentation[1] says:
> > 
> >`string` is the string you want to match; it can contain newline and
> >null characters. `size` is the length of that string.
> > 
> > Which seems pretty definitive to me (that's for re_match(), but
> > re_search() is defined in the docs in terms of re_match()).
> 
> Right. The problem is: I *really* want to avoid using GNU-isms.

I don't think GNU-isms are a problem if we wrap them to give a nice
interface, and if we rely on having compat/regex. But if you mean "I do
not want to rely on using compat/regex everywhere", then OK. I can see
arguments both for and against using a consistent regex library, but I
do not care that much either way myself.

> > We can contain this to the existing compat/regexec/regexec.c, and just
> > provide a wrapper that is similar to regexec but takes a ptr/len pair.
> 
> But we can do even better than that: we can provide a wrapper that uses
> REG_STARTEND where available (which is really the majority of platforms we
> care about: Linux, MacOSX, Windows, and even the *BSDs). Where it is not
> available, we simply malloc(), memcpy() and append a NUL.

Doesn't that make things much _worse_ for people on systems without
REG_STARTEND? If we imagine that most regexec calls would operate on a
NUL-terminated buffer, then they are now paying the extra malloc and
copy for each call to regexec_buf(), even if the buffer was already
NUL-terminated (because they have no idea whether it was or not).

I think I'd rather just have:

  #ifndef REG_STARTEND
  #error "Your regex library sucks. Compile with NO_REGEX=NeedsStartEnd"
  #endif

(or you could just use REG_STARTEND and let the compiler complain, but
then the user has to figure out the right knob to twiddle).


One other question about REG_STARTEND is: what does it do with NULs
inside the buffer? Certainly glibc (and our compat/regex) treat it as a
buffer with a particular length and ignore embedded NULs, as we want.
But the NetBSD documentation says only:

 REG_STARTEND   The string is considered to start at string +
pmatch[0].rm_so and to have a terminating NUL
located at string + pmatch[0].rm_eo (there need not
actually be a NUL at that location), 

Besides avoiding a segfault, one of the benefits of regcomp_buf() is
that we will now find pickaxe-regex strings inside mixed binary/text
files. But it's not clear to me that NetBSD's implementation does this.

I guess we can assume it is fine (it is certainly no _worse_ than the
current behavior), and if people's platforms do not handle it, they can
build with NO_REGEX.

-Peff


[PATCH v3 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Johannes Schindelin
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec() may have been mmap()ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G ` would crash. This patch converts more callers,
though, some of which explicitly allocated and constructed
NUL-terminated strings (or worse: modified read-only buffers to insert
NULs).

Some of the buffers actually may be NUL-terminated. As regexec_buf()
uses REG_STARTEND where available, but has to fall back to allocating
and constructing NUL-terminated strings where REG_STARTEND is not
available, this makes the code less efficient in the latter case.

However, given the widespread support for REG_STARTEND, combined with
the improved ease of code maintenance, we strike the balance in favor
of REG_STARTEND.

Signed-off-by: Johannes Schindelin 
---
 diff.c  |  3 ++-
 diffcore-pickaxe.c  | 18 --
 t/t4061-diff-pickaxe.sh |  2 +-
 xdiff-interface.c   | 13 -
 4 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/diff.c b/diff.c
index 534c12e..526775a 100644
--- a/diff.c
+++ b/diff.c
@@ -951,7 +951,8 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t 
*word_regex,
 {
if (word_regex && *begin < buffer->size) {
regmatch_t match[1];
-   if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+   if (!regexec_buf(word_regex, buffer->ptr + *begin,
+buffer->size - *begin, 1, match, 0)) {
char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
'\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 55067ca..9795ca1 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -23,7 +23,6 @@ static void diffgrep_consume(void *priv, char *line, unsigned 
long len)
 {
struct diffgrep_cb *data = priv;
regmatch_t regmatch;
-   int hold;
 
if (line[0] != '+' && line[0] != '-')
return;
@@ -33,11 +32,8 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 * caller early.
 */
return;
-   /* Yuck -- line ought to be "const char *"! */
-   hold = line[len];
-   line[len] = '\0';
-   data->hit = !regexec(data->regexp, line + 1, 1, , 0);
-   line[len] = hold;
+   data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
+, 0);
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
@@ -50,9 +46,11 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xdemitconf_t xecfg;
 
if (!one)
-   return !regexec(regexp, two->ptr, 1, , 0);
+   return !regexec_buf(regexp, two->ptr, two->size,
+   1, , 0);
if (!two)
-   return !regexec(regexp, one->ptr, 1, , 0);
+   return !regexec_buf(regexp, one->ptr, one->size,
+   1, , 0);
 
/*
 * We have both sides; need to run textual diff and see if
@@ -83,8 +81,8 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, 
kwset_t kws)
regmatch_t regmatch;
int flags = 0;
 
-   assert(data[sz] == '\0');
-   while (*data && !regexec(regexp, data, 1, , flags)) {
+   while (*data &&
+  !regexec_buf(regexp, data, sz, 1, , flags)) {
flags |= REG_NOTBOL;
data += regmatch.rm_eo;
if (*data && regmatch.rm_so == regmatch.rm_eo)
diff --git a/t/t4061-diff-pickaxe.sh b/t/t4061-diff-pickaxe.sh
index 5929f2e..f0bf50b 100755
--- a/t/t4061-diff-pickaxe.sh
+++ b/t/t4061-diff-pickaxe.sh
@@ -14,7 +14,7 @@ test_expect_success setup '
test_tick &&
git commit -m "A 4k file"
 '
-test_expect_failure '-G matches' '
+test_expect_success '-G matches' '
git diff --name-only -G "^0{4096}$" HEAD^ >out &&
test 4096-zeroes.txt = "$(cat out)"
 '
diff --git a/xdiff-interface.c b/xdiff-interface.c
index f34ea76..50702a2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -214,11 +214,10 @@ struct ff_regs {
 static long ff_regexp(const char *line, long len,
char *buffer, long buffer_size, void *priv)
 {
-   char *line_buffer;
struct ff_regs *regs = priv;
regmatch_t pmatch[2];
int i;
-   int result = -1;
+   int result;
 
/* Exclude terminating newline (and cr) from matching */
if (len > 0 && line[len-1] == '\n') {
@@ -228,18 +227,16 @@ static long 

[PATCH v3 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers

2016-09-08 Thread Johannes Schindelin
We just introduced a test that demonstrates that our sloppy use of
regexec() on a mmap()ed area can result in incorrect results or even
hard crashes.

So what we need to fix this is a function that calls regexec() on a
length-delimited, rather than a NUL-terminated, string.

Happily, there is an extension to regexec() introduced by the NetBSD
project and present in all major regex implementation including
Linux', MacOSX' and the one Git includes in compat/regex/: by using
the (non-POSIX) REG_STARTEND flag, it is possible to tell the
regexec() function that it should only look at the offsets between
pmatch[0].rm_so and pmatch[0].rm_eo.

That is exactly what we need.

Since support for REG_STARTEND is so widespread by now, let's just
introduce a helper function that uses it, and fall back to allocating
and constructing a NUL-terminated when REG_STARTEND is not available.

Signed-off-by: Johannes Schindelin 
---
 git-compat-util.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index db89ba7..19128b3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -965,6 +965,27 @@ void git_qsort(void *base, size_t nmemb, size_t size,
 #define qsort git_qsort
 #endif
 
+static inline int regexec_buf(const regex_t *preg, const char *buf, size_t 
size,
+ size_t nmatch, regmatch_t pmatch[], int eflags)
+{
+#ifdef REG_STARTEND
+   assert(nmatch > 0 && pmatch);
+   pmatch[0].rm_so = 0;
+   pmatch[0].rm_eo = size;
+   return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
+#else
+   char *buf2 = xmalloc(size + 1);
+   int ret;
+
+   memcpy(buf2, buf, size);
+   buf2[size] = '\0';
+   ret = regexec(preg, buf2, nmatch, pmatch, eflags);
+   free(buf2);
+
+   return ret;
+#endif
+}
+
 #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 # define FORCE_DIR_SET_GID S_ISGID
 #else
-- 
2.10.0.windows.1.10.g803177d




[PATCH v3 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-08 Thread Johannes Schindelin
This patch series addresses a problem where `git diff` is called using
`-G` or `-S --pickaxe-regex` on new-born files that are configured
without user diff drivers, and that hence get mmap()ed into memory.

The problem with that: mmap()ed memory is *not* NUL-terminated, yet the
pickaxe code calls regexec() on it just the same.

This problem has been reported by my colleague Chris Sidi.

We solve this by introducing a helper, regexec_buf(), that takes a
pointer and a length instead of a NUL-terminated string.

This helper then uses REG_STARTEND where available, and falls back to
allocating and constructing a NUL-terminated string. Given the
wide-spread support for REG_STARTEND (Linux has it, MacOSX has it, Git
for Windows has it because it uses compat/regex/ that has it), I think
this is a fair trade-off.

Changes since v2:

- changed 3/3 to switch the test_expect_failure from 1/3 to a
  test_expect_success


Johannes Schindelin (3):
  Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers
  Introduce a function to run regexec() on non-NUL-terminated buffers
  Use the newly-introduced regexec_buf() function

 diff.c  |  3 ++-
 diffcore-pickaxe.c  | 18 --
 git-compat-util.h   | 21 +
 t/t4061-diff-pickaxe.sh | 22 ++
 xdiff-interface.c   | 13 -
 5 files changed, 57 insertions(+), 20 deletions(-)
 create mode 100755 t/t4061-diff-pickaxe.sh

Published-As: https://github.com/dscho/git/releases/tag/mmap-regexec-v3
Fetch-It-Via: git fetch https://github.com/dscho/git mmap-regexec-v3

Interdiff vs v2:

 diff --git a/t/t4061-diff-pickaxe.sh b/t/t4061-diff-pickaxe.sh
 index 5929f2e..f0bf50b 100755
 --- a/t/t4061-diff-pickaxe.sh
 +++ b/t/t4061-diff-pickaxe.sh
 @@ -14,7 +14,7 @@ test_expect_success setup '
test_tick &&
git commit -m "A 4k file"
  '
 -test_expect_failure '-G matches' '
 +test_expect_success '-G matches' '
git diff --name-only -G "^0{4096}$" HEAD^ >out &&
test 4096-zeroes.txt = "$(cat out)"
  '

-- 
2.10.0.windows.1.10.g803177d

base-commit: 6ebdac1bab966b720d776aa43ca188fe378b1f4b


[PATCH v3 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers

2016-09-08 Thread Johannes Schindelin
When our pickaxe code feeds file contents to regexec(), it implicitly
assumes that the file contents are read into implicitly NUL-terminated
buffers (i.e. that we overallocate by 1, appending a single '\0').

This is not so.

In particular when the file contents are simply mmap()ed, we can be
virtually certain that the buffer is preceding uninitialized bytes, or
invalid pages.

Note that the test we add here is known to be flakey: we simply cannot
know whether the byte following the mmap()ed ones is a NUL or not.

Typically, on Linux the test passes. On Windows, it fails virtually
every time due to an access violation (that's a segmentation fault for
you Unix-y people out there). And Windows would be correct: the
regexec() call wants to operate on a regular, NUL-terminated string,
there is no NUL in the mmap()ed memory range, and it is undefined
whether the next byte is even legal to access.

When run with --valgrind it demonstrates quite clearly the breakage, of
course.

Being marked with `test_expect_failure`, this test will sometimes be
declare "TODO fixed", even if it only passes by mistake.

This test case represents a Minimal, Complete and Verifiable Example of
a breakage reported by Chris Sidi.

Signed-off-by: Johannes Schindelin 
---
 t/t4061-diff-pickaxe.sh | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100755 t/t4061-diff-pickaxe.sh

diff --git a/t/t4061-diff-pickaxe.sh b/t/t4061-diff-pickaxe.sh
new file mode 100755
index 000..5929f2e
--- /dev/null
+++ b/t/t4061-diff-pickaxe.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+#
+# Copyright (c) 2016 Johannes Schindelin
+#
+
+test_description='Pickaxe options'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   test_commit initial &&
+   printf "%04096d" 0 >4096-zeroes.txt &&
+   git add 4096-zeroes.txt &&
+   test_tick &&
+   git commit -m "A 4k file"
+'
+test_expect_failure '-G matches' '
+   git diff --name-only -G "^0{4096}$" HEAD^ >out &&
+   test 4096-zeroes.txt = "$(cat out)"
+'
+
+test_done
-- 
2.10.0.windows.1.10.g803177d




Re: [PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Johannes Schindelin
Hi,

On Thu, 8 Sep 2016, Johannes Schindelin wrote:

> The new regexec_buf() function operates on buffers with an explicitly
> specified length, rather than NUL-terminated strings.
> 
> We need to use this function whenever the buffer we want to pass to
> regexec() may have been mmap()ed (and is hence not NUL-terminated).
> 
> Note: the original motivation for this patch was to fix a bug where
> `git diff -G ` would crash. This patch converts more callers,
> though, some of which explicitly allocated and constructed
> NUL-terminated strings (or worse: modified read-only buffers to insert
> NULs).
> 
> Some of the buffers actually may be NUL-terminated. As regexec_buf()
> uses REG_STARTEND where available, but has to fall back to allocating
> and constructing NUL-terminated strings where REG_STARTEND is not
> available, this makes the code less efficient in the latter case.
> 
> However, given the widespread support for REG_STARTEND, combined with
> the improved ease of code maintenance, we strike the balance in favor
> of REG_STARTEND.
> 
> Signed-off-by: Johannes Schindelin 
> ---
>  diff.c |  3 ++-
>  diffcore-pickaxe.c | 18 --
>  xdiff-interface.c  | 13 -
>  3 files changed, 14 insertions(+), 20 deletions(-)

I just realized that this should switch the test_expect_failure from 1/3
to a test_expect_success.

Will send out v3 in a moment.

Ciao,
Dscho


Re: [PATCH 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers

2016-09-08 Thread Johannes Schindelin
Hi Peff,

On Tue, 6 Sep 2016, Jeff King wrote:

> On Mon, Sep 05, 2016 at 05:45:02PM +0200, Johannes Schindelin wrote:
> 
> > Typically, on Linux the test passes. On Windows, it fails virtually
> > every time due to an access violation (that's a segmentation fault for
> > you Unix-y people out there). And Windows would be correct: the
> > regexec() call wants to operate on a regular, NUL-terminated string,
> > there is no NUL in the mmap()ed memory range, and it is undefined
> > whether the next byte is even legal to access.
> > 
> > When run with --valgrind it demonstrates quite clearly the breakage, of
> > course.
> > 
> > So we simply mark it with `test_expect_success` for now.
> 
> I'd prefer if this were marked as expect_failure. It fails reliably for
> me on Linux, even without --valgrind. But even if that were not so,
> there is no reason to hurt bisectability of somebody running with
> "--valgrind" (not when it costs so little to mark it correctly).

Forgot to say in the cover letter: I did change this from
test_expect_success to test_expect_failure.

But of course, now I remember that I failed to change it back in 3/3. Bah.

Ciao,
Dscho


Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers

2016-09-08 Thread Johannes Schindelin
Hi Junio,

On Wed, 7 Sep 2016, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > What happens to those poor souls on systems without REG_STARTEND? Do
> > they get to keep segfaulting?
> >
> > I think the solution is to push them into setting NO_REGEX. So looking
> > at this versus a "regexecn", it seems:
> >
> >   - this lets people keep using their native regexec if it supports
> > STARTEND
> >
> >   - this is a bit more clunky to use at the callsites (though we could
> > _create_ a portable regexecn wrapper that uses this technique on top
> > of the native regex library)
> >
> > But I much prefer this approach to copying the data just to add a NUL.
> 
> I first thought "push them to NO_REGEX" to mean "they live with
> crippled Git that does not do regexp" and went "Huh?", but it merely
> means "let's avoid platform regex library and use on from the
> compat/ hierarchy", which would solve the STARTEND portability issue
> for everybody.
> 
> Which is very good.
> 
> The idea to create a thin regexecn() wrapper also sounds like a good
> idea, too.  The changes to the callsites in the demonstration patch
> does look a bit clunky to me, too.

The demonstration patch was only meant as a mere demonstration where this
leads us. I DRY'd it up quite a bit (which was my plan all along, but it
was faster to make the changes in place, to avoid a full-sale
recompilation due to a central header change; you might not care because
you use Linux with its native POSIX, while I have to use MSYS2, making
even my builds slower).

And I really do not think that it would be a good idea to use
compat/regex/ for everybody, even if they already have a working regex.h
on their system.

Ciao,
Dscho


Re: [PATCH 2/3] diff_populate_filespec: NUL-terminate buffers

2016-09-08 Thread Johannes Schindelin
Hi Peff,

On Tue, 6 Sep 2016, Jeff King wrote:

> On Tue, Sep 06, 2016 at 06:02:59PM +0200, Johannes Schindelin wrote:
> 
> > It will still be quite tricky, because we have to touch a function that is
> > rather at the bottom of the food chain: diff_populate_filespec() is called
> > from fill_textconv(), which in turn is called from pickaxe_match(), and
> > only pickaxe_match() knows whether we want to call regexec() or not (it
> > depends on its regexp parameter).
> > 
> > Adding a flag to diff_populate_filespec() sounds really reasonable until
> > you see how many call sites fill_textconv() has.
> 
> I was thinking of something quite gross, like a global "switch to using
> slower-but-safer NUL termination" flag (but I agree with Junio's point
> elsewhere that we do not even know if it is "slower").

Urgh.

;-)

> > So now for the better idea.
> > 
> > While I was researching the code for this reply, I hit upon one thing
> > that I never knew existed, introduced in f96e567 (grep: use
> > REG_STARTEND for all matching if available, 2010-05-22). Apparently,
> > NetBSD introduced an extension to regexec() where you can specify
> > buffer boundaries using REG_STARTEND. Which is pretty much what we
> > need.
> 
> Yes, and compat/regex support this, too. My question is whether it is
> portable.

That is only one question.

Another, important question is: is it efficient?

I have no idea whether there exists any hardware-accelerated regex library
out there, maybe even using CUDA (I know that there is some code out there
using SSE to perform LF -> CR/LF conversion, unfortunately it is
intentionally incompatible with GPLv2).

We cannot simply switch everybody and her dog to compat/regex/ just
because we want to avoid a segfault.

> > diff --git a/diff.c b/diff.c
> > index 534c12e..2c5a360 100644
> > --- a/diff.c
> > +++ b/diff.c
> > @@ -951,7 +951,13 @@ static int find_word_boundaries(mmfile_t *buffer,
> > regex_t *word_regex,
> >  {
> > if (word_regex && *begin < buffer->size) {
> > regmatch_t match[1];
> > -   if (!regexec(word_regex, buffer->ptr + *begin, 1, match,
> > 0)) {
> > +   int f = 0;
> > +#ifdef REG_STARTEND
> > +   match[0].rm_so = 0;
> > +   match[0].rm_eo = *end - *begin;
> > +   f = REG_STARTEND;
> > +#endif
> > +   if (!regexec(word_regex, buffer->ptr + *begin, 1, match,
> > f)) {

Heh. You introduced the same bug I did. Or maybe you just fetched my
mmap-regexec branch and looked at an intermediate iteration?

The problem with this patch is that *end is uninitialized. I actually
initialized it in my patch, but it was still incorrect. I settled on using
buffer->size - *begin in the end.

> What happens to those poor souls on systems without REG_STARTEND? Do
> they get to keep segfaulting?

Of course not. Those poor souls on systems without REG_STARTEND pay a
little price for that: malloc(); memcpy(); *end = '\0'; ... free();

I think it is worth it: maintenance of the code is much easier that way
than forcing everybody and her dog and her dog's hamster to compat/regex/.

> But I much prefer this approach to copying the data just to add a NUL.

I think it is not worth the burden. The only regex implementation in
semi-widespread use that do not support REG_STARTEND seems to be musl.

I'd rather not spend *so much* effort just to support an obscure platform.
Not when the users of that obscure platform could spend that effort
themselves. And probably won't, because we only copy data to add a NUL on
those platforms when regexec() is called on an mmfile_t.

Better to keep it simple,
Dscho


Re: [PATCH v2 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-08 Thread Johannes Schindelin
Hi,

On Thu, 8 Sep 2016, Johannes Schindelin wrote:

> We solve this by introducing a helper, regexec_buf(), that takes a
> pointer and a length instead of a NUL-terminated string.

BTW I should have clarified why I decided on another name than regexecn()
(I had considered this even before reading Peff's proposed patch): the 
in string functions suggest a limiting of NUL-terminated strings. In other
words, if n = 100 and the provided pointer points to a NUL-terminated
string of length 3, the *n function will treat it as a string of length 3.

That is not what regexec_buf() does: it ignores the NUL. Hence the
different name.

Ciao,
Dscho


[PATCH v2 2/3] Introduce a function to run regexec() on non-NUL-terminated buffers

2016-09-08 Thread Johannes Schindelin
We just introduced a test that demonstrates that our sloppy use of
regexec() on a mmap()ed area can result in incorrect results or even
hard crashes.

So what we need to fix this is a function that calls regexec() on a
length-delimited, rather than a NUL-terminated, string.

Happily, there is an extension to regexec() introduced by the NetBSD
project and present in all major regex implementation including
Linux', MacOSX' and the one Git includes in compat/regex/: by using
the (non-POSIX) REG_STARTEND flag, it is possible to tell the
regexec() function that it should only look at the offsets between
pmatch[0].rm_so and pmatch[0].rm_eo.

That is exactly what we need.

Since support for REG_STARTEND is so widespread by now, let's just
introduce a helper function that uses it, and fall back to allocating
and constructing a NUL-terminated when REG_STARTEND is not available.

Signed-off-by: Johannes Schindelin 
---
 git-compat-util.h | 21 +
 1 file changed, 21 insertions(+)

diff --git a/git-compat-util.h b/git-compat-util.h
index db89ba7..19128b3 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -965,6 +965,27 @@ void git_qsort(void *base, size_t nmemb, size_t size,
 #define qsort git_qsort
 #endif
 
+static inline int regexec_buf(const regex_t *preg, const char *buf, size_t 
size,
+ size_t nmatch, regmatch_t pmatch[], int eflags)
+{
+#ifdef REG_STARTEND
+   assert(nmatch > 0 && pmatch);
+   pmatch[0].rm_so = 0;
+   pmatch[0].rm_eo = size;
+   return regexec(preg, buf, nmatch, pmatch, eflags | REG_STARTEND);
+#else
+   char *buf2 = xmalloc(size + 1);
+   int ret;
+
+   memcpy(buf2, buf, size);
+   buf2[size] = '\0';
+   ret = regexec(preg, buf2, nmatch, pmatch, eflags);
+   free(buf2);
+
+   return ret;
+#endif
+}
+
 #ifndef DIR_HAS_BSD_GROUP_SEMANTICS
 # define FORCE_DIR_SET_GID S_ISGID
 #else
-- 
2.10.0.windows.1.10.g803177d




[PATCH v2 3/3] Use the newly-introduced regexec_buf() function

2016-09-08 Thread Johannes Schindelin
The new regexec_buf() function operates on buffers with an explicitly
specified length, rather than NUL-terminated strings.

We need to use this function whenever the buffer we want to pass to
regexec() may have been mmap()ed (and is hence not NUL-terminated).

Note: the original motivation for this patch was to fix a bug where
`git diff -G ` would crash. This patch converts more callers,
though, some of which explicitly allocated and constructed
NUL-terminated strings (or worse: modified read-only buffers to insert
NULs).

Some of the buffers actually may be NUL-terminated. As regexec_buf()
uses REG_STARTEND where available, but has to fall back to allocating
and constructing NUL-terminated strings where REG_STARTEND is not
available, this makes the code less efficient in the latter case.

However, given the widespread support for REG_STARTEND, combined with
the improved ease of code maintenance, we strike the balance in favor
of REG_STARTEND.

Signed-off-by: Johannes Schindelin 
---
 diff.c |  3 ++-
 diffcore-pickaxe.c | 18 --
 xdiff-interface.c  | 13 -
 3 files changed, 14 insertions(+), 20 deletions(-)

diff --git a/diff.c b/diff.c
index 534c12e..526775a 100644
--- a/diff.c
+++ b/diff.c
@@ -951,7 +951,8 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t 
*word_regex,
 {
if (word_regex && *begin < buffer->size) {
regmatch_t match[1];
-   if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
+   if (!regexec_buf(word_regex, buffer->ptr + *begin,
+buffer->size - *begin, 1, match, 0)) {
char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
'\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 55067ca..9795ca1 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -23,7 +23,6 @@ static void diffgrep_consume(void *priv, char *line, unsigned 
long len)
 {
struct diffgrep_cb *data = priv;
regmatch_t regmatch;
-   int hold;
 
if (line[0] != '+' && line[0] != '-')
return;
@@ -33,11 +32,8 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 * caller early.
 */
return;
-   /* Yuck -- line ought to be "const char *"! */
-   hold = line[len];
-   line[len] = '\0';
-   data->hit = !regexec(data->regexp, line + 1, 1, , 0);
-   line[len] = hold;
+   data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
+, 0);
 }
 
 static int diff_grep(mmfile_t *one, mmfile_t *two,
@@ -50,9 +46,11 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xdemitconf_t xecfg;
 
if (!one)
-   return !regexec(regexp, two->ptr, 1, , 0);
+   return !regexec_buf(regexp, two->ptr, two->size,
+   1, , 0);
if (!two)
-   return !regexec(regexp, one->ptr, 1, , 0);
+   return !regexec_buf(regexp, one->ptr, one->size,
+   1, , 0);
 
/*
 * We have both sides; need to run textual diff and see if
@@ -83,8 +81,8 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, 
kwset_t kws)
regmatch_t regmatch;
int flags = 0;
 
-   assert(data[sz] == '\0');
-   while (*data && !regexec(regexp, data, 1, , flags)) {
+   while (*data &&
+  !regexec_buf(regexp, data, sz, 1, , flags)) {
flags |= REG_NOTBOL;
data += regmatch.rm_eo;
if (*data && regmatch.rm_so == regmatch.rm_eo)
diff --git a/xdiff-interface.c b/xdiff-interface.c
index f34ea76..50702a2 100644
--- a/xdiff-interface.c
+++ b/xdiff-interface.c
@@ -214,11 +214,10 @@ struct ff_regs {
 static long ff_regexp(const char *line, long len,
char *buffer, long buffer_size, void *priv)
 {
-   char *line_buffer;
struct ff_regs *regs = priv;
regmatch_t pmatch[2];
int i;
-   int result = -1;
+   int result;
 
/* Exclude terminating newline (and cr) from matching */
if (len > 0 && line[len-1] == '\n') {
@@ -228,18 +227,16 @@ static long ff_regexp(const char *line, long len,
len--;
}
 
-   line_buffer = xstrndup(line, len); /* make NUL terminated */
-
for (i = 0; i < regs->nr; i++) {
struct ff_reg *reg = regs->array + i;
-   if (!regexec(>re, line_buffer, 2, pmatch, 0)) {
+   if (!regexec_buf(>re, line, len, 2, pmatch, 0)) {
if (reg->negate)
-   goto fail;
+

[PATCH v2 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-08 Thread Johannes Schindelin
This patch series addresses a problem where `git diff` is called using
`-G` or `-S --pickaxe-regex` on new-born files that are configured
without user diff drivers, and that hence get mmap()ed into memory.

The problem with that: mmap()ed memory is *not* NUL-terminated, yet the
pickaxe code calls regexec() on it just the same.

This problem has been reported by my colleague Chris Sidi.

We solve this by introducing a helper, regexec_buf(), that takes a
pointer and a length instead of a NUL-terminated string.

This helper then uses REG_STARTEND where available, and falls back to
allocating and constructing a NUL-terminated string. Given the
wide-spread support for REG_STARTEND (Linux has it, MacOSX has it, Git
for Windows has it because it uses compat/regex/ that has it), I think
this is a fair trade-off.

Changes since v1:

- completely revamped the approach: now we no longer force-append a NUL
  whenever mmap()ing buffers for use by the diff machinery, but we fix
  things at the regexec() level.


Johannes Schindelin (3):
  Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers
  Introduce a function to run regexec() on non-NUL-terminated buffers
  Use the newly-introduced regexec_buf() function

 diff.c  |  3 ++-
 diffcore-pickaxe.c  | 18 --
 git-compat-util.h   | 21 +
 t/t4061-diff-pickaxe.sh | 22 ++
 xdiff-interface.c   | 13 -
 5 files changed, 57 insertions(+), 20 deletions(-)
 create mode 100755 t/t4061-diff-pickaxe.sh

Published-As: https://github.com/dscho/git/releases/tag/mmap-regexec-v2
Fetch-It-Via: git fetch https://github.com/dscho/git mmap-regexec-v2

Interdiff vs v1:

 diff --git a/diff.c b/diff.c
 index 32f7f46..526775a 100644
 --- a/diff.c
 +++ b/diff.c
 @@ -951,7 +951,8 @@ static int find_word_boundaries(mmfile_t *buffer, regex_t 
*word_regex,
  {
if (word_regex && *begin < buffer->size) {
regmatch_t match[1];
 -  if (!regexec(word_regex, buffer->ptr + *begin, 1, match, 0)) {
 +  if (!regexec_buf(word_regex, buffer->ptr + *begin,
 +   buffer->size - *begin, 1, match, 0)) {
char *p = memchr(buffer->ptr + *begin + match[0].rm_so,
'\n', match[0].rm_eo - match[0].rm_so);
*end = p ? p - buffer->ptr : match[0].rm_eo + *begin;
 @@ -2826,15 +2827,6 @@ int diff_populate_filespec(struct diff_filespec *s, 
unsigned int flags)
s->data = strbuf_detach(, );
s->size = size;
s->should_free = 1;
 -  } else {
 -  /* data must be NUL-terminated so e.g. for regexec() */
 -  char *data = xmalloc(s->size + 1);
 -  memcpy(data, s->data, s->size);
 -  data[s->size] = '\0';
 -  munmap(s->data, s->size);
 -  s->should_munmap = 0;
 -  s->data = data;
 -  s->should_free = 1;
}
}
else {
 diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
 index 88820b6..9795ca1 100644
 --- a/diffcore-pickaxe.c
 +++ b/diffcore-pickaxe.c
 @@ -23,7 +23,6 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
  {
struct diffgrep_cb *data = priv;
regmatch_t regmatch;
 -  int hold;
  
if (line[0] != '+' && line[0] != '-')
return;
 @@ -33,11 +32,8 @@ static void diffgrep_consume(void *priv, char *line, 
unsigned long len)
 * caller early.
 */
return;
 -  /* Yuck -- line ought to be "const char *"! */
 -  hold = line[len];
 -  line[len] = '\0';
 -  data->hit = !regexec(data->regexp, line + 1, 1, , 0);
 -  line[len] = hold;
 +  data->hit = !regexec_buf(data->regexp, line + 1, len - 1, 1,
 +   , 0);
  }
  
  static int diff_grep(mmfile_t *one, mmfile_t *two,
 @@ -49,12 +45,12 @@ static int diff_grep(mmfile_t *one, mmfile_t *two,
xpparam_t xpp;
xdemitconf_t xecfg;
  
 -  assert(!one || one->ptr[one->size] == '\0');
 -  assert(!two || two->ptr[two->size] == '\0');
if (!one)
 -  return !regexec(regexp, two->ptr, 1, , 0);
 +  return !regexec_buf(regexp, two->ptr, two->size,
 +  1, , 0);
if (!two)
 -  return !regexec(regexp, one->ptr, 1, , 0);
 +  return !regexec_buf(regexp, one->ptr, one->size,
 +  1, , 0);
  
/*
 * We have both sides; need to run textual diff and see if
 @@ -85,8 +81,8 @@ static unsigned int contains(mmfile_t *mf, regex_t *regexp, 
kwset_t kws)
regmatch_t regmatch;
int flags = 0;
  
 -  assert(data[sz] == '\0');
 -  

[PATCH v2 1/3] Demonstrate a problem: our pickaxe code assumes NUL-terminated buffers

2016-09-08 Thread Johannes Schindelin
When our pickaxe code feeds file contents to regexec(), it implicitly
assumes that the file contents are read into implicitly NUL-terminated
buffers (i.e. that we overallocate by 1, appending a single '\0').

This is not so.

In particular when the file contents are simply mmap()ed, we can be
virtually certain that the buffer is preceding uninitialized bytes, or
invalid pages.

Note that the test we add here is known to be flakey: we simply cannot
know whether the byte following the mmap()ed ones is a NUL or not.

Typically, on Linux the test passes. On Windows, it fails virtually
every time due to an access violation (that's a segmentation fault for
you Unix-y people out there). And Windows would be correct: the
regexec() call wants to operate on a regular, NUL-terminated string,
there is no NUL in the mmap()ed memory range, and it is undefined
whether the next byte is even legal to access.

When run with --valgrind it demonstrates quite clearly the breakage, of
course.

Being marked with `test_expect_failure`, this test will sometimes be
declare "TODO fixed", even if it only passes by mistake.

This test case represents a Minimal, Complete and Verifiable Example of
a breakage reported by Chris Sidi.

Signed-off-by: Johannes Schindelin 
---
 t/t4061-diff-pickaxe.sh | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100755 t/t4061-diff-pickaxe.sh

diff --git a/t/t4061-diff-pickaxe.sh b/t/t4061-diff-pickaxe.sh
new file mode 100755
index 000..5929f2e
--- /dev/null
+++ b/t/t4061-diff-pickaxe.sh
@@ -0,0 +1,22 @@
+#!/bin/sh
+#
+# Copyright (c) 2016 Johannes Schindelin
+#
+
+test_description='Pickaxe options'
+
+. ./test-lib.sh
+
+test_expect_success setup '
+   test_commit initial &&
+   printf "%04096d" 0 >4096-zeroes.txt &&
+   git add 4096-zeroes.txt &&
+   test_tick &&
+   git commit -m "A 4k file"
+'
+test_expect_failure '-G matches' '
+   git diff --name-only -G "^0{4096}$" HEAD^ >out &&
+   test 4096-zeroes.txt = "$(cat out)"
+'
+
+test_done
-- 
2.10.0.windows.1.10.g803177d




Re: [RFC/PATCH v2 0/3] patch-id for merges

2016-09-08 Thread Jeff King
On Wed, Sep 07, 2016 at 03:51:04PM -0700, Josh Triplett wrote:

> > This is still marked RFC, because there are really two approaches here,
> > and I'm not sure which one is better for "format-patch --base". I'd like
> > to get input from Xiaolong Ye (who worked on --base), and Josh Triplett
> > (who has proposed some patches in that area, and is presumably using
> > them).
> 
> Thanks.
> 
> I'd love to see a more resilient patch-id mechanism, to make it easier
> to match up patches between branches.  I don't think it makes sense to
> talk about the patch-id of a merge commit (though it might make sense
> for a merge which makes additional changes not present in any of the
> parents).  Even if someone wants to match up merge commits with merge
> commits, I don't think that should happen via patch-id; I think that
> should happen in terms of "what patches does this merge introduce",
> without constructing a merge-patch-id via a Merkle tree of commit
> patch-ids.
> 
> So, I think this patch series makes sense (modulo the comments about the
> commit message in patch 3).  We already don't respect merge commits when
> doing format-patch; this seems consistent with that.  If we ever make it
> possible for format-patch to handle merge commits, then we should also
> allow it to have merge commits as prerequisites.

Thanks for the input. I knew that format-patch doesn't show merge
commits, but I didn't realize that merges were skipped entirely for the
base preparation (but I see it now; there is a "rev.max_parents = 1"
setting in prepare_bases).

So this really doesn't change the output there at all. And in fact, the
switch to:

   if (commit_patch_id(commit, , sha1, 0))
  - die(_("cannot get patch id"))
  + continue;

should never hit that continue. It could be:

die("BUG: somehow a merge got fed to commit_patch_id?");

but the "continue" somehow seems like the right thing to me.

I'll wait another day or so for comments and then send the re-roll using
this approach.

-Peff


Re: [PATCH 0/3] Fix a segfault caused by regexec() being called on mmap()ed data

2016-09-08 Thread Johannes Schindelin
Hi Peff,

sorry for the late answer, I was really busy trying to come up with a new
and improved version of the patch series, and while hunting a bug I
introduced got bogged down with other tasks.

The good news is that I made up my mind about releasing a Git for Windows
v2.10.0(2): originally, I had planned to do that today, to have time for
any hot fixes until Sunday, if necessary, before going semi-dark.

FWIW I am now trying to track my plans for v2.10.0(2) (or v2.10.1, if
upstream Git v2.10.1 is released before) on GitHub:

https://github.com/git-for-windows/git/milestone/3

On Tue, 6 Sep 2016, Jeff King wrote:

> On Tue, Sep 06, 2016 at 04:06:32PM +0200, Johannes Schindelin wrote:
> 
> > > I think re_search() the correct replacement function but it's been a
> > > while since I've looked into it.
> > 
> > The segfault I investigated happened in a call to strlen(). I see many
> > calls to strlen() in compat/regex/... The one that triggers the segfault
> > is in regexec(), compat/regex/regexec.c:241.
> 
> Yes, that is the important one, I think. The others are for patterns,
> error msgs, etc. Of course strlen() is not the only function that cares
> about NUL delimiters (and there might even be a "while (*p)" somewhere
> in the code).
> 
> I always assumed the _point_ of re_search taking a ptr/len pair was
> exactly to handle this case. The documentation[1] says:
> 
>`string` is the string you want to match; it can contain newline and
>null characters. `size` is the length of that string.
> 
> Which seems pretty definitive to me (that's for re_match(), but
> re_search() is defined in the docs in terms of re_match()).

Right. The problem is: I *really* want to avoid using GNU-isms.

> > The bigger problem is that re_search() is defined in the __USE_GNU section
> > of regex.h, and I do not think it is appropriate to universally #define
> > said constant before #include'ing regex.h. So it would appear that major
> > surgery would be required if we wanted to use regular expressions on
> > strings that are not NUL-terminated.
> 
> We can contain this to the existing compat/regexec/regexec.c, and just
> provide a wrapper that is similar to regexec but takes a ptr/len pair.

But we can do even better than that: we can provide a wrapper that uses
REG_STARTEND where available (which is really the majority of platforms we
care about: Linux, MacOSX, Windows, and even the *BSDs). Where it is not
available, we simply malloc(), memcpy() and append a NUL.

Which is what my v2 does (will send it out in a moment).

Ciao,
Dscho


Re: [PATCH 2/3] diff_flush_patch_id: stop returning error result

2016-09-08 Thread Jeff King
On Thu, Sep 08, 2016 at 01:51:05AM +0100, Ramsay Jones wrote:

> 
> 
> On 07/09/16 23:04, Jeff King wrote:
> > All of our errors come from diff_get_patch_id(), which has
> > exactly three error conditions. The first is an internal
> > assertion, which should be a die("BUG") in the first place.
> > 
> > The other two are caused by an inability to two diff blobs,
>^
> Huh? ... to diff two blobs?

Sorry. English my getting worse be to seems.

Will fix in a re-roll.

-Peff


Re: [PATCH 3/5] git-rebase--interactive: fix English grammar

2016-09-08 Thread Johannes Schindelin
Hi Alex,

On Wed, 7 Sep 2016, Alex Henrie wrote:

> diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
> index 7e558b0..6fd6d4e 100644
> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -1082,7 +1082,7 @@ If they are meant to go into a new commit, run:
>  
>git commit \$gpg_sign_opt_quoted
>  
> -In both case, once you're done, continue with:
> +In both cases, once you're done, continue with:

Good catch! I ported this into my `prepare-sequencer` patch series.

Ciao,
Johannes


git-gui, was Re: [PATCH v2 6/6] git-gui: Update Japanese information

2016-09-08 Thread Johannes Schindelin
Hi Junio,

On Wed, 7 Sep 2016, Junio C Hamano wrote:

> Pat, we haven't heard from you for a long time.

Indeed. There are a couple of git-gui patches in Git for Windows that have
been contributed a long time ago and not been picked up.

Maybe it is time to just accept git-gui patches directly into Git, after
some other Tcl/Tk savvy people ACKed them?

Ciao,
Dscho


Re: [PATCH] compat: move strdup(3) replacement to its own file

2016-09-08 Thread Johannes Schindelin
Hi,

On Wed, 7 Sep 2016, Junio C Hamano wrote:

> René Scharfe  writes:
> 
> > Well, OK.  I think the missing point is that the original nedmalloc
> > doesn't come with strdup() and doesn't need it.  Only _users_ of
> > nedmalloc need it.  Marius added it in nedmalloc.c, but strdup.c is a
> > better place for it.
> 
> Thanks.  I'll add these lines like so:
> 
> Move our implementation of strdup(3) out of compat/nedmalloc/ and
> allow it to be used independently from USE_NED_ALLOCATOR.  The
> original nedmalloc doesn't come with strdup() and doesn't need it.
> Only _users_ of nedmalloc need it, which was added when we imported
> it to our compat/ hierarchy.
> 
> This reduces the difference of our copy of nedmalloc from the
> original, making it easier to update, and allows for easier testing
> and reusing of our version of strdup().

Excellent!

Thanks,
Dscho

Re: Bug? ran into a "fatal" using interactive rebase

2016-09-08 Thread Johannes Schindelin
Hi Ralf,

On Wed, 7 Sep 2016, Ralf Thielow wrote:

> 2016-09-07 17:19 GMT+02:00 Johannes Schindelin :
> >
> > So, something like this should help (if you are interested in seeing this
> > patch included, please run with it, as I am running short on time):

What I meant is: could you craft a test, a proper commit message, and
submit it? I will be mostly offline during the second half of September
and have many other things to take care of right now.

Thanks,
Dscho


Re: [PATCH v2 00/38] Virtualization of the refs API

2016-09-08 Thread Michael Haggerty
On 09/07/2016 09:20 PM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> This is v2 of the patch series to virtualize the references API
>> (though earlier patch series similar in spirit were submitted by
>> Ronnie Sahlberg and David Turner). Thanks to Junio, Eric, and Ramsay
>> for their comments about v1 [1].
>>
>> Nobody pointed out any fundamental problems with v1, but this version
>> includes the following improvements:
> 
> Curiously, many of these improvements were already in 'pu'.

I pushed some of these changes to GitHub a while ago and mentioned that
fact on the mailing list [1]; I assume you fetched from there. But I was
waiting for the dust to settle from the earlier patch series and the
2.10 release before sending them to the mailing list again.

>> * In "refs: add methods for reflog":
>>
>>   * Don't export `files_reflog_iterator_begin()` (suggested by
>> Ramsay).
> 
> This I can see was missing in what has been in 'pu'.

And according to your "What's cooking", you were waiting on this change
before proceeding.

FYI: I haven't done significant work on any next steps, which IMO could
go in many possible directions:

* Splitting loose and packed refs storage into two separate ref_store
subclasses that are joined together using some kind of
overlay_ref_store. This is not necessarily a blocker for the following
ideas, but the overlay_ref_store would make it easier to construct other
hybrid ref-stores, as for example the ubiquitous "most references are
stored in main repo; per-worktree refs are stored in worktree".

* Implementing reference caching as a ref_store that "writes through"
changes to a backing reference store. This is also not a blocker for
other ideas, but it would allow reference caching easily to be turned
on/off for other reference stores (or even disabled for packed/loose
reference storage during command invocations that know they won't have
to iterate over references more than once).

* A variant of loose reference storage that encodes the reference names
somehow before turning them into filenames (1) to make refnames
independent of filesystem path-name encoding issues, and (2) to
eliminate D/F conflicts, thereby allowing reflogs to be retained after a
reference is deleted.

* A reference backend based on LMDB (or any other key-value store).

* A reference backend based on Shawn Pearce's RefTree proposal or
something like it [2].

It's uncertain when I'll next have a chunk of time to work on any of these.

Michael

[1] http://public-inbox.org/git/575990fb.5000...@alum.mit.edu/
[2]
http://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/t



Re: [PATCH 5/5] unpack-trees: do not capitalize "working"

2016-09-08 Thread Matthieu Moy
Alex Henrie  writes:

> In English, only proper nouns are capitalized.
>
> Signed-off-by: Alex Henrie 
> ---
>  unpack-trees.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index 11c37fb..c87a90a 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -123,9 +123,9 @@ void setup_unpack_trees_porcelain(struct 
> unpack_trees_options *opts,
>   msgs[ERROR_SPARSE_NOT_UPTODATE_FILE] =
>   _("Cannot update sparse checkout: the following entries are not 
> up-to-date:\n%s");
>   msgs[ERROR_WOULD_LOSE_ORPHANED_OVERWRITTEN] =
> - _("The following Working tree files would be overwritten by 
> sparse checkout update:\n%s");
> + _("The following working tree files would be overwritten by 
> sparse checkout update:\n%s");
>   msgs[ERROR_WOULD_LOSE_ORPHANED_REMOVED] =
> - _("The following Working tree files would be removed by sparse 
> checkout update:\n%s");
> + _("The following working tree files would be removed by sparse 
> checkout update:\n%s");

Probably a leftover from an old sentence starting with Working? In any
case, obviously correct too, thanks.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH 3/5] git-rebase--interactive: fix English grammar

2016-09-08 Thread Matthieu Moy
Alex Henrie  writes:

> --- a/git-rebase--interactive.sh
> +++ b/git-rebase--interactive.sh
> @@ -1082,7 +1082,7 @@ If they are meant to go into a new commit, run:
>  
>git commit \$gpg_sign_opt_quoted
>  
> -In both case, once you're done, continue with:
> +In both cases, once you're done, continue with:

I don't remember writing this, but since I'm Cc-ed I guess I did ;-).

Obviously correct, thanks.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/


Re: [PATCH 1/3] diff.c: use diff_options directly

2016-09-08 Thread Jacob Keller
On Wed, Sep 7, 2016 at 4:36 PM, Stefan Beller  wrote:
> The value of `ecbdata->opt` is accessible via the short variable `o`
> already, so let's use that instead.
>
> Signed-off-by: Stefan Beller 

Seems reasonable.

> ---
>  diff.c | 21 ++---
>  1 file changed, 10 insertions(+), 11 deletions(-)
>
> diff --git a/diff.c b/diff.c
> index 534c12e..4a6501c 100644
> --- a/diff.c
> +++ b/diff.c
> @@ -1217,7 +1217,7 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
> const char *line_prefix = diff_line_prefix(o);
>
> if (ecbdata->header) {
> -   fprintf(ecbdata->opt->file, "%s", ecbdata->header->buf);
> +   fprintf(o->file, "%s", ecbdata->header->buf);
> strbuf_reset(ecbdata->header);
> ecbdata->header = NULL;
> }
> @@ -1229,9 +1229,9 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
> name_a_tab = strchr(ecbdata->label_path[0], ' ') ? "\t" : "";
> name_b_tab = strchr(ecbdata->label_path[1], ' ') ? "\t" : "";
>
> -   fprintf(ecbdata->opt->file, "%s%s--- %s%s%s\n",
> +   fprintf(o->file, "%s%s--- %s%s%s\n",
> line_prefix, meta, ecbdata->label_path[0], reset, 
> name_a_tab);
> -   fprintf(ecbdata->opt->file, "%s%s+++ %s%s%s\n",
> +   fprintf(o->file, "%s%s+++ %s%s%s\n",
> line_prefix, meta, ecbdata->label_path[1], reset, 
> name_b_tab);
> ecbdata->label_path[0] = ecbdata->label_path[1] = NULL;
> }
> @@ -1249,15 +1249,15 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
> find_lno(line, ecbdata);
> emit_hunk_header(ecbdata, line, len);
> if (line[len-1] != '\n')
> -   putc('\n', ecbdata->opt->file);
> +   putc('\n', o->file);
> return;
> }
>
> if (len < 1) {
> -   emit_line(ecbdata->opt, reset, reset, line, len);
> +   emit_line(o, reset, reset, line, len);
> if (ecbdata->diff_words
> && ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN)
> -   fputs("~\n", ecbdata->opt->file);
> +   fputs("~\n", o->file);
> return;
> }
>
> @@ -1282,8 +1282,8 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
> }
> diff_words_flush(ecbdata);
> if (ecbdata->diff_words->type == DIFF_WORDS_PORCELAIN) {
> -   emit_line(ecbdata->opt, context, reset, line, len);
> -   fputs("~\n", ecbdata->opt->file);
> +   emit_line(o, context, reset, line, len);
> +   fputs("~\n", o->file);
> } else {
> /*
>  * Skip the prefix character, if any.  With
> @@ -1294,7 +1294,7 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
>   line++;
>   len--;
> }
> -   emit_line(ecbdata->opt, context, reset, line, len);
> +   emit_line(o, context, reset, line, len);
> }
> return;
> }
> @@ -1316,8 +1316,7 @@ static void fn_out_consume(void *priv, char *line, 
> unsigned long len)
> default:
> /* incomplete line at the end */
> ecbdata->lno_in_preimage++;
> -   emit_line(ecbdata->opt,
> - diff_get_color(ecbdata->color_diff, DIFF_CONTEXT),
> +   emit_line(o, diff_get_color(ecbdata->color_diff, 
> DIFF_CONTEXT),
>   reset, line, len);
> break;
> }
> --
> 2.10.0.2.g0676c79.dirty
>


Re: [PATCH v1 2/2] read-cache: make sure file handles are not inherited by child processes

2016-09-08 Thread Lars Schneider

> On 07 Sep 2016, at 20:23, Junio C Hamano  wrote:
> 
> Eric Wong  writes:
> 
>> We probably should be using O_NOATIME for all O_RDONLY cases
>> to get the last bit of performance out (especially since
>> non-modern-Linux systems probably still lack relatime).
> 
> No, please do not go there.
> 
> The user can read from a file in a working tree using "less",
> "grep", etc., and they all update the atime, so should "git grep".
> We do not use atime ourselves on these files but we should let
> outside tools rely on the validity of atime (e.g. "what are the
> files that were looked at yesterday?").
> 
> If you grep for noatime in our current codebase, you'd notice that
> we use it only for files in objects/ hierarchy, and that makes very
> good sense.  These files are what we create for our _sole_ use and
> no other tools can peek at them and expect to get any useful
> information out of them (we hear from time to time that virus
> scanners leaving open file descriptors on them causing trouble, but
> that is an example of a useless access), and that makes a file in
> objects/ hierarchy a fair game for noatime optimization.

How do we deal with read-cache:ce_compare_data, though?

By your definition above we shouldn't use NOATIME since the read file
is not in objects/. However, the file read is not something the user
explicitly triggers. The read is part of the Git internal "clean"
machinery.

What would you suggest? Should I open the file with or without NOATIME?

Thanks,
Lars