Re: Alternatives to mid.gmane.org?

2016-08-10 Thread Eric Wong
Eric Wong  wrote:
> Duy Nguyen  wrote:
> > I read this and thought "temporarily" but apparently it's not [1]. A
> > lot of our links in the mail archive are gmane's :(
> > 
> > [1] https://lars.ingebrigtsen.no/2016/07/28/the-end-of-gmane/
> 
> I may not have time to integrate this extensibly into the public-inbox
> search engine today, but at least here is an NNTP article number to
> Message-ID mapping for non-NNTP users:

I'm working on getting "gmane:NNN" searches working.

I'm deploying and reindexing this change to one of the onions, first.
So, to get article 123 from "gmane:123":

http://czquwvybam4bgbro.onion/git/?q=gmane%3A123

(%3A is the URI escape for ':')

For those unable to run Tor, you can also try using a Tor2web
proxy listed at https://www.tor2web.org/  For example:

http://czquwvybam4bgbro.onion.to/git/?q=gmane%3A123

Additionally, searching by Message-ID with the search box will
also work after the reindexing (instead of just using the
Message-ID in the URL):

https://public-inbox.org/meta/20160810232358.GA29130@whir/

After this is fully-deployed and working well; the plan is also
to be able to fix existing gmane links (and also handling
the $gmane/NNN links that were commonly used).

> https://public-inbox.org/.temp/gmane.comp.version-control.git-300599.txt.gz
> (~5 MB)
> 
> Script used to generate this:
> 
> https://public-inbox.org/meta/20160728220145.13024-...@80x24.org/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/3] i18n: archive: mark errors for translation

2016-08-10 Thread Vasco Almeida
A Ter, 09-08-2016 às 12:35 -0700, Junio C Hamano escreveu:
> Hmm, this function is called by write_archive(), which can be called
> by the upload-archive process running on the remote end, whose
> locale certainly is different from that of your local environment.
> 
> If I do not read English and got one of these messages from the
> remote end, I can copy that into a search engine to read more about
> the error, but if I got it translated into, say, Portuguese, I'd
> have a (slightly) harder time dealing with the error, I would think.
> 
> Having said that, I expect that sites that expect internatinal
> audience to come would run these services in C locale, and other
> sites that target audiences in a single locale would choose to use
> their favourite single locale, so probably we do not have to worry
> about it.

I don't know what is the best. I trust you to decide. I'm happy with
whatever you decide.

When I marked those string I did not noticed what you did, just spot
the string and marked them. So I submitted this without taking that
into consideration.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()

2016-08-10 Thread larsxschneider
From: Lars Schneider 

packet_write() should be called packet_write_fmt() as the string
parameter can be formatted.

Suggested-by: Junio C Hamano 
Signed-off-by: Lars Schneider 
---
 builtin/archive.c|  4 ++--
 builtin/receive-pack.c   |  4 ++--
 builtin/remote-ext.c |  4 ++--
 builtin/upload-archive.c |  4 ++--
 connect.c|  2 +-
 daemon.c |  2 +-
 http-backend.c   |  2 +-
 pkt-line.c   |  2 +-
 pkt-line.h   |  2 +-
 shallow.c|  2 +-
 upload-pack.c| 30 +++---
 11 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/builtin/archive.c b/builtin/archive.c
index a1e3b94..49f4914 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -47,10 +47,10 @@ static int run_remote_archiver(int argc, const char **argv,
if (name_hint) {
const char *format = archive_format_from_filename(name_hint);
if (format)
-   packet_write(fd[1], "argument --format=%s\n", format);
+   packet_write_fmt(fd[1], "argument --format=%s\n", 
format);
}
for (i = 1; i < argc; i++)
-   packet_write(fd[1], "argument %s\n", argv[i]);
+   packet_write_fmt(fd[1], "argument %s\n", argv[i]);
packet_flush(fd[1]);
 
buf = packet_read_line(fd[0], NULL);
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 15c323a..d60a412 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -199,7 +199,7 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
 static void show_ref(const char *path, const unsigned char *sha1)
 {
if (sent_capabilities) {
-   packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
+   packet_write_fmt(1, "%s %s\n", sha1_to_hex(sha1), path);
} else {
struct strbuf cap = STRBUF_INIT;
 
@@ -212,7 +212,7 @@ static void show_ref(const char *path, const unsigned char 
*sha1)
if (push_cert_nonce)
strbuf_addf(, " push-cert=%s", push_cert_nonce);
strbuf_addf(, " agent=%s", git_user_agent_sanitized());
-   packet_write(1, "%s %s%c%s\n",
+   packet_write_fmt(1, "%s %s%c%s\n",
 sha1_to_hex(sha1), path, 0, cap.buf);
strbuf_release();
sent_capabilities = 1;
diff --git a/builtin/remote-ext.c b/builtin/remote-ext.c
index 88eb8f9..11b48bf 100644
--- a/builtin/remote-ext.c
+++ b/builtin/remote-ext.c
@@ -128,9 +128,9 @@ static void send_git_request(int stdin_fd, const char 
*serv, const char *repo,
const char *vhost)
 {
if (!vhost)
-   packet_write(stdin_fd, "%s %s%c", serv, repo, 0);
+   packet_write_fmt(stdin_fd, "%s %s%c", serv, repo, 0);
else
-   packet_write(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
+   packet_write_fmt(stdin_fd, "%s %s%chost=%s%c", serv, repo, 0,
 vhost, 0);
 }
 
diff --git a/builtin/upload-archive.c b/builtin/upload-archive.c
index 2caedf1..dc872f6 100644
--- a/builtin/upload-archive.c
+++ b/builtin/upload-archive.c
@@ -88,11 +88,11 @@ int cmd_upload_archive(int argc, const char **argv, const 
char *prefix)
writer.git_cmd = 1;
if (start_command()) {
int err = errno;
-   packet_write(1, "NACK unable to spawn subprocess\n");
+   packet_write_fmt(1, "NACK unable to spawn subprocess\n");
die("upload-archive: %s", strerror(err));
}
 
-   packet_write(1, "ACK\n");
+   packet_write_fmt(1, "ACK\n");
packet_flush(1);
 
while (1) {
diff --git a/connect.c b/connect.c
index 722dc3f..5330d9c 100644
--- a/connect.c
+++ b/connect.c
@@ -730,7 +730,7 @@ struct child_process *git_connect(int fd[2], const char 
*url,
 * Note: Do not add any other headers here!  Doing so
 * will cause older git-daemon servers to crash.
 */
-   packet_write(fd[1],
+   packet_write_fmt(fd[1],
 "%s %s%chost=%s%c",
 prog, path, 0,
 target_host, 0);
diff --git a/daemon.c b/daemon.c
index e647254..2646d0f 100644
--- a/daemon.c
+++ b/daemon.c
@@ -281,7 +281,7 @@ static int daemon_error(const char *dir, const char *msg)
 {
if (!informative_errors)
msg = "access denied or repository not exported";
-   packet_write(1, "ERR %s: %s", msg, dir);
+   packet_write_fmt(1, "ERR %s: %s", msg, dir);
return -1;
 }
 
diff --git a/http-backend.c b/http-backend.c
index 0d59499..aa61c18 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -460,7 +460,7 @@ static void get_info_refs(char *arg)

Re: [PATCH 2/8] xdl_change_compact(): clarify code

2016-08-10 Thread Michael Haggerty
On 08/04/2016 01:50 AM, Stefan Beller wrote:
> On Wed, Aug 3, 2016 at 4:14 PM, Michael Haggerty  wrote:
>> On 08/04/2016 12:11 AM, Stefan Beller wrote:
>>> On Wed, Aug 3, 2016 at 3:00 PM, Michael Haggerty  
>>> wrote:
 [...]
 +
 +   /*
 +* Are there any blank lines that could appear as 
 the last
 +* line of this group?
 +*/
>>>
>>> IIRC this comment is not quite correct as this 'only' counts the number of
>>> blank lines within the forward shifting section, i.e. in the movable space.
>>>
>>> Later we use it as a boolean indicator (whether or not it is equal to 0)
>>> to see if we can do better.
>>> [...]

Thanks for your comments, Stefan.

I realized that the main thing that took me a while to grok when I was
reading this code was that blank_lines was really only used as a boolean
value, even though it was updated with "+=". That's the main information
that I'd like to convey to the reader.

So I decided to change the comment to emphasize this fact (and change it
from a question to a statement), and also changed the place that
blank_lines is updated to treat it more like a boolean. The latter
change also has the advantage of not calling is_blank_line()
unnecessarily when blank_lines is already true.

If you have no objections, that is what I will put in v2 of this patch
series:

> diff --git a/xdiff/xdiffi.c b/xdiff/xdiffi.c
> index de15de2..fde0433 100644
> --- a/xdiff/xdiffi.c
> +++ b/xdiff/xdiffi.c
> @@ -460,6 +460,12 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
> long flags) {
>  
> do {
> groupsize = i - start;
> +
> +   /*
> +* Boolean value that records whether there are any 
> blank
> +* lines that could be made to be the last line of 
> this
> +* group.
> +*/
> blank_lines = 0;
>  
> /*
> @@ -511,7 +517,8 @@ int xdl_change_compact(xdfile_t *xdf, xdfile_t *xdfo, 
> long flags) {
>  * the current change group.
>  */
> while (i < nrec && recs_match(recs, start, i, flags)) 
> {
> -   blank_lines += is_blank_line(recs, i, flags);
> +   if (!blank_lines)
> +   blank_lines = is_blank_line(recs, i, 
> flags);
>  
> rchg[start++] = 0;
> rchg[i++] = 1;

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] checkout: do not mention detach advice for explicit --detach option

2016-08-10 Thread Jonathan Nieder
Stefan Beller wrote:

> When a user asked for a detached HEAD specifically with `--detach`,
> we do not need to give advice on what a detached HEAD state entails as
> we can assume they know what they're getting into as they asked for it.

Example? Tests?

> Signed-off-by: Stefan Beller 
> ---
>  builtin/checkout.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

I like the idea. Thanks for writing it.

> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 27c1a05..fa2dce5 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -655,7 +655,7 @@ static void update_refs_for_switch(const struct 
> checkout_opts *opts,
>   update_ref(msg.buf, "HEAD", new->commit->object.oid.hash, NULL,
>  REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
>   if (!opts->quiet) {
> - if (old->path && advice_detached_head)
> + if (old->path && advice_detached_head && 
> !opts->force_detach)
>   detach_advice(new->name);
>   describe_detached_head(_("HEAD is now at"), 
> new->commit);
>   }
> -- 
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()

2016-08-10 Thread Junio C Hamano
Lars Schneider  writes:

>>> Similarly, I'd think this could share code with the non-gentle form
>>> (which should be able to just call this and die() if returns an error).
>>> Though sometimes the va_list transformation makes that awkward.
>> 
>> Yes.
>
> Peff just posted that he tried the shared code idea but the result
> ended up ugly.

OK.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Aug 2016, #03; Mon, 8)

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 19:36, Stefan Beller  wrote:
> 
> On Wed, Aug 10, 2016 at 10:30 AM, Lars Schneider
>  wrote:
>> 
>>> 
>>> * sb/submodule-update-dot-branch (2016-08-03) 7 commits
>>> (merged to 'next' on 2016-08-04 at 47bff41)
>>> + submodule update: allow '.' for branch value
>>> + submodule--helper: add remote-branch helper
>>> + submodule-config: keep configured branch around
>>> + submodule--helper: fix usage string for relative-path
>>> + submodule update: narrow scope of local variable
>>> + submodule update: respect depth in subsequent fetches
>>> + t7406: future proof tests with hard coded depth
>>> 
>>> A few updates to "git submodule update".
>>> 
>>> Will merge to 'master'.
>> 
>> I think "t7406: future proof tests with hard coded depth"
>> breaks the tests on OSX:
>> 
>> https://travis-ci.org/git/git/jobs/150779244
>> 
>> Cheers,
>> Lars
>> 
> 
> 
> error: pathspec '4' did not match any file(s) known to git.
> 
> not ok 46 - submodule update clone shallow submodule
> 
> #
> # test_when_finished "rm -rf super3" &&
> # first=$(git -C cloned submodule status submodule |cut -c2-41) &&
> # second=$(git -C submodule rev-parse HEAD) &&
> # commit_count=$(git -C submodule rev-list $first^..$second | wc -l) &&
> # git clone cloned super3 &&
> # pwd=$(pwd) &&
> # (
> # cd super3 &&
> # sed -e "s#url = ../#url = file://$pwd/#" <.gitmodules >.gitmodules.tmp &&
> # mv -f .gitmodules.tmp .gitmodules &&
> # git submodule update --init --depth=$commit_count &&
> # test 1 = $(git -C submodule log --oneline | wc -l)
> # )
> #
> 
> 
> Is it possible that the "wc -l" produces  SP  on OSX,
> such that the
> 
> # git submodule update --init --depth=$commit_count
> 
> contains "--depth= 4" which means empty depth and 4 as the pathspec
> for the update command?

Consider this:

~code/git git:(master) ▶ ls | wc -l
 747

Apparently `wc -l` adds 5 spaces on OS X...

- Lars

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 19:17, Junio C Hamano  wrote:
> 
> Jeff King  writes:
> 
>> On Wed, Aug 10, 2016 at 03:36:45PM +0200, Lars Schneider wrote:
>> 
 So now we have packet_write() and packet_write_gently(), but they differ
 in more than just whether they are gentle. That seems like a weird
 interface.
 
 Should we either be picking a new name (e.g., packet_write_mem() or
 something), or migrating packet_write() to packet_write_fmt()?
>>> 
>>> Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to 
>>> packet_write_fmt()"
>> 
>> Ah, OK. Generally I'd suggest to reorder things so that each patch looks
>> like a step forward (and so the early patches become preparatory steps,
>> and the justification in them is something like "we're going to add more
>> write functions, so let's give this a more descriptive name").
> 
> I am guilty for saying "packet_write() should have been similar to
> write(2)".  We may want to have a time-period during which there is
> no "packet_write()" in the codebase, before we get to that stage.
> I.e. rename it to packet_write_fmt() to vacate the name and add
> packet_write_mem(), and then later rename packet_write_mem() to its
> final name packet_write(), or something like that.  The two-step
> process would reduce the chance of misconversion.

OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
rename as is in this series?

Thanks,
Lars

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Junio C Hamano
Lars Schneider  writes:

>> On 10 Aug 2016, at 19:17, Junio C Hamano  wrote:
>> 
> OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
> rename as is in this series?

I didn't really check what order you are doing things to answer
that.

If the function that is introduced in this step is a version of
packet_write_fmt() that does its thing only gently, you would want
to do the rename s/packet_write/packet_write_fmt/ before this step,
and then add the new function as packet_write_fmt_gently(), I would
think.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 20:21, Junio C Hamano  wrote:
> 
> Lars Schneider  writes:
> 
>>> On 10 Aug 2016, at 19:17, Junio C Hamano  wrote:
>>> 
>> OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
>> rename as is in this series?
> 
> I didn't really check what order you are doing things to answer
> that.
> 
> If the function that is introduced in this step is a version of
> packet_write_fmt() that does its thing only gently, you would want
> to do the rename s/packet_write/packet_write_fmt/ before this step,
> and then add the new function as packet_write_fmt_gently(), I would
> think.

OK - will fix. I did it that way because I thought it would be easier
if we decide to drop the big rename patch.

Thanks,
Lars

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-08-10 Thread Stephen Morton
On Mon, Aug 1, 2016 at 5:12 AM, Johannes Schindelin 
 wrote:

Hi Stephen,

On Wed, 27 Jul  2016, Stephen Morton wrote:


On Wed, Jul  27, 2016 at 11:03 AM, Johannes Schindelin
  wrote:
>
> On Wed,  27 Jul 2016, Stephen Morton wrote:
>
>>  diff --git a/sequencer.c b/sequencer.c
>>  index cdfac82..ce06876 100644
>> ---  a/sequencer.c
>> +++  b/sequencer.c
>> @@  -176,7 +176,8 @@ static void print_advice(int show_hint, struct
>>  replay_opts *opts)
>> else
>> advise(_("after resolving the conflicts, mark
>> the  corrected paths\n"
>>  "with 'git add ' or 'git rm '\n"
>> -  "and commit the result with 'git commit'"));
>> +  "then continue the %s with 'git %s
>>  --continue'\n"
>> +  "or cancel the %s operation with 'git
>> %s  --abort'" ),  action_name(opts), action_name(opts),
>>  action_name(opts), action_name(opts));
>
> That is  an awful lot of repetition right there, with an added
>  inconsistency that the action is referred to by its name alone in the
>  "--continue" case, but with "operation" added in the "--abort" case.
>
> And  additionally, in the most common case (one commit to cherry-pick), the
> advice  now suggests a more complicated operation than necessary: a simply
> `git  commit` would be enough, then.
>
> Can't  we have a test whether this is the last of the commits to be
>  cherry-picked, and if so, have the simpler advice again?

Ok, knowing  that I'm not on the last element of the sequencer is
beyond my  git code knowledge.


Oh, my mistake:  I meant to say that this information could be easily
provided by  `pick_commits()` if it passed it to `print_advice()` via
 `do_pick_commit()`.

Ciao,
Johannes


Formatting on previous email was terrible, plus the diff wasn't 
performed against origin. Re-sending.


(Finally getting back to this.)

Something like the diff below, then Johannes?

(I intentionally print the '--continue' hint even in the case whereit's 
last of n commits that fails.)



Stephen

~/ws/extern/git (maint *%>) > git diff @{u}
diff --git a/sequencer.c b/sequencer.c
index c6362d6..e0071aa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -154,7 +154,7 @@ static void free_message(struct commit *commit, 
struct commit_message *msg)

unuse_commit_buffer(commit, msg->message);
 }

-static void print_advice(int show_hint, struct replay_opts *opts)
+static void print_advice(int show_hint, int multiple_commits, struct 
replay_opts *opts)

 {
char *msg = getenv("GIT_CHERRY_PICK_HELP");

@@ -174,9 +174,14 @@ static void print_advice(int show_hint, struct 
replay_opts *opts)
advise(_("after resolving the conflicts, mark 
the corrected paths\n"
 "with 'git add ' or 'git rm 
'"));

else
-   advise(_("after resolving the conflicts, mark 
the corrected paths\n"
-"with 'git add ' or 'git rm 
'\n"
-"and commit the result with 'git 
commit'"));

+if  (multiple_commits)
+   advise(_("after resolving the conflicts, 
mark the corrected paths with 'git add ' or 'git rm '\n"
+"then continue with 'git %s 
--continue'\n"
+"or cancel with 'git %s 
--abort'" ), action_name(opts), action_name(opts));

+else
+advise(_("after resolving the 
conflicts, mark the corrected paths\n"
+"with 'git add ' or 'git 
rm '\n"
+"and commit the result with 
'git commit'"));

}
 }

@@ -440,7 +445,7 @@ static int allow_empty(struct replay_opts *opts, 
struct commit *commit)

return 1;
 }

-static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
+static int do_pick_commit(struct commit *commit, struct replay_opts 
*opts, int multiple_commits)

 {
unsigned char head[20];
struct commit *base, *next, *parent;
@@ -595,7 +600,7 @@ static int do_pick_commit(struct commit *commit, 
struct replay_opts *opts)

  : _("could not apply %s... %s"),
  find_unique_abbrev(commit->object.oid.hash, 
DEFAULT_ABBREV),

  msg.subject);
-   print_advice(res == 1, opts);
+   print_advice(res == 1, multiple_commits, opts);
rerere(opts->allow_rerere_auto);
goto leave;
}
@@ -959,6 +964,7 @@ static int pick_commits(struct commit_list 
*todo_list, struct replay_opts *opts)

 {
struct commit_list *cur;
int res;
+int multiple_commits = (todo_list->next) != NULL;

setenv(GIT_REFLOG_ACTION, action_name(opts), 0);

Re: [PATCH 8/8] diff: improve positioning of add/delete blocks in diffs

2016-08-10 Thread Michael Haggerty
On 08/04/2016 09:39 PM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
 +   }
 +   /*
 +* We have reached the end of the line without finding any 
 non-space
 +* characters; i.e., the whole line consists of trailing spaces, 
 which we
 +* are not interested in.
 +*/
 +   return -1;
> 
> Not related to Jacob's review, but "the whole line consists of
> trailing spaces" made me read it twice; while it is technically
> correct, "the whole line consists of spaces", or even "this is a
> blank line", would read a lot more easily, at least for me.

Hehe, yes, ETOOMANYWORDS.

>> I was implicitly assuming that such lines would have text somewhere
>> after those 200 spaces (or 25 TABs or whatever). But you're right, the
>> line could consist only of whitespace. Unfortunately, the only way to
>> distinguish these two cases is to read the rest of the line, which is
>> exactly what we *don't* want to do.
> 
> Hmm, why is it exactly what we don't want to do?  Is it a
> performance concern?  In other words, is it because this function is
> called many times to measure the same line multiple times?

Yes, performance is the reason, and especially the desire to avoid
unreasonable runtimes for pathological cases. Thanks for asking, though,
because it's worthwhile to look into this more rigorously.

Suppose there is a slider that can be shifted to any of `numshift`
positions. Then we have to call `measure_split()` `2*numshift` times
(once for the beginning and once for the end of each candidate slider
position).

Suppose there are `numblanks` blank lines in the neighborhood of the
slider. Each time we call `measure_split()`, we count the number of
blank lines before and after the proposed split position. So we end up
calling `get_indent()` `2*numshift*numblanks` times.

Finally, suppose that the blank lines each contain `numws` whitespace
characters. Then each call to `get_indent()` has to do `O(numws)` work.

So altogether, if there were no limits, then the amount of work to
position a slider would scale like

O(numshift * numblanks * numws)

However, the total number of characters in the file might only be

O(numblanks * numws)

So without limits, the amount of work to position sliders could scale by
numshift times the size of the file.

The worst case is pretty easy to achieve. Just create a file consisting
of a million or so LF characters, then add another LF to it. The diff
would be a slider with

numshift = 100
numblanks = 100
numws = 1

so the heuristic would take O(N^2) in the size of the file.

Effectively the limits cap the effective `numblanks` at `2*MAX_BLANKS`
(which is 2*20) and the effective `numws` at `MAX_INDENT` (which is
200), meaning that the maximum effort scales like

numshift * 2*20 * 200

which is still a big number but not absurd. Empirically, for the case
described above, `git diff` takes 104 ms and `git diff
--indent-heuristic` takes 490 ms. I think that's not prohibitive for a
pathological case.

Meanwhile, Myers's algorithm scales like O(ND), where N is the number of
lines and D is the edit distance, so I suppose that it is already
possible to find diffs that are intractable to compute.

> After
> all, somebody in this file is already scanning each and every line
> to see where it ends to split the input into records, so perhaps a
> "right" (if the "theoretical correctness" of the return value from
> this function mattered, which you wave-away below) optimization
> could be to precompute it while the lines are broken into records
> and store it in the "rec" structure?

That would certainly be possible, and would help in cases where there
are a lot of lines with lots of leading whitespace. You could get nearly
the same benefit by recording a single bit in struct rec, indicating
whether the line is blank or not.

But it wouldn't help the worst case described above, where each call to
`git_indent()` is already very cheap. And I didn't think it was worth
allocating the extra memory to optimize this heuristic

* which isn't used all that often in the first place,
* which (for normal inputs) doesn't take a significant amount of time, and
* when the optimization doesn't improve the worst-case scenario (and
thus any DoS potential)

I think the only way to ensure O(size_of_file) runtime in the above case
would be to record, along with each line, how many blank lines
immediately precede and succeed it. You could achieve something like
O(size_of_file lg(size_of_file)) by storing, e.g., the total number of
nonblank lines that precede each line and doing a binary search to find
the nearest non-blank line.

>> But I think it doesn't matter anyway. Such "text" will likely never be
>> read by a human, so it's not a big deal if the slider position is not
>> picked perfectly. And remember, this whole saga is just to improve the
>> aesthetics of the diff. The diff is *correct* 

[PATCH v5 10/15] convert: quote filter names in error messages

2016-08-10 Thread larsxschneider
From: Lars Schneider 

Git filter driver commands with spaces (e.g. `filter.sh foo`) are hard to
read in error messages. Quote them to improve the readability.

Signed-off-by: Lars Schneider 
---
 convert.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index b1614bf..522e2c5 100644
--- a/convert.c
+++ b/convert.c
@@ -397,7 +397,7 @@ static int filter_buffer_or_fd(int in, int out, void *data)
child_process.out = out;
 
if (start_command(_process))
-   return error("cannot fork to run external filter %s", 
params->cmd);
+   return error("cannot fork to run external filter '%s'", 
params->cmd);
 
sigchain_push(SIGPIPE, SIG_IGN);
 
@@ -415,13 +415,13 @@ static int filter_buffer_or_fd(int in, int out, void 
*data)
if (close(child_process.in))
write_err = 1;
if (write_err)
-   error("cannot feed the input to external filter %s", 
params->cmd);
+   error("cannot feed the input to external filter '%s'", 
params->cmd);
 
sigchain_pop(SIGPIPE);
 
status = finish_command(_process);
if (status)
-   error("external filter %s failed %d", params->cmd, status);
+   error("external filter '%s' failed %d", params->cmd, status);
 
strbuf_release();
return (write_err || status);
@@ -462,15 +462,15 @@ static int apply_filter(const char *path, const char 
*src, size_t len, int fd,
return 0;   /* error was already reported */
 
if (strbuf_read(, async.out, len) < 0) {
-   error("read from external filter %s failed", cmd);
+   error("read from external filter '%s' failed", cmd);
ret = 0;
}
if (close(async.out)) {
-   error("read from external filter %s failed", cmd);
+   error("read from external filter '%s' failed", cmd);
ret = 0;
}
if (finish_async()) {
-   error("external filter %s failed", cmd);
+   error("external filter '%s' failed", cmd);
ret = 0;
}
 
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 06/15] pkt-line: add packet_flush_gently()

2016-08-10 Thread larsxschneider
From: Lars Schneider 

packet_flush() would die in case of a write error even though for some callers
an error would be acceptable. Add packet_flush_gently() which writes a pkt-line
flush packet and returns `0` for success and `-1` for failure.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 6 ++
 pkt-line.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index a8207dd..e6a0924 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -92,6 +92,12 @@ void packet_flush(int fd)
write_or_die(fd, "", 4);
 }
 
+int packet_flush_gently(int fd)
+{
+   packet_trace("", 4, 1);
+   return (write_in_full(fd, "", 4) == 4 ? 0 : -1);
+}
+
 void packet_buf_flush(struct strbuf *buf)
 {
packet_trace("", 4, 1);
diff --git a/pkt-line.h b/pkt-line.h
index 82676f4..b6c8bcd 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -23,6 +23,7 @@ void packet_flush(int fd);
 void packet_write(int fd, const char *fmt, ...) __attribute__((format (printf, 
2, 3)));
 void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int packet_flush_gently(int fd);
 int packet_write_gently_fmt(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 
 /*
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] merge: use string_list_split() in add_strategies()

2016-08-10 Thread Johannes Schindelin
Hi Junio,

On Mon, 8 Aug 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > I wonder, however, if we could somhow turn things around by
> > introducing something like
> >
> > split_and_do_for_each(item_p, length, string, delimiter)
> > ...  ...
> >
> > that both string_list_split() *and* add_strategies() could use? We
> > would then be able to avoid allocating the list and duplicating the
> > items in the latter case.
> 
> I do think such a feature may be useful if we often work on pieces of a
> string delimited by a delimiter, but if the caller does not see the
> split result, then the function with "split" is probably misnamed.
> 
> I however suspect the variant of this where "delimiter" can just be a
> single byte would not be so useful.
> 
> If the input comes from the end user, we certainly would want to allow
> "word1  word2\tword3 " as input (i.e. squishing repeated delimiters into
> one without introducing an "empty" element, allowing more than one
> delimiter characters like SP and HT, and ignoring leading or trailing
> runs of delimiter characters).
> 
> If the input is generated internally, perhaps we should rethink the
> interface between the function that wants to do the for-each-word and
> its caller; if the caller wants to pass multiple things to the callee,
> it should be able to do so without first having to stuff these multiple
> things into a single string, only to force the callee to use this helper
> to split them out into individual pieces.

All true, but I guess this type of complexity would really complexify
René's patch too much, so I am comfortable with the patch as-is.

Ciao,
Dscho

Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 04:10:02PM +0200, Lars Schneider wrote:

> 
> > On 10 Aug 2016, at 15:43, Jeff King  wrote:
> > 
> > On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschnei...@gmail.com wrote:
> > 
> >> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
> >> +{
> >> +  static struct strbuf buf = STRBUF_INIT;
> >> +  va_list args;
> >> +
> >> +  strbuf_reset();
> >> +  va_start(args, fmt);
> >> +  format_packet(1, , fmt, args);
> >> +  va_end(args);
> >> +  packet_trace(buf.buf + 4, buf.len - 4, 1);
> >> +  return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
> >> +}
> > 
> > Could the end of this function just be:
> > 
> >  return packet_write_gently(fd, buf.buf, buf.len);
> > 
> > ? I guess we'd prefer to avoid that, because it incurs an extra
> > memmove() of the data.
> 
> I don't think the memmove would be that expensive. However, format_packet()
> already creates the packet_header and packet_write_gently would do the same
> again, no?

Yeah, I think you would want extra refactoring to have a shared common
function. I took a stab at it, but the result ends up pretty ugly; the
amount of boilerplate exceeds the duplication here (the really nasty
thing is that format_packet() is hard to split up, because the part you
want to switch out is in the middle, but it needs to keep some context
between the start and the end. In a higher level language you'd pass it
a callback to fill in the strbuf in the middle, but in C that just ends
up horrible).

> > Similarly, I'd think this could share code with the non-gentle form
> > (which should be able to just call this and die() if returns an error).
> > Though sometimes the va_list transformation makes that awkward.
> 
> Yeah, the code duplication annoyed me, too. va_list was the reason I did it
> that way. Do you think that is something that needs to be addressed in the
> series?

No, I don't think it needs to be. It's just a case of making sure that
the internals don't grow too crufty and unmanageable for future
maintainability.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git cherry-pick conflict error message is deceptive when cherry-picking multiple commits

2016-08-10 Thread Stephen Morton
On Mon, Aug 1, 2016 at 5:12 AM, Johannes Schindelin
 wrote:
> Hi Stephen,
>
> On Wed, 27 Jul 2016, Stephen Morton wrote:
>
>> On Wed, Jul 27, 2016 at 11:03 AM, Johannes Schindelin
>>  wrote:
>> >
>> > On Wed, 27 Jul 2016, Stephen Morton wrote:
>> >
>> >> diff --git a/sequencer.c b/sequencer.c
>> >> index cdfac82..ce06876 100644
>> >> --- a/sequencer.c
>> >> +++ b/sequencer.c
>> >> @@ -176,7 +176,8 @@ static void print_advice(int show_hint, struct
>> >> replay_opts *opts)
>> >> else
>> >> advise(_("after resolving the conflicts, mark
>> >> the corrected paths\n"
>> >>  "with 'git add ' or 'git rm 
>> >> '\n"
>> >> -"and commit the result with 'git 
>> >> commit'"));
>> >> +"then continue the %s with 'git %s
>> >> --continue'\n"
>> >> +"or cancel the %s operation with 'git
>> >> %s --abort'" ),  action_name(opts), action_name(opts),
>> >> action_name(opts), action_name(opts));
>> >
>> > That is an awful lot of repetition right there, with an added
>> > inconsistency that the action is referred to by its name alone in the
>> > "--continue" case, but with "operation" added in the "--abort" case.
>> >
>> > And additionally, in the most common case (one commit to cherry-pick), the
>> > advice now suggests a more complicated operation than necessary: a simply
>> > `git commit` would be enough, then.
>> >
>> > Can't we have a test whether this is the last of the commits to be
>> > cherry-picked, and if so, have the simpler advice again?
>>
>> Ok, knowing that I'm not on the last element of the sequencer is
>> beyond my git code knowledge.
>
> Oh, my mistake: I meant to say that this information could be easily
> provided by `pick_commits()` if it passed it to `print_advice()` via
> `do_pick_commit()`.
>
> Ciao,
> Johannes

(Finally getting back to this.)
Something like this, then Johannes?
(I intentionally print the '--continue' hint even in the case where
it's last of n commits that fails.)

~/ws/extern/git (maint *%>) > git diff
diff --git a/sequencer.c b/sequencer.c
index 617a3df..e0071aa 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -154,7 +154,7 @@ static void free_message(struct commit *commit,
struct commit_message *msg)
unuse_commit_buffer(commit, msg->message);
 }

-static void print_advice(int show_hint, struct replay_opts *opts)
+static void print_advice(int show_hint, int multiple_commits, struct
replay_opts *opts)
 {
char *msg = getenv("GIT_CHERRY_PICK_HELP");

@@ -174,14 +174,14 @@ static void print_advice(int show_hint, struct
replay_opts *opts)
advise(_("after resolving the conflicts, mark
the corrected paths\n"
 "with 'git add ' or 'git rm '"));
else
-if  (! file_exists(git_path_seq_dir()))
-advise(_("after resolving the
conflicts, mark the corrected paths\n"
-"with 'git add ' or
'git rm '\n"
- "and commit the
result with 'git commit'"));
-else
+if  (multiple_commits)
advise(_("after resolving the
conflicts, mark the corrected paths with 'git add ' or 'git rm
'\n"
 "then continue with 'git %s
--continue'\n"
 "or cancel with 'git %s
--abort'" ), action_name(opts), action_name(opts));
+else
+advise(_("after resolving the
conflicts, mark the corrected paths\n"
+"with 'git add ' or
'git rm '\n"
+"and commit the result with
'git commit'"));
}
 }

@@ -445,7 +445,7 @@ static int allow_empty(struct replay_opts *opts,
struct commit *commit)
return 1;
 }

-static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
+static int do_pick_commit(struct commit *commit, struct replay_opts
*opts, int multiple_commits)
 {
unsigned char head[20];
struct commit *base, *next, *parent;
@@ -600,7 +600,7 @@ static int do_pick_commit(struct commit *commit,
struct replay_opts *opts)
  : _("could not apply %s... %s"),
  find_unique_abbrev(commit->object.oid.hash,
DEFAULT_ABBREV),
  msg.subject);
-   print_advice(res == 1, opts);
+   print_advice(res == 1, multiple_commits, opts);
rerere(opts->allow_rerere_auto);
goto leave;
}
@@ -964,6 +964,7 @@ static int pick_commits(struct commit_list
*todo_list, struct replay_opts *opts)
 {
struct commit_list *cur;
int res;
+int multiple_commits = 

Re: [PATCH 8/8] diff: improve positioning of add/delete blocks in diffs

2016-08-10 Thread Michael Haggerty
On 08/04/2016 02:04 AM, Stefan Beller wrote:
> On Wed, Aug 3, 2016 at 4:30 PM, Michael Haggerty  wrote:
>> Stefan Beller wrote:
>>> [...]
>>> Rather the 10 describes the ratio of "advanced magic" to pure indentation
>>> based scoring in my understanding.
>>
>> No, it's basically just a number against which the other constants are
>> compared. E.g., if another bonus wants to balance out against exactly
>> one space of indentation, its constant needs to be 10. If it wants to
>> balance out against exactly 5 spaces, its constant needs to be 50. Etc.
> 
> So another interpretation is that the 10 gives the resolution for all other
> constants, i.e. if we keep 10, then we can only give weights in 1/10 of
> "one indent". But the "ideal" weight may not be a multiple of 1/10,
> so we approximate them to the nearest multiple of 1/10.
> 
> If we were to use 1000 here, we could have a higher accuracy of the
> other constants, but probably we do not care about the 3rd decimal place
> for these because they are created heuristically from a corpus that may
> not warrant a precision of constants with a 3rd decimal place.

Not only that. Since all of the inputs to the heuristic are integers,
the outputs are discontinuous and can take only certain discrete values.
And the outputs are only ever compared against one another. So a small
adjustment to the output will only make a difference if it causes the
value to become larger/smaller than another of the possible output
values. So too high a resolution makes no sense at all.

That being said, I didn't actually check in any systematic way whether
the resolution of 10:1 is high enough in practice. But I can say that
the overall performance of the heuristic (in terms of number of errors
counted) remained constant over a relatively large parameter range, so I
think the resolution is sufficient.

Feel free to play with the parameters and/or other heuristics. The tools
and raw data are all published in [1]. Let me know if you need help
getting started.

Michael

[1] https://github.com/mhagger/diff-slider-tools

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] xdl_change_compact(): fix compaction heuristic to adjust io

2016-08-10 Thread Michael Haggerty
On 08/10/2016 06:58 PM, Michael Haggerty wrote:
> On 08/04/2016 09:27 AM, Jeff King wrote:
>> On Thu, Aug 04, 2016 at 12:00:33AM +0200, Michael Haggerty wrote:
>>
>>> The code branch used for the compaction heuristic incorrectly forgot to
>>> keep io in sync while the group was shifted. I think that could have
>>> led to reading past the end of the rchgo array.
>>>
>>> Signed-off-by: Michael Haggerty 
>>> ---
>>> I didn't actually try to verify the presence of a bug, because it
>>> seems like more work than worthwhile. But here is my reasoning:
>>>
>>> If io is not decremented correctly during one iteration of the outer
>>> `while` loop, then it will loose sync with the `end` counter. In
>>> particular it will be too large.
>>>
>>> Suppose that the next iterations of the outer `while` loop (i.e.,
>>> processing the next block of add/delete lines) don't have any sliders.
>>> Then the `io` counter would be incremented by the number of
>>> non-changed lines in xdf, which is the same as the number of
>>> non-changed lines in xdfo that *should have* followed the group that
>>> experienced the malfunction. But since `io` was too large at the end
>>> of that iteration, it will be incremented past the end of the
>>> xdfo->rchg array, and will try to read that memory illegally.
>>
>> Hmm. In the loop:
>>
>>   while (rchgo[io])
>>  io++;
>>
>> that implies that rchgo has a zero-marker that we can rely on hitting.
> 
> I agree.
> 
>> And it looks like rchgo[io] always ends the loop on a 0. So it seems
>> like we would just hit that condition again.
> 
> Correct...in this loop. But there is another place where `io` is
> incremented unconditionally. In the version before my changes, it is via
> the preincrement operator in this while statement conditional:
> 
> https://github.com/mhagger/git/blob/a28705da929ad746abcb34270947f738549d3246/xdiff/xdiffi.c#L502
> 
> After my changes, the unconditional increment is more obvious because I
> took it out of the while condition:
> 
> https://github.com/mhagger/git/blob/39a135da93834fd72ee923d95d0cebfe525dfe7a/xdiff/xdiffi.c#L541
> 
> (BTW, I think this is a good example of how patch 2/8 makes the code
> easier to reason about.)

Actually, for the case that no more sliders are found in the file, the
key lines where io is incremented unconditionally are

https://github.com/mhagger/git/blob/a28705da929ad746abcb34270947f738549d3246/xdiff/xdiffi.c#L438

before the change (note that the post-increment happens even if the
while condition returns false), and

https://github.com/mhagger/git/blob/39a135da93834fd72ee923d95d0cebfe525dfe7a/xdiff/xdiffi.c#L443-L444

after the change. (The lines I mentioned in my previous email are also
unconditional increments, but those are only executed in the case that
more sliders are found.)

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Spelling fixes

2016-08-10 Thread Junio C Hamano
Johannes Sixt  writes:

> Am 09.08.2016 um 20:19 schrieb Junio C Hamano:
>>  
>> 
>> accidentlyaccidentally
>> commited  committed
>> dependancydependency
>> emtpy empty
>> existance existence
>> explicitely   explicitly
>> git-upload-achive git-upload-archive
>> hierachy  hierarchy
>> intialinitial
>> mulitple  multiple
>> non-existant  non-existent
>> precendence.  precedence.
>> priviledged   privileged
>> programatically   programmatically
>> psuedo-binary pseudo-binary
>> soemwhere somewhere
>> successfull   successful
>> unkownunknown
>> usefull   useful
>> writting  writing
>
> This reminds me of one occurrence of "indegee" in commit-slab.h that
> should be "indegree". The word is not in this list above, so I don't
> know whether you would like to have it fixed in the same patch or not...
>
> (BTW, is that word "in-degree" or "ingredient"?)

IIRC it is the former, to count "how many incoming arcs does this
node have?"

Will queue to squash in.  Thanks.

> -- Hannes
>
> diff --git a/commit-slab.h b/commit-slab.h
> --- a/commit-slab.h
> +++ b/commit-slab.h
> @@ -8,7 +8,7 @@
>   *
>   * After including this header file, using:
>   *
> - * define_commit_slab(indegee, int);
> + * define_commit_slab(indegree, int);
>   *
>   * will let you call the following functions:
>   *
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 33/40] environment: add set_index_file()

2016-08-10 Thread Junio C Hamano
Christian Couder  writes:

>> Isn't the mention on NO_THE_INDEX_COMPATIBILITY_MACROS in the added
>> comments (there are two) pure red-herring?
>
> Yeah, true.
>
> So do you want me to refactor the code to use
> hold_lock_file_for_update() instead of hold_locked_index() and to
> avoid the set_index_file() hack?

I somehow thought that we already agreed to accept the technical
debt, taking your "it is too much work" assessment at the face
value.  If you now think it is feasible within the scope of the
series, of course we'd prefer it done "right" ;-)

> Or would the set_index_file() hack be ok with a commit message like
> the following:
>
> ---
> Introduce set_index_file() to be able to temporarily change the
> index file.
>
> Yeah, this is a short cut and this new function should not be used
> by other code.
>
> It adds a small technical debt, that could perhaps be avoided with a
> refactoring and by using hold_lock_file_for_update() instead of
> hold_locked_index() to pass the filename where the index should be
> written.

Is the problematic hold_locked_index() something you do yourself, or
buried deep inside the callchain of a helper function you use?  I am
sensing that it is the latter (otherwise you wouldn't be doing the
hack or at least will trivially fixing it up in a later "now we did
a faithful conversion from the previous code using GIT_INDEX_FILE
enviornment variable in an earlier step. Let's clean it up by being
in full control of what file we read from and write to" step in the
series).

Do you also have an issue on the reading side (i.e. you write it out
to temporary file and then later re-read the in-core cache from that
temporary file, for example)?  Or is a single "write to a temporary
index" the only thing you need to work around?  

The "Yeah, this is a short cut ..." admission of guilt is much much
less interesting than showing later people a way forward when they
help us by cleaning up the mess we decided to leave behind for
expediency.  I am not seeing that "here are the callchains that need
to be proper refactored, if we want to avoid this hack" yet; but
that is what would help them.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Aug 2016, #03; Mon, 8)

2016-08-10 Thread Lars Schneider

> 
> * sb/submodule-update-dot-branch (2016-08-03) 7 commits
>  (merged to 'next' on 2016-08-04 at 47bff41)
> + submodule update: allow '.' for branch value
> + submodule--helper: add remote-branch helper
> + submodule-config: keep configured branch around
> + submodule--helper: fix usage string for relative-path
> + submodule update: narrow scope of local variable
> + submodule update: respect depth in subsequent fetches
> + t7406: future proof tests with hard coded depth
> 
> A few updates to "git submodule update".
> 
> Will merge to 'master'.

I think "t7406: future proof tests with hard coded depth"
breaks the tests on OSX:

https://travis-ci.org/git/git/jobs/150779244

Cheers,
Lars

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:59:19PM +0200, Lars Schneider wrote:

> > It does still feel a little weird that you cannot tell the difference
> > between a write() error and bad input. Because you really might want to
> > do something different between the two. Like:
> > 
> >  #define MAX_FILENAME (PKTLINE_DATA_MAXLEN - strlen("filename"))
> > 
> >  if (filename > MAX_FILENAME) {
> > warning("woah, that name is ridiculous; truncating");
> > ret = packet_write_fmt_gently(fd, "%.*s", MAX_FILENAME, filename);
> >  } else
> > ret = packet_write_fmt_gently(fd, "%s", filename);
> 
> 
> I can do that. However, I wouldn't truncate the filename as this
> might create a weird outcome. I would just let the filter fail.

Yeah, I think that is probably fine (I don't have a real opinion for
this particular case, but was mostly just trying to think about whether
the pktline interface was suitably flexible).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Aug 2016, #03; Mon, 8)

2016-08-10 Thread Johannes Schindelin
Hi Elijah,

On Tue, 9 Aug 2016, Elijah Newren wrote:

> [... detailed review ...]
>
> So, I think the series looks good.

Thank you very much for this quite thorough review!

> Sorry that I didn't spot any errors.

That's quite alright with me ;-)

Thanks again!
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschnei...@gmail.com wrote:

> From: Lars Schneider 
> 
> The packet_trace() call is not ideal in format_packet() as we would print
> a trace when a packet is formatted and (potentially) when the packet is
> actually send. This was no problem up until now because format_packet()
> was only used by one function. Fix it by moving the trace call into the
> function that actually sends the packet.

It looks like there are two functions: packet_write() and
packet_buf_write().

Your patch only touches one of them, and it looks like we would fail to
trace many packets (e.g., see receive-pack.c:report(), which uses
packet_buf_write() and then write()s out the result).

-Peff

PS Also, s/send/sent/ in the commit message.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschnei...@gmail.com wrote:

> From: Lars Schneider 
> 
> format_packet() dies if the caller wants to format a packet larger than
> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.

I am not sure I agree here. Certainly I see the usefulness of gently
handling a failure to write(). But if you are passing in too-large
buffers, isn't that a bug in the program?

How would you recover, except by splitting up the content? That might
not be possible depending on how you are using the pkt-lines. And even
if it is, wouldn't it be simpler to split it up before sending it to
format_packet()?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:51:35PM +0200, Lars Schneider wrote:

> I guess my point is that I stumbled over the un-intutiive format_packet() 
> behavior
> and I wanted to improve the situation in a way that others don't run into this
> trap. If you think that is no issue then it would be OK for me if we leave the
> current behavior as is.

I don't think the behavior as-is is a problem, and it would remain OK as
long as no callers are added who format a packet but don't write it (or
don't write it in a timely manner).

But most importantly, if you are going to refactor code, you can't
regress the existing callers. So even if we did want to change how this
worked, this patch is not acceptable as-is; it would need to fix up all
of the callers of packet_buf_write().

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 19:18, Junio C Hamano  wrote:
> 
> Jeff King  writes:
> 
>> On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschnei...@gmail.com wrote:
>> 
>>> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
>>> +{
>>> +   static struct strbuf buf = STRBUF_INIT;
>>> +   va_list args;
>>> +
>>> +   strbuf_reset();
>>> +   va_start(args, fmt);
>>> +   format_packet(1, , fmt, args);
>>> +   va_end(args);
>>> +   packet_trace(buf.buf + 4, buf.len - 4, 1);
>>> +   return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>>> +}
>> 
>> Could the end of this function just be:
>> 
>>  return packet_write_gently(fd, buf.buf, buf.len);
>> 
>> ? I guess we'd prefer to avoid that, because it incurs an extra
>> memmove() of the data.
>> 
>> Similarly, I'd think this could share code with the non-gentle form
>> (which should be able to just call this and die() if returns an error).
>> Though sometimes the va_list transformation makes that awkward.
> 
> Yes.

Peff just posted that he tried the shared code idea but the result
ended up ugly.


> Also regarding the naming, please have "_gently" at the end; that is
> how all other function families with _gently variant are named, I
> think.

OK, I will rename them.

Thanks,
Lars
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Spelling fixes

2016-08-10 Thread Junio C Hamano
Ville Skyttä  writes:

> On Tue, Aug 9, 2016 at 9:19 PM, Junio C Hamano  wrote:
> [...]
>> There are two "commited" you seem to have missed, though,
>>
>> t/t3420-rebase-autostash.sh:echo uncommited-content >file0 &&
>> t/t3420-rebase-autostash.sh:echo uncommited-content >expected &&
>>
>> which I'll queue on top of this patch to be later squashed (i.e. no
>> need to resend the whole thing only to add these two).
>
> Thanks. https://github.com/client9/misspell/pull/61 :)
>
> Also, there's SSTATE_TRANSFERING->SSTATE_TRANSFERRING in
> transport-helper.c, maybe you can squash that one in as well if you
> think it's fine?

One comment that refers to that token spells it correctly, which is
funny ;-)

Here is a squashable change I've collected so far.

Subject: [PATCH] SQUASH???

 
indegee   indegree
transfering   transferring
uncommiteduncommitted
---
 commit-slab.h   | 2 +-
 t/t3420-rebase-autostash.sh | 4 ++--
 transport-helper.c  | 8 
 3 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/commit-slab.h b/commit-slab.h
index f84b449..be16da7 100644
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -8,7 +8,7 @@
  *
  * After including this header file, using:
  *
- * define_commit_slab(indegee, int);
+ * define_commit_slab(indegree, int);
  *
  * will let you call the following functions:
  *
diff --git a/t/t3420-rebase-autostash.sh b/t/t3420-rebase-autostash.sh
index 532ff5c..ab8a63e 100755
--- a/t/t3420-rebase-autostash.sh
+++ b/t/t3420-rebase-autostash.sh
@@ -179,7 +179,7 @@ testrebase " --interactive" .git/rebase-merge
 
 test_expect_success 'abort rebase -i with --autostash' '
test_when_finished "git reset --hard" &&
-   echo uncommited-content >file0 &&
+   echo uncommitted-content >file0 &&
(
write_script abort-editor.sh <<-\EOF &&
echo >"$1"
@@ -188,7 +188,7 @@ test_expect_success 'abort rebase -i with --autostash' '
test_must_fail git rebase -i --autostash HEAD^ &&
rm -f abort-editor.sh
) &&
-   echo uncommited-content >expected &&
+   echo uncommitted-content >expected &&
test_cmp expected file0
 '
 
diff --git a/transport-helper.c b/transport-helper.c
index 4208743..db2f930 100644
--- a/transport-helper.c
+++ b/transport-helper.c
@@ -1103,7 +1103,7 @@ static void transfer_debug(const char *fmt, ...)
 }
 
 /* Stream state: More data may be coming in this direction. */
-#define SSTATE_TRANSFERING 0
+#define SSTATE_TRANSFERRING 0
 /*
  * Stream state: No more data coming in this direction, flushing rest of
  * data.
@@ -1112,7 +1112,7 @@ static void transfer_debug(const char *fmt, ...)
 /* Stream state: Transfer in this direction finished. */
 #define SSTATE_FINISHED 2
 
-#define STATE_NEEDS_READING(state) ((state) <= SSTATE_TRANSFERING)
+#define STATE_NEEDS_READING(state) ((state) <= SSTATE_TRANSFERRING)
 #define STATE_NEEDS_WRITING(state) ((state) <= SSTATE_FLUSHING)
 #define STATE_NEEDS_CLOSING(state) ((state) == SSTATE_FLUSHING)
 
@@ -1369,7 +1369,7 @@ int bidirectional_transfer_loop(int input, int output)
state.ptg.dest = 1;
state.ptg.src_is_sock = (input == output);
state.ptg.dest_is_sock = 0;
-   state.ptg.state = SSTATE_TRANSFERING;
+   state.ptg.state = SSTATE_TRANSFERRING;
state.ptg.bufuse = 0;
state.ptg.src_name = "remote input";
state.ptg.dest_name = "stdout";
@@ -1378,7 +1378,7 @@ int bidirectional_transfer_loop(int input, int output)
state.gtp.dest = output;
state.gtp.src_is_sock = 0;
state.gtp.dest_is_sock = (input == output);
-   state.gtp.state = SSTATE_TRANSFERING;
+   state.gtp.state = SSTATE_TRANSFERRING;
state.gtp.bufuse = 0;
state.gtp.src_name = "stdin";
state.gtp.dest_name = "remote output";
-- 
2.9.2-831-g532adb1



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Spelling fixes

2016-08-10 Thread Johannes Sixt
Am 09.08.2016 um 20:19 schrieb Junio C Hamano:
>  
> 
> accidentlyaccidentally
> commited  committed
> dependancydependency
> emtpy empty
> existance existence
> explicitely   explicitly
> git-upload-achive git-upload-archive
> hierachy  hierarchy
> intialinitial
> mulitple  multiple
> non-existant  non-existent
> precendence.  precedence.
> priviledged   privileged
> programatically   programmatically
> psuedo-binary pseudo-binary
> soemwhere somewhere
> successfull   successful
> unkownunknown
> usefull   useful
> writting  writing

This reminds me of one occurrence of "indegee" in commit-slab.h that
should be "indegree". The word is not in this list above, so I don't
know whether you would like to have it fixed in the same patch or not...

(BTW, is that word "in-degree" or "ingredient"?)

-- Hannes

diff --git a/commit-slab.h b/commit-slab.h
--- a/commit-slab.h
+++ b/commit-slab.h
@@ -8,7 +8,7 @@
  *
  * After including this header file, using:
  *
- * define_commit_slab(indegee, int);
+ * define_commit_slab(indegree, int);
  *
  * will let you call the following functions:
  *

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t7406: fix breakage on OSX

2016-08-10 Thread Junio C Hamano
Stefan Beller  writes:

> On OSX `wc` prefixes the output of numbers with whitespace, such that
> the `commit_count` would be "SP ". When using that in
>
> git submodule update --init --depth=$commit_count
>
> the depth would be empty and the number is interpreted as the pathspec.
> Fix this by not using `wc` and rather instruct rev-list to count.
>
> Another way to fix this is to remove the `=` sign after the `--depth`
> argument as then we are allowed to have more than just one whitespace
> between `--depth` and the actual number. Prefer the solution of rev-list
> counting as that is expected to be slightly faster and more self-sustained
> within Git.

You meant self-contained, I would guess.

There are a couple of "log --oneline | wc -l" remaining that are
currently safe but they may be a time-bomb waiting to go off.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 14/15] convert: add filter..process option

2016-08-10 Thread larsxschneider
From: Lars Schneider 

Git's clean/smudge mechanism invokes an external filter process for every
single blob that is affected by a filter. If Git filters a lot of blobs
then the startup time of the external filter processes can become a
significant part of the overall Git execution time.

In a preliminary performance test this developer used a clean/smudge filter
written in golang to filter 12,000 files. This process took 364s with the
existing filter mechanism and 5s with the new mechanism. See details here:
https://github.com/github/git-lfs/pull/1382

This patch adds the `filter..process` string option which, if used,
keeps the external filter process running and processes all blobs with
the packet format (pkt-line) based protocol over standard input and standard
output described below.

Git starts the filter when it encounters the first file
that needs to be cleaned or smudged. After the filter started
Git sends a welcome message, a list of supported protocol
version numbers, and a flush packet. Git expects to read the
welcome message and one protocol version number from the
previously sent list. Afterwards Git sends a list of supported
capabilities and a flush packet. Git expects to read a list of
desired capabilities, which must be a subset of the supported
capabilities list, and a flush packet as response:

packet:  git> git-filter-client
packet:  git> version=2
packet:  git> version=42
packet:  git> 
packet:  git< git-filter-server
packet:  git< version=2
packet:  git> clean=true
packet:  git> smudge=true
packet:  git> not-yet-invented=true
packet:  git> 
packet:  git< clean=true
packet:  git< smudge=true
packet:  git< 

Supported filter capabilities in version 2 are "clean" and
"smudge".

Afterwards Git sends a list of "key=value" pairs terminated with
a flush packet. The list will contain at least the filter command
(based on the supported capabilities) and the pathname of the file
to filter relative to the repository root. Right after these packets
Git sends the content split in zero or more pkt-line packets and a
flush packet to terminate content.

packet:  git> command=smudge\n
packet:  git> pathname=path/testfile.dat\n
packet:  git> 
packet:  git> CONTENT
packet:  git> 


The filter is expected to respond with a list of "key=value" pairs
terminated with a flush packet. If the filter does not experience
problems then the list must contain a "success" status. Right after
these packets the filter is expected to send the content in zero
or more pkt-line packets and a flush packet at the end. Finally, a
second list of "key=value" pairs terminated with a flush packet
is expected. The filter can change the status in the second list.

packet:  git< status=success\n
packet:  git< 
packet:  git< SMUDGED_CONTENT
packet:  git< 
packet:  git<   # empty list!


If the result content is empty then the filter is expected to respond
with a success status and an empty list.

packet:  git< status=success\n
packet:  git< 
packet:  git<   # empty content!
packet:  git<   # empty list!


In case the filter cannot or does not want to process the content,
it is expected to respond with an "error" status. Depending on the
`filter..required` flag Git will interpret that as error
but it will not stop or restart the filter process.

packet:  git< status=error\n
packet:  git< 


In case the filter cannot or does not want to process the content
as well as any future content for the lifetime of the Git process,
it is expected to respond with an "error-all" status. Depending on
the `filter..required` flag Git will interpret that as error
but it will not stop or restart the filter process.

packet:  git< status=error-all\n
packet:  git< 


If the filter experiences an error during processing, then it can
send the status "error". Depending on the `filter..required`
flag Git will interpret that as error but it will not stop or restart
the filter process.

packet:  git< status=success\n
packet:  git< 
packet:  git< HALF_WRITTEN_ERRONEOUS_CONTENT
packet:  git< 
packet:  git< status=error\n
packet:  git< 


If the filter dies during the communication or does not adhere to
the protocol then Git will stop the filter process and restart it
with the next file that needs to be processed.

After the filter has processed a blob 

Re: [PATCH] http-backend: buffer headers before sending

2016-08-10 Thread Jeff King
On Tue, Aug 09, 2016 at 11:47:31PM +, Eric Wong wrote:

> Avoid waking up the readers for unnecessary context switches for
> each line of header data being written, as all the headers are
> written in short succession.
> 
> It is unlikely any HTTP/1.x server would want to read a CGI
> response one-line-at-a-time and trickle each to the client.
> Instead, I'd expect HTTP servers want to minimize syscall and
> TCP/IP framing overhead by trying to send all of its response
> headers in a single syscall or even combining the headers and
> first chunk of the body with MSG_MORE or writev.
> 
> Verified by strace-ing response parsing on the CGI side.

I don't think this is wrong to do, but it does feel like it makes the
code slightly more brittle (you have to pass around the strbuf and
remember to initialize it and end_headers() when you're done), for not
much benefit.

Using some kind of buffered I/O would be nicer, as then you would get
nice-sized chunks without having to impact the code. I wonder if just
using stdio here would be that bad. The place it usually sucks is in
complex error handling, but we don't care about that at all here (I
think we are basically happy to write until we get SIGPIPE).

I dunno. I suspect the performance improvement from your patch is
marginal, but it's not like the resulting code is all _that_ complex. So
I guess I am OK either way, just not enthused.

> ---
>   I admit I only noticed this because I was being lazy when
>   implementing the reader-side on an HTTP server by making
>   a single read(2) call :x

The trouble is that your HTTP server is still broken. Now it's just
broken in an unpredictable and racy way, because the OS may still split
the write at PIPE_BUF boundaries. (Though given that this is not in the
commit message, I suspect you know this patch is not an excuse not to
fix your HTTP server).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:15, Jeff King  wrote:
> 
> On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider 
>> 
>> format_packet() dies if the caller wants to format a packet larger than
>> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
> 
> I am not sure I agree here. Certainly I see the usefulness of gently
> handling a failure to write(). But if you are passing in too-large
> buffers, isn't that a bug in the program?
> 
> How would you recover, except by splitting up the content? That might
> not be possible depending on how you are using the pkt-lines. And even
> if it is, wouldn't it be simpler to split it up before sending it to
> format_packet()?

Good argument. I agree - this patch should be dropped.

Thanks,
Lars
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread larsxschneider
From: Lars Schneider 

packet_write() has two shortcomings. First, it uses format_packet() which
lets the caller only send string data via "%s". That means it cannot be
used for arbitrary data that may contain NULs. Second, it will always
die on error.

Add packet_write_gently() which writes arbitrary data and returns `0` for
success and `-1` for an error.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 12 
 pkt-line.h |  1 +
 2 files changed, 13 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index e6b8410..4f25748 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -3,6 +3,7 @@
 #include "run-command.h"
 
 char packet_buffer[LARGE_PACKET_MAX];
+char packet_write_buffer[LARGE_PACKET_MAX];
 static const char *packet_trace_prefix = "git";
 static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
 static struct trace_key trace_pack = TRACE_KEY_INIT(PACKFILE);
@@ -141,6 +142,17 @@ void packet_write(int fd, const char *fmt, ...)
write_or_die(fd, buf.buf, buf.len);
 }
 
+int packet_write_gently(const int fd_out, const char *buf, size_t size)
+{
+   if (size > PKTLINE_DATA_MAXLEN)
+   return -1;
+   packet_trace(buf, size, 1);
+   memmove(packet_write_buffer + 4, buf, size);
+   size += 4;
+   set_packet_header(packet_write_buffer, size);
+   return (write_in_full(fd_out, packet_write_buffer, size) == size ? 0 : 
-1);
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 3cb9d91..88584f1 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -77,6 +77,7 @@ char *packet_read_line_buf(char **src_buf, size_t *src_len, 
int *size);
 
 #define DEFAULT_PACKET_MAX 1000
 #define LARGE_PACKET_MAX 65520
+#define PKTLINE_DATA_MAXLEN (LARGE_PACKET_MAX - 4)
 extern char packet_buffer[LARGE_PACKET_MAX];
 
 #endif
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC] diff: add SUBMODULE_DIFF format to display submodule diff

2016-08-10 Thread Johannes Sixt

Am 10.08.2016 um 00:56 schrieb Jacob Keller:

On Tue, Aug 9, 2016 at 3:50 PM, Stefan Beller  wrote:

On Tue, Aug 9, 2016 at 3:32 PM, Jacob Keller  wrote:

+   if (strbuf_read(buf, cp.out, 0) < 0)


So we keep the whole diff in memory
I don't know much about the diff machinery, but I thought
the rest of the diff machinery just streams it out?


Yea, but I can't figure out how to do that. Is there an easy way to
stream chunks from the pipe straight into the file?


You don't stream via a pipe, you let the sub-process write directly to 
the file:


fflush(f);
cp.out = dup(fileno(f));

Of course, you no longer "prepare_submodule_diff" anymore (which 
currently runs the child process), but you print the "Submodule" header 
line, then invoke the child process, and if there was a failure, you 
fprintf(f, "(diff failed)").


-- Hannes

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 11/15] convert: modernize tests

2016-08-10 Thread larsxschneider
From: Lars Schneider 

Use `test_config` to set the config, check that files are empty with
`test_must_be_empty`, compare files with `test_cmp`, and remove spaces
after ">" and "<".

Signed-off-by: Lars Schneider 
---
 t/t0021-conversion.sh | 62 +--
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7bac2bc..7b45136 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -13,8 +13,8 @@ EOF
 chmod +x rot13.sh
 
 test_expect_success setup '
-   git config filter.rot13.smudge ./rot13.sh &&
-   git config filter.rot13.clean ./rot13.sh &&
+   test_config filter.rot13.smudge ./rot13.sh &&
+   test_config filter.rot13.clean ./rot13.sh &&
 
{
echo "*.t filter=rot13"
@@ -38,8 +38,8 @@ script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
 
 test_expect_success check '
 
-   cmp test.o test &&
-   cmp test.o test.t &&
+   test_cmp test.o test &&
+   test_cmp test.o test.t &&
 
# ident should be stripped in the repository
git diff --raw --exit-code :test :test.i &&
@@ -47,10 +47,10 @@ test_expect_success check '
embedded=$(sed -ne "$script" test.i) &&
test "z$id" = "z$embedded" &&
 
-   git cat-file blob :test.t > test.r &&
+   git cat-file blob :test.t >test.r &&
 
-   ./rot13.sh < test.o > test.t &&
-   cmp test.r test.t
+   ./rot13.sh test.t &&
+   test_cmp test.r test.t
 '
 
 # If an expanded ident ever gets into the repository, we want to make sure that
@@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
 
# delete the files and check them out again, using a smudge filter
# that will count the args and echo the command-line back to us
-   git config filter.argc.smudge "sh ./argc.sh %f" &&
+   test_config filter.argc.smudge "sh ./argc.sh %f" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
 
@@ -141,7 +141,7 @@ test_expect_success 'filter shell-escaped filenames' '
test_cmp expect "$special" &&
 
# do the same thing, but with more args in the filter expression
-   git config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
+   test_config filter.argc.smudge "sh ./argc.sh %f --my-extra-arg" &&
rm "$normal" "$special" &&
git checkout -- "$normal" "$special" &&
 
@@ -154,9 +154,9 @@ test_expect_success 'filter shell-escaped filenames' '
 '
 
 test_expect_success 'required filter should filter data' '
-   git config filter.required.smudge ./rot13.sh &&
-   git config filter.required.clean ./rot13.sh &&
-   git config filter.required.required true &&
+   test_config filter.required.smudge ./rot13.sh &&
+   test_config filter.required.clean ./rot13.sh &&
+   test_config filter.required.required true &&
 
echo "*.r filter=required" >.gitattributes &&
 
@@ -165,17 +165,17 @@ test_expect_success 'required filter should filter data' '
 
rm -f test.r &&
git checkout -- test.r &&
-   cmp test.o test.r &&
+   test_cmp test.o test.r &&
 
./rot13.sh expected &&
git cat-file blob :test.r >actual &&
-   cmp expected actual
+   test_cmp expected actual
 '
 
 test_expect_success 'required filter smudge failure' '
-   git config filter.failsmudge.smudge false &&
-   git config filter.failsmudge.clean cat &&
-   git config filter.failsmudge.required true &&
+   test_config filter.failsmudge.smudge false &&
+   test_config filter.failsmudge.clean cat &&
+   test_config filter.failsmudge.required true &&
 
echo "*.fs filter=failsmudge" >.gitattributes &&
 
@@ -186,9 +186,9 @@ test_expect_success 'required filter smudge failure' '
 '
 
 test_expect_success 'required filter clean failure' '
-   git config filter.failclean.smudge cat &&
-   git config filter.failclean.clean false &&
-   git config filter.failclean.required true &&
+   test_config filter.failclean.smudge cat &&
+   test_config filter.failclean.clean false &&
+   test_config filter.failclean.required true &&
 
echo "*.fc filter=failclean" >.gitattributes &&
 
@@ -197,8 +197,8 @@ test_expect_success 'required filter clean failure' '
 '
 
 test_expect_success 'filtering large input to small output should use little 
memory' '
-   git config filter.devnull.clean "cat >/dev/null" &&
-   git config filter.devnull.required true &&
+   test_config filter.devnull.clean "cat >/dev/null" &&
+   test_config filter.devnull.required true &&
for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
echo "30MB filter=devnull" >.gitattributes &&
GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
@@ -207,7 +207,7 @@ test_expect_success 'filtering large input to small output 
should use little mem
 

Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:28, Jeff King  wrote:
> 
> On Wed, Aug 10, 2016 at 03:04:00PM +0200, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider 
>> 
>> packet_write() has two shortcomings. First, it uses format_packet() which
>> lets the caller only send string data via "%s". That means it cannot be
>> used for arbitrary data that may contain NULs. Second, it will always
>> die on error.
>> 
>> Add packet_write_gently() which writes arbitrary data and returns `0` for
>> success and `-1` for an error.
> 
> So now we have packet_write() and packet_write_gently(), but they differ
> in more than just whether they are gentle. That seems like a weird
> interface.
> 
> Should we either be picking a new name (e.g., packet_write_mem() or
> something), or migrating packet_write() to packet_write_fmt()?

Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()"


>> diff --git a/pkt-line.c b/pkt-line.c
>> index e6b8410..4f25748 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -3,6 +3,7 @@
>> #include "run-command.h"
>> 
>> char packet_buffer[LARGE_PACKET_MAX];
>> +char packet_write_buffer[LARGE_PACKET_MAX];
> 
> Should this be static? I.e., are random other bits of the code allowed
> to write into it (I guess not because it's not declared in pkt-line.h).

static is better!


>> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
>> +{
>> +if (size > PKTLINE_DATA_MAXLEN)
>> +return -1;
>> +packet_trace(buf, size, 1);
>> +memmove(packet_write_buffer + 4, buf, size);
> 
> It looks like this iteration drops the idea of callers using a
> LARGE_PACKET_MAX buffer and only filling it at "buf+4" with
> PKTLINE_DATA_MAXLEN bytes (which is fine).
> 
> I wonder if we still need PKTLINE_DATA_MAXLEN, or of it is just
> obscuring things. The magic number "4" still appears separately here,
> and it actually makes it harder to see that things are correct.  I.e.,
> doing:
> 
>  if (size > sizeof(packet_write_buffer) - 4)
>   return -1;
>  memmove(packet_write_buffer + 4, buf, size);
> 
> is more obviously correct, because you do not have to wonder about the
> relationship between the size of your buffer and the macro.
> 
> It might still be worth having PKTLINE_DATA_MAXLEN public, though, if
> callers use it to size their input to packet_write_gently().

I agree. In a later patch I am using PKTLINE_DATA_MAXLEN inside pkt-line.c,
too. I will change it to your suggestion.

For now I would remove PKTLINE_DATA_MAXLEN because it should be an 
implementation
detail of pkt-line.c (plus it is not used by anyone).

Thanks,
Lars--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread larsxschneider
From: Lars Schneider 

The packet_trace() call is not ideal in format_packet() as we would print
a trace when a packet is formatted and (potentially) when the packet is
actually send. This was no problem up until now because format_packet()
was only used by one function. Fix it by moving the trace call into the
function that actually sends the packet.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pkt-line.c b/pkt-line.c
index 177dc73..9400b47 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -121,7 +121,6 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
die("protocol error: impossibly long line");
 
set_packet_header(>buf[orig_len], n);
-   packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
 void packet_write(int fd, const char *fmt, ...)
@@ -133,6 +132,7 @@ void packet_write(int fd, const char *fmt, ...)
va_start(args, fmt);
format_packet(, fmt, args);
va_end(args);
+   packet_trace(buf.buf + 4, buf.len - 4, 1);
write_or_die(fd, buf.buf, buf.len);
 }
 
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:04:00PM +0200, larsxschnei...@gmail.com wrote:

> From: Lars Schneider 
> 
> packet_write() has two shortcomings. First, it uses format_packet() which
> lets the caller only send string data via "%s". That means it cannot be
> used for arbitrary data that may contain NULs. Second, it will always
> die on error.
> 
> Add packet_write_gently() which writes arbitrary data and returns `0` for
> success and `-1` for an error.

So now we have packet_write() and packet_write_gently(), but they differ
in more than just whether they are gentle. That seems like a weird
interface.

Should we either be picking a new name (e.g., packet_write_mem() or
something), or migrating packet_write() to packet_write_fmt()?

> diff --git a/pkt-line.c b/pkt-line.c
> index e6b8410..4f25748 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -3,6 +3,7 @@
>  #include "run-command.h"
>  
>  char packet_buffer[LARGE_PACKET_MAX];
> +char packet_write_buffer[LARGE_PACKET_MAX];

Should this be static? I.e., are random other bits of the code allowed
to write into it (I guess not because it's not declared in pkt-line.h).

> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +{
> + if (size > PKTLINE_DATA_MAXLEN)
> + return -1;
> + packet_trace(buf, size, 1);
> + memmove(packet_write_buffer + 4, buf, size);

It looks like this iteration drops the idea of callers using a
LARGE_PACKET_MAX buffer and only filling it at "buf+4" with
PKTLINE_DATA_MAXLEN bytes (which is fine).

I wonder if we still need PKTLINE_DATA_MAXLEN, or of it is just
obscuring things. The magic number "4" still appears separately here,
and it actually makes it harder to see that things are correct.  I.e.,
doing:

  if (size > sizeof(packet_write_buffer) - 4)
return -1;
  memmove(packet_write_buffer + 4, buf, size);

is more obviously correct, because you do not have to wonder about the
relationship between the size of your buffer and the macro.

It might still be worth having PKTLINE_DATA_MAXLEN public, though, if
callers use it to size their input to packet_write_gently().

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 07/15] pkt-line: add functions to read/write flush terminated packet streams

2016-08-10 Thread larsxschneider
From: Lars Schneider 

packet_write_stream_with_flush_from_fd() and
packet_write_stream_with_flush_from_buf() write a stream of packets. All
content packets use the maximal packet size except for the last one.
After the last content packet a `flush` control packet is written.

packet_read_till_flush() reads arbitary sized packets until it detects
a `flush` packet.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 89 ++
 pkt-line.h |  7 +
 2 files changed, 96 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index e6a0924..4b526e1 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -181,6 +181,45 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, 
...)
va_end(args);
 }
 
+int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out)
+{
+   int err = 0;
+   ssize_t bytes_to_write;
+   while (!err) {
+   bytes_to_write = xread(fd_in, packet_write_buffer, 
PKTLINE_DATA_MAXLEN);
+   if (bytes_to_write < 0)
+   return COPY_READ_ERROR;
+   if (bytes_to_write == 0)
+   break;
+   if (bytes_to_write > PKTLINE_DATA_MAXLEN)
+   return COPY_WRITE_ERROR;
+   err = packet_write_gently(fd_out, packet_write_buffer, 
bytes_to_write);
+   }
+   if (!err)
+   err = packet_flush_gently(fd_out);
+   return err;
+}
+
+int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, 
int fd_out)
+{
+   int err = 0;
+   size_t bytes_written = 0;
+   size_t bytes_to_write;
+   while (!err) {
+   if ((len - bytes_written) > PKTLINE_DATA_MAXLEN)
+   bytes_to_write = PKTLINE_DATA_MAXLEN;
+   else
+   bytes_to_write = len - bytes_written;
+   if (bytes_to_write == 0)
+   break;
+   err = packet_write_gently(fd_out, src_in + bytes_written, 
bytes_to_write);
+   bytes_written += bytes_to_write;
+   }
+   if (!err)
+   err = packet_flush_gently(fd_out);
+   return err;
+}
+
 static int get_packet_data(int fd, char **src_buf, size_t *src_size,
   void *dst, unsigned size, int options)
 {
@@ -290,3 +329,53 @@ char *packet_read_line_buf(char **src, size_t *src_len, 
int *dst_len)
 {
return packet_read_line_generic(-1, src, src_len, dst_len);
 }
+
+ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out)
+{
+   int len, ret;
+   int options = PACKET_READ_GENTLE_ON_EOF;
+   char linelen[4];
+
+   size_t oldlen = sb_out->len;
+   size_t oldalloc = sb_out->alloc;
+
+   for (;;) {
+   /* Read packet header */
+   ret = get_packet_data(fd_in, NULL, NULL, linelen, 4, options);
+   if (ret < 0)
+   goto done;
+   len = packet_length(linelen);
+   if (len < 0)
+   die("protocol error: bad line length character: %.4s", 
linelen);
+   if (!len) {
+   /* Found a flush packet - Done! */
+   packet_trace("", 4, 0);
+   break;
+   }
+   len -= 4;
+
+   /* Read packet content */
+   strbuf_grow(sb_out, len);
+   ret = get_packet_data(fd_in, NULL, NULL, sb_out->buf + 
sb_out->len, len, options);
+   if (ret < 0)
+   goto done;
+
+   if (ret != len) {
+   error("protocol error: incomplete read (expected %d, 
got %d)", len, ret);
+   goto done;
+   }
+
+   packet_trace(sb_out->buf + sb_out->len, len, 0);
+   sb_out->len += len;
+   }
+
+done:
+   if (ret < 0) {
+   if (oldalloc == 0)
+   strbuf_release(sb_out);
+   else
+   strbuf_setlen(sb_out, oldlen);
+   return ret;  /* unexpected EOF */
+   }
+   return sb_out->len - oldlen;
+}
diff --git a/pkt-line.h b/pkt-line.h
index b6c8bcd..89063ee 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -25,6 +25,8 @@ void packet_buf_flush(struct strbuf *buf);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_gently_fmt(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
+int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out);
+int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, 
int fd_out);
 
 /*
  * Read a packetized line into the buffer, which must be at least size bytes
@@ -77,6 +79,11 @@ char *packet_read_line(int fd, int *size);
  */
 char *packet_read_line_buf(char **src_buf, size_t *src_len, int *size);
 

Re: [PATCH RFC v2] diff: add SUBMODULE_DIFF format to display submodule diff

2016-08-10 Thread Jacob Keller
On Tue, Aug 9, 2016 at 8:37 PM, Junio C Hamano  wrote:
> Jacob Keller  writes:
>> + cp.dir = path;
>> + cp.out = -1;
>> + cp.no_stdin = 1;
>> + argv_array_push(, "diff");
>> + argv_array_pushf(, "--src-prefix=a/%s/", path);
>> + argv_array_pushf(, "--dst-prefix=b/%s/", path);
>
> I think this is wrong.  Imagine when the caller gave you prefixes
> that are different from a/ and b/ (think: the extreme case is that
> you are a sub-sub-module, and the caller is recursing into you with
> its own prefix, perhaps set to a/sub and b/sub).  Use the prefixes
> you got for a/ and b/ instead of hardcoding them and you'd be fine,
> I'd guess.

I'll have to get these passed, but yes this makes more sense, will look into it.

>
>> + argv_array_push(, sha1_to_hex(one));
>> +
>> + /*
>> +  * If the submodule has untracked or modified contents, diff between
>> +  * the old base and the current tip. This had the advantage of showing
>> +  * the full difference of the submodule contents.
>> +  */
>> + if (!create_dirty_diff)
>> + argv_array_push(, sha1_to_hex(two));
>> +
>> + if (start_command())
>> + die("Could not run 'git diff' command in submodule %s", path);
>> +
>> + diff = fdopen(cp.out, "r");
>> +
>> + c = fgetc(diff);
>> + while (c != EOF) {
>> + fputc(c, f);
>> + c = fgetc(diff);
>> + }
>> +
>> + fclose(diff);
>> + finish_command();
>
> I do not think you need to do this.  If "f" is already a FILE * to
> which the output must go, then instead of reading from the pipe and
> writing it, you can just let the "diff" spit out its output to the
> same file descriptor, by doing something like:
>
> fflush(f);
> cp.out = dup(fileno(f));
> ... other setup ...
> run_command();
>
> no?  I do not offhand recall run_command() closes cp.out after done,
> and if it doesn't you may have to close it yourself after the above
> sequence.


That makes a lot more sense, yes. I hadn't thought of dup, (long time
since I've had to use file descriptors).

the child_process api does close the descriptor itself. That's a much
better way to get the result we want, and it is less code, so I'll try
this out.

>
> While I personally do not want to see code to access submodule's
> object store by temporarily adding it as alternates, the "show
> left-right log between the commits" code already does so, so I
> should point out that running "diff" internally without spawning it
> as a subprocess shouldn't be too hard.  The diff API is reasonably
> libified and there shouldn't be additional "libification" preparation
> work required to do this, if you really wanted to.

I looked at trying to call diff for this, but I think it's not worth
it. Using the child process API makes more sense and is a cleaner
method. I'll go this route as it appears to work pretty well. The only
major concern I have is that options may not get forwarded
correctly...

Thanks,
Jake
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase

2016-08-10 Thread Junio C Hamano
Jeff King  writes:

> ...
> We could do analysis on any cycles that we find to
> distinguish the two cases (i.e., it is a bogus pack if and
> only if every delta in the cycle is in the same pack), but
> we don't need to. If there is a cycle inside a pack, we'll
> run into problems not only reusing the delta, but accessing
> the object data at all. So when we try to dig up the actual
> size of the object, we'll hit that same cycle and kick in
> our usual complain-and-try-another-source code.

I agree with all of the above reasoning.

> Actually, skimming the sha1_file code, I am not 100% sure that we detect
> cycles in OBJ_REF_DELTA (you cannot have cycles in OBJ_OFS_DELTA since
> they always point backwards in the pack). But if that is the case, then
> I think we should fix that, not worry about special-casing it here.

Yes, but sha1_file.c?  It is the reading side and it is too late if
we notice a problem, I would think.

> +/*
> + * Drop an on-disk delta we were planning to reuse. Naively, this would
> + * just involve blanking out the "delta" field, but we have to deal
> + * with two extra pieces of book-keeping:
> + *
> + *   1. Removing ourselves from the delta_sibling linked list.
> + *
> + *   2. Updating our size; check_object() will have filled in the size of our
> + *  delta, but a non-delta object needs it true size.

Excellent point.

> +/*
> + * Follow the chain of deltas from this entry onward, throwing away any links
> + * that cause us to hit a cycle (as determined by the DFS state flags in
> + * the entries).
> + */
> +static void break_delta_cycles(struct object_entry *entry)
> +{
> + /* If it's not a delta, it can't be part of a cycle. */
> + if (!entry->delta) {
> + entry->dfs_state = DFS_DONE;
> + return;
> + }
> +
> + switch (entry->dfs_state) {
> + case DFS_NONE:
> + /*
> +  * This is the first time we've seen the object. We mark it as
> +  * part of the active potential cycle and recurse.
> +  */
> + entry->dfs_state = DFS_ACTIVE;
> + break_delta_cycles(entry->delta);
> + entry->dfs_state = DFS_DONE;
> + break;
> +
> + case DFS_DONE:
> + /* object already examined, and not part of a cycle */
> + break;
> +
> + case DFS_ACTIVE:
> + /*
> +  * We found a cycle that needs broken. It would be correct to
> +  * break any link in the chain, but it's convenient to
> +  * break this one.
> +  */
> + drop_reused_delta(entry);
> + break;
> + }
> +}

Do we need to do anything to the DFS state of an entry when
drop_reused_delta() resets its other fields?  If we had this and
started from A (read "A--->B" as "A is based on B"):

A--->B--->C--->A

we paint A as ACTIVE, visit B and then C and paint them as active,
and when we visit A for the second time, we drop it (i.e. break the
link between A and B), return and paint C as DONE, return and paint
B as DONE, and leaving A painted as ACTIVE, while the chain is now

 B--->C--->A

If we later find D that is directly based on A, wouldn't we end up
visiting A and attempt to drop it again?  drop_reused_delta() is
idempotent so there will be no data structure corruption, I think,
but we can safely declare that the entry is now DONE after calling
drop_reused_delta() on it (either in the function or in the caller
after it calls the function), no?

> + 2. Picking the next pack to examine based on locality (i.e., where we found
> +something else recently).
> +
> +In this case, we want to make sure that we find the delta versions of A 
> and
> +B and not their base versions. We can do this by putting two blobs in 
> each
> +pack. The first is a "dummy" blob that can only be found in the pack in
> +question.  And then the second is the actual delta we want to find.
> +
> +The two blobs must be present in the same tree, not present in other 
> trees,
> +and the dummy pathname must sort before the delta path.

> +# Create a pack containing the the tree $1 and blob $1:file, with
> +# the latter stored as a delta against $2:file.
> +#
> +# We convince pack-objects to make the delta in the direction of our choosing
> +# by marking $2 as a preferred-base edge. That results in $1:file as a thin
> +# delta, and index-pack completes it by adding $2:file as a base.

Tricky but clever and correct ;-)

> +make_pack () {
> +  {
> +  echo "-$(git rev-parse $2)"

Is everybody's 'echo' happy with dash followed by unknown string?

> +  echo "$(git rev-parse $1:dummy) dummy"
> +  echo "$(git rev-parse $1:file) file"
> +  } |
> +  git pack-objects --stdout |
> +  git index-pack --stdin --fix-thin

An alternative

git pack-objects --stdout <<-EOF |
-$(git rev-parse $2)
$(git rev-parse 

[PATCH v5 13/15] convert: make apply_filter() adhere to standard Git error handling

2016-08-10 Thread larsxschneider
From: Lars Schneider 

apply_filter() returns a boolean that tells the caller if it
"did convert or did not convert". The variable `ret` was used throughout
the function to track errors wheras `1` denoted success and `0` failure.
This is unusual for the Git source where `0` denotes success.

Rename the variable and flip its value to make the function easier
readable for Git developers.

Signed-off-by: Lars Schneider 
---
 convert.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/convert.c b/convert.c
index 522e2c5..bd17340 100644
--- a/convert.c
+++ b/convert.c
@@ -436,7 +436,7 @@ static int apply_filter(const char *path, const char *src, 
size_t len, int fd,
 *
 * (child --> cmd) --> us
 */
-   int ret = 1;
+   int err = 0;
struct strbuf nbuf = STRBUF_INIT;
struct async async;
struct filter_params params;
@@ -463,22 +463,22 @@ static int apply_filter(const char *path, const char 
*src, size_t len, int fd,
 
if (strbuf_read(, async.out, len) < 0) {
error("read from external filter '%s' failed", cmd);
-   ret = 0;
+   err = -1;
}
if (close(async.out)) {
error("read from external filter '%s' failed", cmd);
-   ret = 0;
+   err = -1;
}
if (finish_async()) {
error("external filter '%s' failed", cmd);
-   ret = 0;
+   err = -1;
}
 
-   if (ret) {
+   if (!err) {
strbuf_swap(dst, );
}
strbuf_release();
-   return ret;
+   return !err;
 }
 
 static struct convert_driver {
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-10 Thread Josh Triplett
On Wed, Aug 10, 2016 at 09:30:01AM +0200, Jakub Narębski wrote:
> On 10 August 2016 at 02:55, Josh Triplett  wrote:
> > On Tue, Aug 09, 2016 at 06:28:00PM +, Eric Wong wrote:
> >> Some of these problems I hope public-inbox (or something like
> >> it) can fix and turn the tide towards email, again.
> >
> > This really seems like the dichotomy that drives people towards central
> > services like GitHub or GitLab.  We need an alternative that doesn't
> > involve email, or at the very least, doesn't require people to use email
> > directly.  Half of the pain in the process comes from coaxing email
> > clients that don't treat mail text as sacrosanct to leave it alone and
> > not mangle it.  (Some of that would go away if we accepted attachments
> > with inline disposition, but not all of it.  All of it would go away if
> > the submission process just involved "git push" to an appropriate
> > location.)
> 
> But submission is less important than review. And for review it is
> usually better (except gigantic series) to have patch text for review
> with the review.

Agreed.  However, submission typically requires more work than review,
because the patch text must remain applicable.  For review, as long as
the email client you use to respond doesn't do something horrible like
*re-wrap* the quoted patch text, the result will work as a review.

Ideally, I'd love to see 1) a review UI that stores line-by-line reviews
into a common format and can translate those to email, and 2) a
mechanism to translate reviews written by email and quoting into the
review format and store them with the repository.

> And (meta)-versioning of series.

I've got a documented format for that. :)

> And place for proof-of-concept / weather-balon patches...

Same place as all other patches, just with an "RFC" tag on them.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t7406: fix breakage on OSX

2016-08-10 Thread Junio C Hamano
Stefan Beller  writes:

> On Wed, Aug 10, 2016 at 11:27 AM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> On OSX `wc` prefixes the output of numbers with whitespace, such that
>>> the `commit_count` would be "SP ". When using that in
>>>
>>> git submodule update --init --depth=$commit_count
>>>
>>> the depth would be empty and the number is interpreted as the pathspec.
>>> Fix this by not using `wc` and rather instruct rev-list to count.
>>>
>>> Another way to fix this is to remove the `=` sign after the `--depth`
>>> argument as then we are allowed to have more than just one whitespace
>>> between `--depth` and the actual number. Prefer the solution of rev-list
>>> counting as that is expected to be slightly faster and more self-sustained
>>> within Git.
>>
>> You meant self-contained, I would guess.
>
> Yes. Mind to fix that locally, or waiting for a resend?

Fixed it up while queuing.

>> There are a couple of "log --oneline | wc -l" remaining that are
>> currently safe but they may be a time-bomb waiting to go off.
> ...
> All of the occurrences are not white space sensitive AFAICT,
> they are just bad examples, which may inspire others to follow
> that pattern.

Yup, I agree(d).



--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2 0/6] git clone: Marry --recursive and --reference

2016-08-10 Thread Junio C Hamano
Stefan Beller  writes:

> On Tue, Aug 9, 2016 at 2:45 PM, Junio C Hamano  wrote:
>> becomes easily doable (i.e. subsequent "submodule update" can realize
>> that the submodule does not have alternates but it could borrow from
>> the submodule in the other-super-project-location).
>
> I would suggest to postpone this to a later time once the need arises.

Oh, no question about that (did I even sound to be suggesting to do
it now?).  It is just one of the yardsticks I use when gauging if a
proposed solution is sound to see how it will naturally scale and/or
extend to other possible scenarios, and I was pointing out that this
seems to pass the test.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/3] diff-highlight: add some tests.

2016-08-10 Thread Eric Wong
Brian Henderson  wrote:

Hi Brian,

A few minor portability/style nits below, but contrib/ probably
(still?) has laxer rules than the rest of git...

I think we still require Signed-off-by lines in contrib,
though...

> +++ b/contrib/diff-highlight/t/Makefile
> @@ -0,0 +1,19 @@
> +-include ../../../config.mak.autogen
> +-include ../../../config.mak
> +
> +T = $(wildcard t[0-9][0-9][0-9][0-9]-*.sh)
> +
> +all: test
> +test: $(T)
> +
> +.PHONY: help clean all test $(T)
> +
> +help:
> + @echo 'Run "$(MAKE) test" to launch test scripts'
> + @echo 'Run "$(MAKE) clean" to remove trash folders'
> +
> +$(T):
> + @echo "*** $@ ***"; sh $@ $(GIT_TEST_OPTS)

Probably:

@echo "*** $@ ***"; '$(SHELL_PATH_SQ)' $@ $(GIT_TEST_OPTS)

like we do in t/Makefile in case we need to override 'sh'.

> +
> +clean:
> + $(RM) -r 'trash directory'.*
> diff --git a/contrib/diff-highlight/t/t9400-diff-highlight.sh 
> b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> new file mode 100644
> index 000..ca7605f
> --- /dev/null
> +++ b/contrib/diff-highlight/t/t9400-diff-highlight.sh
> @@ -0,0 +1,63 @@
> +#!/bin/sh
> +#
> +# Copyright (C) 2016

IANAL, but I think your name (or who you represent) needs to be
in the copyright.

> +
> +test_description='Test diff-highlight'
> +
> +. ./test-diff-highlight.sh
> +. $TEST_DIRECTORY/test-lib.sh

TEST_DIRECTORY ought to be quoted since it could contain
shell-unfriendly chars (we intentionally name 'trash directory'
this way to trigger errors).

> +
> +# PERL is required, but assumed to be present, although not necessarily 
> modern
> +# some tests require 5.8
> +
> +test_expect_success 'diff-highlight highlightes the beginning of a line' '

You can declare a prereq for PERL::

test_expect_success PERL 'name' 'true'

And spelling: "highlights" (there's other instances of this, too)

> +  dh_test \
> +"aaa\nbbb\nccc\n" \
> +"aaa\n0bb\nccc\n" \
> +"

We use tabs for shell indentation.



> diff --git a/contrib/diff-highlight/t/test-diff-highlight.sh 
> b/contrib/diff-highlight/t/test-diff-highlight.sh
> new file mode 100644
> index 000..9b26271
> --- /dev/null
> +++ b/contrib/diff-highlight/t/test-diff-highlight.sh
> @@ -0,0 +1,69 @@
> +#!/bin/sh
> +#
> +# Copyright (C) 2016
> +
> +CURR_DIR=$(pwd)
> +TEST_OUTPUT_DIRECTORY=$(pwd)
> +TEST_DIRECTORY="$CURR_DIR"/../../../t
> +cmd=diff-highlight
> +CMD="$CURR_DIR"/../$cmd

Same comments as above on quoting pathnames and empty copyright.

> +
> +CW="\033[7m"
> +CR="\033[27m"
> +
> +export TEST_OUTPUT_DIRECTORY TEST_DIRECTORY CW CR
> +
> +dh_test() {
> +  dh_diff_test "$@"
> +  dh_commit_test "$@"
> +}
> +
> +dh_diff_test() {
> +  local a="$1" b="$2"

"local" is not a portable construct.  It's common for
Debian/Ubuntu systems to use dash as /bin/sh instead of bash;
(dash is faster, and mostly sticks to POSIX)

The "devscripts" package in Debian/Ubuntu-based systems has a
handy "checkbashisms" tool for checking portability of shell
scripts.

> +
> +  printf "$a" > file
> +  git add file
> +
> +  printf "$b" > file
> +
> +  git diff file > diff.raw

Commands should be '&&'-chained here since any of them could fail
("git add"/printf/etc could run out of space or fail on disk errors)

Also, our redirect style is:command >file
without a space between '>' and destination.

See Documentation/CodingGuidelines for more details.
Unfortunately, the reasoning is not explained for '>NOSPACE'
and I'm not sure exactly why, either...

> +  if test "$#" = 3

Better to use -eq for numeric comparisons: test $# -eq 3
Quoting $# doesn't seem necessary unless your shell is
hopelessly buggy :)

> +  then
> +# remove last newline
> +head -n5 diff.raw | head -c -1 > diff.act

"head -c" isn't portable, fortunately Jeff hoisted it out for
use as test_copy_bytes in commit 48860819e8026
https://public-inbox.org/git/20160630090753.ga17...@sigill.intra.peff.net/

> +printf "$3" >> diff.act
> +  else
> +cat diff.raw > diff.act
> +  fi
> +
> +  < diff.raw $CMD > diff.exp

$CMD probably needs to be quoted.  However, by the time I got
here I had to ask myself:  What is $CMD again?
A: Oh, look up at the top!

*shrug*  My attention span is tiny and my fonts are gigantic.

Perhaps using:

DIFF_HIGHLIGHT="$CURR_DIR"/../diff-highlight

Would be more-readable?

> +
> +  diff diff.exp diff.act

Maybe use test_cmp to avoid depending external diff.
(or "git diff -b --no-index" in your later test)

Same comments for the rest of the series, I think.

Typically, we expect a reroll in a few days, and I guess there's
no rush (so you can squash your comment patch in addressing
Junio's concern into 3/3).

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2] diff: add SUBMODULE_DIFF format to display submodule diff

2016-08-10 Thread Jacob Keller
On Tue, Aug 9, 2016 at 5:53 PM, Stefan Beller  wrote:
> This is not used any more, but the child is run directly below?
>   unsigned char one[20], unsigned char two[20])
>> +{
>

Yea I meant to take it all out and forgot. Will be gone in v3.

>
> This pattern seems familar, do we have a function for that?
> (get a submodules git dir? As I was mainly working on the helper
> I do not know off hand)
>

I copied it from somewhere, I didn't see a helper. It's probably worth
making one.

> diff is a FILE* pointer. cp.out is a standard int file descriptor
>
> Maybe you'd want a similar thing as strbuf_getwholeline_fd()
> just instead of adding it to the strbuf, adding it to `f` ?
>

Junio has a much better suggestion in another thread, which I will use.

Thanks,
Jake

> (Which is what you have here? I just wonder about the buffer size,
> as I think reading 1 by 1 is not beneficial)
>
>> +
>> +   c = fgetc(diff);
>> +   while (c != EOF) {
>> +   fputc(c, f);
>> +   c = fgetc(diff);
>> +   }
>> +
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http-backend: buffer headers before sending

2016-08-10 Thread Eric Wong
Jeff King  wrote:
> On Tue, Aug 09, 2016 at 11:47:31PM +, Eric Wong wrote:
> 
> > Avoid waking up the readers for unnecessary context switches for
> > each line of header data being written, as all the headers are
> > written in short succession.
> > 
> > It is unlikely any HTTP/1.x server would want to read a CGI
> > response one-line-at-a-time and trickle each to the client.
> > Instead, I'd expect HTTP servers want to minimize syscall and
> > TCP/IP framing overhead by trying to send all of its response
> > headers in a single syscall or even combining the headers and
> > first chunk of the body with MSG_MORE or writev.
> > 
> > Verified by strace-ing response parsing on the CGI side.
> 
> I don't think this is wrong to do, but it does feel like it makes the
> code slightly more brittle (you have to pass around the strbuf and
> remember to initialize it and end_headers() when you're done), for not
> much benefit.

Yes, I was nervous about some of these changes and had to look
it over several times.  I think the problem was the code
initially assumed global state while this change localizes it;
so this ought to make future changes easier.

Thanks to you and Junio for the extra eyes.

> Using some kind of buffered I/O would be nicer, as then you would get
> nice-sized chunks without having to impact the code. I wonder if just
> using stdio here would be that bad. The place it usually sucks is in
> complex error handling, but we don't care about that at all here (I
> think we are basically happy to write until we get SIGPIPE).

stdio to a non-blocking pipe (if so setup by a caller) could be
problematic portability-wise.  And reliance on SIGPIPE would
hurt reusability if this were eventually factored into a
standalone daemon using libmicrohttpd or similar.

> I dunno. I suspect the performance improvement from your patch is
> marginal, but it's not like the resulting code is all _that_ complex. So
> I guess I am OK either way, just not enthused.

Yes, marginal, but I still get annoyed to see extra lines from
strace (maybe I trace syscalls too much :x).  But I think it's
also a baby step which opens up potential for the future.  I
have nothing planned at the moment, but who knows in a year or
five...

> > ---
> >   I admit I only noticed this because I was being lazy when
> >   implementing the reader-side on an HTTP server by making
> >   a single read(2) call :x
> 
> The trouble is that your HTTP server is still broken. Now it's just
> broken in an unpredictable and racy way, because the OS may still split
> the write at PIPE_BUF boundaries. (Though given that this is not in the
> commit message, I suspect you know this patch is not an excuse not to
> fix your HTTP server).

Of course :)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] xdl_change_compact(): fix compaction heuristic to adjust io

2016-08-10 Thread Michael Haggerty
On 08/04/2016 08:43 PM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> The code branch used for the compaction heuristic incorrectly forgot to
>> keep io in sync while the group was shifted. I think that could have
>> led to reading past the end of the rchgo array.
> 
> I had to read the first sentence three times as "incorrectly forgot"
> was a bit strange thing to say (as if there is a situation where
> 'forgetting to do' is the correct thing to do, but in that case we
> would phrase it to stress that not doing is a deliberate choice,
> e.g. 'refraining from doing').  Perhaps s/incorrectly // is the
> simplest readability improvement?

Yes, that makes it clearer. Will change.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()

2016-08-10 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschnei...@gmail.com wrote:
>
>> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
>> +{
>> +static struct strbuf buf = STRBUF_INIT;
>> +va_list args;
>> +
>> +strbuf_reset();
>> +va_start(args, fmt);
>> +format_packet(1, , fmt, args);
>> +va_end(args);
>> +packet_trace(buf.buf + 4, buf.len - 4, 1);
>> +return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>> +}
>
> Could the end of this function just be:
>
>   return packet_write_gently(fd, buf.buf, buf.len);
>
> ? I guess we'd prefer to avoid that, because it incurs an extra
> memmove() of the data.
>
> Similarly, I'd think this could share code with the non-gentle form
> (which should be able to just call this and die() if returns an error).
> Though sometimes the va_list transformation makes that awkward.

Yes.

Also regarding the naming, please have "_gently" at the end; that is
how all other function families with _gently variant are named, I
think.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread larsxschneider
From: Lars Schneider 

format_packet() dies if the caller wants to format a packet larger than
LARGE_PACKET_MAX. Certain callers might prefer an error response instead.

Add a parameter `gentle` to define if the function should signal an error
with the return value (gentle=1) or die (gentle=0).

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 9400b47..e6b8410 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -108,7 +108,7 @@ static void set_packet_header(char *buf, const int size)
#undef hex
 }
 
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static int format_packet(int gentle, struct strbuf *out, const char *fmt, 
va_list args)
 {
size_t orig_len, n;
 
@@ -117,10 +117,15 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
strbuf_vaddf(out, fmt, args);
n = out->len - orig_len;
 
-   if (n > LARGE_PACKET_MAX)
-   die("protocol error: impossibly long line");
+   if (n > LARGE_PACKET_MAX) {
+   if (gentle)
+   return -1;
+   else
+   die("protocol error: impossibly long line");
+   }
 
set_packet_header(>buf[orig_len], n);
+   return 0;
 }
 
 void packet_write(int fd, const char *fmt, ...)
@@ -130,7 +135,7 @@ void packet_write(int fd, const char *fmt, ...)
 
strbuf_reset();
va_start(args, fmt);
-   format_packet(, fmt, args);
+   format_packet(0, , fmt, args);
va_end(args);
packet_trace(buf.buf + 4, buf.len - 4, 1);
write_or_die(fd, buf.buf, buf.len);
@@ -141,7 +146,7 @@ void packet_buf_write(struct strbuf *buf, const char *fmt, 
...)
va_list args;
 
va_start(args, fmt);
-   format_packet(buf, fmt, args);
+   format_packet(0, buf, fmt, args);
va_end(args);
 }
 
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] checkout: do not mention detach advice for explicit --detach option

2016-08-10 Thread Stefan Beller
When a user asked for a detached HEAD specifically with `--detach`,
we do not need to give advice on what a detached HEAD state entails as
we can assume they know what they're getting into as they asked for it.

Signed-off-by: Stefan Beller 
---
 builtin/checkout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 27c1a05..fa2dce5 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -655,7 +655,7 @@ static void update_refs_for_switch(const struct 
checkout_opts *opts,
update_ref(msg.buf, "HEAD", new->commit->object.oid.hash, NULL,
   REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
if (!opts->quiet) {
-   if (old->path && advice_detached_head)
+   if (old->path && advice_detached_head && 
!opts->force_detach)
detach_advice(new->name);
describe_detached_head(_("HEAD is now at"), 
new->commit);
}
-- 
2.9.2.665.gdb8bb2f

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] i18n: setup: mark error messages for translation

2016-08-10 Thread Vasco Almeida
Signed-off-by: Vasco Almeida 
---
 setup.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/setup.c b/setup.c
index 6d0e0c9..fe572b8 100644
--- a/setup.c
+++ b/setup.c
@@ -759,9 +759,9 @@ static const char *setup_bare_git_dir(struct strbuf *cwd, 
int offset,
 static const char *setup_nongit(const char *cwd, int *nongit_ok)
 {
if (!nongit_ok)
-   die("Not a git repository (or any of the parent directories): 
%s", DEFAULT_GIT_DIR_ENVIRONMENT);
+   die(_("Not a git repository (or any of the parent directories): 
%s"), DEFAULT_GIT_DIR_ENVIRONMENT);
if (chdir(cwd))
-   die_errno("Cannot come back to cwd");
+   die_errno(_("Cannot come back to cwd"));
*nongit_ok = 1;
return NULL;
 }
@@ -842,7 +842,7 @@ static const char *setup_git_directory_gently_1(int 
*nongit_ok)
*nongit_ok = 0;
 
if (strbuf_getcwd())
-   die_errno("Unable to read current working directory");
+   die_errno(_("Unable to read current working directory"));
offset = cwd.len;
 
/*
@@ -912,19 +912,19 @@ static const char *setup_git_directory_gently_1(int 
*nongit_ok)
if (parent_device != current_device) {
if (nongit_ok) {
if (chdir(cwd.buf))
-   die_errno("Cannot come back to 
cwd");
+   die_errno(_("Cannot come back 
to cwd"));
*nongit_ok = 1;
return NULL;
}
strbuf_setlen(, offset);
-   die("Not a git repository (or any parent up to 
mount point %s)\n"
-   "Stopping at filesystem boundary 
(GIT_DISCOVERY_ACROSS_FILESYSTEM not set).",
+   die(_("Not a git repository (or any parent up 
to mount point %s)\n"
+   "Stopping at filesystem boundary 
(GIT_DISCOVERY_ACROSS_FILESYSTEM not set)."),
cwd.buf);
}
}
if (chdir("..")) {
strbuf_setlen(, offset);
-   die_errno("Cannot change to '%s/..'", cwd.buf);
+   die_errno(_("Cannot change to '%s/..'"), cwd.buf);
}
offset = offset_parent;
}
@@ -986,9 +986,9 @@ int git_config_perm(const char *var, const char *value)
/* A filemode value was given: 0xxx */
 
if ((i & 0600) != 0600)
-   die("Problem with core.sharedRepository filemode value "
+   die(_("Problem with core.sharedRepository filemode value "
"(0%.3o).\nThe owner of files must always have "
-   "read and write permissions.", i);
+   "read and write permissions."), i);
 
/*
 * Mask filemode value. Others can not get write permission.
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-10 Thread Jakub Narębski
On 10 August 2016 at 02:55, Josh Triplett  wrote:
> On Tue, Aug 09, 2016 at 06:28:00PM +, Eric Wong wrote:
>> Some of these problems I hope public-inbox (or something like
>> it) can fix and turn the tide towards email, again.
>
> This really seems like the dichotomy that drives people towards central
> services like GitHub or GitLab.  We need an alternative that doesn't
> involve email, or at the very least, doesn't require people to use email
> directly.  Half of the pain in the process comes from coaxing email
> clients that don't treat mail text as sacrosanct to leave it alone and
> not mangle it.  (Some of that would go away if we accepted attachments
> with inline disposition, but not all of it.  All of it would go away if
> the submission process just involved "git push" to an appropriate
> location.)

But submission is less important than review. And for review it is
usually better (except gigantic series) to have patch text for review
with the review. And threading. And (meta)-versioning of series.
And place for proof-of-concept / weather-balon patches...

-- 
Jakub Narebski
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/3] i18n: archive: mark errors for translation

2016-08-10 Thread Vasco Almeida
Signed-off-by: Vasco Almeida 
---
 archive.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/archive.c b/archive.c
index 42df974..dde1ab4 100644
--- a/archive.c
+++ b/archive.c
@@ -458,11 +458,11 @@ static int parse_archive_args(int argc, const char **argv,
argc = parse_options(argc, argv, NULL, opts, archive_usage, 0);
 
if (remote)
-   die("Unexpected option --remote");
+   die(_("Unexpected option --remote"));
if (exec)
-   die("Option --exec can only be used together with --remote");
+   die(_("Option --exec can only be used together with --remote"));
if (output)
-   die("Unexpected option --output");
+   die(_("Unexpected option --output"));
 
if (!base)
base = "";
@@ -484,14 +484,14 @@ static int parse_archive_args(int argc, const char **argv,
usage_with_options(archive_usage, opts);
*ar = lookup_archiver(format);
if (!*ar || (is_remote && !((*ar)->flags & ARCHIVER_REMOTE)))
-   die("Unknown archive format '%s'", format);
+   die(_("Unknown archive format '%s'"), format);
 
args->compression_level = Z_DEFAULT_COMPRESSION;
if (compression_level != -1) {
if ((*ar)->flags & ARCHIVER_WANT_COMPRESSION_LEVELS)
args->compression_level = compression_level;
else {
-   die("Argument not supported for format '%s': -%d",
+   die(_("Argument not supported for format '%s': -%d"),
format, compression_level);
}
}
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] i18n: git-stash: mark messages for translation

2016-08-10 Thread Vasco Almeida
Signed-off-by: Vasco Almeida 
---

I added the second mark that I had missed the first time.
Thank you Junio C Hamano for spotting that.

 git-stash.sh | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-stash.sh b/git-stash.sh
index 22fb8bc..826af18 100755
--- a/git-stash.sh
+++ b/git-stash.sh
@@ -265,7 +265,7 @@ save_stash () {
create_stash "$stash_msg" $untracked
store_stash -m "$stash_msg" -q $w_commit ||
die "$(gettext "Cannot save the current status")"
-   say Saved working directory and index state "$stash_msg"
+   say "$(eval_gettext "Saved working directory and index state 
\$stash_msg")"
 
if test -z "$patch_mode"
then
@@ -548,7 +548,7 @@ pop_stash() {
drop_stash "$@"
else
status=$?
-   say "The stash is kept in case you need it again."
+   say "$(gettext "The stash is kept in case you need it again.")"
exit $status
fi
 }
-- 
2.7.4

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-series: track changes to a patch series over time

2016-08-10 Thread Richard Ipsum
On Thu, Aug 04, 2016 at 12:40:58PM -1000, Josh Triplett wrote:
> On Wed, Aug 03, 2016 at 08:12:02PM +0100, Richard Ipsum wrote:
> > On Thu, Jul 28, 2016 at 11:40:55PM -0700, Josh Triplett wrote:
> > > I'd welcome any feedback, whether on the interface and workflow, the
> > > internals and collaboration, ideas on presenting diffs of patch series,
> > > or anything else.
> > 
> > One other nice thing I've noticed about this tool is the
> > way series behave like regular git branches: I specify the name
> > of the series and from then on all other commands act on that
> > series until told otherwise.
> 
> Thanks; I spent a while thinking about that part of the workflow.  I
> save the current series as a symbolic ref SHEAD, and everything operates
> on SHEAD.  (I should probably add support for running things like "git
> series log" or "git series format" on a different series, because right
> now "until told otherwise" doesn't include a way to tell it otherwise.)

Apologies for this delayed response,
I needed time to gather my thoughts,
and also to fix the perl libgit2 binding to allow me to use
your symbolic ref suggestion. :p

Though it turns out that libgit2 doesn't currently allow
me to write arbitrary data to a symbolic ref as git-symbolic-ref(1) will,
so this still needs to be fixed somehow.

> 
> One fun detail that took a couple of iterations to get right: I keep
> separate "staged" and "working" versions per-series, so even with
> outstanding changes to the cover letter, base, or series, you can always
> detach or checkout another series without losing anything.  If you
> switch back, all your staged and unstaged changes will remain staged and
> unstaged where you left them.  That solves the "checkout a different
> series with modifications to the current series" case.

Cool

> 
> > git-appraise looks as though it might also have this behaviour.
> > I think it's a nice way to do it, since you don't generally
> > perform more than one review simultaneously. So I may well
> > use this idea in git-candidate if it's okay. :)
> 
> By all means.  For a review tool like git-candidate, it seems like you'd
> want even more contextual information, to make it easier to specify
> things like "comment on file F line L".  For instance, what if you
> spawned the diff to review in an editor, with plenty of extra context
> and a file extension that'll cause most editors to recognize it as a
> patch (and specifically a git-candidate patch to allow specialized
> editor modes), and told people to add their comments after the line they
> applied to?  When the editor exits successfully, you can scan the file,
> detect the added lines, and save those as comments.  You could figure
> out the appropriate line by looking for the diff hunk headers and
> counting line numbers.

I really like this idea, the current interface for commenting is a little
tedious I find.

> 
> If you use a format-patch diff that includes the headers and commit
> message, you could also support commenting on those in the same way.
> Does the notedb format support commenting on those?

Comments in notedb are just a git note keyed on the sha of the
commit being commented on, I'm not certain what advantage a format-patch
diff provides in this case?

I've been closely following the 'patch submission process' thread,
and given the discussion there I'm having doubts over the value
of comments in git-candidate vs the mailing list. It seems to me that
git-candidate has many of the disadvantages of Github/Gitlab when it
comes to comments, for example, there is no threading.

Also the system would be less open than the mailing list, since,
as it stands currently you would require push access to the repository
to comment on anything.

It may be worth reflecting that one reason some organisations
have switched away from mailing list reviews to Github/Gitlab is that
they provide patch tracking, where the mailing list provides none,
so patches there can be 'lost'. So instead of trying to reimplement
an entire Gerrit/Github/Gitlab ui on the commandline, I wonder whether
it would be sufficient to add the minimum functionality necessary
to provide git with native patch tracking, and leave comments for the
mailing list. Ofcourse this is exactly what git-series seems to do,
so in some sense I may be advocating dropping my own work in favour of
improving git-series.

On the other hand, relying on the mailing list means that some of the
history of a series is left outside of the repository which is
anathema to the goal of git based/stored review, not least because
mail archives are centralised.
(which can obviously be problematic (as we've seen recently with gmane))

Maybe there's a better solution to this problem than git-candidate then,
maybe we can just invent some wonderful new subcommand that fetches
a mailing list archive into a git repo, for those that want that,
I don't know.

Out of interest, did you have any thoughts on Notedb itself with respect
to its 

git svn --version requires working copy

2016-08-10 Thread Michal Čihař
Hi

I've just noticed, that running git svn --version requires working copy,
what is quite ugly to require working copy just to figure out if git svn
is installed and what version.

$ git svn --version
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
Unable to find .git directory
 at /usr/lib/git-core/git-svn line 347.

Is this expected behavior?

Thanks
-- 
Michal Čihař | http://cihar.com/ | https://weblate.org/



signature.asc
Description: OpenPGP digital signature


Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:13, Jeff King  wrote:
> 
> On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider 
>> 
>> The packet_trace() call is not ideal in format_packet() as we would print
>> a trace when a packet is formatted and (potentially) when the packet is
>> actually send. This was no problem up until now because format_packet()
>> was only used by one function. Fix it by moving the trace call into the
>> function that actually sends the packet.
> 
> It looks like there are two functions: packet_write() and
> packet_buf_write().

I did not call trace in packet_buf_write() because this function does not
perform any writes.


> Your patch only touches one of them, and it looks like we would fail to
> trace many packets (e.g., see receive-pack.c:report(), which uses
> packet_buf_write() and then write()s out the result).

I see. But isn't it confusing if packet_buf_write() issues a trace call?
If I just call this function then nothing happens at all. Shouldn't the
trace call be made in receive-pack.c:report() ? Or shouldn't receive-pack
let pkt-line.c perform the write calls?

-Lars


> PS Also, s/send/sent/ in the commit message.

Thanks :)
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/8] xdl_change_compact(): keep track of the earliest end

2016-08-10 Thread Michael Haggerty
On 08/04/2016 08:46 PM, Junio C Hamano wrote:
> Michael Haggerty  writes:
> 
>> This makes it easier to detect whether shifting is possible, and will
>> also make the next change easier.
> 
> I can see the code keeping track of earliest_end but the above does
> not make it clear what the new "continue" is about.
> 
> ... easier to detect whether shifting is possible (in which case we
> can skip the shifting), and will also make ...
> 
> perhaps.

Thanks. I will make the change that you suggest.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Junio C Hamano
Jeff King  writes:

> On Wed, Aug 10, 2016 at 03:36:45PM +0200, Lars Schneider wrote:
>
>> > So now we have packet_write() and packet_write_gently(), but they differ
>> > in more than just whether they are gentle. That seems like a weird
>> > interface.
>> > 
>> > Should we either be picking a new name (e.g., packet_write_mem() or
>> > something), or migrating packet_write() to packet_write_fmt()?
>> 
>> Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to 
>> packet_write_fmt()"
>
> Ah, OK. Generally I'd suggest to reorder things so that each patch looks
> like a step forward (and so the early patches become preparatory steps,
> and the justification in them is something like "we're going to add more
> write functions, so let's give this a more descriptive name").

I am guilty for saying "packet_write() should have been similar to
write(2)".  We may want to have a time-period during which there is
no "packet_write()" in the codebase, before we get to that stage.
I.e. rename it to packet_write_fmt() to vacate the name and add
packet_write_mem(), and then later rename packet_write_mem() to its
final name packet_write(), or something like that.  The two-step
process would reduce the chance of misconversion.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] diff: add SUBMODULE_DIFF format to display submodule diff

2016-08-10 Thread Jacob Keller
From: Jacob Keller 

For projects which have frequent updates to submodules it is often
useful to be able to see a submodule update commit as a difference.
Teach diff's --submodule= a new "diff" format which will execute a diff
for the submodule between the old and new commit, and display it as
a standard diff. This allows users to easily see what changed in
a submodule without having to dig into the submodule themselves.

Signed-off-by: Jacob Keller 
---
Changes since v2:
* dropped the prepare_submodule_diff as it's not really needed
* added --line-prefix to properly prefix every line
* added correct use of a_prefix and b_prefix
* use dup and fileno instead so that we don't have to copy output

I still need to sort out some tests, and I ran into one interesting
issue: I had to do make-install in order to get the sub command to
process the new diff option... is that normal? I think that may cause
issues during tests as well... Shouldn't we use the same binaries we're
running from and not search in the install path when running a sub
command?

 Documentation/diff-config.txt  |  3 +-
 Documentation/diff-options.txt |  6 ++--
 diff.c | 39 
 diff.h |  2 +-
 submodule.c| 67 ++
 submodule.h|  6 
 6 files changed, 107 insertions(+), 16 deletions(-)

diff --git a/Documentation/diff-config.txt b/Documentation/diff-config.txt
index d5a5b17d5088..f5d693afad6c 100644
--- a/Documentation/diff-config.txt
+++ b/Documentation/diff-config.txt
@@ -123,7 +123,8 @@ diff.suppressBlankEmpty::
 diff.submodule::
Specify the format in which differences in submodules are
shown.  The "log" format lists the commits in the range like
-   linkgit:git-submodule[1] `summary` does.  The "short" format
+   linkgit:git-submodule[1] `summary` does.  The "diff" format shows the
+   diff between the beginning and end of the range. The "short" format
format just shows the names of the commits at the beginning
and end of the range.  Defaults to short.
 
diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index e7b729f3644f..988068225463 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -215,8 +215,10 @@ any of those replacements occurred.
the commits in the range like linkgit:git-submodule[1] `summary` does.
Omitting the `--submodule` option or specifying `--submodule=short`,
uses the 'short' format. This format just shows the names of the commits
-   at the beginning and end of the range.  Can be tweaked via the
-   `diff.submodule` configuration variable.
+   at the beginning and end of the range. When `--submodule=diff` is
+   given, the 'diff' format is used. This format shows the diff between
+   the old and new submodule commmit from the perspective of the
+   submodule.  Can be tweaked via the `diff.submodule` configuration 
variable.
 
 --color[=]::
Show colored diff.
diff --git a/diff.c b/diff.c
index 6fe5c6d084a3..2d23767739a4 100644
--- a/diff.c
+++ b/diff.c
@@ -130,12 +130,18 @@ static int parse_dirstat_params(struct diff_options 
*options, const char *params
 
 static int parse_submodule_params(struct diff_options *options, const char 
*value)
 {
-   if (!strcmp(value, "log"))
+   if (!strcmp(value, "log")) {
DIFF_OPT_SET(options, SUBMODULE_LOG);
-   else if (!strcmp(value, "short"))
+   DIFF_OPT_CLR(options, SUBMODULE_DIFF);
+   } else if (!strcmp(value, "diff")) {
+   DIFF_OPT_SET(options, SUBMODULE_DIFF);
DIFF_OPT_CLR(options, SUBMODULE_LOG);
-   else
+   } else if (!strcmp(value, "short")) {
+   DIFF_OPT_CLR(options, SUBMODULE_DIFF);
+   DIFF_OPT_CLR(options, SUBMODULE_LOG);
+   } else {
return -1;
+   }
return 0;
 }
 
@@ -2305,6 +2311,15 @@ static void builtin_diff(const char *name_a,
struct strbuf header = STRBUF_INIT;
const char *line_prefix = diff_line_prefix(o);
 
+   diff_set_mnemonic_prefix(o, "a/", "b/");
+   if (DIFF_OPT_TST(o, REVERSE_DIFF)) {
+   a_prefix = o->b_prefix;
+   b_prefix = o->a_prefix;
+   } else {
+   a_prefix = o->a_prefix;
+   b_prefix = o->b_prefix;
+   }
+
if (DIFF_OPT_TST(o, SUBMODULE_LOG) &&
(!one->mode || S_ISGITLINK(one->mode)) &&
(!two->mode || S_ISGITLINK(two->mode))) {
@@ -2316,6 +2331,15 @@ static void builtin_diff(const char *name_a,
two->dirty_submodule,
meta, del, add, reset);
return;
+   } else if (DIFF_OPT_TST(o, SUBMODULE_DIFF) &&
+   (!one->mode || 

Re: patch submission process, was Re: [PATCH v6 06/16] merge_recursive: abort properly upon errors

2016-08-10 Thread Junio C Hamano
Josh Triplett  writes:

>> But submission is less important than review. And for review it is
>> usually better (except gigantic series) to have patch text for review
>> with the review.
>
> Agreed.  However, submission typically requires more work than review,
> because the patch text must remain applicable.  For review, as long as
> the email client you use to respond doesn't do something horrible like
> *re-wrap* the quoted patch text, the result will work as a review.

Yup.  That is why we say "please send patch inline; when asked to
send it as an attachment, please do so".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH RFC v2] diff: add SUBMODULE_DIFF format to display submodule diff

2016-08-10 Thread Junio C Hamano
Jacob Keller  writes:

>>> + diff = fdopen(cp.out, "r");
>>> +
>>> + c = fgetc(diff);
>>> + while (c != EOF) {
>>> + fputc(c, f);
>>> + c = fgetc(diff);
>>> + }
>>> +
>>> + fclose(diff);
>>> + finish_command();
>>
>> I do not think you need to do this.  If "f" is already a FILE * to
>> which the output must go, then instead of reading from the pipe and
>> writing it, you can just let the "diff" spit out its output to the
>> same file descriptor, by doing something like:
>>
>> fflush(f);
>> cp.out = dup(fileno(f));
>> ... other setup ...
>> run_command();
>>
>> no?  I do not offhand recall run_command() closes cp.out after done,
>> and if it doesn't you may have to close it yourself after the above
>> sequence.
>
> That makes a lot more sense, yes. I hadn't thought of dup, (long time
> since I've had to use file descriptors).
>
> the child_process api does close the descriptor itself. That's a much
> better way to get the result we want, and it is less code, so I'll try
> this out.

One caveat.  If the superproject is doing "log -p --graph", the
output will be broken with this "splice in 'git -C submodule diff'
output in place" approach.  I personally do not care if it is broken
as I do not use "-p" and "--graph" together (for that matter, I do
not think I'll use the "splice in 'git -C submodule diff'" feature
added by this patch, either, so I do not think I'd deeply care ;-),
but I am reasonably sure those who would use these would.

The way to solve it is to capture diff output and send out with the
current graph prefix (i.e. the set of vertial lines), and the
easiest and most expensive (probably too expensive to be practical)
way to do so would be to capture the whole thing into a strbuf, but
I think it is OK to teach a new option to "git diff" that tells to
prefix a fixed string given as its argument to each and every line
of its output, and that would be a more practical solution, which
can also be reused if you choose to later run "diff" internally
without spawning it as a separate process.

The graph machinery may have to be taught a way to tell what the
current graph prefix is if you go that route.

> I looked at trying to call diff for this, but I think it's not worth
> it. Using the child process API makes more sense and is a cleaner
> method. I'll go this route as it appears to work pretty well. The only
> major concern I have is that options may not get forwarded
> correctly...

You'd need to deal with options either way.  You do not want to
assume all options should be passed down.  You now know you need to
tweak the prefix, but you do not know if that is all that is
required.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] diff: add --line-prefix option for passing in a prefix

2016-08-10 Thread Jacob Keller
From: Jacob Keller 

This will be used by a future patch which implements a diff mode for
submodule display. Without this, the diff output would incorrectly
display when using both -p and --graph during a git-log.

Signed-off-by: Jacob Keller 
---
As suggested by Junio, I implemented --line-prefix to enable the graph
display correctly. This works by a neat trick of adding to the msgbuf,
so no code needs to be altered. I presumed that the line prefix should
go *after* the graphs own prefix.

 Documentation/diff-options.txt |  3 +++
 diff.c | 12 +++-
 diff.h |  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/Documentation/diff-options.txt b/Documentation/diff-options.txt
index 705a87394200..e7b729f3644f 100644
--- a/Documentation/diff-options.txt
+++ b/Documentation/diff-options.txt
@@ -569,5 +569,8 @@ endif::git-format-patch[]
 --no-prefix::
Do not show any source or destination prefix.
 
+--line-prefix=::
+   Prepend an additional prefix to every diff output line.
+
 For more detailed explanation on these common options, see also
 linkgit:gitdiffcore[7].
diff --git a/diff.c b/diff.c
index b43d3dd2ecb7..6fe5c6d084a3 100644
--- a/diff.c
+++ b/diff.c
@@ -1167,10 +1167,16 @@ const char *diff_get_color(int diff_use_color, enum 
color_diff ix)
 const char *diff_line_prefix(struct diff_options *opt)
 {
struct strbuf *msgbuf;
+
if (!opt->output_prefix)
-   return "";
+   if (opt->line_prefix)
+   return opt->line_prefix;
+   else
+   return "";
 
msgbuf = opt->output_prefix(opt, opt->output_prefix_data);
+   if (opt->line_prefix)
+   strbuf_addstr(msgbuf, opt->line_prefix);
return msgbuf->buf;
 }
 
@@ -3966,6 +3972,10 @@ int diff_opt_parse(struct diff_options *options,
options->a_prefix = optarg;
return argcount;
}
+   else if ((argcount = parse_long_opt("line-prefix", av, ))) {
+   options->line_prefix = optarg;
+   return argcount;
+   }
else if ((argcount = parse_long_opt("dst-prefix", av, ))) {
options->b_prefix = optarg;
return argcount;
diff --git a/diff.h b/diff.h
index 125447be09eb..6a91a1139686 100644
--- a/diff.h
+++ b/diff.h
@@ -115,6 +115,7 @@ struct diff_options {
const char *pickaxe;
const char *single_follow;
const char *a_prefix, *b_prefix;
+   const char *line_prefix;
unsigned flags;
unsigned touched_flags;
 
-- 
2.9.2.872.g0b694e0.dirty

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] xdl_change_compact(): fix compaction heuristic to adjust io

2016-08-10 Thread Michael Haggerty
On 08/04/2016 09:27 AM, Jeff King wrote:
> On Thu, Aug 04, 2016 at 12:00:33AM +0200, Michael Haggerty wrote:
> 
>> The code branch used for the compaction heuristic incorrectly forgot to
>> keep io in sync while the group was shifted. I think that could have
>> led to reading past the end of the rchgo array.
>>
>> Signed-off-by: Michael Haggerty 
>> ---
>> I didn't actually try to verify the presence of a bug, because it
>> seems like more work than worthwhile. But here is my reasoning:
>>
>> If io is not decremented correctly during one iteration of the outer
>> `while` loop, then it will loose sync with the `end` counter. In
>> particular it will be too large.
>>
>> Suppose that the next iterations of the outer `while` loop (i.e.,
>> processing the next block of add/delete lines) don't have any sliders.
>> Then the `io` counter would be incremented by the number of
>> non-changed lines in xdf, which is the same as the number of
>> non-changed lines in xdfo that *should have* followed the group that
>> experienced the malfunction. But since `io` was too large at the end
>> of that iteration, it will be incremented past the end of the
>> xdfo->rchg array, and will try to read that memory illegally.
> 
> Hmm. In the loop:
> 
>   while (rchgo[io])
>   io++;
> 
> that implies that rchgo has a zero-marker that we can rely on hitting.

I agree.

> And it looks like rchgo[io] always ends the loop on a 0. So it seems
> like we would just hit that condition again.

Correct...in this loop. But there is another place where `io` is
incremented unconditionally. In the version before my changes, it is via
the preincrement operator in this while statement conditional:

https://github.com/mhagger/git/blob/a28705da929ad746abcb34270947f738549d3246/xdiff/xdiffi.c#L502

After my changes, the unconditional increment is more obvious because I
took it out of the while condition:

https://github.com/mhagger/git/blob/39a135da93834fd72ee923d95d0cebfe525dfe7a/xdiff/xdiffi.c#L541

(BTW, I think this is a good example of how patch 2/8 makes the code
easier to reason about.)

I didn't do the hard work to determine whether `io` could *really* walk
off the end of the array, but I don't see an obvious reason why it
*couldn't*.

> Anyway, I'd suggest putting your cover letter bits into the commit
> message. Even though they are all suppositions, they are the kind of
> thing that could really help somebody debugging this in 2 years, and are
> better than nothing.

Good idea. Will do.

Michael

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 9/9] status: unit tests for --porcelain=v2

2016-08-10 Thread Jeff Hostetler



On 08/08/2016 01:07 PM, Junio C Hamano wrote:

Jeff Hostetler  writes:


+test_expect_success pre_initial_commit_0 '
+   ...
+   git status --porcelain=v2 --branch --untracked-files=normal >actual &&
+   test_cmp expect actual
+'
+
+
+test_expect_success pre_initial_commit_1 '
+   git add file_x file_y file_z dir1 &&
+   ...
+   cat >expect <<-EOF &&
+   # branch.oid (initial)
+   # branch.head master
+   1 A. N... 00 100644 100644 $_z40 $OID_A dir1/file_a
+   ...


This makes one wonder what would/should be shown on the "A." column
if one of these files are added with "-N" (intent-to-add).  I do not
see any test for such entries in this patch, but I think it should.


I must admit that I don't use -N, so I'm open to recommendations here.
In my brief testing, the existing porcelain status reports it as "AM"
(for both a file with content and an empty file).

The V2 code outputs the following:
1 AM N... 00 100644 100644  
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 intent.add


Which I think makes sense.  I'll add a test to exercise that.



"Try -z on the above" can and should be in the test title to help
...
Having said all that, it is OK to fix their titles after the current
9-patch series lands on 'next'; incremental refinements are easier
on reviewers than having to review too many rerolls.


I'll change the test titles to have all that info.



This is probably a good place to see what happens to these untracked
files and branch info if we do not ask the command to show them.


I'll add some cases to cycle thru the options and confirm
there's no output when not requested.



+##
+## Ignore a file
+##
+
+test_expect_success ignore_file_0 '
+   echo x.ign >.gitignore &&
+   echo "ignore me" >x.ign &&
+   H1=$(git rev-parse HEAD) &&
+
+   ## ignored file SHOULD NOT appear in output when --ignored is not used.
+ ...
+   git status --porcelain=v2 --branch --untracked-files=all >actual &&
+   test_cmp expect actual &&
+ ...
+   git status --porcelain=v2 --branch --ignored --untracked-files=all >actual 
&&
+   rm x.ign &&
+   rm .gitignore &&


Arrange these files to be cleaned before you create them by having

test_when_finished "rm -f x.ign .gitignore" &&

at the very beginning of this test before they are created.
Otherwise, if any step before these removal fail, later test that
assume they are gone will be affected.  You already do so correctly
in the upstream_fields_0 test below.


Missed a few.  Got it.  Thanks!



+
+   cat >expect <<-EOF &&
+   # branch.oid $HM
+   # branch.head AA_M
+   u AA N... 00 100644 100644 100644 $_z40 $OID_AA_B $OID_AA_A 
conflict.txt
+   EOF


This is a small point, but doesn't the lowercase 'u' somehow look
ugly, especially because the status letters that immediately follow
it are all uppercase?



Since we are inventing a new format and my column 1 is completely new
I didn't think it mattered.  And I used a lowercase 'u' to distinguish 
it from the "U" in the X and Y columns since they mean different things.


But we can change it if you'd prefer.



+   git status --porcelain=v2 --branch --untracked-files=all >actual &&
+   git reset --hard &&


This "reset" also may be a candidate for test_when_finished clean-up
(I won't repeat the comment but there probably are many others).


Got it. And I hopefully caught the rest.



+test_expect_success 'upstream_fields_0' '
+   git checkout master &&
+   test_when_finished rm -rf sub_repo &&


The "when-finished" argument are usually quoted like this, I think:

test_when_finished "rm -rf sub_repo" &&


Got it. Thanks.


I have the test changes ready.  As suggested, I'll send a single commit
patch after my 9-patch series lands on 'next'.

Thanks,
Jeff




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http-backend: buffer headers before sending

2016-08-10 Thread Junio C Hamano
Eric Wong  writes:

> diff --git a/http-backend.c b/http-backend.c
> index 0d59499..adc8c8c 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -75,55 +75,57 @@ static void format_write(int fd, const char *fmt, ...)
>   write_or_die(fd, buffer, n);
>  }
>  
> -static void http_status(unsigned code, const char *msg)
> +static void http_status(struct strbuf *hdr, unsigned code, const char *msg)
>  {
> - format_write(1, "Status: %u %s\r\n", code, msg);
> + strbuf_addf(hdr, "Status: %u %s\r\n", code, msg);
>  }

The idea is to pass "struct strbuf *hdr" around the callchain to
helpers so that they can all append into it instead of doing
format_write(), which is sensible.  This pattern continues quite a
bit (omitted).

> -static void end_headers(void)
> +static void end_headers(struct strbuf *hdr)
>  {
> - write_or_die(1, "\r\n", 2);
> + strbuf_add(hdr, "\r\n", 2);
> + write_or_die(1, hdr->buf, hdr->len);
> + strbuf_release(hdr);
>  }

Then end_headers() is taught to drain the buffered headers.  The
helpers used by other helpers in the next level of abstraction that
emit the header and the body also need to take the buffer as their
parameter, like this one:

> -static void send_strbuf(const char *type, struct strbuf *buf)
> +static void send_strbuf(struct strbuf *hdr,
> + const char *type, struct strbuf *buf)
>  {
> - hdr_int(content_length, buf->len);
> - hdr_str(content_type, type);
> - end_headers();
> + hdr_int(hdr, content_length, buf->len);
> + hdr_str(hdr, content_type, type);
> + end_headers(hdr);
>   write_or_die(1, buf->buf, buf->len);
>  }

because their caller already has something to say in the header
part.

> +static int bad_request(struct strbuf *hdr, const struct service_cmd *c)
> +{
> + const char *proto = getenv("SERVER_PROTOCOL");
> +
> + if (proto && !strcmp(proto, "HTTP/1.1")) {
> + http_status(hdr, 405, "Method Not Allowed");
> + hdr_str(hdr, "Allow",
> + !strcmp(c->method, "GET") ? "GET, HEAD" : c->method);
> + } else
> + http_status(hdr, 400, "Bad Request");
> + hdr_nocache(hdr);
> + end_headers(hdr);
> + return 0;
> +}
> +

This is like other helpers that respond with 4xx, e.g. forbidden(),
but it did not exist only because there was a single callsite that
needed this feature, which made it open-coded directly in main().
It was somewhat surprising to see a new helper, because the only
thing this patch changes logically is where the output is sent, but
this is a very sensible refactoring.

>  int cmd_main(int argc, const char **argv)
>  {
>   char *method = getenv("REQUEST_METHOD");
> ...
> @@ -659,18 +681,8 @@ int cmd_main(int argc, const char **argv)
>   if (!regexec(, dir, 1, out, 0)) {
>   size_t n;
>  
> - if (strcmp(method, c->method)) {
> - const char *proto = getenv("SERVER_PROTOCOL");
> - if (proto && !strcmp(proto, "HTTP/1.1")) {
> - http_status(405, "Method Not Allowed");
> - hdr_str("Allow", !strcmp(c->method, 
> "GET") ?
> - "GET, HEAD" : c->method);
> - } else
> - http_status(400, "Bad Request");
> - hdr_nocache();
> - end_headers();
> - return 0;
> - }
> + if (strcmp(method, c->method))
> + return bad_request(, c);

... and this is where it came from.  It could have been a separate
"preparatory cleanup" step, but it is so trivial that "while at it"
clean-up is perfectly fine.

Thanks, will queue.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/8] xdl_change_compact(): clarify code

2016-08-10 Thread Stefan Beller
On Wed, Aug 10, 2016 at 9:39 AM, Michael Haggerty  wrote:

>
> I realized that the main thing that took me a while to grok when I was
> reading this code was that blank_lines was really only used as a boolean
> value, even though it was updated with "+=". That's the main information
> that I'd like to convey to the reader.

Oh :(

I think there was some discussion when we added the blank line counting
whether we would want to have it boolean or counting. And we settled
for counting as "future algorithms can make use of this additional information"
IIRC.

>
> So I decided to change the comment to emphasize this fact (and change it
> from a question to a statement), and also changed the place that
> blank_lines is updated to treat it more like a boolean. The latter
> change also has the advantage of not calling is_blank_line()
> unnecessarily when blank_lines is already true.
>
> If you have no objections, that is what I will put in v2 of this patch
> series:

No objections from my side,
sorry for this lengthy discussion about a comment,

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] t7406: fix breakage on OSX

2016-08-10 Thread Stefan Beller
On OSX `wc` prefixes the output of numbers with whitespace, such that
the `commit_count` would be "SP ". When using that in

git submodule update --init --depth=$commit_count

the depth would be empty and the number is interpreted as the pathspec.
Fix this by not using `wc` and rather instruct rev-list to count.

Another way to fix this is to remove the `=` sign after the `--depth`
argument as then we are allowed to have more than just one whitespace
between `--depth` and the actual number. Prefer the solution of rev-list
counting as that is expected to be slightly faster and more self-sustained
within Git.

Reported-by: Lars Schneider 
Helped-by: Junio C Hamano ,
Signed-off-by: Stefan Beller 
---

  origin/sb/submodule-update-dot-branch

 t/t7406-submodule-update.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index d7983cf..64f322c 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -877,7 +877,7 @@ test_expect_success 'submodule update clone shallow 
submodule' '
test_when_finished "rm -rf super3" &&
first=$(git -C cloned submodule status submodule |cut -c2-41) &&
second=$(git -C submodule rev-parse HEAD) &&
-   commit_count=$(git -C submodule rev-list $first^..$second | wc -l) &&
+   commit_count=$(git -C submodule rev-list --count $first^..$second) &&
git clone cloned super3 &&
pwd=$(pwd) &&
(
-- 
2.9.2.665.gdb8bb2f

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/2] diff: add --line-prefix option for passing in a prefix

2016-08-10 Thread Junio C Hamano
Jacob Keller  writes:

> As suggested by Junio, I implemented --line-prefix to enable the graph
> display correctly. This works by a neat trick of adding to the msgbuf,
> so no code needs to be altered. I presumed that the line prefix should
> go *after* the graphs own prefix.

I do not understand the last sentence.

The motivation I suggested the --line-prefix for is for a scenario
in which "git log -p --graph" that recurses into a submodule causes
"git diff A B" between the two commits in a submodule to run; the
internal diff machinery driven by "log -p --graph" for the
superproject knows what the graph lines that depict the lineages of
history in the superproject should look like, but the "git diff A B"
in the submodule of course does not, so while showing e.g. "| | |"
(three lineages) for the history in the superproject, you would run
"git diff --line-prefix='| | | ' A B" and by showing them before
anything that "git diff A B" without the new option would have
produced, you could mimick and match the graph output in the
superproject.

In that scenario, the line prefix _is_ the graph's prefix in the
superproject.

You might be envisioning a future enhancement where the recursive
one uses not "-Submodule commit A"/"-Submodule commit B", and not
"diff A B", but "log -p A...B" in the submodule, and in such a case,
it might make sense to run "log -p --graph A...B" instead when the
command the end user run in the superproject asked for "--graph".
You would use the same "--line-prefix='| | | '" when running the
"log -p --graph A...B" command in the submodule, so that the output
for the submodule will go _after_ the graph of the superproject, but
in that case, the output for the submodule would also include its
own graph to show the relationship between A and B.  The line-prefix
must come before that graph part (and also if it is "git log" run in
the submodule, the same line-prefix must come before each line of
the log message output).

With that understanding/assumption, "line prefix should go after the
graph's own prefix" sounds like a wrong choice to me.  Shouldn't the
prefix go before everything else?

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v12 02/13] bisect: rewrite `check_term_format` shell function in C

2016-08-10 Thread Pranit Bauva
Reimplement the `check_term_format` shell function in C and add
a `--check-term-format` subcommand to `git bisect--helper` to call it
from git-bisect.sh

Using `--check-term-format` subcommand is a temporary measure to port
shell function to C so as to use the existing test suite. As more
functions are ported, this subcommand will be retired and its
implementation will be called by some other method/subcommand. For
eg. In conversion of write_terms() of git-bisect.sh, the subcommand will
be removed and instead check_term_format() will be called in its C
implementation while a new subcommand will be introduced for write_terms().

Helped-by: Johannes Schindelein 
Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 60 +++-
 git-bisect.sh| 31 ++---
 2 files changed, 61 insertions(+), 30 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 8111c91..a47f1f2 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -2,19 +2,73 @@
 #include "cache.h"
 #include "parse-options.h"
 #include "bisect.h"
+#include "refs.h"
 
 static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --next-all [--no-checkout]"),
+   N_("git bisect--helper --check-term-format  "),
NULL
 };
 
+/*
+ * Check whether the string `term` belongs to the set of strings
+ * included in the variable arguments.
+ */
+LAST_ARG_MUST_BE_NULL
+static int one_of(const char *term, ...)
+{
+   int res = 0;
+   va_list matches;
+   const char *match;
+
+   va_start(matches, term);
+   while (!res && (match = va_arg(matches, const char *)))
+   res = !strcmp(term, match);
+   va_end(matches);
+
+   return res;
+}
+
+static int check_term_format(const char *term, const char *orig_term)
+{
+   int res;
+   char *new_term = xstrfmt("refs/bisect/%s", term);
+
+   res = check_refname_format(new_term, 0);
+   free(new_term);
+
+   if (res)
+   return error(_("'%s' is not a valid term"), term);
+
+   if (one_of(term, "help", "start", "skip", "next", "reset",
+   "visualize", "replay", "log", "run", NULL))
+   return error(_("can't use the builtin command '%s' as a term"), 
term);
+
+   /*
+* In theory, nothing prevents swapping completely good and bad,
+* but this situation could be confusing and hasn't been tested
+* enough. Forbid it for now.
+*/
+
+   if ((strcmp(orig_term, "bad") && one_of(term, "bad", "new", NULL)) ||
+(strcmp(orig_term, "good") && one_of(term, "good", "old", 
NULL)))
+   return error(_("can't change the meaning of the term '%s'"), 
term);
+
+   return 0;
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
-   enum { NEXT_ALL = 1 } cmdmode = 0;
+   enum {
+   NEXT_ALL = 1,
+   CHECK_TERM_FMT
+   } cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
OPT_CMDMODE(0, "next-all", ,
 N_("perform 'git bisect next'"), NEXT_ALL),
+   OPT_CMDMODE(0, "check-term-format", ,
+N_("check format of the term"), CHECK_TERM_FMT),
OPT_BOOL(0, "no-checkout", _checkout,
 N_("update BISECT_HEAD instead of checking out the 
current commit")),
OPT_END()
@@ -29,6 +83,10 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
switch (cmdmode) {
case NEXT_ALL:
return bisect_next_all(prefix, no_checkout);
+   case CHECK_TERM_FMT:
+   if (argc != 2)
+   die(_("--check-term-format requires two arguments"));
+   return check_term_format(argv[0], argv[1]);
default:
die("BUG: unknown subcommand '%d'", cmdmode);
}
diff --git a/git-bisect.sh b/git-bisect.sh
index 5d1cb00..7d7965d 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -564,38 +564,11 @@ write_terms () {
then
die "$(gettext "please use two different terms")"
fi
-   check_term_format "$TERM_BAD" bad
-   check_term_format "$TERM_GOOD" good
+   git bisect--helper --check-term-format "$TERM_BAD" bad || exit
+   git bisect--helper --check-term-format "$TERM_GOOD" good || exit
printf '%s\n%s\n' "$TERM_BAD" "$TERM_GOOD" >"$GIT_DIR/BISECT_TERMS"
 }
 
-check_term_format () {
-   term=$1
-   git check-ref-format refs/bisect/"$term" ||
-   die "$(eval_gettext "'\$term' is not a valid term")"
-   case "$term" in
-   help|start|terms|skip|next|reset|visualize|replay|log|run)
-   die 

[PATCH v12 07/13] bisect--helper: `bisect_reset` shell function in C

2016-08-10 Thread Pranit Bauva
Reimplement `bisect_reset` shell function in C and add a `--bisect-reset`
subcommand to `git bisect--helper` to call it from git-bisect.sh .

Using `bisect_reset` subcommand is a temporary measure to port shell
functions to C so as to use the existing test suite. As more functions
are ported, this subcommand would be retired but its implementation will
be called by some other method.

Note: --bisect-clean-state subcommand has not been retired as there are
still a function namely `bisect_start()` which still uses this
subcommand.

Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 47 ++-
 git-bisect.sh| 28 ++--
 2 files changed, 48 insertions(+), 27 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index 3fffa78..409e268 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -4,6 +4,8 @@
 #include "bisect.h"
 #include "refs.h"
 #include "dir.h"
+#include "argv-array.h"
+#include "run-command.h"
 
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
@@ -13,11 +15,13 @@ static GIT_PATH_FUNC(git_path_bisect_names, "BISECT_NAMES")
 static GIT_PATH_FUNC(git_path_bisect_run, "BISECT_RUN")
 static GIT_PATH_FUNC(git_path_head_name, "head-name")
 static GIT_PATH_FUNC(git_path_bisect_start, "BISECT_START")
+static GIT_PATH_FUNC(git_path_bisect_head, "BISECT_HEAD")
 
 static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --next-all [--no-checkout]"),
N_("git bisect--helper --write-terms  "),
N_("git bisect--helper --bisect-clean-state"),
+   N_("git bisect--helper --bisect-reset []"),
NULL
 };
 
@@ -125,12 +129,47 @@ static int bisect_clean_state(void)
return result;
 }
 
+static int bisect_reset(const char *commit)
+{
+   struct strbuf branch = STRBUF_INIT;
+
+   if (!commit) {
+   if (strbuf_read_file(, git_path_bisect_start(), 0) < 1) {
+   printf("We are not bisecting.\n");
+   return 0;
+   }
+   strbuf_rtrim();
+   } else {
+   struct object_id oid;
+   if (get_oid(commit, ))
+   return error(_("'%s' is not a valid commit"), commit);
+   strbuf_addstr(, commit);
+   }
+
+   if (!file_exists(git_path_bisect_head())) {
+   struct argv_array argv = ARGV_ARRAY_INIT;
+   argv_array_pushl(, "checkout", branch.buf, "--", NULL);
+   if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
+   error(_("Could not check out original HEAD '%s'. Try "
+   "'git bisect reset '."), branch.buf);
+   strbuf_release();
+   argv_array_clear();
+   return -1;
+   }
+   argv_array_clear();
+   }
+
+   strbuf_release();
+   return bisect_clean_state();
+}
+
 int cmd_bisect__helper(int argc, const char **argv, const char *prefix)
 {
enum {
NEXT_ALL = 1,
WRITE_TERMS,
-   BISECT_CLEAN_STATE
+   BISECT_CLEAN_STATE,
+   BISECT_RESET
} cmdmode = 0;
int no_checkout = 0;
struct option options[] = {
@@ -140,6 +179,8 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
 N_("write the terms to .git/BISECT_TERMS"), 
WRITE_TERMS),
OPT_CMDMODE(0, "bisect-clean-state", ,
 N_("cleanup the bisection state"), BISECT_CLEAN_STATE),
+   OPT_CMDMODE(0, "bisect-reset", ,
+N_("reset the bisection state"), BISECT_RESET),
OPT_BOOL(0, "no-checkout", _checkout,
 N_("update BISECT_HEAD instead of checking out the 
current commit")),
OPT_END()
@@ -162,6 +203,10 @@ int cmd_bisect__helper(int argc, const char **argv, const 
char *prefix)
if (argc != 0)
die(_("--bisect-clean-state requires no arguments"));
return bisect_clean_state();
+   case BISECT_RESET:
+   if (argc > 1)
+   die(_("--bisect-reset requires either zero or one 
arguments"));
+   return bisect_reset(argc ? argv[0] : NULL);
default:
die("BUG: unknown subcommand '%d'", cmdmode);
}
diff --git a/git-bisect.sh b/git-bisect.sh
index bbc57d2..18580b7 100755
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -409,35 +409,11 @@ bisect_visualize() {
eval '"$@"' --bisect -- $(cat "$GIT_DIR/BISECT_NAMES")
 }
 
-bisect_reset() {
-   test -s "$GIT_DIR/BISECT_START" || 

[PATCH v12 13/13] bisect--helper: `bisect_start` shell function partially in C

2016-08-10 Thread Pranit Bauva
Reimplement the `bisect_start` shell function partially in C and add
`bisect-start` subcommand to `git bisect--helper` to call it from
git-bisect.sh .

The last part is not converted because it calls another shell function.
`bisect_start` shell function will be completed after the `bisect_next`
shell function is ported in C.

Using `--bisect-start` subcommand is a temporary measure to port shell
function in C so as to use the existing test suite. As more functions
are ported, this subcommand will be retired and will be called by some
other methods.

Mentored-by: Lars Schneider 
Mentored-by: Christian Couder 
Signed-off-by: Pranit Bauva 
---
 builtin/bisect--helper.c | 254 ++-
 git-bisect.sh| 133 +
 2 files changed, 254 insertions(+), 133 deletions(-)

diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c
index f912010..80116ba 100644
--- a/builtin/bisect--helper.c
+++ b/builtin/bisect--helper.c
@@ -7,6 +7,7 @@
 #include "argv-array.h"
 #include "run-command.h"
 #include "prompt.h"
+#include "quote.h"
 
 static GIT_PATH_FUNC(git_path_bisect_terms, "BISECT_TERMS")
 static GIT_PATH_FUNC(git_path_bisect_expected_rev, "BISECT_EXPECTED_REV")
@@ -27,6 +28,8 @@ static const char * const git_bisect_helper_usage[] = {
N_("git bisect--helper --bisect-check-and-set-terms  
 "),
N_("git bisect--helper --bisect-next-check []  
term_good);
+   strbuf_addstr(>term_good, argv[++i]);
+   }
+   else if (skip_prefix(argv[i], "--term-good=", [i])) {
+   must_write_terms = 1;
+   strbuf_reset(>term_good);
+   strbuf_addstr(>term_good, argv[i]);
+   }
+   else if (skip_prefix(argv[i], "--term-old=", [i])) {
+   must_write_terms = 1;
+   strbuf_reset(>term_good);
+   strbuf_addstr(>term_good, argv[i]);
+   }
+   else if (!strcmp(argv[i], "--term-bad") ||
+!strcmp(argv[i], "--term-new")) {
+   must_write_terms = 1;
+   strbuf_reset(>term_bad);
+   strbuf_addstr(>term_bad, argv[++i]);
+   }
+   else if (skip_prefix(argv[i], "--term-bad=", [i])) {
+   must_write_terms = 1;
+   strbuf_reset(>term_bad);
+   strbuf_addstr(>term_bad, argv[i]);
+   }
+   else if (skip_prefix(argv[i], "--term-new=", [i])) {
+   must_write_terms = 1;
+   strbuf_reset(>term_good);
+   strbuf_addstr(>term_good, argv[i]);
+   }
+   else if (starts_with(argv[i], "--") &&
+!one_of(argv[i], "--term-good", "--term-bad", NULL)) {
+   string_list_clear(, 0);
+   string_list_clear(, 0);
+   die(_("unrecognised option: '%s'"), argv[i]);
+   }
+   else if (get_oid(argv[i], ) && !has_double_dash) {
+   string_list_clear(, 0);
+   

[ANNOUNCE] tig-2.2

2016-08-10 Thread Jonas Fonseca
Hello,

I've just released the 35th release of Tig. It brings several search
improvements such as highlighting and wrap around, and machinery for future
support of typeahead search. This release also gives more choice over how the
user configuration file is loaded either at built-time or at runtime through
support of the XDG basedir spec. Among fixes several segfaults and invalid
reads have been addressed and the tests are now run with Valgrind and
AddressSanitizer by Travis on each push. There are several breaking changes so
ensure you read the section on incompatibilities in the release notes before
upgrading.

It's crazy to think that Tig has happily browsed Git repos for more than 10
years! Thanks to everybody who contributed and made that possible. Looking
forward to the next 10 years ...

 - Homepage: http://jonas.nitro.dk/tig/
 - Manual: http://jonas.nitro.dk/tig/manual.html
 - Tarballs: http://jonas.nitro.dk/tig/releases/
 - Git URL: git://github.com/jonas/tig.git
 - Gitter: https://gitter.im/jonas/tig
 - Q: http://stackoverflow.com/questions/tagged/tig

Release notes
-
Incompatibilities:

 - Note that all user-defined commands are now executed at the repository root
   instead of whatever subdirectory Tig was started in. (GH #412)
 - Remove `cmdline-args` option to avoid problems where setting it in `~/.tigrc`
   potentially breaks other views due to its "context-sensitive" nature, where
   a `git-log` option maybe cause `git-grep` to fail. (GH #431)

Improvements:

 - Use .mailmap to show canonical name and email addresses, off by default.
   Add `set mailmap = yes` to `~/.tigrc` to enable. (GH #411)
 - Highlight search results, configurable via `search-result` color. (GH #493)
 - Wrap around when searching, configurable via `wrap-search` setting.
 - Populate `%(file)` with file names from diff stat. (GH #404)
 - `tig --merge` implies `--boundary` similar to gitk.
 - Expose repository variables to external commands, e.g.
`%(repo:head)` gives the
   branch name of the current HEAD and `%(repo:cdup)` for the repo root path.
 - Add `make uninstall`. (GH #417)
 - Add ZSH completion file (based on Bash completion) (GH #433)
 - Expose the text of the currently selected line as the %(text) (GH #457)
 - Allow users to specify rev arguments to blame (GH #439)
 - Update OSX make config to find brew installed ncurses
 - Add sample git-flow keybinding (GH #421)
 - Add chocolate theme (GH #432)
 - Show stash diffs. (GH #328)
 - Make user tigrc location configurable. (GH #479)
 - Compact relative date display mode. (GH #331)
 - Add date column option controlling whether to show local date.
 - Move to parent commit in the main view. (GH #388)
 - Add `:goto ` prompt command to go to a `git-rev-parse`d revision, e.g.
   `:goto some/branch` or `:goto %(commit)^2`.
 - Respect the XDG standard for configuration files. (GH #513)
 - Resolve diff paths when `diff.noprefix` is true. (GH #487, #488)
 - Support for custom `strftime(3)` date formats, e.g.:

set main-view-date = custom
set main-view-date-format = "%Y-%m-%d"

Bug fixes:

 - Prevent staged rename from displaying unstaged changes (GH #472, #491)
 - Fix corrupt chunk header during staging of single lines. (GH #410)
 - Fix out of bounds read in graph-v2 module. (GH #402)
 - Add currently checked out branch to `%(branch)`. (GH #416)
 - Size diff stats correctly for split views.
 - Fix `git-worktree` support by using `git-show-ref`. (GH #437)
 - Add currently checked out branch to `%(branch)` (GH #416)
 - Fix segfault when hitting return in empty file search (GH #464)
 - Remove separator on horizontal split when switching from vertical split
 - Do not expand `--all` when parsing `%(revargs)` (GH #442, #462)
 - Fix exit when the main view is reloaded due to option toggling. (GH #470)
 - Expand all whitespace and control characters to spaces. (GH #485)
 - Restore ability to unbind a default keybinding with `none`. (GH #483)
 - Fix blob view to honor the `wrap-lines` setting.

Change summary
--
The diffstat and log summary for changes made in this release.

 .mailmap  |   3 +
 .travis.yml   |  36 +-
 INSTALL.adoc  |  15 +
 Makefile  |  29 +-
 NEWS.adoc |  65 ++-
 autogen.sh|   4 +-
 contrib/chocolate.theme.tigrc |  16 +
 contrib/config.make-Darwin|  15 +-
 contrib/git-flow.tigrc|  49 +++
 contrib/tig-completion.zsh|  21 +
 contrib/tig.spec.in   |   2 +-
 contrib/vim.tigrc |   3 +
 doc/manual.adoc   |  29 +-
 doc/tig.1.adoc|   9 +-
 doc/tigrc.5.adoc  |  40 +-
 include/tig/argv.h|  

Re: [PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 01:17:22PM -0700, Junio C Hamano wrote:

> > Actually, skimming the sha1_file code, I am not 100% sure that we detect
> > cycles in OBJ_REF_DELTA (you cannot have cycles in OBJ_OFS_DELTA since
> > they always point backwards in the pack). But if that is the case, then
> > I think we should fix that, not worry about special-casing it here.
> 
> Yes, but sha1_file.c?  It is the reading side and it is too late if
> we notice a problem, I would think.

We already are covered on the writing side. That is what your code in
write_one() does. The reason to warn is on the reading side ("I fixed
this for you, but by the way, your existing packs were bogus").

But of more concern is whether read_sha1_file() would recurse
infinitely, which would be bad (though I do not think it would be a
feasible attack vector; index-pack already rejects such packs before
they are admitted to the repository).

> > + *   2. Updating our size; check_object() will have filled in the size of 
> > our
> > + *  delta, but a non-delta object needs it true size.
> 
> Excellent point.

I was not clever enough to think of it; the pack-objects code is filled
with nice assertions (Thanks, Nico!) that help out when you are stupid. :)

One thing to be careful of is that there are more things this
drop_reused_delta() should be doing. But I looked through the rest of
check_object() and could not find anything else.

> > +   case DFS_ACTIVE:
> > +   /*
> > +* We found a cycle that needs broken. It would be correct to
> > +* break any link in the chain, but it's convenient to
> > +* break this one.
> > +*/
> > +   drop_reused_delta(entry);
> > +   break;
> > +   }
> > +}
> 
> Do we need to do anything to the DFS state of an entry when
> drop_reused_delta() resets its other fields?

Good catch. It should be marked DONE after we have broken the delta. It
doesn't matter in practice, because...

> If we later find D that is directly based on A, wouldn't we end up
> visiting A and attempt to drop it again?  drop_reused_delta() is
> idempotent so there will be no data structure corruption, I think,
> but we can safely declare that the entry is now DONE after calling
> drop_reused_delta() on it (either in the function or in the caller
> after it calls the function), no?

I think the idempotency of drop_reused_delta() doesn't matter. When we
visit A again later, its "delta" field will be NULL, so we'll hit the
condition at the top of the function: this is a base object, mark DONE
and don't recurse.

So it's correct as-is, but I agree it feels weird that the DFS would end
with some objects potentially marked ACTIVE. Everything should be DONE
at the end.

> > +# Create a pack containing the the tree $1 and blob $1:file, with
> > +# the latter stored as a delta against $2:file.
> > +#
> > +# We convince pack-objects to make the delta in the direction of our 
> > choosing
> > +# by marking $2 as a preferred-base edge. That results in $1:file as a thin
> > +# delta, and index-pack completes it by adding $2:file as a base.
> 
> Tricky but clever and correct ;-)

Thanks, it took a long time to think up. ;)

I actually wish we had better tools for making fake packs. Something
where you could say "add A, then add B as a delta of A, then...".
Because you often have to jump through quite a few hoops to convince
pack-objects to generate the pack you want, and even some things are
impossible (for example, I would like to make a chain of 10 deltas; how
do I convince the delta search to put my objects in the right order?).

I tried briefly yesterday to convince pack-objects to just take a list
of objects and their deltas, but it got ugly very quickly.  I think we'd
be better off writing a new tool that happens to reuse some of the
formatting functions from pack-objects. But even then, we've got to
write an .idx, which means going through index-pack (which will complain
if we are writing bogus packs for testing odd situations), or we have to
keep a valid list of "struct object_entry" to feed to the idx writer.

So even that approach is not quite trivial.

> > +make_pack () {
> > +{
> > +echo "-$(git rev-parse $2)"
> 
> Is everybody's 'echo' happy with dash followed by unknown string?

I'd assume so because it will be "-", and I think echoes which
take options are careful about that.

Still, it would not be hard to tweak.

> > +echo "$(git rev-parse $1:dummy) dummy"
> > +echo "$(git rev-parse $1:file) file"
> > +} |
> > +git pack-objects --stdout |
> > +git index-pack --stdin --fix-thin
> 
> An alternative
> 
>   git pack-objects --stdout <<-EOF |
>   -$(git rev-parse $2)
> $(git rev-parse $1:dummy) dummy
> $(git rev-parse $1:file) file
>   EOF
> git index-pack --stdin --fix-thin
> 
> looks somewhat ugly, though.

Yeah, I think we would be better to just switch to printf if we want to
be 

Re: [PATCH v3 0/2] pack-objects mru

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 09:47:52AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> This is not new with this change, but I am not quite sure what in
> >> the current code prevents us from busting the delta limit for reused
> >> ones, though.
> >
> > I think in the current code you are limited by the depth you might find
> > in a single existing pack (since we never reuse cross-pack deltas).
> 
> Sorry for going deeper in the tangent, but I vaguely recall raising
> it long time ago as a potential issue that delta reuse out of an
> original pack created with deep delta chain may bust a delta chain
> limit when repacking with shorter delta chain limit; I just do not
> remember where that discussion went (i.e. decided to be a non-issue?
> we added code to avoid it? etc.)

Digging on the list and in the history, I found your e4c9327
(pack-objects: avoid delta chains that are too long., 2006-02-17). That
approach went away with Nico's 898b14c (pack-objects: rework
check_delta_limit usage, 2007-04-16), which I think is where we are at
today.

I found the patches for both on the list, but no interesting discussion.

It looks like 898b14c does the depth check dynamically in find_delta. So
we would not build on a too-long chain, but I do not see anything that
prevents us from creating a too-long chain purely out of reused deltas.

Which means that even without my patches, repacking with a shorter delta
chain does not guarantee you do not have a longer chain. And mine
introduces a potential "packs * max_depth" problem (I also think
check_delta_limit could recurse very deeply if given a pathologically
weird pack that has insane delta limits, but presumably we would just
run out of stack and crash, which seems like an OK outcome for such
maliciousness).

I guess it would not be that hard to break the reused chains as part of
the DFS search I introduced (we are recursing already; just stop
recursing and break when we hit the max-depth).

> > However, I think with cross-pack deltas, you could have a situation
> > like:
> >
> >   pack 1: A -> B -> C
> >   pack 2: C -> D -> E
> >
> > and pick A and B from the first pack, and C, D, and E from the second.
> > Then you end up with:
> >
> >   A -> B -> C -> D -> E
> >
> > in the output. IOW, I think the absolute worst case chain is the
> > max_depth times the number of packs.
> 
> Also if pack1 and pack2 were created with depth limit of 3 and we
> are repacking with depth limit of 2, then we are busting the limit
> already with or without cross-pack chaining, I would think.

Right, though that is at least bounded to "what you packed with before",
which people do not usually change (OTOH, we accept packs from random
strangers via the protocol).

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] xdl_change_compact(): fix compaction heuristic to adjust io

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 06:58:06PM +0200, Michael Haggerty wrote:

> > And it looks like rchgo[io] always ends the loop on a 0. So it seems
> > like we would just hit that condition again.
> 
> Correct...in this loop. But there is another place where `io` is
> incremented unconditionally. In the version before my changes, it is via
> the preincrement operator in this while statement conditional:
> 
> https://github.com/mhagger/git/blob/a28705da929ad746abcb34270947f738549d3246/xdiff/xdiffi.c#L502
> 
> After my changes, the unconditional increment is more obvious because I
> took it out of the while condition:
> 
> https://github.com/mhagger/git/blob/39a135da93834fd72ee923d95d0cebfe525dfe7a/xdiff/xdiffi.c#L541
> 
> (BTW, I think this is a good example of how patch 2/8 makes the code
> easier to reason about.)

Ah, yeah, I totally missed that case (and I agree the code after your
2/8 makes it much more obvious).

> I didn't do the hard work to determine whether `io` could *really* walk
> off the end of the array, but I don't see an obvious reason why it
> *couldn't*.

Yeah, I'm in agreement.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4 1/2] pack-objects: break delta cycles before delta-search phase

2016-08-10 Thread Jeff King
On Thu, Aug 11, 2016 at 01:02:52AM -0400, Jeff King wrote:

> Yeah, I think we would be better to just switch to printf if we want to
> be careful.
> 
> I'll follow-up with a patch.

Here it is. It switches out the echos for printfs (which, ironically,
_do_ complain about the "-" argument if you do so naively, but of course
the right way is "printf '%s\n' -whatever").

It also sets the dfs_state flag to DONE after breaking a delta, as
discussed.

Patch 2/2 on top does not need any adjustment, so I won't bother sending
it.

-- >8 --
Subject: [PATCH] pack-objects: break delta cycles before delta-search phase

We do not allow cycles in the delta graph of a pack (i.e., A
is a delta of B which is a delta of A) for the obvious
reason that you cannot actually access any of the objects in
such a case.

There's a last-ditch attempt to notice cycles during the
write phase, during which we issue a warning to the user and
write one of the objects out in full. However, this is
"last-ditch" for two reasons:

  1. By this time, it's too late to find another delta for
 the object, so the resulting pack is larger than it
 otherwise could be.

  2. The warning is there because this is something that
 _shouldn't_ ever happen. If it does, then either:

   a. a pack we are reusing deltas from had its own
  cycle

   b. we are reusing deltas from multiple packs, and
  we found a cycle among them (i.e., A is a delta of
  B in one pack, but B is a delta of A in another,
  and we choose to use both deltas).

   c. there is a bug in the delta-search code

 So this code serves as a final check that none of these
 things has happened, warns the user, and prevents us
 from writing a bogus pack.

Right now, (2b) should never happen because of the static
ordering of packs in want_object_in_pack(). If two objects
have a delta relationship, then they must be in the same
pack, and therefore we will find them from that same pack.

However, a future patch would like to change that static
ordering, which will make (2b) a common occurrence. In
preparation, we should be able to handle those kinds of
cycles better. This patch does by introducing a
cycle-breaking step during the get_object_details() phase,
when we are deciding which deltas can be reused. That gives
us the chance to feed the objects into the delta search as
if the cycle did not exist.

We'll leave the detection and warning in the write_object()
phase in place, as it still serves as a check for case (2c).

This does mean we will stop warning for (2a). That case is
caused by bogus input packs, and we ideally would warn the
user about it.  However, since those cycles show up after
picking reusable deltas, they look the same as (2b) to us;
our new code will break the cycles early and the last-ditch
check will never see them.

We could do analysis on any cycles that we find to
distinguish the two cases (i.e., it is a bogus pack if and
only if every delta in the cycle is in the same pack), but
we don't need to. If there is a cycle inside a pack, we'll
run into problems not only reusing the delta, but accessing
the object data at all. So when we try to dig up the actual
size of the object, we'll hit that same cycle and kick in
our usual complain-and-try-another-source code.

Signed-off-by: Jeff King 
---
 builtin/pack-objects.c  |  84 +
 pack-objects.h  |   9 
 t/t5314-pack-cycle-detection.sh | 113 
 3 files changed, 206 insertions(+)
 create mode 100755 t/t5314-pack-cycle-detection.sh

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4a63398..b8e4e41 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1495,6 +1495,83 @@ static int pack_offset_sort(const void *_a, const void 
*_b)
(a->in_pack_offset > b->in_pack_offset);
 }
 
+/*
+ * Drop an on-disk delta we were planning to reuse. Naively, this would
+ * just involve blanking out the "delta" field, but we have to deal
+ * with two extra pieces of book-keeping:
+ *
+ *   1. Removing ourselves from the delta_sibling linked list.
+ *
+ *   2. Updating our size; check_object() will have filled in the size of our
+ *  delta, but a non-delta object needs it true size.
+ */
+static void drop_reused_delta(struct object_entry *entry)
+{
+   struct object_entry **p = >delta->delta_child;
+   struct pack_window *w_curs = NULL;
+
+   while (*p) {
+   if (*p == entry)
+   *p = (*p)->delta_sibling;
+   else
+   p = &(*p)->delta_sibling;
+   }
+   entry->delta = NULL;
+
+   entry->size = get_size_from_delta(entry->in_pack, _curs,
+ entry->in_pack_offset + entry->in_pack_header_size);
+   unuse_pack(_curs);
+
+   /*
+* If we failed to get the size from this pack for whatever reason,
+ 

Re: [PATCH] checkout: do not mention detach advice for explicit --detach option

2016-08-10 Thread Junio C Hamano
Stefan Beller  writes:

> When a user asked for a detached HEAD specifically with `--detach`,
> we do not need to give advice on what a detached HEAD state entails as
> we can assume they know what they're getting into as they asked for it.

Makes sense; I agree that "Don't be noisy if you did exactly what
you were told to do." is a very sensible principle.

> Signed-off-by: Stefan Beller 
> ---
>  builtin/checkout.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 27c1a05..fa2dce5 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -655,7 +655,7 @@ static void update_refs_for_switch(const struct 
> checkout_opts *opts,
>   update_ref(msg.buf, "HEAD", new->commit->object.oid.hash, NULL,
>  REF_NODEREF, UPDATE_REFS_DIE_ON_ERR);
>   if (!opts->quiet) {
> - if (old->path && advice_detached_head)
> + if (old->path && advice_detached_head && 
> !opts->force_detach)
>   detach_advice(new->name);
>   describe_detached_head(_("HEAD is now at"), 
> new->commit);
>   }
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Aug 2016, #03; Mon, 8)

2016-08-10 Thread Stefan Beller
On Wed, Aug 10, 2016 at 10:30 AM, Lars Schneider
 wrote:
>
>>
>> * sb/submodule-update-dot-branch (2016-08-03) 7 commits
>>  (merged to 'next' on 2016-08-04 at 47bff41)
>> + submodule update: allow '.' for branch value
>> + submodule--helper: add remote-branch helper
>> + submodule-config: keep configured branch around
>> + submodule--helper: fix usage string for relative-path
>> + submodule update: narrow scope of local variable
>> + submodule update: respect depth in subsequent fetches
>> + t7406: future proof tests with hard coded depth
>>
>> A few updates to "git submodule update".
>>
>> Will merge to 'master'.
>
> I think "t7406: future proof tests with hard coded depth"
> breaks the tests on OSX:
>
> https://travis-ci.org/git/git/jobs/150779244
>
> Cheers,
> Lars
>


error: pathspec '4' did not match any file(s) known to git.

not ok 46 - submodule update clone shallow submodule

#
# test_when_finished "rm -rf super3" &&
# first=$(git -C cloned submodule status submodule |cut -c2-41) &&
# second=$(git -C submodule rev-parse HEAD) &&
# commit_count=$(git -C submodule rev-list $first^..$second | wc -l) &&
# git clone cloned super3 &&
# pwd=$(pwd) &&
# (
# cd super3 &&
# sed -e "s#url = ../#url = file://$pwd/#" <.gitmodules >.gitmodules.tmp &&
# mv -f .gitmodules.tmp .gitmodules &&
# git submodule update --init --depth=$commit_count &&
# test 1 = $(git -C submodule log --oneline | wc -l)
# )
#


Is it possible that the "wc -l" produces  SP  on OSX,
such that the

# git submodule update --init --depth=$commit_count

contains "--depth= 4" which means empty depth and 4 as the pathspec
for the update command?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: What's cooking in git.git (Aug 2016, #03; Mon, 8)

2016-08-10 Thread Junio C Hamano
Lars Schneider  writes:

> I think "t7406: future proof tests with hard coded depth"
> breaks the tests on OSX:
>
> https://travis-ci.org/git/git/jobs/150779244

Thanks.  "| wc -l" can probably be replaced with "--count",
if the upstream is "git rev-list".
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 0/2] pack-objects mru

2016-08-10 Thread Junio C Hamano
Jeff King  writes:

>> This is not new with this change, but I am not quite sure what in
>> the current code prevents us from busting the delta limit for reused
>> ones, though.
>
> I think in the current code you are limited by the depth you might find
> in a single existing pack (since we never reuse cross-pack deltas).

Sorry for going deeper in the tangent, but I vaguely recall raising
it long time ago as a potential issue that delta reuse out of an
original pack created with deep delta chain may bust a delta chain
limit when repacking with shorter delta chain limit; I just do not
remember where that discussion went (i.e. decided to be a non-issue?
we added code to avoid it? etc.)

> However, I think with cross-pack deltas, you could have a situation
> like:
>
>   pack 1: A -> B -> C
>   pack 2: C -> D -> E
>
> and pick A and B from the first pack, and C, D, and E from the second.
> Then you end up with:
>
>   A -> B -> C -> D -> E
>
> in the output. IOW, I think the absolute worst case chain is the
> max_depth times the number of packs.

Also if pack1 and pack2 were created with depth limit of 3 and we
are repacking with depth limit of 2, then we are busting the limit
already with or without cross-pack chaining, I would think.

> I'm not sure how much we should be worried about it. We could fill in
> the depth values when adding a reused delta, but I don't think we have
> the number handy; we'd have to actually walk the chain (though with
> delta-base-offset, that is reasonably cheap).

True.  It is something we may want to keep back in our mind and
revisit later.  It is definitely not a low-hanging fruit, but
something that should go to the leftover-bits list.

> The second patch is the same as before, though I tweaked the commit
> message a bit, so please replace what is at the tip of
> jk/pack-objects-optim-mru.
>
>   [1/2]: pack-objects: break delta cycles before delta-search phase
>   [2/2]: pack-objects: use mru list when iterating over packs

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:36:45PM +0200, Lars Schneider wrote:

> > So now we have packet_write() and packet_write_gently(), but they differ
> > in more than just whether they are gentle. That seems like a weird
> > interface.
> > 
> > Should we either be picking a new name (e.g., packet_write_mem() or
> > something), or migrating packet_write() to packet_write_fmt()?
> 
> Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to 
> packet_write_fmt()"

Ah, OK. Generally I'd suggest to reorder things so that each patch looks
like a step forward (and so the early patches become preparatory steps,
and the justification in them is something like "we're going to add more
write functions, so let's give this a more descriptive name").

> I agree. In a later patch I am using PKTLINE_DATA_MAXLEN inside pkt-line.c,
> too. I will change it to your suggestion.
> 
> For now I would remove PKTLINE_DATA_MAXLEN because it should be an 
> implementation
> detail of pkt-line.c (plus it is not used by anyone).

Sounds reasonable.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/2] pack-objects mru

2016-08-10 Thread Jeff King
On Tue, Aug 09, 2016 at 03:29:33PM -0700, Junio C Hamano wrote:

> > @@ -1516,6 +1577,13 @@ static void get_object_details(void)
> > entry->no_try_delta = 1;
> > }
> >  
> > +   /*
> > +* This must happen in a second pass, since we rely on the delta
> > +* information for the whole list being completed.
> > +*/
> > +   for (i = 0; i < to_pack.nr_objects; i++)
> > +   break_delta_cycles(_pack.objects[i]);
> > +
> > free(sorted_by_offset);
> >  }
> 
> A potential cycle can only come from reusing deltas across packs in
> an unstable order, that happens way before we do the find_delta()
> thing, so this is a good place to have the new call.  While reading
> break_delta_cycles(), I was wondering if what it does is safe under
> multi-threading but there is no need to worry.

It definitely is not multi-threaded safe (and I'm not sure it could
easily be made so). But yeah, it is quick and happens as part of the
get_object_details() look at the objects, which is where we make
decisions about delta reuse.

> The recursiveness of break-delta-cycles is not too bad for the same
> reason why it is OK to recurse in check_delta_limit(), I would guess?

Yes, and for the same reason that it is OK in write_one(); the recursion
is limited by the depth of the delta chains.

> This is not new with this change, but I am not quite sure what in
> the current code prevents us from busting the delta limit for reused
> ones, though.

I think in the current code you are limited by the depth you might find
in a single existing pack (since we never reuse cross-pack deltas).
There is a comment in check_object() that claims:

* Depth value does not matter - find_deltas() will
* never consider reused delta as the base object to
* deltify other objects against, in order to avoid
* circular deltas.

but I do not recall seeing code to enforce that (but presumably that is
just beacuse it is a corner of the code I have not seen yet).

However, I think with cross-pack deltas, you could have a situation
like:

  pack 1: A -> B -> C
  pack 2: C -> D -> E

and pick A and B from the first pack, and C, D, and E from the second.
Then you end up with:

  A -> B -> C -> D -> E

in the output. IOW, I think the absolute worst case chain is the
max_depth times the number of packs. In practice I'd expect it to be
much smaller.

I'm not sure how much we should be worried about it. We could fill in
the depth values when adding a reused delta, but I don't think we have
the number handy; we'd have to actually walk the chain (though with
delta-base-offset, that is reasonably cheap).

> > I think my preference is to clean up the "hacky" bit of this patch, and
> > then apply the earlier MRU patch on top of it (which takes my repack
> > from 44 minutes to 5 minutes for this particular test set).
> 
> Yup, with something like this to break the delta chain _and_ allow
> an object to go through the usual deltify machinery, I'd say the MRU
> patch is a wonderful thing to have.

Here are cleaned-up patches. The cycle-breaking patch is cleaned up and
has tests. I erred on the side of verbosity in the comments there,
because it literally took me hours to come up with a reliable working
set of commands (and a convincing argument that it's correct). Hopefully
it doesn't put anyone to sleep. :)

The second patch is the same as before, though I tweaked the commit
message a bit, so please replace what is at the tip of
jk/pack-objects-optim-mru.

  [1/2]: pack-objects: break delta cycles before delta-search phase
  [2/2]: pack-objects: use mru list when iterating over packs

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 01/15] pkt-line: extract set_packet_header()

2016-08-10 Thread larsxschneider
From: Lars Schneider 

set_packet_header() converts an integer to a 4 byte hex string. Make
this function locally available so that other pkt-line functions can
use it.

Signed-off-by: Lars Schneider 
---
 pkt-line.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/pkt-line.c b/pkt-line.c
index 62fdb37..177dc73 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -97,10 +97,19 @@ void packet_buf_flush(struct strbuf *buf)
strbuf_add(buf, "", 4);
 }
 
-#define hex(a) (hexchar[(a) & 15])
-static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+static void set_packet_header(char *buf, const int size)
 {
static char hexchar[] = "0123456789abcdef";
+   #define hex(a) (hexchar[(a) & 15])
+   buf[0] = hex(size >> 12);
+   buf[1] = hex(size >> 8);
+   buf[2] = hex(size >> 4);
+   buf[3] = hex(size);
+   #undef hex
+}
+
+static void format_packet(struct strbuf *out, const char *fmt, va_list args)
+{
size_t orig_len, n;
 
orig_len = out->len;
@@ -111,10 +120,7 @@ static void format_packet(struct strbuf *out, const char 
*fmt, va_list args)
if (n > LARGE_PACKET_MAX)
die("protocol error: impossibly long line");
 
-   out->buf[orig_len + 0] = hex(n >> 12);
-   out->buf[orig_len + 1] = hex(n >> 8);
-   out->buf[orig_len + 2] = hex(n >> 4);
-   out->buf[orig_len + 3] = hex(n);
+   set_packet_header(>buf[orig_len], n);
packet_trace(out->buf + orig_len + 4, n - 4, 1);
 }
 
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v10 33/40] environment: add set_index_file()

2016-08-10 Thread Christian Couder
On Tue, Aug 9, 2016 at 12:13 AM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> Now if someone really needs to use this new function, it should be
>> used like this:
>>
>> /* Save current index file */
>> old_index_file = get_index_file();
>> set_index_file((char *)tmp_index_file);
>>
>> /* Do stuff that will use tmp_index_file as the index file */
>> ...
>>
>> /* When finished reset the index file */
>> set_index_file(old_index_file);
>>
>> It is intended to be used by builtins commands, in fact only `git am`,
>> to temporarily change the index file used by libified code.
>>
>> This is useful when libified code uses the global index, but a builtin
>> command wants another index file to be used instead.
>
> That is OK, but I do not think NO_THE_INDEX_COMPATIBILITY_MACROS has
> much to do with this hack.  Even if you stop using the_index and
> have the caller pass its own temporary index_state, that structure
> does *not* know which file to read the (temporary) index from, or
> which file to write the (temporary) index to.  In fact, apply.c
> already does this in build_fake_ancestor():
>
> static int build_fake_ancestor(struct patch *list, const char *filename)
> {
> ...
> hold_lock_file_for_update(, filename, LOCK_DIE_ON_ERROR);
> res = write_locked_index(, , COMMIT_LOCK);
> ...
> }
>
> As you can see, this function works with a non-standard/default
> index file _without_ having to use non-default index_state.  What
> the set_index_file() hack allows you to do is to use interface that
> does *NOT* pass "filename" like the caller does to this function.
>
> Isn't the mention on NO_THE_INDEX_COMPATIBILITY_MACROS in the added
> comments (there are two) pure red-herring?

Yeah, true.

So do you want me to refactor the code to use
hold_lock_file_for_update() instead of hold_locked_index() and to
avoid the set_index_file() hack?

Or would the set_index_file() hack be ok with a commit message like
the following:

---
Introduce set_index_file() to be able to temporarily change the
index file.

Yeah, this is a short cut and this new function should not be used
by other code.

It adds a small technical debt, that could perhaps be avoided with a
refactoring and by using hold_lock_file_for_update() instead of
hold_locked_index() to pass the filename where the index should be
written.

Now if someone really needs to use this new function, it should be
used like this:

/* Save current index file */
old_index_file = get_index_file();
set_index_file((char *)tmp_index_file);

/* Do stuff that will use tmp_index_file as the index file */
...

/* When finished reset the index file */
set_index_file(old_index_file);

It is intended to be used by builtins commands, in fact only `git am`,
to temporarily change the index file used by libified code.

This is useful when libified code uses the global index, but a builtin
command wants another index file to be used instead.
---

And with comments like this on top of set_index_file() definition and
declaration:

/*
 * Hack to temporarily change the index.
 * Yeah, the libification of 'apply' took a short-circuit by adding
 * this technical debt.
 * Please set the filename using for example hold_lock_file_for_update(),
 * instead of this function.
 * If you really need to use this function, please save the current
 * index file using get_index_file() before changing the index
 * file. And when finished, reset it to the saved value.
 */

?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 00/15] Git filter protocol

2016-08-10 Thread larsxschneider
From: Lars Schneider 

Hi all,

## Notable changes since v4
* drop the shutdown capability
* add error-all response to signal that the filter does not want to filter 
anymore
* extend init sequence to negotiate version number and capabilities
  (plus detect wrongly configured version 1 filters)
* improve status response format according to Peff's suggestions
* fix Git for Window support


## Patches
* 01-09: Make pktline ready to be used for filters.
* 01 - @Junio: I know you told me twice that you don't like this patch but
   please look closely: the patch does not add a new interface to
   pktline. It is a private functions used 03.
* 08 - This patch changes the pktline interface as it renames packet_write() to
   packet_write_fmt() as suggested by Junio.
   (This patch is not required for the the series and could be dropped)
* 10-13: Prepare convert code.
*14: Main contribution.
*15: Proposed fix to make the long running filters work on Windows. See
 the discussion here:
 
https://github.com/git-for-windows/git/issues/770#issuecomment-238361829
 (This patch is not required in Git core and could be dropped)


## Jakub
* http://public-inbox.org/git/607c07fe-5b6f-fd67-13e1-705020c267ee%40gmail.com/
  * make Git start the handshake with the filter to detect one-shot filter
mis configured as process-filter
  * filter driver refusing to filter further

* http://public-inbox.org/git/e8b550ed-1765-764f-49e5-72e5a609d936%40gmail.com/
  * improve commit messages

* http://public-inbox.org/git/0b7d7d96-dfdc-54a4-2c24-2aead6743ae1%40gmail.com/
  * handle sigpipe correctly
  * use cp instead of cat where appropriate
  * explicitly name the test with a binary file (containing \0)


## Junio
* http://public-inbox.org/git/xmqq8twd8uld.fsf%40gitster.mtv.corp.google.com/
  * shorten variable names
  * do not initialize statics explicitly to 0
  * no // comments
  * replace "ret" with "!error"

* 
http://public-inbox.org/git/CAPc5daV3Tke4qHjtpri%3D6QCRaOax_K3uYhpFzRcd271%3DGHj1%2BQ%40mail.gmail.com/
  * rename packet_write() to packet_write_fmt()


## Peff
* 
http://public-inbox.org/git/20160803224619.bwtbvmslhuicx2qi%40sigill.intra.peff.net/
  * change capabilities format

* 
http://public-inbox.org/git/20160808150255.2otm3z5fluimpiqw%40sigill.intra.peff.net/
  * change status response format


## Eric
* http://public-inbox.org/git/20160805185559.GB463%40starla/
  * die if larger-than-4GB files are processed on 32-bit


Thanks a lot for the valuable reviews!

Best,
Lars

You can find a branch with the source on GitHub here:
https://github.com/larsxschneider/git/tree/protocol-filter/v5

I also rebased the series on top of the latest Git for Windows master:
https://github.com/larsxschneider/git/tree/protocol-filter/v5-win


Lars Schneider (15):
  pkt-line: extract set_packet_header()
  pkt-line: call packet_trace() only if a packet is actually send
  pkt-line: add `gentle` parameter to format_packet()
  pkt-line: add packet_write_gently()
  pkt-line: add packet_write_gently_fmt()
  pkt-line: add packet_flush_gently()
  pkt-line: add functions to read/write flush terminated packet streams
  pkt-line: rename packet_write() to packet_write_fmt()
  pack-protocol: fix maximum pkt-line size
  convert: quote filter names in error messages
  convert: modernize tests
  convert: generate large test files only once
  convert: make apply_filter() adhere to standard Git error handling
  convert: add filter..process option
  read-cache: make sure file handles are not inherited by child
processes

 Documentation/gitattributes.txt | 139 +++-
 Documentation/technical/protocol-common.txt |   6 +-
 builtin/archive.c   |   4 +-
 builtin/receive-pack.c  |   4 +-
 builtin/remote-ext.c|   4 +-
 builtin/upload-archive.c|   4 +-
 connect.c   |   2 +-
 convert.c   | 367 ++---
 daemon.c|   2 +-
 http-backend.c  |   2 +-
 pkt-line.c  | 155 -
 pkt-line.h  |  12 +-
 read-cache.c|   2 +-
 shallow.c   |   2 +-
 t/t0021-conversion.sh   | 482 +---
 t/t0021/rot13-filter.pl | 176 ++
 unpack-trees.c  |   1 +
 upload-pack.c   |  30 +-
 18 files changed, 1267 insertions(+), 127 deletions(-)
 create mode 100755 t/t0021/rot13-filter.pl

--
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t7406: fix breakage on OSX

2016-08-10 Thread Stefan Beller
On Wed, Aug 10, 2016 at 11:27 AM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On OSX `wc` prefixes the output of numbers with whitespace, such that
>> the `commit_count` would be "SP ". When using that in
>>
>> git submodule update --init --depth=$commit_count
>>
>> the depth would be empty and the number is interpreted as the pathspec.
>> Fix this by not using `wc` and rather instruct rev-list to count.
>>
>> Another way to fix this is to remove the `=` sign after the `--depth`
>> argument as then we are allowed to have more than just one whitespace
>> between `--depth` and the actual number. Prefer the solution of rev-list
>> counting as that is expected to be slightly faster and more self-sustained
>> within Git.
>
> You meant self-contained, I would guess.

Yes. Mind to fix that locally, or waiting for a resend?

>
> There are a couple of "log --oneline | wc -l" remaining that are
> currently safe but they may be a time-bomb waiting to go off.

$ grep -r "log --oneline | wc -l"
t6050-replace.sh: test $(git log --oneline | wc -l) = 7 &&
t6050-replace.sh: test $(git log --oneline | wc -l) = 3 &&
t7406-submodule-update.sh: test 1 = $(git -C submodule log --oneline | wc -l)
t7406-submodule-update.sh: test 1 = $(git -C submodule log --oneline | wc -l)
t7400-submodule-basic.sh: test 1 = $(git log --oneline | wc -l)

All of the occurrences are not white space sensitive AFAICT,
they are just bad examples, which may inspire others to follow
that pattern.

Thanks,
Stefan
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:24:38PM +0200, Lars Schneider wrote:

> > On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschnei...@gmail.com wrote:
> > 
> >> From: Lars Schneider 
> >> 
> >> The packet_trace() call is not ideal in format_packet() as we would print
> >> a trace when a packet is formatted and (potentially) when the packet is
> >> actually send. This was no problem up until now because format_packet()
> >> was only used by one function. Fix it by moving the trace call into the
> >> function that actually sends the packet.
> > 
> > It looks like there are two functions: packet_write() and
> > packet_buf_write().
> 
> I did not call trace in packet_buf_write() because this function does not
> perform any writes.

Yes, but then who is responsible for the trace? The caller?

And why is it a bad thing to do it some time other than writing? It is
if you format and then _don't_ write the packet, but the current callers
are not doing that.

> > Your patch only touches one of them, and it looks like we would fail to
> > trace many packets (e.g., see receive-pack.c:report(), which uses
> > packet_buf_write() and then write()s out the result).
> 
> I see. But isn't it confusing if packet_buf_write() issues a trace call?
> If I just call this function then nothing happens at all. Shouldn't the
> trace call be made in receive-pack.c:report() ? Or shouldn't receive-pack
> let pkt-line.c perform the write calls?

How would report() do that without re-parsing each of the packets?

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/2] pack-objects: break delta cycles before delta-search phase

2016-08-10 Thread Jeff King
We do not allow cycles in the delta graph of a pack (i.e., A
is a delta of B which is a delta of A) for the obvious
reason that you cannot actually access any of the objects in
such a case.

There's a last-ditch attempt to notice cycles during the
write phase, during which we issue a warning to the user and
write one of the objects out in full. However, this is
"last-ditch" for two reasons:

  1. By this time, it's too late to find another delta for
 the object, so the resulting pack is larger than it
 otherwise could be.

  2. The warning is there because this is something that
 _shouldn't_ ever happen. If it does, then either:

   a. a pack we are reusing deltas from had its own
  cycle

   b. we are reusing deltas from multiple packs, and
  we found a cycle among them (i.e., A is a delta of
  B in one pack, but B is a delta of A in another,
  and we choose to use both deltas).

   c. there is a bug in the delta-search code

 So this code serves as a final check that none of these
 things has happened, warns the user, and prevents us
 from writing a bogus pack.

Right now, (2b) should never happen because of the static
ordering of packs in want_object_in_pack(). If two objects
have a delta relationship, then they must be in the same
pack, and therefore we will find them from that same pack.

However, a future patch would like to change that static
ordering, which will make (2b) a common occurrence. In
preparation, we should be able to handle those kinds of
cycles better. This patch does by introducing a
cycle-breaking step during the get_object_details() phase,
when we are deciding which deltas can be reused. That gives
us the chance to feed the objects into the delta search as
if the cycle did not exist.

We'll leave the detection and warning in the write_object()
phase in place, as it still serves as a check for case (2c).

This does mean we will stop warning for (2a). That case is
caused by bogus input packs, and we ideally would warn the
user about it.  However, since those cycles show up after
picking reusable deltas, they look the same as (2b) to us;
our new code will break the cycles early and the last-ditch
check will never see them.

We could do analysis on any cycles that we find to
distinguish the two cases (i.e., it is a bogus pack if and
only if every delta in the cycle is in the same pack), but
we don't need to. If there is a cycle inside a pack, we'll
run into problems not only reusing the delta, but accessing
the object data at all. So when we try to dig up the actual
size of the object, we'll hit that same cycle and kick in
our usual complain-and-try-another-source code.

Signed-off-by: Jeff King 
---
Actually, skimming the sha1_file code, I am not 100% sure that we detect
cycles in OBJ_REF_DELTA (you cannot have cycles in OBJ_OFS_DELTA since
they always point backwards in the pack). But if that is the case, then
I think we should fix that, not worry about special-casing it here.

 builtin/pack-objects.c  |  83 +
 pack-objects.h  |   9 
 t/t5314-pack-cycle-detection.sh | 113 
 3 files changed, 205 insertions(+)
 create mode 100755 t/t5314-pack-cycle-detection.sh

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4a63398..3e30eae 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1495,6 +1495,82 @@ static int pack_offset_sort(const void *_a, const void 
*_b)
(a->in_pack_offset > b->in_pack_offset);
 }
 
+/*
+ * Drop an on-disk delta we were planning to reuse. Naively, this would
+ * just involve blanking out the "delta" field, but we have to deal
+ * with two extra pieces of book-keeping:
+ *
+ *   1. Removing ourselves from the delta_sibling linked list.
+ *
+ *   2. Updating our size; check_object() will have filled in the size of our
+ *  delta, but a non-delta object needs it true size.
+ */
+static void drop_reused_delta(struct object_entry *entry)
+{
+   struct object_entry **p = >delta->delta_child;
+   struct pack_window *w_curs = NULL;
+
+   while (*p) {
+   if (*p == entry)
+   *p = (*p)->delta_sibling;
+   else
+   p = &(*p)->delta_sibling;
+   }
+   entry->delta = NULL;
+
+   entry->size = get_size_from_delta(entry->in_pack, _curs,
+ entry->in_pack_offset + entry->in_pack_header_size);
+   unuse_pack(_curs);
+
+   /*
+* If we failed to get the size from this pack for whatever reason,
+* fall back to sha1_object_info, which may find another copy. And if
+* that fails, the error will be recorded in entry->type and dealt
+* with in prepare_pack().
+*/
+   if (entry->size == 0)
+   entry->type = sha1_object_info(entry->idx.sha1, >size);
+}
+
+/*
+ * Follow the chain of 

Re: t0027 racy?

2016-08-10 Thread Johannes Schindelin
Hi Torsten,

On Tue, 9 Aug 2016, Torsten Bögershausen wrote:

> >   git -c core.autocrlf=$crlf add $fname >"${pfx}_$f.err" 2>&1
> > 
> > would make more sense. We _know_ that we have to do convert_to_git() in
> > that step because the content is changed. And then you can ignore the
> > warnings from "git commit" (which are racy), or you can simply commit as
> > a whole later, as some other loops do.
> > 
> > But like Dscho, I do not actually understand what this test is checking.
> > The function is called commit_chk_wrnNNO(), so perhaps you really are
> > interested in what "commit" has to say. But IMHO that is not an
> > interesting test. We know that if it has to read the content from disk,
> > it will call convert_to_git(), which is the exact same code path used by
> > "git add". So I do not understand what it is accomplishing to make a
> > commit at all here.
> 
> It seems as if the test has been written without understanding the raciness.
> It should commit files with different line endings on top of
> a file with mixed line endings.
> The warning should be checked (and here "git add" can be used,
> or the file can be commited directly).
> I'm not sure why the test ended up in doing both.
> 
> However, doing it the right way triggers a bug in convert.c,
> (some warnings are missing, so I need some days to come up
> with a proper patch)

FWIW I would strongly prefer to use the warning of `git add` and not even
bother with `git commit`. What we are interested in is the warning
message, generated by convert_to_git(). Not using the first one and
triggering a second one merely adds unnecessary churn that increases the
CO2 budget of running the test.

On that matter, I wonder whether there would be a chance to revamp t0027
in a major way, with the following goals:

- to make it very obvious to the casual reader what is being tested

- to combine Git invocations when possible, e.g. running one big `git add`
  on a couple of files and then verify the relevant parts of the output

- dramatically decreasing the time required to run the test, without
  sacrificing correctness (I would wager a bet that not only a few of
  those 1388 test cases essentially exercise identical code paths)

Ciao,
Dscho

Re: [PATCH] t7406: fix breakage on OSX

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 19:56, Stefan Beller  wrote:
> 
> On OSX `wc` prefixes the output of numbers with whitespace, such that
> the `commit_count` would be "SP ". When using that in
> 
>git submodule update --init --depth=$commit_count
> 
> the depth would be empty and the number is interpreted as the pathspec.
> Fix this by not using `wc` and rather instruct rev-list to count.
> 
> Another way to fix this is to remove the `=` sign after the `--depth`
> argument as then we are allowed to have more than just one whitespace
> between `--depth` and the actual number. Prefer the solution of rev-list
> counting as that is expected to be slightly faster and more self-sustained
> within Git.
> 
> Reported-by: Lars Schneider 
> Helped-by: Junio C Hamano ,
> Signed-off-by: Stefan Beller 
> ---
> 
>  origin/sb/submodule-update-dot-branch
> 
> t/t7406-submodule-update.sh | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index d7983cf..64f322c 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -877,7 +877,7 @@ test_expect_success 'submodule update clone shallow 
> submodule' '
>   test_when_finished "rm -rf super3" &&
>   first=$(git -C cloned submodule status submodule |cut -c2-41) &&
>   second=$(git -C submodule rev-parse HEAD) &&
> - commit_count=$(git -C submodule rev-list $first^..$second | wc -l) &&
> + commit_count=$(git -C submodule rev-list --count $first^..$second) &&
>   git clone cloned super3 &&
>   pwd=$(pwd) &&
>   (
> -- 
> 2.9.2.665.gdb8bb2f
> 

Ack.

- Lars

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v5 12/15] convert: generate large test files only once

2016-08-10 Thread larsxschneider
From: Lars Schneider 

Generate more interesting large test files with pseudo random characters
in between and reuse these test files in multiple tests. Run tests formerly
marked as EXPENSIVE every time but with a smaller data set.

Signed-off-by: Lars Schneider 
---
 t/t0021-conversion.sh | 48 ++--
 1 file changed, 38 insertions(+), 10 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 7b45136..34c8eb9 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
 
 . ./test-lib.sh
 
+if test_have_prereq EXPENSIVE
+then
+   T0021_LARGE_FILE_SIZE=2048
+   T0021_LARGISH_FILE_SIZE=100
+else
+   T0021_LARGE_FILE_SIZE=30
+   T0021_LARGISH_FILE_SIZE=2
+fi
+
 cat test.i &&
git add test test.t test.i &&
rm -f test test.t test.i &&
-   git checkout -- test test.t test.i
+   git checkout -- test test.t test.i &&
+
+   mkdir generated-test-data &&
+   for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
+   do
+   RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
+   ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
+   # Generate 1MB of empty data and 100 bytes of random characters
+   # printf "$(test-genrandom start $i)"
+   printf "%1048576d" 1 >>generated-test-data/large.file &&
+   printf "$RANDOM_STRING" >>generated-test-data/large.file &&
+   printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
+   printf "$ROT_RANDOM_STRING" 
>>generated-test-data/large.file.rot13 &&
+
+   if test $i = $T0021_LARGISH_FILE_SIZE
+   then
+   cat generated-test-data/large.file 
>generated-test-data/largish.file &&
+   cat generated-test-data/large.file.rot13 
>generated-test-data/largish.file.rot13
+   fi
+   done
 '
 
 script='s/^\$Id: \([0-9a-f]*\) \$/\1/p'
@@ -199,9 +227,9 @@ test_expect_success 'required filter clean failure' '
 test_expect_success 'filtering large input to small output should use little 
memory' '
test_config filter.devnull.clean "cat >/dev/null" &&
test_config filter.devnull.required true &&
-   for i in $(test_seq 1 30); do printf "%1048576d" 1; done >30MB &&
-   echo "30MB filter=devnull" >.gitattributes &&
-   GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add 30MB
+   cp generated-test-data/large.file large.file &&
+   echo "large.file filter=devnull" >.gitattributes &&
+   GIT_MMAP_LIMIT=1m GIT_ALLOC_LIMIT=1m git add large.file
 '
 
 test_expect_success 'filter that does not read is fine' '
@@ -214,15 +242,15 @@ test_expect_success 'filter that does not read is fine' '
test_cmp expect actual
 '
 
-test_expect_success EXPENSIVE 'filter large file' '
+test_expect_success 'filter large file' '
test_config filter.largefile.smudge cat &&
test_config filter.largefile.clean cat &&
-   for i in $(test_seq 1 2048); do printf "%1048576d" 1; done >2GB &&
-   echo "2GB filter=largefile" >.gitattributes &&
-   git add 2GB 2>err &&
+   echo "large.file filter=largefile" >.gitattributes &&
+   cp generated-test-data/large.file large.file &&
+   git add large.file 2>err &&
test_must_be_empty err &&
-   rm -f 2GB &&
-   git checkout -- 2GB 2>err &&
+   rm -f large.file &&
+   git checkout -- large.file 2>err &&
test_must_be_empty err
 '
 
-- 
2.9.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschnei...@gmail.com wrote:

> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
> +{
> + static struct strbuf buf = STRBUF_INIT;
> + va_list args;
> +
> + strbuf_reset();
> + va_start(args, fmt);
> + format_packet(1, , fmt, args);
> + va_end(args);
> + packet_trace(buf.buf + 4, buf.len - 4, 1);
> + return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
> +}

Could the end of this function just be:

  return packet_write_gently(fd, buf.buf, buf.len);

? I guess we'd prefer to avoid that, because it incurs an extra
memmove() of the data.

Similarly, I'd think this could share code with the non-gentle form
(which should be able to just call this and die() if returns an error).
Though sometimes the va_list transformation makes that awkward.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:30, Jeff King  wrote:
> 
> On Wed, Aug 10, 2016 at 03:24:38PM +0200, Lars Schneider wrote:
> 
>>> On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschnei...@gmail.com wrote:
>>> 
 From: Lars Schneider 
 
 The packet_trace() call is not ideal in format_packet() as we would print
 a trace when a packet is formatted and (potentially) when the packet is
 actually send. This was no problem up until now because format_packet()
 was only used by one function. Fix it by moving the trace call into the
 function that actually sends the packet.
>>> 
>>> It looks like there are two functions: packet_write() and
>>> packet_buf_write().
>> 
>> I did not call trace in packet_buf_write() because this function does not
>> perform any writes.
> 
> Yes, but then who is responsible for the trace? The caller?

>From my point of view the one that issues the write call.


> And why is it a bad thing to do it some time other than writing? It is
> if you format and then _don't_ write the packet, but the current callers
> are not doing that.

True, they don't do that. However, I don't think that is intuitive behavior
for future callers. If I call "format" and then trace tells me that a packet
was sent although it was not (yet).

> 
>>> Your patch only touches one of them, and it looks like we would fail to
>>> trace many packets (e.g., see receive-pack.c:report(), which uses
>>> packet_buf_write() and then write()s out the result).
>> 
>> I see. But isn't it confusing if packet_buf_write() issues a trace call?
>> If I just call this function then nothing happens at all. Shouldn't the
>> trace call be made in receive-pack.c:report() ? Or shouldn't receive-pack
>> let pkt-line.c perform the write calls?
> 
> How would report() do that without re-parsing each of the packets?

We could add a function to pkt-line that takes a list of strings and
generates a list of packets with a terminating flush packet out of it.
Then it sends the packets.

As a quick compromise we could also introduce a function 
"packet_buf_write_with_trace()"
that wraps the "packet_buf_write()" and calls trace. However, I can imagine that
would be perceived as ugly.

I guess my point is that I stumbled over the un-intutiive format_packet() 
behavior
and I wanted to improve the situation in a way that others don't run into this
trap. If you think that is no issue then it would be OK for me if we leave the
current behavior as is.

Thanks,
Lars




--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 2/2] pack-objects: use mru list when iterating over packs

2016-08-10 Thread Jeff King
In the original implementation of want_object_in_pack(), we
always looked for the object in every pack, so the order did
not matter for performance.

As of the last few patches, however, we can now often break
out of the loop early after finding the first instance, and
avoid looking in the other packs at all. In this case, pack
order can make a big difference, because we'd like to find
the objects by looking at as few packs as possible.

This patch switches us to the same packed_git_mru list that
is now used by normal object lookups.

Here are timings for p5303 on linux.git:

Test  HEAD^HEAD

5303.3: rev-list (1)  31.31(31.07+0.23)31.28(31.00+0.27) -0.1%
5303.4: repack (1)40.35(38.84+2.60)40.53(39.31+2.32) +0.4%
5303.6: rev-list (50) 31.37(31.15+0.21)31.41(31.16+0.24) +0.1%
5303.7: repack (50)   58.25(68.54+2.03)47.28(57.66+1.89) -18.8%
5303.9: rev-list (1000)   31.91(31.57+0.33)31.93(31.64+0.28) +0.1%
5303.10: repack (1000)304.80(376.00+3.92)  87.21(159.54+2.84) -71.4%

The rev-list numbers are unchanged, which makes sense (they
are not exercising this code at all). The 50- and 1000-pack
repack cases show considerable improvement.

The single-pack repack case doesn't, of course; there's
nothing to improve. In fact, it gives us a baseline for how
fast we could possibly go. You can see that though rev-list
can approach the single-pack case even with 1000 packs,
repack doesn't. The reason is simple: the loop we are
optimizing is only part of what the repack is doing. After
the "counting" phase, we do delta compression, which is much
more expensive when there are multiple packs, because we
have fewer deltas we can reuse (you can also see that these
numbers come from a multicore machine; the CPU times are
much higher than the wall-clock times due to the delta
phase).

So the good news is that in cases with many packs, we used
to be dominated by the "counting" phase, and now we are
dominated by the delta compression (which is faster, and
which we have already parallelized).

Here are similar numbers for git.git:

Test  HEAD^   HEAD
-
5303.3: rev-list (1)  1.55(1.51+0.02) 1.54(1.53+0.00) -0.6%
5303.4: repack (1)1.82(1.80+0.08) 1.82(1.78+0.09) +0.0%
5303.6: rev-list (50) 1.58(1.57+0.00) 1.58(1.56+0.01) +0.0%
5303.7: repack (50)   2.50(3.12+0.07) 2.31(2.95+0.06) -7.6%
5303.9: rev-list (1000)   2.22(2.20+0.02) 2.23(2.19+0.03) +0.5%
5303.10: repack (1000)10.47(16.78+0.22)   7.50(13.76+0.22) -28.4%

Not as impressive in terms of percentage, but still
measurable wins.  If you look at the wall-clock time
improvements in the 1000-pack case, you can see that linux
improved by roughly 10x as many seconds as git. That's
because it has roughly 10x as many objects, and we'd expect
this improvement to scale linearly with the number of
objects (since the number of packs is kept constant). It's
just that the "counting" phase is a smaller percentage of
the total time spent for a git.git repack, and hence the
percentage win is smaller.

The implementation itself is a straightforward use of the
MRU code. We only bother marking a pack as used when we know
that we are able to break early out of the loop, for two
reasons:

  1. If we can't break out early, it does no good; we have
 to visit each pack anyway, so we might as well avoid
 even the minor overhead of managing the cache order.

  2. The mru_mark() function reorders the list, which would
 screw up our traversal. So it is only safe to mark when
 we are about to break out of the loop. We could record
 the found pack and mark it after the loop finishes, of
 course, but that's more complicated and it doesn't buy
 us anything due to (1).

Note that this reordering does have a potential impact on
the final pack, as we store only a single "found" pack for
each object, even if it is present in multiple packs. In
principle, any copy is acceptable, as they all refer to the
same content. But in practice, they may differ in whether
they are stored as deltas, against which base, etc. This may
have an impact on delta reuse, and even the delta search
(since we skip pairs that were already in the same pack).

It's not clear whether this change of order would hurt or
even help average cases, though. The most likely reason to
have duplicate objects is from the completion of thin packs
(e.g., you have some objects in a "base" pack, then receive
several pushes; the packs you receive may be thin on the
wire, with deltas that refer to bases outside the pack, but
we complete them with duplicate base objects when indexing
them).

In such a case the current code would always find the thin
duplicates (because we currently walk the packs in reverse
chronological order). Whereas with 

Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread Jeff King
On Wed, Aug 10, 2016 at 03:29:26PM +0200, Lars Schneider wrote:

> 
> > On 10 Aug 2016, at 15:15, Jeff King  wrote:
> > 
> > On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschnei...@gmail.com wrote:
> > 
> >> From: Lars Schneider 
> >> 
> >> format_packet() dies if the caller wants to format a packet larger than
> >> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
> > 
> > I am not sure I agree here. Certainly I see the usefulness of gently
> > handling a failure to write(). But if you are passing in too-large
> > buffers, isn't that a bug in the program?
> > 
> > How would you recover, except by splitting up the content? That might
> > not be possible depending on how you are using the pkt-lines. And even
> > if it is, wouldn't it be simpler to split it up before sending it to
> > format_packet()?
> 
> Good argument. I agree - this patch should be dropped.

Actually, after reading further, one thought did occur to me. Let's say
you are writing to a smudge filter, and one of the header packets you
send has the filename in it. So you might do something like:

  if (packet_write_fmt_gently(fd, "filename=%s", filename) < 0) {
if (filter_required)
die(...);
else
return -1; /* we tried our best; skip smudge */
  }

The "recovery" there is not to try sending again, but rather to give up.
And that is presumably a sane outcome for somebody who tries to checkout
a filename larger than 64K.

It does still feel a little weird that you cannot tell the difference
between a write() error and bad input. Because you really might want to
do something different between the two. Like:

  #define MAX_FILENAME (PKTLINE_DATA_MAXLEN - strlen("filename"))

  if (filename > MAX_FILENAME) {
warning("woah, that name is ridiculous; truncating");
ret = packet_write_fmt_gently(fd, "%.*s", MAX_FILENAME, filename);
  } else
ret = packet_write_fmt_gently(fd, "%s", filename);

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >