Re: [PATCH 1/4] t0021: use write_script to create rot13 shell script

2016-11-06 Thread Lars Schneider

> On 02 Nov 2016, at 19:17, Jeff King  wrote:
> 
> This avoids us fooling around with $SHELL_PATH and the
> executable bit ourselves.
> 
> Signed-off-by: Jeff King 
> ---
> t/t0021-conversion.sh | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index a20b9f58e..dfde22549 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -6,13 +6,11 @@ test_description='blob conversion via gitattributes'
> 
> TEST_ROOT="$(pwd)"
> 
> -cat <"$TEST_ROOT/rot13.sh"
> -#!$SHELL_PATH
> +write_script <<\EOF "$TEST_ROOT/rot13.sh"
> tr \
>   'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
>   'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
> EOF
> -chmod +x "$TEST_ROOT/rot13.sh"
> 
> generate_random_characters () {
>   LEN=$1
> -- 
> 2.11.0.rc0.258.gf434c15
> 

This looks good to me (and it works on my machine).
However, I took a look at the "write_script" function and found this,
added by Junio in 840c519d:

echo "#!${2-"$SHELL_PATH"}" &&

There is some kind of variable expansion happening with the "2-" but
I can't quite figure out what is going on. Plus, I can't find anything 
about this in the sh docs.

Can anyone help me to understand it?
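(For reference: `${2-word}` is POSIX "use default value" parameter expansion - it expands to `$2` if a second argument was passed, and to `word` otherwise. A minimal sketch mirroring write_script's shebang line; the `SHELL_PATH` value below is just an example:)

```shell
# ${2-word} expands to $2 if a second argument was passed (even an
# empty one); otherwise it falls back to the word after the dash.
SHELL_PATH=/bin/sh

shebang() {
	echo "#!${2-"$SHELL_PATH"}"
}

shebang script.sh                 # no 2nd arg -> #!/bin/sh
shebang script.sh /usr/bin/perl   # 2nd arg    -> #!/usr/bin/perl
```

(The related `${2:-word}` form additionally falls back when `$2` is set but empty; write_script uses the dash-only form, so an explicitly empty interpreter argument would be honored.)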

Thanks,
Lars
 


Re: [PATCH v5 08/27] sequencer: completely revamp the "todo" script parsing

2016-11-06 Thread Lars Schneider

> On 21 Oct 2016, at 14:24, Johannes Schindelin  
> wrote:
> 
> When we came up with the "sequencer" idea, we really wanted to have
> kind of a plumbing equivalent of the interactive rebase. Hence the
> choice of words: the "todo" script, a "pick", etc.
> 
> ...
> 
> Signed-off-by: Johannes Schindelin 
> ---
> sequencer.c | 284 ++--
> 1 file changed, 163 insertions(+), 121 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 499f5ee..145de78 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -470,7 +470,26 @@ static int allow_empty(struct replay_opts *opts, struct 
> commit *commit)
>   return 1;
> }
> 
> -static int do_pick_commit(struct commit *commit, struct replay_opts *opts)
> +enum todo_command {
> + TODO_PICK = 0,
> + TODO_REVERT
> +};
> +
> +static const char *todo_command_strings[] = {
> + "pick",
> + "revert"
> +};
> +
> +static const char *command_to_string(const enum todo_command command)
> +{
> + if (command < ARRAY_SIZE(todo_command_strings))

With DEVELOPER=1 I get this error on macOS when I compile current git/next 
(b27dc33) using clang:

sequencer.c:632:14: error: comparison of constant 2 with expression of type 
'const enum todo_command' is always true 
[-Werror,-Wtautological-constant-out-of-range-compare]
if (command < ARRAY_SIZE(todo_command_strings))

Torsten already discovered this problem in v3, and Peff suggested
a working solution [1]. Is there any reason not to use Peff's suggestion?


Cheers,
Lars

[1] http://public-inbox.org/git/d9f4f658-94fb-cb9e-7da8-3a2fac120...@web.de/

Re: What's cooking in git.git (Oct 2016, #09; Mon, 31)

2016-11-03 Thread Lars Schneider

> On 2 Nov 2016, at 17:43, Johannes Sixt  wrote:
> 
> Am 02.11.2016 um 18:04 schrieb Torsten Bögershausen:
>>> * ls/filter-process (2016-10-17) 14 commits
>>>  (merged to 'next' on 2016-10-19 at ffd0de042c)
>> 
>> Some (late, as I recently got a new battery for the Mac OS 10.6 test system) 
>> comments:
>> t0021 failes here:
>> 
>> 
>> Can't locate object method "flush" via package "IO::Handle" at 
>> /Users/tb/projects/git/git.next/t/t0021/rot13-filter.pl line 90.
>> fatal: The remote end hung up unexpectedly
>> 
>> 
>> perl itself is 5.10 and we use the one shipped with Mac OS.
>> Why that ?
>> t0021 uses the hard-coded path:
>> t0021/rot13-filter.pl (around line 345) and the nice macro
>> PERL_PATH from the Makefile is fully ignored.
>> 
>> Commenting out the different "flush" makes the test hang, and I haven't 
>> digged further.
>> 
> 
> https://public-inbox.org/git/e8deda5f-11a6-1463-4fc5-25454084c...@kdbg.org/

Woooh. I am sorry, Hannes - I completely missed that email! Looks like Peff 
already addressed the issue. His patches look very good, but I want to try them 
on my machine tomorrow.

Thanks,
Lars


Re: [PATCH v11 13/14] convert: add filter..process option

2016-11-02 Thread Lars Schneider

> On 2 Nov 2016, at 18:03, Johannes Sixt  wrote:
> 
>> Am 17.10.2016 um 01:20 schrieb larsxschnei...@gmail.com:
>> +# Compare two files and ensure that `clean` and `smudge` respectively are
>> +# called at least once if specified in the `expect` file. The actual
>> +# invocation count is not relevant because their number can vary.
>> +# c.f. 
>> http://public-inbox.org/git/xmqqshv18i8i@gitster.mtv.corp.google.com/
>> +test_cmp_count () {
>> +expect=$1
>> +actual=$2
>> +for FILE in "$expect" "$actual"
>> +do
>> +sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
>> +sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
>> +sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
> 
> This is not sufficiently portable. Some versions of uniq write the
> count left-adjusted, not right-adjusted. How about this on top:
> 
> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
> index a20b9f58e3..f60858c517 100755
> --- a/t/t0021-conversion.sh
> +++ b/t/t0021-conversion.sh
> @@ -40,10 +40,9 @@ test_cmp_count () {
>actual=$2
>for FILE in "$expect" "$actual"
>do
> -sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
> -sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
> -sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
> -mv "$FILE.tmp" "$FILE"
> +sort "$FILE" | uniq -c |
> +sed -e "s/^ *[0-9][0-9]* *IN: /x IN: /" >"$FILE.tmp" &&

This looks good (thanks for cleaning up the redundant clean/smudge stuff - that 
was a refactoring artifact!). One minor nit: doesn't sed understand '[0-9]+'?

> +mv "$FILE.tmp" "$FILE" || return

Why '|| return' here?

>done &&
>test_cmp "$expect" "$actual"
> }
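(On the '[0-9]+' nit: POSIX basic regular expressions have no `+` operator - GNU sed accepts `\+`, and `-E` enables EREs, but neither is portable to older seds, so `[0-9][0-9]*` is the safe spelling. A quick check that Hannes' replacement expression normalizes both left- and right-adjusted `uniq -c` output:)

```shell
# Normalize "uniq -c"-style counts to a fixed "x", whether the count
# is right-adjusted (leading spaces) or left-adjusted.
normalize() {
	sed -e "s/^ *[0-9][0-9]* *IN: /x IN: /"
}

echo "   3 IN: clean" | normalize   # right-adjusted -> x IN: clean
echo "3 IN: smudge" | normalize     # left-adjusted  -> x IN: smudge
```

(And the `|| return` is presumably there so that a failed `mv` aborts the &&-chained test instead of being swallowed by the loop body.)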

Thank you,
Lars

Re: What's cooking in git.git (Oct 2016, #09; Mon, 31)

2016-11-02 Thread Lars Schneider

On 2 Nov 2016, at 17:04, Torsten Bögershausen  wrote:

>> * ls/filter-process (2016-10-17) 14 commits
>>  (merged to 'next' on 2016-10-19 at ffd0de042c)
> 
> Some (late, as I recently got a new battery for the Mac OS 10.6 test system) 
> comments:
> t0021 failes here:
> 
> 
> Can't locate object method "flush" via package "IO::Handle" at 
> /Users/tb/projects/git/git.next/t/t0021/rot13-filter.pl line 90.
> fatal: The remote end hung up unexpectedly
> 
> 
> perl itself is 5.10 and we use the one shipped with Mac OS.
> Why that ?

Thanks for reporting! It surprises me that flush is not available. I don't have 
a 10.6 system to test on. Were you able to reproduce the problem on a newer 
system with an older Perl, too?

@Martin: do you have a clue?

I can't debug the issue right now (only on mobile) but I will look into it on 
Friday!

> t0021 uses the hard-coded path:
> t0021/rot13-filter.pl (around line 345) and the nice macro
> PERL_PATH from the Makefile is fully ignored.

Can you check the line number? rot13-filter.pl has only 192 lines. I'll look
into the PERL_PATH.


> Commenting out the different "flush" makes the test hang, and I haven't 
> digged further.

That would be expected because the pl file writes output to a file which is 
checked by the calling sh script.

Thanks,
Lars 

Re: [PATCH v3 0/3] quick reroll of Lars's git_open() w/ O_CLOEXEC

2016-10-25 Thread Lars Schneider

> On 25 Oct 2016, at 20:16, Junio C Hamano  wrote:
> 
> Here is to make sure everybody is on the same page regarding the
> series.  The patches are adjusted for comments from Eric and Dscho.

Thank you, Junio! Your v3 looks good to me and I compiled and tested
it on Windows.

There is one catch though:
I ran the tests multiple times on Windows and I noticed that the
following test seems flaky:
t0021 `required process filter should be used only for "clean" operation only'

This flakiness was not introduced by your v3. I was able to reproduce
the same behavior with v2. I wonder why I did not notice that before.
The only difference from before is that my Windows machine now runs some
heavy CPU/network task in the background (I can't/don't want to
stop that ATM).

Here is the relevant part of the log:

++ rm -f rot13-filter.log
++ git checkout --quiet --no-progress .
+ test_eval_ret_=255
+ want_trace
+ test t = t
+ test t = t
+ set +x
error: last command exited with $?=255

Looks like Git exits with 255. Any idea why that might happen?

@Dscho/Johannes: Can you try this on your Windows machines?

Thanks,
Lars




Re: [PATCH v2 2/2] read-cache: make sure file handles are not inherited by child processes

2016-10-24 Thread Lars Schneider

> On 24 Oct 2016, at 21:22, Johannes Sixt <j...@kdbg.org> wrote:
> 
> Am 24.10.2016 um 20:03 schrieb larsxschnei...@gmail.com:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> This fixes "convert: add filter..process option" (edcc8581) on
>> Windows.
> 
> Today's next falls flat on its face on Windows in t0021.15 "required process 
> filter should filter data"; might it be the failure meant here? (I haven't 
> dug deeper, yet.)

Yes, this is the failure meant here :-)

Cheers,
Lars




Re: [PATCH 0/3] fix travis TAP/--verbose conflict

2016-10-24 Thread Lars Schneider

> On 21 Oct 2016, at 12:41, Jeff King  wrote:
> 
> On Fri, Oct 21, 2016 at 04:43:48AM -0400, Jeff King wrote:
> 
>> The obvious fix would be to send "--verbose" output to stderr, but I
>> suspect that would end up annoying for people who do:
>> 
>>  ./t5547-push-quarantine.sh -v | less
>> 
>> to read long output. Probably we need some option like "--log" which
>> logs in the same way that "--tee" does, but _without_ sending the data
>> to stdout. Naively, that just means replacing the "tee" invocation with
>> "cat", but I suspect it will be a lot more complicated than that,
>> because we still need to let the TAP output go to stdout.
> 
> Yeah, it was definitely a lot more complicated. This patch series fixes
> it.

Thanks a lot for this detailed and quick fix :-)

Cheers,
Lars



Re: What's cooking in git.git (Oct 2016, #05; Thu, 20)

2016-10-21 Thread Lars Schneider

> On 21 Oct 2016, at 06:08, Johannes Schindelin  
> wrote:
> 
> Hi Junio & Lars,
> 
> On Thu, 20 Oct 2016, Junio C Hamano wrote:
> 
>> * ls/filter-process (2016-10-17) 14 commits
>>  (merged to 'next' on 2016-10-19 at ffd0de042c)
>> + contrib/long-running-filter: add long running filter example
>> + convert: add filter..process option
>> + convert: prepare filter..process option
>> + convert: make apply_filter() adhere to standard Git error handling
>> + pkt-line: add functions to read/write flush terminated packet streams
>> + pkt-line: add packet_write_gently()
>> + pkt-line: add packet_flush_gently()
>> + pkt-line: add packet_write_fmt_gently()
>> + pkt-line: extract set_packet_header()
>> + pkt-line: rename packet_write() to packet_write_fmt()
>> + run-command: add clean_on_exit_handler
>> + run-command: move check_pipe() from write_or_die to run_command
>> + convert: modernize tests
>> + convert: quote filter names in error messages
>> 
>> The smudge/clean filter API expect an external process is spawned
>> to filter the contents for each path that has a filter defined.  A
>> new type of "process" filter API has been added to allow the first
>> request to run the filter for a path to spawn a single process, and
>> all filtering need is served by this single process for multiple
>> paths, reducing the process creation overhead.
>> 
>> Will merge to 'master'.
> 
> This breaks in Git for Windows' SDK (I only now realized that t0060 was
> not the only thing breaking in `next` for a while now):
> ...
> -- snap --
> 
> Unsurprisingly, bisect identifies "convert: add filter..process
> option" as the first bad commit, although it only fails on the test case
> 15, but not on 17.
> 
> I am unfortunately still busy with trying to figure out what exactly makes
> t6030 hang on `pu` (seems it thinks stdin is a tty and just waits for an
> answer), and then trying to reduce that insane amount of time wasted on
> running, and waiting for, the test suite, and for unrelated reasons I'll
> have to go offline for the rest of the day, so I will most likely be
> unable to assist further with this.

Hi Johannes,

thanks for reporting this. Looks like I misunderstood your comment when
you wrote that the error was already fixed in GfW:
https://github.com/git-for-windows/git/issues/770#issuecomment-251426487

I think this patch series (referenced by the link above, too) should fix 
the problem:
http://public-inbox.org/git/2016090521.72956-1-larsxschnei...@gmail.com/

Don't waste any time on this. I will look into it ASAP (traveling right now).

Cheers,
Lars

Prove "Tests out of sequence" Error

2016-10-20 Thread Lars Schneider
Hi,

on TravisCI I see these weird "Tests out of sequence" errors with prove,
and they do not seem to go away. I assume the reason they persist is
that the ".prove" file is carried over from one build to another (but I can't 
look into this file on TravisCI).

Does anyone have an idea where these errors might come from?


t5547-push-quarantine.sh (Wstat: 0 Tests: 5 Failed: 0)
  Parse errors: Tests out of sequence.  Found (2) but expected (3)
Tests out of sequence.  Found (3) but expected (4)
Tests out of sequence.  Found (4) but expected (5)
Bad plan.  You planned 4 tests but ran 5.
Files=760, Tests=15109, 679 wallclock secs (21.91 usr  1.78 sys + 320.79 cusr 
529.13 csys = 873.61 CPU)
Result: FAIL
make[1]: *** [prove] Error 1
make: *** [test] Error 2


Example:
https://s3.amazonaws.com/archive.travis-ci.org/jobs/169385219/log.txt
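(If the carried-over state file is indeed the cause, clearing it before the run should reset prove's sequence bookkeeping. A hypothetical cleanup step - the `t/.prove` path assumes prove is run from the `t/` directory, as git's test Makefile does:)

```shell
# Simulate a .prove state file left over from a previous build, then
# remove it the way a CI script could before invoking `make prove`.
mkdir -p t
touch t/.prove
rm -f t/.prove
test ! -e t/.prove && echo "prove state cleared"
```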

Thanks,
Lars

Re: What's cooking in git.git (Oct 2016, #04; Mon, 17)

2016-10-18 Thread Lars Schneider

> On 17 Oct 2016, at 15:28, Junio C Hamano  wrote:
> ...
> 
> * ls/filter-process (2016-10-17) 14 commits
> - contrib/long-running-filter: add long running filter example
> - convert: add filter..process option
> - convert: prepare filter..process option
> - convert: make apply_filter() adhere to standard Git error handling
> - pkt-line: add functions to read/write flush terminated packet streams
> - pkt-line: add packet_write_gently()
> - pkt-line: add packet_flush_gently()
> - pkt-line: add packet_write_fmt_gently()
> - pkt-line: extract set_packet_header()
> - pkt-line: rename packet_write() to packet_write_fmt()
> - run-command: add clean_on_exit_handler
> - run-command: move check_pipe() from write_or_die to run_command
> - convert: modernize tests
> - convert: quote filter names in error messages
> 
> The smudge/clean filter API expect an external process is spawned
> to filter the contents for each path that has a filter defined.  A
> new type of "process" filter API has been added to allow the first
> request to run the filter for a path to spawn a single process, and
> all filtering need is served by this single process for multiple
> paths, reducing the process creation overhead.

Hi Junio,

what do you think about v11? Do you feel the series is becoming mature
enough for `next`?

Thanks,
Lars


[RFC] Case insensitive Git attributes

2016-10-16 Thread Lars Schneider
Hi,

Git attributes for path names are generally case sensitive. However, on 
a case insensitive file system (e.g. macOS/Windows) they appear to be
case insensitive (`*.bar` would match `foo.bar` and `foo.BAR`). That 
works great until a Git user joins the party with a case sensitive file 
system. For this Git user, only files that match the exact case of the 
attribute pattern get the attributes (only `foo.bar`).

This inconsistent behavior can confuse Git users. An advanced Git user
could use a glob pattern (e.g. `*.[bB][aA][rR]`) to match files in a
case insensitive way. However, this can get confusing quickly, too.
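(Plain shell globbing shows the same asymmetry - an approximation only, since Git's wildmatch has its own rules, but character classes behave alike:)

```shell
# Match a file name against a glob pattern, case sensitively, the way
# a user on a case sensitive file system experiences .gitattributes.
matches() {
	case "$1" in
	$2) echo yes ;;
	*) echo no ;;
	esac
}

matches foo.bar '*.bar'            # yes
matches foo.BAR '*.bar'            # no  - pattern is case sensitive
matches foo.BAR '*.[bB][aA][rR]'   # yes - the explicit workaround
```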

I wonder if we can do something about this. One idea could be to add an
attribute "case-sensitive" (or "caseSensitive") and set it to false 
(if desired) for all files in .gitattributes for a given repo.

### .gitattributes example ###

* case-sensitive=false
*.bar something

###

I haven't looked into the feasibility of an implementation yet. However,
would that be an acceptable approach?

Thanks,
Lars





Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-16 Thread Lars Schneider

> On 11 Oct 2016, at 03:09, Torsten Bögershausen <tbo...@web.de> wrote:
> 
> On Tue, Oct 11, 2016 at 10:11:22AM +0200, Lars Schneider wrote:
>> 
>>> On 10 Oct 2016, at 21:58, Junio C Hamano <gits...@pobox.com> wrote:
>>> 
>>> larsxschnei...@gmail.com writes:
>>> 
>>> [...]
>>>> 
>> -test_cmp_count_except_clean () {
>> -for FILE in $@
> 
>> +test_cmp_count () {
>> +expect=$1 actual=$2
> 
> That could be 
> expect="$1"
> actual="$2"

Sure!


>> +for FILE in "$expect" "$actual"
>>  do
> 
>> +sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
>> +sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
>> +sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" 
>> >"$FILE.tmp" &&
>> +cat "$FILE.tmp" >"$FILE"
> 
> How about 
>   cp "$FILE.tmp" "$FILE"

OK, I'll use "mv".

Thanks,
Lars

Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler

2016-10-16 Thread Lars Schneider

> On 16 Oct 2016, at 01:03, Johannes Schindelin <johannes.schinde...@gmx.de> 
> wrote:
> 
> Hi Lars,
> 
> On Sat, 15 Oct 2016, Lars Schneider wrote:
> 
>> 
>>> On 11 Oct 2016, at 05:12, Johannes Schindelin <johannes.schinde...@gmx.de> 
>>> wrote:
>>> 
>>> On Windows, pid_t translates to long long int, resulting in this build
>>> error:
>>> 
>> 
>> Thanks for hint! I'll change it!
>> 
>> However, I am building on Win 8.1 with your latest SDK and I cannot
>> reproduce the error. Any idea why that might be the case?
> 
> Are you building with DEVELOPER=1?

Argh! Of course ... I forgot to add this flag to my config.mak on Windows.

Thanks,
Lars


Re: [PATCH] convert: mark a file-local symbol static

2016-10-15 Thread Lars Schneider

> On 15 Oct 2016, at 14:01, Ramsay Jones <ram...@ramsayjones.plus.com> wrote:
> 
> 
> 
> On 15/10/16 16:05, Lars Schneider wrote:
>>> On 11 Oct 2016, at 16:46, Ramsay Jones <ram...@ramsayjones.plus.com> wrote:
> [snip]
>>> -void stop_multi_file_filter(struct child_process *process)
>>> +static void stop_multi_file_filter(struct child_process *process)
>> 
>> Done! Do you have some kind of script to detect these things
>> automatically or do you read the code that carefully?
> 
> Heh, I'm _far_ too lazy to read the code that carefully. :-D
> 
> A combination of 'make sparse' and a perl script (originally
> posted to the list by Junio) find all of these for me.

Interesting. Do you have a link to this script?
Does it generate false positives? 
Maybe I can add `make sparse` to the TravisCI build?

Cheers,
Lars

Re: [PATCH] convert: mark a file-local symbol static

2016-10-15 Thread Lars Schneider

> On 11 Oct 2016, at 16:46, Ramsay Jones  wrote:
> 
> If you need to re-roll your 'ls/filter-process' branch, could you
> please squash this into commit 85290197
> ("convert: add filter..process option", 08-10-2016).
> 
> 
> -void stop_multi_file_filter(struct child_process *process)
> +static void stop_multi_file_filter(struct child_process *process)

Done! Do you have some kind of script to detect these things
automatically or do you read the code that carefully?

Thanks,
Lars


Re: [PATCH v10 04/14] run-command: add clean_on_exit_handler

2016-10-15 Thread Lars Schneider

> On 11 Oct 2016, at 05:12, Johannes Schindelin  
> wrote:
> 
> Hi Lars,
> 
> On Sat, 8 Oct 2016, larsxschnei...@gmail.com wrote:
> 
>> @@ -31,6 +32,15 @@ static void cleanup_children(int sig, int in_signal)
>>  while (children_to_clean) {
>>  struct child_to_clean *p = children_to_clean;
>>  children_to_clean = p->next;
>> +
>> +if (p->process && !in_signal) {
>> +struct child_process *process = p->process;
>> +if (process->clean_on_exit_handler) {
>> +trace_printf("trace: run_command: running exit 
>> handler for pid %d", p->pid);
> 
> On Windows, pid_t translates to long long int, resulting in this build
> error:
> 
> -- snip --
> In file included from cache.h:10:0,
>  from run-command.c:1:
> run-command.c: In function 'cleanup_children':
> run-command.c:39:18: error: format '%d' expects argument of type 'int', but 
> argument 5 has type 'pid_t {aka long long int}' [-Werror=format=]
>  trace_printf("trace: run_command: running exit handler for pid %d", 
> p->pid);
>   ^
> trace.h:81:53: note: in definition of macro 'trace_printf'
>   trace_printf_key_fl(TRACE_CONTEXT, __LINE__, NULL, __VA_ARGS__)
>  ^~~
> cc1.exe: all warnings being treated as errors
> make: *** [Makefile:1987: run-command.o] Error 1
> -- snap --
> 
> Maybe use PRIuMAX as we do elsewhere (see output of `git grep
> printf.*pid`):
> 
>   trace_printf("trace: run_command: running exit handler for pid %"
>PRIuMAX, (uintmax_t)p->pid);

Thanks for the hint! I'll change it!

However, I am building on Win 8.1 with your latest SDK and I cannot
reproduce the error. Any idea why that might be the case?

Thanks,
Lars



Re: [PATCH v10 14/14] contrib/long-running-filter: add long running filter example

2016-10-15 Thread Lars Schneider

> On 08 Oct 2016, at 22:42, Torsten Bögershausen <tbo...@web.de> wrote:
> 
> On 08.10.16 13:25, larsxschnei...@gmail.com wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Add a simple pass-thru filter as example implementation for the Git
>> filter protocol version 2. See Documentation/gitattributes.txt, section
>> "Filter Protocol" for more info.
>> 
> 
> Nothing wrong with code in contrib.
> I may have missed parts of the discussion, was there a good reason to
> drop the test case completely?
> 
>> When adding a new feature, make sure that you have new tests to show
>> the feature triggers the new behavior when it should, and to show the
>> feature does not trigger when it shouldn't.  After any code change, make
>> sure that the entire test suite passes.
> 
> Or is there a plan to add them later ?

The test is part of the "main feature patch" 13/14:
http://public-inbox.org/git/20161008112530.15506-14-larsxschnei...@gmail.com/

Cheers,
Lars

Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-15 Thread Lars Schneider
@Peff: If you have time, it would be great if you could comment on
one question below prefixed with "@Peff". Thanks!


> On 12 Oct 2016, at 03:54, Jakub Narębski <jna...@gmail.com> wrote:
> 
> W dniu 12.10.2016 o 00:26, Lars Schneider pisze: 
>>> On 09 Oct 2016, at 01:06, Jakub Narębski <jna...@gmail.com> wrote:
>>>> 
>> 
>>>> After the filter started
>>>> Git sends a welcome message ("git-filter-client"), a list of
>>>> supported protocol version numbers, and a flush packet. Git expects
>>>> +to read a welcome response message ("git-filter-server") and exactly
>>>> +one protocol version number from the previously sent list. All further
>>>> +communication will be based on the selected version. The remaining
>>>> +protocol description below documents "version=2". Please note that
>>>> +"version=42" in the example below does not exist and is only there
>>>> +to illustrate how the protocol would look like with more than one
>>>> +version.
>>>> +
>>>> +After the version negotiation Git sends a list of all capabilities that
>>>> +it supports and a flush packet. Git expects to read a list of desired
>>>> +capabilities, which must be a subset of the supported capabilities list,
>>>> +and a flush packet as response:
>>>> +
>>>> +packet:  git> git-filter-client
>>>> +packet:  git> version=2
>>>> +packet:  git> version=42
>>>> +packet:  git> 
>>>> +packet:  git< git-filter-server
>>>> +packet:  git< version=2
>>>> +packet:  git> clean=true
>>>> +packet:  git> smudge=true
>>>> +packet:  git> not-yet-invented=true
>>>> +packet:  git> 
>>>> +packet:  git< clean=true
>>>> +packet:  git< smudge=true
>>>> +packet:  git< 
>>> 
>>> WARNING: This example is different from description!!!
>> 
>> Can you try to explain the difference more clearly? I read it multiple
>> times and I think this is sound.
> 
> I'm sorry it was *my mistake*.  I have read the example exchange wrong.
> 
> On the other hand that means that I have other comment, which I though
> was addressed already in v10, namely that not all exchanges ends with
> flush packet (inconsistency, and I think a bit of lack of extendability).

Well, this part of the protocol is not supposed to be extensible because
it is supposed to deal *only* with the version number. It needs to keep 
the same structure to ensure forward and backward compatibility.

However, for consistency's sake I will add a flush packet.


>>> In description above the example you have 4-part handshake, not 3-part;
>>> the filter is described to send list of supported capabilities last
>>> (a subset of what Git command supports).
>> 
>> Part 1: Git sends a welcome message...
>> Part 2: Git expects to read a welcome response message...
>> Part 3: After the version negotiation Git sends a list of all capabilities...
>> Part 4: Git expects to read a list of desired capabilities...
>> 
>> I think example and text match, no?
> 
> Yes, it does; as I have said already, I have misread the example. 
> 
> Anyway, in some cases 4-way handshake, where Git sends list of
> supported capabilities first, is better.  If the protocol has
> to prepare something for each of capabilities, and perhaps check
> those preparation status, it can do it after Git sends what it
> could need, and before it sends what it does support.
> 
> Though it looks a bit strange that client (as Git is client here)
> sends its capabilities first...

Git tells the filter what it can do. Then the filter decides what
features it supports. I would prefer to keep it that way as I don't
see a strong advantage for the other way around.
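(For readers following the exchange: every `packet: git> ...` line above is an ordinary pkt-line. A small sketch of how Git's side of the version advertisement looks on the wire - text packets carry a 4-hex-digit length covering the 4 header bytes, the payload, and the trailing LF; `0000` is the flush packet:)

```shell
# Emit one pkt-line text packet: 4 hex digits of total length
# (4 header bytes + payload + LF), then the payload and a LF.
pkt_line() {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

pkt_flush() {
	printf '0000'
}

# Git's opening packets from the handshake example:
pkt_line "git-filter-client"   # -> 0016git-filter-client
pkt_line "version=2"           # -> 000eversion=2
pkt_line "version=42"          # -> 000fversion=42
pkt_flush                      # -> 0000
```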


>>> By the way, now I look at it, the argument for using the
>>> "=true" format instead of "capability="
>>> (or "supported-command=") is weak.  The argument for
>>> using "=" to make it easier to implement parsing
>>> is sound, but the argument for "=true" is weak.
>>> 
>>> The argument was that with "=true" one can simply
>>> parse metadata into hash / dictionary / hashmap, and choose
>>> response based on that.  Hash / hashmap / associative array
>>> needs different keys, so the reasoning we

Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-11 Thread Lars Schneider

> On 09 Oct 2016, at 01:06, Jakub Narębski <jna...@gmail.com> wrote:
> 
> Part 1 of review, starting with the protocol v2 itself.
> 
> W dniu 08.10.2016 o 13:25, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> +upon checkin. By default these commands process only a single
>> +blob and terminate.  If a long running `process` filter is used
>> +in place of `clean` and/or `smudge` filters, then Git can process
>> +all blobs with a single filter command invocation for the entire
>> +life of a single Git command, for example `git add --all`.  See
>> +section below for the description of the protocol used to
>> +communicate with a `process` filter.
> 
> I don't remember how this part looked like in previous versions
> of this patch series, but "... is used in place of `clean` ..."
> does not tell explicitly about the precedence of those 
> configuration variables.  I think it should be stated explicitly
> that `process` takes precedence over any `clean` and/or `smudge`
> settings for the same `filter.` (regardless of whether
> the long running `process` filter support "clean" and/or "smudge"
> operations or not).

This is stated explicitly later on. I moved it up here:

"If a long running `process` filter is used
in place of `clean` and/or `smudge` filters, then Git can process
all blobs with a single filter command invocation for the entire
life of a single Git command, for example `git add --all`. If a 
long running `process` filter is configured then it always takes 
precedence over a configured single blob filter. "

OK?
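(A hypothetical config fragment illustrating that precedence - the filter name and commands are made up for the example:)

```
[filter "myfilter"]
	clean = clean-one-blob %f
	smudge = smudge-one-blob %f
	# If process is set, it takes precedence: the clean and smudge
	# entries above are ignored for this filter.
	process = long-running-filter
```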


>> +If the filter command (a string value) is defined via
>> +`filter..process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets, except for the
>> +"*CONTENT" packets and the "" flush packet, are considered
>> +text and therefore are terminated by a LF.
> 
> Maybe s/standard input and output/\& of filter process,/ (that is,
> add "... of filter process," to the third sentence in the above
> paragraph).

You mean "This is achieved by using a packet format (pkt-line,
see technical/protocol-common.txt) based protocol over standard
input and standard output of filter process as follows." ?

I think I like the original version better.


>> After the filter started
> Git sends a welcome message ("git-filter-client"), a list of
>> supported protocol version numbers, and a flush packet. Git expects
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
>> +communication will be based on the selected version. The remaining
>> +protocol description below documents "version=2". Please note that
>> +"version=42" in the example below does not exist and is only there
>> +to illustrate how the protocol would look like with more than one
>> +version.
>> +
>> +After the version negotiation Git sends a list of all capabilities that
>> +it supports and a flush packet. Git expects to read a list of desired
>> +capabilities, which must be a subset of the supported capabilities list,
>> +and a flush packet as response:
>> +
>> +packet:  git> git-filter-client
>> +packet:  git> version=2
>> +packet:  git> version=42
>> +packet:  git> 
>> +packet:  git< git-filter-server
>> +packet:  git< version=2
>> +packet:  git> clean=true
>> +packet:  git> smudge=true
>> +packet:  git> not-yet-invented=true
>> +packet:  git> 
>> +packet:  git< clean=true
>> +packet:  git< smudge=true
>> +packet:  git< 
> 
> WARNING: This example is different from description!!!

Can you try to explain the difference more clearly? I read it multiple
times and I think this is sound.


> In example you have Git sending "git-filter-client" and list of supported
> protocol versions, terminated with flush packet,

Correct.


> then filter driver
> process sends "git-filter-server", exactly one version, *AND* list of
> supported capabilities in "=true" format, terminated with
> flush packet.

Correct. That's what I read in the text and in the example.

> 
> In description abov

Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-11 Thread Lars Schneider

> On 09 Oct 2016, at 07:32, Torsten Bögershausen  wrote:
> 
> On 09.10.16 01:06, Jakub Narębski wrote:
>>> +
 +packet:  git< status=abort
 +packet:  git< 
 +
 +
 +After the filter has processed a blob it is expected to wait for
 +the next "key=value" list containing a command. Git will close
 +the command pipe on exit. The filter is expected to detect EOF
 +and exit gracefully on its own.
>> Any "kill filter" solutions should probably be put here.  I guess
>> that filter exiting means EOF on its standard output when read
>> by Git command, isn't it?
>> 
> Isn't it that Git closes the command pipe, then filter sees EOF on it's stdin
> 
> and does a graceful exit.

Correct!

- Lars

Re: [PATCH v10 13/14] convert: add filter..process option

2016-10-11 Thread Lars Schneider

> On 10 Oct 2016, at 21:58, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
> [...]
>> +# Count unique lines except clean invocations in two files and compare
>> +# them. Clean invocations are not counted because their number can vary.
>> +# c.f. 
>> http://public-inbox.org/git/xmqqshv18i8i@gitster.mtv.corp.google.com/
>> +test_cmp_count_except_clean () {
>> +for FILE in $@
>> +do
>> +sort $FILE | uniq -c | sed "s/^[ ]*//" |
>> +sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
>> +cat $FILE.tmp >$FILE
>> +done &&
>> +test_cmp $@
>> +}
> 
> Why do you even _care_ about the number of invocations?  While I
> told you why "clean" could be called multiple times under racy Git
> as an example, that was not meant to be an exhaustive example.  I
> wouldn't be surprised if we needed to run smudge twice, for example,
> in some weirdly racy cases in the future.
> 
> Can we just have the correctness (i.e. "we expect that the working
> tree file gets this as the result of checking it out, and we made
> sure that is the case") test without getting into such an
> implementation detail?

My goal is to check that clean/smudge is invoked at least once. I could
just run `uniq` to achieve that but then all other filter commands could
happen multiple times and the test would not detect that.

I also prefer to check the filter commands to ensure the filter is 
working as expected (e.g. no multiple start ups etc) in addition to 
checking the working tree.

Would the patch below work for you? If yes, then please squash it into
"convert: add filter.<driver>.process option".

Thank you,
Lars



diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 9f892c0..714f706 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -31,38 +31,33 @@ filter_git () {
rm -f git-stderr.log
 }
 
-# Count unique lines in two files and compare them.
-test_cmp_count () {
-   for FILE in $@
-   do
-   sort $FILE | uniq -c | sed "s/^[ ]*//" >$FILE.tmp
-   cat $FILE.tmp >$FILE
-   done &&
-   test_cmp $@
-}
-
-# Count unique lines except clean invocations in two files and compare
-# them. Clean invocations are not counted because their number can vary.
+# Compare two files and ensure that `clean` and `smudge` respectively are
+# called at least once if specified in the `expect` file. The actual
+# invocation count is not relevant because their number can vary.
# c.f. http://public-inbox.org/git/xmqqshv18i8i@gitster.mtv.corp.google.com/
-test_cmp_count_except_clean () {
-   for FILE in $@
+test_cmp_count () {
+   expect=$1 actual=$2
+   for FILE in "$expect" "$actual"
do
-   sort $FILE | uniq -c | sed "s/^[ ]*//" |
-   sed "s/^\([0-9]\) IN: clean/x IN: clean/" >$FILE.tmp
-   cat $FILE.tmp >$FILE
+   sort "$FILE" | uniq -c | sed "s/^[ ]*//" |
+   sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
+   sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >"$FILE.tmp" &&
+   cat "$FILE.tmp" >"$FILE"
done &&
-   test_cmp $@
+   test_cmp "$expect" "$actual"
 }
 
-# Compare two files but exclude clean invocations because they can vary.
+# Compare two files but exclude all `clean` invocations because Git can
+# call `clean` zero or more times.
# c.f. http://public-inbox.org/git/xmqqshv18i8i@gitster.mtv.corp.google.com/
 test_cmp_exclude_clean () {
-   for FILE in $@
+   expect=$1 actual=$2
+   for FILE in "$expect" "$actual"
do
-   grep -v "IN: clean" $FILE >$FILE.tmp
-   cat $FILE.tmp >$FILE
+   grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
+   cat "$FILE.tmp" >"$FILE"
done &&
-   test_cmp $@
+   test_cmp "$expect" "$actual"
 }
 
 # Check that the contents of two files are equal and that their rot13 version
@@ -395,7 +390,7 @@ test_expect_success PERL 'required process filter should filter data' '
IN: clean testsubdir/test3 '\''sq'\'',\$x.r $S3 [OK] -- 
OUT: $S3 . [OK]
STOP
EOF
-   test_cmp_count_except_clean expected.log rot13-filter.log &&
+   test_cmp_count expected.log rot13-filter.log &&
 
rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x.r" &&
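
The normalization in the patch above can be tried standalone. The log file
name and its contents here are made up for demonstration: duplicate lines
collapse into "<count> <line>" via `uniq -c`, and the counts for clean/smudge
invocations are masked with "x" so they no longer affect the comparison.

```shell
printf 'IN: clean a.dat\nIN: clean a.dat\nIN: smudge b.dat\n' >demo.log
sort demo.log | uniq -c | sed "s/^[ ]*//" |
	sed "s/^\([0-9]\) IN: clean/x IN: clean/" |
	sed "s/^\([0-9]\) IN: smudge/x IN: smudge/" >demo.tmp
cat demo.tmp
# x IN: clean a.dat
# x IN: smudge b.dat
```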



Re: [PATCH v8 11/11] convert: add filter.<driver>.process option

2016-10-06 Thread Lars Schneider

> On 04 Oct 2016, at 23:00, Jakub Narębski <jna...@gmail.com> wrote:
> 
> [Some of answers and comments may got invalidated by v9]
> 
> W dniu 30.09.2016 o 21:38, Lars Schneider pisze:
>>> On 27 Sep 2016, at 17:37, Jakub Narębski <jna...@gmail.com> wrote:
>>> 
>>> Part second of the review of 11/11.
> [...]
>>>> +
>>>> +  if (start_command(process)) {
>>>> +  error("cannot fork to run external filter '%s'", cmd);
>>>> +  kill_multi_file_filter(hashmap, entry);
>>>> +  return NULL;
>>>> +  }
>>> 
>>> I guess there is a reason why we init hashmap entry, try to start
>>> external process, then kill entry of unable to start, instead of
>>> trying to start external process, and adding hashmap entry when
>>> we succeed?
>> 
>> Yes. This way I can reuse the kill_multi_file_filter() function.
> 
> I don't quite understand.  If you didn't fill the entry before
> using start_command(process), you would not need kill_multi_file_filter(),
> which in that case IIUC just removes the just created entry from hashmap.
> Couldn't you add entry to hashmap in the 'else' part?  Or would it
> be racy?

You are right. I'll fix that.


>> 
>>>> +  if (pair[0] && pair[0]->len && pair[1]) {
>>>> +  if (!strcmp(pair[0]->buf, "status=")) {
>>>> +  strbuf_reset(status);
>>>> +  strbuf_addbuf(status, pair[1]);
>>>> +  }
>>> 
>>> So it is last status= line wins behavior?
>> 
>> Correct.
> 
> Perhaps this should be described in code comment.

OK


>>>> 
>>>> +  fflush(NULL);
>>> 
>>> Why this fflush(NULL) is needed here?
>> 
>> This flushes all open output streams. The single filter does the same.
> 
> I know what it does, but I don't know why.  But "single filter does it"
> is good enough for me.  Still would want to know why, though ;-)

TBH I am not 100% sure why, either. I think this ensures that we don't
have any outdated/unrelated/previous data in the stream buffers.


>>>> 
>>>> +  if (fd >= 0 && !src) {
>>>> +  if (fstat(fd, &file_stat) == -1)
>>>> +  return 0;
>>>> +  len = xsize_t(file_stat.st_size);
>>>> +  }
>>> 
>>> Errr... is it necessary?  The protocol no longer provides size=
>>> hint, and neither uses such hint if provided.
>> 
>> We require the size in write_packetized_from_buf() later.
> 
> Don't we use write_packetized_from_fd() in the case of fd >= 0?

Of course! Ah too many refactorings :-)
I'll remove that.

Thank you,
Lars



Re: [PATCH v8 11/11] convert: add filter.<driver>.process option

2016-10-06 Thread Lars Schneider

> On 04 Oct 2016, at 22:50, Jakub Narębski <jna...@gmail.com> wrote:
> 
> [Some of answers may get invalidated by v9]
> 
> W dniu 30.09.2016 o 20:56, Lars Schneider pisze:
>>> On 27 Sep 2016, at 00:41, Jakub Narębski <jna...@gmail.com> wrote:
>>> 
>>>> +
>>>> +After the filter has processed a blob it is expected to wait for
>>>> +the next "key=value" list containing a command. Git will close
>>>> +the command pipe on exit. The filter is expected to detect EOF
>>>> +and exit gracefully on its own.
> 
> Is this still true?

Yes


>>> 
>>> Good to have it documented.  
>>> 
>>> Anyway, as it is Git command that spawns the filter driver process,
>>> assuming that the filter process doesn't daemonize itself, wouldn't
>>> the operating system reap it after its parent process, that is the
>>> git command it invoked, dies? So detecting EOF is good, but not
>>> strictly necessary for simple filter that do not need to free
>>> its resources, or can leave freeing resources to the operating
>>> system? But I may be wrong here.
>> 
>> The filter process runs independent of Git.
> 
> Ah.  So without some way to tell long-lived filter process that
> it can shut down, because no further data will be incoming, or
> killing it by Git, it would hang indefinitely?

Yes

- Lars

Re: [PATCH v8 00/11] Git filter protocol

2016-10-06 Thread Lars Schneider

> On 04 Oct 2016, at 21:04, Jakub Narębski <jna...@gmail.com> wrote:
> 
> W dniu 03.10.2016 o 19:13, Lars Schneider pisze: 
>>> On 01 Oct 2016, at 22:48, Jakub Narębski <jna...@gmail.com> wrote:
>>> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: 
>>>> On 29 Sep 2016, at 23:27, Junio C Hamano <gits...@pobox.com> wrote:
>>>>> Lars Schneider <larsxschnei...@gmail.com> writes:
>>>>> 
>>>>> If the filter process refuses to die forever when Git told it to
>>>>> shutdown (by closing the pipe to it, for example), that filter
>>>>> process is simply buggy.  I think we want users to become aware of
>>>>> that, instead of Git leaving it behind, which essentially is to
>>>>> sweep the problem under the rug.
>>> 
>>> Well, it would be good to tell users _why_ Git is hanging, see below.
>> 
>> Agreed. Do you think it is OK to write the message to stderr?
> 
> On the other hand, this is why GIT_TRACE (and GIT_TRACE_PERFORMANCE)
> was invented for.  We do not signal troubles with single-shot filters,
> so I guess doing it for multi-file filters is not needed.

I am on the fence with this one.

@Junio/Peff:
Where would you prefer to see a "Waiting for filter 'XYZ'... " message?
On stderr or via GIT_TRACE?


> 
>>>>> I agree with what Peff said elsewhere in the thread; if a filter
>>>>> process wants to take time to clean things up while letting Git
>>>>> proceed, it can do its own process management, but I think it is
>>>>> sensible for Git to wait the filter process it directly spawned.
>>>> 
>>>> To realize the approach above I prototyped the run-command patch below:
>>>> 
>>>> I added an "exit_timeout" variable to the "child_process" struct.
>>>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>>>> seconds until it kills the child process. If "exit_timeout" is negative
>>>> then Git will wait until the process is done.
>>> 
>>> That might be good approach.  Probably the default would be to wait.
>> 
>> I think I would prefer a 2sec timeout or something as default. This way
>> we can ensure Git would not wait indefinitely for a buggy filter by default.
> 
> Actually this waiting for multi-file filter is only about waiting for
> the shutdown process of the filter.  The filter could still hang during
> processing a file, and git would hang too, if I understand it correctly.

Correct.


>> [...] this function is also used with the async struct... 
> 
> Hmmm... now I wonder if it is a good idea (similar treatment for
> single-file async-invoked filter, and multi-file pkt-line filters).
> 
> For single-file one-shot filter (correct me if I am wrong):
> 
> - git sends contents to filter, signals end with EOF
>   (after process is started)
> - in an async process:
>   - process is started
>   - git reads contents from filter, until EOF
>   - if process did not end, it is killed
> 
> 
> For multi-process pkt-line based filter (simplified):
> 
> - process is started
> - handshake
> - for each file
>   - file is send to filter process over pkt-line,
> end signalled with flush packet
>   - git reads from filter from pkt-line, until flush
> - ...
> 
> 
> See how single-shot filter is sent EOF, though in different part
> of code.  We need to signal multi-file filter that no more files
> will be coming.  Simplest solution is to send EOF (we could send
> "command=shutdown" for example...) to filter, and wait for EOF
> from filter (or for "status=finished" and EOF).

That's what we do. EOF does signal the multi-filter to shutdown.


> For full patch, you would need also to add to Documentation/config.txt

Why config.txt?


Thanks,
Lars

Re: [PATCH v9 04/14] run-command: add wait_on_exit

2016-10-06 Thread Lars Schneider

> On 06 Oct 2016, at 11:32, Johannes Schindelin <johannes.schinde...@gmx.de> 
> wrote:
> 
> Hi Junio & Lars,
> 
> On Wed, 5 Oct 2016, Junio C Hamano wrote:
> 
>> Lars Schneider <larsxschnei...@gmail.com> writes:
>> 
>>> OK. Something like the patch below would work nicely.
>> 
>> Yeah, something along that line; it would eliminate the need to
>> worry about a field named "stdin" ;-)
> 
> Not only a need to worry. Git for Windows' SDK's headers define
> 
>   #define stdin (&__iob_func()[0])
> 
> leading to the compile error
> 
>   In file included from git-compat-util.h:159:0,
> from cache.h:4,
> from run-command.c:1:
>   run-command.c:25:6: error: expected identifier or '(' before '&' token
> int stdin;
> ^
> 
> I meant to investigate this build failure of `pu` earlier but only got
> around to do it today.
> 
> Ciao,
> Dscho

Sorry for the trouble. The "stdin" will go away in the next round
as we agreed on a more generic solution:
http://public-inbox.org/git/1fd7fb64-0f40-47f0-a047-25b91b170...@gmail.com/

- Lars



Re: [PATCH v9 04/14] run-command: add wait_on_exit

2016-10-05 Thread Lars Schneider

> On 04 Oct 2016, at 21:30, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> The flag 'clean_on_exit' kills child processes spawned by Git on exit.
>> A hard kill like this might not be desired in all cases.
>> 
>> Add 'wait_on_exit' which closes the child's stdin on Git exit and waits
>> until the child process has terminated.
>> 
>> The flag is used in a subsequent patch.
>> 
>> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
>> ---
>> ...
>> static void cleanup_children(int sig, int in_signal)
>> {
>> +int status;
>> +struct child_to_clean *p = children_to_clean;
>> +
>> +/* Close the child's stdin as an indicator that Git will exit soon */
>> +while (p) {
>> +if (p->wait)
>> +if (p->stdin > 0)
>> +close(p->stdin);
>> +p = p->next;
>> +}
> 
> This part and the "stdin" field feels a bit too specific to the
> caller you are adding.  Allowing the user of the API to specify what
> clean-up cation needs to be taken in the form of a callback function
> may not be that much more effort and would be more flexible and
> useful, I would imagine?

OK. Something like the patch below would work nicely.
Does this look acceptable?

Thanks,
Lars


diff --git a/run-command.c b/run-command.c
index 3269362..a0256a6 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,6 +21,7 @@ void child_process_clear(struct child_process *child)
 
 struct child_to_clean {
pid_t pid;
+   void (*clean_on_exit_handler)(pid_t, int);
struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -31,6 +32,11 @@ static void cleanup_children(int sig, int in_signal)
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
children_to_clean = p->next;
+
+   if (p->clean_on_exit_handler) {
+   p->clean_on_exit_handler(p->pid, in_signal);
+   }
+
kill(p->pid, sig);
if (!in_signal)
free(p);
@@ -49,10 +55,11 @@ static void cleanup_children_on_exit(void)
cleanup_children(SIGTERM, 0);
 }
 
-static void mark_child_for_cleanup(pid_t pid)
+static void mark_child_for_cleanup(pid_t pid, void (*clean_on_exit_handler)(pid_t, int))
 {
struct child_to_clean *p = xmalloc(sizeof(*p));
p->pid = pid;
+   p->clean_on_exit_handler = clean_on_exit_handler;
p->next = children_to_clean;
children_to_clean = p;
 
@@ -421,8 +428,8 @@ int start_command(struct child_process *cmd)
}
if (cmd->pid < 0)
error_errno("cannot fork() for %s", cmd->argv[0]);
-   else if (cmd->clean_on_exit)
-   mark_child_for_cleanup(cmd->pid);
+   else if (cmd->clean_on_exit || cmd->clean_on_exit_handler)
+   mark_child_for_cleanup(cmd->pid, cmd->clean_on_exit_handler);
 
/*
 * Wait for child's execvp. If the execvp succeeds (or if fork()
@@ -482,8 +489,8 @@ int start_command(struct child_process *cmd)
failed_errno = errno;
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
error_errno("cannot spawn %s", cmd->argv[0]);
-   if (cmd->clean_on_exit && cmd->pid >= 0)
-   mark_child_for_cleanup(cmd->pid);
+   if ((cmd->clean_on_exit || cmd->clean_on_exit_handler) && cmd->pid >= 0)
+   mark_child_for_cleanup(cmd->pid, cmd->clean_on_exit_handler);
 
argv_array_clear(&nargv);
cmd->argv = sargv;
@@ -765,7 +772,7 @@ int start_async(struct async *async)
exit(!!async->proc(proc_in, proc_out, async->data));
}
 
-   mark_child_for_cleanup(async->pid);
+   mark_child_for_cleanup(async->pid, NULL);
 
if (need_in)
close(fdin[0]);
diff --git a/run-command.h b/run-command.h
index cf29a31..3630733 100644
--- a/run-command.h
+++ b/run-command.h
@@ -43,6 +43,7 @@ struct child_process {
unsigned stdout_to_stderr:1;
unsigned use_shell:1;
unsigned clean_on_exit:1;
+   void (*clean_on_exit_handler)(pid_t, int);
 };
 
 #define CHILD_PROCESS_INIT { NULL, ARGV_ARRAY_INIT, ARGV_ARRAY_INIT }
 


Re: [PATCH v9 09/14] pkt-line: add packet_write_gently()

2016-10-05 Thread Lars Schneider

> On 04 Oct 2016, at 21:33, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> 
>> +static int packet_write_gently(const int fd_out, const char *buf, size_t size)
>> +{
>> +static char packet_write_buffer[LARGE_PACKET_MAX];
>> +const size_t packet_size = size + 4;
>> +
>> +if (packet_size > sizeof(packet_write_buffer))
>> +return error("packet write failed - data exceeds max packet size");
> 
> Hmph, in the previous round, this used to be "is the size larger
> than sizeof(..) - 4?", which avoided integer overflow issue rather
> nicely and more idiomatic.  If size is near the size_t's max,
> packet_size may wrap around to become very small, and we won't hit
> this error, will we?

You are right. Would the solution below be acceptable?
I would like to keep the `packet_size` variable as it eases the rest
of the function.

 
const size_t packet_size = size + 4;
 
-   if (packet_size > sizeof(packet_write_buffer))
+   if (size > sizeof(packet_write_buffer) - 4)
return error("packet write failed - data exceeds max packet size");

Thanks,
Lars

Re: [PATCH v9 10/14] pkt-line: add functions to read/write flush terminated packet streams

2016-10-05 Thread Lars Schneider

> On 04 Oct 2016, at 21:53, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
> 
>> +int write_packetized_from_buf(const char *src_in, size_t len, int fd_out)
>> +{
>> +static char buf[LARGE_PACKET_DATA_MAX];
>> +int err = 0;
>> +size_t bytes_written = 0;
>> +size_t bytes_to_write;
>> +
>> +while (!err) {
>> +if ((len - bytes_written) > sizeof(buf))
>> +bytes_to_write = sizeof(buf);
>> +else
>> +bytes_to_write = len - bytes_written;
>> +if (bytes_to_write == 0)
>> +break;
>> +err = packet_write_gently(fd_out, src_in + bytes_written, bytes_to_write);
>> +bytes_written += bytes_to_write;
>> +}
>> +if (!err)
>> +err = packet_flush_gently(fd_out);
>> +return err;
>> +}
> 
> Hmph, what is buf[] used for, other than its sizeof() taken to yield
> a constant LARGE_PACKET_DATA_MAX?

Agreed. This is stupid. I will fix it.


>> 
>> +for (;;) {
>> +strbuf_grow(sb_out, LARGE_PACKET_DATA_MAX);
>> +packet_len = packet_read(fd_in, NULL, NULL,
>> +// TODO: explain + 1
> 
> No // C99 comment please.
> 
> And I agree that the +1 needs to be explained.

Oh. I did not send the very latest version :-(

Is this explanation OK?

+   strbuf_grow(sb_out, LARGE_PACKET_DATA_MAX);
+   packet_len = packet_read(fd_in, NULL, NULL,
+   /* strbuf_grow() above always allocates one extra byte to
+* store a '\0' at the end of the string. packet_read()
+* writes a '\0' extra byte at the end, too. Let it know
+* that there is already room for the extra byte.
+*/
+   sb_out->buf + sb_out->len, LARGE_PACKET_DATA_MAX+1,
+   PACKET_READ_GENTLE_ON_EOF);


Thanks,
Lars

Re: What's cooking in git.git (Oct 2016, #01; Mon, 3)

2016-10-04 Thread Lars Schneider

> On 04 Oct 2016, at 00:31, Junio C Hamano  wrote:
> 
> 
> * ls/filter-process (2016-09-23) 11 commits
> - convert: add filter.<driver>.process option
> - convert: make apply_filter() adhere to standard Git error handling
> - convert: modernize tests
> - convert: quote filter names in error messages
> - pkt-line: add functions to read/write flush terminated packet streams
> - pkt-line: add packet_write_gently()
> - pkt-line: add packet_flush_gently()
> - pkt-line: add packet_write_fmt_gently()
> - run-command: move check_pipe() from write_or_die to run_command
> - pkt-line: extract set_packet_header()
> - pkt-line: rename packet_write() to packet_write_fmt()
> 
> The smudge/clean filter API expect an external process is spawned
> to filter the contents for each path that has a filter defined.  A
> new type of "process" filter API has been added to allow the first
> request to run the filter for a path to spawn a single process, and
> all filtering need is served by this single process for multiple
> paths, reducing the process creation overhead.
> 
> Somehow I thought this was getting ready for 'next' but it seems
> at least another round of reroll is coming?

I thought so, too, but t0021 was flaky and Jakub had a number of good 
suggestions. I just posted v9:
http://public-inbox.org/git/20161004125947.67104-1-larsxschnei...@gmail.com/

- Lars



Merge conflicts in .gitattributes can cause trouble

2016-10-04 Thread Lars Schneider
Hi,


If there is a conflict in the .gitattributes during a merge then it looks 
like as if the attributes are not applied (which kind of makes sense as Git 
would not know what to do). As a result Git can treat e.g. binary files 
as text and they can end up with changed line endings in the working tree. 
After resolving the conflict in .gitattributes all files would be marked 
as binary, again, and the user can easily commit the wrongly changed line 
endings.

Consider this script on Windows:

$ git init .
$ touch first.commit
$ git add .
$ git commit -m "first commit"

$ git checkout -b branch
$ printf "*.bin binary\n" >> .gitattributes
$ git add .
$ git commit -m "tracking *.bin files"

$ git checkout master
$ printf "binary\ndata\n" > file.dat # <-- Unix line ending!
$ printf "*.dat binary\n" >> .gitattributes # <-- Tell Git to keep Unix line ending!
$ git add .
$ git commit -m "tracking *.dat files"
$ git cat-file -p :file.dat | od -c
0000000   b   i   n   a   r   y  \n   d   a   t   a  \n   <-- Correct!
$ git checkout branch
$ git merge master # <-- Causes merge conflict!
$ printf "*.bin binary\n*.dat binary\n" > .gitattributes # <-- Fix merge conflict!
$ git add .
$ git commit -m "merged"
$ git cat-file -p :file.dat | od -c
0000000   b   i   n   a   r   y  \r  \n   d   a   t   a  \r  \n   <-- Wrong!

Possible solutions:

1. We could print an appropriate warning if we detect a merge conflict 
   in .gitattributes

2. We could disable all line ending conversions in case of a merge conflict
   (I am not exactly sure about all the implications, though)

3. We could salvage what we could of the .gitattributes file, 
   perhaps by using the version from HEAD (or more likely, the ours stage of
   the index) -- suggested by Peff on the related GitHub issue mentioned below

Thoughts?

Thanks,
Lars


PS: I noticed that behavior while working with Git LFS and started a discussion
about it here: https://github.com/github/git-lfs/issues/1544 

Re: [PATCH v8 00/11] Git filter protocol

2016-10-03 Thread Lars Schneider

> On 03 Oct 2016, at 19:02, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> If the filter process refuses to die forever when Git told it to
>>> shutdown (by closing the pipe to it, for example), that filter
>>> process is simply buggy.  I think we want users to become aware of
>>> that, instead of Git leaving it behind, which essentially is to
>>> sweep the problem under the rug.
>>> 
>>> I agree with what Peff said elsewhere in the thread; if a filter
>>> process wants to take time to clean things up while letting Git
>>> proceed, it can do its own process management, but I think it is
>>> sensible for Git to wait the filter process it directly spawned.
>> 
>> To realize the approach above I prototyped the run-command patch below:
>> 
>> I added an "exit_timeout" variable to the "child_process" struct.
>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>> seconds until it kills the child process. If "exit_timeout" is negative
>> then Git will wait until the process is done.
> 
>> If we use that in the long running filter process, then we could make
>> the timeout even configurable. E.g. with "filter.<driver>.process-timeout".
>> 
>> What do you think about this solution? 
> 
> Is such a configuration (or timeout in general) necessary?  I
> suspect that a need for timeout, especially needing timeout and
> needing to get killed that happens so often to require a
> configuration variable, is a sign of something else seriously wrong.
> 
> What's the justification for a filter to _require_ getting killed
> all the time when it is spawned?  Otherwise you wouldn't configure
> "this driver does not die when told, so we need a timeout" variable.
> Is it a sign of the flaw in the protocol to talk to it?  e.g. Git
> has a way to tell it to die, but it somehow is very hard to hear
> from filter's end and honor that request?
> 
> I think that we would need some timeout in the mechanism, but not to
> be used for "killing".
> 
> You would decide to "kill" an filter process in two cases: the
> filter is buggy and refuses to die when Git tells it to exit, or the
> code in Git waiting for its death is somehow miscounting its
> children, and thought it told to die one process but in fact it
> didn't (perhaps it told somebody else to die), or it thought it
> hasn't seen the child die when in fact it already did.

Agreed.


> Calling kill(2) and exiting would hide these two kind of bugs from
> end users.  Not doing so would give the end users a hung Git, which
> is a VERY GOOD thing.  Otherwise you would not notice bugs and lose
> the opportunity to diagnose and fix it.

Aha. I assumed that a hung Git because of a buggy filter would be a no-no.
Thanks for this clarification.


> The timeout would be good for you to give a message "filter process
> running the script '%s' is not exiting; I am waiting for it".  The
> user is still left with a hung Git, and can then see if that process
> is hanging around.  If it is, then we found a buggy filter.  Or we
> found a buggy Git.  Either needs to be fixed.  I do not think it
> would help anybody by doing a kill(2) to sweep possible bugs under
> the rug.

I could achieve that with this run-command patch: 
http://public-inbox.org/git/e9946e9f-6ee5-492b-b122-9078ceb88...@gmail.com/
(I'll remove the "timeout after x seconds" parts and keep the "wait until 
done" part with stderr output)


Thanks,
Lars

Re: [PATCH v8 00/11] Git filter protocol

2016-10-03 Thread Lars Schneider

> On 01 Oct 2016, at 22:48, Jakub Narębski <jna...@gmail.com> wrote:
> 
> W dniu 01.10.2016 o 20:59, Lars Schneider pisze: 
>> On 29 Sep 2016, at 23:27, Junio C Hamano <gits...@pobox.com> wrote:
>>> Lars Schneider <larsxschnei...@gmail.com> writes:
>>> 
>>> If the filter process refuses to die forever when Git told it to
>>> shutdown (by closing the pipe to it, for example), that filter
>>> process is simply buggy.  I think we want users to become aware of
>>> that, instead of Git leaving it behind, which essentially is to
>>> sweep the problem under the rug.
> 
> Well, it would be good to tell users _why_ Git is hanging, see below.

Agreed. Do you think it is OK to write the message to stderr?


>>> I agree with what Peff said elsewhere in the thread; if a filter
>>> process wants to take time to clean things up while letting Git
>>> proceed, it can do its own process management, but I think it is
>>> sensible for Git to wait the filter process it directly spawned.
>> 
>> To realize the approach above I prototyped the run-command patch below:
>> 
>> I added an "exit_timeout" variable to the "child_process" struct.
>> On exit, Git will close the pipe to the process and wait "exit_timeout" 
>> seconds until it kills the child process. If "exit_timeout" is negative
>> then Git will wait until the process is done.
> 
> That might be good approach.  Probably the default would be to wait.

I think I would prefer a 2sec timeout or something as default. This way
we can ensure Git would not wait indefinitely for a buggy filter by default.


>> If we use that in the long running filter process, then we could make
>> the timeout even configurable. E.g. with "filter.<driver>.process-timeout".
> 
> Sidenote: we prefer camelCase rather than kebab-case for config
> variables, that is, "filter.<driver>.processTimeout".

Sure!


> Also, how would one set default value of timeout for all process
> based filters?

I think we don't need that because a timeout is always specific
to a filter (if the 2sec default is not sufficient).


>> 
>> +while ((waitpid(p->pid, &status, 0)) < 0 && errno == EINTR)
>> +;   /* nothing */
> 
> Ah, this loop is here because waiting on waitpid() can be interrupted
> by the delivery of a signal to the calling process; though the result
> is -1, not just any < 0.

"< 0" is also used in wait_or_whine()


>> +while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < p->timeout) {
>> +gettimeofday(&tv, NULL);
>> +}
> 
> WTF?  Busy wait loop???

This was just a quick prototype to show "my thinking direction". 
I wasn't expecting a review. Sorry :-)


> Also, if we want to wait for child without blocking, then instead
> of cryptic getpgid(p->pid) maybe use waitpid(p->pid, &status, WNOHANG);
> it is more explicit.

Sure!


> There is also another complication: there can be more than one
> long-running filter driver used.  With this implementation we
> wait for each of one in sequence, e.g. 10s + 10s + 10s.

Good idea, I fixed that in the version below!


>> 
>> -static void mark_child_for_cleanup(pid_t pid)
>> +static void mark_child_for_cleanup(pid_t pid, int timeout, int stdin)
> 
> H... three parameters is not too much, but we might want to
> pass "struct child_process *" directly if this number grows.

I used parameters because this function is also used with the async struct... 

I've attached a more serious patch for review below.
What do you think?

Thanks,
Lars



diff --git a/run-command.c b/run-command.c
index 3269362..ca0feef 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,6 +21,9 @@ void child_process_clear(struct child_process *child)
 
 struct child_to_clean {
pid_t pid;
+   char *name;
+   int stdin;
+   int timeout;
struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -28,12 +31,53 @@ static int installed_child_cleanup_handler;
 
 static void cleanup_children(int sig, int in_signal)
 {
+   int status;
+   struct timeval tv;
+   time_t secs;
+   struct child_to_clean *p = children_to_clean;
+
+   // Send EOF to children as indicator that Git will exit soon
+   while (p) {
+   if (p->timeout != 0) {
+   if (p->stdin > 0)
+   close(p->stdin);
+   }
+   p = p->next;
+   }
+
while (children_to_clean) {
-

Re: [PATCH v8 00/11] Git filter protocol

2016-10-01 Thread Lars Schneider

> On 29 Sep 2016, at 23:27, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>> We discussed that issue in v4 and v6:
>> http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/
>> http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/
>> 
>> My impression was that you don't want Git to wait for the filter process.
>> If Git waits for the filter process - how long should Git wait?
> 
> I am not sure where you got that impression.  I did say that I do
> not want Git to _KILL_ my filter process.  That does not mean I want
> Git to go away without waiting for me.
> 
> If the filter process refuses to die forever when Git told it to
> shutdown (by closing the pipe to it, for example), that filter
> process is simply buggy.  I think we want users to become aware of
> that, instead of Git leaving it behind, which essentially is to
> sweep the problem under the rug.
> 
> I agree with what Peff said elsewhere in the thread; if a filter
> process wants to take time to clean things up while letting Git
> proceed, it can do its own process management, but I think it is
> sensible for Git to wait the filter process it directly spawned.

To realize the approach above I prototyped the run-command patch below:

I added an "exit_timeout" variable to the "child_process" struct.
On exit, Git will close the pipe to the process and wait "exit_timeout" 
seconds until it kills the child process. If "exit_timeout" is negative
then Git will wait until the process is done.

If we use that in the long running filter process, then we could make
the timeout even configurable. E.g. with "filter.<driver>.process-timeout".

What do you think about this solution? 

Thanks,
Lars



diff --git a/run-command.c b/run-command.c
index 3269362..a933066 100644
--- a/run-command.c
+++ b/run-command.c
@@ -21,6 +21,8 @@ void child_process_clear(struct child_process *child)
 
 struct child_to_clean {
pid_t pid;
+   int stdin;
+   int timeout;
struct child_to_clean *next;
 };
 static struct child_to_clean *children_to_clean;
@@ -28,9 +30,30 @@ static int installed_child_cleanup_handler;
 
 static void cleanup_children(int sig, int in_signal)
 {
+   int status;
+   struct timeval tv;
+   time_t secs;
+
while (children_to_clean) {
struct child_to_clean *p = children_to_clean;
children_to_clean = p->next;
+
+   if (p->timeout != 0 && p->stdin > 0)
+   close(p->stdin);
+
+   if (p->timeout < 0) {
+   /* Wait until the process finishes */
+   while (waitpid(p->pid, &status, 0) < 0 && errno == EINTR)
+   ;   /* nothing */
+   } else if (p->timeout != 0) {
+   /* Wait until the process finishes or the timeout elapses */
+   gettimeofday(&tv, NULL);
+   secs = tv.tv_sec;
+   while (getpgid(p->pid) >= 0 && tv.tv_sec - secs < p->timeout) {
+   gettimeofday(&tv, NULL);
+   }
+   }
+
kill(p->pid, sig);
if (!in_signal)
free(p);
@@ -49,10 +72,12 @@ static void cleanup_children_on_exit(void)
cleanup_children(SIGTERM, 0);
 }
 
-static void mark_child_for_cleanup(pid_t pid)
+static void mark_child_for_cleanup(pid_t pid, int timeout, int stdin)
 {
struct child_to_clean *p = xmalloc(sizeof(*p));
p->pid = pid;
+   p->stdin = stdin;
+   p->timeout = timeout;
p->next = children_to_clean;
children_to_clean = p;
 
@@ -422,7 +447,7 @@ int start_command(struct child_process *cmd)
if (cmd->pid < 0)
error_errno("cannot fork() for %s", cmd->argv[0]);
else if (cmd->clean_on_exit)
-   mark_child_for_cleanup(cmd->pid);
+   mark_child_for_cleanup(cmd->pid, cmd->exit_timeout, cmd->in);
 
/*
 * Wait for child's execvp. If the execvp succeeds (or if fork()
@@ -483,7 +508,7 @@ int start_command(struct child_process *cmd)
if (cmd->pid < 0 && (!cmd->silent_exec_failure || errno != ENOENT))
error_errno("cannot spawn %s", cmd->argv[0]);
if (cmd->clean_on_exit && cmd->pid >= 0)
-   mark_child_for_cleanup(cmd->pid);
+   mark_child_for_cleanup(cmd->pid, cmd->exit_timeout, cmd->in);
 
	argv_array_clear(&nargv);
cmd->argv = sargv;
@@ -765,7 +790,7 @@ int start_async(struct async *async)
exit(!!async->pro

Re: [PATCH v8 11/11] convert: add filter.<driver>.process option

2016-10-01 Thread Lars Schneider

> On 29 Sep 2016, at 01:14, Jakub Narębski  wrote:
> 
> Part third (and last) of the review of v8 11/11.
> 
> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com napisał:
> 
> 
>> @@ -31,7 +31,10 @@ test_expect_success setup '
>>  cat test >test.i &&
>>  git add test test.t test.i &&
>>  rm -f test test.t test.i &&
>> -git checkout -- test test.t test.i
>> +git checkout -- test test.t test.i &&
>> +
>> +echo "content-test2" >test2.o &&
>> +echo "content-test3 - subdir" >"test3 - subdir.o"
> 
> I see that you prepare here a few uncommitted files, but both
> their names and their contents leave much to be desired - you
> don't know from the name and contents what they are for.
> 
> And the '"subdir"' file which is not in subdirectory is
> especially egregious.

These are 3 files with somewhat random test content. I renamed
"subdir" to "spaces".


>> +check_filter () {
>> +rm -f rot13-filter.log actual.log &&
>> +"$@" 2> git_stderr.log &&
>> +test_must_be_empty git_stderr.log &&
>> +cat >expected.log &&
> 
> This is too clever by half.  Having a function that both tests
> the behavior and prepares 'expected' file is too much.
> 
> In my opinion preparation of 'expected.log' file should be moved
> to another function or functions.
> 
> Also, if we are running sort on output, I think we should also
> run sort on 'expected.log', so that what we write doesn't need to
> be created sorted (so we don't have to sort expected lines by hand).
> Or maybe we should run the same transformation on rot13-filter.log
> and on the contents of expected.log.

Agreed. Very good suggestion!


>> +check_filter_ignore_clean () {
>> +rm -f rot13-filter.log actual.log &&
>> +"$@" &&
> 
> Why we don't check for stderr here?

Because this function is used by "git checkout" which writes all
kinds of stuff to stderr. I added "--quiet --no-progress" to
disable this behavior. 


>> +check_rot13 () {
>> +test_cmp "$1" "$2" &&
>> +./../rot13.sh <"$1" >expected &&
> 
> Why there is .. in this invocation?

Because this script is located in the root of the current test directory.
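For context, a minimal stand-in for that helper (the tr tables match the rot13.sh script used by t0021). The key property check_rot13 relies on is that rot13 is its own inverse, so re-filtering a smudged file recovers the clean content:

```shell
# Self-inverse rot13 helper, equivalent to the t0021 rot13.sh script.
rot13 () {
	tr \
		'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' \
		'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM'
}

printf 'hello' | rot13            # prints "uryyb"
printf 'hello' | rot13 | rot13    # round-trips back to "hello"
```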


>> +git cat-file blob :"$2" >actual &&
>> +test_cmp expected actual
>> +}
>> +
>> +test_expect_success PERL 'required process filter should filter data' '
>> +test_config_global filter.protocol.process 
>> "$TEST_DIRECTORY/t0021/rot13-filter.pl clean smudge" &&
>> +test_config_global filter.protocol.required true &&
>> +rm -rf repo &&
>> +mkdir repo &&
>> +(
>> +cd repo &&
>> +git init &&
> 
> Don't you think that creating a fresh test repository for each
> separate test is a bit too much?  I guess that you want for
> each and every test to be completely independent, but this setup
> and teardown is a bit excessive.
> 
> Other tests in the same file (should we reuse the test, or use
> new test file) do not use this method.

I see your point. However, I am always annoyed if Git tests are
entangled because it makes working with them way way harder.
This test runs in 4.5s on a slow Travis CI machine. I think
that is OK considering that we have tests running 3.5min (t3404).


>> +echo "*.r filter=protocol" >.gitattributes &&
>> +git add . &&
>> +git commit . -m "test commit" &&
>> +git branch empty &&
> 
> Err... I think it would be better to name it 'empty-branch'
> (or 'almost-empty-branch', as it does include .gitattributes file).
> See my mistake below (marked ...).

"empty-branch". OK


>> +
>> +cp ../test.o test.r &&
>> +cp ../test2.o test2.r &&
> 
> What does this test2.o / test2.r file tests, that test.o / test.r
> doesn't?  The name doesn't tell us.

This just tests multiple files with different content.


> Why it is test.r, but test2.r?  Why it isn't test1.r?

test.r already existed (created in setup test).


>> +mkdir testsubdir &&
>> +cp "../test3 - subdir.o" "testsubdir/test3 - subdir.r" &&
> 
> Why it needs to have different contents?

To check that the filter does the right thing with multiple files
and contents.



>> +>test4-empty.r &&
> 
> You test ordinary file, file in subdirectory, file with filename
> containing spaces, and an empty file.
> 
> Other tests of single file `clean`/`smudge` filters use filename
> that requires mangling; maybe we should use similar file?
> 
>special="name  with '\''sq'\'' and \$x" &&
>echo some test text >"$special" &&

OK.


> In case of `process` filter, a special filename could look like
> this:
> 
>process_special="name=with equals and\nembedded newlines\n" &&
>echo some test text >"$process_special" &&

I think this test would create trouble on Windows. I'll stick to
the special characters used in the single shot filter.


>> +<<-\EOF &&
>> +1 IN: clean test.r 57 [OK] -- OUT: 57 . 

Re: [PATCH v8 11/11] convert: add filter.<driver>.process option

2016-09-30 Thread Lars Schneider

> On 27 Sep 2016, at 17:37, Jakub Narębski  wrote:
> 
> Part second of the review of 11/11.
> 
> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
> 
>> +
>> +if (!drv->process && (CAP_CLEAN & wanted_capability) && drv->clean)
> 
> This is just a very minor nitpicking, but wouldn't it be easier
> to read with those checks reordered?
> 
>  +if ((wanted_capability & CAP_CLEAN) && !drv->process && drv->clean)

OK


>> +
>> +if (start_command(process)) {
>> +error("cannot fork to run external filter '%s'", cmd);
>> +kill_multi_file_filter(hashmap, entry);
>> +return NULL;
>> +}
> 
> I guess there is a reason why we init hashmap entry, try to start
> external process, then kill entry of unable to start, instead of
> trying to start external process, and adding hashmap entry when
> we succeed?

Yes. This way I can reuse the kill_multi_file_filter() function.


>> +
>> +sigchain_push(SIGPIPE, SIG_IGN);
> 
> I guess that this is here to handle errors writing to filter
> by ourself, isn't it?

Yes.


>> +error("external filter '%s' does not support long running 
>> filter protocol", cmd);
> 
> We could have described the error here better.
> 
>  +error("external filter '%s' does not support filter protocol 
> version 2", cmd);

OK


>> +static void read_multi_file_filter_values(int fd, struct strbuf *status) {
> 
> This is more
> 
>  +static void read_multi_file_filter_status(int fd, struct strbuf *status) {
> 
> It doesn't read arbitrary values, it examines 'metadata' from
> filter for "status=" lines.

True!


>> +if (pair[0] && pair[0]->len && pair[1]) {
>> +if (!strcmp(pair[0]->buf, "status=")) {
>> +strbuf_reset(status);
>> +strbuf_addbuf(status, pair[1]);
>> +}
> 
> So it is last status= line wins behavior?

Correct.
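A hedged shell sketch of that last-one-wins behavior (the real implementation is the C function under review; this only illustrates the parsing rule for repeated "status=" lines):

```shell
# Keep only the value of the last "status=" line seen on stdin.
read_filter_status () {
	status=
	while read -r line
	do
		case "$line" in
		status=*) status=${line#status=} ;;
		esac
	done
	printf '%s' "$status"
}

printf 'status=success\nstatus=error\n' | read_filter_status   # prints "error"
```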


> 
>> +}
> 
> Shouldn't we free 'struct strbuf **pair', maybe allocated by the
> strbuf_split_str() function, and reset to NULL?

True. strbuf_list_free() should be enough.


>> 
>> +fflush(NULL);
> 
> Why this fflush(NULL) is needed here?

This flushes all open output streams. The single filter does the same.


>> 
>> +if (fd >= 0 && !src) {
>> +if (fstat(fd, &file_stat) == -1)
>> +return 0;
>> +len = xsize_t(file_stat.st_size);
>> +}
> 
> Errr... is it necessary?  The protocol no longer provides size=
> hint, and neither uses such hint if provided.

We require the size in write_packetized_from_buf() later.


>> +
>> +err = strlen(filter_type) > PKTLINE_DATA_MAXLEN;
>> +if (err)
>> +goto done;
> 
> Errr... this should never happen.  We control which capabilities
> we pass, it can be only "clean" or "smudge", nothing else. Those
> would always be shorter than PKTLINE_DATA_MAXLEN.
> 
> Never mind that that is "command=smudge\n" etc. that needs to
> be shorter that PKTLINE_DATA_MAXLEN!
> 
> So, IMHO it should be at most assert, and needs to be corrected
> anyway.

OK!


> This should never happen, PATH_MAX everywhere is much shorter
> than PKTLINE_DATA_MAXLEN / LARGE_PACKET_MAX.  Or is it?
> 
> Anyway, we should probably explain or warn
> 
>   error("path name too long: '%s'", path);

OK


>> +/*
>> + * Something went wrong with the protocol filter.
>> + * Force shutdown and restart if another blob requires 
>> filtering!
> 
> Is this exclamation mark '!' here necessary?
> 

No.


Thanks,
Lars



Re: [PATCH v8 11/11] convert: add filter.<driver>.process option

2016-09-30 Thread Lars Schneider

> On 27 Sep 2016, at 00:41, Jakub Narębski <jna...@gmail.com> wrote:
> 
> Part first of the review of 11/11.
> 
> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> diff --git a/Documentation/gitattributes.txt 
>> b/Documentation/gitattributes.txt
>> index 7aff940..946dcad 100644
>> --- a/Documentation/gitattributes.txt
>> +++ b/Documentation/gitattributes.txt
>> @@ -293,7 +293,13 @@ checkout, when the `smudge` command is specified, the 
>> command is
>> fed the blob object from its standard input, and its standard
>> output is used to update the worktree file.  Similarly, the
>> `clean` command is used to convert the contents of worktree file
>> -upon checkin.
>> +upon checkin. By default these commands process only a single
>> +blob and terminate.  If a long running `process` filter is used
>   
> 
> Should we use this terminology here?  I have not read the preceding
> part of documentation, so I don't know if it talks about "blobs" or
> if it uses "files" and/or "file contents".

I used that because it was used in the paragraph above already.


>> +Long Running Filter Process
>> +^^^
>> +
>> +If the filter command (a string value) is defined via
>> +`filter.<driver>.process` then Git can process all blobs with a
>> +single filter invocation for the entire life of a single Git
>> +command. This is achieved by using a packet format (pkt-line,
>> +see technical/protocol-common.txt) based protocol over standard
>> +input and standard output as follows. All packets are considered
>> +text and therefore are terminated by an LF. Exceptions are the
>> +"*CONTENT" packets and the flush packet.
> 
> I guess that reasoning here is that all but CONTENT packets are
> metadata, and thus to aid debuggability of the protocol are "text",
> as considered by pkt-line.
> 
> Perhaps a bit more readable would be the following (but current is
> just fine; I am nitpicking):
> 
>  All packets, except for the "{star}CONTENT" packets and the
>  flush packet, are considered text and therefore are terminated by
>  a LF.

OK, I use that!


> I think it might be a good idea to describe what flush packet is
> somewhere in this document; on the other hand referring (especially
> if hyperlinked) to pkt-line technical documentation might be good
> enough / better.  I'm unsure, but I tend on the side that referring
> to technical documentation is better.

I have this line in the first paragraph of the "Long Running Filter
Process" section:
"packet format (pkt-line, see technical/protocol-common.txt) based protocol"
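As a quick illustration of that packet format (a sketch, not Git's implementation): each text packet is its LF-terminated payload prefixed with a four-hex-digit length that counts the header itself, and "0000" is the flush packet:

```shell
# Frame one LF-terminated text packet: 4-hex-digit total length, payload, LF.
# Length = payload bytes + 1 (LF) + 4 (the header itself).
pkt_line () {
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}

pkt_line 'git-filter-client'   # prints "0016git-filter-client"
printf '0000'                  # a flush packet has no payload
```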

> 
>> +to read a welcome response message ("git-filter-server") and exactly
>> +one protocol version number from the previously sent list. All further
> 
> I guess that is to provide forward-compatibility, isn't it?  Also,
> "Git expects..." probably means filter process MUST send, in the
> RFC2119 (https://tools.ietf.org/html/rfc2119) meaning.

True. I feel "expects" reads better but I am happy to change it if
you feel strong about it.


>> +
>> +After the version negotiation Git sends a list of supported capabilities
>> +and a flush packet.
> 
> Is it that Git SHOULD send list of ALL supported capabilities, or is
> it that Git SHOULD NOT send capabilities it does not support, and that
> it MAY send only those capabilities it needs (so for example if command
> uses only `smudge`, it may not send `clean`, so that filter driver doesn't
> need to initialize data it would not need).

"After the version negotiation Git sends a list of all capabilities that
it supports and a flush packet."

Better?
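Put together, the start-up exchange discussed in this thread looks roughly like the following trace (direction markers and the "git-filter-client" welcome are taken from the protocol as finally documented; each line is one pkt-line, "0000" a flush packet):

```
git>    git-filter-client
git>    version=2
git>    0000
filter> git-filter-server
filter> version=2
filter> 0000
git>    capability=clean
git>    capability=smudge
git>    0000
filter> capability=clean
filter> capability=smudge
filter> 0000
```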


> I wonder why it is "=true", and not "capability=".
> Is there a case where we would want to send "=false".  Or
> is it to allow configurable / value based capabilities?  Isn't it going
> a bit too far: is there even a hind of an idea for parametrize-able
> capability? YAGNI is a thing...

Peff suggested that format and I think it is OK:
http://public-inbox.org/git/20160803224619.bwtbvmslhuicx...@sigill.intra.peff.net/


> A few new capabilities that we might want to support in the near future
> is "size", "stream", which are options describing how to communicate,
> and "cleanFromFile", "smudgeToFile", which are new types of operations...
> but neither needs any parameter.
> 
> I guess that adding new capabilities doesn't require having to come up
> with the new version of the protocol, isn't it.

Correct.


>&

Re: [PATCH v8 00/11] Git filter protocol

2016-09-29 Thread Lars Schneider

> On 29 Sep 2016, at 18:57, Junio C Hamano  wrote:
> 
> Torsten Bögershausen  writes:
> 
>>> 1) Git exits
>>> 2) The filter process receives EOF and prints "STOP" to the log
>>> 3) t0021 checks the content of the log
>>> 
>>> Sometimes 3 happened before 2 which makes the test fail.
>>> (Example: https://travis-ci.org/git/git/jobs/162660563 )
>>> 
>>> I added this to wait until the filter process terminates:
>>> 
>>> +wait_for_filter_termination () {
>>> +   while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 
>>> 2>&1
>>> +   do
>>> +   echo "Waiting for /t0021/rot13-filter.pl to finish..."
>>> +   sleep 1
>>> +   done
>>> +}
>>> 
>>> Does this look OK to you?
>> Do we need the ps at all ?
>> How about this:
>> 
>> +wait_for_filter_termination () {
>> +while ! grep "STOP"  LOGFILENAME >/dev/null
>> +do
>> +echo "Waiting for /t0021/rot13-filter.pl to finish..."
>> +sleep 1
>> +done
>> +}
> 
> Running "ps" and grepping for a command is not suitable for script
> to reliably tell things, so it is out of question.  Compared to
> that, your version looks slightly better, but what if the machinery
> that being tested, i.e. the part that drives the filter process, is
> buggy or becomes buggy and causes the filter process that writes
> "STOP" to die before it actually writes that string?
> 
> I have a feeling that the machinery being tested needs to be fixed
> so that the sequence is always be:
> 
>0) Git spawns the filter process, as it needs some contents to
>   be filtered.
> 
>1) Git did everything it needed to do and decides that is time
>   to go.
> 
>2) Filter process receives EOF and prints "STOP" to the log.
> 
>3) Git waits until the filter process finishes.
> 
>4) t0021, after Git finishes, checks the log.


A pragmatic approach:

I could drop the "STOP" message that the filter writes to the log
on exit and everything would work as is. We could argue that this 
is OK because Git doesn't care anyway if the filter process has 
stopped or not.

Would that be OK for everyone?

- Lars

Re: [PATCH v8 00/11] Git filter protocol

2016-09-29 Thread Lars Schneider

> On 29 Sep 2016, at 18:57, Junio C Hamano  wrote:
> 
> Torsten Bögershausen  writes:
> 
>>> 1) Git exits
>>> 2) The filter process receives EOF and prints "STOP" to the log
>>> 3) t0021 checks the content of the log
>>> 
>>> Sometimes 3 happened before 2 which makes the test fail.
>>> (Example: https://travis-ci.org/git/git/jobs/162660563 )
>>> 
>>> I added this to wait until the filter process terminates:
>>> 
>>> +wait_for_filter_termination () {
>>> +   while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 
>>> 2>&1
>>> +   do
>>> +   echo "Waiting for /t0021/rot13-filter.pl to finish..."
>>> +   sleep 1
>>> +   done
>>> +}
>>> 
>>> Does this look OK to you?
>> Do we need the ps at all ?
>> How about this:
>> 
>> +wait_for_filter_termination () {
>> +while ! grep "STOP"  LOGFILENAME >/dev/null
>> +do
>> +echo "Waiting for /t0021/rot13-filter.pl to finish..."
>> +sleep 1
>> +done
>> +}
> 
> Running "ps" and grepping for a command is not suitable for script
> to reliably tell things, so it is out of question.  Compared to
> that, your version looks slightly better, but what if the machinery
> that being tested, i.e. the part that drives the filter process, is
> buggy or becomes buggy and causes the filter process that writes
> "STOP" to die before it actually writes that string?
> 
> I have a feeling that the machinery being tested needs to be fixed
> so that the sequence is always be:
> 
>0) Git spawns the filter process, as it needs some contents to
>   be filtered.
> 
>1) Git did everything it needed to do and decides that is time
>   to go.
> 
>2) Filter process receives EOF and prints "STOP" to the log.
> 
>3) Git waits until the filter process finishes.
> 
>4) t0021, after Git finishes, checks the log.
> 
> Repeated sleep combined with grep is probably just sweeping the real
> problem under the rug.  Do we have enough information to do the
> above?
> 
> An inspiration may be in the way we centrally clean all tempfiles
> and lockfiles before exiting.  We have a central registry of these
> files that need cleaning up and have a single atexit(3) handler to
> clean them up.  Perhaps we need a registry that filter processes
> spawned by the mechanism Lars introduces in this series, and have an
> atexit(3) handler that closes the pipe to them (which signals the
> filters that it is time for them to go) and wait(2) on them, or
> something?  I do not think we want any kill(2) to be involved in
> this clean-up procedure, but I do think we should wait(2) on what we
> spawn, as long as these processes are meant to be shut down when the
> main process of Git exits (this is different from things like
> credential-cache daemon where they are expected to persist and meant
> to serve multiple Git processes).

We discussed that issue in v4 and v6:
http://public-inbox.org/git/20160803225313.pk3tfe5ovz4y3...@sigill.intra.peff.net/
http://public-inbox.org/git/xmqqbn0a3wy3@gitster.mtv.corp.google.com/

My impression was that you don't want Git to wait for the filter process.
If Git waits for the filter process - how long should Git wait?

Thanks,
Lars

Re: [PATCH v8 00/11] Git filter protocol

2016-09-29 Thread Lars Schneider

> On 28 Sep 2016, at 23:49, Junio C Hamano  wrote:
> 
> I suspect that you are preparing a reroll already, but the one that
> is sitting in 'pu' seems to be flaky in t/t0021 and I seem to see
> occasional failures from it.
> 
> I didn't trace where the test goes wrong, but one easy mistake you
> could make (I am not saying that is the reason of the failure) is to
> assume your filter will not be called under certain condition (like
> immediately after you checked out from the index to the working
> tree), when the automated test goes fast enough and get you into a
> "racy git" situation---the filter may be asked to filter the
> contents from the working tree again to re-validate what's there is
> still what is in the index.

Thanks for the heads-up! 

This is what happens:

1) Git exits
2) The filter process receives EOF and prints "STOP" to the log
3) t0021 checks the content of the log

Sometimes 3 happened before 2 which makes the test fail.
(Example: https://travis-ci.org/git/git/jobs/162660563 )

I added this to wait until the filter process terminates:

+wait_for_filter_termination () {
+   while ps | grep -v grep | grep -F "/t0021/rot13-filter.pl" >/dev/null 2>&1
+   do
+   echo "Waiting for /t0021/rot13-filter.pl to finish..."
+   sleep 1
+   done
+}

Does this look OK to you?

- Lars


Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-27 Thread Lars Schneider

> On 27 Sep 2016, at 11:00, Jeff King <p...@peff.net> wrote:
> 
> On Tue, Sep 27, 2016 at 10:14:16AM +0200, Lars Schneider wrote:
> 
>>>>> + strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
>>>>> + paket_len = packet_read(fd_in, NULL, NULL,
>>>>> + sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
>>>>> options);
>> [...]
>> After looking at it with fresh eyes I think the existing code is probably 
>> correct,
>> but maybe a bit confusing.
>> 
>> packet_read() adds a '\0' at the end of the destination buffer:
>> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206
>> 
>> That is why the destination buffer needs to be one byte larger than the 
>> expected content.
>> 
>> However, in this particular case that wouldn't be necessary because the 
>> destination
>> buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But 
>> we are not
>> supposed to write to this extra byte:
>> https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31
> 
> Right. The allocation happens as part of strbuf_grow(), but whatever
> fills the buffer is expected to write the actual NUL (either manually,
> or by calling strbuf_setlen().
> 
> I see the bit you quoted warns not to touch the extra byte yourself,
> though I wonder if that is a bit heavy-handed (I guess it would matter
> if we moved the extra 1-byte growth into strbuf_setlen(), but I find
> that a rather unlikely change).
> 
> That being said, why don't you just use LARGE_PACKET_MAX here? It is
> already the accepted size for feeding to packet_read(), and we know it
> has enough space to hold a NUL terminator. Yes, we may over-allocate by
> 4 bytes, but that isn't really relevant. Strbufs over-allocate anyway.

TBH in that case I would prefer the "PKTLINE_DATA_MAXLEN+1" solution with
an additional comment explaining "+1".

Would that be OK for you?

I am not worried about the extra 4 bytes. I am worried that we make it harder
to see what is going on if we use LARGE_PACKET_MAX.


> So just:
> 
>  for (;;) {
>strbuf_grow(sb_out, LARGE_PACKET_MAX);
>packet_len = packet_read(fd_in, NULL, NULL,
>sb_out->buf + sb_out->len, LARGE_PACKET_MAX,
>options);
>if (packet_len <= 0)
>break;
>/*
> * no need for strbuf_setlen() here; packet_read always adds a
> * NUL terminator.
> */
>sb_out->len += packet_len;
>  }
> 
> You _could_ make the final line of the loop use strbuf_setlen(); it
> would just overwrite something we already know is a NUL (and we know
> that no extra allocation is necessary).
> 
> Also, using LARGE_PACKET_MAX fixes the fact that this patch is using
> PKTLINE_DATA_MAXLEN before it is actually defined. :)

Yeah, I noticed that too and fixed it in v9 :-) Thanks for the reminder!


> You might want to occasionally run:
> 
>  git rebase -x make
> 
> to make sure all of your incremental steps are valid (or even "make
> test" if you are extremely patient; I often do that once after a big
> round of complex interactive-rebase reordering).

That is a good suggestion. I'll add that to my "tool-belt" :-)


Thanks,
Lars



Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-27 Thread Lars Schneider

On 26 Sep 2016, at 22:23, Lars Schneider <larsxschnei...@gmail.com> wrote:

> 
> On 25 Sep 2016, at 15:46, Jakub Narębski <jna...@gmail.com> wrote:
> 
>> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>>> From: Lars Schneider <larsxschnei...@gmail.com>
> 
> 
>>> +   strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
>>> +   paket_len = packet_read(fd_in, NULL, NULL,
>>> +   sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
>>> options);
>> 
>> A question (which perhaps was answered during the development of this
>> patch series): why is this +1 in PKTLINE_DATA_MAXLEN+1 here?
> 
> Nice catch. I think this is wrong:
> https://github.com/git/git/blob/6fe1b1407ed91823daa5d487abe457ff37463349/pkt-line.c#L196
> 
> It should be "if (len > size)" ... then we don't need the "+1" here.
> (but I need to think a bit more about this)

After looking at it with fresh eyes I think the existing code is probably 
correct,
but maybe a bit confusing.

packet_read() adds a '\0' at the end of the destination buffer:
https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/pkt-line.c#L206

That is why the destination buffer needs to be one byte larger than the 
expected content.

However, in this particular case that wouldn't be necessary because the 
destination
buffer is a 'strbuf' that allocates an extra byte for '\0' at the end. But we 
are not
supposed to write to this extra byte:
https://github.com/git/git/blob/21f862b498925194f8f1ebe8203b7a7df756555b/strbuf.h#L25-L31


I see two options:


(1) I leave the +1 as is and add a comment why the extra byte is necessary.

Pro: No change in existing code necessary
Con: The destination buffer has two '\0' at the end.


(2) I add an option PACKET_READ_DISABLE_NUL_TERMINATION. If the option is
set then no '\0' byte is added to the end.

Pro: Correct solution, no byte wasted.
Con: Change in existing code required.


Any preference?


Thanks,
Lars

Re: [PATCH v8 07/11] pkt-line: add functions to read/write flush terminated packet streams

2016-09-26 Thread Lars Schneider

On 25 Sep 2016, at 15:46, Jakub Narębski <jna...@gmail.com> wrote:

> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider <larsxschnei...@gmail.com>


>> +{
>> +static char buf[PKTLINE_DATA_MAXLEN];
> 
> Sidenote: we have LARGE_PACKET_MAX (used in previous patch), but
> PKTLINE_DATA_MAXLEN not LARGE_PACKET_DATA_MAX.

Agreed, I will rename it.


> 
>> +int err = 0;
>> +ssize_t bytes_to_write;
>> +
>> +while (!err) {
>> +bytes_to_write = xread(fd_in, buf, sizeof(buf));
>> +if (bytes_to_write < 0)
>> +return COPY_READ_ERROR;
>> +if (bytes_to_write == 0)
>> +break;
>> +err = packet_write_gently(fd_out, buf, bytes_to_write);
>> +}
>> +if (!err)
>> +err = packet_flush_gently(fd_out);
>> +return err;
>> +}
> 
> Looks good: clean and readable.
> 
> Sidenote (probably outside of scope of this patch): what are the
> errors that we can get from this function, beside COPY_READ_ERROR
> of course?

Everything that can be returned by "read()".


>> +
>> static int get_packet_data(int fd, char **src_buf, size_t *src_size,
>> void *dst, unsigned size, int options)
>> {
>> @@ -305,3 +346,30 @@ char *packet_read_line_buf(char **src, size_t *src_len, 
>> int *dst_len)
>> {
>>  return packet_read_line_generic(-1, src, src_len, dst_len);
>> }
>> +
>> +ssize_t read_packetized_to_buf(int fd_in, struct strbuf *sb_out)
> 
> It's a bit strange that the signature of write_packetized_from_buf() is
> that different from read_packetized_to_buf().  This includes the return
> value: int vs ssize_t.  As I have checked, write() and read() both
> use ssize_t, while fread() and fwrite() both use size_t.

read_packetized_to_buf() returns the number of bytes read or a negative 
error code.

write_packetized_from_buf() returns 0 if the call was successful and an 
error code if not.

That's the reason these two functions have different signatures.


> Perhaps this function should be named read_packetized_to_strbuf()
> (err, I asked this already)?

I agree with the rename, as it makes the function distinct from 
write_packetized_from_buf().


>> +{
>> +int paket_len;
> 
> Possible typo: shouldn't it be called packet_len?

True!


> Shouldn't it be initialized to 0?

Well, it is set for sure later. That's why I think it is not necessary.

Plus, Eric Wong taught me not to:
"Compilers complain about uninitialized variables."
http://public-inbox.org/git/20160725072745.GB11634@starla/
(Note: he was talking about pointers there :-)


>> +int options = PACKET_READ_GENTLE_ON_EOF;
> 
> Why is this even a local variable?  It is never changed, and it is
> used only in one place; we can inline it.

Removed.


>> +
>> +size_t oldlen = sb_out->len;
>> +size_t oldalloc = sb_out->alloc;
> 
> Just a nitpick (feel free to ignore): doesn't this looks better:
> 
>  +size_t old_len   = sb_out->len;
>  +size_t old_alloc = sb_out->alloc;
> 
> Also perhaps s/old_/orig_/g.

Agreed. That matches the other variables better.


>> +strbuf_grow(sb_out, PKTLINE_DATA_MAXLEN+1);
>> +paket_len = packet_read(fd_in, NULL, NULL,
>> +sb_out->buf + sb_out->len, PKTLINE_DATA_MAXLEN+1, 
>> options);
> 
> A question (which perhaps was answered during the development of this
> patch series): why is this +1 in PKTLINE_DATA_MAXLEN+1 here?

Nice catch. I think this is wrong:
https://github.com/git/git/blob/6fe1b1407ed91823daa5d487abe457ff37463349/pkt-line.c#L196

It should be "if (len > size)" ... then we don't need the "+1" here.
(but I need to think a bit more about this)

> 
>> +if (paket_len <= 0)
>> +break;
>> +sb_out->len += paket_len;
>> +}
>> +
>> +if (paket_len < 0) {
>> +if (oldalloc == 0)
>> +strbuf_release(sb_out);
>> +else
>> +strbuf_setlen(sb_out, oldlen);
> 
> A question (maybe I don't understand strbufs): why there is a special
> case for oldalloc == 0?

I tried to mimic the behavior of strbuf_read() [1]. Its error handling
was introduced in 2fc647 [2] to ease cleanup for callers:

"This allows for easier error handling, as callers only need to call
strbuf_release() if A) the command succeeded or B) if they would have had
to do so anyway because they added something to the strbuf themselves."

[1] 
https://github.com/git/git/blob/cda1bbd474805e653dda8a71d4ea3790e2a66cbb/strbuf.c#L377-L383
[2] https://github.com/git/git/commit/2fc647004ac7016128372a85db8245581e493812


Thanks,
Lars

Re: [PATCH v8 06/11] pkt-line: add packet_write_gently()

2016-09-26 Thread Lars Schneider

On 25 Sep 2016, at 13:26, Jakub Narębski <jna...@gmail.com> wrote:

> W dniu 20.09.2016 o 21:02, larsxschnei...@gmail.com pisze:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> ...
>> 
>> +static int packet_write_gently(const int fd_out, const char *buf, size_t 
>> size)
> 
> I'm not sure what naming convention the rest of Git uses, but isn't
> it more like '*data' rather than '*buf' here?

pkt-line seems to use 'buf' or 'buffer' for everything else.


>> +{
>> +static char packet_write_buffer[LARGE_PACKET_MAX];
> 
> I think there should be warning (as a comment before function
> declaration, or before function definition), that packet_write_gently()
> is not thread-safe (nor reentrant, but the latter does not matter here,
> I think).
> 
> Thread-safe vs reentrant: http://stackoverflow.com/a/33445858/46058
> 
> This is not something terribly important; I guess git code has tons
> of functions not marked as thread-unsafe...

I agree that the function is not thread-safe. However, I can't find 
an example in the Git source that marks a function as not thread-safe.
Unless it is explicitly stated in the coding guidelines, I would prefer
not to start a new way of marking functions.


>> +if (size > sizeof(packet_write_buffer) - 4) {
> 
> First, wouldn't the following be more readable:
> 
>  +if (size + 4 > LARGE_PACKET_MAX) {

Peff suggested that here:
http://public-inbox.org/git/20160810132814.gqnipsdwyzjmu...@sigill.intra.peff.net/


>> +return error("packet write failed - data exceeds max packet 
>> size");
>> +}
> 
> Second, CodingGuidelines is against using braces (blocks) for one
> line conditionals: "We avoid using braces unnecessarily."
> 
> But this is just me nitpicking.

Fixed.


>> +packet_trace(buf, size, 1);
>> +size += 4;
>> +set_packet_header(packet_write_buffer, size);
>> +memcpy(packet_write_buffer + 4, buf, size - 4);
>> +if (write_in_full(fd_out, packet_write_buffer, size) == size)
> 
> Hmmm... in some places we use original size, in others (original) size + 4;
> perhaps it would be more readable to add a new local temporary variable
> 
>   size_t full_size = size + 4;

Agreed. I introduced "packet_size".
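
For reference, the framing being discussed looks like this in a few lines of Python (illustrative only, not the patch itself): the header encodes the payload length plus the 4 header bytes, which is why the code adds 4 before writing.

```python
LARGE_PACKET_MAX = 65520  # same constant as in pkt-line.h

def packet_write(buf):
    packet_size = len(buf) + 4          # payload plus 4-byte hex header
    if packet_size > LARGE_PACKET_MAX:
        raise ValueError("packet write failed - data exceeds max packet size")
    return b"%04x" % packet_size + buf  # what set_packet_header() produces
```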

Thanks,
Lars

Re: [PATCH v8 02/11] pkt-line: extract set_packet_header()

2016-09-26 Thread Lars Schneider

On 24 Sep 2016, at 23:22, Jakub Narębski <jna...@gmail.com> wrote:

> On 20.09.2016 at 21:02, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Subject: [PATCH v8 02/11] pkt-line: extract set_packet_header()
>> 
>> set_packet_header() converts an integer to a 4 byte hex string. Make
>> this function locally available so that other pkt-line functions can
>> use it.
> 
> Ah. I have trouble understanding this commit message, as the
> set_packet_header() was not available before this patch, but it
> is good if one reads it together with commit summary / title.
> 
> Writing
> 
>  Extracted set_packet_header() function converts...
> 
> or
> 
>  New set_packet_header() function converts... 
> 
> would make it more clear, but it is all right as it is now.
> Perhaps also
> 
>  ... could use it.
> 
> as currently no other pkt-line function but the one set_packet_header()
> was extracted from, namely format_packet(), uses it.
> 
> But that is just nitpicking; no need to change on that account.

Changed it:

Extracted set_packet_header() function converts an integer to a 4 byte 
hex string. Make this function locally available so that other pkt-line 
functions could use it.

Thanks,
Lars

Re: [PATCH v8 01/11] pkt-line: rename packet_write() to packet_write_fmt()

2016-09-26 Thread Lars Schneider

On 24 Sep 2016, at 23:14, Jakub Narębski <jna...@gmail.com> wrote:

> Hello Lars,
> 
> On 20.09.2016 at 21:02, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_write() should be called packet_write_fmt() as the string
>> parameter can be formatted.
> 
> I would say:
> 
>  packet_write() should be called packet_write_fmt() because it
>  is printf-like function where first parameter is format string.
> 
> Or something like that.  But such minor change might be not worth
> yet another reroll of this patch series.
> 
> Perhaps it would be a good idea to explain the reasoning behind
> this change:
> 
>  This is important distinction to know from the name if the
>  function accepts arbitrary binary data and/or arbitrary
>  strings to be written - packet_write[_fmt()] do not.

packet_write() should be called packet_write_fmt() because it is a
printf-like function that takes a format string as first parameter.

packet_write_fmt() should be used for text strings only. Arbitrary
binary data should use a new packet_write() function that is introduced
in a subsequent patch.

Better?


>> pkt-line.h   |  2 +-
>> shallow.c|  2 +-
>> upload-pack.c| 30 +++---
>> 11 files changed, 29 insertions(+), 29 deletions(-)
> 
> Diffstat looks correct.  Was the patch generated by doing search
> and replace?

Yes.

- Lars

Re: [PATCH v8 03/11] run-command: move check_pipe() from write_or_die to run_command

2016-09-26 Thread Lars Schneider

> On 25 Sep 2016, at 00:12, Jakub Narębski <jna...@gmail.com> wrote:
> 
> On 20.09.2016 at 21:02, larsxschnei...@gmail.com wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Move check_pipe() to run_command and make it public. This is necessary
>> to call the function from pkt-line in a subsequent patch.
> 
> All right.

Does this mean I can add your "Acked-by: Jakub Narebski <jna...@gmail.com>" ?

Thanks,
Lars

Re: [PATCH] run-command: async_exit no longer needs to be public

2016-09-23 Thread Lars Schneider

> On 23 Sep 2016, at 19:13, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> If you need to re-roll your 'ls/filter-process' branch, could you please
>>> squash this into the relevant commit c42a4cbc ("run-command: move 
>>> check_pipe()
>>> from write_or_die to run_command", 20-09-2016).
>>> 
>>> [Note that commit 9658846c ("write_or_die: handle EPIPE in async threads",
>>> 24-02-2016) introduced async_exit() specifically for use in the 
>>> implementation
>>> of check_pipe(). Now that you have moved check_pipe() into run-command.c,
>>> it no longer needs to be public.]
>> 
>> Hi Ramsay,
>> 
>> thanks for noticing this. I actually hope that I don't need another re-roll 
>> :-)
>> If I don't re-roll, should I make a patch with this cleanup or do you
>> take care of it?
> 
> I can just squash the the patch you are responding to into c42a4cbc,
> with an additional paragraph "While at it, retire async_exit() as a
> public function as it no longer is called outside run-command API
> implementation", or something like that.
> 
> I do not offhand know if the topic is otherwise ready as-is, or
> needs further work.  When you need to reroll, you'd also need to
> fetch from the result of the above from me first and then start your
> work from it, though, if we go that route.

Sounds good to me!

Thank you, Junio!

Re: [PATCH] run-command: async_exit no longer needs to be public

2016-09-23 Thread Lars Schneider

> On 22 Sep 2016, at 18:56, Ramsay Jones  wrote:
> 
> 
> Signed-off-by: Ramsay Jones 
> ---
> 
> Hi Lars,
> 
> If you need to re-roll your 'ls/filter-process' branch, could you please
> squash this into the relevant commit c42a4cbc ("run-command: move check_pipe()
> from write_or_die to run_command", 20-09-2016).
> 
> [Note that commit 9658846c ("write_or_die: handle EPIPE in async threads",
> 24-02-2016) introduced async_exit() specifically for use in the implementation
> of check_pipe(). Now that you have moved check_pipe() into run-command.c,
> it no longer needs to be public.]

Hi Ramsay,

thanks for noticing this. I actually hope that I don't need another re-roll :-)
If I don't re-roll, should I make a patch with this cleanup or do you
take care of it?

Thanks,
Lars



Re: [PATCH v1] travis-ci: ask homebrew for the its path instead of hardcoding it

2016-09-22 Thread Lars Schneider

> On 21 Sep 2016, at 18:42, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> On 21 Sep 2016, at 11:31, stefan.na...@atlas-elektronik.com wrote:
>>> 
>>> In the Subject: s/the //
>>> 
>>> Am 21.09.2016 um 10:45 schrieb larsxschnei...@gmail.com:
>>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>> 
>>>> The TravisCI macOS build is broken because homebrew (a macOS depedency
>>> 
>>> s/depedency/dependency/
>> 
>> Thanks for spotting both errors!
>> 
>> @Junio: Should I make a v2?
> 
> No.  osx before_install stuff was in there since the very beginning,
> i.e. 522354d7 ("Add Travis CI support", 2015-11-27), so I guess this
> needs to go to maint-2.7 and upwards, but I guess we should discourage
> people to stay on an older maintenance track forever, so let's do
> this only for 'maint' and upwards.

Sound good to me.

Thank you,
Lars


Minor nit:
If it is still possible and no trouble for you, please do
`s/the //` on the subject line of 72fa5cd29f2a9249462215109dbf41b4a6c0c768
(in PU).



Re: [PATCH v1] travis-ci: ask homebrew for the its path instead of hardcoding it

2016-09-21 Thread Lars Schneider

> On 21 Sep 2016, at 11:31, stefan.na...@atlas-elektronik.com wrote:
> 
> In the Subject: s/the //
> 
> Am 21.09.2016 um 10:45 schrieb larsxschnei...@gmail.com:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> The TravisCI macOS build is broken because homebrew (a macOS depedency
> 
> s/depedency/dependency/

Thanks for spotting both errors!

@Junio: Should I make a v2?

Thanks,
Lars


Re: [PATCH v2 1/1] git-p4: Add --checkpoint-period option to sync/clone

2016-09-16 Thread Lars Schneider

> On 15 Sep 2016, at 23:17, Ori Rawlings  wrote:
> 
> Importing a long history from Perforce into git using the git-p4 tool
> can be especially challenging. The `git p4 clone` operation is based
> on an all-or-nothing transactionality guarantee. Under real-world
> conditions like network unreliability or a busy Perforce server,
> `git p4 clone` and  `git p4 sync` operations can easily fail, forcing a
> user to restart the import process from the beginning. The longer the
> history being imported, the more likely a fault occurs during the
> process. Long enough imports thus become statistically unlikely to ever
> succeed.
> 
> The underlying git fast-import protocol supports an explicit checkpoint
> command. The idea here is to optionally allow the user to force an
> explicit checkpoint every  seconds. If the sync/clone operation fails
> branches are left updated at the appropriate commit available during the
> latest checkpoint. This allows a user to resume importing Perforce
> history while only having to repeat at most approximately  seconds
> worth of import activity.

This looks interesting! I ran into the same issue and added a parameter to the 
p4 commands to retry (patch not yet proposed to the mailing list):
https://github.com/autodesk-forks/git/commit/fcfc96a7814935ee6cefb9d69e44def30a90eabb

Would it make sense to print the "git-p4 resume command" in case an error 
happens and checkpoints are written?

Cheers,
Lars

Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()

2016-09-15 Thread Lars Schneider

> On 15 Sep 2016, at 21:44, Jeff King <p...@peff.net> wrote:
> 
> On Thu, Sep 15, 2016 at 05:42:58PM +0100, Lars Schneider wrote:
> 
>>>>>> +int packet_flush_gently(int fd)
>>>>>> +{
>>>>>> +packet_trace("", 4, 1);
>>>>>> +if (write_in_full(fd, "0000", 4) == 4)
>>>>>> +return 0;
>>>>>> +error("flush packet write failed");
>>>>>> +return -1;
>> [...]
>>>>> I suspect that it is a strong sign that the caller wants to be in
>>>>> control of when and what error message is produced; otherwise it
>>>>> wouldn't be calling the _gently() variant, no?
>>>> 
>>>> Agreed!
>>> 
>>> I am also OK with the current form, too.  Those who need to enhance
>>> it to packet_flush_gently(int fd, int quiet) can come later.
>> 
>> "caller wants to be in control [...] otherwise it wouldn't be calling 
>> the _gently() variant" convinced me. I would like to change it like
>> this:
>> 
>>  trace_printf_key(&trace_packet, "flush packet write failed");
>>  return -1;
>> 
>> Objections?
> 
> I'm not sure that a trace makes sense, because it means that 99% of the
> time we are silent. AFAICT, the question is not "sometimes the user
> needs to see an error and sometimes not, and they should decide before
> starting the program". It is "sometimes the caller will report the error
> to the user as appropriate, and sometimes we need to do so". And only
> the calling code knows which is which.
> 
> So the "right" pattern is either:
> 
>  1. Return -1 and the caller is responsible for telling the user.
> 
> or
> 
>  2. Return -1 and stuff the error into an error strbuf, so it can be
> passed up the call chain easily (and callers do not have to come up
> with their own wording).
> 
> But if all current callers would just call error() themselves anyway,
> then it's OK to punt on this and let somebody else handle it later if
> they add a new caller who wants different behavior (and that is what
> Junio was saying above, I think).

OK. I'll go with 1. then.

Thanks,
Lars


Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-15 Thread Lars Schneider

> On 13 Sep 2016, at 17:22, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> diff --git a/contrib/long-running-filter/example.pl 
>> b/contrib/long-running-filter/example.pl
>> ...
>> +sub packet_read {
>> +my $buffer;
>> +my $bytes_read = read STDIN, $buffer, 4;
>> +if ( $bytes_read == 0 ) {
>> +
>> +# EOF - Git stopped talking to us!
>> +exit();
>> +...
>> +packet_write( "clean=true\n" );
>> +packet_write( "smudge=true\n" );
>> +packet_flush();
>> +
>> +while (1) {
> 
> These extra SP around the contents inside () pair look unfamiliar
> and somewhat strange to me, but as long as they are consistently
> done (and I think you are mostly being consistent), it is OK.

Oops. I forgot to run PerlTidy here. I run PerlTidy with the flag 
"-pbp" (= Perl Best Practices). This seems to add no extra SP for
functions with one parameter (e.g. `foo("bar")`) and extra SP
for functions with multiple parameter (e.g. `foo( "bar", 1 )`).
Is this still OK?

Does anyone have a "Git PerlTidy configuration"?


> 
>> +#define CAP_CLEAN    (1u<<0)
>> +#define CAP_SMUDGE   (1u<<1)
> 
> As these are meant to be usable together, i.e. bits in a single flag
> word, they are of type "unsigned int", which makes perfect sense.
> 
> Make sure your variables and fields that store them are of the same
> type.  I think I saw "int' used to pass them in at least one place.

Fixed!


>> +static int apply_filter(const char *path, const char *src, size_t len,
>> +int fd, struct strbuf *dst, struct convert_driver 
>> *drv,
>> +const int wanted_capability)
>> +{
>> +const char* cmd = NULL;
> 
> "const char *cmd = NULL;" of course.

Fixed!


>> diff --git a/unpack-trees.c b/unpack-trees.c
>> index 11c37fb..f6798f8 100644
>> --- a/unpack-trees.c
>> +++ b/unpack-trees.c
>> @@ -10,6 +10,7 @@
>> #include "attr.h"
>> #include "split-index.h"
>> #include "dir.h"
>> +#include "convert.h"
>> 
>> /*
>>  * Error messages expected by scripts out of plumbing commands such as
> 
> Why?  The resulting file seems to compile without this addition.

Of course. That shouldn't have been part of this commit.


Thank you,
Lars






Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-15 Thread Lars Schneider

> On 13 Sep 2016, at 17:42, Junio C Hamano  wrote:
> 
> Torsten Bögershausen  writes:
> 
>> I would really consider treating pathnames as binary, and not adding a
>> trailing '\n'.
>> Are there other opinions?
> 
> It would be the most consistent if the same format as
> write_name_quoted() is used for this, I would think.

Is that the solution you had in mind?

quote_c_style(path, &quoted_path, NULL, 0);
err = packet_write_fmt_gently(process->in, "pathname=%s\n", 
quoted_path.buf);

Wouldn't that complicate the pathname parsing on the filter side?
Can't we just define in our filter protocol documentation that our 
"pathname" packet _always_ has a trailing "\n"? That would mean the 
receiver would know a packet "pathname=ABC\n\n" encodes the path
"ABC\n" [1].

Thanks,
Lars


[1] Following Torsten's example in 
http://public-inbox.org/git/96554f6d-988d-e0b8-7936-8d0f29a75...@web.de )



Re: [PATCH] pkt-line: mark a file-local symbol static

2016-09-15 Thread Lars Schneider

> On 14 Sep 2016, at 14:31, Ramsay Jones  wrote:
> 
> 
> Signed-off-by: Ramsay Jones 
> ---
> 
> Hi Lars,
> 
> If you need to re-roll your 'ls/filter-process' branch, could you
> please squash this into the relevant patch; commit 2afd9b22
> ("pkt-line: add packet_write_gently()", 08-09-2016).
> 
> [...]
> -int packet_write_gently(const int fd_out, const char *buf, size_t size)
> +static int packet_write_gently(const int fd_out, const char *buf, size_t 
> size)
> {
>   static char packet_write_buffer[LARGE_PACKET_MAX];

Done!

Thank you,
Lars

Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()

2016-09-15 Thread Lars Schneider

> On 13 Sep 2016, at 23:44, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> On 13 Sep 2016, at 00:30, Junio C Hamano <gits...@pobox.com> wrote:
>>> 
>>> larsxschnei...@gmail.com writes:
>>> 
>>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>> 
>>>> packet_flush() would die in case of a write error even though for some
>>>> callers an error would be acceptable. Add packet_flush_gently() which
>>>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>>>> failure.
>>>> ...
>>>> +int packet_flush_gently(int fd)
>>>> +{
>>>> +  packet_trace("", 4, 1);
>>>> +  if (write_in_full(fd, "0000", 4) == 4)
>>>> +  return 0;
>>>> +  error("flush packet write failed");
>>>> +  return -1;
>>> 
>>> It is more idiomatic to do
>>> 
>>> return error(...);
>>> 
>>> but more importantly, does the caller even want an error message
>>> unconditionally printed here?
>>> 
>>> I suspect that it is a strong sign that the caller wants to be in
>>> control of when and what error message is produced; otherwise it
>>> wouldn't be calling the _gently() variant, no?
>> 
>> Agreed!
> 
> I am also OK with the current form, too.  Those who need to enhance
> it to packet_flush_gently(int fd, int quiet) can come later.

"caller wants to be in control [...] otherwise it wouldn't be calling 
the _gently() variant" convinced me. I would like to change it like
this:

trace_printf_key(&trace_packet, "flush packet write failed");
return -1;

Objections?

Thanks,
Lars


Re: [PATCH v7 04/10] pkt-line: add packet_flush_gently()

2016-09-13 Thread Lars Schneider

> On 13 Sep 2016, at 00:30, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_flush() would die in case of a write error even though for some
>> callers an error would be acceptable. Add packet_flush_gently() which
>> writes a pkt-line flush packet and returns `0` for success and `-1` for
>> failure.
>> ...
>> +int packet_flush_gently(int fd)
>> +{
>> +packet_trace("", 4, 1);
>> +if (write_in_full(fd, "0000", 4) == 4)
>> +return 0;
>> +error("flush packet write failed");
>> +return -1;
> 
> It is more idiomatic to do
> 
>   return error(...);
> 
> but more importantly, does the caller even want an error message
> unconditionally printed here?
> 
> I suspect that it is a strong sign that the caller wants to be in
> control of when and what error message is produced; otherwise it
> wouldn't be calling the _gently() variant, no?

Agreed!

Thanks,
Lars


Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-13 Thread Lars Schneider

> On 10 Sep 2016, at 17:40, Torsten Bögershausen  wrote:
> 
> []
> 
> One general question up here, more comments inline.
> The current order for a clean-filter is like this, I removed the error 
> handling:
> 
> int convert_to_git()
> {
>   ret |= apply_filter(path, src, len, -1, dst, filter);
>   if (ret && dst) {
>   src = dst->buf;
>   len = dst->len;
>   }
>   ret |= crlf_to_git(path, src, len, dst, ca.crlf_action, checksafe);
>   return ret | ident_to_git(path, src, len, dst, ca.ident);
> }
> 
> The first step is the clean filter, the CRLF-LF conversion (if needed),
> then ident.
> The current implementation streams the whole file content to the filter,
> (STDIN of the filter) and reads back STDOUT from the filter into a STRBUF.
> This is to use the UNIX-like STDIN--STDOUT method for writing a filter.
> 
> However, int would_convert_to_git_filter_fd() and convert_to_git_filter_fd()
> offer a sort of short-cut:
> The filter reads from the file directly, and the output of the filter is
> read into a STRBUF.

Are you sure? As far as I understand the code, the filter does not read from
the file in any case today. The functions would_convert_to_git_filter_fd() and
convert_to_git_filter_fd() just avoid mapping the file in Git. The content
is still streamed via pipes:
https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4


> It looks as if the multi-filter approach can use this in a similar way:
> Give the pathname to the filter, the filter opens the file for reading
> and stream the result via the pkt-line protocol into Git.
> This needs some more changes, and may be very well go into a separate patch
> series. (and should).
> 
> What I am asking for:
> When a multi-filter is used, the content is handled to the filter via 
> pkt-line,
> and the result is given to Git via pkt-line ?
> Nothing wrong with it, I just wonder, if it should be mentioned somewhere.

That is most certainly a good idea and the main reason I added "capabilities"
to the protocol. Joey Hess worked on this topic (not merged, yet) and I would 
like to make this available to the long-running filter protocol as soon as the
feature is available:
http://public-inbox.org/git/1468277112-9909-1-git-send-email-jo...@joeyh.name/


>> +sub packet_read {
>> +my $buffer;
>> +my $bytes_read = read STDIN, $buffer, 4;
>> +if ( $bytes_read == 0 ) {
>> +
>> +# EOF - Git stopped talking to us!
>> +exit();
>> +}
>> +elsif ( $bytes_read != 4 ) {
>> +die "invalid packet size '$bytes_read' field";
>> +}
> 
> This is half-kosher, I would say.
> (And I really, really would like to see an implementation in C ;-)

Would you be willing to contribute a patch? :-)


> A read function may look like this:
> 
>   ret = read(0, &len, 4);
>   if (ret < 0) {
> /* Error, nothing we can do */
> exit(1);
>   } else if (ret == 0) {
> /* EOF  */
> exit(0);
>   } else if (ret < 4) {
> /* 
>  * Go and read more, until we have 4 bytes or EOF or Error */
>   } else {
> /* Good case, see below */
>   }

I see. However, my intention was to provide an absolute minimal
example to teach a reader how the protocol works. I consider
all proper error handling an exercise for the reader ;-)
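
For completeness, here is what such a loop could look like (illustrative Python, not meant as the reference implementation): it keeps reading until the 4-byte size field is complete, and distinguishes clean EOF from a truncated header.

```python
def read_packet_header(stream):
    """Return the 4-byte pkt-line size field, or None on EOF at a
    packet boundary. Raises EOFError on a truncated header."""
    buf = b""
    while len(buf) < 4:
        chunk = stream.read(4 - len(buf))
        if not chunk:
            if buf:
                raise EOFError("truncated packet header")
            return None        # clean EOF between packets
        buf += chunk           # short read: go and read more
    return buf
```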


>> +#define CAP_CLEAN    (1u<<0)
>> +#define CAP_SMUDGE   (1u<<1)
> 
> Is CAP_ too generic, and GIT_FILTER_CAP (or so) less calling for trouble ?

I had something like that but Junio suggested these names in V4:
http://public-inbox.org/git/xmqq8twd8uld@gitster.mtv.corp.google.com/


>> +
>> +err = (strlen(filter_type) > PKTLINE_DATA_MAXLEN);
> 
> Extra () needed ?
> More () in the code...

I thought it might improve readability, but I will remove them
if you think this would be more consistent with existing Git code.


Thanks,
Lars

Re: [ANNOUNCE] Git User's Survey 2016

2016-09-13 Thread Lars Schneider

> On 13 Sep 2016, at 17:54, Jakub Narębski  wrote:
> 
> On 13 September 2016 at 18:15, David Bainbridge
>  wrote:
>> Hi Jakub,
>> 
>> You said:
>> P.S. At request I can open a separate channel in survey, with
>> a separate survey URL, so that responses from particular site
>> or organization could be separated out.
>> 
>> Please can you open a channel for use by Ericsson?
> 
> Sent (privately to David).

Could you send me a channel for Autodesk, too?

Thank you,
Lars

Re: Git Miniconference at Plumbers

2016-09-12 Thread Lars Schneider

> On 12 Sep 2016, at 21:11, Junio C Hamano  wrote:
> 
> [..]
> properly; supporting "huge objects" better in the object layer,
> without having to resort to ugly hacks like GitLFS that will never
> be part of the core Git. [...]

I agree with you that GitLFS is an ugly hack.

Some applications have test data, image assets, and other data sets that
need to be versioned along with the source code.

How would you deal with these kinds of "huge objects" _today_?

Thanks,
Lars


Re: [PATCH v7 10/10] convert: add filter..process option

2016-09-12 Thread Lars Schneider

> On 10 Sep 2016, at 08:29, Torsten Bögershausen  wrote:
> 
> On Thu, Sep 08, 2016 at 08:21:32PM +0200, larsxschnei...@gmail.com wrote:
> []
>> +packet:  git> git-filter-client
>> +packet:  git> version=2
>> +packet:  git> version=42
>> +packet:  git> 0000
>> +packet:  git< git-filter-server
>> +packet:  git< version=2
>> +packet:  git> clean=true
>> +packet:  git> smudge=true
>> +packet:  git> not-yet-invented=true
>> +packet:  git> 0000
>> +packet:  git< clean=true
>> +packet:  git< smudge=true
>> +packet:  git< 0000
> 
> It's probably only me who has difficulties distinguishing
> '>' from '<'.

I see what you mean. However, this format is used with "GIT_TRACE_PACKET"
as well and therefore I would prefer to keep it.


> packet:  git> git-filter-client
> packet:  git> version=2
> packet:  git> version=42
> packet:  git> 0000
> packet:   filter> git-filter-server
> packet:   filter> version=2
> 
> (Otherwise the dialogue description is nice)

Thanks!
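
A compact way to picture the handshake above (illustrative Python, not Git source; the "highest common version" rule is an assumption for this sketch): Git advertises every version it supports, one common version is chosen, and then the two capability lists are intersected.

```python
def negotiate(git_versions, filter_versions, git_caps, filter_caps):
    # Version selection: any version both sides announced qualifies.
    common = [v for v in git_versions if v in filter_versions]
    if not common:
        raise RuntimeError("no common filter protocol version")
    version = max(common)  # assumed policy for this sketch
    # Capability selection: only capabilities both sides support survive.
    caps = [c for c in git_caps if c in filter_caps]
    return version, caps
```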


>> +
>> +Supported filter capabilities in version 2 are "clean" and
>> +"smudge".
>> +
>> +Afterwards Git sends a list of "key=value" pairs terminated with
>> +a flush packet. The list will contain at least the filter command
>> +(based on the supported capabilities) and the pathname of the file
>> +to filter relative to the repository root. Right after these packets
>> +Git sends the content split in zero or more pkt-line packets and a
>> +flush packet to terminate content.
>> +
>> +packet:  git> command=smudge\n
>> +packet:  git> pathname=path/testfile.dat\n
> 
> How do we send pathnames that have '\n' ?
> Not really recommended, but allowed.
> And here I am a little bit lost: is each of the lines packed into
> a pkt-line ?
> command=smudge is packed as one pkt-line and pathname= is packed into
> another one ? (Then we don't need the '\n' at all)

Every line is a dedicated packet. That's why '\n' in a path name would
not be a problem as the receiver is expected to read the entire packet
when parsing the value (and the receiver knows the packet length, too).

The '\n' at the end is required by the pkt-line format:
"A non-binary line SHOULD BE terminated by an LF..."
(see protocol-common.txt)


> Or do both lines go into one pkt-line (that's what I think)? Then
> we don't need the '\n' after the pathname.

No (see above).


> And in this case the pathname is always the last element, and a '\n'
> may occur in the pathname, since we know the length of the packet
> we know how long the pathname must be.
> 
> 
> [...]
>> 
>> +In case the filter cannot or does not want to process the content,
> 
> Does not want ? 
> I can see something like "I read through the file, there is nothing
> to do. So Git, I don't send anything back, you know where the file is.

That's right. Isn't that covered by "does not want"?


>> +it is expected to respond with an "error" status. Depending on the
>> +`filter..required` flag Git will interpret that as error
>> +but it will not stop or restart the filter process.
>> +
>> +packet:  git< status=error\n
>> +packet:  git< 0000
>> +
>> +
>> +If the filter experiences an error during processing, then it can
>> +send the status "error" after the content was (partially or
>> +completely) sent. Depending on the `filter..required` flag
>> +Git will interpret that as error but it will not stop or restart the
>> +filter process.
>> +
>> +packet:  git< status=success\n
>> +packet:  git< 0000
>> +packet:  git< HALF_WRITTEN_ERRONEOUS_CONTENT
>> +packet:  git< 0000
>> +packet:  git< status=error\n
>> +packet:  git< 0000
>> +
>> +
>> +If the filter dies during the communication or does not adhere to
>> +the protocol then Git will stop the filter process and restart it
> 
> My personal comment:
> When a filter is mis-behaving, Git should say so loud and clear, and
> die(). 
> The filter process can be left running, so that it can be debugged.

In the current implementation Git would die already if the filter is 
"required". In this particular case we *could* die, too. However, 
I would prefer to keep it as is because I think the users of a 
"non-required" filter do not expect Git to die if there is *any* 
problem with the filter.


> Here I stopped the review for a moment, 
> I still think that Git shouldn't kill anything, because we loose
> the ability to debug these processes.

Based on my experience debugging the filter in this state is hard
anyways. I think a user would rather try to reproduce the problem
and run Git with the "GIT_TRACE_PACKET" flag enabled for debugging.


Thanks,
Lars

Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()

2016-09-12 Thread Lars Schneider

> On 11 Sep 2016, at 18:01, Stefan Beller <sbel...@google.com> wrote:
> 
> On Sun, Sep 11, 2016 at 4:36 AM, Lars Schneider
> <larsxschnei...@gmail.com> wrote:
> 
>>> 
>>>   call check_pipe from write_or_die here instead of
>>>   reproducing that function?
>> [...]
> 
>> Maybe it would be more suitable to move check_pipe to
>> run-command.h/c?
> 
> That's certainly possible.
> I don't have a strong opinion, where the code actually
> resides, but I do have a strong-ish opinion on code
> duplication. ;)

OK, then I will move check_pipe() to run-command.

Thanks,
Lars



Re: [PATCH v7 08/10] convert: modernize tests

2016-09-11 Thread Lars Schneider

> On 09 Sep 2016, at 00:05, Stefan Beller <sbel...@google.com> wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Use `test_config` to set the config, check that files are empty with
>> `test_must_be_empty`, compare files with `test_cmp`, and remove spaces
>> after ">" and "<".
>> 
>> Please note that the "rot13" filter configured in "setup" keeps using
>> `git config` instead of `test_config` because subsequent tests might
>> depend on it.
>> 
>> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
>> ---
> 
> Makes sense & Reviewed-by "Stefan Beller <sbel...@google.com>"

Thank you,
Lars



Re: [PATCH v7 06/10] pkt-line: add functions to read/write flush terminated packet streams

2016-09-11 Thread Lars Schneider

> On 08 Sep 2016, at 23:49, Stefan Beller <sbel...@google.com> wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,  <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> write_packetized_from_fd() and write_packetized_from_buf() write a
>> stream of packets. All content packets use the maximal packet size
>> except for the last one. After the last content packet a `flush` control
>> packet is written.
> 
> I presume we need both write_* things in a later patch; can you clarify why
> we need both of them?

Since 9035d7 Git streams from fd to required filters and from buf to
non-required filters. The Git filter protocol v2 makes use of all that,
too.

https://github.com/git/git/commit/9035d75a2be9d80d82676504d69553245017f6d4


>> +   if (paket_len < 0) {
>> +   if (oldalloc == 0)
>> +   strbuf_release(sb_out);
> 
> So if old alloc is 0, we release it, which is documented as
> /**
> * Release a string buffer and the memory it used. You should not use the
> * string buffer after using this function, unless you initialize it again.
> */
> 
>> +   else
>> +   strbuf_setlen(sb_out, oldlen);
> 
> Otherwise we just set the length back, such that it looks like before.
> 
> So as a caller the strbuf is in a different state in case of error
> depending whether
> the strbuf already had some data in it. I think it would be better if
> we only did
> `strbuf_setlen(sb_out, oldlen);` here, such that the caller can
> strbuf_release it
> unconditionally.

I tried to mimic the behavior of strbuf_read() [1]. The error handling
was introduced in 2fc647 [2] to ease error handling:

"This allows for easier error handling, as callers only need to call
strbuf_release() if A) the command succeeded or B) if they would have had
to do so anyway because they added something to the strbuf themselves."

Does this convince you to keep the proposed error handling? If yes, then
I would add a comment to the function to document that behavior explicitly!

[1] 
https://github.com/git/git/blob/cda1bbd474805e653dda8a71d4ea3790e2a66cbb/strbuf.c#L377-L383
[2] https://github.com/git/git/commit/2fc647004ac7016128372a85db8245581e493812
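To make the convention concrete for archive readers: on error the buffer is handed back to the caller exactly as it was on entry — released if the caller had not allocated anything yet, truncated back otherwise. A minimal stand-alone sketch of that rollback pattern (not Git's actual strbuf code; names are illustrative):

```c
#include <stdlib.h>
#include <string.h>

struct sbuf { char *buf; size_t len, alloc; };

static void sbuf_add(struct sbuf *sb, const char *data, size_t n)
{
	if (sb->len + n > sb->alloc) {
		sb->alloc = (sb->len + n) * 2;
		sb->buf = realloc(sb->buf, sb->alloc);
	}
	memcpy(sb->buf + sb->len, data, n);
	sb->len += n;
}

/* Append `data`; on (simulated) error, restore the caller's view of
 * `sb`: release it if it was unallocated on entry, else truncate back
 * to the old length so the caller's data stays intact. */
static int read_into_sketch(struct sbuf *sb, const char *data, size_t n, int fail)
{
	size_t oldlen = sb->len, oldalloc = sb->alloc;

	sbuf_add(sb, data, n);
	if (fail) {
		if (!oldalloc) {
			free(sb->buf);
			sb->buf = NULL;
			sb->len = sb->alloc = 0;
		} else {
			sb->len = oldlen;
		}
		return -1;
	}
	return 0;
}
```

Either way the caller can release the buffer unconditionally afterwards without double-freeing or leaking.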


> Or to make things more confusing, you could use strbuf_reset in case of 0,
> as that is a strbuf_setlen internally. ;)


Thanks,
Lars

Re: [PATCH v7 05/10] pkt-line: add packet_write_gently()

2016-09-11 Thread Lars Schneider

> On 08 Sep 2016, at 23:24, Stefan Beller  wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,   wrote:
> 
>> 
>> Add packet_write_gently() which writes arbitrary data and returns `0`
>> for success and `-1` for an error.
> 
> I think documenting the return code is better done in either the header file
> or in a commend preceding the implementation instead of the commit message?
> 
> Maybe just a generic comment for *_gently is good enough, maybe even no
> comment. So the commit is fine, too. I dunno.

I agree that this is too verbose as this function follows the standard
Git return value conventions (AFAIK). I'll remove this from all commit
messages.


>> This function is used by other
>> pkt-line functions in a subsequent patch.
> 
> That's what I figured. Do we also need to mention that in the preceding patch
> for packet_flush_gently ?

I'll add this note to all commit messages that introduce new functions which
are used later.


Thanks,
Lars

Re: [PATCH v7 03/10] pkt-line: add packet_write_fmt_gently()

2016-09-11 Thread Lars Schneider

> On 08 Sep 2016, at 23:18, Stefan Beller  wrote:
> 
> On Thu, Sep 8, 2016 at 11:21 AM,   wrote:
> 
>> +static int packet_write_fmt_1(int fd, int gently,
>> +  const char *fmt, va_list args)
>> +{
>> +   struct strbuf buf = STRBUF_INIT;
>> +   size_t count;
>> +
>> +   format_packet(&buf, fmt, args);
>> +   count = write_in_full(fd, buf.buf, buf.len);
>> +   if (count == buf.len)
>> +   return 0;
>> +
>> +   if (!gently) {
> 
> call check_pipe from write_or_die here instead of
> reproducing that function?

Yes, might be better. I wasn't sure because check_pipe is
not public.

Where would you declare check_pipe? In cache.h?
Maybe it would be more suitable to move check_pipe to 
run-command.h/c?


>> +   if (errno == EPIPE) {
>> +   if (in_async())
>> +   async_exit(141);
>> +
>> +   signal(SIGPIPE, SIG_DFL);
>> +   raise(SIGPIPE);
>> +   /* Should never happen, but just in case... */
>> +   exit(141);
>> +   }
>> +   die_errno("packet write error");
>> +   }
>> +   error("packet write failed");
>> +   return -1;
> 
> I think the more idiomatic way is to
> 
>return error(...);
> 
> as error always return -1.

Of course!!
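For archive readers, the idiom in isolation: error() always returns -1, so the report and the return value collapse into one statement. Here error() is a simplified stand-in (in Git it lives elsewhere and is variadic in the same way):

```c
#include <stdarg.h>
#include <stdio.h>

/* Simplified stand-in for Git's error(): report and always return -1. */
static int error(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	fprintf(stderr, "error: ");
	vfprintf(stderr, fmt, ap);
	fputc('\n', stderr);
	va_end(ap);
	return -1;
}

static int packet_write_sketch(int ok)
{
	if (!ok)
		return error("packet write failed");	/* report + return in one step */
	return 0;
}
```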


Thank you,
Lars



Re: [PATCH v1 2/2] read-cache: make sure file handles are not inherited by child processes

2016-09-08 Thread Lars Schneider

> On 07 Sep 2016, at 20:23, Junio C Hamano  wrote:
> 
> Eric Wong  writes:
> 
>> We probably should be using O_NOATIME for all O_RDONLY cases
>> to get the last bit of performance out (especially since
>> non-modern-Linux systems probably still lack relatime).
> 
> No, please do not go there.
> 
> The user can read from a file in a working tree using "less",
> "grep", etc., and they all update the atime, so should "git grep".
> We do not use atime ourselves on these files but we should let
> outside tools rely on the validity of atime (e.g. "what are the
> files that were looked at yesterday?").
> 
> If you grep for noatime in our current codebase, you'd notice that
> we use it only for files in objects/ hierarchy, and that makes very
> good sense.  These files are what we create for our _sole_ use and
> no other tools can peek at them and expect to get any useful
> information out of them (we hear from time to time that virus
> scanners leaving open file descriptors on them causing trouble, but
> that is an example of a useless access), and that makes a file in
> objects/ hierarchy a fair game for noatime optimization.

How do we deal with read-cache:ce_compare_data, though?

By your definition above we shouldn't use NOATIME since the read file
is not in objects/. However, the file read is not something the user
explicitly triggers. The read is part of the Git internal "clean"
machinery.

What would you suggest? Should I open the file with or without NOATIME?

Thanks,
Lars


Re: [PATCH v1 2/2] read-cache: make sure file handles are not inherited by child processes

2016-09-07 Thread Lars Schneider

> On 06 Sep 2016, at 23:06, Eric Wong  wrote:
> 
> larsxschnei...@gmail.com wrote:
>> static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
>> {
>>  int match = -1;
>> -int fd = open(ce->name, O_RDONLY);
>> +int fd = open(ce->name, O_RDONLY | O_CLOEXEC);
>> 
>>  if (fd >= 0) {
>>  unsigned char sha1[20];
> 
> Also, this needs to check EINVAL when O_CLOEXEC != 0 the same
> way create_tempfile currently does.  Somebody could be building
> with modern headers but running an old kernel that doesn't
> understand O_CLOEXEC.
> 
> There should probably be a open() wrapper for handling this case
> since we're now up to 3 places where open(... O_CLOEXEC) is
> used.

Right! Actually "sha1_file.c:git_open_noatime()" is already a wrapper, no?
Can't we use this here? The O_NOATIME flag shouldn't hurt, right?

Thanks,
Lars



Re: [PATCH v1 1/2] sha1_file: open window into packfiles with CLOEXEC

2016-09-07 Thread Lars Schneider

> On 06 Sep 2016, at 13:38, Johannes Schindelin  
> wrote:
> 
> Hi Eric & Lars,
> 
> On Mon, 5 Sep 2016, Eric Wong wrote:
> 
>> larsxschnei...@gmail.com wrote:
>>> All processes that the Git main process spawns inherit the open file
>>> descriptors of the main process. These leaked file descriptors can
>>> cause problems.
>> 
>> 
>>> -int git_open_noatime(const char *name)
>>> +int git_open_noatime_cloexec(const char *name)
>>> {
>>> -   static int sha1_file_open_flag = O_NOATIME;
>>> +   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
>>> 
>>> for (;;) {
>>> int fd;
> 
>> I question the need for the "_cloexec" suffixing in the
>> function name since the old function is going away entirely.
> 
> Me, too. While it is correct, it makes things harder to read, so it may
> even cause more harm than it does good.

What name would you suggest? Leaving the name as-is seems misleading to me.
Maybe just "git_open()" ?


>> I prefer all FD-creating functions set cloexec by default
>> for FD > 2 to avoid inadvertantly leaking FDs.  So we
>> ought to use pipe2, accept4, socket(..., SOCK_CLOEXEC), etc...
>> and fallback to the racy+slower F_SETFD when not available.
> 
> In the original Pull Request where the change was contributed to Git for
> Windows, this was tested (actually, the code did not see whether fd > 2,
> but simply assumed that all newly opened file descriptors would be > 2
> anyway), and it failed:
> 
> https://github.com/git-for-windows/git/pull/755#issuecomment-220247972
> 
> So it appears that we would have to exclude at least the code path to `git
> upload-pack` from that magic.


I just realized that Dscho improved his original patch in GfW with a
fallback if CLOEXEC is not present.

I applied the same mechanism here. Would that be OK?

Thanks,
Lars

-   static int sha1_file_open_flag = O_NOATIME;
+   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;

for (;;) {
int fd;
@@ -1471,12 +1471,17 @@ int git_open_noatime(const char *name)
if (fd >= 0)
return fd;

-   /* Might the failure be due to O_NOATIME? */
-   if (errno != ENOENT && sha1_file_open_flag) {
-   sha1_file_open_flag = 0;
+   /* Try again w/o O_CLOEXEC: the kernel might not support it */
+   if (O_CLOEXEC && errno == EINVAL && (sha1_file_open_flag & 
O_CLOEXEC)) {
+   sha1_file_open_flag &= ~O_CLOEXEC;
continue;
}

+   /* Might the failure be due to O_NOATIME? */
+   if (errno != ENOENT && (sha1_file_open_flag & O_NOATIME)) {
+   sha1_file_open_flag &= ~O_NOATIME;
+   continue;
+   }
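A self-contained version of that retry loop, runnable outside Git (the flag macros are guarded so it also builds where O_CLOEXEC or O_NOATIME are missing; this is a sketch of the fallback logic, not the patched sha1_file.c):

```c
#include <errno.h>
#include <fcntl.h>

#ifndef O_CLOEXEC
#define O_CLOEXEC 0
#endif
#ifndef O_NOATIME
#define O_NOATIME 0
#endif

static int git_open_sketch(const char *name)
{
	/* sticky across calls: once a flag fails we stop trying it */
	static int open_flags = O_NOATIME | O_CLOEXEC;

	for (;;) {
		int fd = open(name, O_RDONLY | open_flags);
		if (fd >= 0)
			return fd;

		/* old kernel that rejects O_CLOEXEC: retry without it */
		if (O_CLOEXEC && errno == EINVAL && (open_flags & O_CLOEXEC)) {
			open_flags &= ~O_CLOEXEC;
			continue;
		}

		/* O_NOATIME needs file ownership (or privilege): retry without it */
		if (errno != ENOENT && (open_flags & O_NOATIME)) {
			open_flags &= ~O_NOATIME;
			continue;
		}

		return -1;
	}
}
```

Because open_flags is static, a flag that failed once (e.g. O_CLOEXEC on an old kernel) is dropped for all subsequent opens instead of being retried every time.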



Re: [PATCH v6 12/13] convert: add filter.<driver>.process option

2016-09-05 Thread Lars Schneider

> On 30 Aug 2016, at 22:46, Jakub Narębski  wrote:
> 
>>> The filter can exit right after the "error-all". If the filter does
>>> not exit then Git will kill the filter. I'll add this to the docs.
>> 
>> OK.
> 
> Is it explicit kill, or implicit kill: close pipe and end process?

I close the pipe and call "finish_command".


>>> "abort" could be ambiguous because it could be read as "abort only
>>> this file". "abort-all" would work, though. Would you prefer to see
>>> "error" replaced by "abort" and "error-all" by "abort-all"?
>> 
>> No.
>> 
>> I was primarily reacting to "-all" part, so anything that ends with
>> "-all" is equally ugly from my point of view and not an improvement.
>> 
>> As I said, "error-all" as long as other reviewers are happy with is
>> OK by me, too.
> 
> I'm rather partial to "abort" instead of "error-all", or "quit"/"exit"
> (but I prefer "abort" or "bail-out"), as it better reflects what this
> response is about - ending filter process.

After some thinking I agree with "abort" instead of "error-all".


>>> A filter that dies during communication or does not adhere to the protocol
>>> is a faulty filter. Feeding the faulty filter after restart with the same 
>>> blob would likely cause the same error. 
>> 
>> Why does Git restart it and continue with the rest of the blobs
>> then?  Is there a reason to believe it may produce correct results
>> for other blobs if you run it again?
> 
> I guess the idea is that single filter can be run on different types
> of blobs, and it could fail on some types (some files) and not others.
> Like LFS-type filter failing on files with size larger than 2 GiB,
> or iconv-like filter to convert UTF-16 to UTF-8 failing on invalid
> byte sequences.

This mimics the v1 behavior and I will explain that in the documentation.


>>> B) If we communicate "shutdown" to the filter then we need to give the
>>>   filter some time to perform the exit before the filter is killed on
>>>   Git exit. I wasn't able to come up with a good answer how long Git 
>>>   should wait for the exit.
>> 
>> Would that be an issue?  A filter is buggy if told to shutdown,
>> ignores the request and hangs around forever.  I do not think we
>> even need to kill it.  It is not Git's problem.
> 
> I think the idea was for Git to request shutdown, and filter to tell
> when it finished (or just exiting, and Git getting SIGCHLD, I guess).
> It is hard to tell how much to wait, for example for filter process
> to send "sync" command, or finish unmounting, or something...

I like Junio's approach because then we don't need to wait at all...


>> I personally would be reluctant to become a filter process writer if
>> the system it will be talking to can kill it without giving it a
>> chance to do a graceful exit, but perhaps that is just me.  I don't
>> know if closing the pipe going there where you are having Git to
>> kill the process on the other end is any more involved than what you
>> have without extra patches.
> 
> Isn't it the same problem with v1 filters being killed on Git process
> exit?  Unless v2 filter wants to do something _after_ writing output
> to Git, it should be safe... unless Git process is killed, but it
> is not different from filter being stray-killed.

Yes, it should be safe. However, I think it is probably nicer if the filter
process can shutdown gracefully instead of being killed.


>>>>> +Please note that you cannot use an existing `filter.<driver>.clean`
>>>>> +or `filter.<driver>.smudge` command with `filter.<driver>.process`
>>>>> +because the former two use a different inter process communication
>>>>> +protocol than the latter one.
>>>> 
>>>> Would it be a useful sample program we can ship in contrib/ if you
>>>> created a "filter adapter" that reads these two configuration
>>>> variables and act as a filter.<driver>.process?
>>> 
>>> You mean a v2 filter that would use v1 filters under the hood?
>>> If we would drop v1, then this would be useful. Otherwise I don't
>>> see any need for such a thing.
>> 
>> I meant it as primarily an example people can learn from when they
>> want to write their own.
> 
> Errr... if it were any v1 filter, it would be useless (as bad or
> worse performance than ordinary v1 clean or smudge filter).  It
> might make sense if v1 filter and v2 wrapper were written in the
> same [dynamic] language, and wrapper could just load / require
> filter as a function / procedure, [compile it], and use it.
> For example smudge/clean filter in Perl (e.g. rot13), and v2 wrapper
> in Perl too.

I'll provide a simple Perl rot13 v2 filter in contrib.

Thanks,
Lars

Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-30 Thread Lars Schneider

> On 29 Aug 2016, at 21:45, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>> I see. Thanks for the explanation.
> 
> I expect the updated log message to explain the issue correctly
> then.

Sure!


>>> The parent is
>>> very likely to have pack windows open into .pack files and they need
>>> to be closed on the child side after fork(2) starts the child
>>> process but before execve(2) runs the helper, if we want to avoid
>>> file descriptor leaks.
>> 
>> I think I understand what you are saying. However, during my tests
>> .pack file fd's were never a problem.
> 
> I do not expect during the lifetime of your long-running helper
> anybody would try to unlink an existing packfile, so it is unlikely
> that "cannot unlink an open file on Windows" issue to come up.  And
> the cross-platform problem I pointed out is a fd leak; leaks would
> not show up until you run out of the resource, just like you
> wouldn't notice small memory leak here and there UNLESS you actively
> look for them.  I would be surprised if your "tests" found anything.
> 
>> How would I find the open .pack file fd's?  Should I go through
>> /proc/PID/fd? Why is this no problem for other longer running
>> commands such as the git-credential-cache--daemon or git-daemon?
> 
> Nobody said this is "no problem for others".  While discussing the
> patch that started this thread, we noticed that any open file
> descriptor the main process has when the long-running clean/smudge
> helper is spawned _is_ a problem.  Other helpers may share the same
> problem, and if they do, that is all the more reason that fixing the
> file descriptor leak is a good thing.
> 
> The primary thing I was wondering was if we should open the window
> into packfiles with CLOEXEC, just like we recently started opening
> the tempfiles and lockfiles with the flag.  The reason why I asked
> if the site that spawns (i.e. run_command API) has an easy access to
> the list of file descriptors that we opened ONLY for ourselves is
> because that would make it possible to consider an alternative
> approach close them before execve(2) in run_commands API.  That
> would save us from having to sprinkle existing calls to open(2) with
> CLOEXEC.  But if that is not the case, opening them with CLOEXEC is
> a much better solution than going outside the program to "find" them
> in non-portable ways.

The pack files are opened before the filter process is forked. Therefore,
I think CLOEXEC makes sense for them. Should this change be part of this 
series? If yes, would it look like below? Should I adjust the function name?

Thanks,
Lars

diff --git a/sha1_file.c b/sha1_file.c
index d5e1121..759991e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1485,7 +1485,7 @@ int check_sha1_signature(const unsigned char *sha1, void 
*map,
 
 int git_open_noatime(const char *name)
 {
-   static int sha1_file_open_flag = O_NOATIME;
+   static int sha1_file_open_flag = O_NOATIME | O_CLOEXEC;
 
for (;;) {
int fd;





Re: [PATCH v6 10/13] convert: generate large test files only once

2016-08-30 Thread Lars Schneider

> On 29 Aug 2016, at 19:52, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> On 25 Aug 2016, at 21:17, Stefan Beller <sbel...@google.com> wrote:
>>> 
>>>> On Thu, Aug 25, 2016 at 4:07 AM,  <larsxschnei...@gmail.com> wrote:
>>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>> 
>>>> Generate more interesting large test files
>>> 
>>> How are the large test files more interesting?
>>> (interesting in the notion of covering more potential bugs?
>>> easier to debug? better to maintain, or just a pleasant read?)
>> 
>> The old large test file was 1MB of zeros and 1 byte with a one, repeated 
>> 2048 times.
>> 
>> Since the filter uses 64k packets we would test a large number of equally 
>> looking packets.
>> 
>> That's why I thought the pseudo random content is more interesting.
> 
> I guess my real question is why it is not just a single invocation
> of test-genrandom that gives you the whole test file; if you are
> using 20MB, the simplest would be to grab 20MB out of test-genrandom.
> With that hopefully you won't see large number of equally looking
> packets, no?

True, but applying rot13 (via tr ...) on 20+ MB takes quite a bit of
time. That's why I came up with the 1M SP in between.

However, I realized that testing a large amount of data is not really
necessary for the final series. A single packet is 64k. A 500k pseudo random
test file should be sufficient. This will make the test way simpler.

Thanks,
Lars

Re: [PATCH v6 10/13] convert: generate large test files only once

2016-08-30 Thread Lars Schneider

> On 29 Aug 2016, at 19:46, Junio C Hamano  wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
>> index 7b45136..34c8eb9 100755
>> --- a/t/t0021-conversion.sh
>> +++ b/t/t0021-conversion.sh
>> @@ -4,6 +4,15 @@ test_description='blob conversion via gitattributes'
>> 
>> . ./test-lib.sh
>> 
>> +if test_have_prereq EXPENSIVE
>> +then
>> +T0021_LARGE_FILE_SIZE=2048
>> +T0021_LARGISH_FILE_SIZE=100
>> +else
>> +T0021_LARGE_FILE_SIZE=30
>> +T0021_LARGISH_FILE_SIZE=2
>> +fi
> 
> Minor: do we need T0021_ prefix?  What are you trying to avoid
> collisions with?

Not necessary. I'll remove the prefix.


>> +git checkout -- test test.t test.i &&
>> +
>> +mkdir generated-test-data &&
>> +for i in $(test_seq 1 $T0021_LARGE_FILE_SIZE)
>> +do
>> +RANDOM_STRING="$(test-genrandom end $i | tr -dc "A-Za-z0-9" )"
>> +ROT_RANDOM_STRING="$(echo $RANDOM_STRING | ./rot13.sh )"
> 
> In earlier iteration of loop with lower $i, what guarantees that
> some bytes survive "tr -dc"?

Nothing really, good catch! The seed "end" always produces an "S" as its first
character, which would survive "tr -dc". However, that is clunky. I will always
set "1" as the first character of $RANDOM_STRING to mitigate the problem.

> 
>> +# Generate 1MB of empty data and 100 bytes of random characters
> 
> 100 bytes?  It seems to me that you are giving 1MB and then $i-byte
> or less (which sometimes can be zero) of random string.

Outdated comment. Will fix!

> 
>> +# printf "$(test-genrandom start $i)"
>> +printf "%1048576d" 1 >>generated-test-data/large.file &&
>> +printf "$RANDOM_STRING" >>generated-test-data/large.file &&
>> +printf "%1048576d" 1 >>generated-test-data/large.file.rot13 &&
>> +printf "$ROT_RANDOM_STRING" 
>> >>generated-test-data/large.file.rot13 &&
>> +
>> +if test $i = $T0021_LARGISH_FILE_SIZE
>> +then
>> +cat generated-test-data/large.file 
>> >generated-test-data/largish.file &&
>> +cat generated-test-data/large.file.rot13 
>> >generated-test-data/largish.file.rot13
>> +fi
>> +done
> 
> This "now we are done with the loop, so copy them to the second
> pair" needs to be in the loop?  Shouldn't it come after 'done'?

No, it does not need to be in the loop. I think I could do this
after the loop instead:

head -c $((1048576*$T0021_LARGISH_FILE_SIZE)) generated-test-data/large.file 
>generated-test-data/largish.file


> I do not quite get the point of this complexity.  You are using
> exactly the same seed "end" every time, so in the first round you
> have 1M of SP, letter '1', letter 'S' (from the genrandom), then
> in the second round you have 1M of SP, letter '1', letter 'S' and
> letter 'p' (the last two from the genrandom), and go on.  Is it
> significant for the purpose of your test that the cruft inserted
> between the repetition of 1M of SP gets longer by one byte but they
> all share the same prefix (e.g. "1S", "1Sp", "1SpZ", "1SpZT",
> ... are what you insert between a large run of spaces)?

The pktline packets have a constant size. If the cruft between 1M of SP 
has a constant size as well then the generated packets for the test data
would repeat themselves. That's why I increased the length after every 1M
of SP.

However, I realized that this test complexity is not necessary. I'll
simplify it in the next round.

Thanks,
Lars

Re: [PATCH v6 13/13] read-cache: make sure file handles are not inherited by child processes

2016-08-29 Thread Lars Schneider

> On 29 Aug 2016, at 20:05, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Consider the case of a file that requires filtering and is present in
>> branch A but not in branch B. If A is the current HEAD and we checkout B
>> then the following happens:
>> 
>> 1. ce_compare_data() opens the file
>> 2.   index_fd() detects that the file requires to run a clean filter and
>> calls index_stream_convert_blob()
>> 4. index_stream_convert_blob() calls convert_to_git_filter_fd()
>> 5.   convert_to_git_filter_fd() calls apply_filter() which creates a
>> new long running filter process (in case it is the first file
>> of this kind to be filtered)
>> 6.   The new filter process inherits all file handles. This is the
>> default on Linux/OSX and is explicitly defined in the
>> `CreateProcessW` call in `mingw.c` on Windows.
>> 7. ce_compare_data() closes the file
>> 8. Git unlinks the file as it is not present in B
>> 
>> The unlink operation does not work on Windows because the filter process
>> has still an open handle to the file. Apparently that is no problem on
>> Linux/OSX. Probably because "[...] the two file descriptors share open
>> file status flags" (see fork(2)).
> 
> Wait, a, minute.  "that is no problem" may be true as long as "that"
> is "unlinking the now-gone file in the filesystem", but the reason
> does not have anything to do with the "open-file status flags";
> unlike Windows, you _can_ unlink file that has an open file
> descriptor on it.

I see. Thanks for the explanation.

> 
> And even on POSIX systems, if you are doing a long-running helper
> any open file descriptor in the parent process when the long-running
> helper is spawned will become leaked fd.  CLOEXEC is a possible
> solution (but not necessarily the only or the best one) to the fd
> leak in this case.
> 
> How much does the code that spawns these long-running helpers know
> about the file descriptors that happen to be open?

Nothing really.

>  The parent is
> very likely to have pack windows open into .pack files and they need
> to be closed on the child side after fork(2) starts the child
> process but before execve(2) runs the helper, if we want to avoid
> file descriptor leaks.

I think I understand what you are saying. However, during my tests
.pack file fd's were never a problem. I use start_command() [1]
which wraps the fork() and exec calls [2].

How would I find the open .pack file fd's? Should I go through 
/proc/PID/fd? Why is this no problem for other longer running 
commands such as the git-credential-cache--daemon or git-daemon?

Thanks,
Lars


[1] https://github.com/larsxschneider/git/blob/protocol-filter/v6/convert.c#L566
[2] 
https://github.com/larsxschneider/git/blob/protocol-filter/v6/run-command.c#L345-L412




Re: [PATCH v6 00/13] Git filter protocol

2016-08-29 Thread Lars Schneider

> On 25 Aug 2016, at 13:07, larsxschnei...@gmail.com wrote:
> 
> From: Lars Schneider <larsxschnei...@gmail.com>
> 
> The goal of this series is to avoid launching a new clean/smudge filter
> process for each file that is filtered.
> 
> A short summary about v1 to v5 can be found here:
> https://git.github.io/rev_news/2016/08/17/edition-18/
> 
> This series is also published on web:
> https://github.com/larsxschneider/git/pull/10

Hi Junio,

I am working on v7 of this patch series and I noticed that it does
not merge cleanly anymore (not even with git/git master). Should
I rebase my patch series? If yes, what would be the most
convenient base for you?

Thanks,
Lars



Re: [PATCH v6 09/13] convert: modernize tests

2016-08-29 Thread Lars Schneider

> On 26 Aug 2016, at 22:03, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Use `test_config` to set the config, check that files are empty with
>> `test_must_be_empty`, compare files with `test_cmp`, and remove spaces
>> after ">" and "<".
> 
> All of the above are good things to do, but the first one needs to
> be done a bit carefully.
> 
> It is unclear in the above description if you made sure that no
> subsequent test depends on the settings left by earlier test before
> replacing "git config" with "test_config".

I assumed that I would see test failures if a subsequent test depends
on the settings left by earlier tests. I'll add this comment to the
commit message.


>> diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
>> index 7bac2bc..7b45136 100755
>> --- a/t/t0021-conversion.sh
>> +++ b/t/t0021-conversion.sh
>> @@ -13,8 +13,8 @@ EOF
>> chmod +x rot13.sh
>> 
>> test_expect_success setup '
>> -git config filter.rot13.smudge ./rot13.sh &&
>> -git config filter.rot13.clean ./rot13.sh &&
>> +test_config filter.rot13.smudge ./rot13.sh &&
>> +test_config filter.rot13.clean ./rot13.sh &&
>> 
>>  {
>>  echo "*.t filter=rot13"
> 
> For example, after this conversion, filter.rot13.* will be reverted
> back to unconfigured once setup is done.
> 
>> test_expect_success check '
>> 
>> -cmp test.o test &&
>> -cmp test.o test.t &&
>> +test_cmp test.o test &&
>> +test_cmp test.o test.t &&
>> 
>>  # ident should be stripped in the repository
>>  git diff --raw --exit-code :test :test.i &&
> 
> It happens to be true that this subsequent test does not do any
> operation to cause data come from and go into the object database
> for any path that match the pattern "*.t", because for whatever
> reason the previous "setup" step happens to do more than just
> "setup".  It already exercised the filtering by doing "git add" and
> "git checkout".
> 
> If we were writing the script t0021 from scratch today, we would
> have used test_config *AND* squashed there two tests into one
> (i.e. making it a single "the filter and ident operation" test).
> Then it is crystal clear that additional tests on commands that may
> involve filtering will always be added to that combined test
> (e.g. we may try "cat-file --filters" to ensure that rot13 filter is
> excercised).
> 
> But because we are not doing that, it may become tempting to add
> test for a new command that pays attention to a filter to either of
> the test, and it would have worked OK if this patch weren't there.
> Such an addition will break the test, because in the second "check"
> test, the filter commands are no longer active with this patch.
> 
> So while I do appreciate the change for the longer term, I am not
> quite happy with it.  It just looks like an invitation for future
> mistakes.

I'll follow your judgement here. However, from my point of view a future
addition that causes test failures is not an issue, because these failures
would be analyzed and the tests refactored accordingly.


>> @@ -130,7 +130,7 @@ test_expect_success 'filter shell-escaped filenames' '
>> 
>>  # delete the files and check them out again, using a smudge filter
>>  # that will count the args and echo the command-line back to us
>> -git config filter.argc.smudge "sh ./argc.sh %f" &&
>> +test_config filter.argc.smudge "sh ./argc.sh %f" &&
>>  rm "$normal" "$special" &&
>>  git checkout -- "$normal" "$special" &&
> 
> This one is clearly OK.  Anything related to argc filter only
> appears in this single test so it is not just OK to use test_config,
> but it makes our intention very clear that nobody else would use the
> argc filter.  I think similar reasoning would apply to the test_config
> conversion in the remainder of the script.

OK, then I will keep these test_config's and revert the first one.


> As to other types of changes you did in this, I see no problem with
> them.


Thanks,
Lars


Re: [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams

2016-08-29 Thread Lars Schneider

> On 26 Aug 2016, at 19:21, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>> OK, what function names would be more clear from your point of view?
>> 
>> write_packetized_stream_from_fd()
>> write_packetized_stream_from_buf()
>> read_packetized_stream_to_buf()
> 
> Would
> 
>write_packetized_from_fd()
>write_packetized_from_buf()
>read_packetized_to_buf()
> 
> be shorter, still understandable, be consistent with the existing
> convention to name write_/read_ a function that shuffles bytes via a
> file descriptor, and to the point, perhaps?

Agreed.

Thanks,
Lars


Re: [PATCH v6 05/13] pkt-line: add packet_write_gently()

2016-08-29 Thread Lars Schneider

> On 26 Aug 2016, at 19:15, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> Do you anticipate future need of non-gently variant of this
>>> function?  If so, perhaps a helper that takes a boolean "am I
>>> working for the gently variant?" may help share more code.
>> 
>> With helper you mean "an additional boolean parameter"? I don't 
>> see a need for a non-gently variant right now but I will
>> add this parameter if you think it is a good idea. How would the
>> signature look like?
>> 
>> int packet_write_gently(const int fd_out, const char *buf, size_t size, int 
>> gentle)
>> 
>> This would follow type_from_string_gently() in object.h.
> 
> I actually imagined it would be more like your packet_write_fmt vs
> packet_write_fmt_gently pair of functions.  If you do not have an
> immediate need for a non-gentle packet_write() right now, but you
> still forsee that it is likely some other caller may want one, you
> could still prepare for it by doing a static
> 
>   packet_write_1((const int fd_out, const char *buf, size_t size, int 
> gentle)
> 
> and make packet_write_gently() call it with gentle=1, without
> actually introducing packet_write() nobody yet calls.

I see. In that case I would like to keep packet_write_gently() as is
because I don't see the need for a non-gently variant right now.

If there is a need for packet_write() then we could just add it and
move the packet_write_gently() code to packet_write_1() following your
suggestion. No caller would need to change for this refactoring.
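The refactoring sketched in words above — one static worker carrying a gentle flag, with thin public wrappers — would look roughly like this. It is simplified on purpose: the real function also formats the pkt-line length header; here only the call structure matters, and the error paths are stand-ins.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* One static worker; `gentle` decides between dying and returning an
 * error. Only packet_write_gently() is public until some caller
 * actually needs the non-gentle variant. (Sketch, not Git's code.) */
static int packet_write_1(int fd_out, const char *buf, size_t size, int gentle)
{
	if (write(fd_out, buf, size) != (ssize_t)size) {
		if (!gentle) {
			fprintf(stderr, "fatal: packet write error\n");
			exit(128);
		}
		fprintf(stderr, "error: packet write failed\n");
		return -1;
	}
	return 0;
}

int packet_write_gently(int fd_out, const char *buf, size_t size)
{
	return packet_write_1(fd_out, buf, size, 1);
}
```

Adding a non-gentle packet_write() later is then a two-line wrapper calling packet_write_1() with gentle=0, with no change to existing callers.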

If you strongly disagree then I would use the "two function" approach
you suggested above right away, though.

Thanks,
Lars

Re: [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams

2016-08-26 Thread Lars Schneider

> On 26 Aug 2016, at 00:27, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_write_stream_with_flush_from_fd() and
>> packet_write_stream_with_flush_from_buf() write a stream of packets. All
>> content packets use the maximal packet size except for the last one.
>> After the last content packet a `flush` control packet is written.
>> packet_read_till_flush() reads arbitrary sized packets until it detects
>> a `flush` packet.
> 
> These are awkwardly named and I couldn't guess what the input is (I
> can tell one is to read from fd and the other is <mem,len> buffer,
> but it is unclear if that is in packetized form or just raw data
> stream to be copied to the end from their names) without reading the
> implementation.  I _think_ you read a raw stream of data through the
> end (either EOF or length limit) and write it out packetized, and
> use the flush packet to mark the end of the stream.  In my mind,
> that is "writing a packetized stream".  The words "packetizing" and
> "stream" imply that the stream could consist of more data than what
> would fit in a single packet, which in turn implies that there needs
> a way to mark the end of one data item, so with_flush does not
> necessarily have to be their names.
> 
> The counter-part would be "reading a packetized stream".
> 
>> +int packet_write_stream_with_flush_from_fd(int fd_in, int fd_out)
>> +{
> 
> Especially this one I am tempted to suggest "copy-to-packetized-stream",
> as it reads a stream from one fd and then copies out while packetizing.

OK, what function names would be more clear from your point of view?

copy_to_packetized_stream_from_fd()
copy_to_packetized_stream_from_buf()
copy_to_packetized_stream_to_buf()

or

write_packetized_stream_from_fd()
write_packetized_stream_from_buf()
read_packetized_stream_to_buf()

?


>> +int packet_write_stream_with_flush_from_buf(const char *src_in, size_t len, 
>> int fd_out)
>> +{
>> +int err = 0;
>> +size_t bytes_written = 0;
>> +size_t bytes_to_write;
>> +
>> +while (!err) {
>> +if ((len - bytes_written) > sizeof(packet_write_buffer) - 4)
>> +bytes_to_write = sizeof(packet_write_buffer) - 4;
>> +else
>> +bytes_to_write = len - bytes_written;
>> +if (bytes_to_write == 0)
>> +break;
> 
> The lack of COPY_WRITE_ERROR puzzled me briefly here.  If you are
> assuming that your math at the beginning of this loop is correct and
> bytes_to_write will never exceed the write-buffer size, I think you
> should be able to (and it would be better to) assume that the math
> you do to tell xread() up to how many bytes it is allowed to read at
> once is also correct, losing the COPY_WRITE_ERROR check in the other
> function.  You can choose to play safer and do a check in this
> function, too.  Either way, we would want to be consistent.

OK. I'll remove the (I just realized meaningless) check in the other function:

+   if (bytes_to_write > sizeof(packet_write_buffer) - 4)
+   return COPY_WRITE_ERROR;

> 
>> +err = packet_write_gently(fd_out, src_in + bytes_written, 
>> bytes_to_write);
>> +bytes_written += bytes_to_write;
>> +}
>> +if (!err)
>> +err = packet_flush_gently(fd_out);
>> +return err;
>> +}
> 
>> +ssize_t packet_read_till_flush(int fd_in, struct strbuf *sb_out)
>> +{
>> +int len, ret;
>> +int options = PACKET_READ_GENTLE_ON_EOF;
>> +char linelen[4];
>> +
>> +size_t oldlen = sb_out->len;
>> +size_t oldalloc = sb_out->alloc;
>> +
>> +for (;;) {
>> +/* Read packet header */
>> +ret = get_packet_data(fd_in, NULL, NULL, linelen, 4, options);
>> +if (ret < 0)
>> +goto done;
>> +len = packet_length(linelen);
>> +if (len < 0)
>> +die("protocol error: bad line length character: %.4s", 
>> linelen);
>> +if (!len) {
>> +/* Found a flush packet - Done! */
>> +packet_trace("0000", 4, 0);
>> +break;
>> +}
>> +len -= 4;
>> +
>> +/* Read packet content */
>> +strbuf_grow(sb_out, len);
>> +ret = get_packet_data(fd_in, NULL, NULL, 

Re: [PATCH v6 05/13] pkt-line: add packet_write_gently()

2016-08-26 Thread Lars Schneider

> On 25 Aug 2016, at 23:50, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_write_fmt() has two shortcomings. First, it uses format_packet()
>> which lets the caller only send string data via "%s". That means it
>> cannot be used for arbitrary data that may contain NULs. Second, it will
>> always die on error.
> 
> As you introduced _gently in 3/13, the latter is no longer a valid
> excuse to add this function.  Just remove the sentence "Second, ...".

Agreed!


>> Add packet_write_gently() which writes arbitrary data and returns `0`
>> for success and `-1` for an error. This function is used by other
>> pkt-line functions in a subsequent patch.
>> 
>> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
>> ---
>> pkt-line.c | 12 
>> 1 file changed, 12 insertions(+)
>> 
>> diff --git a/pkt-line.c b/pkt-line.c
>> index cad26df..7e8a803 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -3,6 +3,7 @@
>> #include "run-command.h"
>> 
>> char packet_buffer[LARGE_PACKET_MAX];
>> +static char packet_write_buffer[LARGE_PACKET_MAX];
>> static const char *packet_trace_prefix = "git";
>> static struct trace_key trace_packet = TRACE_KEY_INIT(PACKET);
>> static struct trace_key trace_pack = TRACE_KEY_INIT(PACKFILE);
>> @@ -155,6 +156,17 @@ int packet_write_fmt_gently(int fd, const char *fmt, 
>> ...)
>>  return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>> }
>> 
>> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
>> +{
>> +if (size > sizeof(packet_write_buffer) - 4)
>> +return -1;
>> +packet_trace(buf, size, 1);
>> +memmove(packet_write_buffer + 4, buf, size);
>> +size += 4;
>> +set_packet_header(packet_write_buffer, size);
> 
> It may not matter all that much, but from code-reader's point of
> view, when you know packet_write_buffer[] will contain things A and
> B in this order, and when you have enough information to compute A
> before stasrting to fill packet_write_buffer[], I would prefer to
> see you actually fill the buffer in that natural order.

I did that because when packet_write_stream_with_flush_from_fd()
calls packet_write_gently() then buf[] is actually packet_write_buffer[]:

https://github.com/larsxschneider/git/blob/d474e6a4c2523b87624a07111eb7a4f2dcd12426/pkt-line.c#L185-L192

Therefore I would override the first 4 bytes. However, the code evolved for
some reason in that way but looking at it now I think that is an obscure,
likely meaningless optimization. I'll use a separate buffer in 
packet_write_stream_with_flush_from_fd() and then fix the order here
following your advice.


> Do you anticipate future need of non-gently variant of this
> function?  If so, perhaps a helper that takes a boolean "am I
> working for the gently variant?" may help share more code.

With helper you mean "an additional boolean parameter"? I don't 
see a need for a non-gently variant right now but I will
add this parameter if you think it is a good idea. How would the
signature look like?

int packet_write_gently(const int fd_out, const char *buf, size_t size, int 
gentle)

This would follow type_from_string_gently() in object.h.

Thanks,
Lars
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 03/13] pkt-line: add packet_write_fmt_gently()

2016-08-26 Thread Lars Schneider

> On 25 Aug 2016, at 23:41, Junio C Hamano <gits...@pobox.com> wrote:
> 
> larsxschnei...@gmail.com writes:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_write_fmt() would die in case of a write error even though for
>> some callers an error would be acceptable. Add packet_write_fmt_gently()
>> which writes a formatted pkt-line and returns `0` for success and `-1`
>> for an error.
>> 
>> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
>> ---
>> pkt-line.c | 12 
>> pkt-line.h |  1 +
>> 2 files changed, 13 insertions(+)
>> 
>> diff --git a/pkt-line.c b/pkt-line.c
>> index e8adc0f..3e8b2fb 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -137,6 +137,18 @@ void packet_write_fmt(int fd, const char *fmt, ...)
>>  write_or_die(fd, buf.buf, buf.len);
>> }
>> 
>> +int packet_write_fmt_gently(int fd, const char *fmt, ...)
>> +{
>> +static struct strbuf buf = STRBUF_INIT;
>> +va_list args;
>> +
>> +strbuf_reset(&buf);
>> +va_start(args, fmt);
>> +format_packet(&buf, fmt, args);
>> +va_end(args);
>> +return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>> +}
> 
> Even though its only a handful lines, it is a bit ugly to have a
> completely copied implementation only to have _gently().  I suspect
> that you should be able to
> 
>   static int packet_write_fmt_1(int fd, int gently,
>   const char *fmt, va_list args)
>{
>   struct strbuf buf = STRBUF_INIT;
>   size_t count;
> 
>   format_packet(&buf, fmt, args);
>   
>   count = write_in_full(fd, buf.buf, buf.len);
>if (count == buf.len)
>   return 0;
>   if (!gently) {
>   check_pipe(errno);
>   die_errno("write error");
>   }
>return -1;
>   }
> 
> and then share that between the existing one:
> 
>   void packet_write_fmt(int fd, const char *fmt, ...)
>{
>   va_list args;
>   va_start(args, fmt);
>packet_write_fmt_1(fd, 0, fmt, args);
>va_end(args);
>   }
> 
> and the new one:
> 
>   void packet_write_fmt_gently(int fd, const char *fmt, ...)
>{
>   int status;
>   va_list args;
>   va_start(args, fmt);
>status = packet_write_fmt_1(fd, 1, fmt, args);
>va_end(args);
>   return status;
>   }

I agree with your criticism of the code duplication. 

However, I thought it would be OK, as Peff already 
tried to refactor it...
http://public-inbox.org/git/20160810150139.lpxyrqkr53s5f...@sigill.intra.peff.net/

... and I got the impression you agreed with Peff:
http://public-inbox.org/git/xmqqvaz84g9y@gitster.mtv.corp.google.com/


I will try to refactor it according to your suggestion above. 
Would "packet_write_fmt_1()" be an acceptable name or should 
I come up with something more expressive?

Thank you,
Lars


Re: [PATCH v6 06/13] pkt-line: add functions to read/write flush terminated packet streams

2016-08-25 Thread Lars Schneider

> On 25 Aug 2016, at 20:46, Stefan Beller <sbel...@google.com> wrote:
> 
>> On Thu, Aug 25, 2016 at 4:07 AM,  <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_write_stream_with_flush_from_fd() and
>> packet_write_stream_with_flush_from_buf() write a stream of packets. All
>> content packets use the maximal packet size except for the last one.
>> After the last content packet a `flush` control packet is written.
>> 
>> packet_read_till_flush() reads arbitrary sized packets until it detects
>> a `flush` packet.
> 
> So the API provided by these read/write functions is intended
> to move a huge chunks of data. And as it puts the data on the wire one
> packet after the other without the possibility to intervene and e.g. send
> a side channel progress bar update, I would question the design of this.
> If I understand correctly this will be specifically  used for large
> files locally,
> so e.g. a file of 5 GB (such as a virtual machine tracked in Git), would
> require about 80k packets.

Peff suggested this approach arguing that the overhead is neglectable:
http://public-inbox.org/git/20160720134916.gb19...@sigill.intra.peff.net/


> Instead of having many packets of max length and then a remainder,
> I would suggest to invent larger packets for this use case. Then we can
> just send one packet instead.
> 
> Currently a packet consists of 4 bytes indicating the length in hex
> and then the payload of length-4 bytes. As the length is in hex
> the characters in the first 4 bytes are [0-9a-f], we can easily add another
> meaning for the length, e.g.:
> 
>  A packet starts with the overall length and then the payload.
>  If the first character of the length is 'v' the length is encoded as a
>  variable length quantity[1]. The high bit of the char indicates if
>  the next char is still part of the length field. The length must not exceed
>  LLONG_MAX (which results in a payload of 9223 Petabyte, so
>  enough for the foreseeable future).

Eventually I would like to resurrect Joey's cleanFromFile/smudgeToFile idea:

http://public-inbox.org/git/1468277112-9909-3-git-send-email-jo...@joeyh.name/

Then we would not need to transfer that much data over the pipes. However, I 
wonder if the large number of packets would actually be a problem. Honestly, I 
would prefer to not change Git's packet format in this already large series ;-)


>  [1] A variable-length quantity (VLQ) is a universal code that uses
>  an arbitrary number of bytes to represent an arbitrarily large integer.
>  https://en.wikipedia.org/wiki/Variable-length_quantity
> 
> The neat thing about the packet system is we can dedicate packets
> to different channels (such as the side channels), but with the provided
> API here this makes it impossible to later add in these side channel
> as it is a pure streaming API now. So let's remove the complication
> of having to send multiple packets and just go with one large packet
> instead.

I tried to design the protocol to be as flexible as possible for the future,
with a version negotiation and a capabilities list. Therefore, I think it should
be possible to implement these ideas in the future if they are required.


> --
>I understand that my proposal would require writing code again,
>but it has also some long term advantages in the networking stack
>of Git: There are some worries that a capabilities line in fetch/push
>might overflow in the far future, when there are lots of capabilities.
> 
>Also a few days ago there was a proposal to add all symbolic refs
>to a capabilities line, which Peff shot down as "the packet may be
>too small".
> 
>There is an incredible hack that allows transporting refs > 64kB IIRC.
> 
>All these things could go away with the variable length encoded
>packets. But to make them go away in the future we would need
>to start with these variable length packets today. ;)
> 
> Just food for thought.

Thanks for thinking it through that thoroughly! I understand your point of view 
and I am curious what others think.

Cheers,
Lars



Re: [PATCH v6 10/13] convert: generate large test files only once

2016-08-25 Thread Lars Schneider


> On 25 Aug 2016, at 21:17, Stefan Beller <sbel...@google.com> wrote:
> 
>> On Thu, Aug 25, 2016 at 4:07 AM,  <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Generate more interesting large test files
> 
> How are the large test files more interesting?
> (interesting in the notion of covering more potential bugs?
> easier to debug? better to maintain, or just a pleasant read?)

The old large test file was 1MB of zeros and 1 byte with a one, repeated 2048 
times.

Since the filter uses 64k packets we would test a large number of
identical-looking packets.

That's why I thought the pseudo-random content is more interesting.


>> with pseudo random characters
>> in between and reuse these test files in multiple tests. Run tests
>> formerly marked as EXPENSIVE every time but with a smaller data set.
> 
> Sounds good to me.

Thank you.
Lars


Re: [PATCH v6 07/13] pack-protocol: fix maximum pkt-line size

2016-08-25 Thread Lars Schneider

> On 25 Aug 2016, at 20:59, Stefan Beller <sbel...@google.com> wrote:
> 
>> On Thu, Aug 25, 2016 at 4:07 AM,  <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> According to LARGE_PACKET_MAX in pkt-line.h the maximal length of a
>> pkt-line packet is 65520 bytes. The pkt-line header takes 4 bytes and
>> therefore the pkt-line data component must not exceed 65516 bytes.
>> 
>> Signed-off-by: Lars Schneider <larsxschnei...@gmail.com>
>> ---
> 
> I was recently encouraged off list to really try hard to go with
> shorter series's.
> (Duh! Everyone knows that shorter series go in faster ;)
> 
> And as this is a strict bug fix of Documentation that makes sense
> outside this series,
> I'd like to encourage you to send this as a separate patch in case you
> need further
> rerolls.

Agreed!

Thanks,
Lars


Re: [PATCH v6 03/13] pkt-line: add packet_write_fmt_gently()

2016-08-25 Thread Lars Schneider
On 25 Aug 2016, at 20:12, Stefan Beller  wrote:

>> +int packet_write_fmt_gently(int fd, const char *fmt, ...)
>> +{
>> +   static struct strbuf buf = STRBUF_INIT;
>> +   va_list args;
>> +
>> +   strbuf_reset(&buf);
>> +   va_start(args, fmt);
>> +   format_packet(&buf, fmt, args);
> 
> format_packet also takes care of tracing the contents,
> which was a bit unexpected to me.

To me, too :)
http://public-inbox.org/git/20160810143321.q7mjirgr5ynml...@sigill.intra.peff.net/

The series is already pretty large and therefore I decided to leave this as-is.

> Do we also want to trace failure?

You mean in all new *_gently() functions? Good idea!

Thanks,
Lars


Re: [PATCH 1/3] diff-highlight: add some tests.

2016-08-24 Thread Lars Schneider

> On 16 Aug 2016, at 22:48, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> On 30 Jul 2016, at 17:11, Brian Henderson <henderson...@gmail.com> wrote:
>>> 
>>> ---
>>> contrib/diff-highlight/Makefile  |  5 ++
>>> contrib/diff-highlight/t/Makefile| 19 +++
>>> contrib/diff-highlight/t/t9400-diff-highlight.sh | 63 ++
>>> contrib/diff-highlight/t/test-diff-highlight.sh  | 69 
>>> 
>>> 4 files changed, 156 insertions(+)
>>> create mode 100644 contrib/diff-highlight/Makefile
>>> create mode 100644 contrib/diff-highlight/t/Makefile
>>> create mode 100644 contrib/diff-highlight/t/t9400-diff-highlight.sh
>>> create mode 100644 contrib/diff-highlight/t/test-diff-highlight.sh
>>> 
>>> diff --git a/contrib/diff-highlight/Makefile 
>>> b/contrib/diff-highlight/Makefile
>> 
>> Would it make sense to add the contrib tests to the Travis-CI build, too?
>> https://travis-ci.org/git/git/branches
> 
> The more the merrier ;-)?  I do not think of a downside, especially
> if you are adding it as a separate thing that comes after the main
> test, or for even better isolation, an entirely separate job.

OK, I will post a patch (might take a while).


> By the way, how flaky are existing tests?  Are people actively
> following breakage reports?

Good question - I was curious about that, too. 
That's why I made a little website (only tested on Chrome, requires JS): 

https://larsxschneider.github.io/git-ci-stats/

There you can see all builds for all branches including their failure reason.
In general I would say flakiness is no issue anymore.

I don't know who else is looking at the breakage reports but I do. 
That's the reason for my "next" breakage reports in the past:

http://public-inbox.org/git/a6131c47-3230-4ec4-b3f6-b2507c937...@gmail.com/
http://public-inbox.org/git/4877318e-3cbf-4c87-b24d-aae35c427...@gmail.com/
http://public-inbox.org/git/fb76544f-16f7-45ca-9649-fd62ee44b...@gmail.com/
...

Cheers,
Lars


Re: [PATCH 2/2] mingw: ensure temporary file handles are not inherited by child processes

2016-08-17 Thread Lars Schneider

> On 17 Aug 2016, at 14:41, Johannes Schindelin  
> wrote:
> 
> From: Ben Wijen 
> 
> When the index is locked and child processes inherit the handle to
> said lock and the parent process wants to remove the lock before the
> child process exits, on Windows there is a problem: it won't work
> because files cannot be deleted if a process holds a handle on them.
> The symptom:
> 
>Rename from 'xxx/.git/index.lock' to 'xxx/.git/index' failed.
>Should I try again? (y/n)
> 
> Spawning child processes with bInheritHandles==FALSE would not work
> because no file handles would be inherited, not even the hStdXxx
> handles in STARTUPINFO (stdin/stdout/stderr).
> 
> Opening every file with O_NOINHERIT does not work, either, as e.g.
> git-upload-pack expects inherited file handles.
> 
> This leaves us with the only way out: creating temp files with the
> O_NOINHERIT flag. This flag is Windows-specific, however. For our
> purposes, it is equivalent to O_CLOEXEC (which does not
> exist on Windows), so let's just open temporary files with the
> O_CLOEXEC flag and map that flag to O_NOINHERIT on Windows.
> 
> This fixes the test that we just introduced to demonstrate the problem.
> 
> Signed-off-by: Ben Wijen 
> Signed-off-by: Johannes Schindelin 
> ---
> compat/mingw.h| 4 
> t/t6026-merge-attr.sh | 2 +-
> tempfile.c| 2 +-
> 3 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/compat/mingw.h b/compat/mingw.h
> index 95e128f..753e641 100644
> --- a/compat/mingw.h
> +++ b/compat/mingw.h
> @@ -67,6 +67,10 @@ typedef int pid_t;
> #define F_SETFD 2
> #define FD_CLOEXEC 0x1
> 
> +#if !defined O_CLOEXEC && defined O_NOINHERIT
> +#define O_CLOEXEC O_NOINHERIT
> +#endif
> +
> #ifndef EAFNOSUPPORT
> #define EAFNOSUPPORT WSAEAFNOSUPPORT
> #endif
> diff --git a/t/t6026-merge-attr.sh b/t/t6026-merge-attr.sh
> index 3d28c78..dd8f88d 100755
> --- a/t/t6026-merge-attr.sh
> +++ b/t/t6026-merge-attr.sh
> @@ -181,7 +181,7 @@ test_expect_success 'up-to-date merge without common 
> ancestor' '
>   )
> '
> 
> -test_expect_success !MINGW 'custom merge does not lock index' '
> +test_expect_success 'custom merge does not lock index' '
>   git reset --hard anchor &&
>   write_script sleep-one-second.sh <<-\EOF &&
>   sleep 1 &
> diff --git a/tempfile.c b/tempfile.c
> index 0af7ebf..db3981d 100644
> --- a/tempfile.c
> +++ b/tempfile.c
> @@ -120,7 +120,7 @@ int create_tempfile(struct tempfile *tempfile, const char 
> *path)
>   prepare_tempfile_object(tempfile);
> 
>   strbuf_add_absolute_path(&tempfile->filename, path);
> - tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL, 
> 0666);
> + tempfile->fd = open(tempfile->filename.buf, O_RDWR | O_CREAT | O_EXCL | 
> O_CLOEXEC, 0666);

This solution works great for me. I struggled with a similar problem 
in my Git filter protocol series:
http://public-inbox.org/git/20160810130411.12419-16-larsxschnei...@gmail.com/

I also tried selectively disabling inheritance for file handles but it looks
like this is not easily possible right now:
https://github.com/git-for-windows/git/issues/770#issuecomment-238816331

- Lars


Re: [PATCH 1/3] diff-highlight: add some tests.

2016-08-15 Thread Lars Schneider

> On 30 Jul 2016, at 17:11, Brian Henderson  wrote:
> 
> ---
> contrib/diff-highlight/Makefile  |  5 ++
> contrib/diff-highlight/t/Makefile| 19 +++
> contrib/diff-highlight/t/t9400-diff-highlight.sh | 63 ++
> contrib/diff-highlight/t/test-diff-highlight.sh  | 69 
> 4 files changed, 156 insertions(+)
> create mode 100644 contrib/diff-highlight/Makefile
> create mode 100644 contrib/diff-highlight/t/Makefile
> create mode 100644 contrib/diff-highlight/t/t9400-diff-highlight.sh
> create mode 100644 contrib/diff-highlight/t/test-diff-highlight.sh
> 
> diff --git a/contrib/diff-highlight/Makefile b/contrib/diff-highlight/Makefile

Would it make sense to add the contrib tests to the Travis-CI build, too?
https://travis-ci.org/git/git/branches

- Lars


Re: [PATCH v5 14/15] convert: add filter.<driver>.process option

2016-08-12 Thread Lars Schneider

> On 12 Aug 2016, at 19:13, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> If we do the success first and then error out halfway, we
>>> still have to clean up, so I do not see how this impacts
>>> implementation?
>> 
>> That is true. The reasoning is that an error in between is somewhat
>> less expected. Therefore additional work is OK.
>> 
>> An error upfront is much more likely because it is also a mechanism
>> for the filter to reject certain files. If the filter is configured
>> as "required=false" then this reject would actually be OK.
> 
> Unless the reasoning is "an error in between is so rare that we are
> OK if the protocol misbehaves and the receiving end omits error
> handing", I am not so sure how "therefore additional work is OK" is
> a reasonable conclusion.

Maybe I need to reword. An error is detected either way if something
goes wrong. The advantage of the two-step status is that if we fail early
then Git does not even need to create structures to read the response.

See Peff's answer here:
http://public-inbox.org/git/20160806121421.bs7n4lhed7phdshb%40sigill.intra.peff.net/
http://public-inbox.org/git/20160805222710.chefh5kiktyzketh%40sigill.intra.peff.net/

Thanks,
Lars


Re: [PATCH v5 14/15] convert: add filter.<driver>.process option

2016-08-12 Thread Lars Schneider

> On 12 Aug 2016, at 19:07, Stefan Beller <sbel...@google.com> wrote:
> 
> On Fri, Aug 12, 2016 at 9:59 AM, Lars Schneider
> <larsxschnei...@gmail.com> wrote:
>> 
>> The welcome message is necessary to distinguish the long running
>> filter protocol (v2) from the current one-shot filter protocol (v1).
>> This becomes important if a user tries to use a v1 clean/smudge
>> filter with the v2 git config settings.
> 
> Oh I see, that's why we're at v2 now.
> How do you distinguish between v1 and v2? Does the welcome message
> need to follow a certain pattern to be recognized to make it v2+ ?

v1 has no format at all. It works like this:

1. Git starts the filter process
2. Git writes the entire file via stdin to the filter process
3. Git reads the result via stdout from the filter process
4. Git stops the filter process


Any v2+ would need to deal with the following:

packet:  git> git-filter-client
packet:  git> version=2
packet:  git> version=2+
packet:  git> 
packet:  git< git-filter-server

Everything after could be different in v2+ compared to v2.

Thanks,
Lars


Re: [PATCH v5 14/15] convert: add filter.<driver>.process option

2016-08-12 Thread Lars Schneider

> On 12 Aug 2016, at 18:48, Stefan Beller  wrote:
> 
> On Fri, Aug 12, 2016 at 9:38 AM, Jeff King  wrote:
>> On Fri, Aug 12, 2016 at 09:33:18AM -0700, Stefan Beller wrote:
>> 
 If the result content is empty then the filter is expected to respond
 with a success status and an empty list.
 
 packet:  git< status=success\n
 packet:  git< 
 packet:  git<   # empty content!
 packet:  git<   # empty list!
 
>>> 
>>> Why do we need the last flush packet? We'd expect as many successes
>>> as we send out contents? Do we plan on interleaving operation, i.e.
>>> Git sends out 10 files but the filter process is not as fast as Git sending
>>> out and the answers trickle in slowly?
>> 
>> There was prior discussion in the thread, but basically, it is there to
>> be able to signal an error that is encountered midway through sending
>> the file (i.e., to say "status=error"). If you do not have a final
>> flush, then you would send nothing, and the receiver would be left
>> wondering if you were successful, or if it simply did not get your error
>> report yet.
> 
> I did not follow the prior discussion, so I approached this review with
> no prior knowledge from prior reviews, but instead read through and
> was asking a lot of questions that came up immediately. In case my
> questions are too dumb just omit them, but I thought they were good
> material to answer in a commit message ("Why did we do it this way
> and not differently").

Thanks! That's very helpful and I will address your questions in the commit
message as anyone looking at this commit in the future will have no prior 
knowledge, too.


> Thanks for the explanation. So this is similar as the situation below
> that we wait for the flush and then an error/success report?

Correct. If we would just process the status packet then we wouldn't
even need to wait for the flush. I added flush because that allows us
to send an arbitrary list of key=value pairs in the future.


 If the filter experiences an error during processing, then it can
 send the status "error". Depending on the `filter.<driver>.required`
 flag Git will interpret that as error but it will not stop or restart
 the filter process.
 
 packet:  git< status=success\n
>>> 
>>> So the first success is meaningless essentially?
>>> Would it make sense to move the sucess behind the content sending
>>> in all cases?
>> 
>> No, the first success says "good so far, here's the file content". The
>> second says "I succeeded in sending you the file content".
>> 
>> You _can_ drop the first one, but it may be more convenient for the
>> receiver to know up-front that there was a failure.
> 
> 
> If there was a failure upfront, it would become
> 
> packet:  git< 
> # no content is encapsulated here
> packet:  git< 
> packet:  git< status=error\n
> packet:  git< 

No, a failure upfront would look like this (see documentation):


packet:  git< status=error\n
packet:  git< 


No content and no 2nd key=value pair list is exchanged after an error.



> so from a protocol side I'd claim it doesn't look bad.
> I assume with convenient you mean the implementation
> side of things?
> 
> If we do the success first and then error out halfway, we
> still have to clean up, so I do not see how this impacts
> implementation?

That is true. The reasoning is that an error in between is somewhat
less expected. Therefore additional work is OK.

An error upfront is much more likely because it is also a mechanism
for the filter to reject certain files. If the filter is configured
as "required=false" then this reject would actually be OK.

Thanks,
Lars


Re: [PATCH v5 14/15] convert: add filter.<driver>.process option

2016-08-12 Thread Lars Schneider

> On 12 Aug 2016, at 18:33, Stefan Beller <sbel...@google.com> wrote:
> 
> On Wed, Aug 10, 2016 at 6:04 AM,  <larsxschnei...@gmail.com> wrote:
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> Git's clean/smudge mechanism invokes an external filter process for every
>> single blob that is affected by a filter. If Git filters a lot of blobs
>> then the startup time of the external filter processes can become a
>> significant part of the overall Git execution time.
>> 
>> In a preliminary performance test this developer used a clean/smudge filter
>> written in golang to filter 12,000 files. This process took 364s with the
>> existing filter mechanism and 5s with the new mechanism. See details here:
>> https://github.com/github/git-lfs/pull/1382
>> 
>> This patch adds the `filter.<driver>.process` string option which, if used,
>> keeps the external filter process running and processes all blobs with
>> the packet format (pkt-line) based protocol over standard input and standard
>> output described below.
>> 
>> Git starts the filter when it encounters the first file
>> that needs to be cleaned or smudged. After the filter started
>> Git sends a welcome message, a list of supported protocol
>> version numbers, and a flush packet. Git expects to read the
>> welcome message and one protocol version number from the
>> previously sent list. Afterwards Git sends a list of supported
>> capabilities and a flush packet. Git expects to read a list of
>> desired capabilities, which must be a subset of the supported
>> capabilities list, and a flush packet as response:
>> 
>> packet:  git> git-filter-client
>> packet:  git> version=2
>> packet:  git> version=42
>> packet:  git> 
>> packet:  git< git-filter-server
>> packet:  git< version=2
> 
> what follows is specific to version=2?
> version 42 may deem capabilities a bad idea?

"version=42" is just an example to show how the initialization might
look in a distant future when we support yet another protocol version.

You are correct, what follows is specific to version=2. I will state
that more clearly in the documentation.

Can you try to rephrase "version 42 may deem capabilities a bad idea?"
I am not sure I understand what you mean.
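To make the negotiation concrete: Git advertises its versions ("version=2",
"version=42" above), and the filter answers with one entry from that list.
The filter could select the highest version it shares with Git roughly like
this (an illustrative sketch; `choose_version` is not part of git's code):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch, not git's actual code: the filter picks the
 * highest protocol version that both sides implement.  A return value
 * of -1 means there is no common version and the handshake must fail.
 */
static int choose_version(const int *offered, size_t n_offered,
			  const int *supported, size_t n_supported)
{
	int best = -1;
	size_t i, j;

	for (i = 0; i < n_offered; i++)
		for (j = 0; j < n_supported; j++)
			if (offered[i] == supported[j] && offered[i] > best)
				best = offered[i];
	return best;
}
```

A filter that only implements v2 would thus answer "version=2" even though
"version=42" was also offered.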


> 
>> packet:  git> clean=true
>> packet:  git> smudge=true
>> packet:  git> not-yet-invented=true
>> packet:  git> 
>> packet:  git< clean=true
>> packet:  git< smudge=true
>> packet:  git< 
>> 
>> Supported filter capabilities in version 2 are "clean" and
>> "smudge".
> 
> I assume version 2 is an example here and we actually start with v1?

No, it is actually called version 2 because I consider the current
clean/smudge protocol to be version 1.


> Can you clarify why we need welcome messages?
> (Is there a technical reason, or better debuggability for humans?)

The welcome message is necessary to distinguish the long-running
filter protocol (v2) from the current one-shot filter protocol (v1).
This becomes important if a user tries to use a v1 clean/smudge
filter with the v2 git config settings.
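In other words, the welcome message lets Git fail early with a clear error
instead of misinterpreting a v1 filter's output as protocol data. A minimal
sketch of that check (hypothetical helper, not git's real implementation):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch, not git's real code: decide from the first two
 * packets whether the child process speaks the v2 handshake.  A v1
 * filter never sends "git-filter-server"; it just streams filtered
 * content, so Git can reject the misconfiguration immediately.
 */
static int parse_v2_greeting(const char *pkt1, const char *pkt2, int *version)
{
	if (!pkt1 || strcmp(pkt1, "git-filter-server"))
		return -1;		/* presumably a v1 filter */
	if (!pkt2 || sscanf(pkt2, "version=%d", version) != 1)
		return -1;		/* malformed handshake */
	return 0;
}
```

If the first packet is anything other than the expected greeting, Git knows
immediately that the configured command does not speak v2.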


>> Afterwards Git sends a list of "key=value" pairs terminated with
>> a flush packet. The list will contain at least the filter command
>> (based on the supported capabilities) and the pathname of the file
>> to filter relative to the repository root. Right after these packets
>> Git sends the content split in zero or more pkt-line packets and a
>> flush packet to terminate content.
>> 
>> packet:  git> command=smudge\n
>> packet:  git> pathname=path/testfile.dat\n
>> packet:  git> 
>> packet:  git> CONTENT
>> packet:  git> 
>> 
>> 
>> The filter is expected to respond with a list of "key=value" pairs
>> terminated with a flush packet. If the filter does not experience
>> problems then the list must contain a "success" status. Right after
>> these packets the filter is expected to send the content in zero
>> or more pkt-line packets and a flush packet at the end. Finally, a
>> second list of "key=value" pairs terminated with a flush packet
>> is expected. The filter can change the status in the second list.
>> 
>> packet:  git< status=success\n
>> packet:  git<

Re: `git stash --help` tries to pull up nonexistent file gitstack.html

2016-08-12 Thread Lars Schneider

> On 12 Aug 2016, at 17:48, Junio C Hamano  wrote:
> 
> Joseph Musser  writes:
> 
>> Looks like a simple typo.
> 
> Unfortunately this does not reproduce to me (built from source on
> Ubuntu Linux).

I tried it with the latest released version on Windows and OSX (2.9.2)
and was not able to reproduce it either.

- Lars


Re: [PATCH v5 05/15] pkt-line: add packet_write_gently_fmt()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:43, Jeff King  wrote:
> 
> On Wed, Aug 10, 2016 at 03:04:01PM +0200, larsxschnei...@gmail.com wrote:
> 
>> +int packet_write_gently_fmt(int fd, const char *fmt, ...)
>> +{
>> +static struct strbuf buf = STRBUF_INIT;
>> +va_list args;
>> +
>> +strbuf_reset(&buf);
>> +va_start(args, fmt);
>> +format_packet(1, &buf, fmt, args);
>> +va_end(args);
>> +packet_trace(buf.buf + 4, buf.len - 4, 1);
>> +return (write_in_full(fd, buf.buf, buf.len) == buf.len ? 0 : -1);
>> +}
> 
> Could the end of this function just be:
> 
>  return packet_write_gently(fd, buf.buf, buf.len);
> 
> ? I guess we'd prefer to avoid that, because it incurs an extra
> memmove() of the data.

I don't think the memmove would be that expensive. However, format_packet()
already creates the packet header, and packet_write_gently() would compute
it again, no?


> Similarly, I'd think this could share code with the non-gentle form
> (which should be able to just call this and die() if returns an error).
> Though sometimes the va_list transformation makes that awkward.

Yeah, the code duplication annoyed me, too. va_list was the reason I did it
that way. Do you think that is something that needs to be addressed in the
series?
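For reference, the variant under discussion could format the payload once,
fill in the header in place, and write the finished buffer, so neither a
second header computation nor a memmove() is needed. A self-contained
sketch (plain write() stands in for git's write_in_full(), and the name
is illustrative, not the merged code):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LARGE_PACKET_MAX 65520

/*
 * Sketch: format the payload directly behind the 4-byte header slot,
 * then fill in the "%04x" length header and write the finished buffer.
 * The header is computed exactly once and no memmove() is required.
 */
static int packet_write_fmt_gently_sketch(int fd, const char *fmt, ...)
{
	static char buf[LARGE_PACKET_MAX];
	char hdr[5];
	va_list args;
	int len;

	va_start(args, fmt);
	len = vsnprintf(buf + 4, sizeof(buf) - 4, fmt, args);
	va_end(args);
	if (len < 0 || len >= (int)sizeof(buf) - 4)
		return -1;	/* encoding error or payload too large */

	snprintf(hdr, sizeof(hdr), "%04x", len + 4);	/* header counts itself */
	memcpy(buf, hdr, 4);
	return write(fd, buf, len + 4) == len + 4 ? 0 : -1;
}
```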

Thanks,
Lars


Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:37, Jeff King <p...@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:29:26PM +0200, Lars Schneider wrote:
> 
>> 
>>> On 10 Aug 2016, at 15:15, Jeff King <p...@peff.net> wrote:
>>> 
>>> On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschnei...@gmail.com wrote:
>>> 
>>>> From: Lars Schneider <larsxschnei...@gmail.com>
>>>> 
>>>> format_packet() dies if the caller wants to format a packet larger than
>>>> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
>>> 
>>> I am not sure I agree here. Certainly I see the usefulness of gently
>>> handling a failure to write(). But if you are passing in too-large
>>> buffers, isn't that a bug in the program?
>>> 
>>> How would you recover, except by splitting up the content? That might
>>> not be possible depending on how you are using the pkt-lines. And even
>>> if it is, wouldn't it be simpler to split it up before sending it to
>>> format_packet()?
>> 
>> Good argument. I agree - this patch should be dropped.
> 
> Actually, after reading further, one thought did occur to me. Let's say
> you are writing to a smudge filter, and one of the header packets you
> send has the filename in it. So you might do something like:
> 
>  if (packet_write_fmt_gently(fd, "filename=%s", filename) < 0) {
>   if (filter_required)
>   die(...);
>   else
>   return -1; /* we tried our best; skip smudge */
>  }
> 
> The "recovery" there is not to try sending again, but rather to give up.
> And that is presumably a sane outcome for somebody who tries to checkout
> a filename larger than 64K.

Yes!


> It does still feel a little weird that you cannot tell the difference
> between a write() error and bad input. Because you really might want to
> do something different between the two. Like:
> 
>  #define MAX_FILENAME (PKTLINE_DATA_MAXLEN - strlen("filename="))
> 
>  if (strlen(filename) > MAX_FILENAME) {
>   warning("woah, that name is ridiculous; truncating");
>   ret = packet_write_fmt_gently(fd, "%.*s", MAX_FILENAME, filename);
>  } else
>   ret = packet_write_fmt_gently(fd, "%s", filename);


I can do that. However, I wouldn't truncate the filename as this
might create a weird outcome. I would just let the filter fail.

OK?

- Lars


Re: [PATCH v5 02/15] pkt-line: call packet_trace() only if a packet is actually send

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:13, Jeff King <p...@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:03:58PM +0200, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> The packet_trace() call is not ideal in format_packet() as we would print
>> a trace when a packet is formatted and (potentially) when the packet is
>> actually send. This was no problem up until now because format_packet()
>> was only used by one function. Fix it by moving the trace call into the
>> function that actually sends the packet.
> 
> It looks like there are two functions: packet_write() and
> packet_buf_write().

I did not call trace in packet_buf_write() because this function does not
perform any writes.


> Your patch only touches one of them, and it looks like we would fail to
> trace many packets (e.g., see receive-pack.c:report(), which uses
> packet_buf_write() and then write()s out the result).

I see. But isn't it confusing if packet_buf_write() issues a trace call?
If I just call this function then nothing is written at all. Shouldn't the
trace call be made in receive-pack.c:report()? Or shouldn't receive-pack
let pkt-line.c perform the write calls?

-Lars


> PS Also, s/send/sent/ in the commit message.

Thanks :)


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:28, Jeff King <p...@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:04:00PM +0200, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> packet_write() has two shortcomings. First, it uses format_packet() which
>> lets the caller only send string data via "%s". That means it cannot be
>> used for arbitrary data that may contain NULs. Second, it will always
>> die on error.
>> 
>> Add packet_write_gently() which writes arbitrary data and returns `0` for
>> success and `-1` for an error.
> 
> So now we have packet_write() and packet_write_gently(), but they differ
> in more than just whether they are gentle. That seems like a weird
> interface.
> 
> Should we either be picking a new name (e.g., packet_write_mem() or
> something), or migrating packet_write() to packet_write_fmt()?

Done in "[PATCH v5 08/15] pkt-line: rename packet_write() to packet_write_fmt()"


>> diff --git a/pkt-line.c b/pkt-line.c
>> index e6b8410..4f25748 100644
>> --- a/pkt-line.c
>> +++ b/pkt-line.c
>> @@ -3,6 +3,7 @@
>> #include "run-command.h"
>> 
>> char packet_buffer[LARGE_PACKET_MAX];
>> +char packet_write_buffer[LARGE_PACKET_MAX];
> 
> Should this be static? I.e., are random other bits of the code allowed
> to write into it (I guess not because it's not declared in pkt-line.h).

static is better!


>> +int packet_write_gently(const int fd_out, const char *buf, size_t size)
>> +{
>> +if (size > PKTLINE_DATA_MAXLEN)
>> +return -1;
>> +packet_trace(buf, size, 1);
>> +memmove(packet_write_buffer + 4, buf, size);
> 
> It looks like this iteration drops the idea of callers using a
> LARGE_PACKET_MAX buffer and only filling it at "buf+4" with
> PKTLINE_DATA_MAXLEN bytes (which is fine).
> 
> I wonder if we still need PKTLINE_DATA_MAXLEN, or of it is just
> obscuring things. The magic number "4" still appears separately here,
> and it actually makes it harder to see that things are correct.  I.e.,
> doing:
> 
>  if (size > sizeof(packet_write_buffer) - 4)
>   return -1;
>  memmove(packet_write_buffer + 4, buf, size);
> 
> is more obviously correct, because you do not have to wonder about the
> relationship between the size of your buffer and the macro.
> 
> It might still be worth having PKTLINE_DATA_MAXLEN public, though, if
> callers use it to size their input to packet_write_gently().

I agree. In a later patch I am using PKTLINE_DATA_MAXLEN inside pkt-line.c,
too. I will change it to your suggestion.

For now I would remove PKTLINE_DATA_MAXLEN because it should be an
implementation detail of pkt-line.c (plus it is not used by anyone).
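Applying the suggestion, the size check can be expressed directly against
the buffer that holds the packet, which keeps the bound obviously
consistent with the memmove() below (illustrative sketch with a plain
write(), not the merged code):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define LARGE_PACKET_MAX 65520

static char packet_write_buffer[LARGE_PACKET_MAX];

/*
 * Sketch: bound the payload by the buffer that actually holds it
 * (4 bytes are reserved for the "%04x" length header, which counts
 * itself), so no separate PKTLINE_DATA_MAXLEN macro is needed.
 */
static int packet_write_gently_sketch(int fd, const char *buf, size_t size)
{
	char hdr[5];

	if (size > sizeof(packet_write_buffer) - 4)
		return -1;
	memmove(packet_write_buffer + 4, buf, size);
	snprintf(hdr, sizeof(hdr), "%04x", (unsigned int)(size + 4));
	memcpy(packet_write_buffer, hdr, 4);
	return write(fd, packet_write_buffer, size + 4) == (ssize_t)(size + 4)
		? 0 : -1;
}
```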

Thanks,
Lars


Re: [PATCH v5 03/15] pkt-line: add `gentle` parameter to format_packet()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 15:15, Jeff King <p...@peff.net> wrote:
> 
> On Wed, Aug 10, 2016 at 03:03:59PM +0200, larsxschnei...@gmail.com wrote:
> 
>> From: Lars Schneider <larsxschnei...@gmail.com>
>> 
>> format_packet() dies if the caller wants to format a packet larger than
>> LARGE_PACKET_MAX. Certain callers might prefer an error response instead.
> 
> I am not sure I agree here. Certainly I see the usefulness of gently
> handling a failure to write(). But if you are passing in too-large
> buffers, isn't that a bug in the program?
> 
> How would you recover, except by splitting up the content? That might
> not be possible depending on how you are using the pkt-lines. And even
> if it is, wouldn't it be simpler to split it up before sending it to
> format_packet()?

Good argument. I agree - this patch should be dropped.

Thanks,
Lars


Re: [PATCH v5 04/15] pkt-line: add packet_write_gently()

2016-08-10 Thread Lars Schneider

> On 10 Aug 2016, at 20:21, Junio C Hamano <gits...@pobox.com> wrote:
> 
> Lars Schneider <larsxschnei...@gmail.com> writes:
> 
>>> On 10 Aug 2016, at 19:17, Junio C Hamano <gits...@pobox.com> wrote:
>>> 
>> OK. Does this mean I can leave the "packet_write()" to "packet_write_fmt()"
>> rename as is in this series?
> 
> I didn't really check what order you are doing things to answer
> that.
> 
> If the function that is introduced in this step is a version of
> packet_write_fmt() that does its thing only gently, you would want
> to do the rename s/packet_write/packet_write_fmt/ before this step,
> and then add the new function as packet_write_fmt_gently(), I would
> think.

OK - will fix. I did it that way because I thought it would be easier
if we decide to drop the big rename patch.

Thanks,
Lars


