Re: [PATCH 06/12] log: add -P as a synonym for --perl-regexp
On Sat, Apr 8, 2017 at 10:25 PM, Ævar Arnfjörð Bjarmason wrote:

> Add a short -P option as a synonym for the longer --perl-regexp, for
> consistency with the options the corresponding grep invocations
> accept.
>
> This was intentionally omitted in commit 727b6fc3ed ("log --grep:
> accept --basic-regexp and --perl-regexp", 2012-10-03) for unspecified
> future use.
>
> Since nothing has come along in over 4 1/2 years that's wanted to use
> it, it's more valuable to make it consistent with "grep" than to keep
> it open for future use, and to avoid the confusion of -P meaning
> different things for grep & log, as is the case with the -G option.

I initially had a strong reaction to the above "4 1/2 years entitles us to do anything that might inconvenience future developers" reasoning, but as long as we already have -E for extended regexp usable, even if we will never be able to use -G for basic regexp, I am OK with giving -P for pcre to be consistent with "git grep". I'd even be supportive if others agree with this change.

Thanks.
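For readers unfamiliar with the three regex flavors behind these flags, here is a quick illustration using the system grep rather than git (it assumes a GNU grep built with PCRE support for -P):

```shell
# BRE (the default, what -G selects): '+' is a literal character
echo 'ab+c' | grep 'b+'        # matches the literal "b+"

# ERE (-E): '+' is a one-or-more quantifier
echo 'abbc' | grep -E 'ab+c'   # matches "abbc"

# PCRE (-P): Perl shorthands like \d are available
echo 'ab12c' | grep -P '\d+'   # matches the digits
```

The consistency argument in the patch is about making the same single letters select the same flavors in "git grep" and "git log".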
Re: [PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
> Change the test library to insert non-alphanumeric ASCII characters
> into the TRASH_DIRECTORY name, that's the directory the test library
> creates, chdirs to and runs each individual test from.
>
> Unless test_fails_on_unusual_directory_names=1 is declared before
> importing test-lib.sh (and if perl isn't available on the system), the
> trash directory will contain every non-alphanumeric character in
> ASCII, in order.

At the very least there must be an easier way to disable this, e.g. a command line option. This change is certainly effective in smoking out bugs, but it's a major annoyance during development when it comes to debugging a test.

At first I could not even cd into the trash directory, because TAB completing the directory name with all those non-printable characters didn't work (this may be a bug in the bash-completion package). And simply copy-pasting the dirname didn't work either, because 'ls' displayed it as

  trash directory.t9902-completion.??? !"#$%&'()*+,-:;<=>?@[\]^_`{|}~?

After some headscratching, Sunday night may be my excuse, I figured out that 'cd tr*' works... only to be greeted with the ugliest-ever three-line(!) shell prompt.

Therefore I would say that this should not even be enabled by default in test-lib.sh, so at least running a test directly from the command line as ./t1234-foo.sh would considerately give us an easily accessible trash directory even without any command line options. We could enable it for 'make test' by default via GIT_TEST_OPTS in t/Makefile, though.

> This includes all the control characters, !, [], {} etc. the "."
> character isn't included because it's already in the directory name,
> and nor is "/" for obvious reasons, although that would actually work,
> we'd just create a subdirectory, which would make the tests harder to
> inspect when they fail.i

1. Heh. How an additional subdirectory would make the tests harder to inspect is nothing compared to the effect of all the other characters.

2.
s/i$// > This change is inspired by the "submodule: prevent backslash expantion > in submodule names" patch[1]. If we'd had backslashes in the > TRASH_DIRECTORY all along that bug would have been fixed a long time > ago. This will flag such issues by marking tests that currently fail > with "test_fails_on_unusual_directory_names=1", ensure that new tests > aren't added unless a discussion is had about why the code can't > handle unusual pathnames, and prevent future regressions. > > 1. <20170407172306.172673-1-bmw...@google.com> > --- > t/README | 12 > t/test-lib.sh | 4 > 2 files changed, 16 insertions(+) > > diff --git a/t/README b/t/README > index ab386c3681..314dd40221 100644 > --- a/t/README > +++ b/t/README > @@ -345,6 +345,18 @@ assignment to variable 'test_description', like this: > This test registers the following structure in the cache > and tries to run git-ls-files with option --frotz.' > > +By default the tests will be run from a directory with a highly > +unusual filename that includes control characters, a newline, various > +punctuation etc., this is done to smoke out any bugs related to path > +handling. If for whatever reason the tests can't deal with such > +unusual path names, set: > + > +test_fails_on_unusual_directory_names=1 > + > +Before sourcing 'test-lib.sh' as described below. This option is > +mainly intended to grandfather in existing broken tests & code, and > +should usually not be used in new code, instead your tests or code > +probably need fixing. 
> > Source 'test-lib.sh' > > diff --git a/t/test-lib.sh b/t/test-lib.sh > index 13b5696822..089ff5ac7d 100644 > --- a/t/test-lib.sh > +++ b/t/test-lib.sh > @@ -914,6 +914,10 @@ fi > > # Test repository > TRASH_DIRECTORY="trash directory.$(basename "$0" .sh)" > +if test -z "$test_fails_on_unusual_directory_names" -a "$(perl -e 'print > 1+1' 2>/dev/null)" = "2" > +then > + TRASH_DIRECTORY="$TRASH_DIRECTORY.$(perl -e 'print join q[], grep { > /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f')" > +fi > test -n "$root" && TRASH_DIRECTORY="$root/$TRASH_DIRECTORY" > case "$TRASH_DIRECTORY" in > /*) ;; # absolute path is good > -- > 2.11.0
Re: [PATCH] Make git log work for git CWD outside of work tree
Danny Sauer writes:

> That's feedback I was looking for - what's the "right" way to get the
> path to the file.

My knee-jerk reaction is that you SHOULDN'T have to do anything special to "get the path". I do not have the code in front of me (this is my "down" week after all), so let me give the general outline of how to reason about this, with the knowledge of code evolution of the system.

In the beginning, Git worked only from the top level of the working tree. If you have your project in ~/myproject with a file "f" in a directory "d", that file is "~/myproject/d/f", and you would do

    $ cd ~/myproject
    $ git add d/f

to add it. If Git wanted to access the index, it just accessed ".git/index" as a relative path. If Git wanted to access a file at the top of the working tree, say ".gitignore", it just accessed ".gitignore" as a relative path. Because it only worked from the top.

Then we added support for running Git from a subdirectory, so that you can say "cd ~/myproject/d && git add f". In order to keep the existing code that wants to access the index as ".git/index" and wants to access the top-level ".gitignore" as ".gitignore" working, the support to run Git from a subdirectory is designed this way:

 - Each main program of a subcommand (e.g. cmd_log()) receives a parameter "prefix", which tells what subdirectory you were in when you started Git.

 - Before running the main program of the subcommand, Git chdir(2)s up to the top level of the working tree.

 - The main program of the subcommand receives the command line from the user intact. It is responsible for prepending the prefix to the path the user gave it from the command line.

So if you did "cd ~/myproject/d && git add f", Git goes up to "~/myproject", passes argv=("f", NULL) and prefix="d/" to cmd_add().
Adding to the index wants to read and update the index, which is still done by opening ".git/index" (relative to the toplevel as before), and inspecting the top-level .gitignore file is done by opening ".gitignore" (relative to the toplevel as before). And the cmd_add() forms the path "d/f" by using the prefix "d/" and the user-supplied pathname "f".

When we first added support for having the (equivalent of) ".git/" directory outside the working tree by setting the GIT_DIR environment variable, again, you could only use Git from the top-level of the working tree. Instead of "~/myproject/.git", you can keep your repository metainformation in say "~/mypro.git/" and point at it with the GIT_DIR environment variable, and say

    $ export GIT_DIR=~/mypro.git
    $ cd ~/myproject
    $ git add d/f

Later we also added the GIT_WORK_TREE environment variable to be used together with GIT_DIR so that you can start from ~/myproject/d, very similarly to how you worked from a subdirectory without these environment variables, i.e.

    $ export GIT_DIR=~/mypro.git GIT_WORK_TREE=~/myproject
    $ cd ~/myproject/d
    $ git add f

The way this support was added was to go to the top-level of the working tree (i.e. "~/myproject" in this example) and pass the prefix (i.e. "d" in this example).

Notice that in all of the above configurations, if a Git command knows a path to something that is relative to the top of the working tree (e.g. ".git/index" in the ancient Git, ".gitignore" at the top-level that governs the entire working tree, or ".mailmap"), it should just be able to open that path without asking "where is the top of the working tree?". So if your directory arrangement is a variation of these basic configurations supported, e.g.
if your git directory is ~/myproject/.git and your working tree is ~/myproject, and you use the GIT_DIR and GIT_WORK_TREE to point at them, regardless of which subdirectory of $GIT_WORK_TREE you started with, Git should already have done chdir(2) to ~/myproject/ before it starts cmd_log(), and it should be able to just open and read ".mailmap" (of course, it needs to limit itself to do so only when it is working with a repository with a working tree).

If your arrangement is even more exotic, e.g. you have these two variables set, and then are running from OUTSIDE the working tree, my knee-jerk reaction is that you should get your head examined, as it is totally unclear what "git add foo" would mean if you did this:

    $ export GIT_DIR=~/myproject/.git GIT_WORK_TREE=~/myproject
    $ cd ~/myproject/../somewhere/else
    $ git add foo

But it should still "work" in the sense that the above command should notice that you are trying to add "../somewhere/else/foo" to the index, which is obviously nonsense, and die with a message. Similarly, if you replace "git add foo" with "git log", it still should work in the above, i.e.

    $ export GIT_DIR=~/myproject/.git GIT_WORK_TREE=~/myproject
    $ cd ~/myproject/../somewhere/else
    $ git log

If Git is not chdir(2)ing to ~/myproject before calling cmd_log() in the above (again, this is my down week so I didn't and
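The basic prefix behaviour described above can be reproduced in a few commands (throwaway paths of my own choosing, just to show the prefix joining, not anything from the patch under discussion):

```shell
cd "$(mktemp -d)"
git init --quiet myproject
mkdir myproject/d
echo content >myproject/d/f
# Git chdir(2)s up to the top level and joins prefix="d/" with argv "f" ...
(cd myproject/d && git add f)
# ... so the recorded path is "d/f", relative to the top of the working tree:
git -C myproject ls-files     # prints "d/f"
```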
Re: [PATCH 0/2] test: Detect *lots* of bugs by adding non-alnum to trash dir names
On 09.04.2017 at 21:11, Ævar Arnfjörð Bjarmason wrote:
> This series changes the test library so that we interpolate the
> result of:
>
>     perl -e 'print join q[], grep { /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f'
>
> Into the trash directory name we generate. Doing this makes 30% of
> the test suite fail.

Wow. I know that I tend to bring out weird errors in systems, but this is much deeper than I could have ever expected.

You might want to additionally test directory names like "asdf" and 'asdf' - some shell might try to be smart about such names and automatically unquote them. This would probably mean running the test suite multiple times, which some people find offensive (myself included).

There are also high-bit characters that might cause oddness. 0x80 and 0xff look like interesting edge cases to me. Note that just appending these bytes is going to cause failures with filesystems that expect UTF-8 file names; you'd want to use a Unicode code point that gets encoded using a 0x80 resp. 0xff byte (I suspect 0xff is actually not a valid UTF-8 encoding at all, but I'd have to double-check that).

Is git supposed to run fine under Windows, or under Linux with an ntfs filesystem driver? If yes, then I'd expect this patch to cause the other 70% of tests to crash, because NTFS forbids several non-alnum printable characters, among them <=>\
[PATCH] grep: plug a trivial memory leak
Change the cleanup phase for the grep command to free the pathspec struct that's allocated earlier in the same block, and used just a few lines earlier.

With "grep hi README.md" valgrind reports a loss of 239 bytes now, down from 351. The relevant --num-callers=40 --leak-check=full --show-leak-kinds=all backtrace is:

    187 (112 direct, 75 indirect) bytes in 1 blocks are definitely lost in loss record 70 of 110
       at 0x4C2BBAF: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
       by 0x60B339: do_xmalloc (wrapper.c:59)
       by 0x60B2F6: xmalloc (wrapper.c:86)
       by 0x576B37: parse_pathspec (pathspec.c:652)
       by 0x4519F0: cmd_grep (grep.c:1215)
       by 0x4062EF: run_builtin (git.c:371)
       by 0x40544D: handle_builtin (git.c:572)
       by 0x4060A2: run_argv (git.c:624)
       by 0x4051C6: cmd_main (git.c:701)
       by 0x4C5901: main (common-main.c:43)

Signed-off-by: Ævar Arnfjörð Bjarmason
---

Since pretty much any non-trivial git command leaks because it skips cleanup, are patches in this category even accepted? Worth a try...

 builtin/grep.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/grep.c b/builtin/grep.c
index 65070c52fc..3ffb5b4e81 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -1299,6 +1299,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix)
 	hit |= wait_all();
 	if (hit && show_in_pager)
 		run_pager(&opt, prefix);
+	clear_pathspec(&pathspec);
 	free_grep_patterns(&opt);
 	return !hit;
 }
-- 
2.11.0
Re: [PATCH v5 3/8] convert: Split start_multi_file_filter into two separate functions
> On 07 Apr 2017, at 14:03, Ben Peart wrote:
>
> To enable future reuse of the filter.<driver>.process infrastructure,
> split start_multi_file_filter into two separate parts.
>
> start_multi_file_filter will now only contain the generic logic to
> manage the creation and tracking of the child process in a hashmap.
>
> start_multi_file_filter_fn is a protocol specific initialization
> function that will negotiate the multi-file-filter interface version
> and capabilities.
>
> Signed-off-by: Ben Peart
> ---
> convert.c | 63 ++-
> 1 file changed, 38 insertions(+), 25 deletions(-)
>
> diff --git a/convert.c b/convert.c
> index 793c29ebfd..404757eac9 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -565,35 +565,14 @@ static void stop_multi_file_filter(struct child_process *process)
> 	finish_command(process);
> }
>
> -static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
> +static int start_multi_file_filter_fn(struct cmd2process *entry)
> {
> 	int err;
> -	struct cmd2process *entry;
> -	struct child_process *process;
> -	const char *argv[] = { cmd, NULL };
> 	struct string_list cap_list = STRING_LIST_INIT_NODUP;
> 	char *cap_buf;
> 	const char *cap_name;
> -
> -	entry = xmalloc(sizeof(*entry));
> -	entry->cmd = cmd;
> -	entry->supported_capabilities = 0;
> -	process = &entry->process;
> -
> -	child_process_init(process);
> -	process->argv = argv;
> -	process->use_shell = 1;
> -	process->in = -1;
> -	process->out = -1;
> -	process->clean_on_exit = 1;
> -	process->clean_on_exit_handler = stop_multi_file_filter;
> -
> -	if (start_command(process)) {
> -		error("cannot fork to run external filter '%s'", cmd);
> -		return NULL;
> -	}
> -
> -	hashmap_entry_init(entry, strhash(cmd));
> +	struct child_process *process = &entry->process;
> +	const char *cmd = entry->cmd;
>
> 	sigchain_push(SIGPIPE, SIG_IGN);
>
> @@ -642,7 +621,41 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
> done:
> 	sigchain_pop(SIGPIPE);
>
> -	if (err || errno == EPIPE) {
> +	if (err || errno == EPIPE)
> +		err = err ? err : errno;

Nice! I should have done this here, too:
https://github.com/git/git/blob/b14f27f91770e0f99f64135348977a0ce1c7993a/convert.c#L755

This is clearly a bug in my code. I'll send a patch shortly.

> +
> +	return err;
> +}
> +
> +static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, const char *cmd)
> +{
> +	int err;
> +	struct cmd2process *entry;
> +	struct child_process *process;
> +	const char *argv[] = { cmd, NULL };
> +
> +	entry = xmalloc(sizeof(*entry));
> +	entry->cmd = cmd;
> +	entry->supported_capabilities = 0;
> +	process = &entry->process;
> +
> +	child_process_init(process);
> +	process->argv = argv;
> +	process->use_shell = 1;
> +	process->in = -1;
> +	process->out = -1;
> +	process->clean_on_exit = 1;
> +	process->clean_on_exit_handler = stop_multi_file_filter;
> +
> +	if (start_command(process)) {
> +		error("cannot fork to run external filter '%s'", cmd);
> +		return NULL;
> +	}
> +
> +	hashmap_entry_init(entry, strhash(cmd));
> +
> +	err = start_multi_file_filter_fn(entry);
> +	if (err) {
> 		error("initialization for external filter '%s' failed", cmd);
> 		kill_multi_file_filter(hashmap, entry);
> 		return NULL;
> -- 
> 2.12.0.windows.1.31.g1548525701.dirty

Looks good to me.

Thanks,
Lars
Re: [PATCH v5 2/8] convert: move packet_write_list() into pkt-line as packet_writel()
> On 07 Apr 2017, at 14:03, Ben Peart wrote:
>
> Add packet_writel() which writes multiple lines in a single call and
> then calls packet_flush_gently(). Update convert.c to use the new
> packet_writel() function from pkt-line.
>
> Signed-off-by: Ben Peart
> ---
> convert.c  | 23 ++-
> pkt-line.c | 19 +++
> pkt-line.h |  1 +
> 3 files changed, 22 insertions(+), 21 deletions(-)
>
> diff --git a/convert.c b/convert.c
> index 8d652bf27c..793c29ebfd 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -521,25 +521,6 @@ static struct cmd2process *find_multi_file_filter_entry(struct hashmap *hashmap,
> 	return hashmap_get(hashmap, &key, NULL);
> }
>
> -static int packet_write_list(int fd, const char *line, ...)
> -{
> -	va_list args;
> -	int err;
> -	va_start(args, line);
> -	for (;;) {
> -		if (!line)
> -			break;
> -		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> -			return -1;
> -		err = packet_write_fmt_gently(fd, "%s\n", line);
> -		if (err)
> -			return err;
> -		line = va_arg(args, const char*);
> -	}
> -	va_end(args);
> -	return packet_flush_gently(fd);
> -}
> -
> static void read_multi_file_filter_status(int fd, struct strbuf *status)
> {
> 	struct strbuf **pair;
> @@ -616,7 +597,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
>
> 	sigchain_push(SIGPIPE, SIG_IGN);
>
> -	err = packet_write_list(process->in, "git-filter-client", "version=2", NULL);
> +	err = packet_writel(process->in, "git-filter-client", "version=2", NULL);
> 	if (err)
> 		goto done;
>
> @@ -632,7 +613,7 @@ static struct cmd2process *start_multi_file_filter(struct hashmap *hashmap, cons
> 	if (err)
> 		goto done;
>
> -	err = packet_write_list(process->in, "capability=clean", "capability=smudge", NULL);
> +	err = packet_writel(process->in, "capability=clean", "capability=smudge", NULL);
>
> 	for (;;) {
> 		cap_buf = packet_read_line(process->out, NULL);
> diff --git a/pkt-line.c b/pkt-line.c
> index 58842544b4..2788aa1af6 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -171,6 +171,25 @@ int packet_write_fmt_gently(int fd, const char *fmt, ...)
> 	return status;
> }
>
> +int packet_writel(int fd, const char *line, ...)
> +{
> +	va_list args;
> +	int err;
> +	va_start(args, line);
> +	for (;;) {
> +		if (!line)
> +			break;
> +		if (strlen(line) > LARGE_PACKET_DATA_MAX)
> +			return -1;
> +		err = packet_write_fmt_gently(fd, "%s\n", line);
> +		if (err)
> +			return err;
> +		line = va_arg(args, const char*);
> +	}
> +	va_end(args);
> +	return packet_flush_gently(fd);
> +}
> +
> static int packet_write_gently(const int fd_out, const char *buf, size_t size)
> {
> 	static char packet_write_buffer[LARGE_PACKET_MAX];
> diff --git a/pkt-line.h b/pkt-line.h
> index 12b18991f6..cb3eda9695 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -25,6 +25,7 @@ void packet_buf_flush(struct strbuf *buf);
> void packet_buf_write(struct strbuf *buf, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
> int packet_flush_gently(int fd);
> int packet_write_fmt_gently(int fd, const char *fmt, ...) __attribute__((format (printf, 2, 3)));
> +int packet_writel(int fd, const char *line, ...);
> int write_packetized_from_fd(int fd_in, int fd_out);
> int write_packetized_from_buf(const char *src_in, size_t len, int fd_out);
>
> -- 
> 2.12.0.windows.1.31.g1548525701.dirty

Looks good to me.

Thanks,
Lars
Re: [PATCH v5 1/8] pkt-line: add packet_read_line_gently()
> On 07 Apr 2017, at 14:03, Ben Peart wrote:
>
> Add packet_read_line_gently() to enable reading a line without dying on
> EOF.
>
> Signed-off-by: Ben Peart
> ---
> pkt-line.c | 12 
> pkt-line.h | 10 ++
> 2 files changed, 22 insertions(+)
>
> diff --git a/pkt-line.c b/pkt-line.c
> index d4b6bfe076..58842544b4 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -323,6 +323,18 @@ char *packet_read_line(int fd, int *len_p)
> 	return packet_read_line_generic(fd, NULL, NULL, len_p);
> }
>
> +int packet_read_line_gently(int fd, int *dst_len, char** dst_line)
> +{
> +	int len = packet_read(fd, NULL, NULL,
> +		packet_buffer, sizeof(packet_buffer),
> +		PACKET_READ_CHOMP_NEWLINE|PACKET_READ_GENTLE_ON_EOF);

Really minor nit: you could align the parameters according to fd similar to the "packet_read_line_generic" code (tab width 8).

> +	if (dst_len)
> +		*dst_len = len;
> +	if (dst_line)
> +		*dst_line = (len > 0) ? packet_buffer : NULL;

Minor nit: The explicit check "len > 0" is necessary here as len can be "-1". The original "packet_read_line_generic" just checks for "len". I think it would be nice if the code would be consistent and both would check "len > 0".

> +	return len;
> +}
> +
> char *packet_read_line_buf(char **src, size_t *src_len, int *dst_len)
> {
> 	return packet_read_line_generic(-1, src, src_len, dst_len);
> diff --git a/pkt-line.h b/pkt-line.h
> index 18eac64830..12b18991f6 100644
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -74,6 +74,16 @@ int packet_read(int fd, char **src_buffer, size_t *src_len, char
> char *packet_read_line(int fd, int *size);
>
> /*
> + * Convenience wrapper for packet_read that sets the PACKET_READ_GENTLE_ON_EOF
> + * and CHOMP_NEWLINE options. The return value specifies the number of bytes
> + * read into the buffer or -1 on truncated input. if the *dst_line parameter

s/if/If/

> + * is not NULL it will return NULL for a flush packet and otherwise points to

This sentence is a bit confusing to me. Maybe: "If the *dst_line parameter is not NULL, then it will point to a static buffer (that may be overwritten by subsequent calls) or it will return NULL for a flush packet." ... feel free to completely ignore this as I am not a native speaker.

> + * a static buffer (that may be overwritten by subsequent calls). If the size
> + * parameter is not NULL, the length of the packet is written to it.
> + */
> +int packet_read_line_gently(int fd, int *size, char** dst_line);
> +
> +/*
> * Same as packet_read_line, but read from a buf rather than a descriptor;
> * see packet_read for details on how src_* is used.
> */

If you send another round then you could address the minor nits. If not, then this patch as it is looks good to me.

Thanks,
Lars
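Both pkt-line patches above build on the same wire framing, so here is a standalone sketch of that format (my own illustration, not code from the patches): each packet carries a 4-hex-digit length that includes the 4 header bytes themselves, and "0000" is the flush packet.

```shell
# Emit one pkt-line; payloads written via packet_write_fmt(fd, "%s\n", ...)
# end in LF, and the 4-byte length header counts itself.
pkt_line () {
	payload="$1
"
	printf '%04x%s' $(( ${#payload} + 4 )) "$payload"
}
pkt_flush () { printf '0000'; }

pkt_line 'git-filter-client'   # prints "0016git-filter-client"
pkt_line 'version=2'           # prints "000eversion=2"
pkt_flush                      # prints "0000"
```

For example, "version=2" plus its newline is 10 bytes, so the header is 10 + 4 = 14 = 0x0e.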
[PATCH 2/2] test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name
Change the test library to insert non-alphanumeric ASCII characters into the TRASH_DIRECTORY name, that's the directory the test library creates, chdirs to and runs each individual test from.

Unless test_fails_on_unusual_directory_names=1 is declared before importing test-lib.sh (and if perl isn't available on the system), the trash directory will contain every non-alphanumeric character in ASCII, in order.

This includes all the control characters, !, [], {} etc. the "." character isn't included because it's already in the directory name, and nor is "/" for obvious reasons, although that would actually work, we'd just create a subdirectory, which would make the tests harder to inspect when they fail.i

This change is inspired by the "submodule: prevent backslash expantion in submodule names" patch[1]. If we'd had backslashes in the TRASH_DIRECTORY all along that bug would have been fixed a long time ago. This will flag such issues by marking tests that currently fail with "test_fails_on_unusual_directory_names=1", ensure that new tests aren't added unless a discussion is had about why the code can't handle unusual pathnames, and prevent future regressions.

1. <20170407172306.172673-1-bmw...@google.com>
---
 t/README      | 12 ++++++++++++
 t/test-lib.sh |  4 ++++
 2 files changed, 16 insertions(+)

diff --git a/t/README b/t/README
index ab386c3681..314dd40221 100644
--- a/t/README
+++ b/t/README
@@ -345,6 +345,18 @@ assignment to variable 'test_description', like this:
 	This test registers the following structure in the cache
 	and tries to run git-ls-files with option --frotz.'
 
+By default the tests will be run from a directory with a highly
+unusual filename that includes control characters, a newline, various
+punctuation etc., this is done to smoke out any bugs related to path
+handling. If for whatever reason the tests can't deal with such
+unusual path names, set:
+
+    test_fails_on_unusual_directory_names=1
+
+Before sourcing 'test-lib.sh' as described below. This option is
+mainly intended to grandfather in existing broken tests & code, and
+should usually not be used in new code, instead your tests or code
+probably need fixing.
 
 Source 'test-lib.sh'
 
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 13b5696822..089ff5ac7d 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -914,6 +914,10 @@ fi
 
 # Test repository
 TRASH_DIRECTORY="trash directory.$(basename "$0" .sh)"
+if test -z "$test_fails_on_unusual_directory_names" -a "$(perl -e 'print 1+1' 2>/dev/null)" = "2"
+then
+	TRASH_DIRECTORY="$TRASH_DIRECTORY.$(perl -e 'print join q[], grep { /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f')"
+fi
 test -n "$root" && TRASH_DIRECTORY="$root/$TRASH_DIRECTORY"
 case "$TRASH_DIRECTORY" in
 /*) ;; # absolute path is good
-- 
2.11.0
[PATCH v3 2/4] t0021: make debug log file name configurable
The "rot13-filter.pl" helper wrote its debug logs always to "rot13-filter.log". Make this configurable by defining the log file as first parameter of "rot13-filter.pl".

This is useful if "rot13-filter.pl" is configured multiple times similar to the subsequent patch 'convert: add "status=delayed" to filter process protocol'.

Signed-off-by: Lars Schneider
---
 t/t0021-conversion.sh   | 44 ++--
 t/t0021/rot13-filter.pl |  8 +---
 2 files changed, 27 insertions(+), 25 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index ff2424225b..0139b460e7 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -28,7 +28,7 @@ file_size () {
 }
 
 filter_git () {
-	rm -f rot13-filter.log &&
+	rm -f *.log &&
 	git "$@"
 }
 
@@ -342,7 +342,7 @@ test_expect_success 'diff does not reuse worktree files that need cleaning' '
 '
 
 test_expect_success PERL 'required process filter should filter data' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean smudge" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean smudge" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -375,7 +375,7 @@ test_expect_success PERL 'required process filter should filter data' '
 		IN: clean testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 		STOP
 	EOF
-	test_cmp_count expected.log rot13-filter.log &&
+	test_cmp_count expected.log debug.log &&
 
 	git commit -m "test commit 2" &&
 	rm -f test2.r "testsubdir/test3 '\''sq'\'',\$x=.r" &&
@@ -388,7 +388,7 @@ test_expect_success PERL 'required process filter should filter data' '
 		IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 		STOP
 	EOF
-	test_cmp_exclude_clean expected.log rot13-filter.log &&
+	test_cmp_exclude_clean expected.log debug.log &&
 
 	filter_git checkout --quiet --no-progress empty-branch &&
 	cat >expected.log <<-EOF &&
@@ -397,7 +397,7 @@ test_expect_success PERL 'required process filter should filter data' '
 		IN: clean test.r $S [OK] -- OUT: $S . [OK]
 		STOP
 	EOF
-	test_cmp_exclude_clean expected.log rot13-filter.log &&
+	test_cmp_exclude_clean expected.log debug.log &&
 
 	filter_git checkout --quiet --no-progress master &&
 	cat >expected.log <<-EOF &&
@@ -409,7 +409,7 @@ test_expect_success PERL 'required process filter should filter data' '
 		IN: smudge testsubdir/test3 '\''sq'\'',\$x=.r $S3 [OK] -- OUT: $S3 . [OK]
 		STOP
 	EOF
-	test_cmp_exclude_clean expected.log rot13-filter.log &&
+	test_cmp_exclude_clean expected.log debug.log &&
 
 	test_cmp_committed_rot13 "$TEST_ROOT/test.o" test.r &&
 	test_cmp_committed_rot13 "$TEST_ROOT/test2.o" test2.r &&
@@ -419,7 +419,7 @@ test_expect_success PERL 'required process filter should filter data' '
 
 test_expect_success PERL 'required process filter takes precedence' '
 	test_config_global filter.protocol.clean false &&
-	test_config_global filter.protocol.process "rot13-filter.pl clean" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
 	test_config_global filter.protocol.required true &&
 	rm -rf repo &&
 	mkdir repo &&
@@ -439,12 +439,12 @@ test_expect_success PERL 'required process filter takes precedence' '
 		IN: clean test.r $S [OK] -- OUT: $S . [OK]
 		STOP
 	EOF
-	test_cmp_count expected.log rot13-filter.log
+	test_cmp_count expected.log debug.log
 	)
 '
 
 test_expect_success PERL 'required process filter should be used only for "clean" operation only' '
-	test_config_global filter.protocol.process "rot13-filter.pl clean" &&
+	test_config_global filter.protocol.process "rot13-filter.pl debug.log clean" &&
 	rm -rf repo &&
 	mkdir repo &&
 	(
@@ -462,7 +462,7 @@ test_expect_success PERL 'required process filter should be used only for "clean
 		IN: clean test.r $S [OK] -- OUT: $S . [OK]
 		STOP
 	EOF
-	test_cmp_count expected.log rot13-filter.log &&
+	test_cmp_count expected.log debug.log &&
 
 	rm test.r &&
 
@@ -474,12 +474,12 @@ test_expect_success PERL 'required process filter should be used only for "clean
 		init handshake complete
 		STOP
[PATCH v3 1/4] t0021: keep filter log files on comparison
The filter log files are modified on comparison. Write the modified log files to temp files for comparison to fix this.

This is useful for the subsequent patch 'convert: add "status=delayed" to filter process protocol'.

Signed-off-by: Lars Schneider
---
 t/t0021-conversion.sh | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 161f560446..ff2424225b 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -42,10 +42,10 @@ test_cmp_count () {
 	for FILE in "$expect" "$actual"
 	do
 		sort "$FILE" | uniq -c |
-		sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /" >"$FILE.tmp" &&
-		mv "$FILE.tmp" "$FILE" || return
+		sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /" >"$FILE.tmp"
 	done &&
-	test_cmp "$expect" "$actual"
+	test_cmp "$expect.tmp" "$actual.tmp" &&
+	rm "$expect.tmp" "$actual.tmp"
 }
 
 # Compare two files but exclude all `clean` invocations because Git can
@@ -56,10 +56,10 @@ test_cmp_exclude_clean () {
 	actual=$2
 	for FILE in "$expect" "$actual"
 	do
-		grep -v "IN: clean" "$FILE" >"$FILE.tmp" &&
-		mv "$FILE.tmp" "$FILE"
+		grep -v "IN: clean" "$FILE" >"$FILE.tmp"
 	done &&
-	test_cmp "$expect.tmp" "$actual.tmp" &&
+	test_cmp "$expect.tmp" "$actual.tmp" &&
+	rm "$expect.tmp" "$actual.tmp"
 }
 
 # Check that the contents of two files are equal and that their rot13 version
-- 
2.12.2
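The normalization that test_cmp_count applies can be seen in isolation (toy input of my own, not from the test suite): repeated "IN:" lines collapse to a single "x IN:" line, so two logs that differ only in how often a filter was invoked compare equal after the pipeline.

```shell
cd "$(mktemp -d)"

# Same sort | uniq -c | sed pipeline as test_cmp_count, minus the tmp files.
normalize () {
	sort "$1" | uniq -c | sed -e "s/^ *[0-9][0-9]*[ ]*IN: /x IN: /"
}

printf 'IN: clean test.r 57 [OK]\nIN: clean test.r 57 [OK]\nSTOP\n' >log1
printf 'IN: clean test.r 57 [OK]\nSTOP\n' >log2

normalize log1   # "x IN: clean test.r 57 [OK]" plus the counted STOP line
normalize log2   # identical output, so a test_cmp of the two would pass
```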
[PATCH v3 4/4] convert: add "status=delayed" to filter process protocol
Some `clean` / `smudge` filters might require a significant amount of time to process a single blob. During this process the Git checkout operation is blocked and Git needs to wait until the filter is done to continue with the checkout.

Teach the filter process protocol (introduced in edcc858) to accept the status "delayed" as response to a filter request. Upon this response Git continues with the checkout operation. After the checkout operation Git calls "finish_delayed_checkout" which queries the filter for remaining blobs. If the filter is still working on the completion, then the filter is expected to block. If the filter has completed all remaining blobs then an empty response is expected.

Git has multiple code paths that checkout a blob. Support delayed checkouts only in `clone` (in unpack-trees.c) and `checkout` operations.

Signed-off-by: Lars Schneider
---
 Documentation/gitattributes.txt |  73 +-
 builtin/checkout.c              |   3 +
 cache.h                         |  37 ++-
 convert.c                       | 150 
 convert.h                       |   5 +
 entry.c                         | 124 +++-
 t/t0021-conversion.sh           |  73 ++
 t/t0021/rot13-filter.pl         | 210 
 unpack-trees.c                  |   2 +
 9 files changed, 587 insertions(+), 90 deletions(-)

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index e0b66c1220..329baa945f 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -425,8 +425,8 @@ packet: git< capability=clean
 packet: git< capability=smudge
 packet: git< 
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".
 
 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -512,12 +512,77 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.
-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.

+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "delay-able" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet. Git will answer with a
+"delay-id", a number that identifies the blob, and a flush packet. The
+filter acknowledges this number with a "success" status and a flush
+packet.
+
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> delay-able=1
+packet: git> 0000
+packet: git> CONTENT
+packet: git> 0000
+packet: git< status=delayed
+packet: git< 0000
+packet: git> delay-id=1
+packet: git> 0000
+packet: git< status=success
+packet: git< 0000
+
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of "delay_ids" of blobs that are
+available. The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list.
+
+packet: git> command=list_available_blobs
+packet: git> 0000
+packet: git< 7
+packet: git< 13
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+
+
+After Git received the "delay_ids", it will request the corresponding
+blobs again. These requests contain a "delay-id" and an empty content
+section. The filter is expected to respond with the smudged content
+in the usual way as explained above.
+
+packet: git> command=smudge
+packet: git> pathname=test-delay10.a
+packet: git> delay-id=0
+packet: git> 0000
+packet: git> 0000 #
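Read as a whole, the handshake above amounts to a small state machine on the filter side. The class below is a toy Python model of it (an illustration only, not code from the series): method calls stand in for pkt-line round-trips, and upper-casing stands in for the real smudge transformation.

```python
class DelayingFilter:
    """Toy model of a smudge filter with the "delay" capability:
    blobs marked slow are delayed and handed back via
    list_available_blobs. All names here are illustrative."""

    def __init__(self):
        self.delayed = {}  # delay-id -> smudged content

    def smudge(self, pathname, content, delay_able=False, delay_id=None):
        if delay_id is not None:
            # Git re-requests a previously delayed blob by its delay-id,
            # with an empty content section.
            return {"status": "success", "content": self.delayed.pop(delay_id)}
        if delay_able and self._is_slow(pathname):
            return {"status": "delayed"}  # content follows later
        return {"status": "success", "content": content.upper()}

    def register_delay_id(self, pathname, content, delay_id):
        # Git answers "status=delayed" with a delay-id; remember the blob.
        self.delayed[delay_id] = content.upper()
        return {"status": "success"}

    def list_available_blobs(self):
        # The real filter blocks until at least one blob is ready;
        # in this toy everything is ready immediately.
        return {"delay_ids": list(self.delayed), "status": "success"}

    @staticmethod
    def _is_slow(pathname):
        return pathname.endswith(".dat")  # arbitrary stand-in predicate
```

An empty `delay_ids` list is how the filter tells Git it has nothing left, matching the documentation text above.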
[PATCH 1/2] tests: mark tests that fail when the TEST_DIRECTORY is unusual
Mark tests that fail when the TEST_DIRECTORY has a name that's
unusual, i.e. contains control characters, various punctuation etc.

This change is followed by a change that adds the code that picks up
test_fails_on_unusual_directory_names to test-lib.sh, but comes before
that change so as not to break most of the test suite during bisecting.

This change is the result of:

    for f in $(grep -B3 last_fail_time .prove | grep sh | perl -pe 's/^ (.*?):/$1/')
    do
        perl -0666 -pi -e 's/^(test_description)/test_fails_on_unusual_directory_names=1\n$1/m' $f
    done

after running the test suite with the aforementioned change to
test-lib.sh.

The only exception is t0000-basic.sh: since it's testing the test
suite itself, its failing 'test --verbose' test needs a change to the
test suite wrapper code to be excluded.

Signed-off-by: Ævar Arnfjörð Bjarmason
---
 t/t0000-basic.sh | 2 ++
 t/t0003-attributes.sh | 1 +
 t/t0021-conversion.sh | 1 +
 t/t0060-path-utils.sh | 1 +
 t/t0300-credentials.sh | 1 +
 t/t0302-credential-store.sh | 1 +
 t/t1006-cat-file.sh | 1 +
 t/t1013-read-tree-submodule.sh | 1 +
 t/t1020-subdirectory.sh | 1 +
 t/t1050-large.sh | 1 +
 t/t1060-object-corruption.sh | 1 +
 t/t1300-repo-config.sh | 1 +
 t/t1305-config-include.sh | 1 +
 t/t1308-config-set.sh | 1 +
 t/t1309-early-config.sh | 1 +
 t/t1500-rev-parse.sh | 1 +
 t/t1504-ceiling-dirs.sh | 1 +
 t/t1507-rev-parse-upstream.sh | 1 +
 t/t1510-repo-setup.sh | 1 +
 t/t1515-rev-parse-outside-repo.sh | 1 +
 t/t2013-checkout-submodule.sh | 1 +
 t/t2025-worktree-add.sh | 1 +
 t/t2027-worktree-list.sh | 1 +
 t/t2028-worktree-move.sh | 1 +
 t/t3040-subprojects-basic.sh | 1 +
 t/t3050-subprojects-fetch.sh | 1 +
 t/t3409-rebase-preserve-merges.sh | 1 +
 t/t3426-rebase-submodule.sh | 1 +
 t/t3501-revert-cherry-pick.sh | 1 +
 t/t3512-cherry-pick-submodule.sh | 1 +
 t/t3513-revert-submodule.sh | 1 +
 t/t3600-rm.sh | 1 +
 t/t3900-i18n-commit.sh | 1 +
 t/t3906-stash-submodule.sh | 1 +
 t/t4027-diff-submodule.sh | 1 +
 t/t4030-diff-textconv.sh | 1 +
 t/t4031-diff-rewrite-binary.sh | 1 +
 t/t4035-diff-quiet.sh | 1 +
 t/t4041-diff-submodule-option.sh | 1 +
 t/t4059-diff-submodule-not-initialized.sh | 1 +
 t/t4060-diff-submodule-option-diff-format.sh | 1 +
 t/t4137-apply-submodule.sh | 1 +
 t/t4203-mailmap.sh | 1 +
 t/t4207-log-decoration-colors.sh | 1 +
 t/t4255-am-submodule.sh | 1 +
 t/t5000-tar-tree.sh | 1 +
 t/t5001-archive-attr.sh | 1 +
 t/t5002-archive-attr-pattern.sh | 1 +
 t/t5003-archive-zip.sh | 1 +
 t/t5150-request-pull.sh | 1 +
 t/t5300-pack-object.sh | 1 +
 t/t5304-prune.sh | 1 +
 t/t5305-include-tag.sh | 1 +
 t/t5306-pack-nobase.sh | 1 +
 t/t5310-pack-bitmaps.sh | 1 +
 t/t5311-pack-bitmaps-shallow.sh | 1 +
 t/t5400-send-pack.sh | 1 +
 t/t5401-update-hooks.sh | 1 +
 t/t5402-post-merge-hook.sh | 1 +
 t/t5403-post-checkout-hook.sh | 1 +
 t/t5404-tracking-branches.sh | 1 +
 t/t5406-remote-rejects.sh | 1 +
 t/t5407-post-rewrite-hook.sh | 1 +
 t/t5500-fetch-pack.sh | 1 +
 t/t5501-fetch-push-alternates.sh | 1 +
 t/t5502-quickfetch.sh | 1 +
 t/t5505-remote.sh | 1 +
 t/t5509-fetch-push-namespaces.sh | 1 +
 t/t5510-fetch.sh | 1 +
 t/t5512-ls-remote.sh | 1 +
 t/t5514-fetch-multiple.sh | 1 +
 t/t5515-fetch-merge-logic.sh | 1 +
 t/t5516-fetch-push.sh | 1 +
[PATCH v3 3/4] t0021: write "OUT" only on success
"rot13-filter.pl" used to write "OUT <size>" to the debug log even in
case of an abort or error. Fix this by writing "OUT <size>" to the
debug log only in the successful case, if output is actually written.

This is useful for the subsequent patch 'convert: add "status=delayed"
to filter process protocol'.

Signed-off-by: Lars Schneider
---
 t/t0021-conversion.sh | 6 +++---
 t/t0021/rot13-filter.pl | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t0021-conversion.sh b/t/t0021-conversion.sh
index 0139b460e7..0c04d346a1 100755
--- a/t/t0021-conversion.sh
+++ b/t/t0021-conversion.sh
@@ -588,7 +588,7 @@ test_expect_success PERL 'process filter should restart after unexpected write f
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge smudge-write-fail.r $SF [OK] -- OUT: $SF [WRITE FAIL]
+			IN: smudge smudge-write-fail.r $SF [OK] -- [WRITE FAIL]
 			START
 			init handshake complete
 			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
@@ -634,7 +634,7 @@ test_expect_success PERL 'process filter should not be restarted if it signals a
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge error.r $SE [OK] -- OUT: 0 [ERROR]
+			IN: smudge error.r $SE [OK] -- [ERROR]
 			IN: smudge test.r $S [OK] -- OUT: $S . [OK]
 			IN: smudge test2.r $S2 [OK] -- OUT: $S2 . [OK]
 			STOP
@@ -673,7 +673,7 @@ test_expect_success PERL 'process filter abort stops processing of all further f
 		cat >expected.log <<-EOF &&
 			START
 			init handshake complete
-			IN: smudge abort.r $SA [OK] -- OUT: 0 [ABORT]
+			IN: smudge abort.r $SA [OK] -- [ABORT]
 			STOP
 		EOF
 		test_cmp_exclude_clean expected.log debug.log &&
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 0b943bb377..5e43faeec1 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -153,9 +153,6 @@ while (1) {
 		die "bad command '$command'";
 	}

-	print $debug "OUT: " . length($output) . " ";
-	$debug->flush();
-
 	if ( $pathname eq "error.r" ) {
 		print $debug "[ERROR]\n";
 		$debug->flush();
@@ -178,6 +175,9 @@ while (1) {
 		die "${command} write error";
 	}

+	print $debug "OUT: " . length($output) . " ";
+	$debug->flush();
+
 	while ( length($output) > 0 ) {
 		my $packet = substr( $output, 0, $MAX_PACKET_CONTENT_SIZE );
 		packet_bin_write($packet);
--
2.12.2
[PATCH v3 0/4] convert: add "status=delayed" to filter process protocol
Hi,

in v3 "delay filter" became a series. Patch 1 to 3 are minor t0021
test adjustments and patch 4 is the actual change.

Most significant change since v2: If the filter delays a blob, then
Git sends the filter a "delay-id". Git uses this "delay-id" as index
in an array of delayed "cached entries". When Git requests a
previously delayed blob then it will only send the "delay-id" to
identify the blob. The actual blob content will not be sent to the
filter again.

If you review this series then please read the "Delay" section in
"Documentation/gitattributes.txt" first for an overview of the delay
mechanism. The changes in "t/t0021/rot13-filter.pl" are easier to
review if you ignore whitespace changes.

Thanks,
Lars

RFC: http://public-inbox.org/git/d10f7c47-14e8-465b-8b7a-a09a1b28a...@gmail.com/
v1: http://public-inbox.org/git/20170108191736.47359-1-larsxschnei...@gmail.com/
v2: http://public-inbox.org/git/20170226184816.30010-1-larsxschnei...@gmail.com/

Base Ref: v2.12.0
Web-Diff: https://github.com/larsxschneider/git/commit/08a461f103
Checkout: git fetch https://github.com/larsxschneider/git filter-process/delay-v3 && git checkout 08a461f103

Interdiff (v2..v3):

diff --git a/Documentation/gitattributes.txt b/Documentation/gitattributes.txt
index f6bad8db40..329baa945f 100644
--- a/Documentation/gitattributes.txt
+++ b/Documentation/gitattributes.txt
@@ -425,8 +425,8 @@ packet: git< capability=clean
 packet: git< capability=smudge
 packet: git< 0000
-Supported filter capabilities in version 2 are "clean" and
-"smudge".
+Supported filter capabilities in version 2 are "clean", "smudge",
+and "delay".

 Afterwards Git sends a list of "key=value" pairs terminated with
 a flush packet. The list will contain at least the filter command
@@ -473,15 +473,6 @@ packet: git< 0000 # empty content!
 packet: git< 0000 # empty list, keep "status=success" unchanged!

-If the request cannot be fulfilled within a reasonable amount of time
-then the filter can respond with a "delayed" status and a flush packet.
-Git will perform the same request at a later point in time, again. The
-filter can delay a response multiple times for a single request.
-
-packet: git< status=delayed
-packet: git< 0000
-
-
 In case the filter cannot or does not want to process the content,
 it is expected to respond with an "error" status.
@@ -521,12 +512,77 @@ the protocol then Git will stop the filter process and restart it
 with the next file that needs to be processed. Depending on the
 `filter.<driver>.required` flag Git will interpret that as error.

-After the filter has processed a blob it is expected to wait for
-the next "key=value" list containing a command. Git will close
+After the filter has processed a command it is expected to wait for
+a "key=value" list containing the next command. Git will close
 the command pipe on exit. The filter is expected to detect EOF
 and exit gracefully on its own. Git will wait until the filter
 process has stopped.

+Delay
+^^^^^
+
+If the filter supports the "delay" capability, then Git can send the
+flag "delay-able" after the filter command and pathname. This flag
+denotes that the filter can delay filtering the current blob (e.g. to
+compensate network latencies) by responding with no content but with
+the status "delayed" and a flush packet. Git will answer with a
+"delay-id", a number that identifies the blob, and a flush packet. The
+filter acknowledges this number with a "success" status and a flush
+packet.
+
+packet: git> command=smudge
+packet: git> pathname=path/testfile.dat
+packet: git> delay-able=1
+packet: git> 0000
+packet: git> CONTENT
+packet: git> 0000
+packet: git< status=delayed
+packet: git< 0000
+packet: git> delay-id=1
+packet: git> 0000
+packet: git< status=success
+packet: git< 0000
+
+
+If the filter supports the "delay" capability then it must support the
+"list_available_blobs" command. If Git sends this command, then the
+filter is expected to return a list of "delay_ids" of blobs that are
+available. The list must be terminated with a flush packet followed
+by a "success" status that is also terminated with a flush packet. If
+no blobs for the delayed paths are available, yet, then the filter is
+expected to block the response until at least one blob becomes
+available. The filter can tell Git that it has no more delayed blobs
+by sending an empty list.
+
+packet: git> command=list_available_blobs
+packet: git> 0000
+packet: git< 7
+packet: git< 13
+packet: git< 0000
+packet: git< status=success
+packet: git< 0000
+
+
[PATCH 0/2] test: Detect *lots* of bugs by adding non-alnum to trash dir names
There's a patch now on the ML for an issue with read vs. read -r in
submodule code, revealed when the submodule name contains a \. That
fix is fine, but we should do better and structurally arrange our
tests to detect these sorts of issues before they're reported.

This series changes the test library so that we interpolate the result
of:

    perl -e 'print join q[], grep { /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f'

into the trash directory name we generate. Doing this makes 30% of the
test suite fail. Most of the diffstat below is just adding:

    test_fails_on_unusual_directory_names=1

to those tests that failed, which makes them not use these garbage
trash directory names.

Some of those failures are legitimate bugs, some are an artifact of us
using shell scripting as our test language (namely interpolating ' and
" somewhere). It might make sense to mark these tests more granularly,
e.g.:

    test_fails_on_unusual_directory_name_types=quotes

etc., if the test fails just because of " or ', but I think as it is
this makes sense for inclusion; it makes sure we don't regress on the
remaining 70% of our test suite.
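For readers more comfortable with Python than Perl, the one-liner above amounts to the following (an assumed equivalent for illustration, not the code the series uses):

```python
def unusual_characters():
    """Every ASCII character in 0x01..0x7f that is non-alphanumeric,
    except '.' (already in the trash dir name) and '/' (path separator),
    mirroring: grep { /[^[:alnum:]]/ and !m<[./]> } map chr, 0x01..0x7f"""
    return "".join(
        chr(c) for c in range(0x01, 0x80)
        if not chr(c).isalnum() and chr(c) not in "./"
    )
```

Interpolating this string into a directory name exercises control characters, quotes, globs, and whitespace all at once, which is exactly why so many tests start failing.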
Ævar Arnfjörð Bjarmason (2):
  tests: mark tests that fail when the TEST_DIRECTORY is unusual
  test-lib: exhaustively insert non-alnum ASCII into the TRASH_DIRECTORY name

 t/README | 12
 t/t0000-basic.sh | 2 ++
 t/t0003-attributes.sh | 1 +
 t/t0021-conversion.sh | 1 +
 t/t0060-path-utils.sh | 1 +
 t/t0300-credentials.sh | 1 +
 t/t0302-credential-store.sh | 1 +
 t/t1006-cat-file.sh | 1 +
 t/t1013-read-tree-submodule.sh | 1 +
 t/t1020-subdirectory.sh | 1 +
 t/t1050-large.sh | 1 +
 t/t1060-object-corruption.sh | 1 +
 t/t1300-repo-config.sh | 1 +
 t/t1305-config-include.sh | 1 +
 t/t1308-config-set.sh | 1 +
 t/t1309-early-config.sh | 1 +
 t/t1500-rev-parse.sh | 1 +
 t/t1504-ceiling-dirs.sh | 1 +
 t/t1507-rev-parse-upstream.sh | 1 +
 t/t1510-repo-setup.sh | 1 +
 t/t1515-rev-parse-outside-repo.sh | 1 +
 t/t2013-checkout-submodule.sh | 1 +
 t/t2025-worktree-add.sh | 1 +
 t/t2027-worktree-list.sh | 1 +
 t/t2028-worktree-move.sh | 1 +
 t/t3040-subprojects-basic.sh | 1 +
 t/t3050-subprojects-fetch.sh | 1 +
 t/t3409-rebase-preserve-merges.sh | 1 +
 t/t3426-rebase-submodule.sh | 1 +
 t/t3501-revert-cherry-pick.sh | 1 +
 t/t3512-cherry-pick-submodule.sh | 1 +
 t/t3513-revert-submodule.sh | 1 +
 t/t3600-rm.sh | 1 +
 t/t3900-i18n-commit.sh | 1 +
 t/t3906-stash-submodule.sh | 1 +
 t/t4027-diff-submodule.sh | 1 +
 t/t4030-diff-textconv.sh | 1 +
 t/t4031-diff-rewrite-binary.sh | 1 +
 t/t4035-diff-quiet.sh | 1 +
 t/t4041-diff-submodule-option.sh | 1 +
 t/t4059-diff-submodule-not-initialized.sh | 1 +
 t/t4060-diff-submodule-option-diff-format.sh | 1 +
 t/t4137-apply-submodule.sh | 1 +
 t/t4203-mailmap.sh | 1 +
 t/t4207-log-decoration-colors.sh | 1 +
 t/t4255-am-submodule.sh | 1 +
 t/t5000-tar-tree.sh | 1 +
 t/t5001-archive-attr.sh | 1 +
 t/t5002-archive-attr-pattern.sh | 1 +
 t/t5003-archive-zip.sh | 1 +
 t/t5150-request-pull.sh | 1 +
 t/t5300-pack-object.sh | 1 +
 t/t5304-prune.sh | 1 +
 t/t5305-include-tag.sh | 1 +
 t/t5306-pack-nobase.sh | 1 +
 t/t5310-pack-bitmaps.sh | 1 +
 t/t5311-pack-bitmaps-shallow.sh | 1 +
 t/t5400-send-pack.sh | 1 +
 t/t5401-update-hooks.sh | 1 +
 t/t5402-post-merge-hook.sh | 1 +
 t/t5403-post-checkout-hook.sh | 1 +
 t/t5404-tracking-branches.sh | 1 +
 t/t5406-remote-rejects.sh | 1 +
 t/t5407-post-rewrite-hook.sh | 1 +
Re: [PATCH v2] convert: add "status=delayed" to filter process protocol
> On 27 Feb 2017, at 23:11, Jakub Narębski wrote:
>
> On 27.02.2017 at 11:32, Lars Schneider wrote:
>>
>>> On 27 Feb 2017, at 10:58, Jeff King wrote:
>>>
>>> On Sun, Feb 26, 2017 at 07:48:16PM +0100, Lars Schneider wrote:
>>>
>>>> +If the request cannot be fulfilled within a reasonable amount of time
>>>> +then the filter can respond with a "delayed" status and a flush packet.
>>>> +Git will perform the same request at a later point in time, again. The
>>>> +filter can delay a response multiple times for a single request.
>>>> +
>>>> +packet: git< status=delayed
>>>> +packet: git< 0000
>>>> +
>
> Is it something that happens instead of filter process sending the contents

Correct! I'll clarify this in v3!

>>
>> I completely agree - I need to change that. However, the goal of the v2
>> iteration was to get the "convert" interface in an acceptable state.
>> That's what I intended to say in the patch comment section:
>>
>>    "Please ignore all changes behind async_convert_to_working_tree() and
>>     async_filter_finish() for now as I plan to change the implementation
>>     as soon as the interface is in an acceptable state."
>
> I think that it is more important to start with a good abstraction,
> and the proposal for protocol, rather than getting bogged down in
> implementation details that may change as the idea for protocol
> extension changes.

I'll send out v3 shortly as proposal for a complete solution.

>>> I think it would be much more efficient to do something like:
>>>
>>>   [Git issues a request and gives it an opaque index id]
>>>   git> command=smudge
>>>   git> pathname=foo
>>>   git> index=0
>>>   git> 0000
>>>   git> CONTENT
>>>   git> 0000
>>>
>>>   [The data isn't ready yet, so the filter tells us so...]
>>>   git< status=delayed
>>>   git< 0000
>
> So is it only as replacement for "status=success" + contents or
> "status=abort", that is upfront before sending any part of the file?

Yes.

> Or, as one can assume from the point of the paragraph with the
> "status=delayed", it is about replacing null list for success or
> "status=error" after sending some part (maybe empty) of a file,
> that is:

No. As this would complicate things I don't want to support it
(and I clarified that in the docs in v3).

> If it would not be undue burden on the filter driver process, we might
> require for it to say where to continue at (in bytes), e.g.
>
>    git< from=16426
>
> That should, of course, go below index/pathname line.

This would make the protocol even more complicated. That's why I don't
want to support splitting the response.

>>> git< index=0
>
> Or a filter driver could have used pathname as an index, that is
>
>    git< pathname=path/testfile.dat

In v3 I've used an index to help Git find the right cache entry
quickly.

>>> git< 0000
>>> git< CONTENT
>>> git< 0000
>>>
>>> From Git's side, the loop is something like:
>>>
>>>   while (delayed_items > 0) {
>>>           /* issue a wait, and get back the status/index pair */
>>>           status = send_wait();
>>>           delayed_items--;
>
> This looks like my 'event loop' proposal[1][2], see below.

I implemented something similar in v3.

>> That could work! I had something like that in mind:
>>
>> I teach Git a new command "list_completed" or similar. The filter
>> blocks this call until at least one item is ready for Git.
>> Then the filter responds with a list of paths that identify the
>> "ready items". Then Git asks for these ready items just with the
>> path and not with any content. Could that work? Wouldn't the path
>> be "unique" to identify a blob per filter run?
>
> Why in the "drain" phase it is still Git that needs to ask filter for
> contents, one file after another? Wouldn't it be easier and simpler
> for filter to finish sending contents, and send signal that it has
> finished continue'ing?
>
> To summarize my earlier emails, current proposal looks for me as if
> it were a "busy loop" solution, that is[2]:

In v3 the implementation still uses kind of a busy loop (I expect the
filter to block if there is nothing ready, yet). An event loop would
complicate the protocol as the filter would need to initiate an
action. Right now only Git initiates actions.

> Footnotes:
> --
> a) We don't send the Git-side contents of blob again, isn't it?
>    So we need some protocol extension / new understanding anyway.
>    for example that we don't send contents if we request path again.

Correct - v3 doesn't send the content again.

> Also, one thing that we need to be solved, assuming that the proposed
> extension allows to send partial data from filter to be delayed and
> continued later, is that Git needs to keep this partial response in buf;
> this is because of precedence of gitattributes applying:

As mentioned above I don't want to support partial data as this
complicates things and is of no use for my Git LFS problem case.
Re: [PATCH v2] convert: add "status=delayed" to filter process protocol
> On 27 Feb 2017, at 11:53, Jeff King wrote:
>
> On Mon, Feb 27, 2017 at 11:32:47AM +0100, Lars Schneider wrote:
>
>> ...
>
>>> From Git's side, the loop is something like:
>>>
>>>   while (delayed_items > 0) {
>>>           /* issue a wait, and get back the status/index pair */
>>>           status = send_wait();
>>>           delayed_items--;
>>>
>>>           /*
>>>            * use "index" to find the right item in our list of files;
>>>            * the format can be opaque to the filter, so we could index
>>>            * it however we like. But probably numeric indices in an array
>>>            * are the simplest.
>>>            */
>>>           assert(index > 0 && index < nr_items);
>>>           item[index].status = status;
>>>           if (status == SUCCESS)
>>>                   read_content(&item[index]);
>>>   }
>>>
>>> and the filter side just attaches the "index" string to whatever its
>>> internal queue structure is, and feeds it back verbatim when processing
>>> that item finishes.
>>
>> That could work! I had something like that in mind:
>>
>> I teach Git a new command "list_completed" or similar. The filter
>> blocks this call until at least one item is ready for Git.
>> Then the filter responds with a list of paths that identify the
>> "ready items". Then Git asks for these ready items just with the
>> path and not with any content. Could that work? Wouldn't the path
>> be "unique" to identify a blob per filter run?
>
> I think that could work, though I think there are few minor downsides
> compared to what I wrote above:
>
> - if you respond with "these items are ready", and then make Git ask
>   for each again, it's an extra round-trip for each set of ready
>   items. You could just say "an item is ready; here it is" in a single
>   response. For a local pipe the latency is probably negligible,
>   though.

It is true that the extra round-trip is not strictly necessary but I
think it simplifies the protocol/the code as I can reuse the convert
machinery as is.

> - using paths as the index would probably work, but it means Git has
>   to use the path to find the "struct checkout_entry" again. Which
>   might mean a hashmap (though if you have them all in a sorted list,
>   I guess you could also do a binary search).

Agreed. I changed my implementation to use an index following your
suggestion.

> - Using an explicit index communicates to the filter not only what the
>   index is, but also that Git is prepared to accept a delayed response
>   for the item. For backwards compatibility, the filter would probably
>   advertise "I have the 'delayed' capability", and then Git could
>   choose to use it or not on a per-item basis. Realistically it would
>   not change from item to item, but rather operation to operation. So
>   that means we can easily convert the call-sites in Git to the async
>   approach incrementally. As each one is converted, it turns on the
>   flag that causes the filter code to send the "index" tag.

Agreed. I changed the implementation accordingly and I will send out
the patches shortly.

Thanks,
Lars
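The C-like pseudocode quoted above can be restated as a runnable sketch. In the hypothetical Python version below, `send_wait` and `read_content` are stand-ins for the pkt-line I/O, and the `items` list plays the role of the array of checkout entries:

```python
def drain_delayed(items, send_wait, read_content):
    """Git-side drain loop: wait for the filter to announce a finished
    item, look it up by the opaque index it echoed back, and read its
    content on success."""
    pending = sum(1 for it in items if it["status"] == "delayed")
    while pending > 0:
        status, index = send_wait()   # filter feeds the index back verbatim
        pending -= 1
        assert 0 <= index < len(items)
        items[index]["status"] = status
        if status == "success":
            items[index]["content"] = read_content(index)
    return items
```

The point of the index is exactly what the email says: Git never has to map a path back to a checkout entry, it just indexes into the array.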
Re: [PATCH] Make git log work for git CWD outside of work tree
On 4/9/2017 5:54 AM, Johannes Schindelin wrote:
> Hi Danny,
>
> On Sat, 8 Apr 2017, Danny Sauer wrote:
>> diff --git a/git.c b/git.c
>> index 8ff44f0..e147f01 100644
>> --- a/git.c
>> +++ b/git.c
>> @@ -440,7 +440,7 @@ static struct cmd_struct commands[] = {
>> 	{ "init", cmd_init_db },
>> 	{ "init-db", cmd_init_db },
>> 	{ "interpret-trailers", cmd_interpret_trailers, RUN_SETUP_GENTLY },
>> -	{ "log", cmd_log, RUN_SETUP },
>> +	{ "log", cmd_log, RUN_SETUP | NEED_WORK_TREE },
>> 	{ "ls-files", cmd_ls_files, RUN_SETUP | SUPPORT_SUPER_PREFIX },
>> 	{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
>> 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },
>
> This may work for you, but it does not work for me, as I often call `git
> log` in a bare repository. And that call works, and it should keep
> working.

I was wondering about that situation, but honestly I wasn't positive
how to test it and sort of just went on the hope that the
read_mailmap_blob mechanism would continue handling it. So, I guess
there's another test case which would be useful, as passing the test
cases is what gave me that undeserved confidence. :) And I'm going to
need to read a bit more to figure out what a bare repository actually
is...

> Instead, I think, you need to figure out why the .mailmap file is not read
> correctly when you use the GIT_DIR & GIT_WORK_TREE approach. My vague
> hunch is that you need to replace the ".mailmap" in read_mailmap()
> (defined in mailmap.c):
>
> 	err |= read_mailmap_file(map, ".mailmap", repo_abbrev);
>
> by something like mkpath("%s/%s", get_git_work_tree(), ".mailmap"), but
> probably only after testing that we're not in a bare repository (which
> would also fix a bug, I suspect, as `git log` in a bare repository
> probably heeds a .mailmap file in the current directory, which is
> incorrect). I.e. something like:
>
> 	if (!is_bare_repository()) {
> 		const char *path = mkpath("%s/%s",
> 					  get_git_work_tree(), ".mailmap");
> 		err |= read_mailmap_file(map, path, repo_abbrev);
> 	}

That's feedback I was looking for - what's the "right" way to get the
path to the file. The call as-is works in any subdirectory of a
repository, because there's some logic buried along the way which puts
you in the top level of the repo if you started underneath. But if
you're outside, it does the same chdir to verify that the top level is
a real repo and then returns you to where you started (which is
perfectly reasonable).

In comparing how shortlog works vs. log, the main difference I could
see is the use of RUN_SETUP_GENTLY instead of RUN_SETUP as the flag,
which itself seems to only differ by passing in a true value for
nongit_ok in run_setup, ultimately leading to "some different
directory checks" - which I somewhat got lost in due partially to not
fully knowing what a bare repository is. :) But changing that didn't
make it work in my (poorly-formalized) testing.

I'm on-board with a proper test case being the first situation which
needs rectifying. It's off to learn stuff for me, then. Thanks!

--Danny
Unexpected working directory in post-receive hook in non-bare repository
Hello,

The following snippet reproduces the issue for me (note the remote:
line in its output):

    git --version
    rm -rf a b
    git init a
    cd a
    echo first >data
    git add data
    git commit -m initial
    cat >>.git/hooks/post-receive <>data
    git add data
    git commit -m test
    git push origin master:not-master

According to man githooks, "Before Git invokes a hook, it changes its
working directory to either the root of the working tree in a non-bare
repository, [...]". In this case "a" is non-bare and I expected the
command to be run in the working tree; but instead it's run inside
.git.

(This caused some confusion in my case because I ran "git merge" in
the hook which put files in the .git directory and I didn't notice it
at first. I know running merge in receive-hooks is "bad practice" but
it works fine in my setup.)

The same happens for all hooks executed by git-receive-pack:
pre-receive, update, post-receive, post-update.

Is this a documentation issue or unexpected behavior?

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9
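One way to observe the behavior Simon describes is a hook that simply records the directory Git ran it in. The snippet below is a hypothetical Python post-receive hook (not from the report); per the githooks man page one would expect the root of the working tree for a non-bare repository, while the report observes the .git directory instead.

```python
#!/usr/bin/env python3
# Hypothetical post-receive hook: record where git-receive-pack ran us.
import os
import sys

def log_hook_cwd(logfile):
    """Append the hook's working directory to logfile and return it."""
    cwd = os.getcwd()
    with open(logfile, "a") as f:
        f.write("post-receive ran in: %s\n" % cwd)
    return cwd

if __name__ == "__main__":
    # Writing to stderr makes the path show up as a "remote:" line
    # on the pushing side.
    sys.stderr.write("post-receive ran in: %s\n" % os.getcwd())
```

Installed as `.git/hooks/post-receive` (and made executable), this would print the hook's working directory as a `remote:` line during a push.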
Re: [PATCH] push: document & test --force-with-lease with multiple remotes
On Sun, Apr 9, 2017 at 11:55 AM, Simon Ruderich wrote:
> Hello,
>
> I like the documentation update and test.
>
> On Sat, Apr 08, 2017 at 11:41:00AM +0000, Ævar Arnfjörð Bjarmason wrote:
>> [snip]
>>
>> ++
>> +Now when the background process runs `git fetch origin` the references
>> +on `origin-push` won't be updated, and thus commands like:
>> ++
>> +	git push --force-with-lease origin
>
> I think this should be origin-push.

Thanks! Well spotted. Will fix in v2, which I'll hold off on sending
for now pending more comments.

> Regards
> Simon
> --
> + privacy is necessary
> + using gnupg http://gnupg.org
> + public key id: 0x92FEFDB7E44C32F9
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Jacob Keller wrote:

> Agreed. You "take" a lease whenever you push to the remote or when you
> pull from the remote and when you pull into the branch. It should
> store something that tracks both the branch and remote branch together
> so that you can generalize it to multiple remotes.

I don't see why it has to support multiple remotes (but then I don't
have much experience with workflows involving multiple remotes, so I
may well be missing something). A local branch can only have one
remote tracking branch on one remote, and in my view
--force-with-lease without arguments works with that remote tracking
branch only. Is this view too simple?

> It doesn't necessarily track perfectly with a branch that contains
> extra work such as when doing pull --rebase, but maybe you have an
> idea about that?

Maybe I wasn't clear enough about that in my proposal, but I propose
to always store the commit hash of the remote tracking branch as a new
lease after push and pull, not the local branch. This way it works
nicely with pull --rebase and a branch that has extra local commits.

-- 
Stefan Haller
Ableton
http://www.ableton.com/
Re: [PATCH] Make git log work for git CWD outside of work tree
Hi Danny,

On Sat, 8 Apr 2017, Danny Sauer wrote:

> Make git log's `--use-mailmap` argument works if the GIT_DIR &
> GIT_WORK_TREE env vars are set and git is run from outside of work tree.
> Without the NEED_WORK_TREE set on the log subcommand, .mailmap is
> silently not found.

A laudable goal. How about adding a test case, say, to
t/t4203-mailmap.sh, to ensure that Git won't regress on your fix?

> diff --git a/git.c b/git.c
> index 8ff44f0..e147f01 100644
> --- a/git.c
> +++ b/git.c
> @@ -440,7 +440,7 @@ static struct cmd_struct commands[] = {
> 	{ "init", cmd_init_db },
> 	{ "init-db", cmd_init_db },
> 	{ "interpret-trailers", cmd_interpret_trailers, RUN_SETUP_GENTLY },
> -	{ "log", cmd_log, RUN_SETUP },
> +	{ "log", cmd_log, RUN_SETUP | NEED_WORK_TREE },
> 	{ "ls-files", cmd_ls_files, RUN_SETUP | SUPPORT_SUPER_PREFIX },
> 	{ "ls-remote", cmd_ls_remote, RUN_SETUP_GENTLY },
> 	{ "ls-tree", cmd_ls_tree, RUN_SETUP },

This may work for you, but it does not work for me, as I often call
`git log` in a bare repository. And that call works, and it should
keep working.

Instead, I think, you need to figure out why the .mailmap file is not
read correctly when you use the GIT_DIR & GIT_WORK_TREE approach. My
vague hunch is that you need to replace the ".mailmap" in
read_mailmap() (defined in mailmap.c):

	err |= read_mailmap_file(map, ".mailmap", repo_abbrev);

by something like mkpath("%s/%s", get_git_work_tree(), ".mailmap"),
but probably only after testing that we're not in a bare repository
(which would also fix a bug, I suspect, as `git log` in a bare
repository probably heeds a .mailmap file in the current directory,
which is incorrect). I.e. something like:

	if (!is_bare_repository()) {
		const char *path = mkpath("%s/%s",
					  get_git_work_tree(), ".mailmap");
		err |= read_mailmap_file(map, path, repo_abbrev);
	}

But you really want to add the test case first, with
`test_expect_failure`, to demonstrate what is currently broken, and
then triumphantly setting it to `test_expect_success` with your patch.

Ciao,
Johannes
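The path-resolution rule Johannes sketches in C boils down to very little logic. Here it is as a Python sketch for illustration (the function name and structure are mine, not git's):

```python
import os

def mailmap_path(work_tree, bare):
    """Where to read .mailmap from: the top of the work tree in a
    non-bare repository, nowhere (no work-tree file) in a bare one.
    Sketch of the suggested fix, not actual git code."""
    if bare:
        return None  # a bare repo has no work tree to read .mailmap from
    return os.path.join(work_tree, ".mailmap")
```

This captures both halves of the suggestion: use `get_git_work_tree()` instead of the current directory, and skip the file entirely when the repository is bare.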
Re: [PATCH] push: document & test --force-with-lease with multiple remotes
Hello,

I like the documentation update and test.

On Sat, Apr 08, 2017 at 11:41:00AM +0000, Ævar Arnfjörð Bjarmason wrote:
> [snip]
>
> ++
> +Now when the background process runs `git fetch origin` the references
> +on `origin-push` won't be updated, and thus commands like:
> ++
> +	git push --force-with-lease origin

I think this should be origin-push.

Regards
Simon
-- 
+ privacy is necessary
+ using gnupg http://gnupg.org
+ public key id: 0x92FEFDB7E44C32F9
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Jeff King wrote:

> > It might be possible to generate these lease tags prior to operations
> > which modify history and then maybe having a way to list them so you
> > can select which one you meant when you try to use force-with-lease..
>
> So yeah, I think that is the more interesting direction. I hadn't
> considered resolving the multiple-operation ambiguity at push time. But
> I guess it would be something like "you did a rebase on sha1 X at time
> T, and then one on Y at time T+N", and you pick which one you're
> expecting.

I think it's wrong to think about these leases as something that you
take before you start a rewindy operation. That's the wrong time to take
the lease; by that time, the remote tracking branch may already contain
new things that you haven't seen yet, so using it as a lease at that
time will overwrite those things later. You have to take the lease at a
time when you know that your local branch and the remote tracking
branch are up to date with each other, which is after pull and push. And
if you do that, there's no multiple-operation ambiguity to deal with at
all.

> And I think that may be converging on the "integrate" refs that Stefan
> is talking about elsewhere (or some isomorphism of it).

Does it make things clearer if we don't use the term "integrate", but
call the config value in my proposal simply "branch.*.lease"?

--
Stefan Haller
Ableton
http://www.ableton.com/
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sun, Apr 9, 2017 at 1:38 AM, Stefan Haller wrote:
> Jeff King wrote:
>
>> > It might be possible to generate these lease tags prior to operations
>> > which modify history and then maybe having a way to list them so you
>> > can select which one you meant when you try to use force-with-lease..
>>
>> So yeah, I think that is the more interesting direction. I hadn't
>> considered resolving the multiple-operation ambiguity at push time. But
>> I guess it would be something like "you did a rebase on sha1 X at time
>> T, and then one on Y at time T+N", and you pick which one you're
>> expecting.
>
> I think it's wrong to think about these leases as something that you
> take before you start a rewindy operation. That's the wrong time to take
> the lease; by that time, the remote tracking branch may already contain
> new things that you haven't seen yet, so using that as a lease at that
> time will overwrite those things later. You have to take the lease at a
> time where you know that your local branch and the remote tracking
> branch are up to date with each other, which is after pull and push. And
> if you do that, there's no multiple-operation ambiguity to deal with at
> all.

Agreed. You "take" a lease whenever you push to the remote or whenever
you pull from the remote into the branch. It should store something that
tracks the branch and the remote branch together, so that you can
generalize it to multiple remotes. It doesn't necessarily track
perfectly with a branch that contains extra work, such as when doing
pull --rebase, but maybe you have an idea about that?

Thanks,
Jake

>
>> And I think that may be converging on the "integrate" refs that Stefan
>> is talking about elsewhere (or some isomorphism of it).
>
> Does it make things clearer if we don't use the term "integrate", but
> call the config value in my proposal simply "branch.*.lease"?
>

Yes, I think so.

>
> --
> Stefan Haller
> Ableton
> http://www.ableton.com/
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
On Sun, Apr 9, 2017 at 1:38 AM, Stefan Haller wrote:
> Jacob Keller wrote:
>
>> What if we added a separate command something like:
>>
>> git create-lease
>>
>> which you're expected to run at the start of a rewind-y operation and
>> it creates a tag (or some other ref like a tag but in a different
>> namespace) which is used by force-with-lease?
>
> The problem with this is that it doesn't help to use "git create-lease"
> right before you start your rewind-y operation, because by that time you
> may already have fetched. You'd have to use "git create-lease" right
> after you pull or push. But at the time I pull I don't know yet whether
> I will later want to rewrite the branch, so to be sure I have to do this
> every time I pull or push, and then I'd prefer git to do it for me.
>

No, you don't set the sha1 as the tip of "origin/master"; you set it as
the tip of "master" after you've performed all the integration and are
about to rewind history somehow.

>> However, I think using origin/master works fine as long as you don't
>> auto-fetch.
>>
>> If you're doing it right, you can handle origin/master updates by
>> checking that your rewind-y stuff is correct for the new origin/master
>> RIGHT before you push.
>
> I'm not sure I understand what you mean by "checking that your rewind-y
> stuff is correct for the new origin/master"; does that mean manually
> inspecting origin/master to convince yourself that you are not
> overwriting something new? If so, I don't think this is acceptable. It
> is probably ok to work this way if the other party only pushed commits
> on top; it's reasonable to expect that you will recognize new commits as
> ones that you haven't seen before. But what if the other party has
> rewritten the branch and squashed improvements into commits in the
> middle of it? The head commit will then look the same as before, and the
> only way to tell whether you are overwriting something new is by
> comparing the old and new hashes. So then we're back at having to
> remember what the old hash was.
>

You can do a diff rather than a check of the log.

Thanks,
Jake

>
> --
> Stefan Haller
> Berlin, Germany
> http://www.haller-berlin.de/
Re: Tools that do an automatic fetch defeat "git push --force-with-lease"
Jacob Keller wrote:

> What if we added a separate command something like:
>
> git create-lease
>
> which you're expected to run at the start of a rewind-y operation and
> it creates a tag (or some other ref like a tag but in a different
> namespace) which is used by force-with-lease?

The problem with this is that it doesn't help to use "git create-lease"
right before you start your rewind-y operation, because by that time you
may already have fetched. You'd have to use "git create-lease" right
after you pull or push. But at the time I pull I don't know yet whether
I will later want to rewrite the branch, so to be sure I have to do this
every time I pull or push, and then I'd prefer git to do it for me.

> However, I think using origin/master works fine as long as you don't
> auto-fetch.
>
> If you're doing it right, you can handle origin/master updates by
> checking that your rewind-y stuff is correct for the new origin/master
> RIGHT before you push.

I'm not sure I understand what you mean by "checking that your rewind-y
stuff is correct for the new origin/master"; does that mean manually
inspecting origin/master to convince yourself that you are not
overwriting something new? If so, I don't think this is acceptable. It
is probably ok to work this way if the other party only pushed commits
on top; it's reasonable to expect that you will recognize new commits as
ones that you haven't seen before. But what if the other party has
rewritten the branch and squashed improvements into commits in the
middle of it? The head commit will then look the same as before, and the
only way to tell whether you are overwriting something new is by
comparing the old and new hashes. So then we're back at having to
remember what the old hash was.

--
Stefan Haller
Berlin, Germany
http://www.haller-berlin.de/
Re: why patch to the gitk no replay?
My patch submission (already a retry), "[PATCH 0/4] Gitk Inotify
support", has also been pending since the 8th of March. I'm also not
sure what I did wrong, as I'm not used to this mailing-list style of
development, or whether Paul is just very busy.

On Fri, Mar 17, 2017 at 5:53 PM, Jonathan Nieder wrote:
> Hi,
>
> yanke131415 wrote:
>
>> I sent a patch to the gitk project (git://ozlabs.org/~paulus/gitk)
>> with the target email address pau...@ozlabs.org. But several days
>> later I have received no reply to this patch; is the email address I
>> sent the patch to wrong? Anyone who knows?
>
> Sending to this mailing list (and cc-ing Paul) is more likely to work.
>
> Hope that helps,
> Jonathan