Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines
From: Junio C Hamano gits...@pobox.com Subject: Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines Date: Fri, 07 Nov 2014 12:27:03 -0800 Christian Couder chrisc...@tuxfamily.org writes: * Copyright (c) 2013, 2014 Christian Couder chrisc...@tuxfamily.org @@ -791,14 +792,24 @@ static int process_input_file(struct strbuf **lines, struct trailer_item **in_tok_last) { int count = 0; -int patch_start, trailer_start, i; +int trailer_start, trailer_end, patch_start, ignore_bytes, i; +struct strbuf sb; /* Get the line count */ while (lines[count]) count++; patch_start = find_patch_start(lines, count); -trailer_start = find_trailer_start(lines, patch_start); + +/* Get the index of the end of the trailers */ +for (i = 0; i patch_start; i++) +strbuf_addbuf(sb, lines[i]); +ignore_bytes = ignore_non_trailer(sb); +for (i = patch_start - 1; i = 0 ignore_bytes 0; i--) +ignore_bytes -= lines[i]-len; +trailer_end = i + 1; Looks like there is an impedance mismatch between the caller and the callee here. I can sort of understand why you might want to have an array of trailer items, one element per line in the trailer block, as that would make it easier on your brain when you have to reorder them, insert a new one or a remove an existing one, but you seem to be keeping _everything_ in that format, an array of strbuf with one strbuf per line, which sounds really wasteful. The data structure might need to be rethoughtbefore this code becomes ready for production. I would have expected it to be more like (1) slurp everything in a single strbuf, (2) find the trailer block inside that single strbuf, splitting what you read in the previous step into one strbuf for stuff before the trailer block, another strbuf for stuff after the trailer block and an array of lines in the tailer block, (3) do whatever your trailer flipping logic inside that array without having to worry about stuff before or after the trailer block and then finally (4) spit the whole thing out by concatenating the stuff before the trailer block, the stuff in the trailer block and the stuff after the trailer block. Oh well... I hope it is better now, as I encapsulated the call to ignore_non_trailer() into a new find_trailer_end() function. There were already find_trailer_start() and find_patch_start(), and I think this way we could have a nice high level line oriented API. Yeah, it won't be as efficient as using only one strbuf and only byte oriented functions, and it looks much less manly too :-) But over time in Git we have developed a number of less efficient but quite clean abstractions like strbuf, argv_array, sha1_array and so on, that we are quite happy with. So yeah, with old age we might have lost some manliness and lament the time when men were really men, and ... Well, let's wait for the hopefully big party^W conference next year to do that together :-) Thanks, Christian. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines
Christian Couder chrisc...@tuxfamily.org writes: Yeah, it won't be as efficient as using only one strbuf and only byte oriented functions, and it looks much less manly too :-) But over time in Git we have developed a number of less efficient but quite clean abstractions like strbuf, argv_array, sha1_array and so on, that we are quite happy with. Actually, all these examples you gave are fairly efficient and clean abstractions. I find it insulting to pretend that the one line per strbuf is in the same league. It isn't. And it is not about manliness. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines
Junio C Hamano gits...@pobox.com writes: Christian Couder chrisc...@tuxfamily.org writes: Yeah, it won't be as efficient as using only one strbuf and only byte oriented functions, and it looks much less manly too :-) But over time in Git we have developed a number of less efficient but quite clean abstractions like strbuf, argv_array, sha1_array and so on, that we are quite happy with. Actually, all these examples you gave are fairly efficient and clean abstractions. I find it insulting to pretend that the one line per strbuf is in the same league. It isn't. And it is not about manliness. By the way, I do not mean to say that all of strbuf (which is a rather full API) is uniformly efficient and cleanly abstracted. For example, I see strbuf_split() is a handy but a lousy invitation to stupid programmers to do stupid things and needs to be used carefully (rather, you need to carefully consider if a possible solution that uses it is really a good solution). If you start with a single strbuf (e.g. perhaps read from a file or a blob in bulk in an I/O efficient way), pass it around and then have to manipulate each and every line in it to massage the information on each line into other useful form (e.g. perhaps each line is a [+]src:dst refspec element and you are parsing it into an array of struct refspec), it may be tempting to write a parser from a single strbuf into a single refspec, use strbuf_split() to break the single strbuf input into multiple and feed each of them to your parser. But it does not have to be the only way to create such a caller. Unless the final other useful form is an array of strbuf, and you have to remember that the whole point of using strbuf is because it is easy to edit its contents efficiently in place, that needs to further be modified in various ways, the splitting of N lines into N strbuf, only to parse each line one by one, is a huge waste. A right approach may involve introducing a foreach_strbuf_line iterator function that takes a callback, or a macro of the same name that puts a control structure. Going back to the trailer processing, if you are likely to be editing each and every line in the input before producing the final result, an array of strbuf would be a perfectly sensible data structure to use. I just didn't get the impression that that is what is being done. You would be presented a single text with multiple lines (whose natural representation would be a ptr,len that fits in a single strbuf), you would be asked to inspect and manipulate one single section in the middle (namely, the runs of lines that begin with some keyword: ), and return the concatenation of the part before that one section (intact), the section you have manipulated (i.e. the trailer) and possibly the remainder (intact). Outside that single section you will be touching, I do not see a good reason for each of their lines to be first stored in a strbuf (while moving bunch of pointers in the array using ARRAY_GROW() and friends), only to be later concatenated back into a single string. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines
Junio C Hamano gits...@pobox.com writes: Junio C Hamano gits...@pobox.com writes: Christian Couder chrisc...@tuxfamily.org writes: Yeah, it won't be as efficient as using only one strbuf and only byte oriented functions, and it looks much less manly too :-) But over time in Git we have developed a number of less efficient but quite clean abstractions like strbuf, argv_array, sha1_array and so on, that we are quite happy with. Actually, all these examples you gave are fairly efficient and clean abstractions. I find it insulting to pretend that the one line per strbuf is in the same league. It isn't. And it is not about manliness. By the way, I do not mean to say that all of strbuf (which is a rather full API) is uniformly efficient and cleanly abstracted. ... Having said all that, please notice that I said might need to be rethought before the code becomes ready for production. After all, it would have perfectly been OK to experiment with this new topic in a script; e.g. this topic being text processing, it might have happened as a Perl script while we get the semantics and desirable features nailed down before rewriting the real thing in C. And if that were what happened here, I wouldn't have been talking about data structures and unnecessary complexity having to have one strbuf per line only to join them into one string (or printf() out to a stream one-after-another) later. In other words, I did not say this code is too ugly to live and we should clean it up as soon as possible or we should revert it from 'master'---it was a mistake to merge it. I consider the trailer code in 'master' today as experimental (in other words, something I might have written in a scripting language as a mock-up if I were doing it) that needs to be worked on further to still get the features and semantics right. And the patch series in this thread are hopefully taking us in the right direction to get them right [*1*]. You could have just said Yeah, I realize that the code is way suboptimal, but lets get it feature complete first and then clean it up, or something, instead of getting defensive with unnecessary excuses. [Footnote] *1* ... even though this step exposes why the approach taken, splitting the input lines first into individual lines without even looking at them, is a wrong one---if you started from a single buffer that is not yet split, it would have been much easier to find where the trailer block is, and that is why you are re-joining the buffer you have already split into lines (i.e. what I called impedance mismatch), which shows that the initial splitting is done prematurely. Hopefully you would realize that by now ;-). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines
Make sure we look for trailers before any conflict line by reusing the ignore_non_trailer() function. Signed-off-by: Christian Couder chrisc...@tuxfamily.org --- t/t7513-interpret-trailers.sh | 2 ++ trailer.c | 25 ++--- 2 files changed, 20 insertions(+), 7 deletions(-) diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh index 1efb880..fed053a 100755 --- a/t/t7513-interpret-trailers.sh +++ b/t/t7513-interpret-trailers.sh @@ -232,6 +232,8 @@ test_expect_success 'with message that has comments' ' Reviewed-by: Johan Cc: Peff + # last comment + EOF cat basic_patch expected git interpret-trailers --trim-empty --trailer Cc: Peff message_with_comments actual diff --git a/trailer.c b/trailer.c index f4d51ba..f6aa181 100644 --- a/trailer.c +++ b/trailer.c @@ -2,6 +2,7 @@ #include string-list.h #include run-command.h #include string-list.h +#include commit.h #include trailer.h /* * Copyright (c) 2013, 2014 Christian Couder chrisc...@tuxfamily.org @@ -791,14 +792,24 @@ static int process_input_file(struct strbuf **lines, struct trailer_item **in_tok_last) { int count = 0; - int patch_start, trailer_start, i; + int trailer_start, trailer_end, patch_start, ignore_bytes, i; + struct strbuf sb; /* Get the line count */ while (lines[count]) count++; patch_start = find_patch_start(lines, count); - trailer_start = find_trailer_start(lines, patch_start); + + /* Get the index of the end of the trailers */ + for (i = 0; i patch_start; i++) + strbuf_addbuf(sb, lines[i]); + ignore_bytes = ignore_non_trailer(sb); + for (i = patch_start - 1; i = 0 ignore_bytes 0; i--) + ignore_bytes -= lines[i]-len; + trailer_end = i + 1; + + trailer_start = find_trailer_start(lines, trailer_end); /* Print lines before the trailers as is */ print_lines(lines, 0, trailer_start); @@ -807,14 +818,14 @@ static int process_input_file(struct strbuf **lines, printf(\n); /* Parse trailer lines */ - for (i = trailer_start; i patch_start; i++) { + for (i = trailer_start; i trailer_end; i++) { if (lines[i]-buf[0] != comment_line_char) { struct trailer_item *new = create_trailer_item(lines[i]-buf); add_trailer_item(in_tok_first, in_tok_last, new); } } - return patch_start; + return trailer_end; } static void free_all(struct trailer_item **first) @@ -831,7 +842,7 @@ void process_trailers(const char *file, int trim_empty, struct string_list *trai struct trailer_item *in_tok_last = NULL; struct trailer_item *arg_tok_first; struct strbuf **lines; - int patch_start; + int trailer_end; /* Default config must be setup first */ git_config(git_trailer_default_config, NULL); @@ -840,7 +851,7 @@ void process_trailers(const char *file, int trim_empty, struct string_list *trai lines = read_input_file(file); /* Print the lines before the trailers */ - patch_start = process_input_file(lines, in_tok_first, in_tok_last); + trailer_end = process_input_file(lines, in_tok_first, in_tok_last); arg_tok_first = process_command_line_args(trailers); @@ -851,7 +862,7 @@ void process_trailers(const char *file, int trim_empty, struct string_list *trai free_all(in_tok_first); /* Print the lines after the trailers as is */ - print_lines(lines, patch_start, INT_MAX); + print_lines(lines, trailer_end, INT_MAX); strbuf_list_free(lines); } -- 2.1.2.555.gfbecd99 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines
Christian Couder chrisc...@tuxfamily.org writes: * Copyright (c) 2013, 2014 Christian Couder chrisc...@tuxfamily.org @@ -791,14 +792,24 @@ static int process_input_file(struct strbuf **lines, struct trailer_item **in_tok_last) { int count = 0; - int patch_start, trailer_start, i; + int trailer_start, trailer_end, patch_start, ignore_bytes, i; + struct strbuf sb; /* Get the line count */ while (lines[count]) count++; patch_start = find_patch_start(lines, count); - trailer_start = find_trailer_start(lines, patch_start); + + /* Get the index of the end of the trailers */ + for (i = 0; i patch_start; i++) + strbuf_addbuf(sb, lines[i]); + ignore_bytes = ignore_non_trailer(sb); + for (i = patch_start - 1; i = 0 ignore_bytes 0; i--) + ignore_bytes -= lines[i]-len; + trailer_end = i + 1; Looks like there is an impedance mismatch between the caller and the callee here. I can sort of understand why you might want to have an array of trailer items, one element per line in the trailer block, as that would make it easier on your brain when you have to reorder them, insert a new one or a remove an existing one, but you seem to be keeping _everything_ in that format, an array of strbuf with one strbuf per line, which sounds really wasteful. The data structure might need to be rethoughtbefore this code becomes ready for production. I would have expected it to be more like (1) slurp everything in a single strbuf, (2) find the trailer block inside that single strbuf, splitting what you read in the previous step into one strbuf for stuff before the trailer block, another strbuf for stuff after the trailer block and an array of lines in the tailer block, (3) do whatever your trailer flipping logic inside that array without having to worry about stuff before or after the trailer block and then finally (4) spit the whole thing out by concatenating the stuff before the trailer block, the stuff in the trailer block and the stuff after the trailer block. Oh well... -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html