Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

2014-11-09 Thread Christian Couder
From: Junio C Hamano gits...@pobox.com
Subject: Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict 
lines
Date: Fri, 07 Nov 2014 12:27:03 -0800

 Christian Couder chrisc...@tuxfamily.org writes:
 
   * Copyright (c) 2013, 2014 Christian Couder chrisc...@tuxfamily.org
 @@ -791,14 +792,24 @@ static int process_input_file(struct strbuf **lines,
struct trailer_item **in_tok_last)
  {
  int count = 0;
 -int patch_start, trailer_start, i;
 +int trailer_start, trailer_end, patch_start, ignore_bytes, i;
 +struct strbuf sb;
  
  /* Get the line count */
  while (lines[count])
  count++;
  
  patch_start = find_patch_start(lines, count);
 -trailer_start = find_trailer_start(lines, patch_start);
 +
 +/* Get the index of the end of the trailers */
 +for (i = 0; i  patch_start; i++)
 +strbuf_addbuf(sb, lines[i]);
 +ignore_bytes = ignore_non_trailer(sb);
 +for (i = patch_start - 1; i = 0  ignore_bytes  0; i--)
 +ignore_bytes -= lines[i]-len;
 +trailer_end = i + 1;
 
 Looks like there is an impedance mismatch between the caller and the
 callee here.  I can sort of understand why you might want to have an
 array of trailer items, one element per line in the trailer block,
 as that would make it easier on your brain when you have to reorder
 them, insert a new one or a remove an existing one, but you seem to
 be keeping _everything_ in that format, an array of strbuf with one
 strbuf per line, which sounds really wasteful.  The data structure
 might need to be rethoughtbefore this code becomes ready for
 production.
 
 I would have expected it to be more like (1) slurp everything in a
 single strbuf, (2) find the trailer block inside that single strbuf,
 splitting what you read in the previous step into one strbuf for
 stuff before the trailer block, another strbuf for stuff after the
 trailer block and an array of lines in the tailer block, (3) do
 whatever your trailer flipping logic inside that array without
 having to worry about stuff before or after the trailer block and
 then finally (4) spit the whole thing out by concatenating the stuff
 before the trailer block, the stuff in the trailer block and the
 stuff after the trailer block.
 
 Oh well...

I hope it is better now, as I encapsulated the call to
ignore_non_trailer() into a new find_trailer_end() function. There
were already find_trailer_start() and find_patch_start(), and I think
this way we could have a nice high level line oriented API.

Yeah, it won't be as efficient as using only one strbuf and only byte
oriented functions, and it looks much less manly too :-) But over time in
Git we have developed a number of less efficient but quite clean
abstractions like strbuf, argv_array, sha1_array and so on, that we
are quite happy with.

So yeah, with old age we might have lost some manliness and lament the
time when men were really men, and ... Well, let's wait for the
hopefully big party^W conference next year to do that together :-)

Thanks,
Christian.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

2014-11-09 Thread Junio C Hamano
Christian Couder chrisc...@tuxfamily.org writes:

 Yeah, it won't be as efficient as using only one strbuf and only byte
 oriented functions, and it looks much less manly too :-) But over time in
 Git we have developed a number of less efficient but quite clean
 abstractions like strbuf, argv_array, sha1_array and so on, that we
 are quite happy with.

Actually, all these examples you gave are fairly efficient and clean
abstractions.  I find it insulting to pretend that the one line per
strbuf is in the same league.  It isn't.

And it is not about manliness.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

2014-11-09 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Christian Couder chrisc...@tuxfamily.org writes:

 Yeah, it won't be as efficient as using only one strbuf and only byte
 oriented functions, and it looks much less manly too :-) But over time in
 Git we have developed a number of less efficient but quite clean
 abstractions like strbuf, argv_array, sha1_array and so on, that we
 are quite happy with.

 Actually, all these examples you gave are fairly efficient and clean
 abstractions.  I find it insulting to pretend that the one line per
 strbuf is in the same league.  It isn't.

 And it is not about manliness.

By the way, I do not mean to say that all of strbuf (which is a
rather full API) is uniformly efficient and cleanly abstracted.

For example, I see strbuf_split() is a handy but a lousy invitation
to stupid programmers to do stupid things and needs to be used
carefully (rather, you need to carefully consider if a possible
solution that uses it is really a good solution).  If you start with
a single strbuf (e.g. perhaps read from a file or a blob in bulk in
an I/O efficient way), pass it around and then have to manipulate
each and every line in it to massage the information on each line
into other useful form (e.g. perhaps each line is a [+]src:dst
refspec element and you are parsing it into an array of struct
refspec), it may be tempting to write a parser from a single strbuf
into a single refspec, use strbuf_split() to break the single strbuf
input into multiple and feed each of them to your parser.

But it does not have to be the only way to create such a caller.
Unless the final other useful form is an array of strbuf, and you
have to remember that the whole point of using strbuf is because it
is easy to edit its contents efficiently in place, that needs to
further be modified in various ways, the splitting of N lines into N
strbuf, only to parse each line one by one, is a huge waste.  A
right approach may involve introducing a foreach_strbuf_line
iterator function that takes a callback, or a macro of the same name
that puts a control structure.

Going back to the trailer processing, if you are likely to be
editing each and every line in the input before producing the final
result, an array of strbuf would be a perfectly sensible data
structure to use.  I just didn't get the impression that that is
what is being done.  You would be presented a single text with
multiple lines (whose natural representation would be a ptr,len
that fits in a single strbuf), you would be asked to inspect and
manipulate one single section in the middle (namely, the runs of
lines that begin with some keyword: ), and return the
concatenation of the part before that one section (intact), the
section you have manipulated (i.e. the trailer) and possibly the
remainder (intact).  Outside that single section you will be
touching, I do not see a good reason for each of their lines to be
first stored in a strbuf (while moving bunch of pointers in the
array using ARRAY_GROW() and friends), only to be later concatenated
back into a single string.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

2014-11-09 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Junio C Hamano gits...@pobox.com writes:

 Christian Couder chrisc...@tuxfamily.org writes:

 Yeah, it won't be as efficient as using only one strbuf and only byte
 oriented functions, and it looks much less manly too :-) But over time in
 Git we have developed a number of less efficient but quite clean
 abstractions like strbuf, argv_array, sha1_array and so on, that we
 are quite happy with.

 Actually, all these examples you gave are fairly efficient and clean
 abstractions.  I find it insulting to pretend that the one line per
 strbuf is in the same league.  It isn't.

 And it is not about manliness.

 By the way, I do not mean to say that all of strbuf (which is a
 rather full API) is uniformly efficient and cleanly abstracted.
 ...

Having said all that, please notice that I said might need to be
rethought before the code becomes ready for production.

After all, it would have perfectly been OK to experiment with this
new topic in a script; e.g. this topic being text processing, it
might have happened as a Perl script while we get the semantics and
desirable features nailed down before rewriting the real thing in C.

And if that were what happened here, I wouldn't have been talking
about data structures and unnecessary complexity having to have one
strbuf per line only to join them into one string (or printf() out
to a stream one-after-another) later.

In other words, I did not say this code is too ugly to live and we
should clean it up as soon as possible or we should revert it from
'master'---it was a mistake to merge it.

I consider the trailer code in 'master' today as experimental (in
other words, something I might have written in a scripting language
as a mock-up if I were doing it) that needs to be worked on further
to still get the features and semantics right.  And the patch series
in this thread are hopefully taking us in the right direction to get
them right [*1*].

You could have just said Yeah, I realize that the code is way
suboptimal, but lets get it feature complete first and then clean it
up, or something, instead of getting defensive with unnecessary
excuses.


[Footnote]

*1* ... even though this step exposes why the approach taken,
splitting the input lines first into individual lines without even
looking at them, is a wrong one---if you started from a single
buffer that is not yet split, it would have been much easier to find
where the trailer block is, and that is why you are re-joining the
buffer you have already split into lines (i.e. what I called
impedance mismatch), which shows that the initial splitting is
done prematurely.  Hopefully you would realize that by now ;-).

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

2014-11-07 Thread Christian Couder
Make sure we look for trailers before any conflict line
by reusing the ignore_non_trailer() function.

Signed-off-by: Christian Couder chrisc...@tuxfamily.org
---
 t/t7513-interpret-trailers.sh |  2 ++
 trailer.c | 25 ++---
 2 files changed, 20 insertions(+), 7 deletions(-)

diff --git a/t/t7513-interpret-trailers.sh b/t/t7513-interpret-trailers.sh
index 1efb880..fed053a 100755
--- a/t/t7513-interpret-trailers.sh
+++ b/t/t7513-interpret-trailers.sh
@@ -232,6 +232,8 @@ test_expect_success 'with message that has comments' '
 
Reviewed-by: Johan
Cc: Peff
+   # last comment
+
EOF
cat basic_patch expected 
git interpret-trailers --trim-empty --trailer Cc: Peff 
message_with_comments actual 
diff --git a/trailer.c b/trailer.c
index f4d51ba..f6aa181 100644
--- a/trailer.c
+++ b/trailer.c
@@ -2,6 +2,7 @@
 #include string-list.h
 #include run-command.h
 #include string-list.h
+#include commit.h
 #include trailer.h
 /*
  * Copyright (c) 2013, 2014 Christian Couder chrisc...@tuxfamily.org
@@ -791,14 +792,24 @@ static int process_input_file(struct strbuf **lines,
  struct trailer_item **in_tok_last)
 {
int count = 0;
-   int patch_start, trailer_start, i;
+   int trailer_start, trailer_end, patch_start, ignore_bytes, i;
+   struct strbuf sb;
 
/* Get the line count */
while (lines[count])
count++;
 
patch_start = find_patch_start(lines, count);
-   trailer_start = find_trailer_start(lines, patch_start);
+
+   /* Get the index of the end of the trailers */
+   for (i = 0; i  patch_start; i++)
+   strbuf_addbuf(sb, lines[i]);
+   ignore_bytes = ignore_non_trailer(sb);
+   for (i = patch_start - 1; i = 0  ignore_bytes  0; i--)
+   ignore_bytes -= lines[i]-len;
+   trailer_end = i + 1;
+
+   trailer_start = find_trailer_start(lines, trailer_end);
 
/* Print lines before the trailers as is */
print_lines(lines, 0, trailer_start);
@@ -807,14 +818,14 @@ static int process_input_file(struct strbuf **lines,
printf(\n);
 
/* Parse trailer lines */
-   for (i = trailer_start; i  patch_start; i++) {
+   for (i = trailer_start; i  trailer_end; i++) {
if (lines[i]-buf[0] != comment_line_char) {
struct trailer_item *new = 
create_trailer_item(lines[i]-buf);
add_trailer_item(in_tok_first, in_tok_last, new);
}
}
 
-   return patch_start;
+   return trailer_end;
 }
 
 static void free_all(struct trailer_item **first)
@@ -831,7 +842,7 @@ void process_trailers(const char *file, int trim_empty, 
struct string_list *trai
struct trailer_item *in_tok_last = NULL;
struct trailer_item *arg_tok_first;
struct strbuf **lines;
-   int patch_start;
+   int trailer_end;
 
/* Default config must be setup first */
git_config(git_trailer_default_config, NULL);
@@ -840,7 +851,7 @@ void process_trailers(const char *file, int trim_empty, 
struct string_list *trai
lines = read_input_file(file);
 
/* Print the lines before the trailers */
-   patch_start = process_input_file(lines, in_tok_first, in_tok_last);
+   trailer_end = process_input_file(lines, in_tok_first, in_tok_last);
 
arg_tok_first = process_command_line_args(trailers);
 
@@ -851,7 +862,7 @@ void process_trailers(const char *file, int trim_empty, 
struct string_list *trai
free_all(in_tok_first);
 
/* Print the lines after the trailers as is */
-   print_lines(lines, patch_start, INT_MAX);
+   print_lines(lines, trailer_end, INT_MAX);
 
strbuf_list_free(lines);
 }
-- 
2.1.2.555.gfbecd99


--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/5] trailer: reuse ignore_non_trailer() to ignore conflict lines

2014-11-07 Thread Junio C Hamano
Christian Couder chrisc...@tuxfamily.org writes:

   * Copyright (c) 2013, 2014 Christian Couder chrisc...@tuxfamily.org
 @@ -791,14 +792,24 @@ static int process_input_file(struct strbuf **lines,
 struct trailer_item **in_tok_last)
  {
   int count = 0;
 - int patch_start, trailer_start, i;
 + int trailer_start, trailer_end, patch_start, ignore_bytes, i;
 + struct strbuf sb;
  
   /* Get the line count */
   while (lines[count])
   count++;
  
   patch_start = find_patch_start(lines, count);
 - trailer_start = find_trailer_start(lines, patch_start);
 +
 + /* Get the index of the end of the trailers */
 + for (i = 0; i  patch_start; i++)
 + strbuf_addbuf(sb, lines[i]);
 + ignore_bytes = ignore_non_trailer(sb);
 + for (i = patch_start - 1; i = 0  ignore_bytes  0; i--)
 + ignore_bytes -= lines[i]-len;
 + trailer_end = i + 1;

Looks like there is an impedance mismatch between the caller and the
callee here.  I can sort of understand why you might want to have an
array of trailer items, one element per line in the trailer block,
as that would make it easier on your brain when you have to reorder
them, insert a new one or a remove an existing one, but you seem to
be keeping _everything_ in that format, an array of strbuf with one
strbuf per line, which sounds really wasteful.  The data structure
might need to be rethoughtbefore this code becomes ready for
production.

I would have expected it to be more like (1) slurp everything in a
single strbuf, (2) find the trailer block inside that single strbuf,
splitting what you read in the previous step into one strbuf for
stuff before the trailer block, another strbuf for stuff after the
trailer block and an array of lines in the tailer block, (3) do
whatever your trailer flipping logic inside that array without
having to worry about stuff before or after the trailer block and
then finally (4) spit the whole thing out by concatenating the stuff
before the trailer block, the stuff in the trailer block and the
stuff after the trailer block.

Oh well...
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html