[PATCHv2 04/25] diff.c: introduce emit_diff_symbol

2017-06-30 Thread Stefan Beller
In a later patch we want to buffer all output before emitting it as a
new feature ("markup moved lines") conceptually cannot be implemented
in a single pass over the output.

There are different approaches to buffer all output such as:
* Buffering on the char level, i.e. we'd have a char[] which would
  grow at approximately 80 characters a line. This would keep the
  output completely unstructured, but might be very easy to implement,
  such as redirecting all output to a temporary file and working off
  that. The later passes over the buffer are quite complicated though,
  because we have to parse back any output and then decide if it should
  be modified.

* Buffer on a line level. As the output is mostly line oriented already,
  this would make sense, but it still is a bit awkward as we'd have to
  make sense of it again by looking at the first characters of a line
  to decide what part of a diff a line is.

* Buffer semantically. Imagine there is a formal grammar for the diff
  output and we'd keep the symbols of this grammar around. This keeps
  the highest level of structure in the buffered data, such that the
  actual memory requirements are less than say the first option. Instead
  of buffering the characters of the line, we'll buffer what we intend
  to do plus additional information for the specifics. An output of

diff --git a/new.txt b/new.txt
index fa69b07..412428c 100644
Binary files a/new.txt and b/new.txt differ

  could be buffered as
 DIFF_SYMBOL_DIFF_START + new.txt
 DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
 DIFF_SYMBOL_BINARY_FILES + new.txt

This and the following patches introduce the third option of buffering
by first moving any output to emit_diff_symbol, and then introducing the
buffering in this function.

Signed-off-by: Stefan Beller 
Signed-off-by: Junio C Hamano 
---
 diff.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 77ef56a6e4..4637368d59 100644
--- a/diff.c
+++ b/diff.c
@@ -560,6 +560,24 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
emit_line_0(o, set, reset, line[0], line+1, len-1);
 }
 
+enum diff_symbol {
+   DIFF_SYMBOL_SEPARATOR
+};
+
+static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
+const char *line, int len)
+{
+   switch (s) {
+   case DIFF_SYMBOL_SEPARATOR:
+   fprintf(o->file, "%s%c",
+   diff_line_prefix(o),
+   o->line_termination);
+   break;
+   default:
+   die("BUG: unknown diff symbol");
+   }
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -4828,9 +4846,7 @@ void diff_flush(struct diff_options *options)
 
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
-   fprintf(options->file, "%s%c",
-   diff_line_prefix(options),
-   options->line_termination);
+   emit_diff_symbol(options, DIFF_SYMBOL_SEPARATOR, NULL, 
0);
if (options->stat_sep) {
/* attach patch instead of inline */
fputs(options->stat_sep, options->file);
-- 
2.13.0.31.g9b732c453e



Re: [PATCHv2 04/25] diff.c: introduce emit_diff_symbol

2017-06-23 Thread Stefan Beller
On Fri, Jun 23, 2017 at 1:07 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> diff --git a/new.txt b/new.txt
>> index fa69b07..412428c 100644
>> Binary files a/new.txt and b/new.txt differ
>>
>>   could be buffered as
>>  DIFF_SYMBOL_DIFF_START + new.txt
>>  DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
>>  DIFF_SYMBOL_BINARY_FILES + new.txt
>
> Hopefully this is an oversimplified example and I'd assume that you
> will do the right thing when showing a renamed filepair by allowing
> both old and new paths that could be different?

Yes, this is simplified to drive the point of having as little data as possible,
so I thought we might have a flag that says "same file name" instead of giving
the file name twice.

We do not do such an optimization yet.


Re: [PATCHv2 04/25] diff.c: introduce emit_diff_symbol

2017-06-23 Thread Junio C Hamano
Stefan Beller  writes:

> diff --git a/new.txt b/new.txt
> index fa69b07..412428c 100644
> Binary files a/new.txt and b/new.txt differ
>
>   could be buffered as
>  DIFF_SYMBOL_DIFF_START + new.txt
>  DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
>  DIFF_SYMBOL_BINARY_FILES + new.txt

Hopefully this is an oversimplified example and I'd assume that you
will do the right thing when showing a renamed filepair by allowing
both old and new paths that could be different?



[PATCHv2 04/25] diff.c: introduce emit_diff_symbol

2017-06-22 Thread Stefan Beller
In a later patch we want to buffer all output before emitting it as a
new feature ("markup moved lines") conceptually cannot be implemented
in a single pass over the output.

There are different approaches to buffer all output such as:
* Buffering on the char level, i.e. we'd have a char[] which would
  grow at approximately 80 characters a line. This would keep the
  output completely unstructured, but might be very easy to implement,
  such as redirecting all output to a temporary file and working off
  that. The later passes over the buffer are quite complicated though,
  because we have to parse back any output and then decide if it should
  be modified.

* Buffer on a line level. As the output is mostly line oriented already,
  this would make sense, but it still is a bit awkward as we'd have to
  make sense of it again by looking at the first characters of a line
  to decide what part of a diff a line is.

* Buffer semantically. Imagine there is a formal grammar for the diff
  output and we'd keep the symbols of this grammar around. This keeps
  the highest level of structure in the buffered data, such that the
  actual memory requirements are less than say the first option. Instead
  of buffering the characters of the line, we'll buffer what we intend
  to do plus additional information for the specifics. An output of

diff --git a/new.txt b/new.txt
index fa69b07..412428c 100644
Binary files a/new.txt and b/new.txt differ

  could be buffered as
 DIFF_SYMBOL_DIFF_START + new.txt
 DIFF_SYMBOL_INDEX_MODE + fa69b07 412428c "non-executable" flag
 DIFF_SYMBOL_BINARY_FILES + new.txt

This and the following patches introduce the third option of buffering
by first moving any output to emit_diff_symbol, and then introducing the
buffering in this function.

Signed-off-by: Stefan Beller 
---
 diff.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/diff.c b/diff.c
index 2f9722b382..2257d44e2c 100644
--- a/diff.c
+++ b/diff.c
@@ -559,6 +559,24 @@ static void emit_line(struct diff_options *o, const char 
*set, const char *reset
emit_line_0(o, set, reset, line[0], line+1, len-1);
 }
 
+enum diff_symbol {
+   DIFF_SYMBOL_SEPARATOR
+};
+
+static void emit_diff_symbol(struct diff_options *o, enum diff_symbol s,
+const char *line, int len)
+{
+   switch (s) {
+   case DIFF_SYMBOL_SEPARATOR:
+   fprintf(o->file, "%s%c",
+   diff_line_prefix(o),
+   o->line_termination);
+   break;
+   default:
+   die("BUG: unknown diff symbol");
+   }
+}
+
 static int new_blank_line_at_eof(struct emit_callback *ecbdata, const char 
*line, int len)
 {
if (!((ecbdata->ws_rule & WS_BLANK_AT_EOF) &&
@@ -4833,9 +4851,7 @@ void diff_flush(struct diff_options *options)
 
if (output_format & DIFF_FORMAT_PATCH) {
if (separator) {
-   fprintf(options->file, "%s%c",
-   diff_line_prefix(options),
-   options->line_termination);
+   emit_diff_symbol(options, DIFF_SYMBOL_SEPARATOR, NULL, 
0);
if (options->stat_sep) {
/* attach patch instead of inline */
fputs(options->stat_sep, options->file);
-- 
2.12.2.575.gb14f27f917