Jonathan Tan writes:
> Within the processing of the commit message, check for a scissors line
> or a patchbreak line first (before checking for in-body headers) so that
> a subsequent patch modifying the processing of in-body headers would not
> cause a scissors line or patchbreak line to be misidentified.
>
> If a line could be both an in-body header and a scissors line (for
> example, "From: -- >8 --"), this is considered a fatal error
> (previously, it would be interpreted as an in-body header).
The scissors line is designed to allow garbage other than scissors
and perforation marks to be on the same line, i.e.
/*
* The mark must be at least 8 bytes long (e.g. "-- >8 --").
* Even though there can be arbitrary cruft on the same line
* (e.g. "cut here"), in order to avoid misidentification, the
* perforation must occupy more than a third of the visible
* width of the line, and dashes and scissors must occupy more
* than half of the perforation.
*/
Even though it is not likely for people to do so, it would probably
be nicer if we can treat
From: -- >8 -- cut -- >8 -- >8 -- here -- >8 --
as a scissors line instead of making it a fatal error, by treating
that "From:" as just a random garbage.
But this is a minor point. It is not worth to make it work like so
if the resulting code will become messier.
> The following enumeration shows that processing is the same except (as
> described above) the in-body header + scissors line case.
>
> o in-body header (check_header OK)
> o passes UTF-8 conversion
> o [described above] is scissors line
> o [not possible] is patchbreak line
> o [not possible] is blank line
> o is none of the above - processed as header
> o fails UTF-8 conversion - processed as header
> o not in-body header
> o passes UTF-8 conversion
> o is scissors line - processed as scissors
> o is patchbreak line - processed as patchbreak
> o is blank line - ignored if in header_stage
> o is none of the above - log message
> o fails UTF-8 conversion - input error
>
> As for the result left in "line" (after the invocation of
> handle_commit_msg), it is unused (by its caller, handle_filter, and by
> handle_filter's callers, handle_boundary and handle_body) unless this
> line is a patchbreak line, in which case handle_patch is subsequently
> called (in handle_filter) on "line". In this case, "line" must have
> passed UTF-8 conversion both before and after this patch, so the result
> is still the same overall.
>
> Signed-off-by: Jonathan Tan
> ---
> mailinfo.c | 145
> -
> 1 file changed, 115 insertions(+), 30 deletions(-)
>
> diff --git a/mailinfo.c b/mailinfo.c
> index e19abe3..23a56c2 100644
> --- a/mailinfo.c
> +++ b/mailinfo.c
> @@ -340,23 +340,56 @@ static struct strbuf *decode_b_segment(const struct
> strbuf *b_seg)
> return out;
> }
>
> -static int convert_to_utf8(struct mailinfo *mi,
> -struct strbuf *line, const char *charset)
> +/*
> + * Attempts to convert line into UTF-8, storing the result in line.
> + *
> + * This differs from convert_to_utf8 in that conversion non-success is not
> + * considered an error case - mi->input_error is not set, and no error
> message
> + * is printed.
> + *
> + * If the conversion is unnecessary, returns 0 and stores NULL in old_buf (if
> + * old_buf is not NULL).
> + *
> + * If the conversion is successful, returns 0 and stores the unconverted
> string
> + * in old_buf and old_len (if they are respectively not NULL).
> + *
> + * If the conversion is unsuccessful, returns -1.
> + */
> +static int try_convert_to_utf8(const struct mailinfo *mi, struct strbuf
> *line,
> +const char *charset, char **old_buf,
> +size_t *old_len)
> {
> - char *out;
> + char *utf8;
>
> - if (!mi->metainfo_charset || !charset || !*charset)
> + if (!mi->metainfo_charset || !charset || !*charset ||
> + same_encoding(mi->metainfo_charset, charset)) {
> + if (old_buf)
> + *old_buf = NULL;
> return 0;
> + }
>
> - if (same_encoding(mi->metainfo_charset, charset))
> + utf8 = reencode_string(line->buf, mi->metainfo_charset, charset);
> + if (utf8) {
> + char *temp = strbuf_detach(line, old_len);
> + if (old_buf)
> + *old_buf = temp;
> + strbuf_attach(line, utf8, strlen(utf8), strlen(utf8));
> return 0;
> - out = reencode_string(line->buf, mi->metainfo_charset, charset);
> - if (!out) {
> + }
> + return -1;
> +}
> +
> +/*
> + * Converts line into UTF-8, setting mi->input_error to -1 upon failure.
> + */
> +static int convert_to_utf8(struct mailinfo *mi,
> +struct strbuf