Hi Ingo,

Ingo Schwarze writes:
> In ports land, many manual pages contain occasional non-ASCII
> characters - even though i don't consider that a particularly smart
> idea, but let's face it, those characters *are* out there.

I agree that it is appropriate for mandoc to try to handle this, for a
common, very limited subset of encodings.

> Since this is a somewhat bigger and user-visible change, i'm
> asking whether there are any concerns or comments before committing.

After applying this diff, mandoc -Tutf8 shows U+FFFD wherever there's a
\& in the source. It is very obvious in the mdoc(7) page.

> +If not specified, autodetection uses the first match:
> +.Bl -tag -width iso-8859-1
> +.It Cm utf-8
> +if the first three bytes of the input file
> +are the UTF-8 byte order mark (BOM, 0xefbbbf)
> +.It Ar encoding
> +if the first or second line of the input file matches the
> +.Sy emacs
> +mode line format
> +.Pp
> +.D1 .\e" -*- Oo ...; Oc coding: Ar encoding ; No -*-
> +.It Cm utf-8
> +if the first non-ASCII byte in the file introduces a valid UTF-8 sequence
> +.It Cm iso-8859-1
> +otherwise
> +.El

I agree with this logic as well. I would be uncomfortable if it got any
more complicated.
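
The detection order quoted above can be written down as a rough Python
sketch. This is only an illustration of the four rules as the manual
text describes them, not mandoc's actual code: the names guess_encoding,
starts_valid_utf8 and MODE_LINE are made up here, and the regular
expression is an assumed, loose reading of the mode line format.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Assumption: a loose pattern for a roff comment holding an emacs mode line,
# e.g.  .\" -*- mode: nroff; coding: utf-8; -*-
MODE_LINE = re.compile(
    rb'^\.\\"\s*-\*-(?:[^;]*;)*\s*coding:\s*([A-Za-z0-9_-]+)\s*;?\s*-\*-'
)


def starts_valid_utf8(data: bytes, i: int) -> bool:
    """Return True if data[i:] begins with a well-formed multi-byte UTF-8 sequence."""
    lead = data[i]
    if 0xC2 <= lead <= 0xDF:
        length = 2
    elif 0xE0 <= lead <= 0xEF:
        length = 3
    elif 0xF0 <= lead <= 0xF4:
        length = 4
    else:
        return False  # stray continuation byte or invalid lead byte
    try:
        # Strict decoding rejects truncated, overlong and surrogate sequences.
        data[i:i + length].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(data: bytes) -> str:
    """Apply the four rules in order; the first match wins."""
    # Rule 1: UTF-8 byte order mark at the very start of the file.
    if data.startswith(UTF8_BOM):
        return "utf-8"
    # Rule 2: mode line on the first or second line names the encoding.
    for line in data.split(b"\n", 2)[:2]:
        match = MODE_LINE.match(line)
        if match:
            return match.group(1).decode("ascii").lower()
    # Rule 3: the first non-ASCII byte decides between UTF-8 and Latin-1.
    for i, byte in enumerate(data):
        if byte >= 0x80:
            return "utf-8" if starts_valid_utf8(data, i) else "iso-8859-1"
    # Rule 4: everything else, including pure ASCII input.
    return "iso-8859-1"
```

Note that only the first non-ASCII byte is inspected in rule 3, so a
file that starts out as valid UTF-8 and later contains a stray Latin-1
byte is still classified as utf-8 in this sketch.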

-- 
Anthony J. Bentley
