Hi Ingo,

Ingo Schwarze writes:
> In ports land, many manual pages contain occasional non-ASCII
> characters - even though i don't consider that a particularly smart
> idea, but let's face it, those characters *are* out there.

I agree that this is appropriate for mandoc to try to handle for a
common, very limited subset of encodings.

> Since this is a somewhat bigger and user-visible change, i'm
> asking whether there are any concerns or comments before committing.

After applying this diff, mandoc -Tutf8 shows U+FFFD anywhere there's
a \& in the source... very obvious in the mdoc(7) page.

> +If not specified, autodetection uses the first match:
> +.Bl -tag -width iso-8859-1
> +.It Cm utf-8
> +if the first three bytes of the input file
> +are the UTF-8 byte order mark (BOM, 0xefbbbf)
> +.It Ar encoding
> +if the first or second line of the input file matches the
> +.Sy emacs
> +mode line format
> +.Pp
> +.D1 .\e" -*- Oo ...; Oc coding: Ar encoding ; No -*-
> +.It Cm utf-8
> +if the first non-ASCII byte in the file introduces a valid UTF-8 sequence
> +.It Cm iso-8859-1
> +otherwise
> +.El

I agree with this logic as well. I would be uncomfortable if it got
any more complicated.

-- 
Anthony J. Bentley
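
For illustration only, the first-match detection order quoted above can be
sketched roughly as follows. This is not mandoc's code (mandoc is written in
C); it is a minimal Python sketch, and the names guess_encoding,
starts_valid_utf8 and MODE_LINE are hypothetical. The regular expression is a
loose guess at the mode line format, and treating a pure-ASCII file as
iso-8859-1 is just a literal reading of the "otherwise" rule.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"

# Loose, hypothetical rendering of:  .\" -*- [...;] coding: encoding; -*-
MODE_LINE = re.compile(rb'^\.\\"\s*-\*-.*?\bcoding:\s*([\w.-]+)\s*;\s*-\*-')


def starts_valid_utf8(buf, i):
    """Return True if buf[i] begins a well-formed UTF-8 sequence."""
    lead = buf[i]
    if 0xC2 <= lead <= 0xDF:
        n = 2
    elif 0xE0 <= lead <= 0xEF:
        n = 3
    elif 0xF0 <= lead <= 0xF4:
        n = 4
    else:
        return False  # stray continuation byte or invalid lead byte
    try:
        # Let the strict decoder reject truncated or overlong sequences.
        buf[i:i + n].decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def guess_encoding(buf):
    """Apply the quoted rules in order and return the first match."""
    # Rule 1: UTF-8 byte order mark in the first three bytes.
    if buf.startswith(UTF8_BOM):
        return "utf-8"
    # Rule 2: emacs mode line on the first or second line.
    for line in buf.split(b"\n", 2)[:2]:
        m = MODE_LINE.match(line)
        if m:
            return m.group(1).decode("ascii").lower()
    # Rule 3: only the first non-ASCII byte is inspected.
    for i, byte in enumerate(buf):
        if byte >= 0x80:
            if starts_valid_utf8(buf, i):
                return "utf-8"
            break
    # Rule 4: everything else, including pure ASCII input.
    return "iso-8859-1"
```

Note that rule 3 looks at a single sequence only, so a file whose first
non-ASCII character is valid UTF-8 is treated as UTF-8 even if later bytes
are not. That keeps the heuristic as simple as the quoted text describes.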
