Hi Marc,

On Tue, Sep 19, 2023 at 03:24:41PM +0200, Marc Espie wrote:
> On Tue, Sep 19, 2023 at 09:48:25AM -0300, Crystal Kolipe wrote:
> > deroff chokes when given lines > 2048 bytes, and produces non-deterministic
> > output on little endian archs.
> 
> Since you went to the trouble of reproducing the issue,
> it would be great if you could turn that into a non-regression test

I agree that more regression tests would be a good thing.

Since the deroff code is currently broken, we either need to commit a fix or
at least decide what the desired behaviour is :-).

Specifically, how to deal with long input lines.  Obviously the need to
process lines >= 2048 bytes in actual roff source is not a common
requirement.  However, not everything passed to deroff is necessarily roff
source; for example, checking a plain text file with /usr/bin/spell will run
it through deroff first.

Maybe some intelligence should be added to spell to avoid doing that for
obviously non-roff input, but that's a separate question.

Fixing the current deroff code in the most obvious way as I proposed results
in long input lines being broken at 2047 characters.  I was waiting to see if
there was a preference for increasing the buffer to LINEMAX + 1, and breaking
lines at 2048.
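
To illustrate the off-by-one between those two choices, here is a minimal
sketch.  It is not the deroff code; read_chunk() and SKETCH_LINEMAX are
made-up names, and I'm assuming a simple getc() loop that always leaves room
for the terminating NUL.

```c
#include <assert.h>
#include <stdio.h>

#define SKETCH_LINEMAX 2048    /* hypothetical stand-in for LINEMAX */

/*
 * Read at most bufsize - 1 bytes of one input line into buf and
 * NUL-terminate it.  The newline is consumed but not stored.
 * Returns the number of bytes stored, or -1 at EOF with nothing read.
 */
static int
read_chunk(FILE *fp, char *buf, size_t bufsize)
{
    size_t n = 0;
    int c = 0;

    assert(bufsize > 1);
    /* Bounds check comes first, so buf[bufsize - 1] stays free. */
    while (n < bufsize - 1 && (c = getc(fp)) != EOF && c != '\n')
        buf[n++] = (char)c;
    buf[n] = '\0';

    if (n == bufsize - 1) {
        /* Buffer full: swallow a newline that immediately follows. */
        c = getc(fp);
        if (c != '\n' && c != EOF)
            ungetc(c, fp);
        return (int)n;
    }
    if (n == 0 && c == EOF)
        return -1;
    return (int)n;
}
```

With a buffer of SKETCH_LINEMAX bytes, a 5000 byte line comes back as chunks
of 2047, 2047 and 906 bytes.  With SKETCH_LINEMAX + 1 bytes, the chunks are
2048, 2048 and 904.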

However, the whole concept of breaking input lines at an arbitrary point is
not historic deroff behaviour anyway.  That was introduced when the source
was first added to OpenBSD and cleaned up to the style of the day.

NetBSD subsequently imported the OpenBSD code, so their behaviour now matches
ours in this regard (including the out-of-bounds bug).

Most historic versions of the deroff code use a fixed line buffer size of 512
bytes with no bounds checking.  Longer lines would typically just cause a
crash or other unpredictable behaviour, depending on what was overwritten.

The exception here is Solaris, where in at least one version deroff was
changed to support arbitrary line lengths without breaking the input into
multiple output lines at all.

So what do we want?

1. Traditional OpenBSD behaviour of breaking input lines at 2047 (which never
   actually worked correctly up to now).
2. Breaking input at 2048.
3. Support for arbitrary line length with no breaking.
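
For option 3, one possible approach is to let getline(3) grow the buffer as
needed.  The following is only a sketch of that idea, not what Solaris does
and not a patch against deroff; longest_line() is a made-up helper that just
shows lines of any length being read without a fixed limit.

```c
#define _POSIX_C_SOURCE 200809L    /* for getline() on some systems */

#include <sys/types.h>

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Read every line from fp with getline(3), which reallocates the
 * buffer as required, and return the length in bytes of the longest
 * line seen, not counting the trailing newline.
 */
static size_t
longest_line(FILE *fp)
{
    char *line = NULL;    /* allocated and grown by getline() */
    size_t cap = 0;       /* current allocation size */
    size_t longest = 0;
    ssize_t len;

    assert(fp != NULL);
    while ((len = getline(&line, &cap, fp)) != -1) {
        /* Drop the newline, if the line has one. */
        if (len > 0 && line[len - 1] == '\n')
            len--;
        if ((size_t)len > longest)
            longest = (size_t)len;
    }
    free(line);
    return longest;
}
```

The cost is that memory use is then bounded only by the longest input line,
which is the trade-off against options 1 and 2.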

Presumably nobody is relying on the current behaviour in scripts or other
code, as it's always been broken.

Maybe I'll just write a new deroff from scratch for fun sometime :-).
