Hi Marc,

On Tue, Sep 19, 2023 at 03:24:41PM +0200, Marc Espie wrote:
> On Tue, Sep 19, 2023 at 09:48:25AM -0300, Crystal Kolipe wrote:
> > deroff chokes when given lines > 2048 bytes, and produces non-deterministic
> > output on little endian archs.
>
> Since you went to the trouble of reproducing the issue,
> it would be great if you could turn that into a non-regression test

I agree that more regression tests would be a good thing.

Since the deroff code is currently broken, we either need to commit a fix
or at least decide what the desired behaviour is :-).

Specifically, dealing with long input lines.

Obviously the need to process lines >= 2048 bytes in actual roff source is
not a common requirement. However, not everything passed to deroff is
necessarily roff source. For example, checking a plain text file with
/usr/bin/spell will run it through deroff first. Maybe some intelligence
should be added to spell to avoid doing that for obvious non-roff input,
but that's a separate question.

Fixing the current deroff code in the most obvious way, as I proposed,
results in long input lines being broken at 2047 characters. I was waiting
to see if there was a preference for increasing the buffer to LINEMAX + 1
and breaking lines at 2048.

However, the whole concept of breaking input lines at an arbitrary point is
not historic deroff behaviour anyway. That was added when the source was
first added to OpenBSD and cleaned up to the style of the day. NetBSD
subsequently imported the OpenBSD code, so their behaviour now matches ours
in this regard (including the out-of-bounds bug).

Most historic versions of the deroff code use a fixed line buffer size of
512 bytes with no bounds checking. Longer lines would typically just cause
a crash or other unpredictable behaviour, depending on what was
overwritten.

The exception here is Solaris, where in at least one version deroff was
changed to support arbitrary line lengths without breaking the input into
multiple output lines at all.

So what do we want?

1. Traditional OpenBSD behaviour of breaking input lines at 2047 (which
   never actually worked correctly up to now).

2. Breaking input at 2048.

3. Support for arbitrary line length with no breaking.

Presumably nobody is relying on the current behaviour in scripts or other
code, as it's always been broken.
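
To make the three options concrete, here is a rough sketch in C. It is not
the deroff code and not the proposed patch: read_chunk() and
read_whole_line() are names invented for illustration, and LINEMAX is
assumed to be 2048 based on the sizes mentioned above.

```c
#define _POSIX_C_SOURCE 200809L	/* for getline() and ssize_t */

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

#define LINEMAX 2048	/* assumed value, not taken from the deroff source */

/*
 * Options 1 and 2: fixed-size buffer with bounds checking.
 * Stores at most bufsize - 1 bytes of the next input line in buf and
 * NUL-terminates it, so a longer line is split across several calls.
 * Returns the number of bytes stored, or -1 at end of file.
 * Option 1 corresponds to bufsize == LINEMAX (pieces of 2047 bytes),
 * option 2 to bufsize == LINEMAX + 1 (pieces of 2048 bytes).
 */
static int
read_chunk(FILE *fp, char *buf, size_t bufsize)
{
	size_t n = 0;
	int c;

	assert(bufsize > 0);
	while (n < bufsize - 1 && (c = getc(fp)) != EOF) {
		buf[n++] = (char)c;
		if (c == '\n')
			break;
	}
	buf[n] = '\0';
	return (n == 0) ? -1 : (int)n;
}

/*
 * Option 3: arbitrary line length, no breaking.
 * getline() grows *bufp as needed; the caller frees it when done.
 * Returns the line length including the newline, or -1 at end of file.
 */
static ssize_t
read_whole_line(FILE *fp, char **bufp, size_t *capp)
{
	return getline(bufp, capp, fp);
}
```

With a 5000-character input line plus its newline, read_chunk() with a
LINEMAX-sized buffer returns pieces of 2047, 2047 and 907 bytes, a
LINEMAX + 1 buffer makes the first piece 2048 bytes, and
read_whole_line() returns all 5001 bytes in one go.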
Maybe I'll just write a new deroff from scratch for fun sometime :-).