Hi,
While attempting to fix one bug, the recent commit to sed regarding the
`y' command has introduced three new problems.
The first one is that it happily uses a plain `char' as the index for
the array `check', which obviously leads to havoc as soon as one tries
to translate non-ASCII characters, whereby sed, overwriting its own
memory, either fails with bogus error messages
$ ./sed $(printf 'y/\220/a/')
sed: 1: "y/\x90/a/": Repeated character in source string
$ ./sed $(printf 'y/\360/a/')
sed: 1: "y/\xf0/a/": unexpected EOF (pending }'s)
or merely dumps core.
$ ./sed $(printf 'y/\222/a/')
Segmentation fault (core dumped)
The second bug is that, even if one were to cast the said index to an
unsigned value, the `check' array would still be off by one, as it is
declared to have a length of 255 instead of 256, once again causing
trivial one-byte buffer overflows.
The third bug is more of a behavioural one: while the `y' command was
previously slightly diverging from its description by POSIX, at least
both `s' and `y' were compiled in the same way, such that the two
similar commands `yn\nnXn' and `sn\nnXn' were handled consistently, as
one would expect. With this commit, however, these two commands now
have exactly the opposite behaviour, thus making the poor sed even more
incoherent than it ever was.
As an aside, in case the irony is not overwhelming enough already, I
would also like to point out how the first submitted diff addressing
this issue decided to change the comment
/* We assume characters are 8 bits */
into
// We assume characters are 8 bits
after which it was objected that this modification was not justified and
hence should not be part of the final patch. However, at no point did
someone realize that the code is in fact not assuming anything regarding
the number of bits in a character, and hence that this comment, wether
written in one style or the other, is actually quite ridiculous anyway.
Perhaps the worst part of all this, though, is how the change of
behaviour, which made sed fail hard where it previously handled input in
a perfectly defined and reasonable way, was apparently approved because
"implementations do vary in how they handle [it], so throwing an error
is probably best". Following the same kind of reasoning, I think
OpenBSD should also modify the `echo' command to fail if given an
argument like `-E', as its behaviour in that case differs from system to
system, hence the current implementation is likewise "just creating a
trap for the user", and surely this is unacceptable and therefore ought
to be fixed, right?
Having thus shown that this commit was not only misguided technically
but also conceptually, rather than applying yet another misguided patch
on top of it in an attempt to repair this whole mess, I would like to
request that it instead be completely reverted, after which I would be
able to provide a proper and definitive fix to deal with this issue
(and, at the same time, many other unrelated bugs).
Regards,
kshe