First, the ARCs were created to address multiple issues. The two
overriding ones were:
1) Make sure multiple products (eventually consolidations) "played
nice".
2) Be the guardians of compatibility.
Obviously, these are very interrelated.
(Historical note: I believe this was the initial ordering, but the
priorities seem to have reversed.)
We focus on well defined interfaces because that is the best tool we
have to serve the higher level needs. We can make fairly automated
changes when a specific, well defined interface can be cited. When
there isn't such an interface, then evaluation of compatibility becomes
more difficult. Just because a behavior isn't well specified doesn't
mean it's not a compatibility issue.
In this case, we clearly seem to have a compatibility issue; the
controversy is over its magnitude. The issue does exist, because one
could easily contrive an existence proof. The question is entirely
about the "magnitude".
It seems to be widely asserted that the compatibility exposure is
negligible. Let me counter that by citing an instance that was relayed
to me (second hand).
Circa 2000, I was the technical lead for the "whitesmoke" project. The
details are "Sun Private", but all that needs to be said is that Sun
was working with a large customer with a very widely distributed, high
availability product. This story is from my counterpart in this
corporate relationship. He was relaying it in the context of "we don't
change any components - only fully tested aggregations". His story was
one of a function SEGV'ing in response to a rare input case. The module
which died was quickly restarted by a "heartbeat" daemon and all was
well - an outage measured in seconds. The object was upgraded. The
function in question was modified to respond to this rare input case
with an error code. It was a valid error code, but it turned out that
there was a second bug in the consumer of the first module's output.
The result was an infinite exchange of messages between the two
objects, with no real computational progress made. The net result was
an outage of several days (mostly spent diagnosing the problem).
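The failure mode in that story can be sketched as follows. All names
here are invented for illustration (the real modules are Sun Private):
the point is only that a crash-and-restart gave a seconds-long outage,
while a well-formed error return fed a buggy consumer that resent the
same request forever.

```python
def process(request):
    """The upgraded module: instead of SEGV'ing on the rare input,
    it now returns a valid error code."""
    if request == "rare-input":
        return ("error", "EINVAL")  # the "fixed" behavior
    return ("ok", request.upper())

def consumer(request, max_exchanges=100):
    """The buggy consumer: on error it resends the same request
    unchanged, so the two modules trade messages with no progress.
    The cap exists only so this sketch terminates."""
    exchanges = 0
    status, payload = process(request)
    while status == "error":
        exchanges += 1
        if exchanges >= max_exchanges:
            return ("livelock", exchanges)
        status, payload = process(request)  # resend unchanged: the bug
    return (status, payload)
```

With the old crashing behavior the second bug was unreachable; the
"safer" error return is what exposed it.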
This is very similar to this proposal, except that we have an even more
significant change. The story is one of "SEGV -> error return". This
case proposes "SEGV -> success return".
The motivation for this proposal seems to be "Linux familiarity". The
"program" around "Linux familiarity" is all about development tools,
the utility set and additional components. It is explicitly not about
changes to the programmatic interfaces.
So, what I believe we have here is a non-negligible incompatibility in
the programmatic interfaces weighed against a perception of
"familiarity" in an area where "familiarity" isn't a goal. Hence, we
should not make this change.
I should point out (because if I don't, I'm sure that somebody will),
there have been numerous cases which proposed adding additional error
codes. The points to notice are:
1) Always in a Minor release (but we've already dealt with that)
2) Only when we had to, as in a standard required it or an
existing error code couldn't be overloaded. In other words, it was
necessary for the application to react differently. We always made
these changes carefully.
So, that is my argument against making this change. I'm sure your
mileage will vary.
- jek3