First, the ARCs were created to address multiple issues.  The two 
overriding ones were:

    1)   Make sure multiple products (eventually consolidations) "played 
nice".
    2)   Be the guardians of compatibility.

Obviously, these are very interrelated.

(Historical note: I believe this was the initial ordering, but the 
priorities seem to have reversed.)

We focus on well defined interfaces because that is the best tool we 
have to serve the higher level needs. We can make fairly automated 
changes when a specific, well defined interface can be cited.  When 
there isn't such an interface, evaluating compatibility becomes more 
difficult.  Just because a behavior isn't well specified doesn't mean 
it's not a compatibility issue.

In this case, we clearly seem to have a compatibility issue; what is 
controversial is its magnitude.  The issue does exist, because one 
could easily contrive an existence proof.  The question is entirely 
about the "magnitude".

It seems to be widely asserted that the compatibility exposure is 
negligible.  Let me counter that by citing an instance that was relayed 
to me (second hand).

Circa 2000, I was the technical lead for the "whitesmoke" project.  The 
details are "Sun Private", but all that needs to be said is that Sun 
was working with a large customer with a very widely distributed, high 
availability product.  This story is from my counterpart in this 
corporate relationship.  He was relaying it in the context of "we don't 
change any components - only fully tested aggregations".  His story was 
one of a function SEGV'ing in response to a rare case of inputs.  The 
module which died was quickly restarted by a "heartbeat" daemon and all 
was well - an outage measured in seconds.  The object was upgraded: the 
function in question was modified to respond to this rare case of 
inputs with an error code.  It was a valid error code, but it turned 
out that there was a second bug in the consumer of the output of the 
initial module.  The result was an infinite exchange of messages 
between the two objects, and no real computational progress was made. 
The result was an outage of several days (mostly spent diagnosing the 
problem).

This is very similar to this proposal, except that we have an even more 
significant change.  The story is one of "SEGV -> error return".  This 
case proposes "SEGV -> success return".
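The failure mode in the story can be sketched in a few lines.  This is 
a hypothetical illustration, not code from the system in question: the 
function names, the "EINVAL" code, and the retrying consumer are all my 
own stand-ins, and the retry cap exists only so the sketch terminates.

```python
# Hypothetical sketch of the "SEGV -> error return" failure mode: a
# function that once crashed on a rare input is changed to return an
# error code, and a buggy consumer reacts to that error by resending
# the identical request.  All names here are illustrative.

def handle_request(x):
    """Post-upgrade behavior: the rare input now yields an error code."""
    if x < 0:            # the "rare case of inputs"
        return "EINVAL"  # a valid error code instead of a crash
    return x * 2

def consumer(x, max_exchanges=10):
    """Buggy consumer: retries the same request whenever it sees an error."""
    exchanges = 0
    while exchanges < max_exchanges:   # artificial cap so the sketch stops
        result = handle_request(x)
        exchanges += 1
        if result != "EINVAL":
            return result, exchanges
        # Bug: nothing changes before the retry, so the error repeats.
    return None, exchanges             # livelock: no computational progress
```

A normal input completes in one exchange; the rare input loops until 
the artificial cap, mirroring the "infinite exchange of messages" in 
the story.  The point of the sketch is that the new error path was 
individually correct - the failure came from the untested interaction.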

The motivation for this proposal seems to be "Linux familiarity".  The 
"program" around "Linux familiarity" is all about development tools, 
the utility set, and additional components.  It is explicitly not about 
changes to the programmatic interfaces.

So, what I believe we have here is a non-negligible incompatibility in 
the programmatic interfaces weighted against a perception of 
"familiarity" in an area where "familiarity" isn't a goal.  Hence, we 
should not make this change.

I should point out (because if I don't, I'm sure that somebody will), 
there have been numerous cases which proposed adding additional error 
codes.  The points to notice are:

    1)   Always in a Minor release (but we've already dealt with that)

    2)   Only when we had to, as when a standard required it or an 
existing error code couldn't be overloaded.  In other words, it was 
necessary for the application to react differently.  We always made 
these changes carefully.

So, that is my argument against making this change.  I'm sure your 
mileage will vary.

- jek3

