Joseph Kowalski wrote:
...
> His story was one of a function SEGV'ing in response to a rare case of
> inputs. The module which died was quickly restarted by a "heartbeat"
> daemon and all was well - an outage measured in seconds. The object was
> upgraded. The function in question was modified to respond to this rare
> case of inputs with an error code. It was a valid error
> code, but it turned out that there was a second bug in the consumer of
> the output of the initial module. The result was an infinite exchange
> of messages between the two objects, and no real computational progress
> was made. The result was an outage of several days (mostly to diagnose
> the problem).
...
> I should point out (because if I don't, I'm sure that somebody will),
> there have been numerous cases which proposed adding additional error
> codes. The points to notice are:
>
> 1) Always in a Minor release (but we've already dealt with that)
>
> 2) Only when we had to, as when a standard required it or an
>    existing error code couldn't be overloaded. In other words, it
>    was necessary for the application to react differently. We always
>    made these changes carefully.
>
> So, that is my argument against making this change. I'm sure your
> mileage will vary.

Note that exactly the same argument can be made for any bug fix; what
this argues for is virtually complete stasis, since any change we make
can and will cause problems for broken applications.

Indeed, we've seen exactly the same problem with a library that used
to dereference a user-supplied pointer; later, this functionality
was moved into the kernel. What had been a SEGV turned into an
EFAULT, and the application misbehaved in a new way.

I don't think requiring all previously broken programs to fail in
exactly the same manner across all update releases of Solaris 10 is
a reasonable standard, and it is certainly not one that we follow.

Jim already pointed out that the introduction of new options for
existing commands would cause the same sorts of problems. Adding
new codes to an ioctl would as well.

- Bart

-- 
Bart Smaalders                  Solaris Kernel Performance
barts at cyber.eng.sun.com      http://blogs.sun.com/barts
"You will contribute more with mercurial than with thunderbird."
