GLDv3 link status logging [PSARC/2007/298 Self Review]

James Carlson Tue, 29 May 2007 15:09:30 -0400

John Plocher writes:
> James Carlson wrote:
> > ... and then what, exactly?  Abandon all engineering sense 
> 
> This sounds like an overreaction.


So where in any of our documentation do we make the contents of those
messages any sort of stable interface?

> We /know/ that people depend on the debugging messages found in
> syslog so that they can DEBUG their systems.  Those messages are
> generated so that customers can debug problems on their systems.

Indeed.

> They are not in the same class as debugging printf()s in prototype
> code, although some coders use the facility for that purpose.
> 
> It behooves US to consider how our customers use this resource
> so that we can ensure that it best meets their needs.

I agree with that.  However, it doesn't mean that we need to enshrine
hackery.

> We /know/ that sysadmins depend on this info.  We /know/ that
> it is often used as a first step in diagnosing complex problems.
> We /know/ that having more information makes it easier to get
> a handle on the unknown.
> 
> With all that knowledge, why is it that we want to make things
> harder for our customers by removing valuable clues from that
> repository?
> 
> Oh, because it isn't an interface; humans don't matter and it
> isn't "pure architecture"?

How about "wrong tool for the job?"

> > In any event, if you think that's the right thing to do because it
> > appeals to our customers' needs, then I think we need to have a new
> > policy written. 
> 
> We already have such policies.  One comes under the heading "make
> Solaris easier to use", another says, in effect, "don't violate
> customers' expectations".

The only policy we have regarding syslog messages (as far as I know)
is that they're in the same class as other debugging messages -- plain
old text, no need for L10N or other translation support, and changes
aren't treated as worthy of architectural review.

So, are all of the data structures touched by "lsof" also
automatically Committed?  I'd like to know how far this new policy
might extend.

> We tend to call the latter "being Netscape'd".  It doesn't matter
> what we intended for an interface - if Netscape glommed onto it,
> we cannot change it - it effectively is promoted to a Committed
> interface.

You'd argue that for syslog?

If so, then we have a major disconnect here.  The engineering work
necessary to make that happen just isn't being done.

> I wouldn't go so far as to say every message in syslog was now a
> Committed API, but I do feel that the diagnostic payload for many
> of the messages is, as a human consumable interface, absolutely
> Committed.  As such, removing significant info from the payload
> is a regression, and should be avoided when possible.  And, in this
> case, it certainly seems to be easily possible.

I can't easily parse that statement.  I can't make sense out of "as a
human consumable interface, absolutely Committed."

"Committed" means that we will provide suitable reference
documentation (such as man pages).  We don't have that.  It also means
that other projects can consume it without restriction.  We don't have
that either so long as those "other projects" apparently have to run
on top of humans rather than on computers.

> Mostly harmful?  In the scope of this case, how is having
> link status, speed and duplex considered harmful - or even
> only semi-usable?

Links flapping in the breeze cause log files to fill and overflow with
garbage, causing users to be unable to find any relevant messages, and
sometimes causing file systems to fill up.

Different drivers issue completely different messages in different
contexts, and these cause customers to get annoyed or confused.

-- 
James Carlson, Solaris Networking              <james.d.carlson at sun.com>
Sun Microsystems / 1 Network Drive         71.232W   Vox +1 781 442 2084
MS UBUR02-212 / Burlington MA 01803-2757   42.496N   Fax +1 781 442 1677

GLDv3 link status logging [PSARC/2007/298 Self Review]
