Re: [tools-discuss] ksh93 sleep

Jason King Fri, 20 Feb 2009 07:50:27 -0800

On Fri, Feb 20, 2009 at 4:47 AM, Alan Burlison <alan.burli...@sun.com> wrote:
> Jason King wrote:
>
>> I cringe every time I hear that justification trotted out.  If that is
>> the end goal, we can fix things much easier and quicker -- stop all
>> work on Solaris immediately, and ship Linux.  That is the only way
>> you'll achieve Linux compatibility.
>>
>> The goal should be to have the best userland out there -- whether it's
>> the current Solaris utility, a GNU utility, BSD, AST, etc.
>
> The two justifications that I've seen for this change are:
>
> 1. It allows fractional seconds.
>
> 2. It allows us "to re-use existing, maintained code instead of maintaining
> two separate codelines (e.g. one in usr/src/cmd/sleep/ and the AST/ksh93
> "sleep" version) which have to be kept in sync all the time".


And what happens if the GNU maintainers won't accept needed patches
for running on Solaris (probably not the case here, but it's happened
with other GNU utilities)?  Do we start maintaining a fork? Do we
start neutering Solaris features to stay compatible with GNU?  Or do
we decide that perhaps that GNU utility shouldn't be used in favor of
something that does what is needed?


>
> The first is presumably considered to be a good thing because it increases
> compatibility with other OSs (e.g. Linux).  However, on that front the ksh93
> sleep is not that much of an improvement, as although it supports fractional
> seconds it doesn't support the 's', 'm', 'h' and 'd' units that GNU sleep
> supports.
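This minimal Python sketch shows the difference in accepted operands. It is a GNU-style duration parser that assumes the behavior described above: a decimal number, with fractional values allowed, followed by an optional single-letter 's', 'm', 'h' or 'd' suffix. When there are several operands, they are summed, as GNU sleep does. The names parse_duration and total_duration are hypothetical and not taken from either implementation. The same applies to the allow_units switch, which roughly approximates a fractional-only parser with no suffixes, like the ksh93 one.

```python
# Multipliers for the unit suffixes GNU sleep accepts; no suffix means seconds.
UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_duration(arg, allow_units=True):
    """Convert one sleep operand such as '1.5' or '2m' to seconds.

    allow_units=False approximates a parser that takes fractional
    seconds but rejects unit suffixes (hypothetical switch).
    """
    if not arg:
        raise ValueError("empty duration")
    multiplier = 1
    if arg[-1] in UNITS:
        if not allow_units:
            raise ValueError(f"unit suffix not accepted: {arg!r}")
        multiplier = UNITS[arg[-1]]
        arg = arg[:-1]
    value = float(arg)  # raises ValueError on malformed numbers
    if not value >= 0:  # rejects negatives and NaN
        raise ValueError(f"invalid duration: {arg!r}")
    return value * multiplier


def total_duration(args, allow_units=True):
    """Sum several operands, as GNU sleep does with multiple arguments."""
    return sum(parse_duration(a, allow_units) for a in args)
```

A wrapper could then call time.sleep(total_duration(sys.argv[1:])) after first rejecting an empty argument list, which GNU sleep reports as a missing operand.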


>
> The second justification seems slightly flimsy - as others have pointed out,
> the C implementation wasn't exactly huge and as this thread illustrates, the
> ksh93 implementation has several problems.

A couple of bugs were revealed when it was run in a previously
untested failure mode.  They were fixed, and regression tests were
created to prevent a recurrence.  It happens; I can easily point to
even larger bugs that have shipped to customers in Solaris 8, 9, and
10 (which are supposed to be even more widely tested than the Nevada
builds).  While I think everyone involved understands the importance
of not putting back code until it's ready, that doesn't mean it will
always be bug-free.  One in particular was an untested failure mode in
the native LDAP2 client (it was not, and still is not, documented
anywhere on docs.sun.com).  It was easily triggered by the right kind
of slowdown in the LDAP server (not an outage, just a slowdown), and it
caused some rather extreme non-linear behavior on the client systems,
with getgroups(2) essentially blocking indefinitely.  At one F500
customer, it triggered a massive outage with revenue loss to the
customer.  It was thankfully fixed; however, I don't recall there being
any discussion along the lines of 'with all the implementation problems
of the LDAP2 client, we should drop the code and use the padl.com
libraries instead', so I'm wondering what criteria people are working
under.

Remember that with this bug, all the previous testing managed to miss
it; it was only this one particular failure mode that revealed the
problem.

>
> The other factor that hasn't been discussed at all is that the second
> justification for this change is diametrically opposed to the justification
> behind the changes introduced by the following bugs:
>
> http://bugs.opensolaris.org/view_bug.do?bug_id=5019961
> http://bugs.opensolaris.org/view_bug.do?bug_id=6210677
>
> Are those changes going to be undone by rewriting /bin/true, /bin/false,
>  /bin/basename, /bin/dirname (and others) in ksh93?  If so, at which point
> did the justification for 5019961 and 6210677 become invalid?

Is there actual proof that they are slower?  I believe the desired
result is that they will be binaries that use libast.  If, however, you
are using sh/ksh, the idea is to trigger the builtin instead: the code
is already mapped in, so you save the fork/exec.  If you are using
something else, you pay for the fork/exec with either the old or the
new binary.  That does mean libast must be loaded and linked, but it is
about the same size as libc, and in total we're talking about an
additional 1.5 MB being mapped in (a decent amount of which is likely
to be cached already, given how many things in OpenSolaris today use
ksh).

>
>> Being different isn't a bad thing -- IF there is a definite advantage.
>>  I don't see a lot of people complaining because of differences with
>> OS X or the BSDs vs. Linux.
>
> Actually, that's just about the most frequent complaint that we get.

Then revive madhatter.  Otherwise there are always going to be
differences.   The only way to get rid of them is to ship Linux.

>>> I'm also unclear at which point ksh93 was elevated to the level of primacy
>>> that this change implies.  It appears that this change is making Solaris
>>> less rather than more shell-agnostic, and I'm failing to understand why that
>>> is considered to be a good thing.
>>
>> Bugs aside, a binary is a binary.  If /bin/sleep or /bin/printf happen
>> to be symlinks to ksh93, how does this prevent you from doing anything
>> in csh, zsh, bash, etc?
>
> The new /bin/sleep isn't a symlink to ksh93, it is a shell script.
>
>>  It's a bit like saying Solaris is not
>> language (programming) agnostic because libc is the primary stable API
>> for developers.  Yeah a lot of the stuff is written in C -- and a lot
>> of the system stuff is written as sh or ksh scripts.  It doesn't
>> prevent anyone from using csh, zsh, ruby, python, java, C++, Ada,
>> Fortran, or even Cobol on Solaris.
>
> The interface provided by libc is defined by the OS calling convention and
> the system linker, not the language it is written in.  The interface
> provided by shell-level commands is defined by the command-line arguments
> they accept, not the language they are written in.  What I'm pointing out is
> that this change makes the shell-level interface to sleep be the same as the
> ksh93 implementation.  If a shell doesn't provide a sleep builtin the system
> implementation will now have ksh93 semantics, hence the comment about the
> elevation of ksh93 to a new level of primacy.

If the command-line arguments are incompatible with the existing
implementation, then that is probably a problem that should prevent
putback until it is fixed.  However, I don't recall seeing anyone
claim that is the case here.  This was a bug where unnecessary work
was being done, which could cause failures in situations (e.g. NIS not
running on an NIS client) where the previous implementation wouldn't
fail.

> However I haven't seen any justification as to why the ksh93-style interface
> is to be preferred over any of the other possible interfaces.

In a number of cases, the current utilities are encumbered, and it
doesn't look like that will change.  The current ksh is one of these,
but not the only one.  While the ksh93 integration takes care of ksh,
it could also take care of a number of other utilities in usr/closed.
It also brings along some duplication of existing code.  We can kill
multiple birds with one stone and consolidate, or leave duplicated code
throughout the source tree.  I'd prefer consolidation myself, as long
as the existing documented interfaces continue to work as described.
_______________________________________________
tools-discuss mailing list
tools-discuss@opensolaris.org
