On Fri, Feb 20, 2009 at 4:47 AM, Alan Burlison <alan.burli...@sun.com> wrote:
> Jason King wrote:
>
>> I cringe every time I hear that justification trotted out. If that is
>> the end goal, we can fix things much easier and quicker -- stop all
>> work on Solaris immediately, and ship Linux. That is the only way
>> you'll achieve Linux compatibility.
>>
>> The goal should be to have the best userland out there -- whether it's
>> the current Solaris utility, a GNU utility, BSD, AST, etc.
>
> The two justifications that I've seen for this change are:
>
> 1. It allows fractional seconds.
>
> 2. It allows us "to re-use existing, maintained code instead of maintaining
> two separate codelines (e.g. one in usr/src/cmd/sleep/ and the AST/ksh93
> "sleep" version) which have to be kept in sync all the time".

And what happens if the GNU maintainers won't accept needed patches for
running on Solaris (probably not the case here, but it's happened with
other GNU utilities)? Do we start maintaining a fork? Do we start
neutering Solaris features to stay compatible with GNU? Or do we decide
that perhaps that GNU utility shouldn't be used, in favor of something
that does what is needed?

> The first is presumably considered to be a good thing because it increases
> compatibility with other OSs (e.g. Linux). However, on that front the ksh93
> sleep is not that much of an improvement, as although it supports fractional
> seconds it doesn't support the 's', 'm', 'h' and 'd' units that GNU sleep
> supports.
>
> The second justification seems slightly flimsy - as others have pointed out,
> the C implementation wasn't exactly huge and as this thread illustrates, the
> ksh93 implementation has several problems.

A couple of bugs were revealed when a previously untested failure mode
was exercised. They were fixed, and regression tests were created to
prevent a recurrence. It happens; I can easily point to even larger bugs
that have shipped to customers in Solaris 8, 9, and 10 (which are
supposed to be even more widely tested than the nevada builds). While I
think everyone involved understands the importance of not putting back
code until it's ready, that doesn't mean it will always be bug-free.
One in particular involved an untested failure mode (it was not and
still is not documented anywhere in docs.sun.com) in the native LDAP2
client. It was easily triggered by the right kind of slowdown (not an
outage, but a slowdown) in the LDAP server, and it caused some rather
extreme non-linear behavior on the client systems (getgroups(2) would
essentially block indefinitely), which at one F500 customer triggered a
massive outage with revenue loss to the customer.
It was thankfully fixed; however, I don't recall there being any
discussions along the lines of 'with all the implementation problems of
the LDAP2 client, we should drop the code and use the padl.com libraries
instead', so I'm wondering what criteria people are working under.
Remember that with this bug, all the previous testing managed to miss
it; it was only this one particular failure mode that revealed the
problem.

> The other factor that hasn't been discussed at all is that the second
> justification for this change is diametrically opposed to the justification
> behind the changes introduced by the following bugs:
>
> http://bugs.opensolaris.org/view_bug.do?bug_id=5019961
> http://bugs.opensolaris.org/view_bug.do?bug_id=6210677
>
> Are those changes going to be undone by rewriting /bin/true, /bin/false,
> /bin/basename, /bin/dirname (and others) in ksh93? If so, at which point
> did the justification for 5019961 and 6210677 become invalid?

Is there actual proof that they are slower? I believe the desired
result is that they will be binaries that use libast. If, however, you
are using sh/ksh, the idea is to trigger the builtin (since the code is
already mapped in, this saves the fork/exec). If you are using something
else, you'll have the fork/exec with either the old or the new
implementation. While it does mean that libast must be loaded and
linked, it's about the same size as libc, and in total we're talking
about an additional 1.5 MB being mapped in (of which a decent amount is
likely to be cached already, given that there are so many things in
OpenSolaris today that use ksh).

>> Being different isn't a bad thing -- IF there is a definite advantage.
>> I don't see a lot of people complaining because of differences with
>> OS X or the BSDs vs. Linux.
>
> Actually, that's just about the most frequent complaint that we get.

Then revive madhatter. Otherwise there are always going to be
differences. The only way to get rid of them is to ship Linux.

>>> I'm also unclear at which point ksh93 was elevated to the level of primacy
>>> that this change implies. It appears that this change is making Solaris
>>> less rather than more shell-agnostic, and I'm failing to understand why that
>>> is considered to be a good thing.
>>
>> Bugs aside, a binary is a binary. If /bin/sleep or /bin/printf happen
>> to be symlinks to ksh93, how does this prevent you from doing anything
>> in csh, zsh, bash, etc?
>
> The new /bin/sleep isn't a symlink to ksh93, it is a shell script.
>
>> It's a bit like saying Solaris is not
>> language (programming) agnostic because libc is the primary stable API
>> for developers. Yeah a lot of the stuff is written in C -- and a lot
>> of the system stuff is written as sh or ksh scripts. It doesn't
>> prevent anyone from using csh, zsh, ruby, python, java, C++, Ada,
>> Fortran, or even Cobol on Solaris.
>
> The interface provided by libc is defined by the OS calling convention and
> the system linker, not the language it is written in. The interface
> provided by shell-level commands is defined by the command-line arguments
> they accept, not the language they are written in. What I'm pointing out is
> that this change makes the shell-level interface to sleep be the same as the
> ksh93 implementation. If a shell doesn't provide a sleep builtin the system
> implementation will now have ksh93 semantics, hence the comment about the
> elevation of ksh93 to a new level of primacy.

If the command line arguments are incompatible with the existing
implementation, then that is probably a problem that should prevent
putback until it is fixed. However, I don't recall seeing anyone
claiming that is the case here. This was a bug where unnecessary work
was being done which could cause failure in instances (i.e. NIS not
running on a NIS client) where the previous implementation wouldn't
fail.

> However I haven't seen any justification as to why the ksh93-style interface
> is to be preferred over any of the other possible interfaces.

In a number of cases, the current utilities are encumbered (and it
doesn't look like that will be able to change). The current ksh is one
of these, but not the only one. While the ksh93 integration takes care
of ksh, it could also take care of a number of other utilities in
usr/closed. It also brings along some duplication of existing code. We
can kill multiple birds with one stone and consolidate, or leave
duplicated code throughout the source tree. I'd prefer consolidation
myself, as long as the existing documented interfaces continue to work
as described.
_______________________________________________
tools-discuss mailing list
tools-discuss@opensolaris.org