On Thu, Jul 23, 2009 at 5:36 PM, Dr. David Kirkby<[email protected]> wrote:
>
> Many people looked at the reason there were 20 test failures of the MPFR
> test suite on a Sun T5240. I believe the issue is due to memset().
>
> I telephoned Sun a couple of days back to report this officially. They
> have been extremely efficient at handling this case.
>
> I now have some information from Sun. I asked the engineer if I could
> make it public, and he has said yes. He is in fact going to put some of
> it on a mailing list, and it will eventually appear in Sunsolve.
>
> I'm told the fix will be backported to Solaris 10 and I should have an
> Interim Diagnostic Relief to test myself in a few weeks, but it won't be
> a public patch for some time, until it's fully tested.
>
> Dave

Wow, Dave, you are amazingly good at doing things right and being a
professional! Thanks!!

William

>
> ----
> Your service case regarding memset(3C)'s behaviour on sun4v systems
> when the size_t argument is nonzero but zero mod 2^32 has now been
> transferred to Europe, and I have taken ownership. And I intend to
> keep ownership until we've reached a mutually acceptable resolution,
> barring vacation stand-ins and unforeseen events.
>
> Let me quickly recapitulate the facts:
>
> + This is on record as a bug under Change Request id 6507249,
> + it's fixed in the internal development version of (future)
>   Solaris and thus in OpenSolaris based on builds snv_62 or later,
> + it affects only
>     + 32-bit applications
>     + running on Solaris 10
>     + on all SPARC sun4v (CoolThreads^TM) platforms,
> + it originates in the hardware-optimized libc_psr_hwcap[12].so.1
>   which (by default) get mounted over /platform/sun4v/lib/libc_psr.so.1
>   during the Solaris boot sequence,
> + it affects invocations of memset(3C) where the third (size_t)
>   argument is nonzero but its low-order 32 bits are zero (thus
>   it ought to be zero considered as a size_t).
>
> (A subtle point is that it won't affect the *first* call to memset()
> after exec, as the runtime loader processing for lazy symbol binding
> will clear the upper 32 bits as a side effect before passing control
> to the newly-bound function entry point.)
>
> The bugfix has not (yet) been backported to Solaris 10 because there
> has not (yet) been any tangible demand for such a backport. Until
> just now, there had not yet been a single external customer record
> on CR#6507249!
>
> I am adding one for this present service request now.
>
> In fact, the vast majority of application code would not be at risk
> of being affected by this bug. Most uses of memset() pass a
> compile-time constant for the size, often some sizeof(struct such_and_such).
> Passing a manifest 32-bit int variable for the size will also avoid
> the bug.
> It can only happen when memset() is invoked with some
> nontrivial arithmetic expression, or some explicit 64-bit variable
> for the size. Such code idioms are quite rare.
>
> Also, there are a number of workarounds to choose from, depending on
> the situation:
>
> + When the application source code is available for modification:
>     + store the expression result in a variable and then pass the
>       variable to memset() (though compiler optimizations might
>       subvert this),
>     + test the variable for being 32-bit-equal-to-zero and bypass
>       memset() if it is,
>
> + or at runtime:
>     + invoke the application with LD_NOAUXFLTR=1 in the environment
>       (cf. man ld.so.1(1), which selectively disables the optimized
>       libc_psr.so.1 just for this process),
>     + umount the optimized libc_psr.so.1 system-wide,
>     + interpose a different memset() implementation e.g. via an
>       LD_PRELOAD'ed shared object.
>
> > Since the MPFR library code we are using is open source, we have managed
> > to work around this Solaris bug, by sticking an 'if' statement in front
> > of the call to the macro which calls memset().
> >
> > Though of course I don't know if it will affect anything else. So I
> > guess it is safer to unmount this, but I assume that will have quite a
> > performance impact.
>
> The performance impact of not using the optimized libc_psr.so.1
> varies widely among applications, depending on how much memset()ing
> and memcpy()ing and memmove()ing they do. It can range all the way
> from negligible to a few tens of percent in benchmarks.
>
> But the LD_NOAUXFLTR=1 approach limits the performance impact to
> those applications which are known or suspected to be affected
> by the bug.
>
> ---
>
> <SNIP>
>
> Would you be willing to test-drive any future binary fix in the
> shape of an Interim Diagnostic Relief prior to patch creation,
> as well as a release-candidate patch at the T-Patch stage prior
> to patch release?
> For background information on IDRs, please see:
>
>   http://sunsolve.sun.com/show.do?target=IDR
>
> Since the affected deliverables (libc_psr_hwcap1.so.1) have also
> been modified by existing patches including some Kernel Update
> patches, any such IDR would (have to) be built to fit onto a
> particular set of patch revisions. The most recent change had
> come in Kernel Update patch 127127-11, thus the easiest would
> be an IDR with a hard dependency on this patch. Should you have
> need for an IDR against older patch levels than this, please do
> let me know!
>

--
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org
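
For anyone wanting to apply the source-level workaround mentioned in the quoted message (test the size for being 32-bit-equal-to-zero and bypass memset() if it is), here is a minimal sketch in C. It is not the actual MPFR change and not Sun's fix. The names triggers_cr6507249 and guarded_memset are made up for illustration, and the first function only models the trigger condition as described in the service case (register value nonzero, low-order 32 bits zero). It assumes a 32-bit build, where size_t is 32 bits wide, so a plain n == 0 test is the 32-bit test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: models the trigger condition from the service
 * case on a 64-bit register value: nonzero as a whole, but with the
 * low-order 32 bits all zero. */
bool triggers_cr6507249(uint64_t reg)
{
    return reg != 0 && (uint32_t)reg == 0;
}

/* Hypothetical wrapper: bypass memset() entirely when the size is zero.
 * Assumption: a 32-bit process, where size_t is 32 bits wide, so
 * (n == 0) is the "32-bit-equal-to-zero" test from the workaround list. */
void *guarded_memset(void *dst, int c, size_t n)
{
    if (n == 0)
        return dst;        /* nothing to fill, so libc is never called */
    assert(dst != NULL);   /* a nonzero fill needs a real destination */
    return memset(dst, c, n);
}
```

Call sites would then use guarded_memset(p, 0, len) wherever len comes from a nontrivial arithmetic expression or a 64-bit variable. Calls with a compile-time constant size can stay as they are, since the message says those are not at risk.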
