William Stein wrote:
> On Thu, Jul 23, 2009 at 5:36 PM, Dr. David Kirkby<[email protected]> wrote:
>> Many people looked at the reason there were 20 test failures of the MPFR
>> test suite on a Sun T5240. I believe the issue is due to memset().
>>
>> I telephoned Sun a couple of days back to report this officially. They
>> have been extremely efficient at handling this case.
>>
>> I now have some information from Sun. I asked the engineer if I could
>> make it public, and he has said yes. He is in fact going to put some of
>> it on a mailing list, and it will eventually appear in Sunsolve.
>>
>> I'm told the fix will be backported to Solaris 10 and I should have an
>> Interim Diagnostic Relief to test myself in a few weeks, but it won't be
>> a public patch for some time, until it's fully tested.
>>
>> Dave
>
> Wow, Dave, you are amazingly good at doing things right and being a
> professional! Thanks!!
>
> William

Cheers.

>> Your service case regarding memset(3C)'s behaviour on sun4v systems
>> when the size_t argument is nonzero but zero mod 232 has now been
>> transferred to Europe, and I have taken ownership. And I intend to
>> keep ownership until we've reached a mutually acceptable resolution,
>> barring vacation stand-ins and unforeseen events.

That was supposed to be 2 to the power 32, but my 2^32 appears to come
out as the number 232 there.

>> Let me quickly recapitulate the facts:
>>
>> + This is on record as a bug under Change Request id 6507249,
>> + it's fixed in the internal development version of (future)
>>   Solaris and thus in OpenSolaris based on builds snv_62 or later,
>> + it affects only
>>   + 32-bit applications
>>   + running on Solaris 10
>>   + on all SPARC sun4v (CoolThreads^TM) platforms,
>> + it originates in the hardware-optimized libc_psr_hwcap[12].so.1
>>   which (by default) get mounted over /platform/sun4v/lib/libc_psr.so.1
>>   during the Solaris boot sequence,
>> + it affects invocations of memset(3C) where the third (size_t)
>>   argument is nonzero but its low-order 32 bits are zero (thus
>>   it ought to be zero considered as a size_t).
>>
>> (A subtle point is that it won't affect the *first* call to memset()
>> after exec, as the runtime loader processing for lazy symbol binding
>> will clear the upper 32 bits as a side effect before passing control
>> to the newly-bound function entry point.)
>>
>> The bugfix has not (yet) been backported to Solaris 10 because there
>> has not (yet) been any tangible demand for such a backport. Until
>> just now, there had not yet been a single external customer record
>> on CR#6507249!
>>
>> I am adding one for this present service request now.
>>
>> In fact, the vast majority of application code would not be at risk
>> of being affected by this bug. Most uses of memset() pass a compile-time
>> constant for the size, often some sizeof(struct such_and_such).
>> Passing a manifest 32-bit int variable for the size will also avoid
>> the bug. It can only happen when memset() is invoked with some
>> nontrivial arithmetic expression, or some explicit 64-bit variable
>> for the size. Such code idioms are quite rare.
>>
>> Also, there are a number of workarounds to choose from, depending on
>> the situation:
>>
>> + When the application source code is available for modification:
>>   + store the expression result in a variable and then pass the
>>     variable to memset() (though compiler optimizations might
>>     subvert this),
>>   + test the variable for being 32-bit-equal-to-zero and bypass
>>     memset() if it is,
>>
>> + or at runtime:
>>   + invoke the application with LD_NOAUXFLTR=1 in the environment
>>     (cf. man ld.so.1(1), which selectively disables the optimized
>>     libc_psr.so.1 just for this process),
>>   + umount the optimized libc_psr.so.1 system-wide,
>>   + interpose a different memset() implementation e.g. via an
>>     LD_PRELOAD'ed shared object.
>>
>> > Since the MPFR library code we are using is open source, we have managed
>> > to work around this Solaris bug, by sticking an 'if' statement in front
>> > of the call to the macro which calls memset().
>> >
>> > Though of course I don't know if it will affect anything else. So I
>> > guess it is safer to unmount this, but I assume that will have quite a
>> > performance impact.
>>
>> The performance impact of not using the optimized libc_psr.so.1
>> varies widely among applications, depending on how much memset()ing
>> and memcpy()ing and memmove()ing they do. It can range all the way
>> from negligible to a few tens of percent in benchmarks.
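
For anyone wanting to apply the same kind of 'if' guard elsewhere, here is a minimal sketch of the "test for 32-bit-equal-to-zero and bypass memset()" workaround from the list above. It is not the actual MPFR change; the names low32_is_zero() and guarded_memset() are made up for illustration, and the length is taken as a 64-bit value on the assumption that this is where the stray upper bits come from.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true when the low-order 32 bits of a 64-bit
 * length are zero, i.e. the length is zero when viewed as the 32-bit
 * size_t of a 32-bit application. */
static int low32_is_zero(uint64_t n)
{
    return (uint32_t)n == 0;
}

/* Hypothetical wrapper: skip memset() entirely when the 32-bit view of
 * the length is zero; otherwise pass only the low 32 bits on. */
static void *guarded_memset(void *dst, int c, uint64_t n)
{
    if (low32_is_zero(n))
        return dst;               /* nothing to clear, never reach memset() */
    assert(dst != NULL);
    return memset(dst, c, (size_t)(uint32_t)n);
}
```

Calling guarded_memset(buf, 0, len) in place of memset(buf, 0, len) leaves the buffer untouched for lengths such as 2^32, and behaves like plain memset() for ordinary lengths. As Sun note, an optimizing compiler might still subvert a source-level guard, so this is only a sketch of the idea.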
>>
>> But the LD_NOAUXFLTR=1 approach limits the performance impact to
>> those applications which are known or suspected to be affected
>> by the bug.
>>
>> <SNIP>
>>
>> Would you be willing to test-drive any future binary fix in the
>> shape of an Interim Diagnostic Relief prior to patch creation,
>> as well as a release-candidate patch at the T-Patch stage prior
>> to patch release? For background information on IDRs, please see:
>>
>> http://sunsolve.sun.com/show.do?target=IDR
>>
>> Since the affected deliverables (libc_psr_hwcap1.so.1) have also
>> been modified by existing patches including some Kernel Update
>> patches, any such IDR would (have to) be built to fit onto a
>> particular set of patch revisions. The most recent change had
>> come in Kernel Update patch 127127-11, thus the easiest would
>> be an IDR with a hard dependency on this patch. Should you have
>> need for an IDR against older patch levels than this, please do
>> let me know!
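
The last runtime workaround Sun list, interposing a different memset() via LD_PRELOAD, could look roughly like the sketch below. This is an untested illustration, not anything Sun supplied: it is a deliberately plain byte-at-a-time memset(), and I am assuming that a straightforward C implementation built as a 32-bit shared object does not share the problem of the optimized libc_psr version.

```c
#include <assert.h>
#include <stddef.h>

/* Plain replacement for memset(), intended to be built as a shared
 * object and interposed ahead of the optimized library version.
 * The volatile pointer is there to discourage the compiler from
 * turning the loop back into a call to memset() itself. */
void *memset(void *dst, int c, size_t n)
{
    volatile unsigned char *p = dst;

    assert(dst != NULL || n == 0);
    while (n--)
        *p++ = (unsigned char)c;
    return dst;
}
```

With gcc, something like "gcc -m32 -shared -fPIC -o memset_interpose.so memset_interpose.c" should build it (the file names are made up, and the Sun Studio compiler takes different flags), after which the application would be started with LD_PRELOAD pointing at the resulting file. Expect it to be slower than the optimized routine.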
