On Thu, Jul 23, 2009 at 5:36 PM, Dr. David Kirkby
<[email protected]> wrote:
>
> Many people looked at the reason there were 20 test failures of the MPFR
> test suite on a Sun T5240. I believe the issue is due to memset().
>
> I telephoned Sun a couple of days back to report this officially. They
> have been extremely efficient at handling this case.
>
> I now have some information from Sun. I asked the engineer if I could
> make it public, and he has said yes. He is in fact going to put some of
> it on a mailing list, and it will eventually appear in Sunsolve.
>
> I'm told the fix will be backported to Solaris 10 and I should have an
> Interim Diagnostic Relief to test myself in a few weeks, but it won't be
> a public patch for some time, until it's fully tested.
>
> Dave

Wow, Dave, you are amazingly good at doing things right and being a
professional!  Thanks!!

William

>
> ----
> Your service case regarding memset(3C)'s behaviour on sun4v systems
> when the size_t argument is nonzero but zero mod 2^32 has now been
> transferred to Europe, and I have taken ownership.  And I intend to
> keep ownership until we've reached a mutually acceptable resolution,
> barring vacation stand-ins and unforeseen events.
>
> Let me quickly recapitulate the facts:
>
> + This is on record as a bug under Change Request id 6507249,
> + it's fixed in the internal development version of (future)
>   Solaris and thus in OpenSolaris based on builds snv_62 or later,
> + it affects only
>   + 32-bit applications
>   + running on Solaris 10
>   + on all SPARC sun4v (CoolThreads^TM) platforms,
> + it originates in the hardware-optimized libc_psr_hwcap[12].so.1
>   which (by default) get mounted over /platform/sun4v/lib/libc_psr.so.1
>   during the Solaris boot sequence,
> + it affects invocations of memset(3C) where the third (size_t)
>   argument is nonzero but its low-order 32 bits are zero  (thus
>   it ought to be zero considered as a size_t).
>
> (A subtle point is that it won't affect the *first* call to memset()
> after exec, as the runtime loader processing for lazy symbol binding
> will clear the upper 32 bits as a side effect before passing control
> to the newly-bound function entry point.)
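
To make the trigger condition above concrete, here is a minimal sketch
in C. It is an illustration only, not part of Sun's report, and the
helper name truncates_to_zero is hypothetical. It checks whether a
64-bit byte count is nonzero while its low-order 32 bits are all zero,
which is the pattern described for the third argument.

```c
#include <stdint.h>

/* Hypothetical helper (not from the report): returns 1 when a 64-bit
   count is nonzero but becomes zero once reduced to 32 bits, i.e. when
   it is a nonzero multiple of 2^32. */
static int truncates_to_zero(uint64_t n)
{
    return n != 0 && (uint32_t)n == 0;
}
```

For example, a count of 2^29 eight-byte words gives 2^32 bytes, so
truncates_to_zero(UINT64_C(1) << 32) is 1, while adding 8 to that
count, or passing a true 0, gives 0.
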
>
> The bugfix has not (yet) been backported to Solaris 10 because there
> has not (yet) been any tangible demand for such a backport.  Until
> just now, there had not yet been a single external customer record
> on CR#6507249!
>
> I am adding one for this present service request now.
>
> In fact, the vast majority of application code would not be at risk
> of being affected by this bug.  Most uses of memset() pass a compile-
> time constant for the size, often some sizeof(struct such_and_such).
> Passing a manifest 32-bit int variable for the size will also avoid
> the bug.  It can only happen when memset() is invoked with some
> nontrivial arithmetic expression, or some explicit 64-bit variable
> for the size.  Such code idioms are quite rare.
>
> Also, there are a number of workarounds to choose from, depending on
> the situation:
>
> + When the application source code is available for modification:
>   + store the expression result in a variable and then pass the
>     variable to memset() (though compiler optimizations might
>     subvert this),
>   + test the variable for being 32-bit-equal-to-zero and bypass
>     memset() if it is,
>
> + or at runtime:
>   + invoke the application with LD_NOAUXFLTR=1 in the environment
>     (cf. man ld.so.1(1)), which selectively disables the optimized
>     libc_psr.so.1 just for this process,
>   + umount the optimized libc_psr.so.1 system-wide,
>   + interpose a different memset() implementation e.g. via an
>     LD_PRELOAD'ed shared object.
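
The last runtime workaround, interposing a different memset(), could
look roughly like the sketch below. This is an assumption-laden
illustration, not Sun's fix: the names simple_memset and
BUILD_INTERPOSER are hypothetical, and it assumes that a plain C loop
compiled as ordinary 32-bit code only ever looks at the 32-bit value
of the size argument.

```c
#include <stddef.h>

/* Plain byte-at-a-time fill. The volatile qualifier keeps an optimizing
   compiler from turning this loop back into a call to memset(). */
void *simple_memset(void *dst, int c, size_t n)
{
    volatile unsigned char *p = dst;
    size_t i;

    for (i = 0; i < n; i++)
        p[i] = (unsigned char)c;
    return dst;
}

#ifdef BUILD_INTERPOSER
/* Only compiled into the preloaded object (hypothetical build switch):
   this definition takes over the memset symbol for the process. */
void *memset(void *dst, int c, size_t n)
{
    return simple_memset(dst, c, n);
}
#endif
```

Built as a shared object with BUILD_INTERPOSER defined and named in
LD_PRELOAD, this would replace memset() for that process only. It
gives up the speed of the optimized routine, so it is a stopgap rather
than a general replacement.
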
>
>  > Since the MPFR library code we are using is open source, we have managed
>  > to work around this Solaris bug, by sticking an 'if' statement in front
>  > of the call to the macro which calls memset().
>  >
>  > Though of course I don't know if it will affect anything else. So I
>  > guess it is safer to unmount this, but I assume that will have quite a
>  > performance impact.
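
The 'if' workaround mentioned above can be sketched as a small wrapper.
This is a hedged illustration rather than the actual MPFR change, and
the name guarded_memset is hypothetical. The idea is that holding the
size in a size_t variable and comparing it with zero is a 32-bit
comparison in a 32-bit build, so memset() is never reached with a size
that is zero as a size_t.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper (not MPFR's actual patch): bypass memset()
   entirely when the size, held in a size_t variable, is zero. */
static void *guarded_memset(void *dst, int c, size_t n)
{
    if (n == 0)
        return dst;        /* nothing to fill; do not call memset() */
    return memset(dst, c, n);
}
```

Callers would use guarded_memset() wherever the size comes from an
arithmetic expression; as noted earlier in the report, compiler
optimizations might in principle subvert this kind of source-level
workaround.
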
>
> The performance impact of not using the optimized libc_psr.so.1
> varies widely among applications, depending on how much memset()ing
> and memcpy()ing and memmove()ing they do.  It can range all the way
> from negligible to a few tens of percent in benchmarks.
>
> But the LD_NOAUXFLTR=1 approach limits the performance impact to
> those applications which are known or suspected to be affected
> by the bug.
> ----
>
> <SNIP>
>
> Would you be willing to test-drive any future binary fix in the
> shape of an Interim Diagnostic Relief prior to patch creation,
> as well as a release-candidate patch at the T-Patch stage prior
> to patch release?  For background information on IDRs, please see:
>
>   http://sunsolve.sun.com/show.do?target=IDR
>
> Since the affected deliverables (libc_psr_hwcap1.so.1) have also
> been modified by existing patches including some Kernel Update
> patches, any such IDR would (have to) be built to fit onto a
> particular set of patch revisions.  The most recent change had
> come in Kernel Update patch 127127-11, thus the easiest would
> be an IDR with a hard dependency on this patch.  Should you have
> need for an IDR against older patch levels than this, please do
> let me know!
>
> >
>



-- 
William Stein
Associate Professor of Mathematics
University of Washington
http://wstein.org

--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---