William Stein wrote:
> On Thu, Jul 23, 2009 at 5:36 PM, Dr. David
> Kirkby<[email protected]> wrote:
>> Many people looked into why there were 20 failures of the MPFR
>> test suite on a Sun T5240. I believe the issue is due to memset().
>>
>> I telephoned Sun a couple of days back to report this officially. They
>> have been extremely efficient at handling this case.
>>
>> I now have some information from Sun. I asked the engineer if I could
>> make it public, and he has said yes. He is in fact going to put some of
>> it on a mailing list, and it will eventually appear in Sunsolve.
>>
>> I'm told the fix will be backported to Solaris 10 and I should have an
>> Interim Diagnostic Relief to test myself in a few weeks, but it won't be
>> a public patch for some time, until it's fully tested.
>>
>> Dave
> 
> Wow, Dave, you are amazingly good at doing things right and being a
> professional!  Thanks!!
> 
> William

Cheers.

>> Your service case regarding memset(3C)'s behaviour on sun4v systems
>> when the size_t argument is nonzero but zero mod 2^32 has now been
>> transferred to Europe, and I have taken ownership.  And I intend to
>> keep ownership until we've reached a mutually acceptable resolution,
>> barring vacation stand-ins and unforeseen events.

That is 2 to the power 32 (in the original mail my 2^32 came out as
the number 232).


>> Let me quickly recapitulate the facts:
>>
>> + This is on record as a bug under Change Request id 6507249,
>> + it's fixed in the internal development version of (future)
>>   Solaris and thus in OpenSolaris based on builds snv_62 or later,
>> + it affects only
>>   + 32-bit applications
>>   + running on Solaris 10
>>   + on all SPARC sun4v (CoolThreads^TM) platforms,
>> + it originates in the hardware-optimized libc_psr_hwcap[12].so.1
>>   which (by default) get mounted over /platform/sun4v/lib/libc_psr.so.1
>>   during the Solaris boot sequence,
>> + it affects invocations of memset(3C) where the third (size_t)
>>   argument is nonzero but its low-order 32 bits are zero (so,
>>   considered as a 32-bit size_t, it ought to be zero).
>>
>> (A subtle point is that it won't affect the *first* call to memset()
>> after exec, as the runtime loader processing for lazy symbol binding
>> will clear the upper 32 bits as a side effect before passing control
>> to the newly-bound function entry point.)
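
To make the trigger condition concrete, here is a small C model of it. It is only a sketch under stated assumptions: a uint64_t stands in for the full 64-bit sun4v register that carries memset()'s length in a 32-bit process, and the 32-bit size_t the caller meant is taken to be that register's low-order 32 bits. The function names are made up for illustration and come from neither Sun nor libc.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: 'reg' stands for the full 64-bit register that
 * carries memset()'s third argument in a 32-bit process on sun4v. */
static uint32_t length_seen_by_caller(uint64_t reg)
{
    /* A 32-bit size_t is just the low-order 32 bits of the register. */
    return (uint32_t)reg;
}

/* Hypothetical predicate for the CR 6507249 case: the register as a
 * whole is nonzero, yet the 32-bit length is zero, so memset() should
 * have done nothing at all. */
static bool cr6507249_triggers(uint64_t reg)
{
    return reg != 0 && length_seen_by_caller(reg) == 0;
}
```

For example, cr6507249_triggers(UINT64_C(1) << 32) is true (nonzero register, zero 32-bit length), while cr6507249_triggers(0) and cr6507249_triggers(16) are false.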
>>
>> The bugfix has not (yet) been backported to Solaris 10 because there
>> has not (yet) been any tangible demand for such a backport.  Until
>> just now, there had not yet been a single external customer record
>> on CR#6507249!
>>
>> I am adding one for this present service request now.
>>
>> In fact, the vast majority of application code would not be at risk
>> of being affected by this bug.  Most uses of memset() pass a compile-
>> time constant for the size, often some sizeof(struct such_and_such).
>> Passing a manifest 32-bit int variable for the size will also avoid
>> the bug.  It can only happen when memset() is invoked with some
>> nontrivial arithmetic expression, or some explicit 64-bit variable
>> for the size.  Such code idioms are quite rare.
>>
>> Also, there are a number of workarounds to choose from, depending on
>> the situation:
>>
>> + When the application source code is available for modification:
>>   + store the expression result in a variable and then pass the
>>     variable to memset() (though compiler optimizations might
>>     subvert this),
>>   + test the variable for being 32-bit-equal-to-zero and bypass
>>     memset() if it is,
>>
>> + or at runtime:
>>   + invoke the application with LD_NOAUXFLTR=1 in the environment
>>     (cf. man ld.so.1(1)); this selectively disables the optimized
>>     libc_psr.so.1 just for this process,
>>   + umount the optimized libc_psr.so.1 system-wide,
>>   + interpose a different memset() implementation e.g. via an
>>     LD_PRELOAD'ed shared object.
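
The "test and bypass" workaround, which is also what the 'if' in the MPFR code does, can be kept in one place with a wrapper like the sketch below. The name guarded_memset is hypothetical. In a 32-bit build size_t is 32 bits wide, so in C terms `n == 0` is the 32-bit-equal-to-zero test; the sketch assumes the compiler implements that comparison on the 32-bit value, as C requires.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper illustrating the source-level workaround for
 * CR 6507249: never hand memset() a zero length. In a 32-bit build,
 * size_t is 32 bits wide, so 'n == 0' is the 32-bit-equal-to-zero test. */
static void *guarded_memset(void *s, int c, size_t n)
{
    if (n == 0)
        return s;           /* nothing to clear: bypass memset() entirely */
    return memset(s, c, n);
}
```

Code that computes a length from 64-bit arithmetic would then call guarded_memset(p, 0, len) instead of memset(p, 0, len).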
>>
>>  > Since the MPFR library code we are using is open source, we have managed
>>  > to work around this Solaris bug by sticking an 'if' statement in front
>>  > of the call to the macro which calls memset().
>>  >
>>  > Though of course I don't know if it will affect anything else. So I
>>  > guess it is safer to unmount this, but I assume that will have quite a
>>  > performance impact.
>>
>> The performance impact of not using the optimized libc_psr.so.1
>> varies widely among applications, depending on how much memset()ing
>> and memcpy()ing and memmove()ing they do.  It can range all the way
>> from negligible to a few tens of percent in benchmarks.
>>
>> But the LD_NOAUXFLTR=1 approach limits the performance impact to
>> those applications which are known or suspected to be affected
>> by the bug.
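
For the LD_NOAUXFLTR=1 route, the variable has to be in the environment when the affected program starts, since ld.so.1 consults its LD_ variables at process startup. Besides a shell wrapper, a tiny C launcher can do this: set the variable, then exec the real program. A sketch, with hypothetical function names:

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: ask the runtime linker (ld.so.1) to ignore
 * auxiliary filters such as the optimized libc_psr.so.1. This only
 * affects programs exec'd after the call. Returns 0 on success. */
static int request_no_aux_filter(void)
{
    return setenv("LD_NOAUXFLTR", "1", 1);
}

/* Hypothetical launcher: argv[0] names the program to run and argv[]
 * is NULL-terminated. Returns -1 only if something fails. */
static int exec_without_aux_filter(char *const argv[])
{
    if (request_no_aux_filter() != 0)
        return -1;
    execvp(argv[0], argv);
    return -1;              /* execvp() only returns on failure */
}
```

A launcher's main() could simply call exec_without_aux_filter(&argv[1]) to run the program named on its own command line, since argv is NULL-terminated.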
>>
>> <SNIP>
>>
>> Would you be willing to test-drive any future binary fix in the
>> shape of an Interim Diagnostic Relief prior to patch creation,
>> as well as a release-candidate patch at the T-Patch stage prior
>> to patch release?  For background information on IDRs, please see:
>>
>>   http://sunsolve.sun.com/show.do?target=IDR
>>
>> Since the affected deliverables (libc_psr_hwcap1.so.1) have also
>> been modified by existing patches including some Kernel Update
>> patches, any such IDR would (have to) be built to fit onto a
>> particular set of patch revisions.  The most recent change came
>> in Kernel Update patch 127127-11, so the easiest option would
>> be an IDR with a hard dependency on this patch.  Should you have
>> need for an IDR against older patch levels than this, please do
>> let me know!
>>


--~--~---------~--~----~------------~-------~--~----~
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to 
[email protected]
For more options, visit this group at http://groups.google.com/group/sage-devel
URLs: http://www.sagemath.org
-~----------~----~----~----~------~----~------~--~---
