Re: [PATCH v10] Practical improvement to libgcc complex divide
On Wed, 7 Apr 2021, Patrick McGehearty via Gcc-patches wrote:

> +      macro_name = XALLOCAVEC (char, name_len
> +			       + sizeof ("__LIBGCC__EXCESS_PRECISION__"));
>        sprintf (macro_name, "__LIBGCC_%s_EXCESS_PRECISION__", name);
>        builtin_define_with_int_value (macro_name, excess_precision);
> +
> +      char val_name[64];
> +
> +      macro_name = XALLOCAVEC (char, name_len
> +			       + sizeof ("__LIBGCC_EPSILON__"));
> +      sprintf (macro_name, "__LIBGCC_%s_EPSILON__", name);
> +      sprintf (val_name, "__%s_EPSILON__", float_h_prefix);
> +      builtin_define_with_value (macro_name, val_name, 0);
> +
> +      macro_name = XALLOCAVEC (char, name_len + sizeof ("__LIBGCC_MAX__"));
> +      sprintf (macro_name, "__LIBGCC_%s_MAX__", name);
> +      sprintf (val_name, "__%s_MAX__", float_h_prefix);
> +      builtin_define_with_value (macro_name, val_name, 0);
> +
> +      macro_name = XALLOCAVEC (char, name_len + sizeof ("__LIBGCC_MIN__"));
> +      sprintf (macro_name, "__LIBGCC_%s_MIN__", name);
> +      sprintf (val_name, "__%s_MIN__", float_h_prefix);
> +      builtin_define_with_value (macro_name, val_name, 0);

I think there's an off-by-one error calculating the allocation sizes for
these three macro names.  Note that the code just above uses
sizeof ("__LIBGCC__EXCESS_PRECISION__") with two underscores between
"__LIBGCC" and "EXCESS_PRECISION__", reflecting that the macro name being
constructed has both those underscores (around the %s expansion of size
name_len).  I think the three sizeof calls in the three subsequent
allocations likewise need to have both those underscores present in their
arguments.

> diff --git a/libgcc/config/rs6000/_divkc3.c b/libgcc/config/rs6000/_divkc3.c
> index d261f40..f7fa47f 100644
> --- a/libgcc/config/rs6000/_divkc3.c
> +++ b/libgcc/config/rs6000/_divkc3.c
> @@ -37,29 +37,115 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define __divkc3 __divkc3_sw
>  #endif
> 
> +#define RBIG (__LIBGCC_TF_MAX__ / 2)
> +#define RMIN (__LIBGCC_TF_MIN__)
> +#define RMIN2 (__LIBGCC_TF_EPSILON__)
> +#define RMINSCAL (1 / __LIBGCC_TF_EPSILON__)
> +#define RMAX2 (RBIG * RMIN2)

This file includes quad-float128.h, which does some remapping from TF to
KF depending on __LONG_DOUBLE_IEEE128__.  I think you probably need to
have a similar __LONG_DOUBLE_IEEE128__ conditional here.  If
__LONG_DOUBLE_IEEE128__ is not defined, use __LIBGCC_KF_* macros instead
of __LIBGCC_TF_*; if __LONG_DOUBLE_IEEE128__ is defined, use
__LIBGCC_TF_* as above.  (Unless the powerpc maintainers say otherwise.)

-- 
Joseph S. Myers
jos...@codesourcery.com
Re: [PATCH v10] Practical improvement to libgcc complex divide
- ping

[A sincere and special thanks for the sustained perseverance of the
reviewers in pointing me in the proper direction to get this patch
polished. The original proposal was June 1, 2020 and covered only double
precision. The current version is dramatically better: it extends
coverage to most precisions, improves the computation's accuracy and
speed, and improves code maintainability. - Patrick McGehearty]

On 4/7/2021 3:21 PM, Patrick McGehearty via Gcc-patches wrote:
[quoted copy of the original v10 submission trimmed; the full message follows below]
[PATCH v10] Practical improvement to libgcc complex divide
Changes in this version from Version 9:
- Replaced all uses of alloca with XALLOCAVEC in c_cpp_builtins() in
  c-cppbuiltin.c. Did not replace alloca elsewhere in the same file.
- Fixed type of name_len.
- Fixed prototypes for abort & exit in new test programs.
- Fixed spelling errors and omitted words in comments.
- Changed XMTYPE to AMTYPE ("A" for additional) to avoid confusion with
  extended types.
- Removed XCTYPE as unused.

Correctness and performance test programs used during development of
this project may be found in the attachment to:
https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg254210.html

Summary of Purpose

This patch to libgcc/libgcc2.c __divdc3 provides an opportunity to gain
important improvements to the quality of answers for the default complex
divide routine (half, float, double, extended, and long double
precisions) when dealing with very large or very small exponents.

The current code correctly implements Smith's method (1962) [2], further
modified by C99's requirements for dealing with NaN (not a number)
results. When working with input values where the exponents are greater
than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are substantially
different from the answers provided by quad precision more than 1% of
the time. This error rate may be unacceptable for many applications that
cannot a priori restrict their computations to the safe range.

The proposed method reduces the frequency of "substantially different"
answers by more than 99% for double precision at a modest cost in
performance. Differences between the current gcc method and the new
method are described first, followed by a discussion of accuracy and
performance differences.

Background

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714. Study of Beebe [1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.
A Google search found the paper by Baudin and Smith [3] (the same Robert
Smith) published in 2012. Elen Kalda's proposed patch [4] is based on
that paper.

I developed two sets of test data by randomly distributing values over a
restricted range and over the full range of input values. The current
complex divide handled the restricted range well enough, but failed on
the full range more than 1% of the time. Baudin and Smith's primary
test, for "ratio" equal to zero, reduced the cases with 16 or more error
bits by a factor of 5, but still left too many flawed answers.

Adding debug printouts to cases with substantial errors allowed me to
see the intermediate calculations for test values that failed. I noted
that for many of the failures, "ratio" was a subnormal. Changing the
"ratio" test from a check for zero to a check for subnormal reduced the
16-bit error rate by another factor of 12. This single modified test
provides the greatest benefit for the least cost, but the percentage of
cases with greater than 16-bit errors (double precision data) is still
greater than 0.027% (2.7 in 10,000).

Continued examination of the remaining errors and their intermediate
computations led to the various tests of input values and scaling to
avoid under/overflow. The current patch does not handle some of the rare
and most extreme combinations of input values, but the random test data
shows only 1 case in 10 million with an error greater than 12 bits. That
case has 18 bits of error and is due to subtraction cancellation. These
results are significantly better than the results reported by Baudin and
Smith.

Support for half, float, double, extended, and long double precision is
included, as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision, as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Further investigation showed that changing the half precision algorithm
to use the simple formula (real = a*c + b*d, imag = b*c - a*d) caused no
loss of precision and a modest improvement in performance.

The existing constants for each precision (float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN; extended and/or long double: LDBL_MAX,
LDBL_MIN) are used for avoiding the more common overflow/underflow
cases. This use is made generic by defining appropriate __LIBGCC2_*
macros in c-cppbuiltin.c.

Tests are added for when both parts of the denominator have exponents
small enough that all input values could be scaled up without risking
overflow, allowing any subnormal values to be shifted to normal values.
That gained a clear improvement in accuracy. Similarly, when either
numerator was subnormal and the other numerator and both denominator
values were not too large, scaling could be used to reduce the risk of
computing with subnormals. The test and scaling values used all fit
within the exponent range required by the C standard for each precision.
Float precision has more diffic