Just throwing a few ideas/warnings your way :) I ran into these same issues when implementing arb_shader_precision for GLSL.

It's not my intention to introduce more questions than answers here. This is all simply coming from a fellow dev who's been beaten down by float's guerrilla attacks, not a float expert by any means.

First, float is bees, let's get that out of the way.

nextafter does sound like the clearest solution. I used a hand-generated method to convert to the binary representation for my purposes, which made the most sense to me:

def _floatToBits(f):
    s = struct.pack('>f', f)
    # capital 'L' is important - sign bit is already built into 754
    return struct.unpack('>L', s)[0]

def _bitsToFloat(b):
    s = struct.pack('>l', b)
    return struct.unpack('>f', s)[0]

However nextafter accomplishes the same thing.

I did find it easiest/clearest to move to the binary representation to do all ulp-boundary math and comparisons, and only convert back to float at the end.

There are many gotcha's here in addition to the ones you've already mentioned. The first would be "who's ulp?" I haven't checked the spec governing these CL tests in question, but arb_shader_precision never specified which ulp it was referring to. Is it the value of an ulp at the expected value? Or the value of an ulp at one of the inputs? Or the largest of all those? Also, it's very helpful when the spec specifies a range of values for which the operation can be held to the specified number of ulps.

Secondly the rounding issue you raised is valid and difficult to get right. Operations submitted to the CPU will use the rounding mode it is set up for. I believe the default on intel chips is round-to-nearest, ties-to-even, which seems to be the case for most GPUs as well, but I'm not an expert on this subject by any means. Rounding doesn't just happen at the end, either - intermediate values in complex equations will be rounded as well - these rounding errors could multiply, or cancel, depending on the inputs. The other thing to consider is if the equation that produced the expected value used python's round(), and it was run with Python2, then round() would use round-away-from-zero; python3's round uses round-to-nearest, ties-to-even.

Regarding the generation of expected values, it's important to be very cautious about equations that stack multiple floating point operations, as this is where errors have the potential to get out of hand quickly. The most difficult issue here, in my mind, is fma. Unless you're testing 64-bit floating point on the GPU, it will be doing its calculations in 32-bit space, however, if the GPU compiler employs fma, this can cause intermediate operations to be done at higher precisions. This means that the GPU may be jumping between low-precision and high-precision operations over the course of a single equation, where python may well be doing all operations at a uniform level of precision. This can result in differences in expected results that are very hard to track down. Sadly, it's nearly impossible to predict how a compiler might employ fma for any give equation. You can use python's decimal, bigfloat or numpy (ugh) to clamp down the precision at which the operation is evaluated, but again, it will be difficult-to-impossible to predict if and where the GPU compiler employed fma.

Ultimately, you're attempting to get python to run an equation *in exactly the same way* that the shader code does, in order to produce an identical result. This is a difficult thing. Your ulps tolerance will likely, in some way, be accounting for these differences between the CPU and GPU generated results, in addition to its true purpose which is to identify real issues in the GPU compiler or hardware that introduce unintended floating point errors.

You will, ultimately, have to run all the test vectors through your chosen solution and carefully inspect any differences in the results. You may discover that certain test vectors are simply not appropriate.

In the end, good floating point tests use very carefully selected inputs. Wisdom or deception? Perhaps a bit of both.



On 06/15/2015 09:29 AM, Jan Vesely wrote:


On Sun, Jun 14, 2015 at 7:48 PM, Aaron Watry <[email protected]
<mailto:[email protected]>> wrote:



    On Sun, Jun 14, 2015 at 4:19 PM, Jan Vesely <[email protected]
    <mailto:[email protected]>> wrote:

        On Sat, 2015-06-13 at 21:22 -0500, Aaron Watry wrote:
        > Meh, this still feels broken.  Give me a bit longer.

        and it is :). I don't think this can work based on abs(expected
        - real),
        since ULP depends on the magnitude of the numbers. This
        information is
        lost after subtraction.


    Yeah, that's essentially the conclusion that I came to last night as
    well.  Currently, we're saying that 3 ULP is 3x the smallest
    representable floating point number, not 3x the interval between
    adjacent float values for the given magnitude.

    I was thinking that we might need to take the expected value, cast
    to unsigned, add the ulp value, then convert back to float.  From
    there, find the difference between expected and the expected+ulp,
    and use that as a tolerance.  We could alternatively call
    nextafterf/nextafter the required number of times from the expected
    value in either direction and get a min/max allowed value and then
    check that the result value is in that range.


I considered casting to ints too. However, it runs into problems in some
corner cases (like neg zero, crossing exponent boundaries, ...),
nextafter offered an easy way out. although it's still not perfect. When
the number sits on exponent boundary, going up or down gives different
steps. one example is 2^24:
16777214: 16777214.000000
16777215: 16777215.000000
16777216: 16777216.000000
16777217: 16777216.000000
16777218: 16777218.000000
16777219: 16777220.000000

we could take something like unit = nextafterf(expected) - expected;
and then compare against ulp * unit, that way we can even subtract 0.5
unit for python/piglit rounding.
Although nextafterf(expected) - expected hits representability issues
for very large numbers.

however the problem whether to take nextaftef(x, 0.0f) or nextafterf(x,
+/- inf) persits.
I'm not sure which one we want. if python always rounded to/away from zero,
we could always take the opposite, but it would cause problems for
non-integer ulp (which we don't support at the moment anyway).


Jan


    Yes, we still run into cases where the tests themselves are expected
    an incorrectly rounded value, but at least the ULP checking code
    might be functioning correctly then.

    Thoughts?

    --Aaron


        I had an idea some time back to implement this using nextafterf
        in both
        directions and checking whether the result falls in that interval.
        However, there is still a problem. Some of the expected values are
        already rounded (I'm not sure what rounding mode is used by
        python by
        default), but unless it always rounds in one direction, we'll
        still get
        slight differences based on whether the actual value was rounded
        up or
        down.

        I have attached a small test program that shows the deficiencies of
        fabsf based approaches.

        regards,
        Jan

         >
         > --Aaron
         >
         > On Sat, Jun 13, 2015 at 2:28 PM, Aaron Watry
        <[email protected] <mailto:[email protected]>> wrote:
         >
         > > We need to actually check against the float value from the
        union,
         > > instead of just doing (diff > ulp), which seems to cast the
        diff to
         > > an int before checking against ulp.
         > >
         > > Signed-off-by: Aaron Watry <[email protected]
        <mailto:[email protected]>>
         > > CC: Tom Stellard <[email protected]
        <mailto:[email protected]>>
         > > CC: Jan Vesely <[email protected]
        <mailto:[email protected]>>
         > > ---
         > >  tests/util/piglit-util-cl.c | 28 ++++++++++++++--------------
         > >  1 file changed, 14 insertions(+), 14 deletions(-)
         > >
         > > diff --git a/tests/util/piglit-util-cl.c
        b/tests/util/piglit-util-cl.c
         > > index 47e0c7a..6cdd718 100644
         > > --- a/tests/util/piglit-util-cl.c
         > > +++ b/tests/util/piglit-util-cl.c
         > > @@ -80,7 +80,7 @@ piglit_cl_probe_floating(float value,
        float expect,
         > > uint32_t ulp)
         > >
         > >         diff = fabsf(value - expect);
         > >
         > > -       if(diff > ulp || isnan(value)) {
         > > +       if (diff > t.f || isnan(value)) {
         > >                 printf("Expecting %f (0x%x) with tolerance
        %f (%u ulps),
         > > but got %f (0x%x)\n",
         > >                        e.f, e.u, t.f, t.u, v.f, v.u);
         > >                 return false;
         > > @@ -108,7 +108,7 @@ piglit_cl_probe_double(double value,
        double expect,
         > > uint64_t ulp)
         > >
         > >         diff = fabsl(value - expect);
         > >
         > > -       if(diff > ulp || isnan(value)) {
         > > +       if (diff > t.f || isnan(value)) {
         > >                 printf("Expecting %f (0x%lx) with tolerance
        %f (%lu ulps),
         > > but got %f (0x%lx)\n",
         > >                        e.f, e.u, t.f, t.u, v.f, v.u);
         > >                 return false;
         > > @@ -162,7 +162,7 @@
        piglit_cl_get_platform_version(cl_platform_id platform)
         > >         int scanf_count;
         > >         int major;
         > >         int minor;
         > > -
         > > +
         > >         /*
         > >          * Returned format:
         > >          *
         > >
        OpenCL<space><major_version.minor_version><space><platform-specific
         > > information>
         > > @@ -353,7 +353,7 @@ piglit_cl_get_info(void* fn_ptr, void*
        obj, cl_uint
         > > param)
         > >
         > >         if(errNo == CL_SUCCESS) {
         > >                 param_ptr = calloc(param_size, sizeof(char));
         > > -
         > > +
         > >                 /* retrieve param */
         > >                 if(fn_ptr == clGetPlatformInfo) {
         > >                         errNo =
        clGetPlatformInfo(*(cl_platform_id*)obj,
         > > param,
         > > @@ -463,7 +463,7 @@
        piglit_cl_get_program_build_info(cl_program program,
         > > cl_device_id device,
         > >                 .program = program,
         > >                 .device = device
         > >         };
         > > -
         > > +
         > >         return piglit_cl_get_info(clGetProgramBuildInfo,
        &args, param);
         > >  }
         > >
         > > @@ -479,7 +479,7 @@
        piglit_cl_get_kernel_work_group_info(cl_kernel kernel,
         > > cl_device_id device,
         > >                 .kernel = kernel,
         > >                 .device = device
         > >         };
         > > -
         > > +
         > >         return piglit_cl_get_info(clGetKernelWorkGroupInfo,
        &args, param);
         > >  }
         > >
         > > @@ -620,7 +620,7 @@ piglit_cl_get_device_ids(cl_platform_id
        platform_id,
         > > cl_device_type device_type,
         > >
          piglit_cl_get_error_name(errNo));
         > >                                 return 0;
         > >                         }
         > > -
         > > +
         > >                         /* get device list */
         > >                         if(device_ids != NULL &&
        num_device_ids > 0) {
         > >                                 *device_ids =
        malloc(num_device_ids *
         > > sizeof(cl_device_id));
         > > @@ -761,7 +761,7 @@
         > >
        piglit_cl_build_program_with_source_extended(piglit_cl_context
        context,
         > >                         piglit_cl_get_error_name(errNo));
         > >                 return NULL;
         > >         }
         > > -
         > > +
         > >         errNo = clBuildProgram(program,
         > >                                context->num_devices,
         > >                                context->device_ids,
         > > @@ -788,7 +788,7 @@
         > >
        piglit_cl_build_program_with_source_extended(piglit_cl_context
        context,
         > >                         char* log =
         > > piglit_cl_get_program_build_info(program,
         > >
         > >  context->device_ids[i],
         > >
         > >  CL_PROGRAM_BUILD_LOG);
         > > -
         > > +
         > >                         printf("Build log for device %s:\n
        -------- \n%s\n
         > > -------- \n",
         > >                                device_name,
         > >                                log);
         > > @@ -848,11 +848,11 @@
         > >
        piglit_cl_build_program_with_binary_extended(piglit_cl_context
        context,
         > >                 for(i = 0; i < context->num_devices; i++) {
         > >                         char* device_name =
         > > piglit_cl_get_device_info(context->device_ids[i],
         > >
         > > CL_DEVICE_NAME);
         > > -
         > > +
         > >                         printf("Error for %s: %s\n",
         > >                                device_name,
         > >
        piglit_cl_get_error_name(binary_status[i]));
         > > -
         > > +
         > >                         free(device_name);
         > >                 }
         > >
         > > @@ -860,7 +860,7 @@
         > >
        piglit_cl_build_program_with_binary_extended(piglit_cl_context
        context,
         > >                 return NULL;
         > >         }
         > >         free(binary_status);
         > > -
         > > +
         > >         errNo = clBuildProgram(program,
         > >                                context->num_devices,
         > >                                context->device_ids,
         > > @@ -884,11 +884,11 @@
         > >
        piglit_cl_build_program_with_binary_extended(piglit_cl_context
        context,
         > >                         char* log =
         > > piglit_cl_get_program_build_info(program,
         > >
         > >  context->device_ids[i],
         > >
         > >  CL_PROGRAM_BUILD_LOG);
         > > -
         > > +
         > >                         printf("Build log for device %s:\n
        -------- \n%s\n
         > > -------- \n",
         > >                                device_name,
         > >                                log);
         > > -
         > > +
         > >                         free(device_name);
         > >                         free(log);
         > >                 }
         > > --
         > > 2.1.4
         > >
         > >


        --
        Jan Vesely <[email protected] <mailto:[email protected]>>





_______________________________________________
Piglit mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/piglit


--

Micah Fedke
Collabora Ltd.
+44 1223 362967
https://www.collabora.com/
https://twitter.com/collaboraltd
_______________________________________________
Piglit mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/piglit

Reply via email to