haojin2 opened a new pull request #10255: [MXNET-142] Enhance test for LeakyReLU operator
URL: https://github.com/apache/incubator-mxnet/pull/10255

## Description ##
Enhancement of test_leaky_relu and test_prelu.

## Checklist ##
### Essentials ###
- [x] The PR title starts with [MXNET-142]
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage
- [x] Code is well-documented
- [x] To my best knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

### Changes ###
- [x] Improve test_leaky_relu to provide coverage for all float types
- [x] Improve test_prelu to provide coverage for all float types

## Comments ##
- This PR addresses a failure in the previous version of test_leaky_relu caused by the limited precision of the finite difference method on 16-bit floating-point data.
- Through some experiments I found that the finite difference method may not be suitable for checking numeric gradients with 16-bit floating-point inputs.
Here's an example (all numbers are represented as 16-bit floating-point values), with `act_type: leaky_relu` and `slope: 0.25`:

```
x:
[[-0.9, -0.8, -0.7],
 [-0.6, -0.5, -0.4],
 [-0.3, -0.2, -0.1],
 [ 0.1,  0.2,  0.3],
 [ 0.4,  0.5,  0.6],
 [ 0.7,  0.8,  0.9]]

Analytical derivative:
[[0.25, 0.25, 0.25],
 [0.25, 0.25, 0.25],
 [0.25, 0.25, 0.25],
 [1.0 , 1.0 , 1.0 ],
 [1.0 , 1.0 , 1.0 ],
 [1.0 , 1.0 , 1.0 ]]

Numeric derivative from the finite difference method with epsilon=1e-4:
[[ 0.61035156  0.61035156  0.61035156]
 [ 0.61035156  0.30517578  0.30517578]
 [ 0.30517578  0.15258789  0.22888184]
 [ 0.91552734  0.61035156  1.22070312]
 [ 1.22070312  1.22070312  2.44140625]
 [ 2.44140625  2.44140625  2.44140625]]
```

If we now divide all values in x by 256, i.e. shrink their absolute values, and apply the same numeric method with the same epsilon=1e-4, we get a new set of derivatives:

```
[[ 0.25268555  0.25268555  0.25268555]
 [ 0.25268555  0.25024414  0.25024414]
 [ 0.25024414  0.24914551  0.24975586]
 [ 0.99902344  0.99658203  1.00097656]
 [ 1.00097656  1.00097656  1.01074219]
 [ 1.01074219  1.01074219  1.01074219]]
```

The gap between the numeric and analytical derivatives grows as the absolute value of the input x grows. As a result, we need to draw the random inputs from a smaller range if we want to verify gradients numerically on 16-bit floating-point numbers.
- The seeds for both tests are fixed because check_numeric_gradient can still be slightly flaky: most randomized runs on my local machine passed, and the failing runs exceeded the tolerance by only a small margin. To reduce this occasional flakiness I chose to fix the seeds; alternatively, we could drop check_numeric_gradient and check only the analytical gradients.
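The effect above can be reproduced outside MXNet with a small NumPy sketch. This is hypothetical illustration code using a plain central difference, not MXNet's check_numeric_gradient utility, and the helper names (`leaky_relu`, `numeric_grad`) are mine; the slope, epsilon, and the divide-by-256 rescaling follow the example above:

```python
import numpy as np

SLOPE = 0.25

def leaky_relu(x):
    # Elementwise LeakyReLU: x for x > 0, SLOPE * x otherwise.
    return np.where(x > 0, x, SLOPE * x)

def numeric_grad(f, x, eps=1e-4):
    # Central finite difference evaluated in x's own dtype, so the
    # float16 rounding of x + eps and x - eps shows up in the result.
    eps = x.dtype.type(eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = np.array([-0.9, -0.5, 0.5, 0.9], dtype=np.float16)
analytical = np.where(x > 0, 1.0, SLOPE)

# Worst-case gap between numeric and analytical gradients, before and
# after shrinking the inputs by a factor of 256.
err_large = np.max(np.abs(numeric_grad(leaky_relu, x) - analytical))
err_small = np.max(np.abs(numeric_grad(leaky_relu, x / 256) - analytical))
```

With inputs near 0.9, the float16 spacing (about 4.9e-4 in [0.5, 1)) exceeds epsilon, so x + eps can round back to x and the finite difference collapses; after dividing by 256 the local spacing is far smaller than epsilon and the estimate becomes accurate to about a percent.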