Jim Apple has posted comments on this change. ( http://gerrit.cloudera.org:8080/11194 )
Change subject: IMPALA-7426: Use Mann-Whitney U to compare benchmarks ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py File tests/benchmark/report_benchmark_results.py: http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@a568 PS2, Line 568: > is removing num_clients intentional? Yes. This information was the same in each row and was not used in calculating statistical significance. http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@129 PS2, Line 129: dest="max_percent_change_threshold", default=float("inf"), > Is the default change intentional? Yes. High variance tests can have large percent changes without being statistically significant. http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@394 PS2, Line 394: if not options.hive_results: : try: : save_runtime_diffs(results, ref_results, self.perf_change, self.zval, self.tval) : except Exception as e: : LOG.error('Could not generate an html diff: {0}'.format(e)) > This code used to never run because it was indented incorrectly? It looks o Correct - it was dead code, it appears http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@563 PS2, Line 563: 'Base StdDev(%)', 'Iters', 'Median Diff(%)', 'Zval', > Should this be "MW Z-Val" or something to indicate what kind of zvalue this Done http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@1037 PS2, Line 1037: prefix = ('reg' if zval >= 0 and tval >= > This is different than __check_perf_change_significance's implementation. D Well, only kind of - if the change was significant, then abs(zval) > zval_thresh or abs(tval) > tval_thresh. Only then do we print reg or imp or ???. Maybe I'm not understanding your question. http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@1037 PS2, Line 1037: prefix = ('reg' if zval >= 0 and tval >= : 0 else 'imp' if zval <= 0 and tval <= 0 else '???') > I think this would be more legible as multiple lines. (I have a deep hatred Done http://gerrit.cloudera.org:8080/#/c/11194/2/tests/util/calculation_util.py File tests/util/calculation_util.py: http://gerrit.cloudera.org:8080/#/c/11194/2/tests/util/calculation_util.py@74 PS2, Line 74: Calculates the Mann-Whitney U Test Z value for the given samples and reference. > Have you compared this to https://docs.scipy.org/doc/scipy/reference/genera The normal approximation will be less valid with smaller sample sizes, but I expect that to show up infrequently. If you look at the code for scipy.stats.mannwhitneyu, you see it computes z and then calls distributions.norm.sf https://github.com/scipy/scipy/blob/v1.1.0/scipy/stats/stats.py#L4836-L4925 which calls norm_cdf https://github.com/scipy/scipy/blob/v1.1.0/scipy/stats/_continuous_distns.py#L101 which calls ndtr https://github.com/scipy/scipy/blob/v1.1.0/scipy/stats/_continuous_distns.py#L90 which is just turning a z-value into a p-value: https://github.com/scipy/scipy/blob/v1.1.0/scipy/special/cephes/ndtr.c We could do the same here or with the t-value, but I think it's fine to leave them uninterpreted. -- To view, visit http://gerrit.cloudera.org:8080/11194 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d6631ebeba1422b832def5cd68537624f672fa0 Gerrit-Change-Number: 11194 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple <jbapple-imp...@apache.org> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org> Gerrit-Reviewer: Philip Zeyliger <phi...@cloudera.com> Gerrit-Comment-Date: Sat, 01 Sep 2018 02:16:19 +0000 Gerrit-HasComments: Yes