Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11194 )
Change subject: IMPALA-7426: Use Mann-Whitney U to compare benchmarks ...................................................................... Patch Set 2: (7 comments) http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py File tests/benchmark/report_benchmark_results.py: http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@a568 PS2, Line 568: is removing num_clients intentional? http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@129 PS2, Line 129: dest="max_percent_change_threshold", default=float("inf"), Is the default change intentional? http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@394 PS2, Line 394: if not options.hive_results: : try: : save_runtime_diffs(results, ref_results, self.perf_change, self.zval, self.tval) : except Exception as e: : LOG.error('Could not generate an html diff: {0}'.format(e)) This code used to never run because it was indented incorrectly? It looks ok to me, but I'm not clear how it ever ran before? http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@563 PS2, Line 563: 'Base StdDev(%)', 'Iters', 'Median Diff(%)', 'Zval', Should this be "MW Z-Val" or something to indicate what kind of zvalue this is? http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@1037 PS2, Line 1037: prefix = ('reg' if zval >= 0 and tval >= This is different than __check_perf_change_significance's implementation. Does that make sense? http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@1037 PS2, Line 1037: prefix = ('reg' if zval >= 0 and tval >= : 0 else 'imp' if zval <= 0 and tval <= 0 else '???') I think this would be more legible as multiple lines. (I have a deep hatred of python's non-ternary, but when it's nested, I can't make heads or tails of it.) http://gerrit.cloudera.org:8080/#/c/11194/2/tests/util/calculation_util.py File tests/util/calculation_util.py: http://gerrit.cloudera.org:8080/#/c/11194/2/tests/util/calculation_util.py@74 PS2, Line 74: Calculates the Mann-Whitney U Test Z value for the given samples and reference. Have you compared this to https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html? Do we have a significant number of sample (at least 20 per that page) to use this? After installing python-scipy, I ran the following. Of note, the U calculation agrees (good). The scipy thing gives a p-value and we are returning a z-value and then taking its absolute value and looking at a threshold. Are we doing the right thing? You might want to update the pydoc here to explain what to do with one of these values. $cat test.py import math def calculate_mwu(samples, ref_samples): """ Calculates the Mann-Whitney U Test Z value for the given samples and reference. """ tag_a = [(s, 'A') for s in samples] tab_b = [(s, 'B') for s in ref_samples] ab = tag_a + tab_b ab.sort() # Assume no ties u = 0 count_b = 0 for v in ab: if v[1] == 'A': u += count_b else: count_b += 1 # u is normally distributed with the following mean and standard deviation: mean = len(samples) * len(ref_samples) / 2.0 stddev = math.sqrt(len(samples) * len(ref_samples) * (1 + len(ab)) / 12.0) print u, mean, stddev return (u - mean) / stddev from scipy.stats import mannwhitneyu a = range(30) b = [ z + 0.5 for z in a ] print mannwhitneyu(a, b) print mannwhitneyu(a, b, use_continuity=False) print mannwhitneyu(a, b, alternative="two-sided") print mannwhitneyu(a, b, alternative="less") print mannwhitneyu(a, b, alternative="less",use_continuity=False) print mannwhitneyu(a, b, alternative="two-sided",use_continuity=False) print calculate_mwu(a, b) print 1 - abs(calculate_mwu(a, b)) 09:28:24 pannier cmf [master !x?*] ~/src/cmf $python test.py MannwhitneyuResult(statistic=435.0, pvalue=0.8302552839111963) MannwhitneyuResult(statistic=435.0, pvalue=0.82449575165477107) MannwhitneyuResult(statistic=435.0, pvalue=0.8302552839111963) MannwhitneyuResult(statistic=435.0, pvalue=0.41512764195559815) MannwhitneyuResult(statistic=435.0, pvalue=0.41224787582738553) MannwhitneyuResult(statistic=435.0, pvalue=0.82449575165477107) 435 450.0 67.6387462923 -0.221766381286 435 450.0 67.6387462923 0.778233618714 -- To view, visit http://gerrit.cloudera.org:8080/11194 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8d6631ebeba1422b832def5cd68537624f672fa0 Gerrit-Change-Number: 11194 Gerrit-PatchSet: 2 Gerrit-Owner: Jim Apple <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Jim Apple <[email protected]> Gerrit-Reviewer: Philip Zeyliger <[email protected]> Gerrit-Comment-Date: Thu, 30 Aug 2018 16:40:58 +0000 Gerrit-HasComments: Yes
