[Impala-ASF-CR] IMPALA-7426: Use Mann-Whitney U to compare benchmarks

Philip Zeyliger (Code Review) Thu, 30 Aug 2018 09:42:00 -0700

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11194 )


Change subject: IMPALA-7426: Use Mann-Whitney U to compare benchmarks
......................................................................


Patch Set 2:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py
File tests/benchmark/report_benchmark_results.py:

http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@a568
PS2, Line 568:
is removing num_clients intentional?


http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@129
PS2, Line 129:                  dest="max_percent_change_threshold", 
default=float("inf"),
Is the default change intentional?


http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@394
PS2, Line 394:       if not options.hive_results:
             :         try:
             :           save_runtime_diffs(results, ref_results, 
self.perf_change, self.zval, self.tval)
             :         except Exception as e:
             :           LOG.error('Could not generate an html diff: 
{0}'.format(e))
This code used to never run because it was indented incorrectly? It looks ok to 
me, but I'm not clear how it ever ran before?


http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@563
PS2, Line 563:                                      'Base StdDev(%)', 'Iters', 
'Median Diff(%)', 'Zval',
Should this be "MW Z-Val" or something to indicate what kind of zvalue this is?


http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@1037
PS2, Line 1037:     prefix = ('reg' if zval >= 0 and tval >=
This is different than __check_perf_change_significance's implementation. Does 
that make sense?


http://gerrit.cloudera.org:8080/#/c/11194/2/tests/benchmark/report_benchmark_results.py@1037
PS2, Line 1037:     prefix = ('reg' if zval >= 0 and tval >=
              :               0 else 'imp' if zval <= 0 and tval <= 0 else 
'???')
I think this would be more legible as multiple lines. (I have a deep hatred of 
python's non-ternary, but when it's nested, I can't make heads or tails of it.)


http://gerrit.cloudera.org:8080/#/c/11194/2/tests/util/calculation_util.py
File tests/util/calculation_util.py:

http://gerrit.cloudera.org:8080/#/c/11194/2/tests/util/calculation_util.py@74
PS2, Line 74:   Calculates the Mann-Whitney U Test Z value for the given 
samples and reference.
Have you compared this to 
https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html?
 Do we have a significant number of sample (at least 20 per that  page) to use 
this?

After installing python-scipy, I ran the following. Of note, the U calculation 
agrees (good). The scipy thing gives a p-value and we are returning a z-value 
and then taking its absolute value and looking at a threshold. Are we doing the 
right thing?

You might want to update the pydoc here to explain what to do with one of these 
values.



$cat test.py
import math

def calculate_mwu(samples, ref_samples):
  """
  Calculates the Mann-Whitney U Test Z value for the given samples and 
reference.
  """
  tag_a = [(s, 'A') for s in samples]
  tab_b = [(s, 'B') for s in ref_samples]
  ab = tag_a + tab_b
  ab.sort()
  # Assume no ties
  u = 0
  count_b = 0
  for v in ab:
    if v[1] == 'A':
      u += count_b
    else:
      count_b += 1
  # u is normally distributed with the following mean and standard deviation:
  mean = len(samples) * len(ref_samples) / 2.0
  stddev = math.sqrt(len(samples) * len(ref_samples) * (1 + len(ab)) / 12.0)
  print u, mean, stddev
  return (u - mean) / stddev

from scipy.stats import mannwhitneyu

a = range(30)
b = [ z + 0.5 for z in a ]

print mannwhitneyu(a, b)
print mannwhitneyu(a, b, use_continuity=False)
print mannwhitneyu(a, b, alternative="two-sided")
print mannwhitneyu(a, b, alternative="less")
print mannwhitneyu(a, b, alternative="less",use_continuity=False)
print mannwhitneyu(a, b, alternative="two-sided",use_continuity=False)

print calculate_mwu(a, b)
print 1 - abs(calculate_mwu(a, b))
09:28:24 pannier cmf [master !x?*] ~/src/cmf
$python test.py
MannwhitneyuResult(statistic=435.0, pvalue=0.8302552839111963)
MannwhitneyuResult(statistic=435.0, pvalue=0.82449575165477107)
MannwhitneyuResult(statistic=435.0, pvalue=0.8302552839111963)
MannwhitneyuResult(statistic=435.0, pvalue=0.41512764195559815)
MannwhitneyuResult(statistic=435.0, pvalue=0.41224787582738553)
MannwhitneyuResult(statistic=435.0, pvalue=0.82449575165477107)
435 450.0 67.6387462923
-0.221766381286
435 450.0 67.6387462923
0.778233618714



--
To view, visit http://gerrit.cloudera.org:8080/11194
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8d6631ebeba1422b832def5cd68537624f672fa0
Gerrit-Change-Number: 11194
Gerrit-PatchSet: 2
Gerrit-Owner: Jim Apple <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Jim Apple <[email protected]>
Gerrit-Reviewer: Philip Zeyliger <[email protected]>
Gerrit-Comment-Date: Thu, 30 Aug 2018 16:40:58 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7426: Use Mann-Whitney U to compare benchmarks

Reply via email to