#26035: Streamline sample quantile types used in the various modules
--------------------------------+---------------------------
 Reporter:  karsten             |          Owner:  iwakeh
     Type:  enhancement         |         Status:  accepted
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:  Sponsor13
--------------------------------+---------------------------

Comment (by karsten):

 Thanks, very useful!

 Let me first try to answer the open questions:

  - What's up with a) and c) using slightly different percentile
    implementations? The reason is that we're including the 0th (minimum)
    and 100th (maximum) percentiles in a), which we don't in c). It's
    entirely possible that what we're using right now for a) is a terrible
    hack. Maybe we should instead use the formula for c) in a) and handle
    percentiles 0 and 100 as special cases. Whatever the other
    implementations do.
  - What's up with e) and f) not being quartiles? What we're computing
    there are the ''weighted'' quartiles. And again, it might be a hack
    that we should rewrite. The goal should be to implement a weighted
    trimmed mean. The technical report probably has a better definition.
    What we cannot do, though, is use the exact same percentile definition
    that we're using in the other places.
  - I think you left out the Python code that is our current censorship
    detector. Which is fine, as I see how we could change that code to
    match what we're doing elsewhere.

 So, I guess the decision we need to make is whether we want to use R-1 or
 R-7 everywhere, right?

 I'm slightly leaning towards R-7 here. One reason is that, if we used R-1,
 we couldn't use R's default `median()` anymore, because that interpolates.
 I found a non-interpolating median implementation in Python, called
 [https://docs.python.org/3/library/statistics.html#statistics.median_low
 median_low] (or median_high). And I think the Tor daemon uses a low median
 for some things related to directory authority voting. But I believe it's
 not the standard.

 So, if we use R-7, we should have good tool support. Except for Java,
 where we'd have to implement something ourselves, which would also have to
 handle the special cases 0 and 100.

 By the way, do you feel strongly about avoiding Apache Commons Math? We'd
 only have to add it to metrics-web, and it would save us half a day of
 writing code and testing it.
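
 To make the R-1 vs. R-7 difference concrete, here is a minimal Python
 sketch of both definitions as I understand them from R's `quantile()`
 types 1 and 7. The function names `quantile_r1` and `quantile_r7` are made
 up for this example and are not taken from any of our modules; `p` is a
 fraction between 0 and 1, and percentiles 0 and 100 are handled as the
 special cases mentioned above.

```python
import math


def quantile_r1(values, p):
    """Type R-1 (assumed definition): inverse empirical CDF, no interpolation."""
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    # 1-based rank is ceil(n * p); p == 0 is the special case that maps
    # to the minimum instead of the non-existent rank 0.
    rank = max(math.ceil(len(xs) * p), 1)
    return xs[rank - 1]


def quantile_r7(values, p):
    """Type R-7 (assumed definition): linear interpolation between neighbors."""
    if not values:
        raise ValueError("need at least one value")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    xs = sorted(values)
    h = (len(xs) - 1) * p          # 0-based fractional position
    lo = math.floor(h)
    hi = min(lo + 1, len(xs) - 1)  # p == 1 stays on the maximum
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])


sample = [1, 2, 3, 4]
print(quantile_r1(sample, 0.5))  # 2, same as statistics.median_low
print(quantile_r7(sample, 0.5))  # 2.5, the interpolated median
```

 For an even number of values the two medians differ, which is exactly the
 `median()` vs. `median_low` point above. A Java port of something like
 this is roughly what we would otherwise have to write and test by hand.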
 After all, we also rely on libraries for things like base64 encoding,
 which is not rocket science to implement ourselves. We wouldn't have to
 add it to the metrics-web .war file!

 P.S.: Did I write something about trucks? I meant insect legs! Unless
 those have a spare leg mounted somewhere, too, in which case I'll think
 even harder about a good example. ;)

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26035#comment:5>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs