#23367: Onion address counts ignore descriptor upload overlap
--------------------------------+------------------------------
 Reporter:  teor                |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:  #23126              |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------
Changes (by karsten):

 * status:  new => needs_review
 * keywords:  metrics-2018 =>

Comment:

 Finally, I got it. (I didn't spend the whole 2 years thinking about this, but when I started looking at this ticket again this morning it took me a while to understand the bug...)

 The situation is slightly different from your description, because statistics are not collected starting at 00:00 UTC but from whenever a relay starts collecting them. Your general statement that we're accounting for descriptor upload overlap incorrectly is correct, though.

 My current thought is to document this inaccuracy rather than change the code. It's a known inaccuracy of roughly 1/24 = 4.2% of absolute numbers, but it doesn't affect relative changes over time. I don't think that changing the code and reprocessing the statistics is worth the effort, also considering that we would have to explain why the numbers have changed.

 Here's how we could document this on the [https://metrics.torproject.org/reproducible-metrics.html#onion-services Reproducible Metrics] page:

 ''As an approximation, we assume that an onion service publishes its descriptor to twelve directories over a 24-hour period: the service stores two replicas per descriptor using different descriptor identifiers, each descriptor replica gets stored to three different onion-service directories, and the service changes descriptor identifiers once every 24 hours, which leads to two different descriptor identifiers per replica.''

 ''To be clear, this approximation is not entirely accurate. For example, '''the descriptors of roughly 1/24 of services are seen by 3 rather than 2 sets of onion-service directories, when a service changes descriptor identifiers once at the beginning of a relay's statistics interval and once again towards the end. In some cases,''' the two replicas or the descriptors with changed descriptor identifiers could have been stored to the same directory. As another example, onion-service directories might have joined or left the network, and other directories might have become responsible for storing a descriptor and would then also include that .onion address in their statistics. However, for the subsequent analysis, we assume that neither of these cases affects results substantially.''

 What do you think about this change?

 I also agree that we should keep this in mind when we work on v3 stats. We should keep this ticket open, turn it into an enhancement, and update the summary a bit to make it clear that the remaining work is just for v3.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/23367#comment:7>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
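
For readers who want to check the figures in the comment above, here is a minimal sketch of the arithmetic only. It is not the code used by the metrics team; the constant and function names are hypothetical, and the values are taken directly from the proposed documentation text (two replicas, three directories per replica, two descriptor identifiers per 24 hours, and roughly 1/24 of services affected by the overlap).

```python
from fractions import Fraction

# Values quoted in the proposed documentation text (names are hypothetical).
REPLICAS_PER_DESCRIPTOR = 2      # two replicas per descriptor
DIRECTORIES_PER_REPLICA = 3      # each replica stored on three directories
IDENTIFIERS_PER_24_HOURS = 2     # identifiers change once per 24 hours

# Share of services said to be seen by 3 rather than 2 sets of directories.
AFFECTED_FRACTION = Fraction(1, 24)


def directories_per_period(replicas=REPLICAS_PER_DESCRIPTOR,
                           directories=DIRECTORIES_PER_REPLICA,
                           identifiers=IDENTIFIERS_PER_24_HOURS):
    """Number of directories a service is assumed to publish to in 24 hours."""
    return replicas * directories * identifiers


def as_percent(fraction, digits=1):
    """Convert a fraction to a percentage rounded to the given digits."""
    return round(float(fraction) * 100, digits)


if __name__ == "__main__":
    print(directories_per_period())        # 12
    print(as_percent(AFFECTED_FRACTION))   # 4.2
```

The sketch only reproduces the "twelve directories" assumption and the "1/24 = 4.2%" figure; it makes no claim about how the overlap propagates through the actual extrapolation.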
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs