#34023: Reduce the number of 50 KiB downloads
-------------------------------+------------------------------
 Reporter:  karsten            |          Owner:  metrics-team
     Type:  enhancement        |         Status:  needs_review
 Priority:  Medium             |      Milestone:
Component:  Metrics/Onionperf  |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------
Comment (by karsten):

Replying to [comment:2 robgjansen]:

> **If we remove 50KiB, also remove 1MiB?**
>
> My main concern isn't the added load on the network, but rather that you are removing a metric that we have consistently used as a benchmark for the last ~decade. It's useful to be able to compare against a consistent benchmark over time.
>
> I notice from [https://trac.torproject.org/projects/tor/ticket/33076 #33076] the suggestion that we could use the 1MiB and/or 5MiB results to compute the 50KiB times (using the incremental DATAPERC timestamps). That seems reasonable to me. Following that logic though, why not remove the 1MiB file too? I think both the 50KiB and the 1MiB times could be computed from the 5MiB results, since we have incremental timestamps for every 10% of the download.

That's a good point. Here's the math for that suggestion:

 - With only 5 MiB downloads we'd be downloading on average 5 MiB = 5120 KiB every 5 minutes, or 5120 * 8 * 1024 / (300 * 1000) = 140 kbps.

And yes, we do have some code somewhere to compute partial completion timestamps from 5 MiB downloads.

> **Reasons to keep all of the files.**
>
> I think there are two strong reasons to keep all 3 file sizes:
> 1. You can specify a different timeout for each of the 3 sizes. That lets you cancel the smaller files much sooner if they are hanging. And if the timeouts are set realistically, it helps you get a better sense of how often we fail to meet a target completion time.

In theory, we could retroactively apply timeouts by pretending that a partial 50 KiB or 1 MiB download taking longer than `x` would have timed out.

> 1. Diversity of circuits. If you follow the suggestion above and remove 50KiB and 1MiB and only keep 5MiB, and then you get a crappy circuit, data points for all 3 download times will be affected. Previously that only affected one data point.

We wouldn't change the frequency of making downloads.
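As a quick sanity check of the bandwidth estimate earlier in this comment (one 5 MiB download every 5 minutes), the arithmetic can be reproduced in a few lines of Python. This is just a sketch of the calculation, not OnionPerf code:

```python
# Sanity check of the 5 MiB bandwidth estimate (not OnionPerf code):
# one 5 MiB download every 5 minutes, expressed in decimal kilobits/s.
download_kib = 5 * 1024                  # 5 MiB = 5120 KiB
bits = download_kib * 1024 * 8           # KiB -> bytes -> bits
interval_seconds = 5 * 60                # one download every 5 minutes
kbps = bits / (interval_seconds * 1000)  # bits/s -> kbps
print(round(kbps))                       # prints 140
```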
We would just extract more than one time-to-last-byte timestamp from a given measurement. But I see how we would want to document this very clearly to help our users interpret our data.

> **Adjust download weights instead?**
>
> If you would like more data points for 1MiB and 5MiB files and fewer for 50KiB, have you considered adjusting the weights that are used in the TGen model file instead of completely removing a file size? [https://gitweb.torproject.org/onionperf.git/tree/onionperf/model.py#n90 The weights are specified here.] For example, if you want all file sizes to have equal download probabilities, set the weight for each file size to `1.0`.

We did consider this to avoid increasing load on the measurement hosts and network too much, but then figured we could kill these downloads altogether. But you raise some important points above that require more attention before killing 50 KiB downloads entirely.

New plan: use a weight of 1.0 for all three download sizes until we figure out how to kill 50 KiB and 1 MiB downloads.

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/34023#comment:3>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs