#33076: Graph onionperf and consensus information from Rob's experiments
-------------------------------------------------+-------------------------
 Reporter:  mikeperry                            |          Owner:  metrics-team
     Type:  task                                 |         Status:  needs_review
 Priority:  Medium                               |      Milestone:
Component:  Metrics/Analysis                     |        Version:
 Severity:  Normal                               |     Resolution:
 Keywords:  metrics-team-roadmap-2020Q1,         |  Actual Points:  3
            sbws-roadmap                         |
Parent ID:  #33121                               |         Points:  6
 Reviewer:                                       |        Sponsor:
-------------------------------------------------+-------------------------
Comment (by dennis.jackson):

Replying to [comment:24 karsten]:

== 24 Hour Moving Average

> I like your percentiles graph with the moving 24 hour window. We should
> include that graph type in our candidate list for graphs to be added to
> OnionPerf's visualization mode. Is that moving 24 hour window a standard
> visualization, or did you further process the data I gave you?

At a high level: I'm loading the data into Pandas and then using the
`rolling` function to compute statistics over a window. It's pretty
flexible and supports different weighting strategies for the window, but I
used 'uniform' here. The code is contained in the Python notebook I linked
at the end of my post. Excerpt:

{{{
# 24-hour window; require at least 10 measurements before emitting a value
time_period = 60 * 60 * 24
threshold = 10
p95 = lambda x: x.rolling(f'{time_period}s', min_periods=threshold).dl.quantile(0.95)
}}}

The resulting data can be plotted as a time series in your graphing library
of choice :).
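For context, a minimal self-contained sketch of the same approach; the
DataFrame layout (a DatetimeIndex plus a `dl` download-time column) and the
example values are assumptions based on the excerpt above, not the actual
notebook:

{{{
import pandas as pd

# Hypothetical data: measurements indexed by time, with download times in
# seconds in a 'dl' column (assumed layout, mirroring the excerpt above).
df = pd.DataFrame(
    {'dl': [1.2, 0.9, 3.4, 1.1, 2.0, 1.5]},
    index=pd.to_datetime(['2019-08-04 00:00', '2019-08-04 02:00',
                          '2019-08-04 05:00', '2019-08-04 09:00',
                          '2019-08-04 15:00', '2019-08-04 23:00']),
)

window = f'{60 * 60 * 24}s'  # 24-hour moving window
rolling = df.rolling(window, min_periods=3)['dl']

# One column per percentile; plot with stats.plot() or similar.
stats = pd.DataFrame({'p5': rolling.quantile(0.05),
                      'p50': rolling.quantile(0.50),
                      'p95': rolling.quantile(0.95)})
print(stats)
}}}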
== Measuring Latency

> Regarding the dataset behind bandwidth measurements, I wonder if we should
> kill the 50 KiB downloads in deployed OnionPerfs and only keep the 1 MiB
> and 5 MiB downloads. If we later think that we need time-to-50KiB, we can
> always obtain that from the tgen logs. The main change would be that
> OnionPerfs consume more bandwidth and also put more load on the Tor
> network. The effect for graphs like these would be that we'd have 5 times
> as many measurements.

I think that is definitely worth thinking about, as 50 KiB does seem too
small to infer anything about bandwidth. It may be worth considering the
cost of circuit construction, though. For example, if we open a circuit for
a latency measurement, we could use Arthur's strategy of fetching only the
HEAD, and it might be worth using that circuit for a series of measurements
over a couple of minutes. That would give us more reliable "point in time"
data without any additional circuit construction overhead.

== August Measurement Success Rate

> But I think (and hope) that you're wrong about measurements not having
> finished. If DATAPERC100 is non-null that actually means that the
> measurement reached the point where it received 100% of expected bytes.
> See also the [https://metrics.torproject.org/collector.html#type-torperf
> Torperf and OnionPerf Measurement Results data format description].

You are quite right! I looked back at my code and, whilst I was correctly
checking that DATAPERC100 is non-null to imply success, I also found a
trailing `}` which captured my check in the wrong `if` clause. My bad!
Rerunning with the fix shows only 29 measurements failed to finish in
August. Much, much healthier!
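For reference, a sketch of the kind of check involved; the key=value line
parsing follows the Torperf results format linked above, but the file
handling and the `@type` header skip are assumptions about the on-disk
format rather than the actual notebook code:

{{{
# Count finished vs. unfinished measurements in a Torperf/OnionPerf results
# file: one measurement per line, space-separated KEY=VALUE pairs.
def count_finished(path):
    finished = failed = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('@type'):
                continue  # skip blank lines and any type annotation header
            fields = dict(kv.split('=', 1) for kv in line.split() if '=' in kv)
            # A non-null DATAPERC100 means 100% of expected bytes arrived.
            if fields.get('DATAPERC100'):
                finished += 1
            else:
                failed += 1
    return finished, failed
}}}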
== Number of Measurements in August

> Are you sure about that 10k ttfb measurements number for the month of
> August? In theory, every OnionPerf instance should make a new measurement
> every 5 minutes. That's 12*24*31 = 8928 measurements per instance in
> August, or 8928*4 = 35712 measurements performed by all four instances in
> August. So, okay, not quite 10k, but also not that many more. We should
> spin up more OnionPerf instances as soon as it has become easier to
> operate them.

Sorry, this was sloppy and incorrect wording on my part: "month of August"
should have read "experimental period from August 4th to August 19th".
There are 15k attempted measurements in this window; however, op-hk did not
achieve any successful connections, so there are only ~10k successful
measurements in the dataset.

== How many is enough?

> What's a good number to keep running continuously, in your opinion? 10?
> 20? And maybe we should consider deploying more than 1 instance per host
> or data center, so that we have more measurements with comparable network
> properties.

I think it would be worth pulling in Mike (congestion related) and the
network health team (#33178) and thinking about this in terms of the output
statistics we want rather than the measurements we put in. Possible
example:

 * For a given X `{minute, hour, day}` period, we want to measure, for
   `{any circuit, circuits using this guard, circuits using this exit}`,
   `{probability of time out, p5-p50-p95 latency, p5-p50-p95 bandwidth}`
   with a 90% confidence interval less than `{1%, 500ms, 500 KB/s}`.

This gives us a rolling target in terms of the measurements we want to
make, varying with network conditions and with how fine-grained we would
like the statistics to be for a given time period. We could estimate the
number of samples required (using the existing datasets) for each of these
statistics, put in the cost per measurement, and work out what is feasible
for long-term monitoring and short-term experiments; a rough sketch of such
an estimate is below.
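As a rough illustration only: the log-normal latencies here are synthetic
stand-ins for a slice of the real OnionPerf dataset, and the bootstrap
percentile interval is just one reasonable way to estimate how CI width
shrinks with sample count:

{{{
import numpy as np

rng = np.random.default_rng(0)

def ci90_width(samples, n_boot=2000):
    """Width of a 90% bootstrap confidence interval for the p95 statistic."""
    boots = [np.percentile(rng.choice(samples, size=len(samples)), 95)
             for _ in range(n_boot)]
    lo, hi = np.percentile(boots, [5, 95])
    return hi - lo

# Synthetic stand-in for per-period latency measurements, in seconds;
# in practice, use the measurements for the period/guard/exit of interest.
latencies = rng.lognormal(mean=0.0, sigma=0.5, size=5000)

# How many measurements per period before the CI drops below, say, 0.5s?
for n in (50, 100, 200, 400, 800):
    subsample = rng.choice(latencies, size=n, replace=False)
    print(f'n={n:4d}  90% CI width for p95 latency: {ci90_width(subsample):.3f}s')
}}}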