#26022: Fix a flaw in the noise-removing code in our onion service statistics
--------------------------------+------------------------------
 Reporter:  karsten             |          Owner:  metrics-team
     Type:  defect              |         Status:  needs_review
 Priority:  Medium              |      Milestone:
Component:  Metrics/Statistics  |        Version:
 Severity:  Normal              |     Resolution:
 Keywords:                      |  Actual Points:
Parent ID:                      |         Points:
 Reviewer:                      |        Sponsor:
--------------------------------+------------------------------

Comment (by karsten):

 Replying to [comment:7 amj703]:
 > Hey Karsten,

 Hey Aaron,

 > (My trac pseudonym is amj703 instead of ohmygodel. I've changed the cc).

 Oops, my bad!

 Thanks for your very quick response, after learning about this issue! :)

 > I agree you did find a bug in how the noisy numbers are adjusted. The
 change from integer division to floorDiv seems right to me.

 Great, I will fix that then.

 > One thing you might also consider doing to improve handling negative
 values is to disallow them (i.e. round up to zero). This could be done by
 the relay reporting its number. We probably discussed this option, and
 maybe the reason it wasn't chosen is that it slightly biases counts so
 their expected value isn't the true value (or at least the true rounded
 value).

 Hmm, I believe that disallowing negative values by rounding them up to
 zero would bias the result unnecessarily. For example, imagine an extreme
 case where we picked a large `bin_size` value and an even larger `delta_f`
 value: most relays would observe values between 0 and `bin_size`, and they
 would add very large positive or negative noise values. In such a case we
 need both negative and positive values to come up with a result close to
 0.

 > If that's the case, and negative values are allowed to prevent biasing,
 we should recognize that (1) values are already being biased because of
 the rounding (for which we minimize the worst-case bin by adding
 (-binSize/2) at the end), and (2) adding (-binSize/2) actually makes the
 bias worse for negative bin values.

 Hmm, I'm not so sure about this one, either. Remember that we implemented
 the code at relays to report the ''right side'' of a bin, not the
 ''center''. Take another example where a relay observes exactly 0 events
 over two days: on day 1 it adds a small positive noise value and reports
 `bin_size` as number, and on day 2 it adds a small negative noise value
 and reports 0 as number. If we subtract `bin_size / 2` from those reported
 values, we'll end up with 0 on average, which would be correct. But if we
 didn't, we'd end up with `bin_size / 2` as average, which seems wrong.

 Does this make sense? If so, I think I'd prefer to leave the general
 formula to remove noise unchanged and only focus on fixing the
 implementation bug related to integer truncation.

 Thanks again!

--
Ticket URL: <https://trac.torproject.org/projects/tor/ticket/26022#comment:9>
Tor Bug Tracker & Wiki <https://trac.torproject.org/>
The Tor Project: anonymity online
_______________________________________________
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Reply via email to