On 20/11/14 13:42, George Kadianakis wrote:
> "A. Johnson" <[email protected]> writes:
>
>>>>> George and I have been working on a small proposal to add two
>>>>> hidden-service related statistics: number of hidden services and
>>>>> total hidden-service traffic.
>>>>
>>>> Great, I’m starting to focus more on this project now. Well,
>>>> actually I’m going on a trip for a week today, but *then* I’m
>>>> focusing more on this project :-)
>>>
>>> Sounds great! We're meeting every Tuesday at 16:00 UTC in #tor-dev.
>>> Feel free to drop by.
>>
>> Excellent. I won’t be there this coming Tuesday, but I’ll be there the next
>> Tuesday.
>>
>>> Replicas mean that each descriptor is stored under two identifiers, so
>>> that's two places. Further, descriptor identifiers change once per
>>> day, so during a 24-hour period, there are up to four descriptor
>>> identifiers for a hidden service.
>>
>> That makes sense. It would be nice if the statistics would allow you
>> to identify how long (i.e. how many hour periods) each descriptor was
>> observed being published. That would allow us to figure out if there
>> are lots of short-lived services or fewer long-lived
>> services. Publishing statistics every hour would pretty much take care
>> of this. If you are really set on 24 hours, then perhaps you could add
>> the total number of published descriptors in addition to the number of
>> *unique* published descriptors.
>>
>> Also, my suggestion about using additive noise applies equally well to
>> the descriptor statistics. And multiplicative noise is a *bad idea* if
>> you don’t have some adjustment for small values (e.g. 10% noise of a 0
>> value is 0, and 10% of 1 is only 0.1).
>>
>>> We have been thinking about many more hidden-service related
>>> statistics in a separate document. We're currently discussing whether
>>> we should turn it into a tech report, because we'll probably not want
>>> to implement most of those statistics.
>>> If you have remarks or more
>>> ideas, please feel free to edit the document. We're going to have a
>>> public review round for this, too, but that might not happen in the
>>> next week or two.
>>>
>>> https://etherpad.wikimedia.org/p/hs_stats_78281091
>>
>> Great! I think we should go for at least a little more data in the
>> current proposal (what is the timeline for this, btw?). I think we
>> should come up with a list of statistics we might imagine gathering
>> and identify the subset of those that we’re comfortable gathering at
>> this point. For example, I think failure statistics are much more
>> innocuous than other data, and those would be very useful. For
>> example, they would help us understand where the protocol is
>> failing and how to improve it, and they might help us identify misuse
>> of hidden services (e.g. by botnet clients stupidly looking for
>> non-existent descriptors or by malicious crawlers attempting to brute
>> force descriptors). So here are some ideas:
>> 1. Number of fetch requests for descriptors that don’t exist (number of
>> fetch requests that do succeed would of course be very useful as well)
>> 2. Number of descriptor publishes to the wrong HSDir (actually I suspect
>> that the HSDir doesn’t check this and wants to be accepting of any publish)
>> 3. Number of rendezvous circuits that never connect (from the RP
>> perspective)
>> 4. Number of rendezvous circuits on which no data cells are ever sent
>>
>
> (CC'ed [tor-dev])

Thanks, George, for moving the discussion here.

Here's the latest proposal draft where I incorporated Aaron's suggestions:

https://gitweb.torproject.org/user/karsten/torspec.git/blob/refs/heads/hs_stats:/proposals/238-hs-relay-stats.txt

If people on this list have more feedback, please reply here. Thanks!

All the best,
Karsten

> Thanks for the input Aaron!
>
> The timeline here is that we are hoping to have the proposal _and_ the
> implementation ready by mid-December. Then we are hoping that we
> can deploy the code to a few relays so that we have some data by January.
>
> So, time is tight.
>
> I'm currently OK with the two statistics in:
> https://people.torproject.org/~karsten/volatile/238-hs-relay-stats.txt
>
> I feel that any other statistics will need to be carefully analyzed.
> We should add the ideas you mentioned in the etherpad, and get them
> included in the tech report (which we are also hoping to have ready in
> some form by mid-January).
>
> The tech report is supposed to contain and analyze most of the HS
> statistics we can think of. It will likely contain many stats that we
> will never do, but also some stats that might be a good idea. The good
> ones we should eventually integrate into the Tor proposal and write code
> for.
>
>>> Thanks for the very valuable input! Let me know if the following
>>> draft looks okay, and I'll start another thread on tor-dev@.
>>>
>>> https://people.torproject.org/~karsten/volatile/238-hs-relay-stats-2014-11-20.txt
>>
>> "Lab(\epsilon/C)" -> "Lap(\epsilon/C)" (that was my mistake). I think
>> having the added noise both parameterized and included in the reported
>> statistics is an idea worth thinking about. Making it a parameter
>> allows you to easily change it without upgrading.
>> Including it in the
>> statistics would allow us to correct better for noise if different
>> relays might be adding different amounts of noise due to inconsistent
>> opinions of the noise parameter (if this should never happen, then I
>> guess this wouldn’t be necessary).
>>
>> So again, sorry that I’m not going to be very responsive on this for the
>> next week. I’m really happy that you’re working on it!
>>
>> Best,
>> Aaron
>
_______________________________________________
tor-dev mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
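
For readers following the noise discussion quoted above, here is a rough sketch of why multiplicative noise does little for small counts while additive noise still perturbs them. This is only an illustration, not what the proposal specifies: the function names, the uniform +/-10% multiplicative scheme, and the Laplace scale value are all assumptions made up for this example.

```python
import random


def multiplicative_noise(value, fraction=0.10, rng=random):
    """Scale value by a random factor in [1 - fraction, 1 + fraction].

    Hypothetical scheme, used here only to show the small-value problem.
    """
    return value * (1.0 + rng.uniform(-fraction, fraction))


def max_multiplicative_shift(value, fraction=0.10):
    """Largest possible change multiplicative noise can make to value."""
    return abs(value) * fraction


def laplace_noise(scale, rng=random):
    """Draw from a zero-mean Laplace distribution with the given scale.

    The difference of two independent exponential draws with the same
    rate is Laplace distributed.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def additive_noise(value, scale, rng=random):
    """Add Laplace noise whose size does not depend on value."""
    return value + laplace_noise(scale, rng)


if __name__ == "__main__":
    rng = random.Random(2014)  # fixed seed so repeated runs match
    scale = 5.0                # assumed noise scale, for illustration only
    for count in (0, 1, 1000):
        print(
            count,
            "max multiplicative shift:", max_multiplicative_shift(count),
            "additive sample:", additive_noise(count, scale, rng),
        )
```

With these assumptions a true count of 0 is always reported as exactly 0 under multiplicative noise, and a count of 1 moves by at most 0.1, whereas the additive version perturbs every count by noise of the same scale regardless of its size.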
