Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2020-02-03 Thread Tor Bug Tracker & Wiki
#32126: Add OONI's Vanilla Tor measurement data to Tor Metrics
-------------------------------+------------------------------
 Reporter:  karsten            |          Owner:  metrics-team
     Type:  enhancement        |         Status:  new
 Priority:  Medium             |      Milestone:
Component:  Metrics/Ideas      |        Version:
 Severity:  Normal             |     Resolution:
 Keywords:                     |  Actual Points:
Parent ID:                     |         Points:
 Reviewer:                     |        Sponsor:
-------------------------------+------------------------------

Comment (by phw):

 Thanks for starting this discussion, Karsten. Your analysis is very
 insightful and already helped me
 [https://dip.torproject.org/phw/bridgestrap/blob/master/tor.go#L19 pick a
 threshold for bridgestrap].

 Replying to [comment:1 karsten]:
 > The next step here is to discuss '''what''' results we want to add to
 > Tor Metrics. Are these graphs useful, or is there something potentially
 > more interesting in the data that we want to have? I'm hoping for input
 > from other teams here.
 >
 > All graphs above are ECDFs, unlike other graphs on Tor Metrics. This is
 > a smaller issue on the graphing side, because we need to process
 > non-aggregated measurements for making a graph. It's also a possible
 > issue on the usability side, because ECDFs are probably harder to
 > understand than time plots.
 [[br]]
 From an anti-censorship point of view, I find your third graph – the
 bootstrap times broken down by percentages and countries – very useful. It
 gives us a good idea of where Tor does (not) work, and the bootstrap phases
 provide a hint of how a block is implemented. It would also be great if
 one could provide arbitrary date ranges to explore this graph, just like
 it's currently possible for most metrics.

--
Ticket URL: 
Tor Bug Tracker & Wiki 
The Tor Project: anonymity online
___
tor-bugs mailing list
tor-bugs@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-bugs

Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-11-21 Thread Tor Bug Tracker & Wiki

Comment (by hellais):

 @irl pointed me to this ticket. The OONI-side ticket for things related
 to this is here: https://github.com/ooni/pipeline/issues/13
 > The next step after answering the questions above is to figure out how
 > we'd get the data for these new graphs. Some thoughts:
 > Maintaining our own copy of the OONI metadata database, like I did for
 > this analysis, isn't feasible. We only need a small fraction of ~40G of
 > this database which currently has a total size of 696G. Also, cloning this
 > database took way too long for us to do it once per day.

 I would like to better understand this point and which aspect of it is not
 feasible.

 If you set up the MetaDB following the instructions here:
 https://github.com/ooni/sysadmin/blob/master/docs/metadb-sharing.md, it
 will be configured as a read-only replica which will **automatically
 sync** as soon as we write new changes to the OONI MetaDB.

 That is to say that there is no need to do a clone once per day, you just
 set it up once and then it will automatically sync every time.

 Did you eventually manage to set it up? What are your thoughts about the
 schema of the vanilla_tor tables? Are they adequate?

 The MetaDB is the main way we are encouraging people to integrate OONI
 data for batch analysis, and we already have several users of it. I would
 like to better understand what limitations and issues you are facing so
 we can try to address them.

 > We might be able to maintain a copy of the .yaml files of vanilla_tor
 > measurements only. We would sync these once or twice per day and serve
 > them with CollecTor. We'd have to define our own database schema for
 > importing and aggregating them. This is not a small project and not a
 > small commitment.

 I think this approach is highly sub-optimal, as it will require you to
 duplicate the code we are already writing for parsing the OONI dataset.

 > A while ago we were hoping to get a .csv file from OONI with just the
 > data we need. For example, the .csv file behind the three graphs above is
 > 150M large, though it could easily be reduced to 75M, uncompressed. Maybe
 > we'd have to define precisely what data we want (the discussion above) and
 > then write the database query for it. This would be the smallest project
 > and commitment from our side; in other words, it would be most likely to
 > happen soon.

 This is also a possibility if you give us an idea of what queries you need
 to run exactly.

 We already have some private API endpoints to support the usage of this
 data in OONI Explorer:
 https://github.com/ooni/api/blob/master/measurements/api/private.py#L638,
 though these are not really meant to be consumed externally and may be
 subject to change.

 The MetaDB sync is the option that would be preferable for us.

 > A possible variant of the ideas above would be that we operate on a
 > read-only copy of the metadata database where we can define views, run
 > queries, and export results as .csv files.

 This is also a possibility, though I would first like to better understand
 what issues or limitations you are having in accessing the MetaDB.

 Keep in mind we are currently facing a lot of challenges in scaling up the
 MetaDB to support the increased usage of OONI Explorer, and hence we are
 trying, when possible, to move people away from using our instance and
 towards setting up their own, especially for batch analysis needs.

 If we do go this route we may set up a separate read-only replica just to
 be used by external consumers of the data.

 Once we get a better idea of what your needs are (an example query would
 be very useful) and whether the vanilla_tor table is adequate, we can
 maybe see if we can also do some sort of csv export of the data.

 For information on the schema of the vanilla_tor table see:
 https://github.com/ooni/pipeline/blob/master/af/shovel/centrifugation.py#L1605
 https://github.com/ooni/pipeline/blob/master/af/oometa/006-vanilla-tor.install.sql#L7


Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki
Changes (by gaba):

 * cc: gaba (added)



Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki
Changes (by karsten):

 * cc: phw (added)



Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki

Comment (by karsten):

 Here are the results from my analysis in the past few days:

 [[Image(vanilla-tor-bootstrap-2019-10-17-a.png, 700px)]]

 This first graph shows all measured times until 100% bootstrapped between
 2016-01 and 2019-10. Some observations:

  - 50% of measurements were done in under 15 seconds, and roughly 90%
 finished in under 1 minute.
  - There's a bump shortly after 120 seconds, which is most likely the
 result of a 120 second timeout somewhere in the process.
  - A few percent of measurements did not succeed within the test timeout
 of 300 seconds: the line is not at 100% at the 300 seconds mark but
 roughly at 97%.

 [[Image(vanilla-tor-bootstrap-2019-10-17-b.png, 700px)]]

 The second graph shows different stages of the bootstrap process. Again
 some observations:

  - It's not entirely clear (to me) why 0% bootstrapped is not just a
 vertical line at the 0 s mark. If it requires work to get to 0%, it's not
 0% but rather 2%, 1%, or 0.5% of the process. Maybe a naming issue,
 possibly a measurement issue. At least all measurements succeed at
 bootstrapping to 0% within the test time.
  - The 20% line has a small bump right after 120 s, so there must be a 120
 s timeout for this early bootstrap phase. There's another bump at roughly
 130 s which could be due to the same 120 s timeout that was started later.
  - The 80% and 100% lines are almost the same. If a client makes it to 80%,
 it's just a matter of seconds to get to 100%.

 [[Image(vanilla-tor-bootstrap-2019-10-17-c.png, 700px)]]

 The third graph shows the same data broken down by country for the slowest
 5 countries. Observations:

  - Most measurements in China and Egypt did not proceed past the 0%
 bootstrapped point.
  - Almost none of the measurements in Kazakhstan succeeded, even fewer than
 in China and Egypt. The 20% bootstrapped line looks really funny, starting
 to increase only after a full 2 minutes. Maybe these measurements would
 succeed after 10 or 20 minutes, which is something we won't find out from
 this data.
  - Belarus has two visible bumps shortly after 2 and 4 minutes. I would
 guess that there'd be more bumps after 6 and 8 and 10 minutes. Maybe this
 is related to some subset of relays not being reachable.
  - Turkey has roughly 1/4 of measurements not succeeding, with the
 remaining ones looking slow-but-okay. The reason might be that we're
 looking at almost 3 years of measurements here, and maybe bootstrapping
 succeeded during 75% of that period and failed during the other 25%.
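
 One way to tell "blocked part of the time" apart from "a quarter of
 clients always fail" would be to break the success rate down by month. The
 following is only a minimal sketch of that idea: the function name, the
 (date, seconds) input shape, and the sample values are assumptions made for
 illustration and are not taken from the OONI data.

```python
from collections import defaultdict
from datetime import date


def monthly_success_rate(measurements):
    """Map 'YYYY-MM' to the fraction of measurements that reached 100%.

    Each measurement is a (date, seconds_to_100) pair (hypothetical shape);
    seconds_to_100 is None when the bootstrap did not finish in time.
    """
    done = defaultdict(int)
    total = defaultdict(int)
    for day, seconds in measurements:
        month = day.strftime("%Y-%m")
        total[month] += 1
        if seconds is not None:
            done[month] += 1
    return {month: done[month] / total[month] for month in sorted(total)}


# Made-up sample: January works, February fails, March is mixed.
sample = [
    (date(2017, 1, 5), 20.0), (date(2017, 1, 9), 31.0),
    (date(2017, 2, 2), None), (date(2017, 2, 8), None),
    (date(2017, 3, 1), 25.0), (date(2017, 3, 3), None),
]
rates = monthly_success_rate(sample)
for month, rate in rates.items():
    print(month, rate)
```

 A month-by-month series like this could also be shown as a time plot, which
 is closer to the other graphs on Tor Metrics.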

 The next step here is to discuss '''what''' results we want to add to Tor
 Metrics. Are these graphs useful, or is there something potentially more
 interesting in the data that we want to have? I'm hoping for input from
 other teams here.

 All graphs above are ECDFs, unlike other graphs on Tor Metrics. This is a
 smaller issue on the graphing side, because we need to process non-
 aggregated measurements for making a graph. It's also a possible issue on
 the usability side, because ECDFs are probably harder to understand than
 time plots.
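
 To illustrate why an ECDF needs the non-aggregated measurements, here is a
 minimal sketch of how one could be computed. The function name and the
 sample times are made up for illustration. Measurements that never reached
 100% are counted in the denominator only, which is why the curve can end
 below 100%.

```python
from bisect import bisect_right


def ecdf(samples, total=None):
    """Return F, where F(t) is the fraction of measurements done within t seconds.

    `samples` holds the times of successful bootstraps; `total` is the number
    of all measurements, including those that never finished. It defaults to
    len(samples).
    """
    ordered = sorted(samples)
    n = len(ordered) if total is None else total
    if n <= 0:
        raise ValueError("need at least one measurement")

    def F(t):
        # Count the samples that are <= t, then divide by all measurements.
        return bisect_right(ordered, t) / n

    return F


# Made-up bootstrap times in seconds; two further measurements timed out.
times = [4.0, 9.0, 12.0, 14.0, 30.0, 45.0, 58.0, 125.0]
F = ecdf(times, total=10)
print(F(15), F(60), F(300))
```

 Every individual time has to be available to sort and count, so pre-computed
 daily averages or medians would not be enough to draw such a graph.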

 The next step after answering the questions above is to figure out
 '''how''' we'd get the data for these new graphs. Some thoughts:
  - Maintaining our own copy of the OONI metadata database, like I did for
 this analysis, isn't feasible. We only need a small fraction of ~40G of
 this database which currently has a total size of 696G. Also, cloning this
 database took way too long for us to do it once per day.
  - We might be able to maintain a copy of the .yaml files of vanilla_tor
 measurements only. We would sync these once or twice per day and serve
 them with CollecTor. We'd have to define our own database schema for
 importing and aggregating them. This is not a small project and not a
 small commitment.
  - A while ago we were hoping to get a .csv file from OONI with just the
 data we need. For example, the .csv file behind the three graphs above is
 150M large, though it could easily be reduced to 75M, uncompressed. Maybe
 we'd have to define precisely what data we want (the discussion above) and
 then write the database query for it. This would be the smallest project
 and commitment from our side; in other words, it would be most likely to
 happen soon.
  - A possible variant of the ideas above would be that we operate on a
 read-only copy of the metadata database where we can define views, run
 queries, and export results as .csv files.
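
 The last variant (define a view, run a query, export a .csv file) could
 look roughly like the sketch below. It uses an in-memory SQLite database as
 a stand-in for the real metadata database, and the table name, column
 names, and rows are all hypothetical; the actual vanilla_tor schema may
 differ.

```python
import csv
import io
import sqlite3

# In-memory stand-in for a read-only copy of the metadata database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurements ("
    " country TEXT, started TEXT, seconds_to_100 REAL)"  # hypothetical columns
)
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("DE", "2019-10-01", 12.5),
        ("DE", "2019-10-02", 14.0),
        ("KZ", "2019-10-01", None),  # NULL: did not reach 100% in time
        ("KZ", "2019-10-02", 140.0),
    ],
)

# A view that keeps only the aggregate we would want to export.
conn.execute(
    "CREATE VIEW bootstrap_by_country AS"
    " SELECT country, COUNT(*) AS total, COUNT(seconds_to_100) AS succeeded"
    " FROM measurements GROUP BY country"
)


def export_csv(connection, query):
    """Run a query and return its result as CSV text with a header row."""
    cursor = connection.execute(query)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return out.getvalue()


csv_text = export_csv(
    conn, "SELECT * FROM bootstrap_by_country ORDER BY country"
)
print(csv_text)
```

 Exporting from a view keeps the file small, because only the columns and
 aggregates that are actually needed leave the database.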


Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki
Changes (by karsten):

 * Attachment "vanilla-tor-bootstrap-2019-10-17-c.png" added.



Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki
Changes (by karsten):

 * Attachment "vanilla-tor-bootstrap-2019-10-17-b.png" added.



Re: [tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki
Changes (by karsten):

 * Attachment "vanilla-tor-bootstrap-2019-10-17-a.png" added.



[tor-bugs] #32126 [Metrics/Ideas]: Add OONI's Vanilla Tor measurement data to Tor Metrics

2019-10-17 Thread Tor Bug Tracker & Wiki
 OONI has a test called [https://ooni.torproject.org/nettest/vanilla-tor/
 Vanilla Tor] which ''"attempts to start a connection to the Tor network.
 If the test successfully bootstraps a connection within a predefined
 amount of seconds (300 by default), then Tor is considered to be reachable
 from the vantage point of the user. But if the test does ''not'' manage to
 establish a connection, then the Tor network is likely blocked within the
 tested network."''

 We should get these measurements into Tor Metrics.

 I spent the last couple of days downloading a copy of the OONI metadata
 database and extracting useful data from it. I'll add some results after
 the lunch break.
