Re: [tor-dev] Tor Browser downloads and updates graphs

2016-10-09 Thread Karsten Loesing

On 22/09/16 01:48, Aaron Johnson wrote:

Oops, this thread got lost in the Seattle preparations and only
surfaced today while doing some housekeeping.  Please find my response
below.

>> Log files are sorted as part of the sanitizing procedure, so that
>>  request order should not be preserved.  If you find a log file
>> that is not sorted, please let us know, because that would be a
>> bug.
> 
> That’s great! It just appeared ordered in that multiple related 
> requests appeared in sequence, but I see that sorting can have
> that effect too.

Okay, glad you didn't find a bug there.

>>> 2. The size of the response is included, which potentially 
>>> allows an adversary observing the client side to perform a 
>>> correlation attack (combined with #1 above). This could allow
>>> the adversary to learn interesting things like (i) this person
>>> is downloading arm and thus is probably running a relay or (ii)
>>> this person is creating Trac tickets with onion-service bugs
>>> and is likely running an onion service somewhere (or is Trac
>>> excluded from these logs?). The size could also be used as a
>>> time-stamping mechanism alternative to #1 if the size of the
>>> request can be changed by the adversary (e.g. by blog
>>> comments).
>> 
>> This seems less of a problem with request order not being 
>> preserved. And actually, the logged size is the size of the
>> object on the server, not the number of bytes written to the
>> client.  Even if these sizes were scrubbed, it would be quite
>> easy for an attacker to find out most of these sizes by simply
>> requesting objects themselves.  On the other hand, not including
>> them would make some analyses unnecessarily hard. I'd say it's
>> reasonable to keep them.
> 
> Here is a concern: if the adversary can cause the size to be
> modified (say by adding comments to a blog page), then he can
> effectively mark certain requests as happening within a certain
> time period by setting a unique size for that time period.

Alright, I see your point.  We should remove sizes of requested
objects that can be modified by users and hence adversaries.  The blog
is not affected here, because we're not including sanitized logs of
the blog yet, and even if we were, comments are manually approved by
the blog admin, which only happens a few times per day and which takes
control away from an adversary.

But we do have Trac logs where users can easily add a comment or
modify a wiki page.  We should simply include 0 as requested object
size in those logs.  And we should make sure we're doing the same with
future sites where users can modify content.  Added to my list.
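
A minimal sketch of that rule, assuming a sanitized entry is handled
as a (request, status, size) tuple.  The site set and the function
name are hypothetical and are not taken from the actual sanitizing
code:

```python
# Hypothetical set of sites whose content users (and hence adversaries)
# can modify; Trac is the example named in this thread.
USER_MODIFIABLE_SITES = {"trac.torproject.org"}


def scrub_size(site, request, status, size):
    """Return one sanitized log entry as a (request, status, size) tuple.

    For sites with user-modifiable content the object size is replaced
    by 0, so that an adversary cannot mark a time period by changing
    the size of a page.
    """
    if site in USER_MODIFIABLE_SITES:
        size = 0
    return (request, status, size)
```

Sites outside the set keep their sizes, which leaves download logs
just as useful for analysis as before.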

>>> 3. Even without fine-grained timing information, daily 
>>> per-server logs might include data from few enough clients
>>> that multiple requests can be reasonably inferred to be from
>>> the same client, which can collectively reveal lots of
>>> information (e.g. country based on browser localization used,
>>> platform, blog posts viewed/commented on if the blog server
>>> also releases logs).
>> 
>> We're removing almost all user data from request logs and only 
>> preserving data about the requested object.  For example, we're 
>> throwing away user agent strings and request parameters.  I don't
>>  really see the problem you're describing here.
> 
> This might be easiest to appreciate in the limit. Suppose you have
> a huge number of servers (relative to the number of clients) with
> DNS load-balancing among them. Each one basically has no requests
> or all those from the same client. Linking together multiple client
> requests allows them to collectively reveal information about the
> client. You might learn the language in one request, the platform
> in another, etc. A similar argument applies to splitting the logs
> across increasingly small time periods (per-day, per-hour, although
> at some point the time period gets below a given client’s
> “browsing session”). Obviously both of these examples are not near
> reality at some point, but the more you separate the logs across
> machines and over time, the more that requests might reasonably be
> inferred to belong to the same client. This presents a tradeoff
> you can make between accuracy and privacy by aggregating across
> more machines and over longer time periods.

So, I'm not sure if the following is feasible with the current
sanitizing code.  What we could do is merge all logs coming from
different servers for a given site and day, sort them, and provide
them as a single sanitized log file.  That would address your concern
here without making the logs any less useful for analysis.  If we
cannot implement this right now, I'll make a note to implement this
when we re-implement this code in Java and add it to CollecTor.  Added
to my list, too.
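
Roughly, the merge step could look like the sketch below.  It assumes
each server's sanitized log for one site and day is already available
as a list of lines; the function name is made up for illustration and
this is not the existing sanitizing code:

```python
def merge_sanitized_logs(per_server_lines):
    """Merge one site's sanitized lines from several servers for one day.

    The result is sorted, so neither the originating server nor the
    original request order can be read off the merged file.
    """
    merged = [line for lines in per_server_lines for line in lines]
    merged.sort()
    return merged
```

Identical lines from different servers are kept as duplicates, so
request counts stay the same as in the per-server files.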

>> let's do that now:
> 
> :-D

Well, we did discuss benefits and risks at length a few years ago, we
just didn't follow these guidelines simply because they didn't 

Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-22 Thread Karsten Loesing

On 20/09/16 17:46, Lunar wrote:
> Karsten Loesing:
>>> If you feel that's interesting enough, would it be possible to
>>> also add the number of downloads of cryptographic signatures
>>> to the graph?
> 
>> Sure, added.
> 
> Thanks! These are interesting datapoints regarding the “but nobody 
> ever checks the signature” that I hear now and then.

Great!

>> That is something I'd like to add very soon, but I'd first want
>> to discuss whether the absolute numbers make sense before
>> breaking them down by operating system, release channel, and
>> locale.
> 
>> I'll bring the database with me to Seattle.  Maybe we can sit
>> down and run some queries on it together?  This was fun last
>> weekend together with Georg and Nicolas, and it would for sure be
>> fun with you and other interested folks in Seattle.
> 
> Sure! :)

Ah well, I found some time to make a graph for this and also found
this to be a good excuse to prototype R Markdown files:

https://tor-metrics.shinyapps.io/webstats2/

Enjoy!  And please let me know how to make that graph even more
useful!  (Keep in mind that this is a prototype and that the version
on Tor Metrics is likely going to provide fewer options to examine the
data; but we should use this prototype to learn which are the most
important things we want to have on Tor Metrics.)

> If we're going to get such graphs running as a way to measure the 
> amount of Tor Browser users, I wonder if we should not also try to 
> work with Tails' people to add their boot statistics, and maybe
> other projects who include Tor Browser without using the automated
> update mechanism.

Good idea, added to the list.

All the best,
Karsten
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-21 Thread Aaron Johnson
> 
> Log files are sorted as part of the sanitizing procedure, so that
> request order should not be preserved.  If you find a log file that is
> not sorted, please let us know, because that would be a bug.

That’s great! It just appeared ordered in that multiple related requests 
appeared in sequence, but I see that sorting can have that effect too.

> 
> 
> > 2. The size of the response is included, which potentially allows
> > an adversary observing the client side to perform a correlation
> > attack (combined with #1 above). This could allow the adversary to
> > learn interesting things like (i) this person is downloading arm
> > and thus is probably running a relay or (ii) this person is
> > creating Trac tickets with onion-service bugs and is likely running
> > an onion service somewhere (or is Trac excluded from these logs?).
> > The size could also be used as a time-stamping mechanism
> > alternative to #1 if the size of the request can be changed by the
> > adversary (e.g. by blog comments).
> 
> This seems less of a problem with request order not being preserved.
> And actually, the logged size is the size of the object on the server,
> not the number of bytes written to the client.  Even if these sizes
> were scrubbed, it would be quite easy for an attacker to find out most
> of these sizes by simply requesting objects themselves.  On the other
> hand, not including them would make some analyses unnecessarily hard.
> I'd say it's reasonable to keep them.

Here is a concern: if the adversary can cause the size to be modified (say by 
adding comments to a blog page), then he can effectively mark certain requests 
as happening within a certain time period by setting a unique size for that 
time period.

> 
> > 3. Even without fine-grained timing information, daily per-server
> > logs might include data from few enough clients that multiple
> > requests can be reasonably inferred to be from the same client,
> > which can collectively reveal lots of information (e.g. country
> > based on browser localization used, platform, blog posts
> > viewed/commented on if the blog server also releases logs).
> 
> We're removing almost all user data from request logs and only
> preserving data about the requested object.  For example, we're
> throwing away user agent strings and request parameters.  I don't
> really see the problem you're describing here.

This might be easiest to appreciate in the limit. Suppose you have a huge 
number of servers (relative to the number of clients) with DNS load-balancing 
among them. Each one basically has no requests or all those from the same 
client. Linking together multiple client requests allows them to collectively 
reveal information about the client. You might learn the language in one 
request, the platform in another, etc. A similar argument applies to splitting 
the logs across increasingly small time periods (per-day, per-hour, although at 
some point the time period gets below a given client’s “browsing session”). 
Obviously both of these examples are not near reality at some point, but the 
more you separate the logs across machines and over time, the more that 
requests might reasonably be inferred to belong to the same client. This 
presents a tradeoff you can make between accuracy and privacy by aggregating 
across more machines and over longer time periods.

> 
> let's do that now:

:-D

> 
> > • Don't collect data you don't need (minimization).
> 
> I can see us using sanitized web logs from all Tor web servers, not
> limited to Tor Browser/Tor Messenger downloads and Tor main website
> hits.  I used these logs to learn whether Atlas or Globe had more
> users, and I just recently looked at Metrics logs to see which graphs
> are requested most often.

A more conservative approach would be more “pull” than “push”, so you don’t 
collect data until you want it, at which point you add it to the collection 
list. Just a thought.

> 
> > • The benefits should outweigh the risks.
> 
> I'd say this is the case.  As you say below yourself, there is value
> in analyzing these logs, and I agree.  I have also been thinking a lot
> about possible risks, which resulted in the sanitizing procedure that
> is in place, which comes after the very restrictive logging policy at
> Tor's Apache processes, which throws away client IP addresses and
> other sensitive data right at the logging step.  All in all, yes,
> benefits do outweigh the risks here, in my opinion.

I think this is the ultimate test, and it sounds like you put a lot of thought 
into it (as expected).

> 
> 
> > • Consider auxiliary data (e.g. third-party data sets) when
> > assessing the risks.
> 
> I don't see a convincing scenario where this data set would make a
> third-party data set more dangerous.

Are there any files that are only of particular interest to a particular user 
or user subpopulation? Examples might be an individual’s blog or Tor 
instructions in Kurdish. If so, revealing that they have 

Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-21 Thread Karsten Loesing

Hi Aaron,

On 20/09/16 15:43, Aaron Johnson wrote:
>> 
>> Good thinking!  I summarized the methodology on the graph page
>> as: The graph above is based on sanitized Tor web server logs
>> [0]. These are a stripped-down version of Apache's "combined" log
>> format without IP addresses, log times, HTTP parameters,
>> referers, and user agent strings.
> ...
>> 
>> If you spot anything in the data that you think should be 
>> sanitized more thoroughly, please let us know!
> 
> Interesting, thanks. Here are some thoughts based on looking
> through one of these logs (from archeotrichon.torproject.org on
> 2015-09-20): 1. The order of requests appears to be preserved. If
> so, this allows an adversary to determine fine-grained timing
> information by inserting requests of his own at known times.

Log files are sorted as part of the sanitizing procedure, so that
request order should not be preserved.  If you find a log file that is
not sorted, please let us know, because that would be a bug.

> 2. The size of the response is included, which potentially allows
> an adversary observing the client side to perform a correlation
> attack (combined with #1 above). This could allow the adversary to
> learn interesting things like (i) this person is downloading arm
> and thus is probably running a relay or (ii) this person is
> creating Trac tickets with onion-service bugs and is likely running
> an onion service somewhere (or is Trac excluded from these logs?).
> The size could also be used as a time-stamping mechanism
> alternative to #1 if the size of the request can be changed by the
> adversary (e.g. by blog comments).

This seems less of a problem with request order not being preserved.
And actually, the logged size is the size of the object on the server,
not the number of bytes written to the client.  Even if these sizes
were scrubbed, it would be quite easy for an attacker to find out most
of these sizes by simply requesting objects themselves.  On the other
hand, not including them would make some analyses unnecessarily hard.
I'd say it's reasonable to keep them.

> 3. Even without fine-grained timing information, daily per-server 
> logs might include data from few enough clients that multiple 
> requests can be reasonably inferred to be from the same client,
> which can collectively reveal lots of information (e.g. country
> based on browser localization used, platform, blog posts
> viewed/commented on if the blog server also releases logs).

We're removing almost all user data from request logs and only
preserving data about the requested object.  For example, we're
throwing away user agent strings and request parameters.  I don't
really see the problem you're describing here.

> I also feel compelled to raise the question of whether or not
> releasing these logs went through Tor’s own recommended procedure
> for producing data on its users
> (https://research.torproject.org/safetyboard.html#guidelines):

Git history says that those guidelines were put up in April 2016
whereas the rewrite of the web server log sanitizing code happened in
November 2015, with the original sanitizing process being written in
December 2011.  So, no, we didn't go through that procedure yet, but
let's do that now:

> • Only collect data that is safe to make public.

We're only using data after making it public, so we're not collecting
anything that we think wouldn't be safe to make public.

> • Don't collect data you don't need (minimization).

I can see us using sanitized web logs from all Tor web servers, not
limited to Tor Browser/Tor Messenger downloads and Tor main website
hits.  I used these logs to learn whether Atlas or Globe had more
users, and I just recently looked at Metrics logs to see which graphs
are requested most often.

> • Take reasonable security precautions, e.g. about who has access
> to your data sets or experimental systems.

We're doing that.  For example, I personally don't have access to
non-sanitized web logs, just to the sanitized ones as everyone else.

> • Limit the granularity of data (e.g. use bins or add noise).

We're throwing out time information and removing request order.

> • The benefits should outweigh the risks.

I'd say this is the case.  As you say below yourself, there is value
in analyzing these logs, and I agree.  I have also been thinking a lot
about possible risks, which resulted in the sanitizing procedure that
is in place, which comes after the very restrictive logging policy at
Tor's Apache processes, which throws away client IP addresses and
other sensitive data right at the logging step.  All in all, yes,
benefits do outweigh the risks here, in my opinion.

> • Consider auxiliary data (e.g. third-party data sets) when
> assessing the risks.

I don't see a convincing scenario where this data set would make a
third-party 

Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-20 Thread Lunar

Karsten Loesing:
>> If you feel that's interesting enough, would it be possible to 
>> also add the number of downloads of cryptographic signatures to
>> the graph?
> 
> Sure, added.

Thanks! These are interesting datapoints regarding the “but nobody
ever checks the signature” that I hear now and then.

> That is something I'd like to add very soon, but I'd first want to 
> discuss whether the absolute numbers make sense before breaking
> them down by operating system, release channel, and locale.
> 
> I'll bring the database with me to Seattle.  Maybe we can sit down
> and run some queries on it together?  This was fun last weekend
> together with Georg and Nicolas, and it would for sure be fun with
> you and other interested folks in Seattle.

Sure! :)

If we're going to get such graphs running as a way to measure the
amount of Tor Browser users, I wonder if we should not also try to
work with Tails' people to add their boot statistics, and maybe other
projects who include Tor Browser without using the automated update
mechanism.

- -- 
Lunar 


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-20 Thread Aaron Johnson
> 
> Good thinking!  I summarized the methodology on the graph page as: The
> graph above is based on sanitized Tor web server logs [0]. These are a
> stripped-down version of Apache's "combined" log format without IP
> addresses, log times, HTTP parameters, referers, and user agent strings.
...
> 
> If you spot anything in the data that you think should be sanitized
> more thoroughly, please let us know!

Interesting, thanks. Here are some thoughts based on looking through one of 
these logs (from archeotrichon.torproject.org on 2015-09-20):
  1. The order of requests appears to be preserved. If so, this allows an 
adversary to determine fine-grained timing information by inserting requests of 
his own at known times.
  2. The size of the response is included, which potentially allows an 
adversary observing the client side to perform a correlation attack (combined 
with #1 above). This could allow the adversary to learn interesting things like 
(i) this person is downloading arm and thus is probably running a relay or (ii) 
this person is creating Trac tickets with onion-service bugs and is likely 
running an onion service somewhere (or is Trac excluded from these logs?). The 
size could also be used as a time-stamping mechanism alternative to #1 if the 
size of the request can be changed by the adversary (e.g. by blog comments).
  3. Even without fine-grained timing information, daily per-server logs might 
include data from few enough clients that multiple requests can be reasonably 
inferred to be from the same client, which can collectively reveal lots of 
information (e.g. country based on browser localization used, platform, blog 
posts viewed/commented on if the blog server also releases logs).

I also feel compelled to raise the question of whether or not releasing these 
logs went through Tor’s own recommended procedure for producing data on its 
users (https://research.torproject.org/safetyboard.html#guidelines):
• Only collect data that is safe to make public.
• Don't collect data you don't need (minimization).
• Take reasonable security precautions, e.g. about who has access to 
your data sets or experimental systems.
• Limit the granularity of data (e.g. use bins or add noise).
• The benefits should outweigh the risks.
• Consider auxiliary data (e.g. third-party data sets) when assessing 
the risks.
• Consider whether the user meant for that data to be private.
I definitely see the value of analyzing these logs, though, and it definitely 
helps that some sanitization was applied :-)

Best,
Aaron




Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-19 Thread Karsten Loesing

On 17/09/16 18:28, Aaron Johnson wrote:
>>> Here's the same graph with more data, more request types, and of
>>> course a lot more shininess:
>>> 
>>> https://tor-metrics.shinyapps.io/webstats/
>>> ...
>> If you feel that's interesting enough, would it be possible to 
>> also add the number of downloads of cryptographic signatures to
>> the graph?
>> 
>> I also would love to see a breakdown by operating systems if you
>>  consider it a reasonable thing to do.
> 
> To these requests I would add a request for the methodology behind 
> releasing these stats. Are they raw numbers? Rounded? More
> generally, how are the web logs sanitized? I’m interested in how
> safe these statistics are to release and how they might be changed
> to be even more privacy-preserving.

Good thinking!  I summarized the methodology on the graph page as: The
graph above is based on sanitized Tor web server logs [0]. These are a
stripped-down version of Apache's "combined" log format without IP
addresses, log times, HTTP parameters, referers, and user agent strings.

[0] https://webstats.torproject.org/
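
To illustrate which fields go away, here is a rough sketch that takes
one line in Apache's "combined" format and keeps only the request
(without parameters), the status code and the size.  The output layout
and the function name are assumptions made for this example; the real
sanitized files on webstats.torproject.org may look different:

```python
import re

# Apache "combined" format:
# host ident user [time] "request" status size "referer" "user agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<proto>\S+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def sanitize_line(line):
    """Drop IP address, log time, HTTP parameters, referer and user agent.

    Returns None for lines that do not parse as combined log format.
    """
    match = COMBINED.match(line)
    if match is None:
        return None
    # Cut off everything after "?" to remove HTTP parameters.
    path = match.group("target").split("?", 1)[0]
    return '"{} {} {}" {} {}'.format(
        match.group("method"), path, match.group("proto"),
        match.group("status"), match.group("size"))
```

Sorting the resulting lines would be a separate step after this one.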

I guess we'll write down the sanitizing process in more detail once we
make this part of CollecTor, but that may take a few more weeks or
even months.

If you spot anything in the data that you think should be sanitized
more thoroughly, please let us know!

> Thanks, Aaron

All the best,
Karsten


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-19 Thread Karsten Loesing

On 17/09/16 17:52, Lunar wrote:
> Karsten Loesing:
>> On 11/09/16 18:13, Georg Koppen wrote:
>>> Here are the graphs showing initial downloads, update pings and
>>>  update requests over time:
>> 
>>> https://people.torproject.org/~karsten/volatile/torbrowser-annotated-2016-09-11.pdf
>> 
>> Here's the same graph with more data, more request types, and of
>> course a lot more shininess:
>> 
>> https://tor-metrics.shinyapps.io/webstats/

This is now updated with more data and a lot more text.  Enjoy!

>> Note that this is just a prototype that will go away in the
>> future without notice.  But if enough people like it we might run
>> our own Shiny Server at some point in the future.  Enjoy!  (And
>> thanks, Isabela, for suggesting Shiny!)
> 
> If you feel that's interesting enough, would it be possible to
> also add the number of downloads of cryptographic signatures to the
> graph?

Sure, added.

> I also would love to see a breakdown by operating systems if you 
> consider it a reasonable thing to do.

That is something I'd like to add very soon, but I'd first want to
discuss whether the absolute numbers make sense before breaking them
down by operating system, release channel, and locale.

I'll bring the database with me to Seattle.  Maybe we can sit down and
run some queries on it together?  This was fun last weekend together
with Georg and Nicolas, and it would for sure be fun with you and
other interested folks in Seattle.

> Seeing these graphs really made my day, I had been hoping for them
> for a long time (#10675 which maybe should be closed). Thanks a
> lot!

Oh, interesting.  Maybe we can learn something from that ticket, too.
 Let's not close it just yet.

> Thanks,

All the best,
Karsten



Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-17 Thread Lunar
Karsten Loesing:
> On 11/09/16 18:13, Georg Koppen wrote:
>> Here are the graphs showing initial downloads, update pings and 
>> update requests over time:
> 
>> https://people.torproject.org/~karsten/volatile/torbrowser-annotated-2016-09-11.pdf
> 
> Here's the same graph with more data, more request types, and of
> course a lot more shininess:
> 
> https://tor-metrics.shinyapps.io/webstats/
> 
> Note that this is just a prototype that will go away in the future 
> without notice.  But if enough people like it we might run our own 
> Shiny Server at some point in the future.  Enjoy!  (And thanks, 
> Isabela, for suggesting Shiny!)

If you feel that's interesting enough, would it be possible to also
add the number of downloads of cryptographic signatures to the graph?

I also would love to see a breakdown by operating systems if you
consider it a reasonable thing to do.

Seeing these graphs really made my day, I had been hoping for them for
a long time (#10675 which maybe should be closed). Thanks a lot!

Thanks,
-- 
Lunar 





Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-14 Thread Arthur D. Edelstein
This is awesome, Karsten!

On Wed, Sep 14, 2016 at 1:16 PM, Karsten Loesing  wrote:
>
> On 11/09/16 18:13, Georg Koppen wrote:
>> Here are the graphs showing initial downloads, update pings and
>> update requests over time:
>>
>> https://people.torproject.org/~karsten/volatile/torbrowser-annotated-2016-09-11.pdf
>
> Here's the same graph with more data, more request types, and of
> course a lot more shininess:
>
> https://tor-metrics.shinyapps.io/webstats/
>
> Note that this is just a prototype that will go away in the future
> without notice.  But if enough people like it we might run our own
> Shiny Server at some point in the future.  Enjoy!  (And thanks,
> Isabela, for suggesting Shiny!)
>
> All the best,
> Karsten
>


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-12 Thread teor

> On 13 Sep 2016, at 01:51, Mark Smith  wrote:
> 
> On 9/12/16 11:20 AM, David Fifield wrote:
>> Oh, thanks for finding that source code link. I looked for that code and
>> didn't find it.
>> 
>> But that's exactly what I'm saying: once someone has downloaded an
>> update, they stop sending update pings until their next restart, which
>> might explain the decreases in update pings at (8) and (9) in the graphs.
> 
> Ah, right. Sorry for my confusion.
> So, yes, your theory really is a good one, although it is surprising
> that months later the update ping count did not return to its older
> value, e.g., the March counts are significantly higher than August ones.
> Maybe our usage is dropping.
> 
> If we think the restart delay is a bad thing, we could be more
> aggressive about encouraging people to restart and apply updates.

That would mitigate issues where profile changes are ignored between when the 
update is applied and when users restart the browser:
https://trac.torproject.org/projects/tor/ticket/18179

> 
> --
> Mark Smith
> Pearl Crescent
> http://pearlcrescent.com/
> 

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP C855 6CED 5D90 A0C5 29F6 4D43 450C BA7F 968F 094B
ricochet:ekmygaiu4rzgsk6n
xmpp: teor at torproject dot org










Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-12 Thread Nicolas Vigier
Hi,

On Mon, 12 Sep 2016, Rob van der Hoeven wrote:

> Hi Georg,
> 
> I think the behavior you see can be explained by an overloaded download
> server. From the initial downloads graph you can see that there are on
> average 80.000 downloads a day. From the update pings and update
> requests graphs you can estimate that there are about 800.000 active Tor
> browser users. So, when there is a new version of Tor browser the number

We can estimate about 800.000 active daily users if we assume that they
are all running their browser at different times during the day to make
two pings a day. But some of them probably run it only during some part
of the day that is less than 12 hours, which is not enough to make two
pings per day. So I think from the update pings, we can only estimate
that we have more than 800.000 active daily users, and less than 1.600.000.
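As a rough sketch of that reasoning in Python: if every active browser
sends between one and two pings per day, the daily ping count bounds the
number of daily users from both sides. The function name and the figure
of 1.600.000 daily pings (twice the 800.000 estimate) are assumptions for
illustration, not values read off the graphs.

```python
def daily_user_bounds(daily_pings, max_pings_per_user=2):
    """Return (lower, upper) bounds on daily users for a given ping count."""
    # Lower bound: every user was online long enough to send the maximum
    # number of pings (two per day).
    lower = daily_pings // max_pings_per_user
    # Upper bound: every user was online so briefly that they sent one ping.
    upper = daily_pings
    return lower, upper


# Assumed input: 1.600.000 pings in one day.
print(daily_user_bounds(1_600_000))  # (800000, 1600000)
```

The real number sits somewhere between the two bounds, depending on how
many users keep the browser open for more than 12 hours.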

To get a more precise estimate of active users, we could maybe count the
total of update downloads for each version (in another graph). Although
this would be different from the update pings, as it would also include
the occasional users who don't use it every day.

> of update requests massively overloads the download server. The
> saw-tooth form of the update requests graph is what you expect in this
> situation. First you get an update request from all users, the next day
> you get a request from all users minus the users who were updated the
> previous day (max 80.000) and so on.
> 
> I wonder if it is possible that failed downloads are counted too? That
> would explain the spikes. But systems that are so heavily overloaded can
> generate all kinds of weird results. 
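The saw-tooth described in the quoted text can be reproduced with a very
small model. This is only a sketch under strong assumptions: 800.000
users all need the update, every un-updated user sends one request per
day, and the server completes at most 80.000 updates per day. A real
server would not have such a hard cap, and the function name is
hypothetical.

```python
def simulate_update_requests(users, daily_capacity):
    """Return the number of update requests seen on each day.

    Every user who has not been updated yet sends one request per day,
    and at most daily_capacity of them are actually served that day.
    """
    requests_per_day = []
    remaining = users
    while remaining > 0:
        requests_per_day.append(remaining)
        remaining -= min(daily_capacity, remaining)
    return requests_per_day


series = simulate_update_requests(800_000, 80_000)
print(series[:3])   # [800000, 720000, 640000]
print(len(series))  # 10
```

Under these assumptions the request count falls linearly for ten days
after a release, which gives one tooth of the saw per release.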

I think it is possible that failed downloads are counted too. What we are
counting is the number of http 302 responses (redirect) to initiate a
download. If the redirect works but the actual download fails, it is
still counted.
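To illustrate what such a count looks like, here is a minimal Python
sketch that counts 302 responses in web server log lines. The log layout
(an Apache-style line with the status code directly after the quoted
request), the sample paths and the function name are assumptions for
illustration; this is not the script used to produce the graphs.

```python
import re

# Assumed layout: the status code follows the quoted request string.
STATUS_AFTER_REQUEST = re.compile(r'"[^"]*"\s+(\d{3})\b')


def count_redirects(log_lines):
    """Count the lines whose HTTP status code is 302."""
    count = 0
    for line in log_lines:
        match = STATUS_AFTER_REQUEST.search(line)
        if match and match.group(1) == "302":
            count += 1
    return count


# Made-up sample lines in the assumed format.
sample = [
    '0.0.0.0 - - [12/Sep/2016:00:00:00 +0000] "GET /download/example HTTP/1.1" 302 -',
    '0.0.0.0 - - [12/Sep/2016:00:00:00 +0000] "GET /index.html HTTP/1.1" 200 1234',
    '0.0.0.0 - - [12/Sep/2016:00:00:00 +0000] "GET /download/other HTTP/1.1" 302 -',
]
print(count_redirects(sample))  # 2
```

A count like this says nothing about whether the download that followed
the redirect completed, which is why failed downloads still show up.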

Nicolas


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-12 Thread David Fifield
On Mon, Sep 12, 2016 at 11:12:15AM -0400, Mark Smith wrote:
> On 9/11/16 3:45 PM, David Fifield wrote:
> >> * We don't know what (8) or (9) is but it seems to us we are losing
> >> users over time and are only getting them back slowly if at all. A
> >> weekday/weekend pattern is visible there as well.
> > 
> > Does Tor Browser continue checking for further updates in the span of
> > time between when it downloads an update and when it is restarted? For
> > example, you are running 6.0, the browser downloads the 6.0.1 update and
> > stages it and asks you to restart; does the browser check for updates
> > until you actually restart? If not, then the decreases in update pings
> > might be people being tardy in restarting their browser.
> 
> That is a good theory, but I don't think update checks occur if there is
> a pending update. The code that checks and returns early is here:
> 
> https://gitweb.torproject.org/tor-browser.git/tree/toolkit/mozapps/update/nsUpdateService.js?h=tor-browser-45.4.0esr-6.0-1#n2388

Oh, thanks for finding that source code link. I looked for that code and
didn't find it.

But that's exactly what I'm saying: once someone has downloaded an
update, they stop sending update pings until their next restart, which
might explain the decreases in update pings at (8) and (9) in the graphs.
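
A toy model of that effect, as a Python sketch: one hypothetical client
pings twice a day, goes silent once an update has been staged, and
resumes after the restart. This is not the logic in nsUpdateService.js,
only an illustration of the theory; the function and parameter names are
made up, and for simplicity the day the update is staged is counted as
silent too.

```python
def pings_per_day(days, staged_on, restarted_on, checks_per_day=2):
    """Daily ping counts for one hypothetical client.

    The client sends checks_per_day pings each day, except from the day
    an update is staged until the day before it restarts.
    """
    counts = []
    for day in range(days):
        update_pending = staged_on <= day < restarted_on
        counts.append(0 if update_pending else checks_per_day)
    return counts


print(pings_per_day(days=6, staged_on=1, restarted_on=4))
# [2, 0, 0, 0, 2, 2]
```

Summed over many clients with different restart delays, this would show
up as a dip in update pings after each release.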


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-12 Thread Mark Smith
On 9/11/16 3:45 PM, David Fifield wrote:
>> * We don't know what (8) or (9) is but it seems to us we are losing
>> users over time and are only getting them back slowly if at all. A
>> weekday/weekend pattern is visible there as well.
> 
> Does Tor Browser continue checking for further updates in the span of
> time between when it downloads an update and when it is restarted? For
> example, you are running 6.0, the browser downloads the 6.0.1 update and
> stages it and asks you to restart; does the browser check for updates
> until you actually restart? If not, then the decreases in update pings
> might be people being tardy in restarting their browser.

That is a good theory, but I don't think update checks occur if there is
a pending update. The code that checks and returns early is here:

https://gitweb.torproject.org/tor-browser.git/tree/toolkit/mozapps/update/nsUpdateService.js?h=tor-browser-45.4.0esr-6.0-1#n2388

-- 
Mark Smith
Pearl Crescent
http://pearlcrescent.com/



signature.asc
Description: OpenPGP digital signature
___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Tor Browser downloads and updates graphs

2016-09-12 Thread Tom Ritter
On 12 September 2016 at 03:37, Rob van der Hoeven wrote:
> One thing bothers me. The update requests graph never touches zero. It
> should, because that would mean that all Tor browsers have been updated.
> 100.000 seems to be the lowest value.

I'm not surprised by this at all. I think a very common mode of usage
is people who have TB on their computer but don't use it regularly. (I
have several friends like this.) Only when they want to search for
something 'embarrassing' (medical conditions, etc) will they use it.
With an update cycle of one to two months between releases, it's likely
these people are actually _never_ up to date (unless they choose to
restart TB during their browsing session.)

-tom


[tor-dev] Tor Browser downloads and updates graphs

2016-09-11 Thread Georg Koppen
Hi all!

So, Karsten, Nicolas and I sat together for a while and looked at past
data to figure out how many users downloaded and updated their Tor
Browser over time.

We actually got more questions than we were able to answer but I guess
that's fine for a start.

Here are the graphs showing initial downloads, update pings and update
requests over time:

https://people.torproject.org/~karsten/volatile/torbrowser-annotated-2016-09-11.pdf

We annotated the graphs a bit, highlighting the things we wanted to
point people to.

The initial downloads are the number of package downloads from the
website for all supported platforms (Windows, OS X and Linux). Apart
from spike (5) all events seem to be non-Tor Browser related.

* On the downloads graph we seem to have a spike (5) in new downloads
with the release of Tor Browser 6.0. Maybe because it was much more
widely publicized in the media than previous/later releases?

* On the same graph, a big spike (6) can be seen on the same day the new
board got announced.

* There are other spikes on the initial downloads graph (1, 3, 4) where
we have no idea what happened while (2) is probably just an outlier.

The update pings are made by Tor Browser instances roughly twice a day
and they indicate the number of active Tor Browser users. More
importantly, one can see the decrease or increase of Tor Browser usage
over time.

* Like (2) in the downloads graph, (7) seems to be an outlier as well.

* We don't know what (8) or (9) is but it seems to us we are losing
users over time and are only getting them back slowly if at all. A
weekday/weekend pattern is visible there as well.

The graph with the update requests is basically showing how fast users
are updating to newly released Tor Browser versions.

* (10) shows a large spike correlating to the 6.0 release. It is not
clear to us where all those update requests were coming from given the
update request pattern before/after 6.0. One plausible explanation could
be that our infrastructure was heavily overloaded causing clients to
retry fetching the update.

We'd love to hear feedback, especially feedback that could shed light on
the events we did not have an explanation for.

Georg