Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread nusenu

> Also, here are the steps to reproduce:
> 
>   wget https://collector.torproject.org/archive/relay-descriptors/consensuses/consensuses-2015-11.tar.xz
>   tar xvJf consensuses-2015-11.tar.xz
>   go get git.torproject.org/user/phw/sybilhunter.git
>   sybilhunter -data consensuses-2015-11/ -uptime

How much of an effort would it be to support onionoo files as input
data? (onionoo data would be able to display more data like AS, CC,
first-seen)
I could provide some archived onionoo data.



___
tor-dev mailing list
tor-dev@lists.torproject.org
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Mon, Dec 07, 2015 at 01:44:47PM -0800, David Fifield wrote:
> On Mon, Dec 07, 2015 at 02:51:23PM -0500, Philipp Winter wrote:
> > I spent some time improving the existing relay uptime visualisation [0].
> > Inspired by a research paper [1], the new algorithm uses single-linkage
> > clustering with Pearson's correlation coefficient as distance function.
> > The idea is that relays are grouped next to each other if their uptime
> > (basically a binary sequence) is highly correlated.  Check out the
> > following gallery.  It contains monthly relay uptime images, dating back
> > to 2007:
> > 
> 
> How about just taking the XOR of two sequences as the distance?

Here's Nov 2015, with XOR as distance:


> It would be interesting to know if there are any near-perfect
> anticorrelations; i.e., one relay starts when another stops.

It looks like there are many of them.  So far, I calculated the
distance as 1 - Pearson(s1,s2) because I'm only interested in
positively correlated sequences.  Here's an uptime image with
Pearson(s1,s2) as distance function, so positive correlation is
considered just as much as negative correlation.  Have a look at the
leftmost part:


Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Mon, Dec 07, 2015 at 09:57:18PM +, nusenu wrote:
> > and every column is a relay.  White pixels mean
> > that a relay was offline and black pixels mean that a relay was
> > online.  Red pixels are used to highlight suspiciously similar clusters.
> 
> I assume they are highlighted only if they exceed a certain group size?
> What is the threshold?

Exactly.  Groups >= 5 are considered for highlighting.

> Until I looked at the heartbleed example I assumed grouping requires
> "perfect matches" across the entire month but after seeing the
> heartbleed example I'm not sure whether that is actually the case or if
> two distinct groups are just next to each other and do not have a
> "separator" between them.

Right, I don't use perfect matching, so we can account for some noise,
e.g., some of the Sybils having small downtimes, or not starting and
stopping at the exact same hour.  Here's the code:


> I would also find it useful to have it accept fingerprints as input and
> graph their uptime to look at a given set of relays in certain cases
> 
> example input could be the fingerprints from [1]+[2] after these relays
> have been around for some time.

Good point.  That has been on my todo list and I hope to get it done
soon.

> Are you planning to generate these graphs on an ongoing basis?

Yes, I would like to.  We could easily generate them every other hour,
or even hourly.  The details will depend on this thread that Karsten
started:


Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Tue, Dec 08, 2015 at 04:52:45PM +, nusenu wrote:
>> Also, here are the steps to reproduce:
>> 
>>   wget https://collector.torproject.org/archive/relay-descriptors/consensuses/consensuses-2015-11.tar.xz
>>   tar xvJf consensuses-2015-11.tar.xz
>>   go get git.torproject.org/user/phw/sybilhunter.git
>>   sybilhunter -data consensuses-2015-11/ -uptime
> 
> How much of an effort would it be to support onionoo files as input
> data? (onionoo data would be able to display more data like AS, CC,
> first-seen)
> I could provide some archived onionoo data.

It's not trivial, but feasible.  Sybilhunter uses a Go-based descriptor
parsing library [0] that doesn't support Onionoo's format, so it would
need an Onionoo parser and an update to sybilhunter's uptime analysis
code.

[0] 

Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-08 Thread Philipp Winter
On Mon, Dec 07, 2015 at 11:43:38PM -0500, grarpamp wrote:
> Can one be generated covering each year, and maybe a five-year one?

I haven't checked the complexity of the clustering algorithm I use, but
it's probably quadratic.  I think a full year's worth of uptimes would
require pruning the data, e.g., removing all relays that were online for
only one or two hours.
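
A minimal sketch of that pruning idea, not sybilhunter's actual code.
It assumes uptimes are stored as a dict mapping a fingerprint to a
binary sequence with one entry per hourly consensus.  The names
pair_count, prune_short_lived and min_online are hypothetical.

```python
def pair_count(n):
    """Number of pairwise distances a naive clustering pass computes."""
    return n * (n - 1) // 2


def prune_short_lived(uptimes, min_online=3):
    """Drop relays that were online in fewer than `min_online` consensuses.

    With hourly consensuses, min_online=3 removes relays that were
    online for only one or two hours.
    """
    return {fp: seq for fp, seq in uptimes.items() if sum(seq) >= min_online}


# Hypothetical toy data: three relays, six consensuses.
uptimes = {
    "relay1": [1, 0, 0, 0, 0, 0],
    "relay2": [1, 1, 0, 0, 0, 0],
    "relay3": [1, 1, 1, 1, 0, 1],
}
pruned = prune_short_lived(uptimes)
print(sorted(pruned), pair_count(len(uptimes)), pair_count(len(pruned)))
```

Because the number of pairs grows with the square of the relay count,
even a modest reduction in relays cuts the work noticeably.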

For now, here are three months, Sep 2015 to Nov 2015, in a 12 MiB file:


Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-07 Thread Tim Wilson-Brown - teor

> On 8 Dec 2015, at 10:43, Tom Ritter wrote:
> 
> On 7 December 2015 at 13:51, Philipp Winter wrote:
>> I spent some time improving the existing relay uptime visualisation [0].
>> Inspired by a research paper [1], the new algorithm uses single-linkage
>> clustering with Pearson's correlation coefficient as distance function.
>> The idea is that relays are grouped next to each other if their uptime
>> (basically a binary sequence) is highly correlated.  Check out the
>> following gallery.  It contains monthly relay uptime images, dating back
>> to 2007:
>> <https://nymity.ch/sybilhunting/uptime-visualisation/>
>> 
>> If you aren't familiar with this type of visualisation: Every image
>> shows the uptime of all Tor relays that were online in a given month.
>> Every row is a consensus and every column is a relay.  White pixels mean
>> that a relay was offline and black pixels mean that a relay was
>> online.  Red pixels are used to highlight suspiciously similar clusters.
> 
> That's really cool.  It seems to imply that the majority of the tor
> network stops operating halfway through the month though... Do the
> other tor graphs take into account hibernating relays?  For example, I
> would expect the time-to-download graph would be somewhat affected:
> https://metrics.torproject.org/torperf.html?graph=torperf=2015-10-01=2015-10-31=all=5mb
>  
> 
Hibernating relays run from the start of their first period to gauge load.
Then they start at a random time during the day/month, but early enough that 
they think they'll still use all their bandwidth.

I wonder if we're seeing another phenomenon? (daily / monthly server restarts?)
Or we could be seeing hibernation failing to work as intended.

Tim

Tim Wilson-Brown (teor)

teor2345 at gmail dot com
PGP 968F094B

teor at blah dot im
OTR CAD08081 9755866D 89E2A06F E3558B7F B5A9D14F





[tor-dev] Better relay uptime visualisation

2015-12-07 Thread Philipp Winter
I spent some time improving the existing relay uptime visualisation [0].
Inspired by a research paper [1], the new algorithm uses single-linkage
clustering with Pearson's correlation coefficient as distance function.
The idea is that relays are grouped next to each other if their uptime
(basically a binary sequence) is highly correlated.  Check out the
following gallery.  It contains monthly relay uptime images, dating back
to 2007:
<https://nymity.ch/sybilhunting/uptime-visualisation/>

If you aren't familiar with this type of visualisation: Every image
shows the uptime of all Tor relays that were online in a given month.
Every row is a consensus and every column is a relay.  White pixels mean
that a relay was offline and black pixels mean that a relay was
online.  Red pixels are used to highlight suspiciously similar clusters.
A nice example is the Heartbleed incident:

The huge red block on the left shows all the relays that were removed by
the directory authorities because they didn't rotate their key pairs in
time.
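
The row/column layout described above can be sketched as follows.
This is only an illustration, not how sybilhunter draws its images: it
writes a plain PBM (P1) bitmap, a bi-level format in which 1 is black
and 0 is white, so the red highlighting is left out.  The function name
uptime_to_pbm is hypothetical.

```python
def uptime_to_pbm(columns):
    """Render relay uptime sequences as a plain PBM (P1) image string.

    `columns` is a list of equal-length binary sequences, one per relay.
    Every image row is one consensus and every image column is one relay.
    In PBM, 1 is black (relay online) and 0 is white (relay offline).
    """
    if not columns:
        raise ValueError("need at least one relay")
    height = len(columns[0])
    if any(len(col) != height for col in columns):
        raise ValueError("all uptime sequences must have the same length")
    rows = [" ".join(str(col[r]) for col in columns) for r in range(height)]
    return "P1\n{} {}\n{}\n".format(len(columns), height, "\n".join(rows))


# Two hypothetical relays observed over three consensuses.
image = uptime_to_pbm([[1, 1, 0], [0, 1, 1]])
print(image)
```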

The downside of single-linkage clustering is that it takes longer to
compute.  On my laptop, I can create an image covering one month in
under three minutes, so it's tolerable.
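
For readers unfamiliar with the technique, here is a small sketch of
single-linkage clustering cut at a fixed distance, using
1 - Pearson(s1,s2) as the distance.  It is not sybilhunter's
implementation.  The threshold of 0.3, the function names and the toy
data are assumptions made for illustration.  Cutting single-linkage
clustering at a threshold is the same as finding connected components
among relays whose pairwise distance is at most that threshold, which
is what the union-find structure below does.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(s1, s2):
    """1 - Pearson correlation; 0 means perfectly correlated."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    norm = sqrt(sum((a - m1) ** 2 for a in s1) * sum((b - m2) ** 2 for b in s2))
    # Constant sequences have no defined correlation; treat them as far apart.
    return 2.0 if norm == 0 else 1.0 - cov / norm


def single_linkage(uptimes, threshold):
    """Group relays connected by a chain of distances <= threshold."""
    names = list(uptimes)
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Every pair is compared once, so the work is quadratic in relay count.
    for a, b in combinations(names, 2):
        if pearson_distance(uptimes[a], uptimes[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical uptime sequences over eight consensuses.
uptimes = {
    "A": [1, 1, 1, 0, 0, 0, 0, 0],
    "B": [1, 1, 1, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 1, 0, 0, 0, 0],  # same as A, but stops one hour later
    "D": [0, 0, 0, 0, 0, 1, 1, 1],
}
clusters = single_linkage(uptimes, threshold=0.3)
print(clusters)
```

Relay C lands in the same cluster as A and B although its sequence is
not identical, which is the kind of tolerance to noise that imperfect
matching allows.  A size filter on the resulting clusters could then
decide which ones to highlight.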

Another practical problem is that it's cumbersome to learn the relay
fingerprint of a given column.  I'm looking into JavaScript/HTML tricks
that can show text when you hover over a region in the image.  Perhaps
somebody knows more?

[0] 
[1] , Section 2

Cheers,
Philipp


Re: [tor-dev] Better relay uptime visualisation

2015-12-07 Thread David Fifield
On Mon, Dec 07, 2015 at 02:51:23PM -0500, Philipp Winter wrote:
> I spent some time improving the existing relay uptime visualisation [0].
> Inspired by a research paper [1], the new algorithm uses single-linkage
> clustering with Pearson's correlation coefficient as distance function.
> The idea is that relays are grouped next to each other if their uptime
> (basically a binary sequence) is highly correlated.  Check out the
> following gallery.  It contains monthly relay uptime images, dating back
> to 2007:
> 

How about just taking the XOR of two sequences as the distance?

It would be interesting to know if there are any near-perfect
anticorrelations; i.e., one relay starts when another stops.
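
To make the two suggestions concrete, here is a small sketch (an
illustration only, with hypothetical function names and toy data).  The
XOR distance counts the hours in which two relays differ, while
Pearson's coefficient reaches -1 when one relay is online exactly when
the other is offline.

```python
from math import sqrt


def xor_distance(s1, s2):
    """Number of positions where two binary uptime sequences differ."""
    return sum(a ^ b for a, b in zip(s1, s2))


def pearson(s1, s2):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(s1)
    m1, m2 = sum(s1) / n, sum(s2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(s1, s2))
    var1 = sum((a - m1) ** 2 for a in s1)
    var2 = sum((b - m2) ** 2 for b in s2)
    if var1 == 0 or var2 == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var1 * var2)


relay_a = [1, 1, 1, 0, 0, 0]
relay_b = [0, 0, 0, 1, 1, 1]  # starts exactly when relay_a stops
relay_c = [1, 1, 1, 0, 0, 1]  # like relay_a, with one extra online hour

print(xor_distance(relay_a, relay_b), pearson(relay_a, relay_b))
print(xor_distance(relay_a, relay_c), pearson(relay_a, relay_c))
```

Under XOR, the anticorrelated pair is simply the most distant pair, so
a separate check for strongly negative Pearson values would be needed
to surface such relays.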

> Another practical problem is that it's cumbersome to learn the relay
> fingerprint of a given column.  I'm looking into JavaScript/HTML tricks
> that can show text when you hover over a region in the image.  Perhaps
> somebody knows more?

One way is to set an onmousemove handler that inserts text into a
preexisting element. For example (untested):




var OUTPUT_ELEM = document.getElementById("output");
/* img_element (the uptime image) and get_text_for_coordinates() are
   assumed to be defined elsewhere on the page. */

/* Get an event's coordinates relative to a given element. */
function elem_coords(event, elem) {
    var rect = elem.getBoundingClientRect();
    var scrollLeft, scrollTop;
    /* http://stackoverflow.com/a/872537 */
    if (typeof pageXOffset !== "undefined") {
        scrollLeft = pageXOffset;
        scrollTop = pageYOffset;
    } else if (document.documentElement !== undefined &&
               document.documentElement.clientHeight !== undefined) {
        scrollLeft = document.documentElement.scrollLeft;
        scrollTop = document.documentElement.scrollTop;
    } else {
        scrollLeft = document.body.scrollLeft;
        scrollTop = document.body.scrollTop;
    }
    var x = event.pageX - (scrollLeft + rect.left);
    var y = event.pageY - (scrollTop + rect.top);
    return { x: x, y: y };
}

function onmousemove_callback(event) {
    var c = elem_coords(event, img_element);
    OUTPUT_ELEM.textContent = get_text_for_coordinates(c.x, c.y);
}


Re: [tor-dev] Better relay uptime visualisation

2015-12-07 Thread nusenu


Philipp Winter:
> Red pixels are used to highlight suspiciously similar clusters.

Last year [1] there were a few huge groups.  Three of them are not
flagged (black lines, not red) even though they look like perfectly
matching groups?


[1] https://nymity.ch/sybilhunting/uptime-visualisation/slide_2014-12.html





Re: [tor-dev] Better relay uptime visualisation

2015-12-07 Thread David Fifield
On Tue, Dec 08, 2015 at 10:47:08AM +1100, Tim Wilson-Brown - teor wrote:
> 
> On 8 Dec 2015, at 10:43, Tom Ritter <t...@ritter.vg> wrote:
> > On 7 December 2015 at 13:51, Philipp Winter <p...@nymity.ch> wrote:
> > > I spent some time improving the existing relay uptime visualisation [0].
> > > Inspired by a research paper [1], the new algorithm uses single-linkage
> > > clustering with Pearson's correlation coefficient as distance function.
> > > The idea is that relays are grouped next to each other if their uptime
> > > (basically a binary sequence) is highly correlated.  Check out the
> > > following gallery.  It contains monthly relay uptime images, dating back
> > > to 2007:
> > > <https://nymity.ch/sybilhunting/uptime-visualisation/>
> > >
> > > If you aren't familiar with this type of visualisation: Every image
> > > shows the uptime of all Tor relays that were online in a given month.
> > > Every row is a consensus and every column is a relay.  White pixels mean
> > > that a relay was offline and black pixels mean that a relay was
> > > online.  Red pixels are used to highlight suspiciously similar clusters.
> >
> > That's really cool.  It seems to imply that the majority of the tor
> > network stops operating halfway through the month though... Do the
> > other tor graphs take into account hibernating relays?  For example, I
> > would expect the time-to-download graph would be somewhat affected:
> > https://metrics.torproject.org/torperf.html?graph=torperf=2015-10-01=2015-10-31=all=5mb
>
> Hibernating relays run from the start of their first period to gauge load.
> Then they start at a random time during the day/month, but early enough that
> they think they'll still use all their bandwidth.
>
> I wonder if we're seeing another phenomenon? (daily / monthly server restarts?)
> Or we could be seeing hibernation failing to work as intended.

Relays turn on or off all the time. Of all the descriptors seen in a
year, fewer than 10% are continuously running the whole time. The rest
either started at some time or stopped at some time or both. See an
example here for 2014:

https://people.torproject.org/~dcf/graphs/microdescs/microdescs-2014-short.png

All we're seeing is the distribution of the dates at which the subset
of relays that stopped during the month actually stopped, which seems
pretty uniform. I'll bet that if you look at those relays in the
previous month, they are running at the end of the month, not
hibernating.


Re: [tor-dev] Better relay uptime visualisation

2015-12-07 Thread grarpamp
Can one be generated covering each year, and maybe a five-year one?
And three other check sets, but sorted left to right by:
first online date
FP
AS

As to the actual FPs, all I can think of is including a second text file
with pixel-number-to-FP mappings. Or some "maps"-style online zooming.
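
One way such a mapping file could look, as a sketch only.  It assumes
one pixel column per relay and that the list of fingerprints is
available in the same left-to-right order as the image.  The function
names and the placeholder fingerprints are hypothetical.

```python
import math


def column_map(fingerprints):
    """Return 'column<TAB>fingerprint' lines, one per image column."""
    return "".join(
        "{}\t{}\n".format(col, fp) for col, fp in enumerate(fingerprints)
    )


def fingerprint_at(fingerprints, x):
    """Look up the relay drawn at horizontal pixel position x, if any."""
    col = math.floor(x)
    if not 0 <= col < len(fingerprints):
        return None
    return fingerprints[col]


# Placeholder labels standing in for real relay fingerprints.
fingerprints = ["FP_A", "FP_B", "FP_C"]
mapping = column_map(fingerprints)
print(mapping)
print(fingerprint_at(fingerprints, 1.7))
```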


Re: [tor-dev] Better relay uptime visualisation

2015-12-07 Thread Philipp Winter
On Mon, Dec 07, 2015 at 05:43:01PM -0600, Tom Ritter wrote:
> On 7 December 2015 at 13:51, Philipp Winter wrote:
> > I spent some time improving the existing relay uptime visualisation [0].
> > Inspired by a research paper [1], the new algorithm uses single-linkage
> > clustering with Pearson's correlation coefficient as distance function.
> > The idea is that relays are grouped next to each other if their uptime
> > (basically a binary sequence) is highly correlated.  Check out the
> > following gallery.  It contains monthly relay uptime images, dating back
> > to 2007:
> > 
> >
> > If you aren't familiar with this type of visualisation: Every image
> > shows the uptime of all Tor relays that were online in a given month.
> > Every row is a consensus and every column is a relay.  White pixels mean
> > that a relay was offline and black pixels mean that a relay was
> > online.  Red pixels are used to highlight suspiciously similar clusters.
> 
> That's really cool.  It seems to imply that the majority of the tor
> network stops operating halfway through the month though... Do the
> other tor graphs take into account hibernating relays?  For example, I
> would expect the time-to-download graph would be somewhat affected:
> https://metrics.torproject.org/torperf.html?graph=torperf=2015-10-01=2015-10-31=all=5mb

What I forgot to mention:  In all diagrams, I removed relays that were
always online, because an all-online uptime sequence isn't useful to
find Sybils.  In Nov 2015, for example, we had 10,984 unique relays by
fingerprint and 3,202 (29%) were always online, and are not shown in the
visualisation.
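
As an illustration of that filtering step (a sketch, not sybilhunter's
code), assuming uptimes are kept as a dict from fingerprint to a binary
sequence.  The function name and toy data are hypothetical; the last
line rechecks the Nov 2015 percentage from the figures above.

```python
def drop_always_online(uptimes):
    """Keep only relays that were offline in at least one consensus."""
    return {fp: seq for fp, seq in uptimes.items() if not all(seq)}


# Hypothetical toy data: relay1 never goes offline.
uptimes = {
    "relay1": [1, 1, 1, 1],
    "relay2": [1, 0, 1, 1],
    "relay3": [0, 0, 1, 1],
}
shown = drop_always_online(uptimes)

# Share of always-online relays in Nov 2015, from the numbers above.
share_hidden = round(100 * 3202 / 10984)
print(sorted(shown), share_hidden)
```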

Also, here are the steps to reproduce:

  wget https://collector.torproject.org/archive/relay-descriptors/consensuses/consensuses-2015-11.tar.xz
  tar xvJf consensuses-2015-11.tar.xz
  go get git.torproject.org/user/phw/sybilhunter.git
  sybilhunter -data consensuses-2015-11/ -uptime

Cheers,
Philipp