Re: [freenet-dev] The current store size stats attack and Pitch Black

2013-03-01 Thread Matthew Toseland
On Wednesday 27 Feb 2013 18:54:34 Matthew Toseland wrote:
> operhiem1's graphs of probed total datastore size have been attacked 
> recently by nodes returning bogus store sizes (in the multi-petabyte 
> range). This caused a sudden jump in store sizes on the total store size 
> graph. He excluded outliers, and the spike went away, but now it's come back.
> 
> The simplest explanation is that whoever is behind these nodes has hacked 
> them to return bogus datastore stats even when they are merely relaying a 
> probe request. Given that we use fairly high HTLs (30?) for probes, this 
> can affect enough traffic to have a big impact on the stats.
> 
> Total store size stats don't matter that much, but we need to use probe 
> stats for a couple of things that do:
> 1. Pitch Black prevention will require probing for the typical distance 
> between a node and its peers. Granted, on darknet it's harder for an 
> attacker to have a significant number of edges / nodes distributed across 
> the keyspace.
> 2. I would like to be able to test empirically whether a given change 
> works. Overall performance fluctuates too wildly based on too many factors, 
> so probing random nodes for a single statistic (e.g. the proportion of 
> requests rejected) seems the best way to sanity check a network-level 
> change. If the stats can be perverted this easily then we can't rely on 
> them, so empiricism doesn't work.
> 
> So how can we deal with this problem?
> 
> We can safely get stats from a randomly chosen target location, by routing 
> the first several hops of a probe request randomly and then routing towards 
> that location. The main problems with this are:
> - It gives the requester too much control over where the probe ends up. 
> Probes are supposed to be random.
> - A random location may not be a random node - which matters precisely for 
> Pitch Black countermeasures, since under attack the location distribution 
> is no longer uniform.
> 
> For empiricism I guess we probably just want a relatively small number of 
> trusted nodes which insert their stats regularly - canary nodes?
 
Turns out this is mostly a false alarm: operhiem1's stats weren't excluding 
outliers at all; the bogus sample simply fell out of the running average and 
then came back in. Also, the reject-probe stats shouldn't be significantly 
impacted by this, since they have a limited range.

However, it's still worth looking at this.
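
One cheap improvement on the graphing side: use an outlier-robust aggregate 
rather than a plain running average, so that neither glitches nor deliberate 
multi-petabyte answers can move the headline number much. A minimal sketch 
(the sample values are made up, and this is not operhiem1's actual pipeline):

from statistics import mean, median

# Hypothetical probed store sizes in GiB: mostly honest answers, plus
# two nodes claiming multi-petabyte datastores (1 PiB = 1048576 GiB).
samples = [20, 35, 50, 50, 80, 120, 3 * 1048576, 5 * 1048576]

def trimmed_mean(xs, frac):
    """Mean after discarding the largest and smallest frac of samples."""
    xs = sorted(xs)
    k = int(len(xs) * frac)
    return mean(xs[k:len(xs) - k]) if k else mean(xs)

print(f"plain mean:   {mean(samples):11.1f} GiB")    # dominated by the liars
print(f"median:       {median(samples):11.1f} GiB")  # barely moves
print(f"trimmed mean: {trimmed_mean(samples, 0.25):11.1f} GiB")

The median stays honest until more than half the sampled answers are bogus; 
a trimmed mean tolerates bogus answers up to the trimmed fraction on each side.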



Re: [freenet-dev] The current store size stats attack and Pitch Black

2013-02-28 Thread Matthew Toseland
On Wednesday 27 Feb 2013 19:40:49 Matthew Toseland wrote:
> Preliminary conclusions, talking to digger3:
> 
> There are 3 use cases.
> 
> 1) Empirical confirmation when we ship a build that changes something: 
> measure something specific to see whether it worked. *NOT* overall 
> performance, but low-level stuff that should show a big change.
> => We can use canary nodes for this, run by people we trust. Some will need 
> to run artificial configs, and they're probably not representative of the 
> network as a whole.
> => TODO: We should try to organise this explicitly, preferably before 
> trying the planned AIMD changes...
> 2) Pitch Black location distance detection.
> => Probably OK, because it's hard to get a lot of nodes in random places on 
> the keyspace on darknet.
> 3) General stats: datastore size, bandwidth, link length distributions, 
> etc. This stuff can and should affect development.
> => This is much harder. *Maybe* fetch from a random location, but even 
> there it's problematic?
> => We can, however, improve this significantly by discarding a larger 
> number of outliers.
> Given that probes have HTL 30, and assuming opennet (so malicious nodes 
> are randomly distributed across the network):
> 10 nodes could corrupt 5% of probes
> 21 nodes could corrupt 10% of probes
> 44 nodes could corrupt 20% of probes
> 
> Also note that it depends on what the stat is - the probe request stats 
> are a percentage from 0 to 100, so much less vulnerable than datastore 
> size, which can be *big*.
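
Those figures are consistent with treating each of the ~30 hops as an 
independent uniform sample from a network of roughly 6000 nodes - both of 
which are simplifying assumptions, since real routing is neither uniform nor 
independent. A quick back-of-the-envelope check:

# Chance that a probe relays through at least one of k hacked nodes out
# of n total, if each of htl hops were an independent uniform draw.
def corrupted_fraction(k: int, n: int, htl: int) -> float:
    return 1.0 - (1.0 - k / n) ** htl

N = 6000  # assumed network size; the real figure fluctuates
for k in (10, 21, 44):
    print(f"{k:2d} nodes: {corrupted_fraction(k, N, 30):.0%} of HTL-30 probes,"
          f" {corrupted_fraction(k, N, 5):.1%} of HTL-5 probes")
# -> 5% / 10% / 20% at HTL 30, and roughly a sixth of that at HTL 5

Corruption scales roughly linearly with hop count at these densities, which 
is much of the appeal of low-HTL probes.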
 
One proposal: use low-HTL probes from each node (possibly combined with 
central reporting, possibly not):

https://bugs.freenetproject.org/view.php?id=5643



Re: [freenet-dev] The current store size stats attack and Pitch Black

2013-02-28 Thread Michael Grube
I haven't had too much time to think about this. How would centralized
reporting work? Seems like a malicious person could have a bunch of nodes
join and simply report bad stats.

Just my feedback. I'll try to have some kind of decent response in the next
24 hours.


Re: [freenet-dev] The current store size stats attack and Pitch Black

2013-02-28 Thread Matthew Toseland
On Thursday 28 Feb 2013 14:34:58 Michael Grube wrote:
> I haven't had too much time to think about this. How would centralized
> reporting work? Seems like a malicious person could have a bunch of nodes
> join and simply report bad stats.

Right, sorry. What I meant was that we might have the canary nodes - nodes 
run by people we trust - report aggregated stats, or simply ask individual 
users. Obviously anything would need to be hard to spam.

There was a proposal on FMS to upload stats with a CAPTCHA...
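
For the trusted-canary half of that, "hard to spam" could be as simple as 
authenticating the reports. A purely illustrative sketch (the shared-secret 
MAC scheme is my assumption here, not an existing mechanism):

import hashlib, hmac, json

def sign_report(secret: bytes, stats: dict) -> dict:
    """A canary node MACs its stats with a secret shared with the collector."""
    payload = json.dumps(stats, sort_keys=True).encode()
    return {"stats": stats,
            "mac": hmac.new(secret, payload, hashlib.sha256).hexdigest()}

def verify_report(secret: bytes, report: dict) -> bool:
    """The collector discards any report whose MAC doesn't verify."""
    payload = json.dumps(report["stats"], sort_keys=True).encode()
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, report["mac"])

Per-canary keys would also let us drop a canary that starts lying without 
touching the rest.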
 

[freenet-dev] The current store size stats attack and Pitch Black

2013-02-27 Thread Matthew Toseland
operhiem1's graphs of probed total datastore size have been attacked recently 
by nodes returning bogus store sizes (in the multi-petabyte range). This caused 
a sudden jump in store sizes on the total store size graph. He excluded 
outliers, and the spike went away, but now it's come back.

The simplest explanation is that whoever is behind these nodes has hacked 
them to return bogus datastore stats even when they are merely relaying a 
probe request. Given that we use fairly high HTLs (30?) for probes, this can 
affect enough traffic to have a big impact on the stats.

Total store size stats don't matter that much, but we need to use probe stats 
for a couple of things that do:
1. Pitch Black prevention will require probing for the typical distance 
between a node and its peers; a sketch of such a distance probe follows this 
list. Granted, on darknet it's harder for an attacker to have a significant 
number of edges / nodes distributed across the keyspace.
2. I would like to be able to test empirically whether a given change works. 
Overall performance fluctuates too wildly based on too many factors, so probing 
random nodes for a single statistic (e.g. the proportion of requests rejected) 
seems the best way to sanity check a network-level change. If the stats can be 
perverted this easily then we can't rely on them, so empiricism doesn't work.
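
For point 1, the responding node only needs to report a summary of how far 
its peers sit from it on the keyspace. A sketch of what such a probe answer 
could compute (the data shapes are hypothetical, not the actual probe code):

from statistics import median

def keyspace_distance(a: float, b: float) -> float:
    """Distance between two locations on the circular [0, 1) keyspace."""
    d = abs(a - b)
    return min(d, 1.0 - d)

def typical_peer_distance(my_location: float, peer_locations: list) -> float:
    """A candidate probe answer: the median keyspace distance to our peers.
    A Pitch Black attack distorts the location distribution, so this
    statistic should look anomalous network-wide while under attack."""
    return median(keyspace_distance(my_location, p) for p in peer_locations)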

So how can we deal with this problem?

We can safely get stats from a randomly chosen target location, by routing 
the first several hops of a probe request randomly and then routing towards 
that location (sketched below). The main problems with this are:
- It gives the requester too much control over where the probe ends up. 
Probes are supposed to be random.
- A random location may not be a random node - which matters precisely for 
Pitch Black countermeasures, since under attack the location distribution is 
no longer uniform.
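
To make the random-then-directed routing concrete, here is a sketch (the 
Node structure and the greedy step are illustrative assumptions, not the 
actual probe implementation):

import random
from dataclasses import dataclass, field

@dataclass
class Node:  # hypothetical stand-in for a real node
    location: float  # position on the circular [0, 1) keyspace
    peers: list = field(default_factory=list)

def keyspace_distance(a: float, b: float) -> float:
    d = abs(a - b)
    return min(d, 1.0 - d)

def route_probe(node: Node, target: float, random_hops: int,
                htl: int = 30) -> Node:
    """Walk randomly for the first random_hops, then greedily towards target."""
    for hop in range(htl):
        if not node.peers:
            break
        if hop < random_hops:
            node = random.choice(node.peers)  # random phase: hide the requester
        else:
            best = min(node.peers,
                       key=lambda p: keyspace_distance(p.location, target))
            if (keyspace_distance(best.location, target)
                    >= keyspace_distance(node.location, target)):
                break  # no peer is closer to the target: answer from here
            node = best  # directed phase: greedy step towards the target
    return node

The directed phase is exactly what hands the requester the extra control 
that the first objection above complains about.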

For empiricism I guess we probably just want a relatively small number of 
trusted nodes which insert their stats regularly - canary nodes?



Re: [freenet-dev] The current store size stats attack and Pitch Black

2013-02-27 Thread Matthew Toseland
On Wednesday 27 Feb 2013 18:54:34 Matthew Toseland wrote:
> operhiem1's graphs of probed total datastore size have been attacked 
> recently by nodes returning bogus store sizes (in the multi-petabyte 
> range). This caused a sudden jump in store sizes on the total store size 
> graph. He excluded outliers, and the spike went away, but now it's come back.

http://127.0.0.1:/USK@pxtehd-TmfJwyNUAW2Clk4pwv7Nshyg21NNfXcqzFv4,LTjcTWqvsq3ju6pMGe9Cqb3scvQgECG81hRdgj5WO4s,AQACAAE/statistics/174/

http://asksteved.com/stats/

