Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-29 Thread Adam Wight
++the EFF for more ideas, they are actively doing great work on so-called
perfect forward secrecy.

There are simple things we could do to achieve a better balance between
privacy and sockpantsing, such as cryptolog [1], in which IP addresses are
hashed using a salt that changes every day.  In theory, nobody can reverse
the function to reveal the IP, but you can still correlate all of an
address's edits for the day, week, or whatever, making CheckUser possible.
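
To make the idea concrete, here is a minimal sketch of a rotating-salt hash. It is not cryptolog's actual code; the in-memory salt store and the names salt_for, anonymize_ip and forget_salts_before are assumptions made up for illustration.

```python
import hashlib
import hmac
import secrets
from datetime import date

# Hypothetical in-memory salt store: one random salt per calendar day.
_daily_salts = {}


def salt_for(day):
    """Return the salt for a day, creating a random one on first use."""
    if day not in _daily_salts:
        _daily_salts[day] = secrets.token_bytes(32)
    return _daily_salts[day]


def anonymize_ip(ip, day):
    """Keyed hash of the IP under that day's salt, shortened for display."""
    digest = hmac.new(salt_for(day), ip.encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


def forget_salts_before(day):
    """Discard old salts so that earlier identifiers can no longer be recomputed."""
    for old in [d for d in _daily_salts if d < day]:
        del _daily_salts[old]
```

Within one day the same address always maps to the same identifier, so its edits can be correlated; once a day's salt is discarded, that day's identifiers cannot be recomputed from a candidate IP. A longer rotation period (a week, say) would only change how the salt key is chosen.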

IP range blocking obviously needs to happen up-front, before the IP is
mangled.  I have no suggestions, but maybe browser and preferences
fingerprinting would be more effective anyway, since: tor.
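
A rough sketch of what "up-front" could mean: the raw address is compared against plainly stored ranges at submission time, and only afterwards hashed for display. The block list and the function name is_blocked are hypothetical, not MediaWiki's block code.

```python
import ipaddress

# Hypothetical block list, stored as plain CIDR ranges (not hashes).
BLOCKED_RANGES = [
    ipaddress.ip_network("203.0.113.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(ip):
    """Check the raw submitting address against every blocked range."""
    addr = ipaddress.ip_address(ip)
    # An IPv4 address is never "in" an IPv6 network and vice versa,
    # so mixed-version comparisons simply evaluate to False.
    return any(addr in net for net in BLOCKED_RANGES)
```

The check needs the real address only for the duration of the request; nothing about it requires the address to be stored or displayed in the clear.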

-Adam

[1] https://git.eff.org/?p=cryptolog.git;a=summary


On Fri, Jul 11, 2014 at 8:45 AM, Chris Steipp cste...@wikimedia.org wrote:

 On Friday, July 11, 2014, Daniel Kinzler dan...@brightbyte.de wrote:

  On 11.07.2014 17:19, Tyler Romeo wrote:
   Most likely, we would encrypt the IP with AES or something using a
   configuration-based secret key. That way checkusers can still reverse
   the hash back into normal IP addresses without having to store the
   mapping in the database.
 
  There are two problems with this, I think.
 
  1) No forward secrecy. If that key is ever leaked, all IPs become plain.
  And it will be, sooner or later. This would probably not be obvious, so
  this feature would instill a false sense of security.
 

 This is probably the biggest issue. Even if we HMAC it, it's trivial to
 brute force the entire IPv4 range (and, with intelligent assumptions about
 generation, most of the IPv6 range) in seconds, if the key was ever known.
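
The attack described above can be sketched in a few lines. This is only an illustration under assumed details: the pseudonym scheme (HMAC-SHA256 over the dotted address) and the names pseudonym and recover_ip are made up here, and the example searches a small range rather than all of IPv4.

```python
import hashlib
import hmac
import ipaddress


def pseudonym(key, ip):
    """Hypothetical public identifier: HMAC-SHA256 of the address string."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def recover_ip(key, target, network):
    """Try every address in `network` until one hashes to `target`."""
    for addr in ipaddress.ip_network(network):
        if pseudonym(key, str(addr)) == target:
            return str(addr)
    return None
```

With a leaked key, calling recover_ip(key, target, "0.0.0.0/0") is the same loop over 2^32 candidates; the keyed hash offers no protection once the key is known, because the input space is small enough to enumerate.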


 
  2) No range blocks. It's often quite useful to be able to block a range
  of IPs. This is an important tool in the fight against spammers, taking
  it away would be a problem.
 

 Range blocks, I imagine, would continue working the same way they do.
 Someone would have to identify the correct range (which is very difficult
 when administrators can't see IPs), but on submission, we have the IP
 address to check against the blocks. (Unless someone proposes to store
 block ranges as hashes, that would definitely get rid of range blocks).


 
  -- daniel
 
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-29 Thread Brian Wolff
but maybe browser and preferences
 fingerprinting would be more effective anyway, since: tor.

Probably not as effective as straight up blocking tor as we do now? :P
(Although seriously - I would love if we didn't block tor like we do
now. However you can't abuse the site with tor when you can't use tor
at all)

I'm somewhat doubtful about fingerprinting (Without doing any research
on it, so I may be out of tune here). We have millions of users,
mostly using commodity software. I'm doubtful we would be able to get
a fingerprint specific enough to uniquely identify a single user. Not
to mention that a sophisticated attacker would probably be able to
easily modify their fingerprint, especially if the fingerprint
criteria are open source [OTOH, a sophisticated attacker can get around
an IP block too].

The cryptolog approach - This has the property that there's a specific
time where all anon identifiers suddenly change (e.g. Midnight every
day in the setup cryptolog uses). Having an arbitrary point in time
where suddenly identifiers shift is probably an unwanted property.
(Although maybe it doesn't matter that much in practice? Someone who
actually deals with abuse on wiki would be better able to answer
that).

I suppose a related approach could be something like:
* If this is the first time the IP edits (recently), make a (pseudo?) random
  salt for that IP and throw it in memcached with an expiry time of a week.
* Hash the IP with the salt.
* Next time the IP edits, if the salt can be accessed from memcached, use
  that, and update the expiry time so that it expires a week from this edit;
  otherwise start over with a new salt.
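
The steps above could look roughly like the following. This is a sketch only: a small dict-backed class stands in for memcached, the clock is passed in explicitly so the expiry behaviour is easy to follow, and the names SlidingSaltStore and identifier are invented for illustration.

```python
import hashlib
import hmac
import secrets

WEEK = 7 * 24 * 3600  # expiry window in seconds


class SlidingSaltStore:
    """Stand-in for memcached: a per-IP salt whose expiry is refreshed on use."""

    def __init__(self, ttl=WEEK):
        self.ttl = ttl
        self._entries = {}  # ip -> (salt, expires_at)

    def salt_for(self, ip, now):
        entry = self._entries.get(ip)
        if entry is None or entry[1] <= now:
            salt = secrets.token_bytes(32)  # first edit, or the old salt expired
        else:
            salt = entry[0]                 # still valid: reuse it
        # Either way, the salt now expires one week after this edit.
        self._entries[ip] = (salt, now + self.ttl)
        return salt


def identifier(store, ip, now):
    """Hash the IP with its current salt and shorten the result for display."""
    salt = store.salt_for(ip, now)
    return hmac.new(salt, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]
```

An address that edits at least once a week keeps its identifier indefinitely, while a week of silence produces a fresh one. Note that in this sketch the store is keyed by the raw address, so the cache itself would hold IPs for as long as the entries live.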

This would have the property that if an IP is continuously editing,
their identifier doesn't change, but if they stop editing for a week,
then the identifier switches. Still has the downside that in order for
someone to effectively make a range block they would have to have
checkuser rights (Although perhaps one could make checkuser-lite right
that just exposes IPs of anons, which normal admins get access to).
Also it would be much harder for admins to notice patterns, such as if
a specific subnet seems to be dealing out similar abuse, or if a
specific IP has been blocked once a month for the last 2 years.

--bawolff


Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-18 Thread Ricordisamoa

The CC BY-SA license, used on most WMF projects, requires /attribution/.
Attribution for edits made by unregistered or logged-out users is done
exclusively by means of their IP address.
By clicking the 'Save' button, they agreed to release their edits under
CC BY-SA, and that their IP address would be the only form of
attribution of their changes to them.
While we can assume that there aren't any collisions between hashes of 
IP addresses, and we could change the attribution requirements for new 
edits, hiding or modifying the way IP addresses /of unregistered users 
who edited before that change/ are shown would be a substantial CC BY-SA 
infringement, as would be a change of registered users' names without 
their consent and without public logs of that change.



Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-18 Thread Brian Wolff
On 7/18/14, Ricordisamoa ricordisa...@openmailbox.org wrote:
 The CC BY-SA license, used on most WMF projects, requires /attribution/.
 Attribution for edits made by unregistered/unlogged users is done by the
 exclusive means of their IP address.
 By clicking the 'Save' button, they agreed to release their edits under
 CC BY-SA, and that their IP address would have been the only form of
 attribution of their changes to them.
 While we can assume that there aren't any collisions between hashes of
 IP addresses, and we could change the attribution requirements for new
 edits, hiding or modifying the way IP addresses /of unregistered users
 who edited before that change/ are shown would be a substantial CC BY-SA
 infringement, as would be a change of registered users' names without
 their consent and without public logs of that change.

Additionally, if we used the same hash function as for new edits, it
would make it pretty trivial to figure out what most of the hashes
are. I think it's safe to say we wouldn't modify old edits. After all,
you can still look at
https://en.wikipedia.org/wiki/Special:Contributions/216.143.215.xxx
despite us not using that scheme anymore.

--bawolff


Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-15 Thread Gryllida
On Fri, 11 Jul 2014, at 23:34, Gilles Dubuc wrote:
 IP addresses are closely guarded for registered users, why wouldn't
 anonymous users be identified by a hash of their IP address in order to
 protect their privacy as well?

While I don't horribly mind some changes in the direction you're writing, I 
think that:

1) Privacy is defined as "the state of being free from unsanctioned intrusion."
An IP, as a fundamental identifier, has as much to do with privacy as a car
number you see on a street. (Anyone can look up a name by car number in my
area, which I expect to be common.)

Firefox folks are, iirc, considering providing IP-based links in the new tab
with one of the next releases. These links would include local shops and
restaurants. I've seen some argue that such a decision goes against privacy,
but I think it's the wrong term.

2) There are other nicer things to enable for anonymous readers that would make 
their editing experience more efficient. Such things include enabling some 
preferences and features for these contributors, which may be useful to a group 
of people editing from one IP:

https://meta.wikimedia.org/wiki/Musings_about_unregistered_contributors#Examples

Gryllida.


Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-15 Thread Nick White
(a little off topic diversion)

On Tue, Jul 15, 2014 at 06:22:17PM +1000, Gryllida wrote: 
 An IP, as a fundamental identifier, has as much to do with privacy 
 as a car number you see on a street. (Anyone can look up a name by 
 car number, in my area, which I expect to be common.)

Actually numberplates were originally conceived as a privacy
enhancing technology. The first numberplates had people's names on
them, but that was considered too intrusive.


[Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Gilles Dubuc
This interesting bot showed up on hackernews today:
https://news.ycombinator.com/item?id=8018284

While in this instance the access to anonymous editors' IP addresses is
definitely useful in terms of identifying edits with probable conflict of
interest, it makes me wonder what the history is behind the fact that
anonymous editors are identified by their IP addresses on WMF-hosted wikis.

IP addresses are closely guarded for registered users; why wouldn't
anonymous users be identified by a hash of their IP address in order to
protect their privacy as well? The exact same functionality of being able
to see all edits by a given anonymous IP would still exist; the IP itself
just wouldn't be publicly available, and would be protected with the same
access rights as registered users'.

The use case that makes me think of that is someone living in a
totalitarian regime making a sensitive edit and forgetting that they're
logged out. Or just being unaware that being anonymous on the wiki doesn't
mean that their local authorities can't figure out who they are based on IP
address and time. Understanding that they're somewhat protected when logged
in and not when logged out requires a certain level of technical
understanding. The easy way out of this argument is to state that these
users should be using Tor or something similar. But I still wonder why we
have this double standard of protecting registered users' privacy with
regard to IP addresses and not applying the same for anonymous users, when
simple hashing would do the job.

Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Tyler Romeo
I agree that it’s a double standard, but looking at the bright side, it becomes 
a big encouragement to anonymous users to register and log in. The Account 
Creation Experience Team (or whoever the hell is in charge of that) can correct 
me, but I would imagine that we would see a big drop in registered accounts if 
IPs were hashed.

Also, it’d be really annoying to have hashes as usernames, so we’d have to 
think of an alternative scheme that makes things more readable.
-- 
Tyler Romeo
0x405D34A7C86B42DF


Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Gilles Dubuc

 I would imagine that we would see a big drop in registered accounts if IPs
 were hashed.


Why? Most casual web users don't even know what an IP address is, let alone
what their own address is. In fact the evolution of browsers tends to even
hide the URL. This is the sort of technical information that an
ever-shrinking portion of web users know about these days.

an alternative scheme that makes things more readable


A hash can take many forms. In fact it could be formatted just like an IP
address. Even if the hash format mixes letters and numbers, as long as the
length is similar, I don't see how IP addresses are superior in terms of
readability.
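
As a small illustration of that point, a digest can be rendered in dotted-quad form. This is a sketch under assumptions: the function name ip_shaped_hash and the salted SHA-256 construction are made up for the example and are not a proposal for the actual scheme.

```python
import hashlib
import ipaddress


def ip_shaped_hash(ip, salt):
    """Render the first four bytes of a salted hash in dotted-quad form."""
    digest = hashlib.sha256(salt + ip.encode("ascii")).digest()
    # IPv4Address accepts a 4-byte packed value and prints it as a.b.c.d
    return str(ipaddress.IPv4Address(digest[:4]))
```

Two caveats with this particular formatting: the output is indistinguishable from a real address, which could mislead readers, and keeping only 32 bits makes collisions between different editors likely once tens of thousands of distinct addresses have been hashed. A longer identifier that mixes letters and numbers avoids both.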



Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Risker
This is one of those perennial proposals that never quite seems to take
off; I can remember having some version of this discussion back in 2008,
and I know that some of our earliest edits show a partially obscured IP
address, not the whole thing. It might require Brion or Tim or someone else
of that length of experience to explain the original thinking.

Some of the pros of keeping the IP address as the username for
unregistered users:

   - Even in this day and age, there are plenty of people with stable IPs;
   they choose to edit as unregistered users for philosophical reasons, and
   their IP's edit history is essentially their own editing history
   - Especially on smaller projects (but also big ones), range blocks are
   usually calculated and applied by administrators, not checkusers/stewards.


Some of the cons of publishing the IP address as the username:

   - Privacy - IPv6 addresses in particular are including more and more
   very specific information that could be used to link RealLife Name with the
   edits. (My own ISP now gives enough information in many cases to narrow
   geolocation down to a one-block radius - a big change from 2 years ago when
   geolocation was about an 800 mile radius.)
   - Privacy - more and more jurisdictions consider a person's IP address
   to be private information.  Our page histories could be considered one
   gigantic privacy violation.
   - Increasingly dynamic IP addresses, often rotating within very large
   ranges that no longer link with any certainty to geolocation
   - Freaked out new users who didn't really get that their IP address was
   going to be very publicly displayed.


I'm pretty sure there are a whole pile more pros and cons that we can pull
out of the archives from various mailing lists, and I know that there have
periodically been discussions amongst developers and the rest of the
engineering team to try to come up with a better way - but like many
other interesting, good and even potentially necessary ideas, it's never
made it to the top of the priority heap.

Putting on my checkuser hat for just a minute...it's essential information
for having any chance at all of identifying multiple accounts or pattern
editing; however, the tables used by checkusers are non-public so
Checkusers continuing to have access to IP data should not be an issue.

Risker/Anne



Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Gilles Dubuc
 Even in this day and age, there are plenty of people with stable IPs


With hashing, a given IP would always give the same hash. So this
uniqueness property would remain for people with stable IPs.



Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Tyler Romeo
As a quick implementation note, we would not be using a hash for the IP address.

Most likely, we would encrypt the IP with AES or something using a 
configuration-based secret key. That way checkusers can still reverse the hash 
back into normal IP addresses without having to store the mapping in the 
database.

-- 
Tyler Romeo
0x405D34A7C86B42DF

From: Gilles Dubuc gil...@wikimedia.org
Reply: Wikimedia developers wikitech-l@lists.wikimedia.org
Date: July 11, 2014 at 10:59:55
To: Wikimedia developers wikitech-l@lists.wikimedia.org
Subject:  Re: [Wikitech-l] Anonymous editors  IP addresses  

With hashing, a given IP would always give the same hash. So this
uniqueness property would remain for people with stable IPs.
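
To illustrate the point about stable identifiers, here is a minimal sketch of how a deterministic keyed hash maps the same IP to the same label every time. The key, the label format, and the function name are assumptions made up for this example, not anything MediaWiki actually does.

```python
import hashlib
import hmac

# Hypothetical server-side secret; not a real MediaWiki configuration value.
SECRET_KEY = b"example-secret-key"


def anon_label(ip: str) -> str:
    """Return a stable pseudonym for an IP address (illustrative only)."""
    digest = hmac.new(SECRET_KEY, ip.encode("ascii"), hashlib.sha256).hexdigest()
    # Truncate the digest so the label stays short enough to display.
    return "Anonymous-" + digest[:12]
```

Because the function is deterministic, all edits from one stable IP still group under one label, which is exactly the uniqueness property described above.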
___
Wikitech-l mailing list
Wikitech-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikitech-l

Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Ole Palnatoke Andersen
To my knowledge, there are currently six of these Twitter bots
(Canada, Denmark, France, Sweden, UK, US). I have collected them in a
Twitter list: https://twitter.com/palnatoke/lists/wikiedit

Please speak up if you notice more, so I can include them in the list, too.


Regards,
Ole

On Fri, Jul 11, 2014 at 3:34 PM, Gilles Dubuc gil...@wikimedia.org wrote:
 This interesting bot showed up on hackernews today:
 https://news.ycombinator.com/item?id=8018284

 While in this instance the access to anonymous editors' IP addresses is
 definitely useful in terms of identifying edits with probable conflict of
 interest, it makes me wonder what the history is behind the fact that
 anonymous editors are identified by their IP addresses on WMF-hosted wikis.

 IP addresses are closely guarded for registered users, why wouldn't
 anonymous users be identified by a hash of their IP address in order to
 protect their privacy as well? The exact same functionality of being able
 to see all edits by a given anonymous IP would still exist, the IP itself
 just wouldn't be publicly available, protected with the same access rights
 as registered users'.

 The use case that makes me think of that is someone living in a
 totalitarian regime making a sensitive edit and forgetting that they're
 logged out. Or just being unaware that being anonymous on the wiki doesn't
 mean that their local authorities can't figure out who they are based on IP
 address and time. Understanding that they're somewhat protected when logged
 in and not when logged out requires a certain level of technical
 understanding. The easy way out of this argument is to state that these
 users should be using Tor or something similar. But I still wonder why we
 have this double standard of protecting registered users' privacy in
 regards to IP addresses and not applying the same for anonymous users, when
 simple hashing would do the job.



-- 
http://palnatoke.org * @palnatoke * +4522934588


Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Daniel Kinzler
On 11.07.2014 17:19, Tyler Romeo wrote:
 Most likely, we would encrypt the IP with AES or something using a
 configuration-based secret key. That way checkusers can still reverse the
 hash back into normal IP addresses without having to store the mapping in the
 database.

There are two problems with this, I think.

1) No forward secrecy. If that key is ever leaked, all IPs become plain. And
it will be, sooner or later. This would probably not be obvious, so this feature
would instill a false sense of security.

2) No range blocks. It's often quite useful to be able to block a range of IPs.
This is an important tool in the fight against spammers, taking it away would be
a problem.

-- daniel


Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Chris Steipp
On Friday, July 11, 2014, Daniel Kinzler dan...@brightbyte.de wrote:

 On 11.07.2014 17:19, Tyler Romeo wrote:
  Most likely, we would encrypt the IP with AES or something using a
  configuration-based secret key. That way checkusers can still reverse the
  hash back into normal IP addresses without having to store the mapping
  in the database.

 There are two problems with this, I think.

 1) No forward secrecy. If that key is ever leaked, all IPs become plain.
 And it will be, sooner or later. This would probably not be obvious, so
 this feature would instill a false sense of security.


This is probably the biggest issue. Even if we HMAC it, it's trivial to
brute-force the entire IPv4 range (and, with intelligent assumptions about
address generation, most of the IPv6 range) in seconds if the key is ever
known.
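
A rough sketch of that attack, assuming the stored value is an HMAC-SHA256 of the dotted-quad string (the key, function names, and encoding are all made up for illustration). It only walks a single /16 so that it runs quickly; the full IPv4 space is 2^32 candidates, and the loop is otherwise the same.

```python
import hashlib
import hmac
import ipaddress

# Hypothetical key that an attacker has obtained.
LEAKED_KEY = b"example-leaked-key"


def keyed_hash(ip: str, key: bytes) -> str:
    """HMAC-SHA256 of the textual IP address, hex encoded."""
    return hmac.new(key, ip.encode("ascii"), hashlib.sha256).hexdigest()


def recover_ip(target_hash: str, key: bytes, network: str):
    """Try every address in `network` until one hashes to `target_hash`."""
    for addr in ipaddress.ip_network(network):
        candidate = str(addr)
        if hmac.compare_digest(keyed_hash(candidate, key), target_hash):
            return candidate
    return None  # the address is not in the searched range
```

For example, `recover_ip(keyed_hash("10.20.123.45", LEAKED_KEY), LEAKED_KEY, "10.20.0.0/16")` gets the original address back after at most 65,536 attempts.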



 2) No range blocks. It's often quite useful to be able to block a range of
 IPs. This is an important tool in the fight against spammers, taking it
 away would be a problem.


Range blocks, I imagine, would continue working the same way they do now.
Someone would have to identify the correct range (which is very difficult
when administrators can't see IPs), but on submission we have the IP
address to check against the blocks. (Unless someone proposes storing
block ranges as hashes; that would definitely get rid of range blocks.)
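
As a sketch of what checking on submission could look like, assuming block ranges are kept as plain CIDR networks and only the displayed identifier is obscured (the ranges and function name below are illustrative, taken from the documentation address blocks):

```python
import ipaddress

# Hypothetical block list, stored as plain CIDR ranges rather than hashes.
BLOCKED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(ip: str) -> bool:
    """Check the submitting IP against the range blocks before it is obscured."""
    addr = ipaddress.ip_address(ip)
    # Membership is False when the address and network versions differ.
    return any(addr in net for net in BLOCKED_RANGES)
```

The check needs the raw address, which is why it has to run at submission time, before any hashing or encryption is applied.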




Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Brion Vibber
On Friday, July 11, 2014, Risker risker...@gmail.com wrote:

 This is one of those perennial proposals that never quite seems to take
 off; I can remember having some version of this discussion back in 2008,
 and I know that some of our earliest edits show a partially obscured IP
 address, not the whole thing. It might require Brion or Tim or someone else
 of that length of experience to explain the original thinking.


As I recall, UseModWiki (the Perl-based wiki software we used before
switching to a custom solution which evolved into MediaWiki) obscured the
last octet of the IP address, which still left you with enough information
in most cases to track down an ISP or school/business/govt institution. I
think UseMod also exposed the IP addresses of logged-in users, but the way
logins worked was very different and it was possible to set your name to
someone else's name or some such oddities...

I'm not sure offhand if there was explicit discussion of switching to not
obscuring the last octet in the PHP software/nascent MediaWiki... But this
was back in 2001 when the internet was a little younger and everybody was
spewing their IP addresses all over their email and newsgroup posts too.
Folks are a lot more paranoid about that today.


In general I favor migrating away from publicly exposing IP addresses, but
I'm not sure what exactly would be best... I kinda like the idea of an
anonymous-but-consistent proto-account that can be transformed into a
named login if desired, but it needs to be thought out in more detail to
resolve potential difficulties.

-- brion

Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Gilles Dubuc
 I kinda like the idea of an
 anonymous-but-consistent proto-account that can be transformed into a
 named login if desired, but it needs to be thought out in more detail to
 resolve potential difficulties.


One could automatically create a pseudo-account (Anonymous #12345) upon
first edit. That account would then be authenticated automatically
upon future edits coming from the same IP address. I don't think it should
be allowed to turn those pseudo-accounts into proper accounts, though;
they'd be marked as anonymous pseudo-accounts forever. Otherwise, having a
way to upgrade to a proper account while conserving edits which were
potentially written by other people could get hairy, especially from a
legal standpoint.

Maybe it's a cookie-based approach you had in mind, where we automatically
create an account tied to the user agent? That would mitigate the issue of
converting a pseudo-account that might have been shared between several
people to a proper account, but not completely get rid of it.



Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Happy Melon
On 11 July 2014 17:10, Gilles Dubuc gil...@wikimedia.org wrote:


 Maybe it's a cookie-based approach you had in mind? Where we automatically
 create an account tied to the user agent. That would mitigate the issue of
 converting a pseudo-account that might have been shared between several
 people to a proper account, but not completely get rid of it.


I'd have thought the chain of events "go to a library computer, do some
edits, decide to upgrade to a real account, do so, realise you've
inadvertently swept up all the unsalubrious penis vandalism that has been
made on that computer previously" would be unacceptably common.

--HM

Re: [Wikitech-l] Anonymous editors IP addresses

2014-07-11 Thread Matthew Flaschen

On 07/11/2014 11:45 AM, Brion Vibber wrote:

 As I recall, UseModWiki (the Perl-based wiki software we used before
 switching to a custom solution which evolved into MediaWiki) obscured the
 last octet of the IP address, which still left you with enough information
 in most cases to track down an ISP or school/business/govt institution. I
 think UseMod also exposed the IP addresses of logged-in users, but the way
 logins worked was very different and it was possible to set your name to
 someone else's name or some such oddities...


Yeah, the main benefit to the current setup (which probably doesn't
really require the last octet in most cases) is detecting casual abuse,
which includes (but is not limited to) both blatant vandalism and
conflict of interest edits.  (People have lunch breaks, and I don't
claim every edit from an organizational IP is a conflict of interest, but
many true COI edits have been caught this way).


If we look into something like proto-accounts or hashing or such, it 
would be good to try to maintain this benefit (do the lookup on the 
server, and expose who the IP block belongs to?), but I don't know if 
it's possible to have it both ways.
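
One way the server-side lookup might be sketched, assuming a table that maps networks to owner names (the table, the names in it, and the label format are invented for illustration; real data would have to come from WHOIS or a similar source):

```python
import ipaddress

# Hypothetical network-to-owner table; entries use documentation ranges.
NETWORK_OWNERS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example University",
    ipaddress.ip_network("198.51.100.0/24"): "Example Ministry",
}


def public_label(ip: str) -> str:
    """Return a label that names the network owner but not the address."""
    addr = ipaddress.ip_address(ip)
    for net, owner in NETWORK_OWNERS.items():
        if addr in net:
            return f"Anonymous editor ({owner})"
    return "Anonymous editor (unknown network)"
```

This keeps the "edit came from organization X" signal public while the address itself stays on the server, though it does not settle whether that is enough for abuse detection.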


Matt Flaschen

