Re: Request for feedback on crypto privacy protections of geolocation data
On 9/9/13 6:13 PM, Brian Smith wrote: I assume by "prevents people from tracking individual access points" means the following: Some people have a personal access point on them (e.g. in their phone). If somebody knows the SSID and MAC of this personal access point, then they could track this person's location by polling the database for that (SSID, MAC) pair. Tracking a person's movements by polling the database would not be useful because we would probably update the database infrequently (days or weeks). The location database would be generated offline from analysis of many raw measurements submitted by the stumbler app. The tracking scenario that might be viable is a tracker who knows someone's MAC address and current SSID and that person moves to a different city or state. The database delay wouldn't matter as much. The hash-of-hashes scheme tries to protect against that by requiring two neighboring APs. MAC addresses are 48 bits. SSIDs are often guessable or predictable. Therefore, using the H(MAC+SSID) instead of just the plain MAC+SSID is not buying you much in terms of privacy, IMO. Basically, if you are really trying to use this as a privacy mechanism then you should store the MAC+SSID according to best practices for storing passwords. For example, use PBKDF2 with a large number of iterations. Regardless of whether you use SHA1, SHA2, PBKDF2, or something else, I will still call whatever function you use H(x). But, I am not sure that switching to PBKDF2 even buys you much improved privacy protection. The primary motivation for hashing the MAC+SSID was to avoid uploading the SSID (which is considered private data in some European countries) while still using the SSID as a sort of weak protection against database pollution from malicious stumblers reporting spoofed MAC addresses. 
Even if our database were filled with junk MAC addresses, real clients would probably not see the same combination of MAC and SSID in the real world when they send a geolocation request to the server. Other layers of privacy protection include filtering out ad-hoc Wi-Fi networks; MAC addresses with vendor prefixes from mobile device manufacturers (e.g. Apple and HTC); SSIDs commonly associated with mobile devices (e.g. XXX's iPhone and Google's _nomap opt-out); and APs reported in multiple locations. I think that these things are much more important than the protection offered by H(x). My concern is that if you store the data on the server as H(x) then you will not be able to do the above filtering on the server unless H(x) is ineffective. That seems bad, because the server will be much easier to update to improve the filtering than the clients will be, AFAICT. Also, you will not be able to measure the effectiveness of the privacy protections on the server, which is also very bad. Very good points. We are currently filtering on the stumbler client side. Today, the server just receives mystery hashes with latitude and longitude. Given just MAC addresses, the server could still filter out ad-hoc networks; vendor prefixes for known mobile device manufacturers; and unrecognized vendor prefixes (because some mobile devices supposedly generate completely random MAC addresses). We would still need to rely on the stumbler to filter SSIDs. We can't upload SSIDs to the server because they are considered private data in some European countries (though MAC addresses, which are more unique, are apparently not considered private data, in a legal sense). We've compiled a list of about 70 SSID prefixes and suffixes we've seen from mobile devices (e.g. Android*, Verizon *, or *'s iPhone). Not all of these mobile devices use ad-hoc MAC addresses. 
Trivia: over a couple years of my own Wi-Fi stumbling/wardriving in three countries and six US states, I have recorded over 100K unique APs and only eight used Google's _nomap SSID opt-out suffix! chris ___ dev-security mailing list dev-security@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-security
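For illustration only, the SSID prefix/suffix filtering described above might look something like the following on the stumbler side. This is a minimal sketch: the pattern list contains only the examples named in this thread (the real list of about 70 entries is not reproduced here), and the function name `is_filtered_ssid` is hypothetical.

```python
from fnmatch import fnmatchcase

# Only the examples mentioned in the thread; the full list is not shown there.
MOBILE_SSID_PATTERNS = [
    "Android*",     # prefix seen on Android hotspots
    "Verizon *",    # prefix seen on carrier hotspots
    "*'s iPhone",   # suffix seen on iPhone personal hotspots
    "*_nomap",      # Google's opt-out suffix
]


def is_filtered_ssid(ssid: str) -> bool:
    """Return True if the SSID matches any pattern and should not be reported."""
    return any(fnmatchcase(ssid, pattern) for pattern in MOBILE_SSID_PATTERNS)


if __name__ == "__main__":
    for name in ["Alice's iPhone", "AndroidAP", "homenet_nomap", "CoffeeShopWiFi"]:
        print(name, "->", "drop" if is_filtered_ssid(name) else "keep")
```

Because the SSID never leaves the device in this design, a filter like this has to run in the stumbler before hashing, which is the limitation discussed above.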
Re: Request for feedback on crypto privacy protections of geolocation data
On 10/09/13 00:58 AM, Chris Peterson wrote: I'm looking for some feedback on crypto privacy protections for a geolocation research project I'm working on with the Mozilla Services team. If you have general questions or suggestions about the project, I'm happy to answer them, but I'd like to focus this thread on crypto. Our team is prototyping a crowd-sourced version of Google's Street View cars to correlate Wi-Fi access points and cell towers to GPS positions. Our primary motivation is to provide non-proprietary location services for Firefox OS devices. If I read this correctly, you want your client devices to figure out where they are, right? If that is the case, why not flip it around. Instead of trying to interpolate the existing data that is broadcast out there, why not write a protocol to broadcast the direct location from the wireless access point? A lot of these routers run Linux, and this is a place where people would be interested in running a new service. A wireless router that broadcasts its geolocation is not a privacy issue. There is no reason why it can't be turned on by default. But anything else requires a horrible mishmash of approaches. To obtain what? Something the wireless can easily tell you directly. iang
Re: Request for feedback on crypto privacy protections of geolocation data
On 10/09/13 06:05, Chris Peterson wrote: The device would scan for nearby APs and send the hash of each AP's MAC and SSID to our location server. Our server would not need to worry about the hash of hashes pairs because that would only be used for published data. The server would return an estimated latitude, longitude, and accuracy (radius in meters) of the device among the APs. BTW, how does the service figure out the lat/long of an AP? Do we do anything at all with signal strengths? Could we? Gerv
Re: Request for feedback on crypto privacy protections of geolocation data
On 10/09/13 08:04, Henri Sivonen wrote: 1) Android has a mechanism for detecting when it is connecting to a portable AP provided by another Android device. Can we use the same or a similar detection mechanism to detect portable APs and filter them out? I suspect actually connecting to the APs, as opposed to passively sniffing, might be on the project's big list of NoNos... But if we could, I agree we could find more useful data. location.) Are there any plans for a crowdsourced mechanism for blacklisting such APs? Not sure about crowdsourcing, but I believe they plan to use over-time algorithms for blocking regularly-moving APs. Gerv
Re: Request for feedback on crypto privacy protections of geolocation data
On 10/09/13 00:25, R. Jason Cronk wrote: Is the data aged? Not AFAIAA. What happens if I move? The raw database notes that you are now being detected in a new location. What happens then is up for debate. I'd argue that if your position was fixed for N months before, and it seems fixed again now, we should assume you have moved house and keep the point in the DB. APs which seem to move a lot, or move regularly, should be excluded. Does this give Mozilla the ability to historically track me if I move my device? Yes; this is why publishing the full raw stumbled data sets is sadly not going to be possible. Our published database would include two tables. The first table would map a random row id to metadata about an anonymous access point: Random1 == AP1.latitude, AP1.longitude, ... Random2 == AP2.latitude, AP2.longitude, ... I would be hesitant to use the word anonymous here. Latlong is easily combined with other publicly available databases that could identify individual addresses and thus individuals. Again, it comes down to granularity of the data. I'm not sure what threat you are seeing. Can you elaborate? This is just a list of latlongs which have a wireless access point. How can this information assist in identifying individuals or their locations? Gerv
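For readers following the two-table layout discussed in this thread (a random row id mapping to coordinates, plus a hash-of-hashes key built from two neighboring access points), here is a rough sketch of how it could fit together. It is not the project's implementation: the function names, the use of SHA-1 at both levels, the string serialization of MAC+SSID, and the random row id format are all assumptions made for illustration.

```python
import hashlib
import secrets


def ap_hash(mac: str, ssid: str) -> bytes:
    """H(MAC + SSID). The serialization (lowercased MAC, UTF-8) is an assumption."""
    return hashlib.sha1((mac.lower() + ssid).encode("utf-8")).digest()


def pair_key(h_first: bytes, h_second: bytes) -> str:
    """Hash of hashes, Hash(H1 + H2). Order matters: it selects the first AP's row."""
    return hashlib.sha1(h_first + h_second).hexdigest()


def build_published_tables(aps, neighbor_pairs):
    """aps maps a label to (mac, ssid, lat, lon); neighbor_pairs lists label pairs."""
    row_ids = {label: secrets.token_hex(8) for label in aps}
    # Table 1: random row id -> coordinates, with no MAC or SSID stored.
    table1 = {row_ids[label]: (ap[2], ap[3]) for label, ap in aps.items()}
    # Table 2: hash of two neighboring hash IDs -> row id in table 1.
    table2 = {}
    for a, b in neighbor_pairs:
        ha = ap_hash(aps[a][0], aps[a][1])
        hb = ap_hash(aps[b][0], aps[b][1])
        table2[pair_key(ha, hb)] = row_ids[a]  # Hash(H1 + H2) == Random1
        table2[pair_key(hb, ha)] = row_ids[b]  # Hash(H2 + H1) == Random2
    return table1, table2


def lookup(table1, table2, target, neighbor):
    """target and neighbor are (mac, ssid) tuples; returns (lat, lon) or None."""
    row_id = table2.get(pair_key(ap_hash(*target), ap_hash(*neighbor)))
    return table1.get(row_id) if row_id is not None else None
```

A lookup only succeeds when the caller supplies the MAC and SSID of both the target and one of its recorded neighbors. Table 1 on its own is still a list of coordinates, which is the granularity concern raised in the message above.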
Re: Request for feedback on crypto privacy protections of geolocation data
On 09/09/13 22:58, Chris Peterson wrote: Google's Location Service prevents people from tracking individual access points by requiring requests to include at least 2-3 access points that Google knows are near each other. This proves the requester is near the access points. Related question: it would be great if there were some way to lift this restriction, at least for the web service if not for the database, while preserving the necessary privacy protections. My family's house, which is in a rural area, has a single access point; I want my phone to know where it is immediately when I'm there. Not everywhere has lots of access points. One thought I had was to allow submission of the MCC/MNC (mobile network IDs) as proof that you were nearby. Unlike Google's Location Service, our server does not store MAC addresses or SSIDs. We identify access points by hash IDs, specifically SHA1(MAC+SSID). To query the location of an access point in the database, you must know both its MAC address and current SSID. I think that this is an excellent idea, for the reasons you articulate later in the thread. Gerv
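As a rough illustration of the hash ID described above, SHA1(MAC+SSID) could be computed along these lines. The thread does not say how the two values are serialized before hashing, so the MAC normalization (lowercase hex, no separators) and the UTF-8 encoding below are assumptions, and `ap_hash_id` is a hypothetical name.

```python
import hashlib


def ap_hash_id(mac: str, ssid: str) -> str:
    """Return a hex SHA-1 digest over a normalized MAC followed by the SSID."""
    # Assumed normalization so "AA:BB:..." and "aa-bb-..." map to the same ID.
    normalized_mac = mac.replace(":", "").replace("-", "").lower()
    return hashlib.sha1((normalized_mac + ssid).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(ap_hash_id("AA:BB:CC:DD:EE:FF", "HomeNet"))
    # Renaming the network yields an unrelated ID for the same hardware.
    print(ap_hash_id("AA:BB:CC:DD:EE:FF", "HomeNet-5G"))
```

Client and server have to agree on exactly the same normalization, since any difference in case or separators produces a different digest and therefore a failed lookup.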
Re: Request for feedback on crypto privacy protections of geolocation data
On 10/09/13 10:48, ianG wrote: If that is the case, why not flip it around. Instead of trying to interpolate the existing data that is broadcast out there, why not write a protocol to broadcast the direct location from the wireless access point? Because only a tiny, tiny fraction of devices would run it, and for most of those, the user wouldn't have correctly set the device's location anyway, and for some of them, they'd have set it and then moved. This is a boil the sea approach to the problem. Gerv
Re: Request for feedback on crypto privacy protections of geolocation data
On 9/10/13 3:46 AM, Gervase Markham wrote: I believe the plan is to have a database of raw findings, then a processed database used by the web service, and a published database which may have even more data reduction. Chris P: can we get permission to store the raw SSID in the _unpublished_ database? SSIDs are considered personal data in some European countries, so we can't collect them without AP owner opt-in. Opt-in is infeasible, so we can't even collect raw SSIDs. chris
Fwd: Is there any reason not to enable proxy-autologin by default?
Bug 646452 https://bugzilla.mozilla.org/show_bug.cgi?id=646452 We currently have a signon.autologin.proxy preference that is disabled by default. When enabled, if a proxy needs a password and that password is saved, Firefox will attempt to authenticate without prompting (and prompt if there is a failure). Is there any reason why we shouldn't enable it by default? I've mentioned some possible reasons in comment 2 https://bugzilla.mozilla.org/show_bug.cgi?id=646452#c2 but they don't seem that major to me. If we can't enable it by default, I've proposed a UI preference in Bug 910670 https://bugzilla.mozilla.org/show_bug.cgi?id=910670. Does that make sense? -Manish Goregaokar (:Manishearth)
Re: Request for feedback on crypto privacy protections of geolocation data
On 9/9/13 6:13 PM, Brian Smith wrote: On Mon, Sep 9, 2013 at 2:58 PM, Chris Peterson cpeter...@mozilla.com wrote: Google's Location Service prevents people from tracking individual access points by requiring requests to include at least 2-3 access points that Google knows are near each other. This proves the requester is near the access points. I assume by "prevents people from tracking individual access points" means the following: Some people have a personal access point on them (e.g. in their phone). If somebody knows the SSID and MAC of this personal access point, then they could track this person's location by polling the database for that (SSID, MAC) pair. Google tries to limit this type of abuse as much as practical while still providing a location service based on such crowdsourced data. Unlike Google's Location Service, our server does not store MAC addresses or SSIDs. We identify access points by hash IDs, specifically SHA1(MAC+SSID). To query the location of an access point in the database, you must know both its MAC address and current SSID. MAC addresses are 48 bits. SSIDs are often guessable or predictable. Therefore, using the H(MAC+SSID) instead of just the plain MAC+SSID is not buying you much in terms of privacy, IMO. Basically, if you are really trying to use this as a privacy mechanism then you should store the MAC+SSID according to best practices for storing passwords. For example, use PBKDF2 with a large number of iterations. Regardless of whether you use SHA1, SHA2, PBKDF2, or something else, I will still call whatever function you use H(x). But, I am not sure that switching to PBKDF2 even buys you much improved privacy protection. Switching to PBKDF2 can buy you a lot of protection from brute forcing the database (especially if it is published as specified). So I would say use PBKDF2 for H and not worry about concatenation vs XORing. 
H1 = Hash(AP1.MAC + AP1.SSID)
H2 = Hash(AP2.MAC + AP2.SSID)

Our private database's schema looks something like:

Hash(AP1.MAC + AP1.SSID) == AP1.latitude, AP1.longitude, ...
Hash(AP2.MAC + AP2.SSID) == AP2.latitude, AP2.longitude, ...

This is a pseudonymous data set, which can be problematic (I would reduce the resolution of each entry so that we can have some k-anonymity here). You could even cluster the locations.

Our published database would include two tables. The first table would map a random row id to metadata about an anonymous access point:

Random1 == AP1.latitude, AP1.longitude, ...
Random2 == AP2.latitude, AP2.longitude, ...

The second table's primary key would be a hash of hashes. It would map a hash of two neighboring access points' hash IDs to a row id of the first table. Something like:

Hash(H1 + H2) == Random1
Hash(H2 + H1) == Random2

Someone querying the published database would need to know the MAC addresses and current SSIDs of two neighboring access points to look up either's location.

If this is published as specified there are a couple of attacks I can think of now:

1. If you know, let's say, that org A has SSID Y and uses vendor Z (~18 bits of entropy per AP), you can now look up your table to determine all of the locations of that org (~2^36 hashes). Given current speeds of ASIC hashing (~US$1.5K for 63e9 H/s ~= 2^36 H/s), you could do this in about 1 sec (penalty for using video cards instead of ASICs: 100x, so about two mins). This assumes you are using plain SHA-1/SHA-256.

2. If you now have a set of common AP SSIDs (say fonera) and potential vendors for that system, you can test the closeness of any known location in your exposed list for ~2^32 potential MACs in less than one sec per known location. If you don't know the vendor, I think the number of tests would not be greater than 2^38 if you can discard MAC address space. This again can be checked in a few secs.

3. From table 2 you can cluster locations of closely located APs, and given table 1 you can actually know the exact AP locations from the clusters. You can then focus on the potential locations of interest.

So I think publishing table 2 as suggested is a bad idea. I would start with the service first (with 3 AP locations required for high-res data) and not the public location store. I would be OK with only 1 AP location for data retrieval if we significantly reduce the resolution of the reply to not less than one degree (at worst that is a delta of ~20 miles) and there is more than one AP in that area.

Camilo

If you know the MAC+SSID of person X's personal access point and the MAC+SSID of person Y's personal access point, then you can use this database to ask the question "are person X and person Y in the same location?" This seems bad. I see that you attempt to address this below. btw, should we use SHA-2 instead of SHA-1? There is no reason to use SHA-1 when you have SHA-2 available. However, as I indicated above, it isn't clear it is a good idea to be using
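To put rough numbers on the brute-force estimate and on the PBKDF2 suggestion in the message above, here is a back-of-the-envelope sketch. The 63e9 H/s rate and the 2^36 candidate count come from the message; the iteration count, the fixed salt, and the function names are hypothetical. Treating one PBKDF2 iteration as one plain hash understates the real cost, since each HMAC iteration involves more than one hash computation.

```python
import hashlib


def slow_ap_hash(mac: str, ssid: str, iterations: int = 100_000,
                 salt: bytes = b"hypothetical-fixed-salt") -> str:
    """PBKDF2-HMAC-SHA256 over MAC+SSID.

    The salt must be the same for every entry (an assumption of this sketch),
    otherwise a client could not recompute the key it needs for a lookup.
    """
    data = (mac.lower() + ssid).encode("utf-8")
    return hashlib.pbkdf2_hmac("sha256", data, salt, iterations).hex()


def brute_force_seconds(candidate_bits: int, hashes_per_second: float,
                        iterations: int = 1) -> float:
    """Time to try 2**candidate_bits inputs, counting one hash per iteration."""
    return (2 ** candidate_bits) * iterations / hashes_per_second


if __name__ == "__main__":
    asic_rate = 63e9  # hashes per second, figure quoted in the thread
    plain = brute_force_seconds(36, asic_rate)
    stretched = brute_force_seconds(36, asic_rate, iterations=100_000)
    print(f"plain hash: about {plain:.2f} s")
    print(f"100,000 iterations: about {stretched / 3600:.1f} hours")
```

Under these assumptions, key stretching moves the attack on a single two-AP guess space from roughly a second to roughly a day on the same hardware. It does nothing about the clustering attack in point 3, which does not require inverting any hash.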
Re: Request for feedback on crypto privacy protections of geolocation data
On 9/10/13 3:46 AM, Gervase Markham wrote: Related question: it would be great if there were some way to lift this restriction, at least for the web service if not for the database, while preserving the necessary privacy protections. My family's house, which is in a rural area, has a single access point; I want my phone to know where it is immediately when I'm there. Not everywhere has lots of access points. One thought I had was to allow submission of the MCC/MNC (mobile network IDs) as proof that you were nearby. Our location service (and stumbler) also collects cell data, so we can geolocate with Wi-Fi AP and/or cell data. chris
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 03:46 , Gervase Markham g...@mozilla.org wrote: On 10/09/13 10:48, ianG wrote: If that is the case, why not flip it around. Instead of trying to interpolate the existing data that is broadcast out there, why not write a protocol to broadcast the direct location from the wireless access point? Because only a tiny, tiny fraction of devices would run it, and for most of those, the user wouldn't have correctly set the device's location anyway, and for some of them, they'd have set it and then moved. This is a boil the sea approach to the problem. In addition, the CDMA cell networks actually have support for reporting the base station's lat/lon as part of the protocol. But in practice these are almost never set, as cell operators value ease of deployment and uniform configuration more than providing this extra service. In another anecdote, mobile operators cannot actually give you lists of all their cell towers and locations - we asked our partners. Thanks to a multitude of subsidiaries, subcontractors and partnerships, they often don't actually know how many cell towers they have and where they are. The same problem applies to the many Wi-Fi APs officially being operated by some large telco. So even where this is possible, it's not actually a practically relevant approach. Hanno
Re: Request for feedback on crypto privacy protections of geolocation data
On 9/10/13 11:53 AM, Stefan Arentz wrote: I wonder if it makes sense to ban specific MAC address ranges (vendors) from appearing in this database. For example I think it would be possible to detect specific chipsets as being mobile devices vs stationary access points. Our stumbler does some of this. MAC addresses encode whether a network is ad-hoc from another device or an infrastructure access point. Wireshark maintains a list [1] of known vendor OUIs (MAC address prefixes), so we can filter out, say, HTC and Motorola MAC addresses. Filtering Apple's MAC addresses is trickier if we choose to collect desktop and laptop MAC addresses. [1] https://anonsvn.wireshark.org/wireshark/trunk/manuf chris
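A minimal sketch of the MAC-based filtering mentioned above, under two assumptions not spelled out in the thread: that "ad-hoc" can be approximated by the locally administered bit of the first octet (IBSS networks use locally administered, randomly generated BSSIDs), and that vendor filtering is a lookup of the first three octets (the OUI) in a blocklist. The blocklist entries below are placeholders, not real vendor assignments, and all function names are hypothetical.

```python
# Placeholder OUIs for illustration only; a real list would be derived from a
# vendor registry such as the Wireshark manuf file cited above.
BLOCKED_OUIS = {"aabbcc", "ddeeff"}


def parse_mac(mac: str) -> bytes:
    """Parse 'aa:bb:cc:dd:ee:ff' or 'aa-bb-cc-dd-ee-ff' into 6 raw bytes."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return bytes.fromhex(hex_digits)


def is_locally_administered(mac: str) -> bool:
    """True if the universal/local bit (0x02 of the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def oui(mac: str) -> str:
    """Return the 24-bit vendor prefix as lowercase hex."""
    return parse_mac(mac)[:3].hex()


def should_filter(mac: str, blocked_ouis=BLOCKED_OUIS) -> bool:
    """Drop likely ad-hoc or mobile-vendor addresses before reporting."""
    return is_locally_administered(mac) or oui(mac) in blocked_ouis
```

Unlike the SSID filters, checks of this kind need only the MAC address, so they could run on the server as well as in the stumbler if MAC addresses were uploaded unhashed.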
Re: Request for feedback on crypto privacy protections of geolocation data
On 10.09.2013, at 03:39 , Gervase Markham g...@mozilla.org wrote: BTW, how does the service figure out the lat/long of an AP? Do we do anything at all with signal strengths? Could we? This is a bit off-topic for the security discussion. I suggest starting a new thread on dev-geolocation, if you want to know more about the technical details. The short answer is: Yes, but it's a lot more complicated than that :) Cheers :) Hanno
Re: Request for feedback on crypto privacy protections of geolocation data
On Sep 9, 2013, at 9:13 PM, Brian Smith br...@briansmith.org wrote: On Mon, Sep 9, 2013 at 2:58 PM, Chris Peterson cpeter...@mozilla.com wrote: Google's Location Service prevents people from tracking individual access points by requiring requests to include at least 2-3 access points that Google knows are near each other. This proves the requester is near the access points. I assume by "prevents people from tracking individual access points" means the following: Some people have a personal access point on them (e.g. in their phone). If somebody knows the SSID and MAC of this personal access point, then they could track this person's location by polling the database for that (SSID, MAC) pair. Google tries to limit this type of abuse as much as practical while still providing a location service based on such crowdsourced data. I wonder if it makes sense to ban specific MAC address ranges (vendors) from appearing in this database. For example I think it would be possible to detect specific chipsets as being mobile devices vs stationary access points. Also, when I tether my iPhone to my Mac, the Mac shows a different icon next to the network name. I think Android does the same. Maybe at a lower protocol level it is possible to see if an access point is a mobile device? Is that worth investigating? S.
Re: Request for feedback on crypto privacy protections of geolocation data
On 9/10/2013 3:46 AM, Gervase Markham wrote: On 10/09/13 00:25, R. Jason Cronk wrote: Does this give Mozilla the ability to historically track me if I move my device? Yes; this is why publishing the full raw stumbled data sets is sadly going to be not possible. Why would we have two locations for the same AP? In fact, given the schema Chris outlined (1:1 mapping H(Mac+SSID) = location) I don't see how we even could. -Dan Veditz
Re: Request for feedback on crypto privacy protections of geolocation data
On 9/10/2013 10:09 AM, Hanno Schlichting wrote: As of this moment, we filter out any AP that has been detected in two different places (where different means more than ~1km away from each other). This is a very conservative approach and we'll relax that later. What do you mean by filtered out? How are you tracking that it's now been seen in multiple locations? Given the simple storage schema at the top of the thread your choices seem limited to a) ignore the new location info, or b) throw out the old location info. a) means no one can ever move, and b) means the next time you see the new location that becomes the location... over and over as it moves around. That can't be right, so your database must be more complex. If you're storing more than originally implied that may have some impact on a security assessment. -Dan Veditz
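To make the "seen in two different places" rule concrete, here is one way it could be checked, assuming the raw store keeps every (latitude, longitude) observation per AP hash, which is exactly the extra state the message above is asking about. The haversine formula and the 1 km threshold are used as a sketch only; the project's actual method is not described in this thread, and the function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def seen_in_multiple_places(observations, threshold_km: float = 1.0) -> bool:
    """True if any two (lat, lon) observations are farther apart than the threshold."""
    return any(
        haversine_km(a[0], a[1], b[0], b[1]) > threshold_km
        for i, a in enumerate(observations)
        for b in observations[i + 1:]
    )
```

A single H(MAC+SSID) to location row cannot support this check, because the earlier position has to be retained to compare against. A per-AP observation history in the raw database is therefore implied, and that history is the data that could reveal movement.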
Re: Request for feedback on crypto privacy protections of geolocation data
On 11/09/13 03:27 AM, Daniel Veditz wrote: On 9/9/2013 11:21 PM, Chris Peterson wrote: The primary motivation for hashing the MAC+SSID was to avoid uploading the SSID (which is considered private data in some European countries) private means we can't even /look/ at it, rather than merely can't store it? The data regime might be simply put as this: you can't store a number suitable for tracking (or any derivative of it if that simply creates a new tracking number) unless you have a compelling business reason, and you have agreement. The EU data protection regime makes a very strong distinction about any private tracking information. It also goes to another level if you share that information with anyone. The initial simple answer is, don't go there. (I have no idea how Google finessed this issue, or even if they didn't.) I believe Europe also considers IP addresses private data, but they certainly don't ban HTTP connections from giving up the IP address to the server as part of a request. That's because IP addresses have to be given up to the server as part of TCP. A compelling case -- packets have to be returned somewhere. However, post-session storage is another issue, and data deletion practices should be in place. Logging is where it gets vexatious. iang