Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05
Hi Daniel,

Thanks again for this review. I think there are several categories of issues that you've raised, and I'd like to break them out.

The first is the easy category: that which has been raised before and considered. There is only one issue in that category: whether or not everything should run atop TLS. That issue need not be reconsidered, EXCEPT inasmuch as you have clarified the context (c2s versus s2s).

There is another category of that which really must change. Most (if not all) of what you mark as security is in this category. The downgrade attack that is possible in the current text also does not match my understanding of the consensus of the working group. That is, either a client uses HTTP or it uses HTTPS, but it may not try HTTPS first and then back off to HTTP. Do. Do not. There is no try.

The other issues break down into two groups, I think: authenticated sessions versus unauthenticated sessions. Because authentication is allowed, and because we do not specify a provisioning mechanism, likely there will be linkage between services. But we should also be mindful of what mitigations are truly available to us with regard to the other functions.

Eliot

On 1/30/15 3:13 AM, Daniel Kahn Gillmor wrote:

[...]
Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05
Hi Daniel and Eliot--

On Wed 2015-01-28 14:24:28 -0500, Daniel Migault wrote:

Our document describing Time Zone Data Distribution Service http://tools.ietf.org/html/draft-ietf-tzdist-service-05 [1] is close to being finalized and we would like to proceed to cross area review. We would greatly appreciate getting reviews by February 11.

[...]

[1] http://tools.ietf.org/html/draft-ietf-tzdist-service-05

Thanks for your work on this. This is the first time i've seen this draft; apologies for not looking at it earlier. I'm only subscribed to s...@ietf.org (and ietf-privacy, which is idle lately, but i've included here because some of my review touches on privacy), so this post might not make it through to tzd...@ietf.org -- feel free to forward it as needed.

I did a quick skim here with my security and privacy hats on, and have a few comments:

(privacy) Privacy Considerations section is missing
===================================================

There is *no* Privacy Considerations section in the draft at all. Please read RFC 6973 for guidance in conducting a privacy review of the protocol.

The act of querying these servers leaks something about the location of the person doing the query, at least, and may leak information about other locations that they're interested in. It's also possible that regular attempts to query this information will provide a linkable trail of the user, which could then be (mis)used without their knowledge or permission.

Here's an attempt at a quick analysis, though i haven't thought through the protocol in detail. I hope you'll do your own analysis, and you're welcome to take any of mine:

Implausibly: if the average user is interested in 5 timezones, and there are 774 known zones (find /usr/share/zoneinfo -type f | wc), and those interests were evenly distributed across the zones for every user, then the set of requests to update an individual's preferred timezones yields nearly 50 bits of entropy, far more than enough to distinguish every individual human from every other.
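The arithmetic behind that estimate can be checked with a short sketch. It assumes the figures given above (774 zones, 5 zones of interest per user, uniform and independent interest) plus a rough world population of 8 billion, which is an assumption of this sketch and not a number from the review. Treating the five picks as ordered and independent gives about 48 bits, which appears to be where "nearly 50 bits" comes from; counting unordered sets of five distinct zones gives about 41 bits. Either figure exceeds the roughly 33 bits needed to single out one person.

```python
import math

ZONES = 774                        # zone count quoted in the review
INTERESTS = 5                      # zones of interest per user, per the review
WORLD_POPULATION = 8_000_000_000   # rough assumption, not from the review

# Upper bound: five independent, uniform picks (ordered, repeats allowed).
ordered_bits = INTERESTS * math.log2(ZONES)

# Tighter count: an unordered set of five distinct zones.
unordered_bits = math.log2(math.comb(ZONES, INTERESTS))

# Bits needed to single out one person in the assumed population.
needed_bits = math.log2(WORLD_POPULATION)

print(f"ordered picks:        {ordered_bits:.1f} bits")
print(f"unordered set:        {unordered_bits:.1f} bits")
print(f"needed to identify:   {needed_bits:.1f} bits")
```

The uniform-interest assumption is the implausible part, as the review itself says; real interest is skewed, so typical users carry far fewer bits while users with unusual sets carry more.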
More plausibly: timezone interest is probably less than 5 for most people, and it isn't evenly distributed: the people who are interested in America/New_York are more likely to be interested in America/Los_Angeles than in Arctic/Longyearbyen. But anyone with an unusual set of TZs can probably be identified (perhaps uniquely) by any provider they talk to just by what TZs they ask for.

Since §4.1.4 says Clients SHOULD poll for changes, using an appropriate conditional request, at least once a day, a malicious provider intent on surveilling its users and with a mechanism to do so would have a daily checkin. I imagine this as some kind of background system service looking for updates.

The daily checkin could be used to track a user's movements around the network, if their device is not stationary. The time of checkin could also be used as a linking mechanism, if the machine polls with rigid regularity.

Are there strategies that someone interested in preserving their anonymity from a tzdata provider should take to remain anonymous? If so, what are they?

(privacy) HTTP pipelining?
==========================

Clients requesting multiple unusual TZs together are more easily identifiable to servers than clients who request only one. Should clients request all their interested TZs at once, or spread out their polling updates over time? HTTP pipelining is clearly more efficient; but what are the privacy implications if you have a system service that does this?

(privacy) HTTP Cookies?
=======================

The choice of HTTP transport also allows for servers to set cookies in clients -- should clients accept and re-transmit cookies from the server? What are the privacy implications?

(privacy) Tracking via ETag?
============================

Also, conditional requests seem to be encouraged via the use of an ETag header. It looks to me like a provider who wants to track its users individually (even in the absence of cookies) could use a cache of personalized ETags to do so.
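To make the ETag concern concrete, here is a minimal in-memory sketch of how a server could link requests through personalized ETags. It is an illustration under stated assumptions, not anything taken from the draft: the class `TrackingServer`, its `handle` method, and the log layout are all hypothetical, and no real HTTP is involved. It also leaves out re-minting an ETag when the underlying data changes.

```python
import secrets
import time


class TrackingServer:
    """Hypothetical server that hands each new client a unique ETag."""

    def __init__(self):
        self.etag_to_client = {}  # personalized ETag -> client ID
        self.log = []             # (client ID, TZ, ETag, time, IP address)

    def handle(self, tz, ip, if_none_match=None):
        """Return (status, etag) for a request for one time zone."""
        client_id = self.etag_to_client.get(if_none_match)
        if client_id is not None:
            # The ETag was minted for one client only, so presenting it
            # again links this request to the earlier ones.
            self.log.append((client_id, tz, if_none_match, time.time(), ip))
            return 304, if_none_match
        # First contact: mint a fresh client ID and a personalized ETag.
        client_id = secrets.token_hex(8)
        etag = '"' + secrets.token_hex(8) + '"'
        self.etag_to_client[etag] = client_id
        self.log.append((client_id, tz, etag, time.time(), ip))
        return 200, etag


server = TrackingServer()
# First poll from one network, with no If-None-Match header.
status1, etag = server.handle("America/New_York", "192.0.2.10")
# Next poll from a different network, replaying the ETag.
status2, _ = server.handle("America/New_York", "198.51.100.7",
                           if_none_match=etag)
# Both log entries carry the same client ID despite the changed address.
same_client = server.log[0][0] == server.log[1][0]
print(status1, status2, same_client)
```

The point of the sketch is that the client does nothing unusual: an ordinary conditional request is enough to carry the identifier, so the linkage survives a change of IP address.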
For example, the first time any client requests TZ X (with no If-None-Match request header), the server mints a new ETag Y, generates a new client ID Z, and records:

 * Client ID Z
 * the requested TZ X
 * the new ETag Y
 * the time of issuance
 * the IP address
 * any other interesting metadata

When a request comes in for TZ X with an If-None-Match: Y header, the server can link the two requests and record them both with client ID Z. When the underlying data for the TZ actually changes, the server mints a new ETag (for the new version of TZ X), but associates it with the same client ID Z.

(privacy) Logging policy for distribution servers?
==================================================

There is also no mention of recommended logging policy for the servers, no attempt to address data minimization or the risks to trackable users based on normal server logs.

(privacy) Authenticated clients are trackable
Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05
Thank you, Daniel, for your prompt review. The working group and draft editor shall address your comments prior to advancing this document. N.B., some discussion has already occurred in this area, even though it is not covered in the draft.

Eliot

On 1/30/15 3:13 AM, Daniel Kahn Gillmor wrote:

[...]
Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05
Just following up on my own email: the working group is advised to take privacy considerations quite seriously. Daniel referenced RFC 6973, and even though we have considered some of these issues, I refer you to an article in today's Wall Street Journal [1] that highlights how easy it is to correlate information with individuals, and how important a role location plays in that.

Eliot

[1] http://www.wsj.com/articles/metadata-can-expose-persons-identity-even-when-name-isnt-1422558349?mod=WSJ_hp_EditorsPicks

On 1/30/15 6:24 AM, Eliot Lear wrote:

[...]