Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

2015-01-31 Thread Eliot Lear
Hi Daniel,

Thanks again for this review.

I think there are several categories of issues that you've raised, and
I'd like to break them out.  The first is the easy category: that which
has been raised before and considered.  There is only one issue in that
category: whether or not everything should run atop TLS.  That
issue need not be reconsidered, EXCEPT inasmuch as you have clarified
the context (c2s versus s2s).

There is another category of that which really must change.  Most (if
not all) of what you mark as security is in this category.  The
downgrade attack that is possible in the current text also does not
match my understanding of the consensus of the working group.  That is,
either a client uses HTTP or it uses HTTPS, but it may not try HTTPS
first, and then back off to HTTP.  Do.  Do not.  There is no try.
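
To make the "no try" rule concrete, here is a minimal client-side sketch.
It is not text from the draft: the names fetch_configured and
DowngradeRefused, the injected transport callable, and the example URL are
all hypothetical.  The only point is that the configured scheme is used
as-is and an HTTPS failure is never retried over HTTP.

```python
from urllib.parse import urlsplit


class DowngradeRefused(Exception):
    """Raised instead of silently retrying an HTTPS request over HTTP."""


def fetch_configured(url, transport):
    """Fetch `url` using exactly the scheme it was configured with.

    `transport` is a hypothetical callable (url -> bytes) standing in for
    whatever HTTP library the client really uses.
    """
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme: {scheme!r}")
    try:
        return transport(url)
    except OSError as exc:
        if scheme == "https":
            # Fail closed: no second attempt with "http://".
            raise DowngradeRefused(url) from exc
        raise
```

A client configured with an https:// URL therefore either gets an answer
over TLS or gets an error; it never tries the plain-HTTP variant.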

The other issues break down into two groups, I think: authenticated
sessions versus unauthenticated sessions.  Because authentication is
allowed, and because we do not specify a provisioning mechanism, likely
there will be linkage between services.  But we should also be mindful
of what mitigations are truly available to us with regard to the other
functions.

Eliot

Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

2015-01-29 Thread Daniel Kahn Gillmor
Hi Daniel and Eliot--

On Wed 2015-01-28 14:24:28 -0500, Daniel Migault wrote:
 Our document describing Time Zone Data Distribution Service
 http://tools.ietf.org/html/draft-ietf-tzdist-service-05 [1] is close to
 being finalized and we would like to proceed to cross-area review.

 We would greatly appreciate receiving reviews by February 11.
 [...]
 [1] http://tools.ietf.org/html/draft-ietf-tzdist-service-05

Thanks for your work on this.  This is the first time i've seen this
draft; apologies for not looking at it earlier.

I'm only subscribed to s...@ietf.org (and ietf-privacy, which is idle
lately, but i've included here because some of my review touches on
privacy), so this post might not make it through to tzd...@ietf.org --
feel free to forward it as needed.

I did a quick skim here with my security and privacy hats on, and have a
few comments:

(privacy) Privacy Considerations section is missing
===

There is *no* Privacy Considerations section in the draft at all.
Please read RFC 6973 for guidance in conducting a privacy review of the
protocol.  The act of querying these servers leaks something about the
location of the person doing the query, at least, and may leak
information about other locations that they're interested in.  It's also
possible that regular attempts to query this information will provide a
linkable trail of the user, which could then be (mis)used without their
knowledge or permission.

Here's an attempt at a quick analysis, though i haven't thought through
the protocol in detail.  I hope you'll do your own analysis, and you're
welcome to take any of mine:

Implausibly: if the average user is interested in 5 timezones, and there
are 774 known zones (find /usr/share/zoneinfo -type f | wc), and those
interests were evenly distributed across the zones for every user, then
the set of requests to update an individual's preferred timezones yields
nearly 50 bits of entropy, far more than enough to distinguish every
individual human from each other.
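
As a rough check of that estimate, the arithmetic can be sketched as
follows.  The zone count and the five-zone figure come from the paragraph
above; the world population figure is my own approximate assumption.
Counting unordered sets of 5 zones gives about 41 bits, and counting
ordered selections gives about 48, which is closer to the "nearly 50"
quoted above.  Either way the result exceeds the roughly 33 bits needed to
single out one person.

```python
import math

ZONES = 774                        # zone count quoted in the message
INTERESTS = 5                      # zones per user (the "implausible" case)
WORLD_POPULATION = 7_300_000_000   # assumption: rough 2015 estimate

# Bits of identifying information if every choice is equally likely.
unordered_bits = math.log2(math.comb(ZONES, INTERESTS))  # sets of 5 zones
ordered_bits = math.log2(math.perm(ZONES, INTERESTS))    # ordered picks of 5
needed_bits = math.log2(WORLD_POPULATION)                # to single out one human

print(f"unordered: {unordered_bits:.1f} bits")
print(f"ordered:   {ordered_bits:.1f} bits")
print(f"needed:    {needed_bits:.1f} bits")
```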

More plausibly: timezone interest is probably less than 5 for most
people, and it isn't evenly distributed: the people who are interested
in Americas/New_York are more likely to be interested in
Americas/Los_Angeles than in Arctic/Longyearbyen.  But anyone with an
unusual set of TZs can probably be identified (perhaps uniquely) by any
provider they talk to just by what TZs they ask for.

Since §4.1.4 says Clients SHOULD poll for changes, using an appropriate
conditional request, at least once a day, a malicious provider intent
on surveilling its users and with a mechanism to do so would have a
daily checkin.  I imagine this as some kind of background system service
looking for updates.  The daily checkin could be used to track a user's
movements around the network, if their device is not stationary.  The
time of checkin could also be used as a linking mechanism, if the
machine polls with rigid regularity.
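
One possible way for a client to avoid rigid regularity is to randomise
the delay before each poll.  This is only a sketch of my own, not
something the draft specifies: next_poll_delay and the 12-hour lower bound
are hypothetical, chosen so that the client still polls at least once a
day.  It addresses timing-based linkage only, not linkage by IP address or
by the set of TZs requested.

```python
import random

DAY = 24 * 60 * 60      # seconds
MIN_DELAY = DAY // 2    # assumption: never poll more often than every 12 hours


def next_poll_delay(rng: random.Random) -> float:
    """Return a randomised delay in seconds, never longer than one day."""
    return rng.uniform(MIN_DELAY, DAY)


if __name__ == "__main__":
    rng = random.SystemRandom()  # unpredictable source for real use
    print(f"next poll in {next_poll_delay(rng) / 3600:.1f} hours")
```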

Are there strategies that someone interested in preserving their
anonymity from a tzdata provider should take to remain anonymous?  If
so, what are they?


(privacy) HTTP pipelining?
==

Clients requesting multiple unusual TZs together are more easily
identifiable to servers than clients who request only one.  Should
clients request all the TZs they are interested in at once, or spread out their
polling updates over time?  HTTP pipelining is clearly more efficient;
but what are the privacy implications if you have a system service that
does this?

(privacy) HTTP Cookies?
===

The choice of HTTP transport also allows for servers to set cookies in
clients -- should clients accept and re-transmit cookies from the
server?  What are the privacy implications?


(privacy) Tracking via ETag?
===

Also, conditional requests seem to be encouraged via the use of an ETag
header.  It looks to me like a provider who wants to track its users
individually (even in the absence of cookies) could use a cache of
personalized ETags to do so.

For example, the first time any client requests TZ X (with no
If-None-Match request header), the server mints a new ETag Y, generates
a new client ID Z, and records:

 * Client ID Z
 * the requested TZ X
 * the new ETag Y
 * the time of issuance
 * the IP address
 * any other interesting metadata

When a request comes in for TZ X with an If-None-Match: Y header, the
server can link the two requests and record them both with client ID Z.

When the underlying data for the TZ actually changes, the server mints a
new ETag (for the new version of TZ X), but associates it with the same
client ID Z.
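
The scheme above can be sketched in a few lines.  This is a toy model
under my own assumptions, not an implementation of the draft: the class
TrackingServer, its method names, and the integer data versions are all
hypothetical, and real ETag values would be quoted strings in HTTP
headers.

```python
import itertools
import secrets
import time
from dataclasses import dataclass, field


@dataclass
class TrackingServer:
    """Hypothetical provider that links requests via per-client ETags."""
    versions: dict = field(default_factory=dict)    # tz -> data version
    etag_owner: dict = field(default_factory=dict)  # etag -> (client_id, tz, version)
    log: list = field(default_factory=list)         # (client_id, tz, etag, time, ip)
    _ids: itertools.count = field(default_factory=itertools.count)

    def _mint(self, client_id, tz):
        # A fresh random ETag, remembered together with the client it was given to.
        etag = secrets.token_hex(8)
        self.etag_owner[etag] = (client_id, tz, self.versions.get(tz, 0))
        return etag

    def publish_update(self, tz):
        # The underlying TZ data changed.
        self.versions[tz] = self.versions.get(tz, 0) + 1

    def handle(self, tz, if_none_match=None, ip="192.0.2.1"):
        known = self.etag_owner.get(if_none_match)
        if known is None or known[1] != tz:
            # No usable If-None-Match: treat as a new client (ID Z, ETag Y).
            client_id = next(self._ids)
            status, etag = 200, self._mint(client_id, tz)
        else:
            client_id, _, version = known
            if version == self.versions.get(tz, 0):
                status, etag = 304, if_none_match      # unchanged, but linked
            else:
                status, etag = 200, self._mint(client_id, tz)  # new ETag, same ID
        self.log.append((client_id, tz, etag, time.time(), ip))
        return status, etag
```

Every request that echoes a previously issued ETag lands in the log under
the same client ID, even when the IP address changes and no cookie is
ever set.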


(privacy) Logging policy for distribution servers?
==

There is also no mention of recommended logging policy for the servers,
no attempt to address data minimization or the risks to trackable users
based on normal server logs.

(privacy) Authenticated clients are trackable

Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

2015-01-29 Thread Eliot Lear
Thank you Daniel for your prompt review.  The working group and draft
editor shall address your comments prior to advancing this document. 
N.B., some discussion has already occurred in this area, even though it
is not covered in the draft.

Eliot


Re: [ietf-privacy] [saag] Fwd: WGLC for draft-ietf-tzdist-service-05

2015-01-29 Thread Eliot Lear
Just following up on my own email, the working group is advised to take
privacy considerations quite seriously.  As Daniel referenced RFC 6973,
even though we considered some of these issues, I refer you to an
article in today's Wall Street Journal[1] that highlights how easy it is
to correlate information to individuals and how important a role
location plays in that.

Eliot
[1]
http://www.wsj.com/articles/metadata-can-expose-persons-identity-even-when-name-isnt-1422558349?mod=WSJ_hp_EditorsPicks
