Re: Firefox Hello new data collection

2016-04-05 Thread Joseph Lorenzo Hall
On Tue, Apr 5, 2016 at 12:05 PM, Chris Hofmann  wrote:
> This passage in https://www.mozilla.org/en-US/privacy/firefox-hello/ also
> would lead me to believe that the contents of my communication with another
> user (including shared URLs) are encrypted (and would be private).
>
> We've just invested heavily in making this point and trying to make that
> association that encryption means strong privacy and vice-versa.
> https://blog.mozilla.org/blog/2016/03/30/everyday-internet-users-can-stand-up-for-encryption-heres-how/

As an outside lurker on dev-platform but a big fan of Mozilla's data
stewardship folks, this is the core of the issue for me. WebRTC
conversations should be assumed to be highly private and any
exfiltration on the client without explicit opt-in seems very
dangerous. I'm not saying it should never be done but it should be
very very important and done very very carefully. I don't get the
sense that this data is that crucial to innovative Hello features. You
could opt-in folks to the study just-in-time using tab sharing. I know
that clobbers the UX but if it's that important I think you need to
take that hit given the sensitivity of real-time comms.

-- 
Joseph Lorenzo Hall
Chief Technologist, Center for Democracy & Technology [https://www.cdt.org]
e: j...@cdt.org, p: 202.407.8825, pgp: https://josephhall.org/gpg-key
Fingerprint: 3CA2 8D7B 9F6D DBD3 4B10  1607 5F86 6987 40A9 A871

CDT's annual dinner, Tech Prom, is April 6, 2016! https://cdt.org/annual-dinner
___
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform


Re: Firefox Hello new data collection

2016-04-05 Thread adam
I think this should be abandoned in favour of an optional survey for Hello Users

05.04.2016, 17:06, "Chris Hofmann" :
> On Mon, Apr 4, 2016 at 3:01 AM, Romain Testard  wrote:
>
>>  Firefox Hello has its own privacy notice (details here
>>  ).
>
> It's unclear to me reading the follow through link to the
> TokBox Privacy Policy. -> https://tokbox.com/support/privacy-policy
>
> Does TokBox already have access to the contents of the messages and URLs
> that might have been shared?
>
> the tokbox policy says:
>
> The types of information collected include your name, e-mail address, and
> any other data you actively choose to provide.
>
> and leaves it vague about the definition of "other data you actively
> provide." Does that include shared URLs and message content?
>
> This passage in https://www.mozilla.org/en-US/privacy/firefox-hello/ also
> would lead me to believe that the contents of my communication with another
> user (including shared URLs) are encrypted (and would be private).
>
> We've just invested heavily in making this point and trying to make that
> association that encryption means strong privacy and vice-versa.
> https://blog.mozilla.org/blog/2016/03/30/everyday-internet-users-can-stand-up-for-encryption-heres-how/
>
> How are we going to address the possible take away that some will have that
> we've just created a backdoor for parts (shared urls that are part of the
> message content) of the hello encrypted message channel if we turn this
> change on?


Re: Firefox Hello new data collection

2016-04-05 Thread Chris Hofmann
On Mon, Apr 4, 2016 at 3:01 AM, Romain Testard  wrote:

>
> Firefox Hello has its own privacy notice (details here
> ).
>
>
It's unclear to me reading the follow through link to the
TokBox Privacy Policy.  ->  https://tokbox.com/support/privacy-policy

Does TokBox already have access to the contents of the messages and URLs
that might have been shared?

The TokBox policy says:

The types of information collected include your name, e-mail address, and
any other data you actively choose to provide.

and leaves it vague about the definition of "other data you actively
provide."   Does that include shared URLs and message content?

This passage in https://www.mozilla.org/en-US/privacy/firefox-hello/ also
would lead me to believe that the contents of my communication with another
user (including shared URLs) are encrypted (and would be private).

We've just invested heavily in making this point and trying to make that
association that encryption means strong privacy and vice-versa.
https://blog.mozilla.org/blog/2016/03/30/everyday-internet-users-can-stand-up-for-encryption-heres-how/

How are we going to address the possible takeaway some will have, that
we've just created a backdoor for parts (shared URLs that are part of the
message content) of the Hello encrypted message channel if we turn this
change on?


Re: Firefox Hello new data collection

2016-04-05 Thread Randell Jesup
>The privacy review bug is
>https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
>More details added below.
>> On 04/04/2016 10:01, Romain Testard wrote:
>>
>>> We would use a whitelist client-side to only collect domains that are
>>> part of the top 2000 domains (Alexa list of top domains). This
>>> prevents
>>> personal identification based on obscure domain usage.
>>>
>>
>> Mathematically, the combination of a set of (popular) domains shared could
>> still be uniquely identifying, especially as, AIUI, you will get the counts
>> of each domain and in what sequence they were visited / which ones were
>> visited in which session. It all depends on the number of unique users and
>> the number of domains they visit / share (not clear: see above). Because
>> the total number of Hello users compared with the number of Firefox users
>> is quite low, this still seems somewhat concerning to me. Have you tried to
>> remedy this in any way?
>>
>
>We are aggregating domain names, and are not storing session histories.
>These are submitted at the end of the session, so exact timestamps of any
>visit are not included.

There's been a bunch of surprises over the last few years where
"anonymized" data turned out to be de-anonymizable.  This is the sort of
data that feels like it could lead to surprises.  I think this would
need more looks by someone who actually understands that and where those
risks come from (not me).

There are added risks if you include the case of someone using our data
*and* data from one or more 3rd-party sites, and that's not easy to
reason about, which is why this needs careful consideration.
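To make the concern above concrete, here is a minimal sketch (not anything from the Hello code) of how one might check whether the bare set of whitelisted domains in a session is already distinguishing. The function name and the sample sessions are made up for illustration; real data would have to be checked against the actual user population.

```python
from collections import Counter


def unique_session_fraction(sessions):
    """Return the fraction of sessions whose domain set matches no other session."""
    # Order and visit counts are deliberately ignored: even the bare *set* of
    # whitelisted domains can act as a fingerprint when few users share it.
    fingerprints = [frozenset(domains) for domains in sessions]
    if not fingerprints:
        return 0.0
    occurrences = Counter(fingerprints)
    unique = sum(1 for fp in fingerprints if occurrences[fp] == 1)
    return unique / len(fingerprints)


# Made-up sessions, for illustration only (not real data).
example_sessions = [
    ["steampowered.com", "steamcommunity.com"],
    ["steamcommunity.com", "steampowered.com"],
    ["aa.com", "expedia.com", "tripadvisor.com"],
    ["expedia.com"],
]
print(unique_session_fraction(example_sessions))
```

With a small user base, a large share of sessions landing in the "unique" bucket would suggest the whitelist alone does not prevent singling people out, especially once a third party can match the same domain sets from its own logs.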

>> Finally, I am surprised that you're sharing this 2 weeks before we're
>> releasing Firefox 46. Hasn't this been tested and verified on Nightly
>> and/or other channels? Why was no privacy update made at/before that time?
>>
>
>We are shipping Hello through Go Faster. The Go Faster process allows us to
>uplift directly to Beta 46 directly since we're a system add-on
>(development was done about 2 weeks ago).
>Firefox Hello has its own privacy notice (details here
>).

Since the collection is not enabled currently anywhere, how known-stable
is it for beta?  Having the code in a disabled state safely is one
thing; having it known to be safe to turn on is another.

-- 
Randell Jesup, Mozilla Corp
remove "news" for personal email


Re: Firefox Hello new data collection

2016-04-04 Thread Chris Hofmann
On Mon, Apr 4, 2016 at 1:35 PM, Ian Bicking  wrote:

> On Mon, Apr 4, 2016 at 10:44 AM, Gijs Kruitbosch  >
> wrote:
>
>   I put some comments about data bias against international users
and the .edu domains and possible data leaks that could result directly in
the bug.


> >
> > We looked into this approach originally although we found that we'd lose
> a
> >>>
> >> level of granularity that can have an importance. We may find that Hello
> >> gets used a lot with a specific Website for a specific reason and using
> >> client side categories would prevent us from learning this.
> >>
> >
> > This was explicitly not in your original motivation, so you're moving the
> > goalposts here. If the goal is about separate categories or separate
> sites
> > then those are pretty distinct goals that require different approaches.
> If
> > the real point is "we have no idea, so we figured we'd just get the data
> > and then go from there", why not be upfront about it?
>
>
> We are looking for clues about how people are using Hello, and using
> domains as one way to understand this.  So yes, it is exploratory, and we
> are looking for insight we have not yet received, rather than a more binary
> signal such as do people use Hello for shopping or not.
>
> For example, two domains that are on the whitelist: steampowered.com and
> steamcommunity.com – these would both typically be categorized as
> "gaming",
> but they represent very different use cases (store vs. discussion).  Or
> aa.com, tripadvisor.com, and expedia.com are all travel sites, but
> represent different (but overlapping) use cases.
>
> But in that case, yeah, why not consider a survey or something less
> > intrusive, like asking people explicitly what type of site they were
> using,
> > or asking if Mozilla can use the domain in question ?
>
>
> Asking people what site they were using seems challenging.


Yeah, but if the intent is to gather insight into how people are using, or
could use, Hello for shared browsing, that seems an indirect and possibly
error-prone set of data to gather without the context of what users are
trying to accomplish.


> Do we suggest
> types?  Will people acknowledge the full path of sites they used?  How much
> do we have to annoy people with questions in order to get a large enough
> sample?  Will it ever be a representative sample?  Even if we do work to
> address these, how can we tell if we have done so if we don't have real
> usage data to compare to?
>

Simply ask users what type of task they are trying to accomplish.

Maybe give them a set of hints about the standard things that people do
on-line.

With that you could start to understand some interesting use cases that
Hello could be made to address in a better way, or some possible use cases
that might be in conflict and may need multiple approaches to support
(e.g. your playing games against each other vs. shopping together use cases).

You really don't want to know particular sites people visit, you want to
understand what it is they are trying to accomplish, right?

>
> As fully implemented, including the backend collection which further
> aggregates the information, I believe we are not collecting private or
> personally revealing information.  If we ask users to opt-in to collection
> I don't think we can accurately explain to users the limits of what we are
> collecting (especially at that moment when we are interrupting what they
> are doing), and I think it will make it appear that we are trying to
> collect personal information that we are not.
>

Yes, it's easy to understand and believe that it's our intention not to
collect personal information.

But that intention avoids the underlying question of whether browsing
history is personal data, or is considered by our users to be such.

As it's been said in the past,  the road to leaking user personal data is
paved with good intentions.

The survey approach gets us a more direct path to the data we need for
designing a better co-browsing experience or experiences, since we would
never really design a feature against a particular site or URL, but are
looking for a common behavior pattern that lends itself to streamlining
across many sites and applications.


>
>
> >
> > Also Alexa
> >> website categories are far from perfect which would add another level of
> >> complexity to understand the collected data.
> >>
> >
> > At no point did I say I expected you to use their categorization,
> whatever
> > that is. Categorize as you see fit, rather than as Alexa does it.
> >
> > Conversely, if their categorization is questionable, then your scrubbing
> > of the Adult category sounds like it might need auditing? Also, why not
> > other categories like "Banking" or "Medical" (NB: no idea what
> > categorization Alexa employs, but these seem like categories that ought
> to
> > be scrubbed, too)?
> >
>
> For filtering out adult sites we used a well-maintained blacklist.  Alexa
> categorization 

Re: Firefox Hello new data collection

2016-04-04 Thread Ian Bicking
On Mon, Apr 4, 2016 at 10:44 AM, Gijs Kruitbosch 
wrote:

> On 04/04/2016 11:01, Romain Testard wrote:
>
>> The privacy review bug is
>> https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
>> More details added below.
>>
>
> See response at the bottom.
>
> On Mon, Apr 4, 2016 at 11:23 AM, Gijs Kruitbosch > >
>> wrote:
>>
>>> On 04/04/2016 10:01, Romain Testard wrote:
>>>
>>>> We would use a whitelist client-side to only collect domains that are
>>>> part of the top 2000 domains (Alexa list of top domains). This prevents
>>>> personal identification based on obscure domain usage.


>>> Mathematically, the combination of a set of (popular) domains shared
>>> could
>>> still be uniquely identifying, especially as, AIUI, you will get the
>>> counts
>>> of each domain and in what sequence they were visited / which ones were
>>> visited in which session. It all depends on the number of unique users
>>> and
>>> the number of domains they visit / share (not clear: see above). Because
>>> the total number of Hello users compared with the number of Firefox users
>>> is quite low, this still seems somewhat concerning to me. Have you tried
>>> to
>>> remedy this in any way?
>>>
>>>
>> We are aggregating domain names, and are not storing session histories.
>> These are submitted at the end of the session, so exact timestamps of any
>> visit are not included.
>>
>
> But both Firefox and Hello sessions are commonly relatively short (<1d)
> and numerous. That means lots of data points, which will likely be enough
> to uniquely identify people even without exact timestamps of their visits.
> (FWIW, from a technical perspective, there is no reason why the submission
> time implies ("so") that exact timestamps of visits are not included.)


Yes, if an attacker has access to cross-domain tracking for several sites
that a user visits, and that attacker can access the reporting in transit,
it may be possible to correlate, thus finding the rest of the whitelisted
history, and some associated Firefox Hello data.  But that's only in the
case of an attack.  The actual data sent to the logging pipeline is
immediately pulled out of a session list and submitted as individual items,
and all other data (e.g., IP address) is left out of this logging.
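As a rough illustration of the scheme described above (client-side whitelist, then items submitted individually with no session or IP information attached), here is a sketch. It is not the Hello implementation: the function names, the three-entry whitelist, and the suffix-matching rule for subdomains are all assumptions made for this example.

```python
from collections import Counter
from urllib.parse import urlsplit

# Stand-in for the top-2000 whitelist; these entries are only examples.
WHITELIST = frozenset({"steampowered.com", "tripadvisor.com", "expedia.com"})


def whitelisted_domain(url, whitelist=WHITELIST):
    """Return the whitelisted domain that url belongs to, or None."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Assumption: subdomains count towards their whitelisted parent domain.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in whitelist:
            return candidate
    return None


def session_to_items(urls, whitelist=WHITELIST):
    """Reduce one session's URLs to per-domain counts with no session linkage."""
    counts = Counter()
    for url in urls:
        domain = whitelisted_domain(url, whitelist)
        if domain is not None:
            counts[domain] += 1
    # Each dict would be submitted on its own: no timestamp, no session id.
    return [{"domain": d, "count": n} for d, n in sorted(counts.items())]


items = session_to_items([
    "https://store.steampowered.com/app/1",
    "https://www.tripadvisor.com/Hotels",
    "https://my-obscure-blog.example/post",
    "https://store.steampowered.com/app/2",
])
print(items)
```

Note that dropping non-whitelisted hosts and the session identifier addresses the "obscure domain" case only; it does not by itself answer the correlation question raised earlier in the thread.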


>
> We looked into this approach originally although we found that we'd lose a
>>>
>> level of granularity that can have an importance. We may find that Hello
>> gets used a lot with a specific Website for a specific reason and using
>> client side categories would prevent us from learning this.
>>
>
> This was explicitly not in your original motivation, so you're moving the
> goalposts here. If the goal is about separate categories or separate sites
> then those are pretty distinct goals that require different approaches. If
> the real point is "we have no idea, so we figured we'd just get the data
> and then go from there", why not be upfront about it?


We are looking for clues about how people are using Hello, and using
domains as one way to understand this.  So yes, it is exploratory, and we
are looking for insight we have not yet received, rather than a more binary
signal such as do people use Hello for shopping or not.

For example, two domains that are on the whitelist: steampowered.com and
steamcommunity.com – these would both typically be categorized as "gaming",
but they represent very different use cases (store vs. discussion).  Or
aa.com, tripadvisor.com, and expedia.com are all travel sites, but
represent different (but overlapping) use cases.

> But in that case, yeah, why not consider a survey or something less
> intrusive, like asking people explicitly what type of site they were using,
> or asking if Mozilla can use the domain in question ?


Asking people what site they were using seems challenging.  Do we suggest
types?  Will people acknowledge the full path of sites they used?  How much
do we have to annoy people with questions in order to get a large enough
sample?  Will it ever be a representative sample?  Even if we do work to
address these, how can we tell if we have done so if we don't have real
usage data to compare to?

As fully implemented, including the backend collection which further
aggregates the information, I believe we are not collecting private or
personally revealing information.  If we ask users to opt-in to collection
I don't think we can accurately explain to users the limits of what we are
collecting (especially at that moment when we are interrupting what they
are doing), and I think it will make it appear that we are trying to
collect personal information that we are not.


>
> Also Alexa
>> website categories are far from perfect which would add another level of
>> complexity to understand the collected data.
>>
>
> At no point did I say I expected you to use their categorization, whatever
> that is. Categorize as you see fit, rather than as Alexa does it.
>

Re: Firefox Hello new data collection

2016-04-04 Thread a...@imgland.xyz
However it is concerning to have code in an Open source project that is

1. Mostly undocumented
2. Could be confusing to privacy-aware users
3. Harvests data without proper privacy notices
4. Has been added prematurely without proper documentation and serves no
   purpose like say, dom.webcomponents.enabled, does in making the browser
   more standards-compliant or up to date with an early W3C Specification

In my opinion this sort of feature should be held back from any mainstream
release until it is clear to the end user exactly what it does. You appear
to still not have a real focus for it from your messages which I find very
worrying. I find that clear documentation is important and my only gripe
with a lot of products right now.

04.04.2016, 19:30, "Mark Banner" :
> On 04/04/2016 16:49, a...@imgland.xyz wrote:
>>  I don't know much about Mozilla's privacy but in my opinion feel the
>>  need to immediately remove it from Firefox and push a new beta build
>
> I can understand your concern, however, please understand that this
> logging functionality is currently disabled by default - see the
> "loop.logDomains" preference.
>
> We won't be enabling it until the privacy review is completed.
>
> If you wish to inspect and validate my assertion, you are quite welcome to.
>
> Here's a link to the code so that you can see what is currently on beta
>
> http://mxr.mozilla.org/mozilla-beta/search?string=loop.logDomains===%5E%5B%5E%5C0%5D*%24==mozilla-beta
>
> You can also see from the test file there, that we have a test to check
> that nothing is logged if the pref is set to false.
>
> Mark.


Re: Firefox Hello new data collection

2016-04-04 Thread Mark Banner

On 04/04/2016 16:49, a...@imgland.xyz wrote:

> I don't know much about Mozilla's privacy but in my opinion feel the
> need to immediately remove it from Firefox and push a new beta build


I can understand your concern, however, please understand that this 
logging functionality is currently disabled by default - see the 
"loop.logDomains" preference.


We won't be enabling it until the privacy review is completed.

If you wish to inspect and validate my assertion, you are quite welcome to.

Here's a link to the code so that you can see what is currently on beta

http://mxr.mozilla.org/mozilla-beta/search?string=loop.logDomains===%5E%5B%5E%5C0%5D*%24==mozilla-beta

You can also see from the test file there, that we have a test to check 
that nothing is logged if the pref is set to false.
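For readers who do not want to dig through the linked search, the gating described above can be pictured roughly as follows. This is only a sketch: the "loop.logDomains" preference name comes from this thread, but the `DomainLogger` class, the dict-based preference store, and the list used as a sink are hypothetical stand-ins, and the real code is JavaScript in the add-on.

```python
class DomainLogger:
    """Hypothetical logger that records domains only when the pref is enabled."""

    PREF_NAME = "loop.logDomains"

    def __init__(self, prefs, sink):
        self._prefs = prefs  # stand-in for the browser's preference store
        self._sink = sink    # stand-in for the logging pipeline

    def log(self, domain):
        # Disabled by default: a missing pref is treated the same as False.
        if not self._prefs.get(self.PREF_NAME, False):
            return False
        self._sink.append(domain)
        return True


# The kind of check the test file is described as making:
disabled_sink = []
DomainLogger({"loop.logDomains": False}, disabled_sink).log("example.com")

enabled_sink = []
DomainLogger({"loop.logDomains": True}, enabled_sink).log("example.com")

print(disabled_sink, enabled_sink)
```

The point of such a test is that the disabled path is verified explicitly, rather than assumed from the default value.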


Mark.


Re: Firefox Hello new data collection

2016-04-04 Thread Georg Fritzsche
On Mon, Apr 4, 2016 at 5:44 PM, Gijs Kruitbosch 
wrote:

> It also seems like you filed the privacy review after the functionality was
> implemented and is now shipping, which per
> https://wiki.mozilla.org/Privacy/Reviews seems like it is too late to
> incorporate meaningful feedback. I'm not on the privacy team, but that
> order looks wrong to me.
>

We have a common data collection review process for Firefox now (with
additionally more intense privacy reviews where needed):
https://wiki.mozilla.org/Firefox/Data_Collection

The idea is definitely to request approval before landing new data
collections.
For more complex collections it is helpful to start communications in the
design phase to catch problems before implementation (maybe this happened
through other channels here?).

Georg


Re: Firefox Hello new data collection

2016-04-04 Thread adam
I don't know much about Mozilla's privacy process, but in my opinion there
is a need to immediately remove it from Firefox and push a new beta build.

04.04.2016, 16:45, "Gijs Kruitbosch" :
> On 04/04/2016 11:01, Romain Testard wrote:
>>  The privacy review bug is
>>  https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
>>  More details added below.
>
> See response at the bottom.
>
>>  On Mon, Apr 4, 2016 at 11:23 AM, Gijs Kruitbosch 
>>  wrote:
>>>  On 04/04/2016 10:01, Romain Testard wrote:
>>>
>>>>  We would use a whitelist client-side to only collect domains that are
>>>>  part of the top 2000 domains (Alexa list of top domains). This prevents
>>>>  personal identification based on obscure domain usage.
>>>
>>>  Mathematically, the combination of a set of (popular) domains shared could
>>>  still be uniquely identifying, especially as, AIUI, you will get the counts
>>>  of each domain and in what sequence they were visited / which ones were
>>>  visited in which session. It all depends on the number of unique users and
>>>  the number of domains they visit / share (not clear: see above). Because
>>>  the total number of Hello users compared with the number of Firefox users
>>>  is quite low, this still seems somewhat concerning to me. Have you tried to
>>>  remedy this in any way?
>>
>>  We are aggregating domain names, and are not storing session histories.
>>  These are submitted at the end of the session, so exact timestamps of any
>>  visit are not included.
>
> But both Firefox and Hello sessions are commonly relatively short (<1d)
> and numerous. That means lots of data points, which will likely be
> enough to uniquely identify people even without exact timestamps of
> their visits. (FWIW, from a technical perspective, there is no reason
> why the submission time implies ("so") that exact timestamps of visits
> are not included.)
>
>>>  We looked into this approach originally although we found that we'd lose a
>>  level of granularity that can have an importance. We may find that Hello
>>  gets used a lot with a specific Website for a specific reason and using
>>  client side categories would prevent us from learning this.
>
> This was explicitly not in your original motivation, so you're moving
> the goalposts here. If the goal is about separate categories or separate
> sites then those are pretty distinct goals that require different
> approaches. If the real point is "we have no idea, so we figured we'd
> just get the data and then go from there", why not be upfront about it?
> But in that case, yeah, why not consider a survey or something less
> intrusive, like asking people explicitly what type of site they were
> using, or asking if Mozilla can use the domain in question ?
>
>>  Also Alexa
>>  website categories are far from perfect which would add another level of
>>  complexity to understand the collected data.
>
> At no point did I say I expected you to use their categorization,
> whatever that is. Categorize as you see fit, rather than as Alexa does it.
>
> Conversely, if their categorization is questionable, then your scrubbing
> of the Adult category sounds like it might need auditing? Also, why not
> other categories like "Banking" or "Medical" (NB: no idea what
> categorization Alexa employs, but these seem like categories that ought
> to be scrubbed, too)?
>
>>>  6 months also seems incredibly long. You should be able to aggregate the
>>>  data and keep that ("60% of users share on sites of type X") and throw away
>>>  the raw data much sooner than that.
>>  Yes agreed, we'll look into what's the most optimal amount of time required
>>  to process the data and extract the useful information. I agree we should
>>  try to make this shorter - we'll learn from being on Beta and will adjust
>>  this accordingly.
>
> Well, why not make it 1 week to start with, and make it longer if you
> don't get enough information from beta (with a rationale as to why that
> is the case) ?
>
>>>  Finally, I am surprised that you're sharing this 2 weeks before we're
>>>  releasing Firefox 46. Hasn't this been tested and verified on Nightly
>>>  and/or other channels? Why was no privacy update made at/before that time?
>>
>>  We are shipping Hello through Go Faster. The Go Faster process allows us to
>>  uplift directly to Beta 46 directly since we're a system add-on
>>  (development was done about 2 weeks ago).
>>  Firefox Hello has its own privacy notice (details here
>>  ).
>
> But shipping through go faster does not absolve you from adequately
> testing changes and getting feedback on them. Is the add-on not getting
> tested on nightly at all? Or at the same time as it goes to beta? When
> will it be used on release - when 46 ships as release, or earlier, or later?
>
> It also seems like you filed the privacy review after the functionality
> was implemented and is now shipping, which 

Re: Firefox Hello new data collection

2016-04-04 Thread Gijs Kruitbosch

On 04/04/2016 11:01, Romain Testard wrote:
> The privacy review bug is
> https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
> More details added below.

See response at the bottom.

> On Mon, Apr 4, 2016 at 11:23 AM, Gijs Kruitbosch
> wrote:
>> On 04/04/2016 10:01, Romain Testard wrote:
>>> We would use a whitelist client-side to only collect domains that are
>>> part of the top 2000 domains (Alexa list of top domains). This prevents
>>> personal identification based on obscure domain usage.
>>
>> Mathematically, the combination of a set of (popular) domains shared could
>> still be uniquely identifying, especially as, AIUI, you will get the counts
>> of each domain and in what sequence they were visited / which ones were
>> visited in which session. It all depends on the number of unique users and
>> the number of domains they visit / share (not clear: see above). Because
>> the total number of Hello users compared with the number of Firefox users
>> is quite low, this still seems somewhat concerning to me. Have you tried to
>> remedy this in any way?
>
> We are aggregating domain names, and are not storing session histories.
> These are submitted at the end of the session, so exact timestamps of any
> visit are not included.

But both Firefox and Hello sessions are commonly relatively short (<1d)
and numerous. That means lots of data points, which will likely be
enough to uniquely identify people even without exact timestamps of
their visits. (FWIW, from a technical perspective, there is no reason
why the submission time implies ("so") that exact timestamps of visits
are not included.)

> We looked into this approach originally although we found that we'd lose a
> level of granularity that can have an importance. We may find that Hello
> gets used a lot with a specific Website for a specific reason and using
> client side categories would prevent us from learning this.

This was explicitly not in your original motivation, so you're moving
the goalposts here. If the goal is about separate categories or separate
sites then those are pretty distinct goals that require different
approaches. If the real point is "we have no idea, so we figured we'd
just get the data and then go from there", why not be upfront about it?
But in that case, yeah, why not consider a survey or something less
intrusive, like asking people explicitly what type of site they were
using, or asking if Mozilla can use the domain in question ?

> Also Alexa
> website categories are far from perfect which would add another level of
> complexity to understand the collected data.

At no point did I say I expected you to use their categorization,
whatever that is. Categorize as you see fit, rather than as Alexa does it.

Conversely, if their categorization is questionable, then your scrubbing
of the Adult category sounds like it might need auditing? Also, why not
other categories like "Banking" or "Medical" (NB: no idea what
categorization Alexa employs, but these seem like categories that ought
to be scrubbed, too)?

>> 6 months also seems incredibly long. You should be able to aggregate the
>> data and keep that ("60% of users share on sites of type X") and throw away
>> the raw data much sooner than that.
>
> Yes agreed, we'll look into what's the most optimal amount of time required
> to process the data and extract the useful information. I agree we should
> try to make this shorter - we'll learn from being on Beta and will adjust
> this accordingly.

Well, why not make it 1 week to start with, and make it longer if you
don't get enough information from beta (with a rationale as to why that
is the case) ?

>> Finally, I am surprised that you're sharing this 2 weeks before we're
>> releasing Firefox 46. Hasn't this been tested and verified on Nightly
>> and/or other channels? Why was no privacy update made at/before that time?
>
> We are shipping Hello through Go Faster. The Go Faster process allows us to
> uplift directly to Beta 46 directly since we're a system add-on
> (development was done about 2 weeks ago).
> Firefox Hello has its own privacy notice (details here
> ).


But shipping through go faster does not absolve you from adequately 
testing changes and getting feedback on them. Is the add-on not getting 
tested on nightly at all? Or at the same time as it goes to beta? When 
will it be used on release - when 46 ships as release, or earlier, or later?


It also seems like you filed the privacy review after the functionality 
was implemented and is now shipping, which per 
https://wiki.mozilla.org/Privacy/Reviews seems like it is too late to 
incorporate meaningful feedback. I'm not on the privacy team, but that 
order looks wrong to me.


Finally, that privacy policy at no point says anything about Mozilla 
having access to visited/shared domains and thereby potentially to 
personally identifying information.


~ Gijs

Re: Firefox Hello new data collection

2016-04-04 Thread adam
I agree with chofmann in that a simple survey request when users open Hello 
would probably work since Mozilla is trusted by a lot of people.

04.04.2016, 16:22, "Chris Hofmann" :
> It also seems like you haven't explored other alternatives to get the data
> you are after, have some theories around what results you might expect, and
> what possible outcomes will be pursued once you get the data.
>
> Have you looked at other studies like this and many more that tell about
> general browsing habits?
> http://www.adweek.com/socialtimes/online-time/463670
>
> Have you looked at just doing a simple survey to ask people to tell you
> what kinds of activities they most use when sharing sites with hello?
>
> If the survey or data collection results tell you that some people play
> games against each other *and* some people shop together what will you do
> then?
>
> -chofmann

Re: Firefox Hello new data collection

2016-04-04 Thread Chris Hofmann
It also seems like you haven't explored other alternatives to get the data
you are after, formed theories about what results you might expect, or
decided what possible outcomes will be pursued once you get the data.

Have you looked at other studies like this and many more that tell about
general browsing habits?
http://www.adweek.com/socialtimes/online-time/463670

Have you looked at just doing a simple survey to ask people to tell you
what kinds of activities they most use when sharing sites with Hello?

If the survey or data collection results tell you that some people play
games against each other  *and* some people shop together what will you do
then?

-chofmann


Re: Firefox Hello new data collection

2016-04-04 Thread Romain Testard
The privacy review bug is
https://bugzilla.mozilla.org/show_bug.cgi?id=1261467.
More details added below.

On Mon, Apr 4, 2016 at 11:23 AM, Gijs Kruitbosch wrote:

> Hi,
>
> It's very concerning to me that you have not answered the obvious
> question: what domains are collected? All of the ones visited while the
> browser is running? The ones visited while Hello is open? The ones visited
> while shared through Hello? What about the ones that someone shared with
> you through Hello, rather than that you shared with someone else?
>

We only collect domains browsed whilst sharing your tabs on Firefox Hello
(link generator side).

>
> What about Private Browsing mode, have you disabled collection there?


Firefox Hello cannot be used with private browsing mode.

>
>
> On 04/04/2016 10:01, Romain Testard wrote:
>
>> We would use a whitelist client-side to only collect domains that are
>> part of the top 2000 domains (Alexa list of top domains). This prevents
>> personal identification based on obscure domain usage.
>>
>
> Mathematically, the combination of a set of (popular) domains shared could
> still be uniquely identifying, especially as, AIUI, you will get the counts
> of each domain and in what sequence they were visited / which ones were
> visited in which session. It all depends on the number of unique users and
> the number of domains they visit / share (not clear: see above). Because
> the total number of Hello users compared with the number of Firefox users
> is quite low, this still seems somewhat concerning to me. Have you tried to
> remedy this in any way?
>

We are aggregating domain names, and are not storing session histories.
These are submitted at the end of the session, so exact timestamps of any
visit are not included.

> The beginning of your message mentioned that you were interested in
> different "types" of sites. I don't think it would be necessary to optimize
> Hello for one shopping site over another, or for one search engine over
> another, or for one news site over another. So, why don't you categorize
> the domains in the whitelist according to broad categories ("news",
> "search", "shopping", "games", or something like this) on the client side,
> and then send that information instead? If the set of domains is limited
> (which it is) then this should not take that long, and get you exactly the
> information you want, and limit the privacy invasion that the current
> collection scheme represents.
>

We looked into this approach originally, but found that we'd lose a level of
granularity that can matter. We may find that Hello gets used a lot with a
specific website for a specific reason, and using client-side categories
would prevent us from learning this. Also, Alexa website categories are far
from perfect, which would add another level of complexity to understanding
the collected data.


> 6 months also seems incredibly long. You should be able to aggregate the
> data and keep that ("60% of users share on sites of type X") and throw away
> the raw data much sooner than that.
>
Yes, agreed. We'll look into the optimal amount of time required to process
the data and extract the useful information. I agree we should try to make
this shorter; we'll learn from being on Beta and will adjust this
accordingly.

>
> Finally, I am surprised that you're sharing this 2 weeks before we're
> releasing Firefox 46. Hasn't this been tested and verified on Nightly
> and/or other channels? Why was no privacy update made at/before that time?
>

We are shipping Hello through Go Faster. The Go Faster process allows us to
uplift directly to Beta 46 since we're a system add-on
(development was done about 2 weeks ago).
Firefox Hello has its own privacy notice (details here
).

>
> ~ Gijs


Re: Firefox Hello new data collection

2016-04-04 Thread adam
This isn't strictly about the data collection, but it would be better if
there were some sort of API that web developers could implement on sites such
as games, so that, instead of regular chat, things like co-op and game events
could be integrated into Hello itself.



Re: Firefox Hello new data collection

2016-04-04 Thread Gijs Kruitbosch

On 04/04/2016 10:01, Romain Testard wrote:

> Implementation bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1211542


Because this bug does not link to it: where is the bug for the privacy 
review of this collection? Judging by the people you CC'd I assume you 
got one, but where is it?


~ Gijs



Re: Firefox Hello new data collection

2016-04-04 Thread Gijs Kruitbosch

Hi,

It's very concerning to me that you have not answered the obvious 
question: what domains are collected? All of the ones visited while the 
browser is running? The ones visited while Hello is open? The ones 
visited while shared through Hello? What about the ones that someone 
shared with you through Hello, rather than that you shared with someone 
else?


What about Private Browsing mode, have you disabled collection there?

On 04/04/2016 10:01, Romain Testard wrote:

> We would use a whitelist client-side to only collect domains that are
> part of the top 2000 domains (Alexa list of top domains). This prevents
> personal identification based on obscure domain usage.


Mathematically, the combination of a set of (popular) domains shared 
could still be uniquely identifying, especially as, AIUI, you will get 
the counts of each domain and in what sequence they were visited / which 
ones were visited in which session. It all depends on the number of 
unique users and the number of domains they visit / share (not clear: 
see above). Because the total number of Hello users compared with the 
number of Firefox users is quite low, this still seems somewhat 
concerning to me. Have you tried to remedy this in any way?
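
To make the concern above concrete, here is a minimal sketch with made-up data. It is not taken from the Hello code; the user IDs, the domain lists, and the function name `unique_fingerprints` are all hypothetical. It only illustrates that a combination of popular domains can single out a user even when each domain on its own is common.

```python
from collections import Counter


def unique_fingerprints(shared_by_user):
    """Return the users whose set of shared domains matches no other user's."""
    # Treat each user's set of shared domains as a fingerprint.
    fingerprints = {
        user: frozenset(domains) for user, domains in shared_by_user.items()
    }
    counts = Counter(fingerprints.values())
    return sorted(user for user, fp in fingerprints.items() if counts[fp] == 1)


# Hypothetical data: every domain is popular, yet two combinations are unique.
shared = {
    "user_a": ["google.com", "yahoo.com"],
    "user_b": ["google.com", "yahoo.com"],
    "user_c": ["google.com", "amazon.com", "wikipedia.org"],
    "user_d": ["yahoo.com", "amazon.com"],
}
print(unique_fingerprints(shared))  # ['user_c', 'user_d']
```

The smaller the user population, the more likely it is that a given combination appears only once, which is why the low number of Hello users matters here.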


The beginning of your message mentioned that you were interested in 
different "types" of sites. I don't think it would be necessary to 
optimize Hello for one shopping site over another, or for one search 
engine over another, or for one news site over another. So, why don't 
you categorize the domains in the whitelist according to broad 
categories ("news", "search", "shopping", "games", or something like 
this) on the client side, and then send that information instead? If the 
set of domains is limited (which it is) then this should not take that 
long, and get you exactly the information you want, and limit the 
privacy invasion that the current collection scheme represents.
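
A rough sketch of the client-side categorization suggested above, under stated assumptions: the category mapping, the example domains, and the names `CATEGORY_BY_DOMAIN`, `categorize`, and `session_report` are hypothetical and do not come from the Hello implementation. The point is only that the client can map each whitelisted domain to a broad category and report category counts, so the domains themselves never leave the browser.

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical mapping from whitelisted domains to broad categories.
CATEGORY_BY_DOMAIN = {
    "amazon.com": "shopping",
    "ebay.com": "shopping",
    "google.com": "search",
    "bbc.co.uk": "news",
    "games.yahoo.com": "games",
}


def categorize(url):
    """Map a URL to a broad category, or None if its domain is not listed."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return CATEGORY_BY_DOMAIN.get(host)


def session_report(shared_urls):
    """Count shared tabs per category; domains are never part of the report."""
    categories = (categorize(url) for url in shared_urls)
    return dict(Counter(c for c in categories if c is not None))
```

For example, a session that shared two shopping pages, one game page, and one unlisted site would report only `{"shopping": 2, "games": 1}`.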


6 months also seems incredibly long. You should be able to aggregate the 
data and keep that ("60% of users share on sites of type X") and throw 
away the raw data much sooner than that.


Finally, I am surprised that you're sharing this 2 weeks before we're 
releasing Firefox 46. Hasn't this been tested and verified on Nightly 
and/or other channels? Why was no privacy update made at/before that time?


~ Gijs


Firefox Hello new data collection

2016-04-04 Thread Romain Testard
Hi all,


We wanted to let you know about new data collection that we will be doing
for Firefox Hello starting with the FF46 launch on April 19th, and the steps
we took to prevent it from collecting personally identifying information. We
want to collect more data about the websites that people share with Hello, to
help optimize the product UX, understand what people use our new tab sharing
feature for, and prioritize features accordingly. The product features and
UX can be very different if we decide to optimize for “Shopping
together” use cases as opposed to “Playing online games together”, just as
examples.


We did a lot of diligence for this and explored several options for getting
the data. The approach described below is the one we settled on. It
prevents personal identification and gets us the data we need to build the
best tool we can while being sensitive to our users. This involves
collecting the domain names for tabs shared on Firefox Hello on our own
servers.


How we collect the data


We plan to put in place a data collection solution that prevents personal
identification. The technical approach to doing this through the use of
client-side whitelisting is outlined here:



   - Data will go to our servers and will be stored with our other server
     metrics. We are aggregating domain names, and are not storing session
     histories. These are submitted at the end of the session, so exact
     timestamps of any visit are not included.

   - Users who have disabled Health Reports will also not submit this data.

   - We would use a whitelist client-side to only collect domains that are
     part of the top 2000 domains (Alexa list of top domains). This prevents
     personal identification based on obscure domain usage. We would subtract
     the sites from the Adult category and add all the subdomains of:

       - google.com (e.g., drive.google.com)
       - yahoo.com (e.g., games.yahoo.com)
       - developer.mozilla.org, bugzilla.mozilla.org, wiki.mozilla.org (this
         helps us understand how much our user base is Mozillians)
       - tunes.apple.com

     You can see the exact list here: DomainWhitelist.jsm

   - The data will only be kept for 6 months and we plan to revisit this
     collection in 6 months. We’ll evaluate at the end of this period if we
     should carry on collecting the data (the data is still useful and will help
     further shape the product) or just stop.
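
The client-side whitelisting described in the list above can be sketched roughly as follows. This is an illustration only, not the code in DomainWhitelist.jsm: the whitelist contents, the names `WHITELIST`, `SUBDOMAINS_ALLOWED`, `whitelisted_domain`, and `end_of_session_payload`, and the assumption that the submitted payload is a per-domain count are all hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Illustrative stand-ins; the real list is the one in DomainWhitelist.jsm.
WHITELIST = {"amazon.com", "wikipedia.org", "developer.mozilla.org"}
# Parents whose subdomains are all allowed (e.g. drive.google.com).
SUBDOMAINS_ALLOWED = {"google.com", "yahoo.com"}


def whitelisted_domain(url):
    """Return the domain to report for a URL, or None if it must be dropped."""
    host = (urlsplit(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in WHITELIST or host in SUBDOMAINS_ALLOWED:
        return host
    # Any subdomain of an allowed parent is reported as-is.
    if any(host.endswith("." + parent) for parent in SUBDOMAINS_ALLOWED):
        return host
    return None


def end_of_session_payload(shared_urls, health_report_enabled=True):
    """Build per-session domain counts; no visit order or timestamps are kept."""
    if not health_report_enabled:
        return None  # users who disabled Health Reports submit nothing
    domains = (whitelisted_domain(url) for url in shared_urls)
    return dict(Counter(d for d in domains if d is not None))
```

In this sketch a domain that is not on the list is dropped before anything is counted, so it never appears in the payload, and the whole payload is skipped when Health Reports are disabled.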


This e-mail is intended to make everyone aware of the data we’re collecting
in Hello in an effort to be as transparent as possible. We want to make sure
people get the full picture of what we are trying to achieve and what we’re
putting in place to protect our users.


Let me know if you have any questions.



Implementation bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1211542

Technical documentation:
https://github.com/mozilla/loop/blob/master/docs/DataCollection.md


-Romain