Re: usage analytics

2020-11-03 Thread Valentin Kulichenko
Makes sense to me. I would love to know which components/APIs are used more
than others. Obviously, we should make sure everything is anonymous and we
don't collect any private user data, but I believe this is already
guaranteed by Google Analytics.

-Val

On Tue, Nov 3, 2020 at 3:59 AM Alexey Goncharuk 
wrote:

> Folks,
>
> I want to bump up this discussion and slightly change the format suggested
> by Nikita. I don't think it is correct to gather any information related to
> the user environment. However, can we collect just the fact of some of the
> Ignite APIs/subsystems being used with no user information whatsoever?
> Having started thinking about Ignite 3.0 I realized that we lack even some
> very basic knowledge on the impact of changing one or another feature or
> API.
>
> To my knowledge, the Ignite website already uses Google Analytics, which is
> available to the community. The Google Analytics platform already has
> tooling to track app screen hits in a completely anonymous way, so we can
> use this tool to track Ignite component usage (once per node startup),
> sending solely component name and a unique environment hash - no IP
> addresses, no jdk/os/other information. The information will be available
> in the same toolkit we are already using to analyze the website and
> optimize our docs.
>
> WDYT?

Re: usage analytics

2020-11-03 Thread Alexey Goncharuk
Folks,

I want to bump up this discussion and slightly change the format suggested
by Nikita. I don't think it is correct to gather any information related to
the user environment. However, can we collect just the fact of some of the
Ignite APIs/subsystems being used with no user information whatsoever?
Having started thinking about Ignite 3.0 I realized that we lack even some
very basic knowledge on the impact of changing one or another feature or
API.

To my knowledge, the Ignite website already uses Google Analytics, which is
available to the community. The Google Analytics platform already has
tooling to track app screen hits in a completely anonymous way, so we can
use this tool to track Ignite component usage (once per node startup),
sending solely component name and a unique environment hash - no IP
addresses, no jdk/os/other information. The information will be available
in the same toolkit we are already using to analyze the website and
optimize our docs.
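
A minimal sketch of what such a hit could carry, assuming the "environment hash" is a SHA-256 digest of a random ID generated locally by the node. The thread does not say how the hash is derived, so that choice, the class name and the field names are all hypothetical; no Ignite or Google Analytics API is used here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch: builds an anonymous "component used" hit. Not an Ignite or Google Analytics API. */
final class ComponentUsageHit {
    /** Hashes a locally generated random ID (an assumption), so nothing about the host can be derived from it. */
    static String environmentHash(UUID localRandomId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest(localRandomId.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest)
                hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    /** The hit carries exactly two fields: the component name and the environment hash. */
    static Map<String, String> build(String component, String envHash) {
        Map<String, String> hit = new LinkedHashMap<>();
        hit.put("component", component);
        hit.put("envHash", envHash);
        return hit;
    }
}
```

Building the hit once per node startup and per component would match the frequency suggested above; how it is transmitted is left open in this sketch.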

WDYT?

Wed, Jul 19, 2017 at 01:15, :

> I would try to ping legal again and see if they respond. If not, I think
> we will need to come up with a simpler approach that does not require
> legal approval.
>
> D.
Re: usage analytics

2017-07-18 Thread dsetrakyan
I would try to ping legal again and see if they respond. If not, I think we
will need to come up with a simpler approach that does not require legal
approval.

D.

On Jul 18, 2017, at 2:23 PM, Nikita Ivanov wrote:
>Igniters,
>Just a quick update. I haven't gotten a response from ASF Legal on this
>thread and I frankly don't know how to proceed here. What's the process
>to arrive at a decision point here?
>
>Thanks!
>--
>Nikita Ivanov

Re: usage analytics

2017-07-18 Thread Nikita Ivanov
Igniters,
Just a quick update. I haven't gotten a response from ASF Legal on this
thread and I frankly don't know how to proceed here. What's the process to
arrive at a decision point here?

Thanks!
--
Nikita Ivanov


On Mon, Jul 10, 2017 at 3:11 PM, Konstantin Boudnik  wrote:

> On Sat, Jul 08, 2017 at 11:04AM, Nikita Ivanov wrote:
> > Cos,
> > Based on my experience, having it off by default negates the entire
> > purpose... We need a statistically meaningful data set to make any
> > inferences from it. Moreover, if we are going to ask folks to turn it
> > on, it will significantly skew the resulting data set anyway and will
> > not show the full picture. I think "on" by default is the better option
> > if we are to collect usage stats to begin with.
>
> Yes, sure. But having this "on" by default is likely to expose us to
> another shit-storm down the road. An interesting dilemma to have indeed.
> In my experience, whenever I install something like a browser or an
> operating system, it asks if I want to make the particular piece of
> software better by sending back some anonymized stats. Basically, I am
> given a way to explicitly opt out if I wish.
>
> Turning the feature "on" by default is like saying: "we'll be collecting
> some stats, but if you don't want us to, you can go here and there and
> disable the collection. Oh, and by the way - you need to go and figure
> out the exact steps to disable it."
>
> > Also, I want to reiterate to avoid misunderstanding: there is no
> > proposal, nor will there be a technical way, to attribute collected
> > data back to a certain company. That's not what this is all about. We
> > should only be interested in aggregated stats (community size, geo
> > information, language information, component usage).
>
> Yes, I think it is clear, but it never hurts to reiterate.
>
> Cos

Re: usage analytics

2017-07-08 Thread Nikita Ivanov
Cos,
Based on my experience, having it off by default negates the entire
purpose... We need a statistically meaningful data set to make any inferences
from it. Moreover, if we are going to ask folks to turn it on, it will
significantly skew the resulting data set anyway and will not show the full
picture. I think "on" by default is the better option if we are to collect
usage stats to begin with.

Also, I want to reiterate to avoid misunderstanding: there is no proposal,
nor will there be a technical way, to attribute collected data back to a
certain company. That's not what this is all about. We should only be
interested in aggregated stats (community size, geo information, language
information, component usage).

Thoughts?

--
Nikita Ivanov
Founder & CTO
GridGain Systems

On Fri, Jul 7, 2017 at 8:17 PM, Konstantin Boudnik  wrote:

> Actually, that should be OFF by default. It sounds like this would reduce
> the amount of data collected, but it would address the concerns of
> companies like Roman's. I know for sure that a few of my clients would sue
> my ass out of existence if I gave them a platform collecting their
> data-center info.
>
> Let's have it, set it off by default, and document an easy way to turn it
> off. Then start making rounds asking our user base to share _some_ of the
> stats with the community, so we can track the growth of the install base,
> etc.
>
> Cos

Re: usage analytics

2017-07-06 Thread Nikita Ivanov
The idea so far is to have a single system property in the configuration that
turns this off completely. I envision that this will be prominently
featured on the Ignite website so that everyone who would like to disable it
can do it in seconds.
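
A minimal sketch of such a switch, assuming opt-out semantics (collection stays on unless the property is set). The property name `IGNITE_USAGE_STATS_DISABLED` and the class are hypothetical; the thread never names the property, and this is not an existing Ignite setting.

```java
import java.util.Properties;

/** Hypothetical sketch of an opt-out switch; the property name is an assumption, not a real Ignite property. */
final class UsageStatsSwitch {
    /** Assumed property name, for illustration only. */
    static final String DISABLE_PROP = "IGNITE_USAGE_STATS_DISABLED";

    /** Returns true unless the property is set to "true" (case-insensitive), i.e. opt-out semantics. */
    static boolean isEnabled(Properties props) {
        return !Boolean.parseBoolean(props.getProperty(DISABLE_PROP, "false"));
    }

    /** Convenience overload that reads the JVM system properties. */
    static boolean isEnabled() {
        return isEnabled(System.getProperties());
    }
}
```

Under these assumptions a node started with `-DIGNITE_USAGE_STATS_DISABLED=true` would skip collection entirely. Flipping the default to opt-in would only require inverting the check and renaming the property.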

Thoughts?

--
Nikita Ivanov
Founder & CTO
GridGain Systems

On Wed, Jul 5, 2017 at 9:27 PM, Roman Shtykh  wrote:

> Nikita,
>
> Sending and storing (somewhere the company cannot securely handle) any
> information (OS version, IP addresses, etc.) that can be used to compromise
> the services would be unacceptable.
> Turning it off might be OK (possibly through the cluster settings, not via
> a globally accessible site), but the risk that some information could leak
> outside (for any reason, starting with human error) is scary.
>
> -- Roman
>
>
>
>
> On Thursday, July 6, 2017 12:38 PM, Nikita Ivanov 
> wrote:
>
>
> Roman,
> Thanks for the feedback. What are those questions specifically? Are IP
> addresses and OS what is causing it?
>
> Thanks!
>
> --
> Nikita Ivanov
> Founder & CTO
> GridGain Systems
>
> On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh 
> wrote:
>
> Nikita,
>
> While this will help improve Ignite, it will prevent its adoption by many
> projects -- sending and retaining IP addresses, OS versions, etc. raises
> tons of questions when considering whether to use Ignite, even if one can
> opt out.
> -- Roman
>
>
> On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov 
> wrote:
>
>
>  Igniters,
> I would like to kick off the discussion on the idea of collecting Ignite
> usage statistics. The basic idea behind this is to better understand
> general and anonymous Ignite usage information to better calibrate
> community efforts in developing new features, improving existing ones,
> delivering better documentation - and in every other way to make our
> project a better software solution.
>
> Although such instrumentation is standard practice in commercially
> developed software, for an ASF project this could be a sensitive issue.
> Therefore I would like to initiate a full community discussion on how best
> to implement such practice for the benefit of the project while ensuring the
> privacy protection of Ignite users.
>
> To ignite (pun intended) the discussion I'll outline below some of the
> basic thoughts that I have on this subject. They are here only to give an
> idea of what such instrumentation may potentially look like so that we can
> discuss the merits of this idea in a tangible context.
>
> Overview
> --------
> Upon start and every hour thereafter, each Ignite node will collect, encrypt
> and send usage statistics over HTTPS to the ASF-hosted server. That server
> will accept such HTTPS packets, decrypt them and store them in a
> time-series DB. A web interface will be provided to view the usage
> information.
>
> Opt-In or Opt-out
> -----------------
> Opt-out. The Ignite website will offer simple instructions (system property)
> on how to disable this instrumentation.
>
> Code, Infra, Access
> -------------------
> Ignite instrumentation will be part of the Ignite code base. The collection
> server will be a separate module in the Ignite code base (released
> separately from Ignite). The collection server will be hosted by ASF Infra.
>
> Usage statistics will be publicly accessible by anyone in the community.
>
> Private, Personal Data
> ----------------------
> No private or personal data will ever be transferred. No emails, usernames,
> company names, grid names, etc.
>
> Data Retention
> --------------
> All data will be retained for 1 year and deleted permanently thereafter.
>
> Usage Data
> ----------
> The following data will be collected in each packet sent to the collection
> server:
> - GRID_SIZE (to align our testing environment with the most common
> cluster sizes)
> - IP_ADDR (for general geo-tracking as well as to know what documentation
> language should be a priority)
> - SES_ID (to track continuous uptime vs. restarts)
> - USERNAME_TYPE (privileged username vs. standard, to track production vs.
> dev/testing usage; note - this is not an actual username)
> - OS_NAME
> - OS_VER
> - OS_ARCH
> - JAVA_VER
> - JAVA_VENDOR
> - COMP_SQL (whether or not this feature was used)
> - COMP_COMPUTE (whether or not this feature was used)
> - COMP_DATAGRID (whether or not this feature was used)
> - COMP_STREAMING (whether or not this feature was used)
> - COMP_IGFS (whether or not this feature was used)
> - COMP_SERVICE (whether or not this feature was used)
> - COMP_PERSISTENCE (whether or not this feature was used)
>
> Please let's discuss this idea. Everyone's comments and suggestions are
> *extremely* welcome.
>
> Thanks,
> Nikita Ivanov.
>
>
>
>
>
>
>
>
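
For reference, the per-packet field list in the quoted proposal could be modeled roughly as below. This is only a sketch: the field names follow the list, but the Java types, the class name and the enum are assumptions, since the proposal does not specify a wire format.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical model of one usage packet; names mirror the proposal's list, types are assumptions. */
final class UsagePacket {
    /** One constant per COMP_* entry in the proposal. */
    enum Component { SQL, COMPUTE, DATAGRID, STREAMING, IGFS, SERVICE, PERSISTENCE }

    final int gridSize;                  // GRID_SIZE
    final String ipAddr;                 // IP_ADDR
    final String sesId;                  // SES_ID
    final boolean privilegedUser;        // USERNAME_TYPE (privileged vs. standard, never the actual name)
    final String osName, osVer, osArch;  // OS_NAME, OS_VER, OS_ARCH
    final String javaVer, javaVendor;    // JAVA_VER, JAVA_VENDOR
    final Set<Component> used;           // COMP_* flags, stored as the set of components that were used

    UsagePacket(int gridSize, String ipAddr, String sesId, boolean privilegedUser,
                String osName, String osVer, String osArch,
                String javaVer, String javaVendor, Set<Component> used) {
        this.gridSize = gridSize;
        this.ipAddr = ipAddr;
        this.sesId = sesId;
        this.privilegedUser = privilegedUser;
        this.osName = osName;
        this.osVer = osVer;
        this.osArch = osArch;
        this.javaVer = javaVer;
        this.javaVendor = javaVendor;
        // Defensive copy; EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case first.
        this.used = used.isEmpty() ? EnumSet.noneOf(Component.class) : EnumSet.copyOf(used);
    }

    /** Answers the "whether or not this feature was used" question for one component. */
    boolean wasUsed(Component c) {
        return used.contains(c);
    }
}
```

Laying the fields out this way makes it easy to see which ones drew objections later in the thread (IP_ADDR and the OS_* fields) and which ones the narrower 2020 suggestion would keep (only the COMP_* flags).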


Re: usage analytics

2017-07-05 Thread Rishi Yagnik
With such statistics collected by Ignite, we won't ever accept Ignite in our
environment.

However, the ability to turn stats collection on and off would be helpful
if the feature is accepted for implementation.

Take Care,
Rishi

> On Jul 5, 2017, at 8:15 PM, Roman Shtykh  wrote:
> 
> Nikita,
>
> While this will help improve Ignite, it will prevent its adoption by many
> projects -- sending and retaining IP addresses, OS versions, etc. raises tons
> of questions when considering whether to use Ignite, even if one can opt out.
> -- Roman
> 
> 
>On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov  
> wrote:
> 
> 
> Igniters,
> I would like to kick off the discussion on the idea of collecting Ignite
> usage statistics. The basic idea behind this is to better understand
> general and anonymous Ignite usage information to better calibrate
> community efforts in developing new features, improving existing ones,
> delivering better documentation - and in every other way to make our
> project a better software solution.
> 
> Although such instrumentation is standard practice in commercially
> developed software, for an ASF project this could be a sensitive issue.
> Therefore I would like to initiate a full community discussion on how best
> to implement such practice for the benefit of project while ensuring the
> privacy protection of Ignite users.
> 
> To ignite (pun intended) the discussion I'll outline below some of the
> basic thoughts that I have on this subject. They are here only to give an
> idea of what such instrumentation may potentially look like so that we can
> discuss the merits of this idea in a tangible context.
> 
> Overview
> -
> Upon start and every hour thereafter each Ignite node will collect, encrypt
> and send usage statistics over HTTPS to the ASF-hosted server. That server
> will accept such HTTPS packets, decrypt them and store them in a
> time-series DB. A web interface will be provided to view the usage
> information.
> 
> Opt-In or Opt-out
> -
> Opt-out. Ignite website will offer simple instructions (system property) on
> how to disable this instrumentation.
> 
> Code, Infra, Access
> ---
> Ignite instrumentation will be part of the Ignite code base. The collection
> server will be a separate module in the Ignite code base (released
> separately from Ignite). The collection server will be hosted by ASF Infra.
> 
> Usage statistics will be publicly accessible by anyone in the community.
> 
> Private, Personal Data
> --
> No private or personal data will ever be transferred. No emails, usernames,
> company names, grid names, etc.
> 
> Data Retention
> 
> All data will be retained for 1 year and deleted permanently thereafter.
> 
> Usage Data
> 
> The following data will be collected in each packet sent to the collection
> server:
> - GRID_SIZE (to match our testing environment to the most common
> cluster sizes)
> - IP_ADDR (for general geo-tracking as well as to know what documentation
> language should be a priority)
> - SES_ID (to track continuous uptime vs. restarts)
> - USERNAME_TYPE (privileged username vs. standard, to track production vs.
> dev/testing usage; note - this is not an actual username)
> - OS_NAME
> - OS_VER
> - OS_ARCH
> - JAVA_VER
> - JAVA_VENDOR
> - COMP_SQL (whether or not this feature was used)
> - COMP_COMPUTE (whether or not this feature was used)
> - COMP_DATAGRID (whether or not this feature was used)
> - COMP_STREAMING (whether or not this feature was used)
> - COMP_IGFS (whether or not this feature was used)
> - COMP_SERVICE (whether or not this feature was used)
> - COMP_PERSISTENCE (whether or not this feature was used)
> 
> Please let's discuss this idea. Everyone's comments and suggestions are
> *extremely* welcome.
> 
> Thanks,
> Nikita Ivanov.
> 
> 


Re: usage analytics

2017-07-05 Thread Nikita Ivanov
Roman,
Thanks for the feedback. What are those questions specifically? Are the IP
addresses and OS details what is causing them?

Thanks!

--
Nikita Ivanov
Founder & CTO
GridGain Systems

On Wed, Jul 5, 2017 at 6:15 PM, Roman Shtykh 
wrote:

> Nikita,
>
> While this will help improve Ignite, it will prevent its adoption by many
> projects -- sending and retaining IP addresses, OS versions, etc. raises
> tons of questions when considering whether to use Ignite, even if one can
> opt out.
> -- Roman
>
>
> On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov 
> wrote:
>
>
>  Igniters,
> I would like to kick off the discussion on the idea of collecting Ignite
> usage statistics. The basic idea behind this is to better understand
> general and anonymous Ignite usage information to better calibrate
> community efforts in developing new features, improving existing ones,
> delivering better documentation - and in every other way to make our
> project a better software solution.
>
> Although such instrumentation is standard practice in commercially
> developed software, for an ASF project this could be a sensitive issue.
> Therefore I would like to initiate a full community discussion on how best
> to implement such a practice for the benefit of the project while ensuring
> the privacy protection of Ignite users.
>
> To ignite (pun intended) the discussion I'll outline below some of the
> basic thoughts that I have on this subject. They are here only to give an
> idea of what such instrumentation may potentially look like so that we can
> discuss the merits of this idea in a tangible context.
>
> Overview
> -
> Upon start and every hour thereafter each Ignite node will collect, encrypt
> and send usage statistics over HTTPS to the ASF-hosted server. That server
> will accept such HTTPS packets, decrypt them and store them in a
> time-series DB. A web interface will be provided to view the usage
> information.
>
> Opt-In or Opt-out
> -
> Opt-out. Ignite website will offer simple instructions (system property) on
> how to disable this instrumentation.
>
> Code, Infra, Access
> ---
> Ignite instrumentation will be part of the Ignite code base. The collection
> server will be a separate module in the Ignite code base (released
> separately from Ignite). The collection server will be hosted by ASF Infra.
>
> Usage statistics will be publicly accessible by anyone in the community.
>
> Private, Personal Data
> --
> No private or personal data will ever be transferred. No emails, usernames,
> company names, grid names, etc.
>
> Data Retention
> 
> All data will be retained for 1 year and deleted permanently thereafter.
>
> Usage Data
> 
> The following data will be collected in each packet sent to the collection
> server:
> - GRID_SIZE (to match our testing environment to the most common
> cluster sizes)
> - IP_ADDR (for general geo-tracking as well as to know what documentation
> language should be a priority)
> - SES_ID (to track continuous uptime vs. restarts)
> - USERNAME_TYPE (privileged username vs. standard, to track production vs.
> dev/testing usage; note - this is not an actual username)
> - OS_NAME
> - OS_VER
> - OS_ARCH
> - JAVA_VER
> - JAVA_VENDOR
> - COMP_SQL (whether or not this feature was used)
> - COMP_COMPUTE (whether or not this feature was used)
> - COMP_DATAGRID (whether or not this feature was used)
> - COMP_STREAMING (whether or not this feature was used)
> - COMP_IGFS (whether or not this feature was used)
> - COMP_SERVICE (whether or not this feature was used)
> - COMP_PERSISTENCE (whether or not this feature was used)
>
> Please let's discuss this idea. Everyone's comments and suggestions are
> *extremely* welcome.
>
> Thanks,
> Nikita Ivanov.
>
>
>


Re: usage analytics

2017-07-05 Thread Roman Shtykh
Nikita,

While this will help improve Ignite, it will prevent its adoption by many
projects -- sending and retaining IP addresses, OS versions, etc. raises tons of
questions when considering whether to use Ignite, even if one can opt out.
-- Roman


On Thursday, July 6, 2017 5:38 AM, Nikita Ivanov  
wrote:
 

 Igniters,
I would like to kick off the discussion on the idea of collecting Ignite
usage statistics. The basic idea behind this is to better understand
general and anonymous Ignite usage information to better calibrate
community efforts in developing new features, improving existing ones,
delivering better documentation - and in every other way to make our
project a better software solution.

Although such instrumentation is standard practice in commercially
developed software, for an ASF project this could be a sensitive issue.
Therefore I would like to initiate a full community discussion on how best
to implement such a practice for the benefit of the project while ensuring
the privacy protection of Ignite users.

To ignite (pun intended) the discussion I'll outline below some of the
basic thoughts that I have on this subject. They are here only to give an
idea of what such instrumentation may potentially look like so that we can
discuss the merits of this idea in a tangible context.

Overview
-
Upon start and every hour thereafter each Ignite node will collect, encrypt
and send usage statistics over HTTPS to the ASF-hosted server. That server
will accept such HTTPS packets, decrypt them and store them in a
time-series DB. A web interface will be provided to view the usage
information.

Opt-In or Opt-out
-
Opt-out. Ignite website will offer simple instructions (system property) on
how to disable this instrumentation.

Code, Infra, Access
---
Ignite instrumentation will be part of the Ignite code base. The collection
server will be a separate module in the Ignite code base (released
separately from Ignite). The collection server will be hosted by ASF Infra.

Usage statistics will be publicly accessible by anyone in the community.

Private, Personal Data
--
No private or personal data will ever be transferred. No emails, usernames,
company names, grid names, etc.

Data Retention

All data will be retained for 1 year and deleted permanently thereafter.

Usage Data

The following data will be collected in each packet sent to the collection
server:
- GRID_SIZE (to match our testing environment to the most common
cluster sizes)
- IP_ADDR (for general geo-tracking as well as to know what documentation
language should be a priority)
- SES_ID (to track continuous uptime vs. restarts)
- USERNAME_TYPE (privileged username vs. standard, to track production vs.
dev/testing usage; note - this is not an actual username)
- OS_NAME
- OS_VER
- OS_ARCH
- JAVA_VER
- JAVA_VENDOR
- COMP_SQL (whether or not this feature was used)
- COMP_COMPUTE (whether or not this feature was used)
- COMP_DATAGRID (whether or not this feature was used)
- COMP_STREAMING (whether or not this feature was used)
- COMP_IGFS (whether or not this feature was used)
- COMP_SERVICE (whether or not this feature was used)
- COMP_PERSISTENCE (whether or not this feature was used)

Please let's discuss this idea. Everyone's comments and suggestions are
*extremely* welcome.

Thanks,
Nikita Ivanov.


   

usage analytics

2017-07-05 Thread Nikita Ivanov
Igniters,
I would like to kick off the discussion on the idea of collecting Ignite
usage statistics. The basic idea behind this is to better understand
general and anonymous Ignite usage information to better calibrate
community efforts in developing new features, improving existing ones,
delivering better documentation - and in every other way to make our
project a better software solution.

Although such instrumentation is standard practice in commercially
developed software, for an ASF project this could be a sensitive issue.
Therefore I would like to initiate a full community discussion on how best
to implement such a practice for the benefit of the project while ensuring
the privacy protection of Ignite users.

To ignite (pun intended) the discussion I'll outline below some of the
basic thoughts that I have on this subject. They are here only to give an
idea of what such instrumentation may potentially look like so that we can
discuss the merits of this idea in a tangible context.

Overview
--------
Upon start and every hour thereafter each Ignite node will collect, encrypt
and send usage statistics over HTTPS to the ASF-hosted server. That server
will accept such HTTPS packets, decrypt them and store them in a
time-series DB. A web interface will be provided to view the usage
information.
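
The "upon start and every hour thereafter" schedule could be sketched roughly as below. This is only an illustration under stated assumptions: the class name `UsageReporter` and the `sendTask` callback are hypothetical, and the actual collection, encryption and HTTPS upload are left out.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical scheduler for the reporting task; not part of Ignite. */
public final class UsageReporter {
    private UsageReporter() {}

    /**
     * Runs sendTask immediately and then once per period.
     * The proposal would use a period of 1 hour.
     */
    public static ScheduledExecutorService start(Runnable sendTask, long period, TimeUnit unit) {
        // Daemon thread so that reporting never keeps a node's JVM alive.
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "usage-stats-reporter");
            t.setDaemon(true);
            return t;
        });

        // scheduleAtFixedRate cancels future runs if one run throws,
        // so failures are swallowed here to keep the schedule alive.
        Runnable safeTask = () -> {
            try {
                sendTask.run();
            }
            catch (RuntimeException e) {
                // Reporting is best-effort; ignore and retry at the next tick.
            }
        };

        exec.scheduleAtFixedRate(safeTask, 0, period, unit);
        return exec;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService exec =
            start(() -> System.out.println("would send usage packet"), 1, TimeUnit.HOURS);
        Thread.sleep(100); // Give the first (immediate) run time to fire.
        exec.shutdownNow();
    }
}
```

A node would call `start(...)` once during startup and `shutdownNow()` on stop.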

Opt-In or Opt-out
-----------------
Opt-out. Ignite website will offer simple instructions (system property) on
how to disable this instrumentation.
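
As a minimal sketch of the opt-out check, assuming a system property is the switch: the property name `IGNITE_USAGE_STATS_DISABLED` and the class name are hypothetical, since the proposal does not name them.

```java
/** Hypothetical opt-out check; not part of Ignite. */
public final class UsageStatsOptOut {
    // Hypothetical property name; the proposal does not specify one.
    static final String DISABLE_PROP = "IGNITE_USAGE_STATS_DISABLED";

    private UsageStatsOptOut() {}

    /** Returns true unless the opt-out property is set to "true" (case-insensitive). */
    public static boolean isReportingEnabled() {
        // Boolean.getBoolean reads a JVM system property, not an environment variable.
        return !Boolean.getBoolean(DISABLE_PROP);
    }

    public static void main(String[] args) {
        System.out.println("Usage reporting enabled: " + isReportingEnabled());
    }
}
```

With this sketch, a user would opt out by starting the JVM with `-DIGNITE_USAGE_STATS_DISABLED=true`.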

Code, Infra, Access
-------------------
Ignite instrumentation will be part of the Ignite code base. The collection
server will be a separate module in the Ignite code base (released
separately from Ignite). The collection server will be hosted by ASF Infra.

Usage statistics will be publicly accessible by anyone in the community.

Private, Personal Data
----------------------
No private or personal data will ever be transferred. No emails, usernames,
company names, grid names, etc.

Data Retention
--------------
All data will be retained for 1 year and deleted permanently thereafter.

Usage Data
----------
The following data will be collected in each packet sent to the collection
server:
- GRID_SIZE (to match our testing environment to the most common
cluster sizes)
- IP_ADDR (for general geo-tracking as well as to know what documentation
language should be a priority)
- SES_ID (to track continuous uptime vs. restarts)
- USERNAME_TYPE (privileged username vs. standard, to track production vs.
dev/testing usage; note - this is not an actual username)
- OS_NAME
- OS_VER
- OS_ARCH
- JAVA_VER
- JAVA_VENDOR
- COMP_SQL (whether or not this feature was used)
- COMP_COMPUTE (whether or not this feature was used)
- COMP_DATAGRID (whether or not this feature was used)
- COMP_STREAMING (whether or not this feature was used)
- COMP_IGFS (whether or not this feature was used)
- COMP_SERVICE (whether or not this feature was used)
- COMP_PERSISTENCE (whether or not this feature was used)
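
To make the list above more tangible, here is a rough sketch of how such a packet might be modeled. The class `UsagePacket`, its constructor and the "privileged"/"standard" values are hypothetical; only the field names come from the list. The sketch assumes IP_ADDR is not put in the payload, on the assumption that the collection server would see the source address of the connection anyway.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

/** Hypothetical model of one usage packet; not part of Ignite. */
public final class UsagePacket {
    /** Components reported as COMP_* flags in the list above. */
    public enum Component { SQL, COMPUTE, DATAGRID, STREAMING, IGFS, SERVICE, PERSISTENCE }

    private final int gridSize;
    private final String sessionId;
    private final boolean privilegedUser;
    private final Set<Component> usedComponents;

    public UsagePacket(int gridSize, String sessionId, boolean privilegedUser, Set<Component> used) {
        this.gridSize = gridSize;
        this.sessionId = sessionId;
        this.privilegedUser = privilegedUser;
        // Defensive copy that also accepts an empty input set.
        this.usedComponents = EnumSet.noneOf(Component.class);
        this.usedComponents.addAll(used);
    }

    /** Flattens the packet into ordered key/value pairs named as in the proposal. */
    public Map<String, String> toFields() {
        Map<String, String> f = new LinkedHashMap<>();
        f.put("GRID_SIZE", Integer.toString(gridSize));
        f.put("SES_ID", sessionId);
        // Only the account type is recorded, never the actual username.
        f.put("USERNAME_TYPE", privilegedUser ? "privileged" : "standard");
        f.put("OS_NAME", System.getProperty("os.name"));
        f.put("OS_VER", System.getProperty("os.version"));
        f.put("OS_ARCH", System.getProperty("os.arch"));
        f.put("JAVA_VER", System.getProperty("java.version"));
        f.put("JAVA_VENDOR", System.getProperty("java.vendor"));
        // One boolean flag per component: was the feature used or not.
        for (Component c : Component.values())
            f.put("COMP_" + c.name(), Boolean.toString(usedComponents.contains(c)));
        return f;
    }

    public static void main(String[] args) {
        UsagePacket p = new UsagePacket(4, UUID.randomUUID().toString(), false,
            EnumSet.of(Component.SQL, Component.PERSISTENCE));
        p.toFields().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

Keeping the fields in a flat, ordered map makes it easy to review exactly what would leave the node before anything is encrypted or sent.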

Please let's discuss this idea. Everyone's comments and suggestions are
*extremely* welcome.

Thanks,
Nikita Ivanov.