Re: Adding new data anonymization processor bundle
Andy,

You raise a great point about considering provenance. Unless there's a way to exclude attributes from provenance tracking, I think we'd need to force the issue by not allowing attributes to be an input source for expression language. That's the only way to force people to think, "hey, I shouldn't put this here." In my opinion, that's not really something we should allow, given the ramifications of people using the feature without reading the relevant documentation.

On Wed, Jun 20, 2018 at 1:35 PM Andy LoPresto wrote:

> [...]
> We should use publicly reviewed algorithms, document the risks and known
> challenges well, take into consideration provenance and other NiFi-specific
> features, and write a good summary of these features if/when they are
> introduced.
Re: Adding new data anonymization processor bundle
Sivaprasanna,

Thanks for joining this effort. I don’t recall what’s on the existing Jira, but please be very aware of the challenges in data anonymization and the various threat models: de-anonymizing data can lead to the leak of PII, EPHI, PCI data, etc. In some cases, it can even lead to physical danger to persons.

There are a number of high-impact examples of avoidable scenarios like this.

https://arstechnica.com/tech-policy/2009/09/your-secrets-live-online-in-databases-of-ruin/
https://arstechnica.com/tech-policy/2014/06/poorly-anonymized-logs-reveal-nyc-cab-drivers-detailed-whereabouts/

We should use publicly reviewed algorithms, document the risks and known challenges well, take into consideration provenance and other NiFi-specific features, and write a good summary of these features if/when they are introduced.

Andy LoPresto
alopre...@apache.org
alopresto.apa...@gmail.com
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4 BACE 3C6E F65B 2F7D EF69

> On Jun 20, 2018, at 10:06, Sivaprasanna wrote:
>
> Wow.. I didn't realize there was a JIRA already. I'm interested and would be
> happy to contribute my time & efforts on this.
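To make the re-identification risk described above concrete, here is a minimal sketch. It assumes a hypothetical "anonymizer" that simply replaces a short numeric ID with its unsalted hash; the function names and the 4-digit ID format are made up for illustration and are not part of NiFi, ARX, or anything proposed in this thread. Because the space of possible IDs is small, anyone can hash every candidate and reverse the mapping.

```python
import hashlib
from itertools import product
from string import digits


def pseudonymize(identifier: str) -> str:
    """Hypothetical anonymizer: replace an ID with its unsalted SHA-256 hex digest."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def reidentify(hashed_values, id_length=4):
    """Recover original IDs by hashing every possible ID in the (small) domain."""
    lookup = {}
    for combo in product(digits, repeat=id_length):
        candidate = "".join(combo)
        lookup[pseudonymize(candidate)] = candidate
    # Hashes whose source ID lies outside the enumerated domain map to None.
    return {h: lookup.get(h) for h in hashed_values}


# A "released" data set containing only hashed IDs.
released = [pseudonymize("0042"), pseudonymize("9317")]
recovered = reidentify(released)
```

Here `recovered` maps each released hash back to its original ID, which is why hashing alone should not be treated as anonymization when the input domain can be enumerated.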
Re: Adding new data anonymization processor bundle
Wow, I didn't realize there was a JIRA already. I'm interested and would be happy to contribute my time & efforts on this.

On Wed, Jun 20, 2018 at 10:34 PM, Matt Burgess wrote:

> I think this is a great idea. I filed a Jira [1] a while ago in case
> someone wanted to start working on it (or in case I got a chance).
> [...]
> [1] https://issues.apache.org/jira/browse/NIFI-4492
Re: Adding new data anonymization processor bundle
I think this is a great idea. I filed a Jira [1] a while ago in case someone wanted to start working on it (or in case I got a chance). It mentions ARX, but any Apache-friendly implementation is of course welcome. I think it should be in its own bundle, as it is functionality separate from all our other bundles (and not ubiquitous enough to put in the standard NAR).

Glad to hear you're interested in this. Please feel free to reach out with any questions; I too would be happy to review any contributions.

Thanks,
Matt

[1] https://issues.apache.org/jira/browse/NIFI-4492

On Wed, Jun 20, 2018 at 12:57 PM Mike Thomsen wrote:
>
> There's a framework called ARX that could be very useful for this.
> [...]
Re: Adding new data anonymization processor bundle
There's a framework called ARX that could be very useful for this. The only question is how compliant it would be with different sets of distinct legal requirements for privacy handling. In the absence of strong legal guidance, I'd say err on the side of complying with health care regulations, because that's where you're likely to find the clearest guidance and established tools.

Ping me on any PR you send.

On Wed, Jun 20, 2018 at 12:49 PM Sivaprasanna wrote:

> [...] I finally decided to go ahead
> with the creation of new processor bundle that has processors like
> 'AnonymizeRecord', 'DeanonymizeRecord' (not quite sure about the name
> though). [...]
Adding new data anonymization processor bundle
With data becoming more critical and substantial to business development, new stringent regulations & laws are being introduced (GDPR being a recent example). I've been spending some time lately doing research on data anonymization, and after some hefty thinking, I finally decided to go ahead with the creation of a new processor bundle that has processors like 'AnonymizeRecord' and 'DeanonymizeRecord' (not quite sure about the names though). Following are my questions:

- What do you guys think about these proposed processors?
- If the processors are okay to be introduced, are they "standard" enough to be added to our 'nifi-standard-bundles' module, or is it better to keep them separate, much like the AWS, Azure bundles, etc.?

Having said this, I'm very much in the beginning phase of my research and development efforts, so all your inputs & feedback on this one are greatly appreciated.

Thanks.

-
Sivaprasanna