Re: Adding new data anonymization processor bundle

2018-06-20 Thread Mike Thomsen
Andy, You raise a great point about considering the provenance. Unless there's a way to exclude attributes from provenance tracking, I think we'd need to force the issue by not allowing attributes to be an input source for expression language. That's the only way to kinda force people to think

Re: Adding new data anonymization processor bundle

2018-06-20 Thread Andy LoPresto
Sivaprasanna, Thanks for joining this effort. I don’t recall what’s on the existing Jira, but please be very aware of the challenges in data anonymization and the various threat models — de-anonymizing data can lead to the leak of PII, EPHI, PCI data, etc. In some cases, it can even lead to

Re: Adding new data anonymization processor bundle

2018-06-20 Thread Sivaprasanna
Wow, I didn't realize there was a JIRA already. I'm interested and would be happy to contribute my time & efforts on this. On Wed, Jun 20, 2018 at 10:34 PM, Matt Burgess wrote: > I think this is a great idea, I filed a Jira [1] a while ago in case > someone wanted to start working on it (or in case I

Re: Adding new data anonymization processor bundle

2018-06-20 Thread Matt Burgess
I think this is a great idea, I filed a Jira [1] a while ago in case someone wanted to start working on it (or in case I got a chance). It mentions ARX but any Apache-friendly implementation is of course welcome. I think it should be in its own bundle as it is functionality separate from all our other

Re: Adding new data anonymization processor bundle

2018-06-20 Thread Mike Thomsen
There's a framework called ARX that could be very useful for this. The only question is how compliant it would be with different sets of distinct legal requirements for privacy handling. In the absence of strong legal guidance, I'd say err on the side of complying with health care