Hi Jeremy,

These are great questions and I appreciate your interest in securing data at 
all stages for your application.

Setting nifi.content.repository.archive.enabled=false turns off content 
repository archiving, but the content will still sit at rest on the file system 
for as long as the data is in use in the flow. To avoid persisting any content 
data to the file system at all, set

nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository

This directs NiFi to hold the content in memory during operation, with the 
understanding that a power loss or process restart can cause data loss.

You can do the same for the provenance repository, with the same caveat, by 
setting

nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
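
For reference, here is a minimal sketch of how the two implementation settings 
above might look together in conf/nifi.properties. The comments are mine, and 
this assumes every other property in the file is left at its current value:

```properties
# Hold FlowFile content in memory only; nothing is written to the
# on-disk content repository. Content in flight is lost on restart.
nifi.content.repository.implementation=org.apache.nifi.controller.repository.VolatileContentRepository

# Hold provenance events in memory only, with the same caveat.
nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
```

NiFi reads this file at startup, so a restart is needed for the change to take 
effect. Note that there is no trailing period on either value.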

Unfortunately, at this time these settings are global for all NiFi data, rather 
than specific to a processor/process group.

I am working on efforts to provide the following features (and need to get them 
posted in the wiki roadmap to solicit feedback from the community):

* Transparent data encryption for repositories
        * Provenance
        * Content
        * Flowfile (attributes)
* Sensitive attributes
* Cryptographic signatures for provenance event records and lineage chains
* Features to ease data segmentation/isolation (e.g. raw data comes into an 
input port/source processor and is routed by attribute/signature to different 
nodes/clusters with varying security levels or underlying security 
hardening/policies)

I would suggest you stay tuned to the mailing list (off the top of my head, I 
can’t remember if changes to the wiki are posted to users@, so you might want 
to subscribe to dev@ as well), and I welcome your input on these feature 
development efforts. There are other members of our community who are 
similarly security-minded, and I think we will get some great collaboration on 
this moving forward.

Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Oct 20, 2016, at 2:03 PM, Jeremy Farbota <[email protected]> wrote:
> 
> Hello,
> 
> I'm using NiFi in a compliance setting. One of my use cases is for deheading 
> (hashing names, ssns, etc) and republishing. It works great for these tasks 
> but I need to cover my bases to make sure things are not stored on disk. E.g. 
> when I extract a name to an attribute for hashing, I do not want to store it 
> unencrypted at rest in the provenance repo.
> 
> It seems I can turn off the content repo with this setting:
> nifi.content.repository.archive.enabled=false
> 
> Is flowfile content stored on disk anywhere once the flowfile is dropped with 
> the setting above?
> 
> Regarding the provenance repo, the settings offer the ability to truncate the 
> attribute on retrieval e.g.
> nifi.provenance.repository.max.attribute.length=8
> 
> Does the above setting change only what can be retrieved or does it limit 
> what is stored?
> 
> If it is still storing all the attributes, then I will likely need to greatly 
> reduce the provenance repo max.storage.time. Would severely limiting the 
> provenance or content repo negatively affect NiFi's performance?
> 
> Is there a way that I can have these "secure" settings only for certain 
> templates? Or are these provenance and content repo setting only configurable 
> server wide?
> 
> Has there ever been thought to enable encryption at rest of the provenance 
> repo to deal with situations like mine?
> 
> Thanks in advance.
> 
> --
> 
>  <http://www.payoff.com/>
> Jeremy Farbota
> Software Engineer, Data
> [email protected] <mailto:[email protected]> • (217) 898-8110 
> <tel:(949)+430-0630>
> I'm a Storyteller. Discover your Financial Personality! 
> <https://www.payoff.com/quiz>
>   <https://www.facebook.com/payoff>   <https://www.twitter.com/payoff>  
> <https://www.linkedin.com/company/payoff-com>

