[ 
https://issues.apache.org/jira/browse/NIFI-3388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16927593#comment-16927593
 ] 

Deepak commented on NIFI-3388:
------------------------------

Hi Andy,

Is there any update on when this will be available?

Thanks
Deepak

> Provide encrypted repository implementations
> --------------------------------------------
>
>                 Key: NIFI-3388
>                 URL: https://issues.apache.org/jira/browse/NIFI-3388
>             Project: Apache NiFi
>          Issue Type: Epic
>          Components: Core Framework
>    Affects Versions: 1.1.1
>            Reporter: Andy LoPresto
>            Assignee: Andy LoPresto
>            Priority: Critical
>              Labels: data-integrity, encryption, repository, security
>
> Apache NiFi secures data within the application but the various 
> *repositories* -- content, provenance, flowfile (aka attribute), and to a 
> lesser extent bulletin, counter, component status, and log -- are stored 
> unencrypted on disk. Thus far, OS-level access control policies and full disk 
> encryption (FDE) have been recommended to secure these repositories. However, 
> the underlying raw data can be viewed, and possibly manipulated, by a user 
> with access to the repository files. 
> With additional focus on data security (confidentiality, integrity, *and* 
> authentication), especially as more users intend to deploy NiFi on 
> third-party provisioned hardware and operating systems (AWS, Azure, etc.), 
> further steps should be taken to secure the repository data which NiFi writes 
> and reads. 
> Each of the repository implementations adheres to an interface definition:
> * Content: {{ContentRepository}}
> ** {{FileSystemRepository}} *
> ** {{VolatileContentRepository}}
> ** {{MockContentRepository}}
> * Provenance: {{ProvenanceRepository extends ProvenanceEventRepository}}
> ** {{PersistentProvenanceRepository}} (to be deprecated via 
> [NIFI-3356|https://issues.apache.org/jira/browse/NIFI-3356])*
> ** {{WriteAheadProvenanceRepository}} (to be introduced via NIFI-3356) * 
> ** {{VolatileProvenanceRepository}}
> ** {{MockProvenanceRepository}}
> * Flowfile: {{FlowFileRepository}} / {{FlowFileEventRepository}}
> ** {{WriteAheadFlowFileRepository implements FlowFileRepository}} *
> ** {{VolatileFlowFileRepository implements FlowFileRepository}}
> ** {{MockFlowFileRepository implements FlowFileRepository}}
> ** {{RingBufferEventRepository implements FlowFileEventRepository}}
> The bulletin, counter, component status, and log repositories currently have 
> only *volatile* implementations, and are not addressed in this ticket. The 
> repository implementations noted above with an asterisk will have new 
> implementations provided of the form 
> {{EncryptedWriteAheadFlowFileRepository}}, extending the behavior of the 
> existing repository and adding transparent encryption/decryption on 
> serialization, following the existing interface contracts and thus invisible 
> to the higher level code delegating these operations, aside from 
> configuration requirements.  
> There are substantial concerns to address in this approach. 
> * Should the various repositories all be required to have the same 
> encrypted/plaintext status (i.e. can they be encrypted independently)?
> * Should all encrypted repositories use the same encryption key, or should it 
> be segmented by repository?
> * If a content or provenance repository has multiple shards, do they all 
> require the same level of encryption? If not, can "sensitive" records be 
> routed to an encrypted repository and "normal" records to a plaintext 
> repository for performance reasons? 
> * Can a plaintext repository have encryption enabled at any time? Can an 
> encrypted repository have encryption removed? 
> * Performance impact on reading and writing from the repositories has not yet 
> been measured
> ** The provenance repository requires many fast writes and reads during high 
> performance and query operations
> ** The flowfile repository requires many fast writes and reads
> ** The content repository requires fewer reads and writes, but the current 
> content repository stores multiple flowfile contents in the same ["sections" 
> of the 
> "containers"|https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html#deeper-view-content-claim]
>  that make up the repository, so content claims will need to be encrypted 
> separately to allow random-access seek and retrieve (i.e. if a 10 byte claim 
> and a 10 GB claim are stored in the same section, the 10 B claim must be 
> retrievable without reading 10 GB into memory to decrypt)
> * The provenance repository uses GZip compression to improve the use of disk 
> space. Compression over encrypted data will yield close to zero improvement 
> (as encryption intentionally generates near-random output, which means 
> pattern recognition/entropy removal will have no effect), and compression 
> before encryption can cause security vulnerabilities (see 
> [CRIME|http://security.stackexchange.com/a/19914/16485], [Compression and 
> Information Leakage of Plaintext by John 
> Kelsey|https://dl.acm.org/citation.cfm?id=741226], etc.)
> * The provenance repository event records are indexed by Lucene to allow 
> retrieval through the provenance query system, but encrypted fields cannot be 
> indexed. HMAC/CHF (Hash-based message authentication codes/cryptographic hash 
> functions) may provide an alternative for non-fuzzy matching for information 
> retrieval
> * The flowfile repository implementation uses swap files to maintain flowfile 
> state if too many are loaded into memory -- these swap files (and anything 
> else persisted to disk) will need to be encrypted
> * The content and provenance repositories can be spread across multiple 
> physical volumes. In that case, should data stored on different disks be 
> encrypted with the same key or unique keys (perhaps derived from a master key 
> using a disk identifier)? If a constituent disk is swapped out, will that 
> data be recoverable?
> * How is the configuration of encrypted repositories handled (i.e. new 
> properties in {{nifi.properties}})?
> ** How are the keys generated and secured?
> ** What permissions/policies are required to configure these properties?
> ** What UI / API signals (if any) are provided to inform users of the 
> encrypted/plaintext status of the repositories?
> * What encryption algorithm(s) is/are used?
> ** Performance considerations (CTR vs. CBC)
> ** AEAD considerations (GCM, CCM, vs. CBC/CTR + HMAC/SHA-256)
> * What actions should be taken if the encrypted data cannot be read 
> (authentication tag corrupted, cipher text malformed, etc.)? These are risk 
> vectors for DoS attacks
> I am currently working on this issue (planning & architecture stages), so I 
> would appreciate community feedback in order to provide the best possible 
> solution that balances the security, performance, and usability needs of 
> everyone. I will likely break the work into the following subtasks:
> * Build/consume encapsulated encryption service layer (see 
> [{{AESKeyedCipherProvider}}|https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/crypto/AESKeyedCipherProvider.java]
>  and 
> [{{KeyedEncryptor}}|https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-processors/src/main/java/org/apache/nifi/processors/standard/util/crypto/KeyedEncryptor.java])
> * Build {{EncryptedWriteAheadFlowFileRepository}}
> * Build {{EncryptedWriteAheadProvenanceRepository}}
> * Build {{EncryptedFileSystemContentRepository}} (order may change depending 
> on further investigation)
> For anyone interested in further detail on the existing repository design, 
> implementations, and use, see:
> * [NiFi's Write-Ahead Log Implementation (FlowFile 
> Repository)|https://cwiki.apache.org/confluence/display/NIFI/NiFi%27s+Write-Ahead+Log+Implementation]
> * [Persistent Provenance Repository 
> Design|https://cwiki.apache.org/confluence/display/NIFI/Persistent+Provenance+Repository+Design]
> * [NiFi 
> In-Depth|https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html]
> * [Admin 
> Guide|https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html]
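
The description's point that content claims must be encrypted separately, so that a 10 B claim can be read without decrypting a 10 GB neighbour, could look roughly like the sketch below. It is an illustration only, not the NiFi implementation: the class name, the method names, and the record layout (IV, then ciphertext, then tag) are assumptions, and AES-GCM is just one of the AEAD options the ticket lists. It also handles a claim fully in memory; a very large claim would need a streaming approach instead.

```java
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

/** Hypothetical helper (not a NiFi class): each claim is its own AES-GCM record. */
final class ClaimCipherSketch {
    private static final int IV_LENGTH = 12;  // 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private ClaimCipherSketch() {}

    /** Encrypts one claim and returns IV || ciphertext || tag. */
    static byte[] encryptClaim(byte[] key, String claimId, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LENGTH];
            RANDOM.nextBytes(iv); // fresh IV per claim; never reuse an IV with the same key
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, iv));
            // Bind the record to its claim id so it cannot be swapped with another claim.
            cipher.updateAAD(claimId.getBytes(StandardCharsets.UTF_8));
            byte[] ciphertextAndTag = cipher.doFinal(plaintext);
            return ByteBuffer.allocate(iv.length + ciphertextAndTag.length)
                    .put(iv).put(ciphertextAndTag).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Claim encryption failed", e);
        }
    }

    /** Decrypts one record; fails if the data, tag, or claim id was altered. */
    static byte[] decryptClaim(byte[] key, String claimId, byte[] record) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                    new GCMParameterSpec(TAG_BITS, record, 0, IV_LENGTH));
            cipher.updateAAD(claimId.getBytes(StandardCharsets.UTF_8));
            return cipher.doFinal(record, IV_LENGTH, record.length - IV_LENGTH);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Claim could not be authenticated or decrypted", e);
        }
    }
}
```

Because every record carries its own IV and tag, a reader that knows a claim's offset and length can decrypt that claim alone. A failed tag check surfaces as an exception, which is where the ticket's question about handling unreadable data would have to be answered.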

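On the indexing concern, the suggestion that HMAC could support non-fuzzy matching might work as follows: index a keyed hash of each field value rather than the value itself, and hash the search term the same way at query time. This is a minimal sketch under assumptions. The class and method names are hypothetical, it says nothing about how NiFi's Lucene indexing would be wired up, and it only supports exact-match lookups.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

/** Hypothetical helper (not a NiFi class): keyed tokens for exact-match indexing. */
final class BlindIndexSketch {
    private BlindIndexSketch() {}

    /** Returns a hex HMAC-SHA256 token for (fieldName, value) under indexKey. */
    static String indexToken(byte[] indexKey, String fieldName, String value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(indexKey, "HmacSHA256"));
            // Include the field name so equal values in different fields do not collide.
            mac.update(fieldName.getBytes(StandardCharsets.UTF_8));
            mac.update((byte) 0); // separator between field name and value
            byte[] tag = mac.doFinal(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(tag.length * 2);
            for (byte b : tag) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Index token computation failed", e);
        }
    }
}
```

The token is what would be stored in the index, and a query would compute the same token from the search term. Equal values still produce equal tokens, so the index reveals which records share a value; that trade-off would need to be weighed in the design.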

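For the question of unique keys per disk "derived from a master key using a disk identifier", one possible shape is a keyed derivation such as the sketch below. It is a simplified single HMAC step, not a full HKDF, and the class name, method name, and label prefix are assumptions made for illustration.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;

/** Hypothetical helper (not a NiFi class): one derived key per repository volume. */
final class VolumeKeySketch {
    private VolumeKeySketch() {}

    /** Derives a 32-byte key for the given volume identifier from the master key. */
    static byte[] deriveVolumeKey(byte[] masterKey, String volumeId) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
            // The label prefix separates this use of the master key from any other use.
            String label = "repository-volume:" + volumeId;
            return mac.doFinal(label.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Volume key derivation failed", e);
        }
    }
}
```

Since the derivation is deterministic, data on a disk stays recoverable as long as the master key and the identifier used for that disk are still known. If the identifier came from the hardware and changed when a disk was replaced, the old key could no longer be re-derived, so a configured, stable identifier would probably be the safer input.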

--
This message was sent by Atlassian Jira
(v8.3.2#803003)
