Vijay

nifi.content.repository.archive.max.retention.period=6 hours
nifi.content.repository.archive.max.usage.percentage=40%

Did you actually run out of disk space?  What error did you get?
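
In case it helps answer that, here is a rough sketch for measuring how much space the content repository takes and how full its volume is. It uses only the Python standard library. The function name `repo_usage` is made up for illustration, and the path in the usage note is the one from your properties file.

```python
import os
import shutil


def repo_usage(path):
    """Return (bytes held by files under path, percent of the volume in use)."""
    repo_bytes = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                repo_bytes += os.path.getsize(os.path.join(root, name))
            except OSError:
                # The file may have been removed while we were walking the tree.
                pass
    usage = shutil.disk_usage(path)  # total/used/free for the whole volume
    volume_pct = 100.0 * usage.used / usage.total
    return repo_bytes, volume_pct
```

For example, `repo_usage("/var/foo/bar/content_repository")` run before and after a batch would show whether the repository itself is growing or something else on the volume is.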

We do remove content from the content repository when there is no
longer an active flow file that points at that version of content AND
when we need to free up space.
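
To illustrate the "no longer an active flow file that points at it" part, here is a toy reference-counting model. It is a sketch under assumptions, not NiFi's actual code: the class and method names are invented, and the idea that several flow files can share one claim is only inferred from settings such as nifi.content.claim.max.flow.files.

```python
from collections import defaultdict


class ContentRepoModel:
    """Toy model: a claim's bytes stay on disk until no flow file references it."""

    def __init__(self):
        self.sizes = defaultdict(int)  # claim id -> bytes on disk
        self.refs = defaultdict(set)   # claim id -> ids of flow files pointing at it

    def add_flowfile(self, ff_id, claim_id, size):
        # Assumption: content of several flow files may land in the same claim.
        self.sizes[claim_id] += size
        self.refs[claim_id].add(ff_id)

    def drop_flowfile(self, ff_id, claim_id):
        # The flow file left the flow (or its queue was cleared).
        self.refs[claim_id].discard(ff_id)

    def cleanup(self):
        """Delete claims with no remaining references; return bytes freed."""
        freed = 0
        for claim_id in list(self.sizes):
            if not self.refs[claim_id]:
                freed += self.sizes.pop(claim_id)
                del self.refs[claim_id]
        return freed

    def disk_usage(self):
        return sum(self.sizes.values())
```

In this model, one flow file still sitting in a queue is enough to keep the whole claim on disk, which would be consistent with the size only dropping after the queues are cleared.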

What version are you using?

Thanks

On Mon, Dec 20, 2021 at 10:55 AM Vijay Chhipa <vchh...@apple.com> wrote:
>
> Hi all,
>
> We have a use case where we list out the contents of a website and then 
> download each item in the list and process it.
> What I expected is that when each item (a file) is downloaded, after 
> processing is completed, and the flowfile is not in any of the queues the 
> disk storage will be released. But what I see is the content-repo size 
> continues to increase as the files are processed. If I pause the flow for 
> several hours (over 24 hours) the repo size stays at the increased level and 
> does not go down. Only when I clear all the queues does the content-repo size 
> go down to the original size (before the flow started).
>
> I am not using provenance and have disabled it.
> Here is the relevant section of the properties file.
>
> I would have been okay with it, but I need to process over 200K files, each 
> almost 1 GB in size.
>
> What is holding references to these processed flow files, and how can I design 
> the dataflow so that the content repo does not fill up?
>
> nifi.flowfile.repository.implementation=org.apache.nifi.controller.repository.WriteAheadFlowFileRepository
> nifi.flowfile.repository.wal.implementation=org.apache.nifi.wali.SequentialAccessWriteAheadLog
> nifi.flowfile.repository.directory=/var/foo/bar/flowfile_repository
> nifi.flowfile.repository.partitions=256
> nifi.flowfile.repository.checkpoint.interval=2 mins
> nifi.flowfile.repository.always.sync=false
>
> # Content Repository
> nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
> nifi.content.claim.max.appendable.size=1 MB
> nifi.content.claim.max.flow.files=10
> nifi.content.repository.directory.default=/var/foo/bar/content_repository
> nifi.content.repository.archive.max.retention.period=6 hours
> nifi.content.repository.archive.max.usage.percentage=40%
> nifi.content.repository.archive.enabled=false
> nifi.content.repository.always.sync=false
> nifi.content.viewer.url=../nifi-content-viewer/
>
> # Provenance Repository Properties
> nifi.provenance.repository.implementation=org.apache.nifi.provenance.VolatileProvenanceRepository
> nifi.provenance.repository.debug.frequency=1_000_000
> nifi.provenance.repository.encryption.key.provider.implementation=
> nifi.provenance.repository.encryption.key.provider.location=
> nifi.provenance.repository.encryption.key.id=
> nifi.provenance.repository.encryption.key=
>
> # Persistent Provenance Repository Properties
> nifi.provenance.repository.directory.default=/var/foo/bar/provenance_repository
> nifi.provenance.repository.max.storage.time=24 hours
> nifi.provenance.repository.max.storage.size=1 GB
> nifi.provenance.repository.rollover.time=30 secs
> nifi.provenance.repository.rollover.size=100 MB
> nifi.provenance.repository.query.threads=2
> nifi.provenance.repository.index.threads=2
> nifi.provenance.repository.compress.on.rollover=true
> nifi.provenance.repository.always.sync=false
>
>
> nifi.provenance.repository.indexed.fields=EventType, FlowFileUUID, Filename, ProcessorID, Relationship
>
> nifi.provenance.repository.indexed.attributes=
>
> nifi.provenance.repository.index.shard.size=500 MB
> nifi.provenance.repository.max.attribute.length=65536
> nifi.provenance.repository.concurrent.merge.threads=2
>
> nifi.provenance.repository.warm.cache.frequency=1 hour
> nifi.provenance.repository.buffer.size=100000
>
> Thanks
> Vijay
