Ryan,

Is this maybe a case of exhausting inodes on the filesystem rather than 
exhausting the space available?  If you do a 'df -i' on the system, what do you 
see for inode usage?
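
For anyone who would rather check this from a script, here is a minimal sketch that reads the same inode counters through Python's `os.statvfs`. The function names are made up for illustration, and the percentage is a plain used/total ratio, so it may differ slightly from the rounded figure `df -i` prints.

```python
import os


def summarize_inodes(total: int, free: int) -> dict:
    """Turn raw inode counts into roughly the figures `df -i` shows."""
    used = total - free
    # Some filesystems report zero total inodes; avoid dividing by zero.
    used_pct = (100.0 * used / total) if total > 0 else 0.0
    return {"total": total, "used": used, "free": free, "used_pct": used_pct}


def inode_usage(path: str) -> dict:
    """Read inode counts for the filesystem that contains `path`."""
    st = os.statvfs(path)  # POSIX only; not available on Windows
    # f_files is the total inode count, f_ffree the number still free.
    return summarize_inodes(st.f_files, st.f_ffree)
```

Calling `inode_usage("/var/nifi/repositories/content")` (the directory from the settings quoted below) would report on the content repository's partition. A `used_pct` near 100 with free space remaining would point to inode exhaustion.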

Warm regards,

Jim Williams | Manager, Site Reliability Engineering
O: +1 713.341.7812 | C: +1 919.523.8767 | 
jwilli...@alertlogic.com | http://www.alertlogic.com/

From: Joe Witt <joe.w...@gmail.com>
Sent: Thursday, September 17, 2020 10:19 AM
To: users@nifi.apache.org
Subject: Re: Content Claims Filling Disk - Best practice for small files?

can you share your flow.xml.gz?

On Thu, Sep 17, 2020 at 8:08 AM Ryan Hendrickson 
<ryan.andrew.hendrick...@gmail.com> wrote:
1.12.0

Thanks,
Ryan

On Thu, Sep 17, 2020 at 11:04 AM Joe Witt <joe.w...@gmail.com> wrote:
Ryan

What version are you using? I do think we had an issue that kept items around 
longer than intended that has been addressed.

Thanks

On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson 
<ryan.andrew.hendrick...@gmail.com> wrote:
Hello,
I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of data 
on my canvas.

However, the content repository (on its own partition) is completely full with 
350GB of data.  I'm pretty certain the way Content Claims store the data is 
responsible for this.  In previous experience, we've had files that are larger, 
and haven't seen this as much.

My guess is that as data streams through and is added to a claim, the claim 
isn't always released as the small files leave the canvas.
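
To put rough numbers on that guess, here is a toy model, not NiFi's actual release logic. It assumes a claim file can only be removed once every FlowFile written into it is gone, that all FlowFiles are the same size, and that each one independently remains on the canvas with the same probability. The function names are hypothetical.

```python
def retained_fraction(live_fraction: float, files_per_claim: int) -> float:
    """Probability that a claim file still holds at least one live FlowFile.

    Toy assumption: each FlowFile is still live with probability
    `live_fraction`, independently of the others in the same claim.
    """
    if not 0.0 <= live_fraction <= 1.0:
        raise ValueError("live_fraction must be between 0 and 1")
    if files_per_claim < 1:
        raise ValueError("files_per_claim must be at least 1")
    # A claim is removable only when all of its FlowFiles are gone.
    return 1.0 - (1.0 - live_fraction) ** files_per_claim


def amplification(live_fraction: float, files_per_claim: int) -> float:
    """Bytes held on disk per byte of live content under the toy model."""
    if live_fraction == 0.0:
        raise ValueError("no live content, so the ratio is undefined")
    return retained_fraction(live_fraction, files_per_claim) / live_fraction
```

Under these assumptions, with 100 FlowFiles per claim (the `nifi.content.claim.max.flow.files` value below), even a small live fraction keeps nearly every claim file on disk. The ratio of disk usage to live content then approaches `1 / live_fraction`.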

We've run into this issue enough times that I figure there's probably a "best 
practice for small files" for the content claims settings.

These are our current settings:
nifi.content.repository.implementation=org.apache.nifi.controller.repository.FileSystemRepository
nifi.content.claim.max.appendable.size=1 MB
nifi.content.claim.max.flow.files=100
nifi.content.repository.directory.default=/var/nifi/repositories/content
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
nifi.content.repository.archive.enabled=true
nifi.content.repository.always.sync=false

https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#content-repository

There are 1024 folders on the disk (0-1023) for the Content Claims.
Each file inside the folders is roughly 2 MB to 8 MB (which is odd because I 
thought the max appendable size would make this no larger than 1 MB).
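
A quick way to confirm those file sizes across all the folders is a small directory walk. The sketch below is only an illustration: the function name is made up, and the 1 MiB default for `limit_bytes` assumes "1 MB" in the settings means 1024 * 1024 bytes.

```python
import os


def summarize_files(root: str, limit_bytes: int = 1024 * 1024) -> dict:
    """Count the files under `root` and report how many exceed `limit_bytes`."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                sizes.append(os.path.getsize(os.path.join(dirpath, name)))
            except OSError:
                # The file may be removed while the walk is in progress.
                continue
    return {
        "files": len(sizes),
        "total_bytes": sum(sizes),
        "min_bytes": min(sizes) if sizes else 0,
        "max_bytes": max(sizes) if sizes else 0,
        "over_limit": sum(1 for size in sizes if size > limit_bytes),
    }
```

Running `summarize_files("/var/nifi/repositories/content")` on the repository above would show how many files are larger than the configured appendable size and how much space they hold in total.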

Is there a way to expand the number of folders and/or reduce the number of 
individual FlowFiles that are stored in each claim?

I'm hoping there might be a best practice out there though.

Thanks,
Ryan
