Ben,

Yes, precisely.
If you are seeing errors about too many open files, then you can definitely have problems with this. Since NiFi is unable to create a new 'partial file', it will keep writing to the current one. This means that the old data never gets compacted, so it will grow indefinitely. The Admin Guide [1] contains instructions on how to configure the max number of open file handles.

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html

On Jul 13, 2017, at 10:05 PM, 尹文才 <[email protected]> wrote:

Hi Mark, thanks for your explanation. So you mean a lot of updates were appended to a lot of partial files before they were renamed to snapshots during check-pointing, and the disk space is taken up by these partial files?

Unfortunately the log files were removed by someone else, and I don't quite remember whether there were any FlowFile-repo-related errors. What I do remember is seeing one error in the NiFi log mentioning too many open files; I'm not sure whether that error has any connection to the disk space problem. Thanks.

Regards,
Ben

2017-07-14 9:08 GMT+08:00 Mark Payne <[email protected]>:

Ben,

The FlowFile Repository is implemented as a write-ahead log. It just keeps appending 'updates' to the files until it 'checkpoints' (by default every 2 minutes). So if you have a lot of FlowFiles flowing through, those updates can take a lot of disk space, especially if you have a lot of attributes on your FlowFiles, or some really big attributes. You can make it checkpoint more frequently by changing the "nifi.flowfile.repository.checkpoint.interval" property in nifi.properties.

That said, 30 GB is quite large for the FlowFile Repository, so you would need to have a lot of FlowFiles with really large attributes to reach that point. I have also seen issues where, if you don't have enough open file handles, it may fail to checkpoint, and so it can keep growing. Do you have any errors in the logs about the FlowFile Repo?

Thanks
-Mark

On Jul 13, 2017, at 8:42 PM, 尹文才 <[email protected]> wrote:

Thanks Russ, I checked your notes and found them very helpful. Just like you, I'm also using NiFi as a tool for ETL processes, and the reason I'm curious about the disk space problem is that I came across a disk-out-of-space error in NiFi. I checked the disk usage of the NiFi folders and found that the FlowFile repository took up about 30 GB, which astonished me. I happened to read about some NiFi internals in this document: https://github.com/JPercivall/nifi/blob/NIFI-1028/nifi-docs/src/main/asciidoc/nifi-in-depth.adoc which mentions that the FlowFile repository merely keeps the metadata for all live FlowFiles currently being processed in NiFi, so I was wondering why the repository would reach 30 GB.

Regards,
Ben

2017-07-13 23:52 GMT+08:00 Russell Bateman <[email protected]>:

Ben,

I took these notes last spring that may be useful to you. http://www.javahotchocolate.com/notes/nifi.html#20170428

Hope this helps,

Russ

On 07/12/2017 08:38 PM, 尹文才 wrote:

Hi guys, I have a question about cleaning up the disk space used by NiFi from time to time. As you know, NiFi saves a lot to disk, like the repository folders. I checked the NiFi official admin guide, and I know the content repository supports toggling content archiving. So in order to save disk space, I could adjust the content archiving parameters or simply turn it off.
My question is that I didn't find any similar options for the other repository folders; do they support archiving as well? What are the best practices for keeping the disk space used by NiFi as low as possible? (I'm running NiFi on a single machine, and sometimes I see a disk-out-of-space error on the NiFi bulletin board.) Thanks.

Regards,
Ben
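The content-archiving settings Ben mentions live in nifi.properties. A minimal sketch of the relevant entries (the values shown are the usual defaults, not tuning advice; check the Admin Guide for your version):

    # Archived content is deleted once it is older than the retention period...
    nifi.content.repository.archive.max.retention.period=12 hours
    # ...or once the content repository partition exceeds this usage
    nifi.content.repository.archive.max.usage.percentage=50%
    # Set to false to disable content archiving entirely
    nifi.content.repository.archive.enabled=true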
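The checkpoint interval Mark refers to is likewise a single nifi.properties entry. A sketch assuming the 2-minute default he cites:

    # Default is every 2 minutes; a smaller value makes the write-ahead
    # log roll its partial files into a snapshot more often, trading
    # FlowFile repository growth for extra I/O
    nifi.flowfile.repository.checkpoint.interval=2 mins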
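The open-file-handle limit discussed at the top of the thread is an operating-system setting rather than a NiFi one. A sketch for Linux, assuming a pam_limits-based distro and that NiFi runs as a user named 'nifi' (both the user name and the 50000 value are assumptions, not recommendations):

    # Check the limit in effect for the current shell/user
    ulimit -n

    # Raise it persistently by adding lines like these to
    # /etc/security/limits.conf, then logging in again:
    nifi soft nofile 50000
    nifi hard nofile 50000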
