Ben,

The FlowFile Repository is implemented as a Write-Ahead Log. It keeps appending 'updates' to its files until it 'checkpoints' (by default, every 2 minutes). So if you have a lot of FlowFiles flowing through, those updates can take up a lot of disk space, especially if you have a lot of attributes on your FlowFiles or some really big attributes. You can make it checkpoint more frequently by changing the "nifi.flowfile.repository.checkpoint.interval" property in nifi.properties.
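
For illustration, the relevant line in nifi.properties might look like the sketch below. The "30 secs" value is only an assumed example, not a recommendation; the commented-out line shows the 2-minute default mentioned above. NiFi needs a restart to pick up changes to nifi.properties.

```properties
# Default mentioned above: checkpoint the FlowFile Repository every 2 minutes
# nifi.flowfile.repository.checkpoint.interval=2 mins

# Assumed example value: checkpoint more often, so the write-ahead log
# is compacted sooner and accumulates fewer updates between checkpoints
nifi.flowfile.repository.checkpoint.interval=30 secs
```

A shorter interval trades somewhat more frequent checkpoint work for a smaller log between checkpoints, so the right value depends on your FlowFile volume and attribute sizes.
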
That said, 30 GB is quite large for the FlowFile Repository, so you would need to have a lot of FlowFiles with really large attributes to reach that point. I have also seen issues where, if you don't have enough open file handles, it may fail to checkpoint and so keeps growing. Do you have any errors in the logs about the FlowFile Repo?

Thanks
-Mark

On Jul 13, 2017, at 8:42 PM, 尹文才 wrote:

Thanks Russ, I checked your notes and found them very helpful. Just like you, I'm also using NiFi as a tool for ETL processes, and the reason I'm curious about the disk space problem is that I came across a disk-out-of-space error in NiFi. I checked the disk usage of the NiFi folders and found that the FlowFile repository took up about 30 GB, which astonished me. I happened to read about some NiFi internals in this document:
https://github.com/JPercivall/nifi/blob/NIFI-1028/nifi-docs/src/main/asciidoc/nifi-in-depth.adoc
which mentions that the FlowFile repository merely keeps the metadata for all live FlowFiles currently being processed in NiFi, so I was wondering why the repository would reach 30 GB.

Regards,
Ben

2017-07-13 23:52 GMT+08:00 Russell Bateman:

Ben,

I took these notes last spring that may be useful to you.
http://www.javahotchocolate.com/notes/nifi.html#20170428

Hope this helps,
Russ

On 07/12/2017 08:38 PM, 尹文才 wrote:

Hi guys,

I have a question about cleaning up the disk space used by NiFi from time to time. As you know, NiFi saves a lot to disk, such as the repository folders. I checked the official NiFi admin guide and I know the content repository supports toggling content archiving. So in order to save disk space, I could adjust the content archiving parameters or simply turn archiving off. My question is: I didn't find any similar options for the other repository folders. Do they support archiving as well?

What are the best practices for keeping the disk space used by NiFi as low as possible? (I'm running NiFi on a single machine and sometimes I see a disk-out-of-space error on the NiFi bulletin board.)

Thanks.

Regards,
Ben
