Hi Mark, thanks for your explanation. So you mean that a lot of updates are appended to partial files before they're renamed to snapshots during checkpointing, and the disk space is taken up by these partial files?
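The write-ahead log mechanism Mark describes can be illustrated with a small Python toy model. Everything in it is hypothetical: the `ToyWriteAheadLog` class, the `journal`/`snapshot` file names and the text record format. It is much simpler than NiFi's actual Java implementation. It only shows why a write-ahead log's disk usage tracks the number of updates since the last checkpoint rather than the number of live FlowFiles.

```python
from __future__ import annotations

import os
import tempfile


class ToyWriteAheadLog:
    """Hypothetical toy model of a write-ahead log (not NiFi's code).

    Every update is appended to a journal file; only checkpoint()
    folds the journal into a compact snapshot of the live state.
    """

    def __init__(self, directory: str) -> None:
        self.journal_path = os.path.join(directory, "journal")
        self.snapshot_path = os.path.join(directory, "snapshot")
        self.state: dict[str, dict[str, str]] = {}  # FlowFile id -> latest attributes
        open(self.journal_path, "w").close()  # start with an empty journal

    def _append(self, record: str) -> None:
        with open(self.journal_path, "a", encoding="utf-8") as journal:
            journal.write(record + "\n")

    def update(self, flowfile_id: str, attributes: dict[str, str]) -> None:
        # The journal grows with the number of updates, not with the number
        # of live FlowFiles: re-updating the same FlowFile appends again.
        self.state[flowfile_id] = attributes
        self._append(f"UPDATE {flowfile_id} {attributes!r}")

    def delete(self, flowfile_id: str) -> None:
        self.state.pop(flowfile_id, None)
        self._append(f"DELETE {flowfile_id}")

    def checkpoint(self) -> None:
        # Write only the live state to a temporary file, atomically rename
        # it over the old snapshot, then truncate the journal.
        tmp_path = self.snapshot_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as snapshot:
            for flowfile_id, attributes in self.state.items():
                snapshot.write(f"{flowfile_id} {attributes!r}\n")
        os.replace(tmp_path, self.snapshot_path)
        open(self.journal_path, "w").close()

    def disk_usage(self) -> int:
        total = os.path.getsize(self.journal_path)
        if os.path.exists(self.snapshot_path):
            total += os.path.getsize(self.snapshot_path)
        return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        wal = ToyWriteAheadLog(directory)
        big_attributes = {"payload": "x" * 1000}
        for _ in range(100):  # one FlowFile, updated 100 times
            wal.update("ff-1", big_attributes)
        print("before checkpoint:", wal.disk_usage(), "bytes")
        wal.checkpoint()
        print("after checkpoint: ", wal.disk_usage(), "bytes")
```

In this toy, the journal holds a hundred copies of the same large attribute map before the checkpoint and a single copy afterwards. Large or frequently updated attributes, combined with a long (or failing) checkpoint cycle, therefore inflate a log of this kind.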
Unfortunately, the log files were removed by someone else, and I don't quite remember whether there were any FlowFile repo related errors. What I do remember is seeing one "too many open files" error in the NiFi log; I'm not sure whether that error has any connection to the disk space problem. Thanks.

Regards,
Ben

2017-07-14 9:08 GMT+08:00 Mark Payne <[email protected]>:
> Ben,
>
> The FlowFile Repository is implemented as a Write-Ahead Log. It just keeps
> appending 'updates' to the files until it 'checkpoints' (by default every
> 2 minutes). So if you have a lot of FlowFiles flowing through, those
> updates can take a lot of disk space, especially if you have a lot of
> attributes on your FlowFiles or some really big attributes.
> You can make it checkpoint more frequently by changing the
> "nifi.flowfile.repository.checkpoint.interval" property in
> nifi.properties.
>
> That said, 30 GB is quite large for the FlowFile Repository, so you would
> need to have a lot of FlowFiles with really large attributes to reach that
> point. I have also seen issues where, if you don't have enough open file
> handles, it may fail to checkpoint, and so it can keep growing. Do you have
> any errors in the logs about the FlowFile Repo?
>
> Thanks
> -Mark
>
>
> On Jul 13, 2017, at 8:42 PM, 尹文才 <[email protected]> wrote:
>
> Thanks Russ, I checked your notes and found them very helpful.
> Like you, I'm also using NiFi as a tool for ETL processes. The reason I'm
> curious about the disk space problem is that I came across an
> out-of-disk-space error in NiFi. I checked the disk usage of the NiFi
> folders and found that the FlowFile repository took up about 30 GB, which
> astonished me.
>
> I happened to read about some NiFi internals in this document:
> https://github.com/JPercivall/nifi/blob/NIFI-1028/nifi-docs/src/main/asciidoc/nifi-in-depth.adoc
> which mentions that the FlowFile repository merely keeps the metadata for
> all live FlowFiles currently being processed in NiFi, so I was wondering
> why the repository would reach 30 GB.
>
> Regards,
> Ben
>
> 2017-07-13 23:52 GMT+08:00 Russell Bateman <[email protected]>:
>
>> Ben,
>>
>> I took these notes last spring; they may be useful to you.
>>
>> http://www.javahotchocolate.com/notes/nifi.html#20170428
>>
>> Hope this helps,
>>
>> Russ
>>
>>
>> On 07/12/2017 08:38 PM, 尹文才 wrote:
>>
>> Hi guys, I have a question about cleaning up the disk space used by NiFi
>> from time to time.
>> As you know, NiFi saves a lot to disk, such as the repository folders. I
>> checked the official NiFi admin guide, and I know the content repository
>> supports toggling content archiving, so to save disk space I could adjust
>> the content archiving parameters or simply turn archiving off. My question
>> is that I didn't find any similar options for the other repository
>> folders: do they support archiving as well? What are the best practices to
>> keep the disk space used by NiFi as low as possible? (I'm running NiFi on
>> a single machine, and sometimes I see out-of-disk-space errors on the NiFi
>> bulletin board.) Thanks.
>>
>> Regards,
>> Ben
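For reference, the two knobs discussed in this thread, the FlowFile repository checkpoint interval and content repository archiving, are set in nifi.properties. The fragment below is a sketch of what adjusting them might look like. The values are illustrative assumptions rather than recommendations, and defaults may differ between NiFi versions.

```properties
# FlowFile repository: checkpoint more often than the 2-minute default
# Mark mentions, so the write-ahead log is compacted sooner (example value)
nifi.flowfile.repository.checkpoint.interval=30 secs

# Content repository archiving (example values)
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=6 hours
nifi.content.repository.archive.max.usage.percentage=50%
```

Because nifi.properties is read at startup, NiFi has to be restarted for changes like these to take effect.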
