Ben,

Yes, precisely.

If you are seeing errors about too many open files, then this can definitely
cause the problem. When NiFi is unable to create a new 'partial file', it keeps
writing to the current one. This means that the old data never gets compacted,
so the repository grows indefinitely. The Admin Guide [1] contains instructions
on how to configure the maximum number of open file handles.
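For a quick check on Linux, the current limit for the user running NiFi can be
inspected with ulimit. (The 'nifi' user name and the 50000 value below are
illustrative examples, not values taken from the guide:)

```shell
# Show the per-process open-file limit for the current shell's user.
ulimit -n

# To raise it persistently for the user running NiFi, one common approach
# is to add lines like these to /etc/security/limits.conf and then log in
# again. The user name and limit are illustrative:
#   nifi  hard  nofile  50000
#   nifi  soft  nofile  50000
```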

Thanks
-Mark

[1] http://nifi.apache.org/docs/nifi-docs/html/administration-guide.html



On Jul 13, 2017, at 10:05 PM, 尹文才 <[email protected]> wrote:

Hi Mark, thanks for your explanation. So you mean that a lot of updates were
appended to many partial files before they were renamed to snapshots during
check-pointing, and the disk space was taken up by these partial files?

Unfortunately, the log files were removed by someone else, and I don't quite
remember whether there were any FlowFile-repo-related errors. What I do
remember is seeing one error about "too many open files" in the NiFi log.
I'm not sure whether this error has any connection to the disk space problem. Thanks.

Regards,
Ben

2017-07-14 9:08 GMT+08:00 Mark Payne <[email protected]>:
Ben,

The FlowFile Repository is implemented as a Write-Ahead Log. It just keeps
appending 'updates' to the files until it 'checkpoints' (by default, every 2
minutes). So if you have a lot of FlowFiles flowing through, those updates can
take up a lot of disk space, especially if your FlowFiles carry many attributes
or some really big ones. You can make it checkpoint more frequently by changing
the "nifi.flowfile.repository.checkpoint.interval" property in nifi.properties.

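For reference, the relevant line in nifi.properties looks like this; the
'2 mins' value shown is, to the best of my knowledge, the default, and a
smaller value makes NiFi checkpoint (and compact the repo) more often:

```
# nifi.properties excerpt -- value shown is believed to be the default.
nifi.flowfile.repository.checkpoint.interval=2 mins
```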
That said, 30 GB is quite large for the FlowFile Repository, so you would need
a lot of FlowFiles with really large attributes to reach that point. I have
also seen cases where, if you don't have enough open file handles, checkpointing
can fail, and so the repository keeps growing. Do you have any errors in the
logs about the FlowFile Repo?

Thanks
-Mark


On Jul 13, 2017, at 8:42 PM, 尹文才 <[email protected]> wrote:

Thanks Russ, I checked your notes and found them very helpful.
Like you, I'm also using NiFi as a tool for ETL processes, and the reason I'm
curious about the disk space problem is that I came across a disk-out-of-space
error in NiFi. I checked the disk usage of the NiFi folders and found that the
FlowFile repository took up about 30 GB, which astonished me.

I happened to read about some NiFi internals in this document:
https://github.com/JPercivall/nifi/blob/NIFI-1028/nifi-docs/src/main/asciidoc/nifi-in-depth.adoc
which mentions that the FlowFile repository merely keeps the metadata for all
live FlowFiles currently being processed in NiFi, so I was wondering why the
repository would reach 30 GB.

Regards,
Ben

2017-07-13 23:52 GMT+08:00 Russell Bateman <[email protected]>:
Ben,

I took some notes last spring that may be useful to you.

http://www.javahotchocolate.com/notes/nifi.html#20170428

Hope this helps,

Russ


On 07/12/2017 08:38 PM, 尹文才 wrote:
Hi guys, I have a question about cleaning up the disk space used by NiFi from
time to time.
As you know, NiFi saves a lot to disk, such as the repository folders. I checked
the official NiFi admin guide, and I know the content repository supports
toggling content archiving, so to save disk space I could adjust the
content-archiving parameters or simply turn archiving off. My question is that
I didn't find any similar options for the other repository folders; do they
support archiving as well? What are the best practices for keeping the disk
space used by NiFi as low as possible? (I'm running NiFi on a single machine,
and sometimes I see a disk-out-of-space error on the NiFi bulletin board.)
Thanks.
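For context, the content-archiving settings I'm referring to look like this in
nifi.properties (the values shown are examples, not necessarily the defaults
in every version):

```
# Content repository archiving -- example values, not verified defaults.
nifi.content.repository.archive.enabled=true
nifi.content.repository.archive.max.retention.period=12 hours
nifi.content.repository.archive.max.usage.percentage=50%
```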

Regards,
Ben




