[ https://issues.apache.org/jira/browse/NIFI-3376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16103952#comment-16103952 ]

Michael Moser commented on NIFI-3376:
-------------------------------------

I also modified the title to describe the observations rather than propose a 
solution.  Thanks to all who are interested in investigating this!

> Content repository disk usage is not close to reported size in Status Bar
> -------------------------------------------------------------------------
>
>                 Key: NIFI-3376
>                 URL: https://issues.apache.org/jira/browse/NIFI-3376
>             Project: Apache NiFi
>          Issue Type: Improvement
>          Components: Core Framework
>    Affects Versions: 0.7.1, 1.1.1
>            Reporter: Michael Moser
>            Assignee: Michael Hogue
>         Attachments: NIFI-3376_Content_Repo_size_demo.xml
>
>
> On NiFi systems that deal with many files whose size is less than 1 MB, we 
> often see that the actual disk usage of the content_repository is much 
> greater than the size of flowfiles that NiFi reports are in its queues.  As 
> an example, NiFi may report "50,000 / 12.5 GB" but the content_repository 
> takes up 240 GB of its file system.  This leads to scenarios where a 500 GB 
> content_repository file system gets 100% full, but "I only had 40 GB of data 
> in my NiFi!"
> When several content claims exist in a single resource claim, and most but
> not all of those content claims are terminated, the entire resource claim is
> still not eligible for deletion or archive.  In the worst case, only one
> 10 KB content claim out of a 1 MB resource claim is counted by NiFi as
> existing in its queues, while the full 1 MB remains on disk.
> If a particular flow has a slow egress point where flowfiles could back up 
> and remain on the system longer than expected, this problem is exacerbated.
> A potential solution is to compact resource claim files on disk.  A background
> thread could examine all resource claims and rewrite the file for any resource
> claim that has become "old" and whose active content claim usage has dropped
> below a threshold.
> A potential work-around is to allow users to set the FileSystemRepository
> MAX_APPENDABLE_CLAIM_LENGTH to a smaller number.  Smaller resource claims hold
> fewer content claims, which increases the probability that a resource claim's
> content claim reference count reaches 0 and the resource claim becomes
> eligible for deletion/archive.  This would let users trade off performance
> for more accurate correspondence between NiFi queue size and content
> repository size.
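
The resource claim behavior described in the issue can be modeled in a few lines. This is a minimal Java (16+) sketch: many small content claims are appended into one shared file, and that file only becomes eligible for deletion or archive once its reference count drops to 0. The names SharedClaimDemo, SharedFile and ContentSlice are hypothetical, simplified stand-ins for illustration, not NiFi's actual ResourceClaim/ContentClaim API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model for illustration only; NOT NiFi's real
// ResourceClaim/ContentClaim classes.
final class SharedClaimDemo {

    /** One flowfile's slice (a "content claim") of a shared file. */
    record ContentSlice(long offset, long length) {}

    /** One on-disk file (a "resource claim") that many flowfiles append into. */
    static final class SharedFile {
        private int referenceCount = 0; // content claims still held by queued flowfiles
        private long liveBytes = 0;     // bytes NiFi would count as "in its queues"
        private long fileLength = 0;    // bytes physically occupied on disk

        ContentSlice append(long length) {
            ContentSlice slice = new ContentSlice(fileLength, length);
            fileLength += length;
            liveBytes += length;
            referenceCount++;
            return slice;
        }

        /** Called when the flowfile owning this slice is terminated. */
        void release(ContentSlice slice) {
            liveBytes -= slice.length();
            referenceCount--;
        }

        /** The whole file can only be deleted/archived when no slice is referenced. */
        boolean eligibleForDeletion() {
            return referenceCount == 0;
        }

        long liveBytes() { return liveBytes; }
        long fileLength() { return fileLength; }
    }

    public static void main(String[] args) {
        SharedFile file = new SharedFile();
        List<ContentSlice> slices = new ArrayList<>();
        for (int i = 0; i < 100; i++) {          // 100 x 10 KB ~= one 1 MB file
            slices.add(file.append(10 * 1024));
        }
        for (int i = 0; i < 99; i++) {           // terminate all but one flowfile
            file.release(slices.get(i));
        }
        System.out.println("counted in queues: " + file.liveBytes());           // 10240
        System.out.println("occupied on disk:  " + file.fileLength());          // 1024000
        System.out.println("deletable:         " + file.eligibleForDeletion()); // false
    }
}
```

With 99 of 100 flowfiles terminated, only 10 KB is counted in the queues, yet the whole ~1 MB file stays on disk because one reference remains.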

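The compaction idea proposed in the issue could be sketched as a selection policy plus a rewrite step. This is a hypothetical in-memory model only: the age threshold (minAge), the live-data threshold (maxLiveRatio) and all type names are assumptions chosen for illustration, not existing NiFi settings or classes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory sketch of the proposed compaction; none of these
// types or thresholds exist in NiFi.
final class CompactionSketch {

    /** A still-referenced slice of a shared file, owned by one queued flowfile. */
    record Slice(String flowFileId, int offset, int length) {}

    /** A shared file: when it was created, its bytes, and the slices still in use. */
    record SharedFile(Instant created, byte[] data, List<Slice> liveSlices) {
        long liveBytes() {
            return liveSlices.stream().mapToLong(Slice::length).sum();
        }
    }

    /** A file is a compaction candidate when it is "old" and mostly unreferenced. */
    static boolean isCandidate(SharedFile f, Instant now, Duration minAge, double maxLiveRatio) {
        boolean old = Duration.between(f.created(), now).compareTo(minAge) >= 0;
        double liveRatio = f.data().length == 0 ? 0.0 : (double) f.liveBytes() / f.data().length;
        return old && liveRatio < maxLiveRatio;
    }

    /** Copies only the live slices into a new, smaller file and records their new offsets. */
    static SharedFile compact(SharedFile f) {
        byte[] out = new byte[(int) f.liveBytes()];
        List<Slice> moved = new ArrayList<>();
        int cursor = 0;
        for (Slice s : f.liveSlices()) {
            System.arraycopy(f.data(), s.offset(), out, cursor, s.length());
            moved.add(new Slice(s.flowFileId(), cursor, s.length()));
            cursor += s.length();
        }
        return new SharedFile(f.created(), out, moved);
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        byte[] data = new byte[1_000];
        // Only one 10-byte slice of this 1,000-byte file is still referenced.
        SharedFile file = new SharedFile(now.minus(Duration.ofHours(2)), data,
                List.of(new Slice("flowfile-A", 500, 10)));

        if (isCandidate(file, now, Duration.ofHours(1), 0.5)) {
            SharedFile compacted = compact(file);
            System.out.println("before: " + file.data().length + " bytes, after: "
                    + compacted.data().length + " bytes"); // before: 1000 bytes, after: 10 bytes
            System.out.println("moved to: " + compacted.liveSlices());
        }
    }
}
```

A real implementation would also have to point every affected flowfile's content claim at the new file and offset, and avoid rewriting a file that is still being read; the sketch leaves both out.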

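To illustrate the proposed MAX_APPENDABLE_CLAIM_LENGTH work-around, the simulation below appends fixed-size flowfiles into shared files, starting a new file once the current one reaches the limit, and keeps every Nth flowfile queued as if stuck behind a slow egress point. The appending rule is a simplifying assumption, not FileSystemRepository's exact logic, and all names here are hypothetical.

```java
// Hypothetical simulation; the appending rule is a simplification, not
// NiFi's actual FileSystemRepository logic.
final class ClaimLengthTradeoff {

    /** How many shared files were written, and reported vs. pinned bytes. */
    record Result(int filesWritten, long reportedBytes, long pinnedBytes) {}

    static Result simulate(int flowFileCount, long flowFileSize,
                           long maxAppendableClaimLength, int keepEvery) {
        int filesWritten = 0;
        long reported = 0;             // bytes still counted in queues
        long pinned = 0;               // bytes on disk that cannot be deleted
        long currentLength = 0;
        boolean currentHasLiveClaim = false;
        for (int i = 0; i < flowFileCount; i++) {
            // Start a new shared file when none is open or the current one is full.
            if (filesWritten == 0 || currentLength >= maxAppendableClaimLength) {
                if (currentHasLiveClaim) pinned += currentLength;
                filesWritten++;
                currentLength = 0;
                currentHasLiveClaim = false;
            }
            currentLength += flowFileSize;
            // Every keepEvery-th flowfile is still queued (e.g. slow egress point).
            if (i % keepEvery == 0) {
                reported += flowFileSize;
                currentHasLiveClaim = true;
            }
        }
        if (currentHasLiveClaim) pinned += currentLength; // close the last file
        return new Result(filesWritten, reported, pinned);
    }

    public static void main(String[] args) {
        long kb = 1024;
        for (long max : new long[] {1024 * kb, 256 * kb, 64 * kb, 10 * kb}) {
            Result r = simulate(1_000, 10 * kb, max, 37);
            System.out.printf("max=%5d KB files=%4d reported=%6d KB pinned=%6d KB%n",
                    max / kb, r.filesWritten(), r.reportedBytes() / kb, r.pinnedBytes() / kb);
        }
    }
}
```

In this model a 10 KB limit pins exactly the queued bytes (280 KB) but writes one file per flowfile, while a 1 MB limit writes only 10 files yet pins all 10,000 KB. The file count is only a rough proxy for the performance cost the issue mentions.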

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
