[jira] [Commented] (NIFI-7992) Content Repository can fail to cleanup archive directory fast enough

2020-11-10 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229503#comment-17229503
 ] 

ASF subversion and git services commented on NIFI-7992:
---

Commit badcfe1ab7b7166decb92a0d427ba48fbf613400 in nifi's branch 
refs/heads/main from Mark Payne
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=badcfe1 ]

NIFI-7992: Periodically check disk usage for the content repo to see if 
backpressure should be applied. Log progress in the background task. Improve 
performance of the background cleanup task by not using an ArrayList Iterator and 
constantly calling remove(), but instead waiting until the end of the cleanup loop 
and then removing from the list all elements that should be removed in a single 
update

This closes #4652.

Signed-off-by: Bryan Bende 


> Content Repository can fail to cleanup archive directory fast enough
> 
>
> Key: NIFI-7992
> URL: https://issues.apache.org/jira/browse/NIFI-7992
> Project: Apache NiFi
>  Issue Type: Bug
>  Components: Core Framework
>Reporter: Mark Payne
>Assignee: Mark Payne
>Priority: Critical
> Fix For: 1.13.0
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> For the scenario where a user is generating many small FlowFiles and has the 
> "nifi.content.claim.max.appendable.size" property set to a small value, we 
> can encounter a situation where data is constantly archived but not cleaned 
> up quickly enough. As a result, the Content Repository can run out of space.
> The FileSystemRepository has a backpressure mechanism built in to avoid 
> allowing this to happen, but under the above conditions, it can sometimes 
> fail to prevent this situation. The backpressure mechanism works by 
> performing the following steps:
>  # When a new Content Claim is created, the Content Repository determines 
> which 'container' to use.
>  # Content Repository checks if the amount of storage space used for the 
> container is greater than the configured backpressure threshold.
>  # If so, the thread blocks until a background task completes cleanup of the 
> archive directories.
> However, in Step #2 above, the repository determines the amount of space 
> currently being used by looking at a cached member variable. That cached member 
> variable is only updated on the first iteration and when the background 
> task completes.
> So, now consider a case where there are millions of files in the content 
> repository archive. The background task could take a massive amount of time 
> performing cleanup. Meanwhile, processors are able to write to the repository 
> without any backpressure being applied because the background task hasn't 
> updated the cached variable for the amount of space used. This continues 
> until the content repository fills.
> There are three important, and very simple, changes that should be made:
>  # The background task should be faster in this case. While we cannot improve 
> the amount of time it takes to destroy the files, we do create an ArrayList 
> to hold all of the file info and then use an iterator, calling remove(). 
> Under the hood, this creates a copy of the underlying array for each file 
> that is removed. On my laptop, performing this procedure on an ArrayList with 
> 1 million elements took approximately 1 minute. Changing to a LinkedList took 
> 15 milliseconds but used much more heap. Keeping an ArrayList, then removing 
> all of the elements at the end (via ArrayList.subList(0, n).clear()) resulted in 
> similar performance to LinkedList with the memory footprint of ArrayList.
>  # The check to see whether or not the content repository's usage has crossed 
> the threshold should not rely entirely on a cache that is populated by a 
> process that can take a long time. It should periodically calculate the disk 
> usage itself (perhaps once per minute).
>  # When backpressure does get applied, it can appear that the system has 
> frozen up, not performing any sort of work. The background task that is 
> clearing space should periodically log its progress at INFO level to allow 
> users to understand that this action is taking place.
>  
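The removal strategies contrasted in item 1 can be sketched as below. This is a minimal illustration, not NiFi's actual code: `ArchiveCleanupSketch` and `ArchivedFile` are hypothetical stand-ins for the repository's internal types, and the on-disk deletion is elided.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Illustrative sketch of the two removal strategies from item 1.
public class ArchiveCleanupSketch {
    // Hypothetical stand-in for NiFi's internal archive file info.
    public record ArchivedFile(String name, boolean expired) {}

    // Slow: each Iterator.remove() shifts the ArrayList's backing array,
    // so n removals cost O(n) array copies each, O(n^2) in total.
    public static int removeViaIterator(List<ArchivedFile> files) {
        int removed = 0;
        Iterator<ArchivedFile> it = files.iterator();
        while (it.hasNext()) {
            if (it.next().expired()) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Fast: the archive is processed oldest-first, so removable entries form
    // a prefix; subList(0, n).clear() compacts the backing array exactly once.
    public static int removeLeadingExpired(List<ArchivedFile> files) {
        int n = 0;
        while (n < files.size() && files.get(n).expired()) {
            n++; // real code would delete files.get(n) from disk here
        }
        files.subList(0, n).clear();
        return n;
    }
}
```

The bulk `clear()` works because `ArrayList` implements `removeRange` with a single `System.arraycopy`, which is what gives LinkedList-like removal cost without its per-node overhead.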
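Item 2's periodic recalculation could look roughly like the following. The class name, field names, and the refresh interval are illustrative assumptions, not NiFi's implementation; the point is simply that the threshold check consults the filesystem on its own schedule instead of trusting a cache that only the slow cleanup task refreshes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch: recompute a container's disk usage at most once per
// refresh interval, independently of the background cleanup task.
public class ContainerUsageCheck {
    private final Path container;
    private final long refreshMillis;
    private volatile long lastCheckMillis = 0L;
    private volatile double cachedUsedFraction = 0.0;

    public ContainerUsageCheck(Path container, long refreshMillis) {
        this.container = container;
        this.refreshMillis = refreshMillis;
    }

    // True if the container's used-space fraction exceeds maxUsedFraction.
    public boolean overThreshold(double maxUsedFraction) {
        long now = System.currentTimeMillis();
        if (now - lastCheckMillis >= refreshMillis) {
            try {
                FileStore store = Files.getFileStore(container);
                long total = store.getTotalSpace();
                long usable = store.getUsableSpace();
                cachedUsedFraction =
                        total == 0 ? 0.0 : (double) (total - usable) / total;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            lastCheckMillis = now;
        }
        return cachedUsedFraction > maxUsedFraction;
    }
}
```

With `refreshMillis` set to one minute, as the ticket suggests, writers still block promptly when the container fills, even while a multi-hour cleanup pass is running.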



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (NIFI-7992) Content Repository can fail to cleanup archive directory fast enough

2020-11-10 Thread Joe Witt (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-7992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17229447#comment-17229447
 ] 

Joe Witt commented on NIFI-7992:


Didn't review the code in detail but did review this writeup and thinking back 
about 8 years ago when I think we last talked about this...the writeup/change 
makes a lot of sense!
