Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Ryan Hendrickson
A couple things from it: 1. The sum of the "Claimant counts" equals the number of FlowFiles reported on the Canvas. 2. None are Awaiting Destruction 3. Claimant Count Lowest number is 1 (when it's not zero) 4. Claimant Count Highest number is 4,773 (Should this one be 100 based on the max size,
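
To make the claimant-count observations above concrete, here is a toy model of why a content repository file can stay on disk. It assumes a file becomes eligible for cleanup only once no FlowFile references it. The `ResourceClaim` class and `bytes_held_on_disk` helper are hypothetical names for illustration, not NiFi's API, and the sizes are made-up values.

```python
from dataclasses import dataclass


@dataclass
class ResourceClaim:
    """Toy stand-in for one file in the content repository (hypothetical)."""
    size_bytes: int      # total bytes written to the file
    claimant_count: int  # FlowFiles still referencing the file


def bytes_held_on_disk(claims):
    # Assumption: a file is only eligible for cleanup once its
    # claimant count has dropped to zero.
    return sum(c.size_bytes for c in claims if c.claimant_count > 0)


# Illustrative values only: three 1 MB files with different claimant counts.
claims = [
    ResourceClaim(size_bytes=1_000_000, claimant_count=0),
    ResourceClaim(size_bytes=1_000_000, claimant_count=1),
    ResourceClaim(size_bytes=1_000_000, claimant_count=4_773),
]
held = bytes_held_on_disk(claims)
print(held)
```

In this model a file with a single remaining claimant holds as much disk as one with thousands, which is one way small FlowFiles could pin far more space than their combined size.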

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Ryan Hendrickson
Correction - it did work. I was expecting it to be in the same folder as where I ran nifi.sh from, vs NIFI_HOME. Reviewing it now... Ryan On Thu, Sep 17, 2020 at 1:51 PM Ryan Hendrickson < ryan.andrew.hendrick...@gmail.com> wrote: > Hey Mark, > I should have mentioned the PutElasticsearchHttp

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Ryan Hendrickson
Hey Mark, I should have mentioned the PutElasticsearchHttp is going to 2 different clusters. We did play with different thread counts for each of them. At one point we were wondering if too large a Batch Size would make the threads block each other. It looks like PutElasticsearchHttp serializes every

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Mark Payne
Ryan, OK, thanks. So the “100 based on the max size” is… “fun.” Not entirely sure when that property made it into nifi.properties - I’m guessing that when the max.appendable.claim.size was added, we intended to also implement a max number of FlowFiles. But it was never implemented. So I think

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Ryan Hendrickson
1.12.0 Thanks, Ryan On Thu, Sep 17, 2020 at 11:04 AM Joe Witt wrote: > Ryan > > What version are you using? I do think we had an issue that kept items > around longer than intended that has been addressed. > > Thanks > > On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson < >

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Joe Witt
can you share your flow.xml.gz? On Thu, Sep 17, 2020 at 8:08 AM Ryan Hendrickson < ryan.andrew.hendrick...@gmail.com> wrote: > 1.12.0 > > Thanks, > Ryan > > On Thu, Sep 17, 2020 at 11:04 AM Joe Witt wrote: > >> Ryan >> >> What version are you using? I do think we had an issue that kept items >>

RE: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Williams, Jim
Ryan, Is this maybe a case of exhausting inodes on the filesystem rather than exhausting the space available? If you do a ‘df -i’ on the system what do you see for inode usage? Warm regards, Jim Williams | Manager, Site
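
The inode question above can also be checked programmatically. The sketch below reads inode counts with `os.statvfs` (Unix only). The `inode_usage` helper is a hypothetical name, and `"."` is a placeholder for wherever the content repository partition is mounted.

```python
import os


def inode_usage(path):
    """Return inode counts for the filesystem that contains `path`."""
    st = os.statvfs(path)
    total = st.f_files   # total inodes on the filesystem
    free = st.f_ffree    # free inodes
    used = total - free
    # Some filesystems report 0 total inodes; avoid dividing by zero.
    percent_used = (used / total * 100) if total else None
    return {"total": total, "used": used, "free": free,
            "percent_used": percent_used}


# Placeholder path: point this at the content repository's mount point.
usage = inode_usage(".")
print(usage)
```

A `percent_used` near 100 while free space remains would point to inode exhaustion rather than a full disk.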

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Mark Payne
Ryan, Thanks. So 1.12.0 has no known issues with content repo not being cleaned up properly. As you pointed out, nifi.content.claim.max.appendable.size is intended to cap the maximum number of FlowFiles that will be written to a single file. However, it does come with a couple of caveats.

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Ryan Hendrickson
@Joe I can't export the flow.xml.gz easily, although it's pretty simple. We put just the following on its own server because DistributeLoad (bug [1]) and PutElasticsearchHttp have a hard time keeping up. 1. Input Port 2. ControlRate (data rate | 1.7GB | 5 min) 3. Update Attributes

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Mark Payne
Ryan, Why are you using DistributeLoad to go to two different PutElasticsearchHttp processors? Does that perform better for you than a single PutElasticsearchHttp processor with multiple concurrent tasks? It shouldn’t really. I’ve never used that processor, but if two instances of the

Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Ryan Hendrickson
Hello, I've got ~15 million FlowFiles, each roughly 4KB, totaling about 55GB of data on my canvas. However, the content repository (on its own partition) is completely full with 350GB of data. I'm pretty certain the way Content Claims store the data is responsible for this. In previous
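
As a quick sanity check on the numbers above, the sketch below works through the arithmetic. It assumes 4KB means 4 × 1024 bytes and treats the 55GB and 350GB figures as the approximate values quoted in the message.

```python
# Figures quoted in the message (approximate).
flowfile_count = 15_000_000
avg_size_bytes = 4 * 1024      # assumption: "4KB" means 4 KiB
reported_live_gb = 55          # data reported on the canvas
repo_used_gb = 350             # content repository usage

# Expected live data if every FlowFile is exactly 4 KiB.
live_bytes = flowfile_count * avg_size_bytes
live_gib = live_bytes / 1024**3

# How many times larger the repository is than the live data.
overhead_ratio = repo_used_gb / reported_live_gb

print(f"expected live data: {live_gib:.1f} GiB")
print(f"repository is {overhead_ratio:.1f}x the reported live data")
```

The expected live data lands in the same range as the reported 55GB, so the gap is between live content and what the repository holds on disk, roughly a sixfold difference.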

Re: Content Claims Filling Disk - Best practice for small files?

2020-09-17 Thread Joe Witt
Ryan What version are you using? I do think we had an issue that kept items around longer than intended that has been addressed. Thanks On Thu, Sep 17, 2020 at 7:58 AM Ryan Hendrickson < ryan.andrew.hendrick...@gmail.com> wrote: > Hello, > I've got ~15 million FlowFiles, each roughly 4KB,