Russell,

Would you be able to make the counting processor you wrote (that maintains 
named NiFi counters) available to the public? This feature would be very useful 
to me. I need a counter that could survive a cluster restart for a few of my 
workflows. Currently I write out to an SQL server, but this is expensive to 
maintain.

Regards,

Kevin

From: Russell Bateman [mailto:[email protected]]
Sent: Wednesday, January 11, 2017 1:18 PM
To: [email protected]
Subject: Re: Deleting millions of files from a queue...

Yes, I had thought of that, but I needed to know how many reached the end. I 
lamely thought I could just look at the queue size, not thinking about how 
emptying the queue wouldn't be instantaneous.

I finally just put a counting processor we wrote (that maintains named NiFi 
counters) in each of the flows I'm testing, then let the flowfiles drain out 
into the bit bucket.

This question really only arose because I was concentrating on my testing and 
not the bigger picture of what hundreds of millions of flowfiles might do to me 
in the end. So, I guess this is sort of a wasted thread except that (I'm 
hoping) it will be Googlable by someone else down the road who wonders as I did.

Thanks

On 01/11/2017 12:08 AM, Lee Laim wrote:
Russ,
This sort of deviates from your original question, but would applying a 
flowfile expiration time on the connection (during experimentation) work with 
your flow? This would keep the queue more manageable.

On Jan 10, 2017, at 4:35 PM, Russell Bateman <[email protected]> wrote:
To update this thread, ...

1. Setting up a no-op processor to "drain" the queue doesn't seem to present 
any speed advantage over right-clicking the queue and choosing Empty queue.
2. Removing the flowfile and provenance repositories (cd flowfile_repository ; 
rm -rf *) is instantaneous.
3. However, removing the content repository from the filesystem via console 
isn't immediate. It appears that it may not be taking as long as either method 
in #1 above, but it does take a very long time. I had over a hundred million 
files being emptied when I started this thread, and just under 40% of them are 
still left as I write this final volley, 2 hours after starting to delete them 
from the filesystem (CentOS 7, CPU inactive, 128 GB memory, 56 cores, hard 
drive--not SSD).
4. I don't dare delete the database_repository.
5. I'm assuming that once all three repositories are gone, I'll be able to 
restart NiFi without any damage to what I expect flow.xml.gz and the rest of 
the conf/ subdirectory are safe-guarding for me.
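
Items 2, 4 and 5 above can be sketched as a small shell helper. This is only 
an illustration, not a tested procedure: it assumes NiFi has already been 
stopped and that the repositories use the default directory names directly 
under the NiFi home directory (check nifi.properties if yours differ). The 
function names are made up for this example. Unlike a bare "cd ... ; rm -rf *", 
it refuses to run if the target directory is missing, and it also removes 
hidden files.

```shell
# Sketch only. Assumes NiFi is stopped and that the repositories use the
# default directory names under the NiFi home directory.

# Hypothetical helper: empty one repository directory, keeping the directory.
clear_repo() {
    dir=$1
    # Refuse to run unless the target really is a directory, so a typo can
    # never turn into a delete somewhere else.
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    # Remove everything inside the directory, hidden files included.
    find "$dir" -mindepth 1 -delete
}

# Hypothetical helper: clear the two repositories that delete quickly.
clear_flowfile_and_provenance() {
    nifi_home=$1
    clear_repo "$nifi_home/flowfile_repository" || return 1
    clear_repo "$nifi_home/provenance_repository" || return 1
    # conf/ and database_repository are deliberately left alone.
}
```

Usage would be something like clear_flowfile_and_provenance /path/to/nifi, 
run between stopping and restarting NiFi.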

There may not be any instantaneous solution anyone can offer short of renaming 
the content repository subdirectory, setting up a background task to smoke it, 
and creating a new content repository to start afresh with. I haven't tried 
that yet.
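
For what it's worth, the rename idea above might look roughly like the sketch 
below. It is untested against a real NiFi install and rests on several 
assumptions: NiFi is stopped, the flowfile repository has already been cleared 
(so nothing refers to the old content), the content repository is the default 
content_repository directory under the NiFi home, and the renamed copy stays 
on the same filesystem. The function name and the ".doomed" suffix are 
invented for the example.

```shell
# Sketch only. Assumes NiFi is stopped and that the content repository is
# "$nifi_home/content_repository" (the default name; yours may differ).

# Hypothetical helper: swap in an empty content repository and delete the
# old one in the background. Prints the path of the renamed directory.
retire_content_repo() {
    nifi_home=$1
    repo="$nifi_home/content_repository"
    doomed="$nifi_home/content_repository.doomed.$$"

    [ -d "$repo" ] || { echo "not a directory: $repo" >&2; return 1; }

    # A rename within one filesystem is a single metadata operation, so it
    # does not depend on how many files the directory holds.
    mv "$repo" "$doomed" || return 1

    # Leave a fresh, empty directory in place of the old repository.
    mkdir "$repo" || return 1

    # Delete the old tree in the background at the lowest CPU priority.
    # Output is discarded so callers capturing stdout are not kept waiting.
    nice -n 19 rm -rf "$doomed" >/dev/null 2>&1 &

    echo "$doomed"
}
```

After calling retire_content_repo /path/to/nifi, NiFi could in principle be 
restarted right away while the old files are still being removed. The new 
directory is created with the calling user's ownership and umask, so run this 
as the user NiFi runs as.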

It's likely a better idea to think ahead about this and provide for draining 
the queues as the flowfiles reach them. If all you want is a count of 
successful outcomes, you could do that with a no-op processor that counts as it 
goes and puts the number somewhere for safe-keeping. I wouldn't be doing this 
if I weren't trying to make some observations on performance, processing loads, 
etc., in short, testing.

If I experience anything nasty or noteworthy from this point on [4,5], I'll 
come back and update this thread again.

On 01/10/2017 02:50 PM, Russell Bateman wrote:
In my case, I'm experimenting with huge flows and huge numbers of files. I 
wasn't thinking about how much work I'd create for myself by storing up files 
in a queue at the end (or, in some cases, at intermediate points) when I might 
want to clean house and start over.

So, I can just bring NiFi down, smoke the repos, then restart safely?

On 01/10/2017 02:39 PM, Joe Witt wrote:
Millions or gajillions will indeed take a while as they have to swap in as 
presently implemented. We could certainly optimize that if it is a common need.

Blowing away the repos will certainly do the trick and be faster, though it is 
clearly a blunt instrument.

Do you think we need an express queue killer option?

On Jan 10, 2017 1:32 PM, "Russell Bateman" <[email protected]> wrote:
If I'm experimenting and have gajillions of flowfiles in a queue that takes a 
very long time to empty from the UI, is there a quicker way? I can certainly 
bounce NiFi, delete files, both, etc.



