Yes, I had thought of that, but I needed to know how many reached the
end, and I lamely thought I could look at the queue size, not thinking
about how emptying the queue wouldn't be instantaneous.
I finally just put a counting processor we wrote (that maintains named
NiFi counters) in each of the flows I'm testing, then let the flowfiles
drain out into the bit bucket.
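For anyone who finds this thread later, the guts of such a processor
are only a few lines against the NiFi processor API. Here's a minimal
sketch (illustrative only, not our actual code; the class name and
"Counter Name" property are invented). Auto-terminating the success
relationship lets it double as the bit bucket, and the counts show up
on the NiFi UI's Counters page:

    import java.util.Collections;
    import java.util.List;
    import java.util.Set;

    import org.apache.nifi.components.PropertyDescriptor;
    import org.apache.nifi.flowfile.FlowFile;
    import org.apache.nifi.processor.AbstractProcessor;
    import org.apache.nifi.processor.ProcessContext;
    import org.apache.nifi.processor.ProcessSession;
    import org.apache.nifi.processor.Relationship;
    import org.apache.nifi.processor.util.StandardValidators;

    // Sketch of a pass-through processor that bumps a named NiFi
    // counter for every flowfile it sees.
    public class CountFlowFiles extends AbstractProcessor {

        // Hypothetical property: which named counter to increment.
        static final PropertyDescriptor COUNTER_NAME = new PropertyDescriptor.Builder()
                .name("Counter Name")
                .description("The named counter to increment once per flowfile")
                .required(true)
                .defaultValue("flowfiles.reached.end")
                .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
                .build();

        static final Relationship SUCCESS = new Relationship.Builder()
                .name("success")
                .description("Every flowfile passes straight through")
                .build();

        @Override
        protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
            return Collections.singletonList(COUNTER_NAME);
        }

        @Override
        public Set<Relationship> getRelationships() {
            return Collections.singleton(SUCCESS);
        }

        @Override
        public void onTrigger(ProcessContext context, ProcessSession session) {
            FlowFile flowFile = session.get();
            if (flowFile == null) {
                return;
            }
            // Third argument 'true' applies the adjustment immediately
            // rather than waiting for the session to commit.
            session.adjustCounter(context.getProperty(COUNTER_NAME).getValue(), 1L, true);
            session.transfer(flowFile, SUCCESS);
        }
    }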
This question really only arose because I was concentrating on my
testing and not the bigger picture of what hundreds of millions of
flowfiles might do to me in the end. So, I guess this is sort of a
wasted thread except that (I'm hoping) it will be Googlable by someone
else down the road who wonders as I did.
Thanks
On 01/11/2017 12:08 AM, Lee Laim wrote:
Russ,
This sort of deviates from your original question, but would applying
a flowfile expiration time on the connection (during experimentation)
work with your flow? That would keep the queue more manageable.
On Jan 10, 2017, at 4:35 PM, Russell Bateman <[email protected]> wrote:
To update this thread, ...
1. Setting up a no-op processor to "drain" the queue doesn't seem to
present any speed advantage over right-clicking the queue and
choosing Empty queue.
2. Removing the flowfile and provenance repositories (cd
flowfile_repository ; rm -rf *) is instantaneous.
3. However, removing the content repository from the filesystem via the
console isn't immediate; it takes time. It appears it may not be taking
as long as either method in #1 above, but it does take a very long
time. I had over a hundred million files being emptied when I started
this thread and I'm still only down to just under 40% left as I write
this final volley, 2 hours after trying to delete them using the
filesystem (CentOS 7, CPU inactive, 128 GB memory, 56 cores, hard
drive, not SSD).
4. I don't dare delete the /database_repository/.
5. I'm assuming that once all three repositories are gone, I'll be
able to restart NiFi without any damage to what I expect
/flow.xml.gz/ and the rest of the /conf/ subdirectory are
safeguarding for me.
There may not be any instantaneous solution anyone can offer short of
renaming the content repository subdirectory, setting up a background
task to smoke it, and creating a new content repository to start
afresh with. I haven't tried that yet.
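If I do get around to trying it, I imagine something like this (an
untested sketch, run from the NiFi installation directory; it assumes
NiFi's stock scripts and the default repository paths from
nifi.properties):

    # Stop NiFi, swap in an empty content repository, then let the old
    # one be deleted in the background while NiFi comes back up.
    bin/nifi.sh stop
    mv content_repository content_repository.doomed
    mkdir content_repository
    nohup rm -rf content_repository.doomed &
    bin/nifi.sh start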
It's likely a better idea to think ahead about this and provide for
draining the queues as the flowfiles reach them. If all you want is a
count of successful outcomes, you could do that with a no-op
processor that counts as it goes and puts the number somewhere for
safe-keeping. I wouldn't be doing this if I weren't trying to make
some observations on performance, processing loads, etc.; in short,
testing.
If I experience anything nasty or noteworthy from this point on
(points 4 and 5 above), I'll come back and update this thread again.
On 01/10/2017 02:50 PM, Russell Bateman wrote:
In my case, I'm experimenting with huge flows and huge numbers of
files. I wasn't thinking about how much work I'd create for myself
by storing up files in a queue at the end (or, in some cases, at
intermediate points) when I might want to clean house and start over.
So, I can just bring NiFi down, smoke the repos, then restart safely?
On 01/10/2017 02:39 PM, Joe Witt wrote:
Millions or gajillions will indeed take a while, as they have to
swap in as presently implemented. We could certainly optimize that
if it's a common need.
Blowing away the repos will certainly do the trick and be faster,
though it is clearly a blunt instrument.
Do you think we need an express queue killer option?
On Jan 10, 2017 1:32 PM, "Russell Bateman" <[email protected]> wrote:
If I'm experimenting and have gajillions of flowfiles in a
queue that takes a very long time to empty from the UI, is
there a quicker way? I can certainly bounce NiFi, delete files,
both, etc.