Sure. I dropped a sample one into my notes for today here <http://www.javahotchocolate.com/notes/nifi.html#20170111>. However, it's not packaged in a NAR; I assume you can do that yourself? If you need help, see some notes on this here <http://www.javahotchocolate.com/notes/nifi-project.html>. Please change the package name and make any other changes you see fit.

On 01/11/2017 03:53 PM, Kevin Verhoeven wrote:

Russell,

Would you be able to make the counting processor you wrote (that maintains named NiFi counters) available to the public? This feature would be very useful to me. I need a counter that could survive a cluster restart for a few of my workflows. Currently I write out to an SQL server, but this is expensive to maintain.

Regards,

Kevin

*From:* Russell Bateman [mailto:[email protected]]
*Sent:* Wednesday, January 11, 2017 1:18 PM
*To:* [email protected]
*Subject:* Re: Deleting millions of files from a queue...

Yes, I had thought of that, but I needed to know how many flowfiles reached the end. I lamely thought I could look at the queue size, not thinking about how emptying the queue wouldn't be instantaneous.

I finally just put a counting processor we wrote (that maintains named NiFi counters) in each of the flows I'm testing, then let the flowfiles drain out into the bit bucket.

This question really only arose because I was concentrating on my testing and not the bigger picture of what hundreds of millions of flowfiles might do to me in the end. So, I guess this is sort of a wasted thread except that (I'm hoping) it will be Googlable by someone else down the road who wonders as I did.

Thanks

On 01/11/2017 12:08 AM, Lee Laim wrote:

    Russ,

    This sort of deviates from your original question, but would
    applying a flowfile expiration time on the connection (during
    experimentation) work with your flow? This would keep the queue
    more manageable.

    On Jan 10, 2017, at 4:35 PM, Russell Bateman
    <[email protected] <mailto:[email protected]>> wrote:

        To update this thread, ...

        1. Setting up a no-op processor to "drain" the queue doesn't
        seem to present any speed advantage over right-clicking the
        queue and choosing Empty queue.
        2. Removing the flowfile and provenance repositories (cd
        flowfile_repository ; rm -rf *) is instantaneous.
        3. However, removing the content repository from the
        filesystem via the console isn't immediate. It appears that it
        may be faster than either method in #1 above, but it still
        takes a very long time. I had over a hundred million files
        being emptied when I started this thread, and just under 40%
        of them are still left as I write this final volley 2 hours
        after starting to delete them using the filesystem (CentOS 7,
        CPU inactive, 128 GB memory, 56 cores, hard drive, not SSD).
        4. I don't dare delete the /database_repository/.
        5. I'm assuming that once all three repositories are gone,
        I'll be able to restart NiFi without any damage to what I
        expect /flow.xml.gz/ and the rest of the /conf//
        subdirectory are safe-guarding for me.
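
The cleanup in the list above might be scripted roughly as follows. This is only a sketch under assumptions not stated in the thread: NiFi is already stopped, the three repositories sit directly under the NiFi home directory with their default names (flowfile_repository, provenance_repository, content_repository), and clear_repos is a hypothetical helper name, not something NiFi ships. It leaves database_repository and conf/ untouched, as in items 4 and 5.

```shell
#!/bin/sh
# Sketch: empty the flowfile, provenance and content repositories of a
# stopped NiFi instance. database_repository and conf/ are left alone.
# Assumption: default repository directory names under $1 (the NiFi home).
clear_repos() {
    nifi_home=$1
    for repo in flowfile_repository provenance_repository content_repository; do
        dir="$nifi_home/$repo"
        # Skip repositories that do not exist at this location.
        [ -d "$dir" ] || continue
        # Delete everything inside the directory (dotfiles included),
        # keeping the directory itself.
        find "$dir" -mindepth 1 -delete
    done
}
```

It would be called as clear_repos followed by the NiFi home directory. As item 3 notes, the content repository is the slow part; nothing here makes that deletion any faster.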

        There may not be any instantaneous solution anyone can offer
        short of renaming the content repository subdirectory, setting
        up a background task to smoke it, and creating a new content
        repository to start afresh with. I haven't tried that yet.
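
The rename idea in the paragraph above could look something like this. It is an untested-in-production sketch, assuming NiFi is stopped, the repository uses the default name content_repository under the NiFi home directory, and the renamed copy stays on the same filesystem; swap_content_repo is a hypothetical name.

```shell
#!/bin/sh
# Sketch: move a huge content repository out of the way so NiFi can be
# restarted right away, then delete the old copy in the background.
# Assumptions: NiFi is stopped; $1 is the NiFi home directory.
swap_content_repo() {
    nifi_home=$1
    old="$nifi_home/content_repository.old.$$"
    # A rename within one filesystem only touches metadata, so it is
    # quick no matter how many files the directory holds.
    mv "$nifi_home/content_repository" "$old" || return 1
    # Fresh, empty repository for NiFi to start with.
    mkdir "$nifi_home/content_repository" || return 1
    # Remove the old tree in the background; nohup lets it outlive the shell.
    nohup rm -rf "$old" >/dev/null 2>&1 &
    # Print the PID of the background deletion so it can be monitored.
    echo $!
}
```

The background rm still has to walk every file, so the disk stays busy for as long as a foreground delete would; the only gain is that NiFi does not have to wait for it.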

        It's likely a better idea to think ahead about this and
        provide for draining the queues as the flowfiles reach them.
        If all you want is a count of successful outcomes, you could
        do that with a no-op processor that counts as it goes and puts
        the number somewhere for safe-keeping. I wouldn't be doing
        this if I weren't trying to make some observations on
        performance, processing loads, etc., in short, testing.

        If I experience anything nasty or noteworthy from this point
        on [4,5], I'll come back and update this thread again.

        On 01/10/2017 02:50 PM, Russell Bateman wrote:

            In my case, I'm experimenting with huge flows and huge
            numbers of files. I wasn't thinking about how much work
            I'd create for myself by storing up files in a queue at
            the end (or, in some cases, at intermediate points) when I
            might want to clean house and start over.

            So, I can just bring NiFi down, smoke the repos, then
            restart safely?

            On 01/10/2017 02:39 PM, Joe Witt wrote:

                Millions or gajillions will indeed take a while, as
                they have to swap in as presently implemented. We
                could certainly optimize that if it is a common need.

                Blowing away the repos will certainly do the trick and
                be faster, though that is clearly a blunt instrument.

                Do you think we need an express queue killer option?

                On Jan 10, 2017 1:32 PM, "Russell Bateman"
                <[email protected] <mailto:[email protected]>>
                wrote:

                    If I'm experimenting and have gajillions of
                    flowfiles in a queue that takes a very long time
                    to empty from the UI, is there a quicker way? I
                    can certainly bounce NiFi, delete files, both, etc.

