Yeah, I spent a bit of time this morning before posting looking for a magic 8-10 GB advisory, and more generally for GC gotchas related to larger heap sizes in the 64-bit world, but couldn't find any. We're using 12 GB right now for NiFi and haven't noticed any trouble. We have vague plans to increase this in the future as needed, since our servers tend to have large amounts of memory.

The statement yesterday on this thread warning against using that much is what sent me into Google-it mode. I think this advice is a red herring.


On 10/14/2016 03:03 PM, Corey Flowers wrote:
We actually use heap sizes from 32 to 64 GB for ours, but our volumes and graphs are both extremely large. I believe, though, that the smaller heap sizes were a limitation of garbage collection in Java 7. We also moved to SSD drives, which did help throughput quite a bit. Our systems were actually requesting the creation and removal of file handles faster than traditional disks could keep up with (we believe). In addition, unlike with traditional drives, where we tried to minimize caching, we actually forced more disk caching when we moved to SSDs. We're still waiting to see the results of that on our volumes, although it does seem to have helped. Also remember that, depending on how you code them, individual processors can use system memory outside of the heap, so you need to take that into consideration when designing the servers.

On Oct 14, 2016, at 1:36 PM, Joe Witt wrote:


You can definitely find a lot of material on the Internet about Java heap sizes, types of garbage collectors, application usage patterns. By all means please do experiment with different sizes appropriate for your case. We're not saying NiFi itself has any problem with large heaps.


On Fri, Oct 14, 2016 at 12:44 PM, Russell Bateman wrote:


    "not recommended to dedicate more than 8-10 GB to JVM heap space"
    by whom? Do you have links/references establishing this? I
    couldn't find anyone saying this or why.


    On 10/13/2016 05:47 PM, Ali Nazemian wrote:

    I have another question regarding the hardware recommendation.
    As far as I found out, NiFi currently uses on-heap memory, and
    it will not try to load whole objects into memory. From the
    garbage collection perspective, it is not recommended to
    dedicate more than 8-10 GB to JVM heap space. In this case, may
    I say that spending money on system memory is useless? Probably
    16 GB per system is enough for this architecture, unless some
    architecture change appears in the future to use off-heap
    memory as well. However, I found some articles about best
    practices whose memory recommendations do not make sense in
    light of this. Would you please clarify this part for me?
    Thank you very much.

    Best regards,

    On Thu, Oct 13, 2016 at 11:38 PM, Ali Nazemian wrote:

        Thank you very much.
        I would be more than happy to provide some benchmark results
        after the implementation.
        Sincerely yours,

        On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt wrote:


            I agree with your assumption.  It would be great to test
            that out and provide some numbers but intuitively I agree.

            I could envision certain scatter/gather data flows that
            could challenge that sequential access assumption, but
            honestly, with how awesome disk caching is in Linux these
            days, I think practically speaking this is the right way
            to think about it.


            On Thu, Oct 13, 2016 at 8:29 AM, Ali Nazemian wrote:

                Dear Joe,

                Thank you very much. That was a really great
                I investigated the NiFi architecture, and it seems
                that most of the read/write operations for the flow
                file repo and provenance repo are random. However, for
                the content repo most of the read/write operations are
                sequential. Let's say cost does not matter. In this
                case, even choosing SSD for the content repo cannot
                provide a huge performance gain over HDD. Am I
                right? Hence, it would be better to spend the content
                repo SSD money on network infrastructure.

                Best regards,

                On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt wrote:


                    You have a lot of nice resources to work with
                    there.  I'd recommend the series of RAID-1
                    configurations personally, provided you keep in
                    mind this means you can only lose a single disk
                    for any one partition.  As long as they're being
                    monitored and would be quickly replaced this in
                    practice works well. If there could be lapses in
                    monitoring or time to replace then it is perhaps
                    safer to go with more redundancy or an
                    alternative RAID type.

                    I'd say do the OS, app installs w/user and audit
                    db stuff, application logs on one physical RAID
                    volume.  Have a dedicated physical volume for
                    the flow file repository. It will not be able to
                    use all the space but it certainly could benefit
                    from having no other contention. This could be a
                    great thing to have SSDs for actually. And for
                    the remaining volumes split them up for content
                    and provenance as you have.  You get to make the
                    overall performance versus retention decision.
                    Frankly, you have a great system to work with
                    and I suspect you're going to see excellent
                    results anyway.

                    Conservatively speaking, expect say 50MB/s of
                    throughput per volume in the content repository,
                    so if you end up with 8 of them you could achieve
                    upwards of 400MB/s sustained. You'll also then
                    want to make sure you have a good 10G based
                    network setup as well.  Or, you could dial back
                    on the speed tradeoff and simply increase
                    retention or disk loss tolerance. Lots of ways
                    to play the game.
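The estimate above is simple arithmetic, and a minimal sketch may make the network point concrete. The 50MB/s per volume and 8 volumes are the rough figures from the message; the function names and the decimal unit conversion are assumptions made only for illustration.

```python
# Back-of-the-envelope estimate using the rough figures quoted above.
PER_VOLUME_MB_S = 50   # conservative per-volume content repository throughput
CONTENT_VOLUMES = 8    # number of content repository volumes in the example


def aggregate_throughput_mb_s(volumes, per_volume_mb_s=PER_VOLUME_MB_S):
    """Sustained throughput if every volume contributes its share in parallel."""
    return volumes * per_volume_mb_s


def mb_s_to_gbit_s(mb_s):
    """Convert MB/s to Gbit/s (decimal units assumed: 8 bits per byte, 1000 Mbit per Gbit)."""
    return mb_s * 8 / 1000


total = aggregate_throughput_mb_s(CONTENT_VOLUMES)
print(f"{total} MB/s is about {mb_s_to_gbit_s(total):.1f} Gbit/s")
```

At roughly 3.2 Gbit/s, the disks alone could saturate a 1G link several times over, which is why a 10G network is suggested alongside this layout.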

                    There are no published SSD vs HDD performance
                    benchmarks that I am aware of though this is a
                    good idea. Having a hybrid of SSDs and HDDs
                    could offer a really solid
                    performance/retention/cost tradeoff.  For
                    example having SSDs for the
                    OS/logs/provenance/flowfile with HDDs for the
                    content - that would be quite nice. At that rate
                    to take full advantage of the system you'd need
                    to have very strong network infrastructure
                    between NiFi and any systems it is interfacing
                    with, and your flows would need to be well tuned
                    for GC/memory efficiency.


                    On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian wrote:

                        Dear NiFi users/developers,

                        I was wondering whether there is any
                        benchmark on whether it is better to hand
                        individual disks to NiFi directly or to use
                        RAID for this purpose. For example, which of
                        these scenarios is recommended from the
                        performance point of view?
                        Scenario 1:
                        24 disks in total
                        2 disks, RAID 1, for OS and flowfile repo
                        2 disks, RAID 1, for provenance repo1
                        2 disks, RAID 1, for provenance repo2
                        2 disks, RAID 1, for content repo1
                        2 disks, RAID 1, for content repo2
                        2 disks, RAID 1, for content repo3
                        2 disks, RAID 1, for content repo4
                        2 disks, RAID 1, for content repo5
                        2 disks, RAID 1, for content repo6
                        2 disks, RAID 1, for content repo7
                        2 disks, RAID 1, for content repo8
                        2 disks, RAID 1, for content repo9

                        Scenario 2:
                        24 disks in total
                        2 disks, RAID 1, for OS and flowfile repo
                        4 disks, RAID 10, for provenance repo1
                        18 disks, RAID 10, for content repo1
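To compare the two scenarios above side by side, here is a small sketch that tallies independent volumes and usable capacity per repository. It assumes equal-size disks, two-disk RAID 1 mirrors, and RAID 10 as striped mirrors with half the disks usable; the tuple layout and function names are made up for illustration, and it says nothing about actual NiFi performance.

```python
from collections import defaultdict

# (purpose, RAID level, number of disks) for each volume, copied from the two scenarios.
SCENARIO_1 = (
    [("os+flowfile", "raid1", 2)]
    + [("provenance", "raid1", 2)] * 2
    + [("content", "raid1", 2)] * 9
)
SCENARIO_2 = [
    ("os+flowfile", "raid1", 2),
    ("provenance", "raid10", 4),
    ("content", "raid10", 18),
]


def usable_disks(level, disks):
    """Usable capacity in whole disks, assuming equal-size disks."""
    if level == "raid1":
        return 1            # a mirror stores one disk's worth of data
    if level == "raid10":
        return disks // 2   # striped mirrors: half the disks hold copies
    raise ValueError(f"unsupported RAID level: {level}")


def summarize(layout):
    """Return {purpose: (volume_count, usable_disks)} for a layout."""
    volumes = defaultdict(int)
    usable = defaultdict(int)
    for purpose, level, disks in layout:
        volumes[purpose] += 1
        usable[purpose] += usable_disks(level, disks)
    return {purpose: (volumes[purpose], usable[purpose]) for purpose in volumes}


def total_disks(layout):
    """Total number of physical disks consumed by a layout."""
    return sum(disks for _, _, disks in layout)


for name, layout in (("Scenario 1", SCENARIO_1), ("Scenario 2", SCENARIO_2)):
    print(name, total_disks(layout), summarize(layout))
```

Both layouts use all 24 disks and leave nine disks' worth of usable content space; the difference is nine independent content volumes in Scenario 1 versus a single large one in Scenario 2.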

                        Moreover, is there any benchmark for SSD vs
                        HDD performance for NiFi?
                        Thank you very much.

                        Best regards,

-- A.Nazemian