Dear Joe,

Thank you very much. That was a really great explanation.
I looked into the NiFi architecture, and it seems that most of the
read/write operations for the flow file repo and provenance repo are random.
However, for the content repo most of the read/write operations are
sequential. Let's say cost does not matter. Even in that case, choosing SSDs
for the content repo would not provide a huge performance gain over HDDs.
Am I right? Hence, it would be better to spend the content repo SSD money on
network infrastructure.
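
As a rough sanity check of that reasoning, here is a back-of-the-envelope
sketch in Python. It takes your conservative 50 MB/s per content volume
figure and compares the aggregate against a network link. The function
names, the decimal unit conversion, and the assumption of zero protocol
overhead are mine and purely illustrative, not measurements.

```python
# Back-of-the-envelope comparison: content-repo disk throughput vs. network.
# Assumptions (not measured):
#   - 50 MB/s sustained per content volume (conservative figure from the thread)
#   - link rate converted with decimal units, ignoring protocol overhead


def aggregate_disk_mb_per_s(volumes: int, mb_per_s_per_volume: float = 50.0) -> float:
    """Total sustained throughput if every content volume is kept busy."""
    return volumes * mb_per_s_per_volume


def link_mb_per_s(gbit_per_s: float) -> float:
    """Raw link capacity in MB/s: 1 Gbit/s = 1000 Mbit/s = 125 MB/s."""
    return gbit_per_s * 1000 / 8


def bottleneck(volumes: int, gbit_per_s: float, mb_per_s_per_volume: float = 50.0) -> str:
    """Name the slower side: 'disk' or 'network'."""
    disk = aggregate_disk_mb_per_s(volumes, mb_per_s_per_volume)
    net = link_mb_per_s(gbit_per_s)
    return "disk" if disk < net else "network"


if __name__ == "__main__":
    for gbit in (1, 10):
        print(
            f"8 volumes: {aggregate_disk_mb_per_s(8):.0f} MB/s disk, "
            f"{gbit}G link: {link_mb_per_s(gbit):.0f} MB/s, "
            f"bottleneck: {bottleneck(8, gbit)}"
        )
```

Under these assumptions, 8 HDD-backed content volumes would already saturate
a 1G link, while a 10G link leaves headroom, which is why the network looks
like the better place to spend the money.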

Best regards,
Ali

On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Ali,
>
> You have a lot of nice resources to work with there.  I'd personally
> recommend the series of RAID-1 configurations, provided you keep in mind
> this means you can only lose a single disk for any one partition.  As long
> as they're being monitored and would be quickly replaced, this works well
> in practice.  If there could be lapses in monitoring or in time to replace,
> then it is perhaps safer to go with more redundancy or an alternative RAID
> type.
>
> I'd put the OS, app installs w/user and audit db stuff, and application
> logs on one physical RAID volume.  Have a dedicated physical volume for the
> flow file repository.  It will not be able to use all the space but it
> certainly could benefit from having no other contention.  This could
> actually be a great thing to have SSDs for.  For the remaining volumes,
> split them up for content and provenance as you have.  You get to make the
> overall performance versus retention decision.  Frankly, you have a great
> system to work with and I suspect you're going to see excellent results
> anyway.
>
> Conservatively speaking, expect say 50MB/s of throughput per volume in the
> content repository, so if you end up with 8 of them you could achieve
> upwards of 400MB/s sustained.  You'll also then want to make sure you have
> a good 10G based network setup as well.  Or, you could dial back on the
> speed tradeoff and simply increase retention or disk loss tolerance.  Lots
> of ways to play the game.
>
> There are no published SSD vs HDD performance benchmarks that I am aware
> of, though this is a good idea.  Having a hybrid of SSDs and HDDs could
> offer a really solid performance/retention/cost tradeoff.  For example,
> having SSDs for the OS/logs/provenance/flowfile with HDDs for the content
> would be quite nice.  At that rate, to take full advantage of the
> system you'd need to have very strong network infrastructure between NiFi
> and any systems it is interfacing with, and your flows would need to be
> well tuned for GC/memory efficiency.
>
> Thanks
> Joe
>
> On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
>
>> Dear NiFi users/developers,
>> Hi,
>>
>> I was wondering whether there is any benchmark on whether it is better
>> to give NiFi direct control of the disks or to use RAID for this purpose.
>> For example, which of these scenarios is recommended from a performance
>> point of view?
>> Scenario 1:
>> 24 disks in total
>> 2 disks - RAID 1 for OS and flowfile repo
>> 2 disks - RAID 1 for provenance repo1
>> 2 disks - RAID 1 for provenance repo2
>> 2 disks - RAID 1 for content repo1
>> 2 disks - RAID 1 for content repo2
>> 2 disks - RAID 1 for content repo3
>> 2 disks - RAID 1 for content repo4
>> 2 disks - RAID 1 for content repo5
>> 2 disks - RAID 1 for content repo6
>> 2 disks - RAID 1 for content repo7
>> 2 disks - RAID 1 for content repo8
>> 2 disks - RAID 1 for content repo9
>>
>>
>> Scenario 2:
>> 24 disks in total
>> 2 disks - RAID 1 for OS and flowfile repo
>> 4 disks - RAID 10 for provenance repo1
>> 18 disks - RAID 10 for content repo1
>>
>> Moreover, is there any benchmark for SSD vs HDD performance for NiFi?
>> Thank you very much.
>>
>> Best regards,
>> Ali
>>
>
>


-- 
A.Nazemian
