Thank you very much.
I would be more than happy to provide some benchmark results after the
implementation.
Sincerely yours,
Ali

On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt <joe.w...@gmail.com> wrote:

> Ali,
>
> I agree with your assumption.  It would be great to test that out and
> provide some numbers but intuitively I agree.
>
> I could envision certain scatter/gather data flows that could challenge
> that sequential access assumption but honestly with how awesome disk
> caching is in Linux these days I think practically speaking this is the
> right way to think about it.
>
> Thanks
> Joe
>
> On Thu, Oct 13, 2016 at 8:29 AM, Ali Nazemian <alinazem...@gmail.com>
> wrote:
>
>> Dear Joe,
>>
>> Thank you very much. That was a really great explanation.
>> I investigated the NiFi architecture, and it seems that most of the
>> read/write operations for the flowfile repo and provenance repo are random.
>> However, for the content repo most of the read/write operations are
>> sequential. Let's say cost does not matter. Even then, choosing SSD for the
>> content repo cannot provide a huge performance gain over HDD. Am I right?
>> Hence, it would be better to spend the content repo SSD money on network
>> infrastructure.
>>
>> Best regards,
>> Ali
>>
>> On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt <joe.w...@gmail.com> wrote:
>>
>>> Ali,
>>>
>>> You have a lot of nice resources to work with there.  Personally, I'd
>>> recommend the series of RAID-1 configurations, provided you keep in mind
>>> that this means you can only lose a single disk for any one partition.  As
>>> long as they're being monitored and would be quickly replaced, this works
>>> well in practice.  If there could be lapses in monitoring or delays in
>>> replacement, then it is perhaps safer to go with more redundancy or an
>>> alternative RAID type.
>>>
>>> I'd say do the OS, app installs w/user and audit db stuff, application
>>> logs on one physical RAID volume.  Have a dedicated physical volume for the
>>> flow file repository.  It will not be able to use all the space but it
>>> certainly could benefit from having no other contention.  This could be a
>>> great thing to have SSDs for actually.  And for the remaining volumes split
>>> them up for content and provenance as you have.  You get to make the
>>> overall performance versus retention decision.  Frankly, you have a great
>>> system to work with and I suspect you're going to see excellent results
>>> anyway.
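
[Editor's sketch, not part of the original message.] The layout described above (one dedicated volume for the flowfile repository, the remaining volumes split between content and provenance) could be written into nifi.properties roughly as follows. The mount points under /data and the suffixes content1, provenance1, and so on are hypothetical, and the helper repo_properties is made up for illustration. The property prefixes are the ones NiFi uses for its repository directories as far as I know, but check the Admin Guide for your version before relying on them.

```python
def repo_properties(flowfile_dir, content_dirs, provenance_dirs):
    """Build nifi.properties lines for one flowfile repo and several
    content/provenance repos. All paths passed in are assumptions."""
    lines = [f"nifi.flowfile.repository.directory={flowfile_dir}"]
    # Each extra content directory gets its own unique suffix.
    for i, path in enumerate(content_dirs, start=1):
        lines.append(f"nifi.content.repository.directory.content{i}={path}")
    # Same idea for the provenance repository.
    for i, path in enumerate(provenance_dirs, start=1):
        lines.append(f"nifi.provenance.repository.directory.provenance{i}={path}")
    return lines


if __name__ == "__main__":
    # Hypothetical mount points: one per RAID-1 volume.
    props = repo_properties(
        "/data/flowfile/flowfile_repository",
        [f"/data/content{i}/content_repository" for i in range(1, 9)],
        [f"/data/prov{i}/provenance_repository" for i in range(1, 3)],
    )
    print("\n".join(props))
```

Generating the lines from a list of mount points keeps the number of volumes easy to change when trading performance against retention.
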
>>>
>>> Conservatively speaking, expect say 50MB/s of throughput per volume in
>>> the content repository, so if you end up with 8 of them you could achieve
>>> upwards of 400MB/s sustained.  You'll also then want to make sure you have
>>> a good 10G based network setup as well.  Or, you could dial back on the
>>> speed tradeoff and simply increase retention or disk loss tolerance.  Lots
>>> of ways to play the game.
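
[Editor's sketch, not part of the original message.] The arithmetic behind the paragraph above, assuming decimal units (1 MB = 10^6 bytes, 1 Gbit = 10^9 bits) and ignoring protocol overhead. The 50MB/s and 8-volume figures come from the email; the function names are made up for illustration.

```python
MB_PER_S_PER_VOLUME = 50   # conservative per-volume figure from the email
CONTENT_VOLUMES = 8        # number of content repository volumes in the example


def aggregate_mb_per_s(volumes, per_volume_mb_s):
    """Total sustained throughput if every volume runs at the per-volume rate."""
    return volumes * per_volume_mb_s


def mb_per_s_to_gbit_per_s(mb_per_s):
    """Convert MB/s to Gbit/s using decimal units (assumption)."""
    return mb_per_s * 8 / 1000


total = aggregate_mb_per_s(CONTENT_VOLUMES, MB_PER_S_PER_VOLUME)
gbit = mb_per_s_to_gbit_per_s(total)
print(f"{total} MB/s sustained is about {gbit:.1f} Gbit/s")
```

At roughly 3.2 Gbit/s, the disks alone would saturate a 1G link several times over, which is why the 10G network recommendation follows from the same numbers.
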
>>>
>>> There are no published SSD vs HDD performance benchmarks that I am aware
>>> of though this is a good idea.  Having a hybrid of SSDs and HDDs could
>>> offer a really solid performance/retention/cost tradeoff.  For example
>>> having SSDs for the OS/logs/provenance/flowfile with HDDs for the content -
>>> that would be quite nice.  At that rate to take full advantage of the
>>> system you'd need to have very strong network infrastructure between NiFi
>>> and any systems it is interfacing with, and your flows would need to be
>>> well tuned for GC/memory efficiency.
>>>
>>> Thanks
>>> Joe
>>>
>>> On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian <alinazem...@gmail.com>
>>> wrote:
>>>
>>>> Dear NiFi users/developers,
>>>> Hi,
>>>>
>>>> I was wondering whether there is any benchmark that answers this
>>>> question: is it better to dedicate disk control to NiFi or to use RAID
>>>> for this purpose? For example, which of these scenarios is recommended
>>>> from the performance point of view?
>>>> Scenario 1:
>>>> 24 disk in total
>>>> 2 disk- raid 1 for OS and flowfile repo
>>>> 2 disk- raid 1 for provenance repo1
>>>> 2 disk- raid 1 for provenance repo2
>>>> 2 disk- raid 1 for content repo1
>>>> 2 disk- raid 1 for content repo2
>>>> 2 disk- raid 1 for content repo3
>>>> 2 disk- raid 1 for content repo4
>>>> 2 disk- raid 1 for content repo5
>>>> 2 disk- raid 1 for content repo6
>>>> 2 disk- raid 1 for content repo7
>>>> 2 disk- raid 1 for content repo8
>>>> 2 disk- raid 1 for content repo9
>>>>
>>>>
>>>> Scenario 2:
>>>> 24 disk in total
>>>> 2 disk- raid 1 for OS and flowfile repo
>>>> 4 disk- raid 10 for provenance repo1
>>>> 18 disk- raid 10 for content repo1
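
[Editor's sketch, not part of the original message.] A quick tally of the two scenarios, assuming two-way mirroring for both RAID 1 and RAID 10 so that half of the raw disks are usable. The Volume class and summarize helper are made up for illustration. The sketch only counts disks and volumes; it says nothing about measured performance.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    purpose: str
    disks: int  # physical disks in the mirrored array (RAID 1 or RAID 10)

    @property
    def usable_disks(self):
        # Assumption: two-way mirrors, so half the raw disks hold unique data.
        return self.disks // 2


scenario_1 = (
    [Volume("os+flowfile", 2)]
    + [Volume("provenance", 2) for _ in range(2)]
    + [Volume("content", 2) for _ in range(9)]
)
scenario_2 = [Volume("os+flowfile", 2), Volume("provenance", 4), Volume("content", 18)]


def summarize(volumes, purpose):
    """Return (number of separate volumes, usable disks) for one purpose."""
    selected = [v for v in volumes if v.purpose == purpose]
    return len(selected), sum(v.usable_disks for v in selected)


for name, layout in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    total = sum(v.disks for v in layout)
    count, usable = summarize(layout, "content")
    print(f"{name}: {total} disks, content on {count} volume(s), {usable} usable disks")
```

Under that assumption both scenarios give the content repository the same usable capacity; the difference is nine independent volumes versus one striped volume.
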
>>>>
>>>> Moreover, is there any benchmark for SSD vs HDD performance for NiFi?
>>>> Thank you very much.
>>>>
>>>> Best regards,
>>>> Ali
>>>>
>>>
>>>
>>
>>
>> --
>> A.Nazemian
>>
>
>


-- 
A.Nazemian
