Hi,
I have another question regarding the hardware recommendation.
As far as I can tell, NiFi currently uses on-heap memory, and it
does not try to load whole objects into memory. From a garbage
collection perspective, it is not recommended to dedicate more
than 8-10 GB to the JVM heap. In that case, is it fair to say
that spending money on more system memory is of little use? With
this architecture, 16 GB per system is probably enough, unless a
future architecture change makes use of off-heap memory as well.
However, I found some articles about best practices, and their
memory recommendations do not make sense to me in light of this.
Would you please clarify this part for me?
Thank you very much.
Best regards,
Ali
On Thu, Oct 13, 2016 at 11:38 PM, Ali Nazemian wrote:
Thank you very much.
I would be more than happy to provide some benchmark results
after the implementation.
Sincerely yours,
Ali
On Thu, Oct 13, 2016 at 11:32 PM, Joe Witt wrote:
Ali,
I agree with your assumption. It would be great to test
that out and provide some numbers but intuitively I agree.
I could envision certain scatter/gather data flows that
could challenge that sequential access assumption, but
honestly, with how awesome disk caching is in Linux these
days, I think practically speaking this is the right way
to think about it.
Thanks
Joe
On Thu, Oct 13, 2016 at 8:29 AM, Ali Nazemian wrote:
Dear Joe,
Thank you very much. That was a really great
explanation.
I investigated the NiFi architecture, and it seems
that most of the read/write operations for the flowfile
repo and provenance repo are random. However, for the
content repo most of the read/write operations are
sequential. Let's say cost does not matter. Even in
that case, choosing SSDs for the content repo would not
provide a huge performance gain over HDDs. Am I
right? If so, it would be better to spend the content
repo SSD money on network infrastructure.
Best regards,
Ali
On Thu, Oct 13, 2016 at 10:22 PM, Joe Witt wrote:
Ali,
You have a lot of nice resources to work with
there. Personally, I'd recommend the series of
RAID-1 configurations, provided you keep in mind
that this means you can only lose a single disk
for any one partition. As long as the disks are
monitored and failed ones are replaced quickly,
this works well in practice. If there could be
lapses in monitoring or in time to replace, then
it is perhaps safer to go with more redundancy or
an alternative RAID type.
I'd say put the OS, app installs w/user and audit
db stuff, and application logs on one physical RAID
volume. Have a dedicated physical volume for
the flow file repository. It will not be able to
use all the space but it certainly could benefit
from having no other contention. This could be a
great thing to have SSDs for actually. And for
the remaining volumes split them up for content
and provenance as you have. You get to make the
overall performance versus retention decision.
Frankly, you have a great system to work with
and I suspect you're going to see excellent
results anyway.
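
To make the layout above concrete, here is a minimal Python sketch that prints the matching nifi.properties entries. It assumes 12 RAID-1 pairs split as 1 for OS/logs, 1 for the flow file repository, 2 for provenance and 8 for content; that split and every mount point are hypothetical, so adjust them to your own partitioning. The keys follow NiFi's convention of one "directory.<name>" entry per repository location, but do check them against the admin guide for your version.

```python
# Sketch: emit nifi.properties entries for one repository per RAID volume.
# All mount points below are hypothetical placeholders.

def repo_properties(flowfile_dir, content_dirs, provenance_dirs):
    """Return nifi.properties lines mapping each repository to its own volume."""
    lines = [f"nifi.flowfile.repository.directory={flowfile_dir}"]
    # One named content repository per volume: content1, content2, ...
    for i, path in enumerate(content_dirs, start=1):
        lines.append(f"nifi.content.repository.directory.content{i}={path}")
    # One named provenance repository per volume: provenance1, provenance2, ...
    for i, path in enumerate(provenance_dirs, start=1):
        lines.append(f"nifi.provenance.repository.directory.provenance{i}={path}")
    return lines

props = repo_properties(
    flowfile_dir="/mnt/flowfile/flowfile_repository",
    content_dirs=[f"/mnt/content{i}/content_repository" for i in range(1, 9)],
    provenance_dirs=[f"/mnt/prov{i}/provenance_repository" for i in range(1, 3)],
)
print("\n".join(props))
```

Using more content volumes and fewer provenance volumes (or the reverse) is just a matter of changing the two lists, which is the performance versus retention decision in practice.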
Conservatively speaking, expect say 50 MB/s of
throughput per volume in the content repository,
so if you end up with 8 of them you could achieve
upwards of 400 MB/s sustained. You'll also then
want to make sure you have a good 10G-based
network setup as well. Or, you could dial back
on the speed tradeoff and simply increase
retention or disk loss tolerance. Lots of ways
to play the game.
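
As a rough sanity check on those numbers, here is a small Python sketch. It assumes throughput scales linearly with the number of content volumes and ignores protocol overhead on the link; both are simplifications, and the function names are only illustrative.

```python
# Back-of-the-envelope check of the throughput figures above.
MB_PER_S_PER_VOLUME = 50   # conservative per-volume figure from this message
CONTENT_VOLUMES = 8        # number of content repository volumes in the example
LINK_GBIT_PER_S = 10       # nominal 10G link, protocol overhead ignored (assumption)

def aggregate_mb_per_s(volumes, per_volume=MB_PER_S_PER_VOLUME):
    """Sustained content-repository throughput if volumes scale linearly."""
    return volumes * per_volume

def link_mb_per_s(gbit_per_s):
    """Convert a nominal link speed in Gbit/s to MB/s (1 Gbit/s = 125 MB/s)."""
    return gbit_per_s * 1000 / 8

disk = aggregate_mb_per_s(CONTENT_VOLUMES)
link = link_mb_per_s(LINK_GBIT_PER_S)
print(f"disks: {disk:.0f} MB/s, link: {link:.0f} MB/s, utilisation: {disk / link:.0%}")
# prints "disks: 400 MB/s, link: 1250 MB/s, utilisation: 32%"
```

The same conversion gives 125 MB/s for a 1G link, which is well under 400 MB/s, hence the 10G suggestion.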
There are no published SSD vs HDD performance
benchmarks that I am aware of, though this is a
good idea. Having a hybrid of SSDs and HDDs
could offer a really solid
performance/retention/cost tradeoff. For
example having SSDs for the
OS/logs/provenance/flowfile with HDDs for the
content - that would be quite nice. At that rate
to take full advantage of the system you'd need
to have very strong network infrastructure
between NiFi and any systems it is interfacing
with and your flows would need to be well tuned
for GC/memory efficiency.
Thanks
Joe
On Thu, Oct 13, 2016 at 2:50 AM, Ali Nazemian wrote:
Dear NiFi users/developers,
Hi,
I was wondering whether there is any benchmark on
whether it is better to let NiFi manage the disks
itself or to use RAID for this purpose. For
example, which of these scenarios is recommended
from a performance point of view?
Scenario 1:
24 disks in total
2 disks - RAID 1 for OS and flowfile repo
2 disks - RAID 1 for provenance repo1
2 disks - RAID 1 for provenance repo2
2 disks - RAID 1 for content repo1
2 disks - RAID 1 for content repo2
2 disks - RAID 1 for content repo3
2 disks - RAID 1 for content repo4
2 disks - RAID 1 for content repo5
2 disks - RAID 1 for content repo6
2 disks - RAID 1 for content repo7
2 disks - RAID 1 for content repo8
2 disks - RAID 1 for content repo9
Scenario 2:
24 disks in total
2 disks - RAID 1 for OS and flowfile repo
4 disks - RAID 10 for provenance repo1
18 disks - RAID 10 for content repo1
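
To compare the two scenarios side by side, they can be tallied with a short Python sketch. The tuple layout and the `summarize` helper are only illustrative; the one fact relied on is that RAID 1 and RAID 10 both mirror every disk, so usable capacity is half the raw disk count in either case.

```python
# Each entry: (purpose, number of disks, RAID level), copied from the lists above.
SCENARIO_1 = (
    [("os+flowfile", 2, "raid1")]
    + [(f"provenance{i}", 2, "raid1") for i in range(1, 3)]
    + [(f"content{i}", 2, "raid1") for i in range(1, 10)]
)
SCENARIO_2 = [
    ("os+flowfile", 2, "raid1"),
    ("provenance1", 4, "raid10"),
    ("content1", 18, "raid10"),
]

# Fraction of raw capacity that is usable; both levels mirror every disk.
USABLE_FRACTION = {"raid1": 0.5, "raid10": 0.5}

def summarize(scenario):
    """Count raw disks, usable disk-equivalents and independent content volumes."""
    disks = sum(n for _, n, _ in scenario)
    usable = int(sum(n * USABLE_FRACTION[level] for _, n, level in scenario))
    content_volumes = sum(1 for name, _, _ in scenario if name.startswith("content"))
    return {"disks": disks, "usable_disks": usable, "content_volumes": content_volumes}

print("Scenario 1:", summarize(SCENARIO_1))
print("Scenario 2:", summarize(SCENARIO_2))
```

Both layouts use all 24 disks and give the same usable capacity, so the difference is nine separate content volumes that NiFi sees individually versus one large striped volume.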
Moreover, is there any benchmark for SSD vs
HDD performance for NiFi?
Thank you very much.
Best regards,
Ali
--
A.Nazemian