We have been using NiFi extensively on AWS for the last 9 months, processing 
relatively high volumes of data. We have two primary use cases for NiFi: 
ingesting the data and processing the data. We do those on separate instances 
with Kafka in the middle.

For instances that only consume data, we use at least an m4.xlarge because of 
its “High” network performance rating.

For processing the data, it depends on how many processors we are running and 
how CPU-intensive they are. We have several custom processors. We take 
advantage of the “Concurrent Tasks” option quite a bit, so we try to scale 
accordingly, going all the way up to an m4.10xlarge at times.

Memory has rarely been an issue.

We add a lot of storage to support the queues, which has been a lifesaver!

We have run into issues with the provenance repository not keeping up. By 
using Provisioned IOPS as the EBS volume type, we can raise the IOPS as needed 
without increasing storage.
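The reason Provisioned IOPS helps is arithmetic: on a General Purpose (gp2) volume, baseline IOPS scale with volume size, while on a Provisioned IOPS (io1) volume the IOPS are set directly, subject only to a maximum IOPS-to-size ratio. The sketch below compares the storage each type needs for a target IOPS. The ratios are assumptions based on AWS documentation (AWS has changed them over time), and the function names are hypothetical.

```python
import math

# ASSUMED EBS ratios; AWS has changed these over time, so check current docs.
GP2_IOPS_PER_GIB = 3        # gp2 baseline IOPS per provisioned GiB
GP2_MIN_IOPS = 100          # gp2 baseline floor, regardless of size
IO1_MAX_IOPS_PER_GIB = 50   # io1 provisioned IOPS may be at most 50x the size in GiB
IO1_MIN_SIZE_GIB = 4        # smallest io1 volume


def gp2_size_for_iops(target_iops: int) -> int:
    """Smallest gp2 volume (GiB) whose baseline IOPS reaches target_iops.

    Ignores burst credits and the per-volume IOPS maximum.
    """
    if target_iops <= GP2_MIN_IOPS:
        return 1
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


def io1_size_for_iops(target_iops: int) -> int:
    """Smallest io1 volume (GiB) that may be provisioned with target_iops.

    Ignores the per-volume IOPS maximum.
    """
    return max(IO1_MIN_SIZE_GIB, math.ceil(target_iops / IO1_MAX_IOPS_PER_GIB))


if __name__ == "__main__":
    for iops in (3000, 6000):
        print(iops, "IOPS: gp2 needs", gp2_size_for_iops(iops), "GiB;",
              "io1 needs", io1_size_for_iops(iops), "GiB")
```

Under these assumptions, 6000 IOPS would require a 2000 GiB gp2 volume but only a 120 GiB io1 volume, which is why a provenance repository that needs IOPS rather than capacity is a good fit for io1.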

As James mentioned, start small and increase as needed.

Ralph

From: James Wing
Date: Thursday, June 30, 2016 at 8:38 AM
Subject: Re: Best EC2 instance type for NiFi

Stephane,

I think too much will depend on the nature of your data and the flow gauntlet 
you run it through.  Out of the box, NiFi can run on a t2.micro, although a 
modest flow will quickly exceed that.  A flow doing a high volume of regular 
expressions in parallel might benefit from a compute-optimized instance.  Some 
flows with simple processing of many large objects will be bound more by IO 
than CPU.  And the performance of the systems NiFi connects with is likely to 
be a big factor.

Learning which of these problems you will have requires developing and running 
the flow for a while.  I recommend a general-purpose instance until you scale 
up enough to know which, if any, specialized instance optimized for compute, 
memory, or IO would help.  You might also consider the disk configurations and 
provisioned IOPS options there.  The great thing about EC2 is that you can 
start small and trade up to a bigger instance when you know more.
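The "start general purpose, then specialize" advice can be written down as a small decision rule: once a flow has run for a while, look at which resource is saturated and move to the family optimized for it. The sketch below is a hypothetical illustration; the 0.8 threshold, the utilization keys and the family mapping (2016-era c4, r3 and i2) are assumptions, and how you collect the utilization numbers (for example from CloudWatch or OS tools) is left open.

```python
from typing import Dict

# ASSUMED mapping from a sustained bottleneck to a specialized family
# (2016-era families); treat it as a starting point, not a rule.
FAMILY_FOR_BOTTLENECK = {
    "cpu": "c4 (compute optimized)",
    "memory": "r3 (memory optimized)",
    "disk": "provisioned IOPS EBS, or i2 (storage optimized)",
}


def suggest_family(utilization: Dict[str, float], threshold: float = 0.8) -> str:
    """Suggest an instance family from sustained utilization fractions (0.0-1.0).

    utilization uses the keys "cpu", "memory" and "disk"; the 0.8 threshold
    is a hypothetical cut-off.
    """
    saturated = {k: v for k, v in utilization.items()
                 if k in FAMILY_FOR_BOTTLENECK and v >= threshold}
    if not saturated:
        # Nothing local is saturated: the slowdown may be in the systems
        # NiFi connects to, so a bigger instance may not help.
        return "stay on general purpose (m4)"
    # Pick the most saturated resource.
    worst = max(saturated, key=saturated.get)
    return FAMILY_FOR_BOTTLENECK[worst]


if __name__ == "__main__":
    print(suggest_family({"cpu": 0.95, "memory": 0.40, "disk": 0.30}))
```

A regex-heavy flow pinning the CPU would point toward c4, while a flow where nothing is saturated suggests looking at the external systems before paying for a larger instance.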

Thanks,

James

On Wed, Jun 29, 2016 at 8:51 PM, Stéphane Maarek wrote:
Hi,

I'm wondering which AWS EC2 instance type is best suited for NiFi (say, for a 
standalone instance). Is it a compute-optimized instance (c4) or something 
else, and why?

Thanks for your help!
Stephane
