Isha,

Just some perspective from the field. I have had success with containerized
NiFi and generally get along with it. That being said, I think there are a
few caveats and issues you might find going down this road.

Standalone NiFi in a container works pretty much the way you would want and
expect. You do need to be careful about where you mount NiFi's writable
directories, though: content_repository, database_repository,
flowfile_repository, provenance_repository, state, logs and work. All of
these directories are actively written by NiFi, and it's good to have them
exposed as bind mounts living outside the container.
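
For reference, here's a rough sketch of the kind of docker run I mean. The
container paths assume the official apache/nifi image layout under
/opt/nifi/nifi-current, and the /data/nifi host paths are just placeholders:

# host dirs need to be writable by the container's nifi user (uid 1000 by default)
docker run -d --name nifi \
  -p 8443:8443 \
  -v /data/nifi/content_repository:/opt/nifi/nifi-current/content_repository \
  -v /data/nifi/database_repository:/opt/nifi/nifi-current/database_repository \
  -v /data/nifi/flowfile_repository:/opt/nifi/nifi-current/flowfile_repository \
  -v /data/nifi/provenance_repository:/opt/nifi/nifi-current/provenance_repository \
  -v /data/nifi/state:/opt/nifi/nifi-current/state \
  -v /data/nifi/logs:/opt/nifi/nifi-current/logs \
  apache/nifi:latest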

You will definitely want to bind mount the flow.xml.gz and flow.json.gz
files as well, or you will lose your live dataflow configuration changes as
you use NiFi. Any change to your NiFi canvas gets written into flow.xml.gz,
which means you need to keep a copy of it outside of your container. There
are potentially other files in the conf folder that you will also want to
keep around. NiFi unfortunately doesn't consolidate all of these paths into
a single location by default, so you end up reconfiguring and/or bind
mounting a lot of different paths.
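
One trick for the conf directory: if you bind mount an empty host directory
over /opt/nifi/nifi-current/conf, you'll shadow the image's defaults, so
seed it from a throwaway container first. Something like this (host paths
again are placeholders):

docker create --name nifi-seed apache/nifi:latest
docker cp nifi-seed:/opt/nifi/nifi-current/conf /data/nifi/conf
docker rm nifi-seed
# then add to your docker run:
#   -v /data/nifi/conf:/opt/nifi/nifi-current/conf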

I have found NiFi clustering in a dockerized environment to be less
desirable. The primary problem is that the definition of cluster nodes is
mostly hard-coded into the nifi.properties file. In a containerized
environment, you usually want the ability to dynamically bring nodes up and
down as needed (with dynamic IP/network configuration), especially in
container orchestration frameworks like Kubernetes. There have been a lot
of experiments and possibly even some reasonable solutions coming out to
help with containerized clusters, but generally you're going to find you
have to crack your knuckles a little bit to get this to work. If you're
content with a mostly statically defined, non-elastic cluster
configuration, then a clustered NiFi on Docker is possible.
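
If you do go the static route, the official apache/nifi image exposes
environment variables for the main cluster settings, so a single clustered
node against a ZooKeeper container looks roughly like this (names and ports
are placeholders; repeat per node, each with its own repository mounts as
above):

docker network create nifi-net
docker run -d --name zookeeper --network nifi-net zookeeper:3.8
docker run -d --name nifi-node1 --network nifi-net \
  -e NIFI_WEB_HTTP_PORT=8080 \
  -e NIFI_CLUSTER_IS_NODE=true \
  -e NIFI_CLUSTER_NODE_PROTOCOL_PORT=8082 \
  -e NIFI_ZK_CONNECT_STRING=zookeeper:2181 \
  -e NIFI_ELECTION_MAX_WAIT="1 min" \
  apache/nifi:latest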

As an option, if you stick with standalone deployments, what you can do
instead is front your individual NiFi node instances with a load balancer.
This may be a poor man's approach to load distribution, but it works
reasonably well and I've seen it in action on large-volume flows. If your
data source can deliver to a load balancer, then you can have the load
balancer round-robin (or similar) to your underlying standalone nodes. In a
container orchestration environment, you can imagine Kubernetes spinning
containerized nodes up and down to handle demand, and managing the load
balancer configuration as those nodes come up. It's all possible, but it
will require some work.
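
As a sketch of that idea: assuming each standalone node runs a ListenHTTP
processor on port 9090 (the node hostnames here are hypothetical), an nginx
front can be as simple as:

cat > nginx.conf <<'EOF'
events {}
http {
  # round-robin across the standalone NiFi listeners
  upstream nifi_listeners {
    server nifi-node1:9090;
    server nifi-node2:9090;
  }
  server {
    listen 8080;
    location / {
      proxy_pass http://nifi_listeners;
    }
  }
}
EOF
docker run -d --name nifi-lb --network nifi-net -p 8080:8080 \
  -v "$PWD/nginx.conf:/etc/nginx/nginx.conf:ro" nginx:stable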

Of course, doing anything with multiple standalone nodes means that you
have to propagate changes from one NiFi canvas to all your nodes manually.
This is a huge pain and not really scalable, so the load balancer approach
is only good if your dataflow configurations are very static and don't
change day-to-day with operations.

That points at one of the core issues with containerized NiFi: what to do
with the flow configuration itself. On the one hand, you kind of want to
"burn in" your flow configuration into your Docker image, i.e. the
flow.xml.gz and/or flow.json.gz would be included as part of the image
itself. This enables your NiFi system to come up with a fully configured
set of processors, ready to accept connections.
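
A minimal Dockerfile for the burn-in approach might look like this (paths
assume the official image layout, and --chown because the image runs as the
nifi user):

FROM apache/nifi:latest
# bake the current flow definition into the image
COPY --chown=nifi:nifi flow.xml.gz flow.json.gz /opt/nifi/nifi-current/conf/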

But part of the fun with NiFi is being able to make dataflow and processor
configuration changes on the fly as needed, based on operational
conditions. For example, maybe you need to temporarily stop data moving to
one location and have it transported to another. This "live" and dynamic
way to manage NiFi is a powerful feature, but it goes against the grain of
a containerized or static deployment approach: new nodes coming online will
not necessarily have the latest configuration changes that your operational
staff added recently. The NiFi Registry can help somewhat here.
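
If you want to try the Registry route, it also ships as an official image,
and you point NiFi at it from Controller Settings > Registry Clients.
Something like:

docker run -d --name nifi-registry --network nifi-net -p 18080:18080 \
  apache/nifi-registry:latest
# then add http://nifi-registry:18080 as a registry client in NiFi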

Finally, to give a shout-out: you may want to consider using a dockerized
MiNiFi cluster instead of traditional NiFi. MiNiFi is maybe slightly better
aligned with a containerized clustering approach, as it more directly
supports this concept of a "burned in" processor configuration. In this
way, MiNiFi nodes can be spun up or down based on demand without too much
fuss. MiNiFi isn't really cluster aware and each node acts independently,
which makes it an easier fit for containerized or dynamic deployments.
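
A sketch of the burned-in MiNiFi image, assuming the apache/nifi-minifi
image and its default conf path (double-check both against the version you
pull):

FROM apache/nifi-minifi:latest
# bake the MiNiFi flow config into the image; every container started
# from this image comes up with the same processors
COPY config.yml /opt/minifi/minifi-current/conf/config.yml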

Hope this gives you some thoughts. There are definitely a lot of recipes
and approaches to containerized NiFi, so do some searching to find one that
matches what you're after. Almost any configuration can be done, based on
your needs.

/Adam



On Fri, Jan 27, 2023 at 3:15 AM Isha Lamboo <[email protected]>
wrote:

> Hi all,
>
>
>
> I’m looking for some perspectives from people using NiFi deployed in
> containers (Docker or otherwise).
>
>
>
> It seems to me that the NiFi architecture benefits from having a lot of
> compute resources to share for all flows, especially with large batches
> arriving periodically. On the other hand, it’s hard to prevent badly tuned
> flows from impacting others and more and more IT operations are moving to
> containerized environments, so I’m exploring the options for containerized
> NiFi as an alternative to our current VM-based approach.
>
>
> Do you deploy a few large containers similar in capacity to a VM to run
> all flows together or many small ones with only a few flows on each? And do
> you deploy them clustered or standalone?
>
>
>
> Thanks,
>
>
>
> Isha
>