Re: Is nifi a good fit for this use case?

Joe Witt Wed, 18 Nov 2015 08:53:27 -0800

Hello Philippe,

I believe what you state is true, but I should be clear that I am not
an expert in the details of Storm.  I would definitely encourage you to
ask their community or check their docs.  That said, in the case of
Storm such a concept makes less sense, since in Storm you design the
flow/topology you want and then you unleash it on the cluster, and
Storm (as I understand it) determines where it runs and so on.  With
NiFi you are interactively altering the live/active flow of data.
This model makes great sense for NiFi because we're operating in the
'dataflow space', where we see our job as capturing/acquiring data from
any number of systems, doing routing, transformation, etc., and
ultimately delivering to any number of destination systems.  I think the
model for Storm makes great sense too for their world, which is firing
off processing tasks.  I think Storm would be less effective in our
space and I think our solution would be less effective in their space.
You can do data processing in both and you can do some elements of
dataflow/system integration in both.  That admittedly creates confusion
for folks as they look at each, because depending on their
perspective they see them as competitive.  We've even been compared to
Spark, which is also kinda wild.


It is all about the design tradeoffs system developers make.  In NiFi all
of our tradeoffs are based on our strong focus on striving for
excellence in the dataflow/system integration space.  For higher-order
complex event processing, use systems like Storm and Spark and others
that were designed for that side.  NiFi will happily feed data to and
receive data from such systems.

Thanks
Joe

On Wed, Nov 18, 2015 at 2:33 AM,  <[email protected]> wrote:
> Hello Joe,
> Thanks for these clear explanations.
> Another great feature available in NiFi compared with Storm is (if
> I understand well :-) ):
> - The possibility to stop processors, then add some processors in the middle
> of the topology, and then restart the workflow.
> It can be described as runtime topology modification, and that behavior
> is not possible with Storm (right? Please tell me if I am wrong).
>
> Philippe
> Best regards
>
> -----Original Message-----
> From: Joe Witt [mailto:[email protected]]
> Sent: Wednesday, November 11, 2015 03:28
> To: [email protected]
> Subject: Re: Is nifi a good fit for this use case?
>
> Darren,
>
> In short, yes I think NiFi can handle such a case in a generic sense quite 
> well.
>
> Read on for the longer response...
>
> NiFi can process extremely large data, extremely large datasets, extremely
> small data at high rates, variable-sized data, etc.  It does this efficiently
> by design: the content repository supports pass-by-reference and
> copy-on-write behavior, and it operates in a manner that allows disk
> caching benefits to really shine through.
>
> Now that said, if all that is of interest is pure 'processing' and having a
> general-purpose processing framework, then Storm, Spark, and others are
> focused solely on that space.  NiFi is focused on the management of dataflows
> from wherever in your enterprise data is created, produced, etc., to and
> through processing systems and ultimately into storage systems like HDFS,
> NoSQL stores, and relational databases.
>
> So depending on what you're trying to do to these documents, be it feature
> extraction, transformation, etc., NiFi may be a great choice or NiFi may
> simply be the tool you use to feed this data into systems like Storm or Spark
> or others.  You can absolutely parallelize the flow of data across a NiFi
> cluster.  For producers we offer a library to interact with our site-to-site
> protocol, which will handle things like load balancing and failover and make
> it really easy to stream data to NiFi.  Or NiFi itself could pull from your
> system if perhaps these documents are sitting as files or available via some
> other supported interface.
>
> NiFi can be configured to control the rate of processing, queue data, apply 
> back-pressure, handle errors, and a number of other features that are 
> beneficial to the dataflow management problem.
>
> NiFi supports making tradeoffs at key points in the flow for batch (time 
> tolerant) or low latency (time sensitive) processing/distribution.  Whether 
> data arrives in a streaming or batch fashion and whether it must be delivered 
> to systems in batch or streaming fashion is a concern that NiFi handles well 
> so the various systems can be less coupled.
>
> Regarding its elasticity, I will state that NiFi is not elastic in the sense
> that it will (at this time) automatically provision additional nodes to take
> on the workload and then deprovision them as the load decreases.  We will
> get there.  But what we support are key capabilities like event-driven
> processing with upper bounds on threads, back-pressure which can propagate to
> the source, causing data to go to lesser loaded nodes, and so on.  These are
> elements of elastic behavior, but it is not elastic provisioning (as folks
> often mean).
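
A minimal sketch of the back-pressure idea described above, written in Python
and not taken from NiFi's own code. It assumes each node has a bounded queue
that refuses new data once it reaches a threshold, and that the sender falls
back to the least loaded node that still has room. BoundedConnection, send,
and the threshold values are hypothetical names chosen for illustration; they
are not part of any NiFi API.

```python
from collections import deque


class BoundedConnection:
    """Hypothetical bounded queue in front of one node (not a NiFi class)."""

    def __init__(self, threshold):
        self.threshold = threshold  # assumed back-pressure limit, in items
        self.items = deque()

    def is_full(self):
        return len(self.items) >= self.threshold

    def offer(self, item):
        """Accept the item if there is room; refuse it otherwise."""
        if self.is_full():
            return False  # back-pressure: the caller must look elsewhere
        self.items.append(item)
        return True


def send(item, nodes):
    """Send item to the least loaded node with room.

    Returns the chosen node name, or None when every node is applying
    back-pressure and the source itself has to slow down.
    """
    open_nodes = [name for name, conn in nodes.items() if not conn.is_full()]
    if not open_nodes:
        return None
    target = min(open_nodes, key=lambda name: len(nodes[name].items))
    nodes[target].offer(item)
    return target
```

With two nodes of different thresholds, calling send repeatedly fills the
less loaded node first and returns None once both are full, which is the
point at which the pressure reaches the source.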
>
> I hope this response is helpful.  If any of this was unclear or you want to 
> dive deeper just let us know.
>
> Thanks
> Joe
>
> On Tue, Nov 10, 2015 at 6:30 PM, Darren Govoni <[email protected]> wrote:
>> Hi,
>>   I studied the nifi website a bit and if I missed a key part, forgive
>> me for asking this question.
>> But I am wondering if or how nifi can accommodate processing large
>> data sets with possibly compute intensive operations.
>> For example, if we have say 2 million documents, how does nifi make
>> processing these documents efficient?
>> I understand the visual workflow and it's nice. How is that
>> parallelized across a data set?
>>
>> Do we submit all the documents to a cluster of flows (how many?) that
>> execute some number of documents simultaneously?
>> Does nifi support batch processing? Is it elastic?
>>
>> Thanks.
>
