Hello Joe,
Thanks for these clear explanations.
Another great feature of NiFi compared with Storm (if I understand
correctly :-) ) is:
- The ability to stop processors, add new processors in the middle of the
topology, and then restart the workflow.
This could be described as runtime topology modification, and that behavior is
not possible with Storm (right? Please tell me if I am wrong).

Philippe
Best regards

-----Original Message-----
From: Joe Witt [mailto:[email protected]]
Sent: Wednesday, November 11, 2015 03:28
To: [email protected]
Subject: Re: Is nifi a good fit for this use case?

Darren,

In short, yes I think NiFi can handle such a case in a generic sense quite well.

Read on for the longer response...

NiFi can process extremely large data, extremely large datasets, extremely
small data at high rates, variable sized data, etc. Its design makes this
efficient: the content repository supports pass-by-reference and
copy-on-write behavior, and NiFi operates in a manner that allows disk
caching benefits to really shine through.
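
As a rough illustration of what pass-by-reference and copy-on-write mean here,
the following is a toy sketch in Python. It is not NiFi's implementation: the
names ContentRepository, FlowFile, route and transform are hypothetical and
exist only for this example. Routing copies a reference to the content, while
a transformation writes new content and leaves the original untouched.

```python
class ContentRepository:
    """Toy content store: stored content is never modified in place."""

    def __init__(self):
        self._claims = {}
        self._next_id = 0

    def write(self, data: bytes) -> int:
        # Store a new, immutable piece of content and return its reference.
        claim_id = self._next_id
        self._next_id += 1
        self._claims[claim_id] = bytes(data)
        return claim_id

    def read(self, claim_id: int) -> bytes:
        return self._claims[claim_id]


class FlowFile:
    """Attributes plus a reference to content, not the content itself."""

    def __init__(self, claim_id, attributes=None):
        self.claim_id = claim_id
        self.attributes = dict(attributes or {})


def route(flowfile):
    # Pass-by-reference: only the reference and attributes are copied.
    return FlowFile(flowfile.claim_id, flowfile.attributes)


def transform(repo, flowfile, fn):
    # Copy-on-write: the result goes to a new claim; the original stays as is.
    new_claim = repo.write(fn(repo.read(flowfile.claim_id)))
    return FlowFile(new_claim, flowfile.attributes)


repo = ContentRepository()
original = FlowFile(repo.write(b"hello"), {"filename": "doc-1"})
routed = route(original)                        # shares the same content
upper = transform(repo, original, bytes.upper)  # gets its own content
```

In this sketch, moving or cloning a document through the flow costs the same
regardless of its size, because the bytes are only rewritten when a step
actually changes them.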

Now that said, if all that is of interest is pure 'processing' and having a
general purpose processing framework, then Storm, Spark, and others are
focused solely on that space.  NiFi is focused on the management of dataflows
from wherever in your enterprise data is created or produced, to and through
processing systems, and ultimately into storage systems like HDFS, NoSQL
stores, and relational databases.

So depending on what you're trying to do to these documents, be it feature
extraction, transformation, etc., NiFi may be a great choice or NiFi may simply
be the tool you use to feed this data into systems like Storm or Spark or
others.  You can absolutely parallelize the flow of data across a NiFi cluster.
For producers we offer a library to interact with our site-to-site protocol
which will handle things like load balancing and failover and make it really 
easy to stream data to NiFi.  Or NiFi itself could pull from your system if 
perhaps these documents are sitting as files or available via some other 
supported interface.

NiFi can be configured to control the rate of processing, queue data, apply 
back-pressure, handle errors, and a number of other features that are 
beneficial to the dataflow management problem.
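
To make the queueing and back-pressure idea concrete, here is a minimal Python
sketch. It is an illustration under stated assumptions, not NiFi code: the
Connection class, the backpressure_threshold parameter and the run_source and
run_sink helpers are hypothetical names. The source stops enqueueing once the
queue between two steps reaches its threshold and resumes after the
downstream step has drained some items.

```python
from collections import deque


class Connection:
    """Toy queue between two processing steps with a back-pressure threshold."""

    def __init__(self, backpressure_threshold):
        self.queue = deque()
        self.backpressure_threshold = backpressure_threshold

    def is_full(self):
        return len(self.queue) >= self.backpressure_threshold


def run_source(connection, items):
    """Enqueue items until back-pressure engages; return what was held back."""
    pending = deque(items)
    while pending and not connection.is_full():
        connection.queue.append(pending.popleft())
    return list(pending)


def run_sink(connection, max_items):
    """Drain up to max_items from the queue, simulating a slower consumer."""
    processed = []
    while connection.queue and len(processed) < max_items:
        processed.append(connection.queue.popleft())
    return processed


conn = Connection(backpressure_threshold=3)
first_left = run_source(conn, ["a", "b", "c", "d", "e"])  # queue fills at 3
drained = run_sink(conn, max_items=2)                     # consumer catches up
second_left = run_source(conn, first_left)                # source resumes
```

Nothing is dropped in this sketch: items that do not fit simply wait at the
source until the downstream queue has room again.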

NiFi supports making tradeoffs at key points in the flow for batch (time 
tolerant) or low latency (time sensitive) processing/distribution.  Whether 
data arrives in a streaming or batch fashion and whether it must be delivered 
to systems in batch or streaming fashion is a concern that NiFi handles well so 
the various systems can be less coupled.

Regarding elasticity, NiFi is not elastic in the sense that it will (at this
time) automatically provision additional nodes to take on the workload and
then deprovision them as the load decreases.  We will get there.  But what we
support are key capabilities like event-driven processing with upper bounds on
threads, back-pressure which can propagate to the source causing data to go to
less loaded nodes, and so on.  These are elements of elastic behavior but it
is not elastic provisioning (as folks often mean).

I hope this response is helpful.  If any of this was unclear or you want to 
dive deeper just let us know.

Thanks
Joe

On Tue, Nov 10, 2015 at 6:30 PM, Darren Govoni <[email protected]> wrote:
> Hi,
>   I studied the nifi website a bit and if I missed a key part, forgive 
> me for asking this question.
> But I am wondering if or how nifi can accommodate processing large 
> data sets with possibly compute intensive operations.
> For example, if we have say 2 million documents, how does nifi make 
> processing these documents efficient?
> I understand the visual workflow and it's nice. How is that 
> parallelized across a data set?
>
> Do we submit all the documents to a cluster of flows (how many?) that 
> execute some number of documents simultaneously?
> Does nifi support batch processing? Is it elastic?
>
> Thanks.

_________________________________________________________________________________________________________________________

This message and its attachments may contain confidential or privileged 
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete 
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been 
modified, changed or falsified.
Thank you.
