Hi Aakash.

Last summer I had an intern working for me who investigated using machine 
learning (unsupervised anomaly detection using kNN and LOF) against NiFi 
provenance data to perform error identification and build a processor 
recommendation engine. I can’t share the work as it is company internal, but 
there is definitely a growing community and interest in what you’re discussing.

If you truly want to distribute the computational load of performing the 
analysis to edge nodes, writing custom processors is likely a requirement. Can 
I make two suggestions before you begin writing code, though? First, 
investigate whether you could deploy something like scikit-learn (Python) [1] 
or Apache Spark MLlib [2] alongside NiFi on the edge nodes (this obviously 
depends on the hardware resources available). Our early efforts involved 
writing custom NiFi code, but it turned out to be much easier to offload the 
data to scikit-learn and then ingest the results back into NiFi to continue the 
data flow, leaving the computation to an external system.
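
As a rough illustration of the kind of scoring step that could live outside 
NiFi, here is a minimal, dependency-free sketch of kNN-distance anomaly 
scoring. It is not the internal work mentioned above. The record fields 
"durationMs" and "bytes", the function names, and the median-based threshold 
are all assumptions made for the example. With scikit-learn installed, 
LocalOutlierFactor from sklearn.neighbors would be the natural replacement for 
the hand-rolled scoring.

```python
import json
import math
import statistics


def knn_scores(points, k=3):
    """Mean Euclidean distance from each point to its k nearest neighbours."""
    if len(points) <= k:
        raise ValueError("need more than k points")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


def flag_anomalies(points, k=3, factor=3.0):
    """Flag points whose kNN score exceeds `factor` times the median score.

    The median-based threshold is an assumption for this sketch, not a
    standard rule; tune it against real data.
    """
    scores = knn_scores(points, k)
    cutoff = factor * statistics.median(scores)
    return [s > cutoff for s in scores]


def score_lines(lines, k=3, factor=3.0):
    """Take JSON records (one per line) and return them annotated with a score.

    "durationMs" and "bytes" are hypothetical field names; real provenance
    events would need their own feature extraction and feature scaling.
    """
    records = [json.loads(line) for line in lines if line.strip()]
    points = [(float(r["durationMs"]), float(r["bytes"])) for r in records]
    scores = knn_scores(points, k)
    flags = flag_anomalies(points, k, factor)
    out = []
    for record, score, flag in zip(records, scores, flags):
        out.append(json.dumps(dict(record, knnScore=score, anomaly=flag)))
    return out
```

One way to use it would be to have the flow write one JSON record per line, 
pass those lines to score_lines (for example, score_lines(sys.stdin)), and 
ingest the annotated lines back into NiFi. How the data leaves and re-enters 
the flow is left open here.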

Second, if you really want the computation to run inside the NiFi JVM, look 
at the ExecuteScript processor before trying to write a custom processor. While 
NiFi makes it easy to deploy custom code, the development cycle adds a delay 
to every iteration: after you generate the Maven pom for the NAR, you have to 
write the code in an IDE, test it, compile, build the NAR, drop it into the 
NiFi lib directory, and restart the entire application every time you make a 
change. To prototype your model, I recommend using the ExecuteScript 
processor, which will provide immediate feedback. It also abstracts away a lot 
of the boilerplate framework code so you can focus on the domain work. Matt 
Burgess has written a number of great articles which should get you up and 
running with it [3].

Once you have a model and computation you’re confident in, then it’s easy to 
translate it to a dedicated custom processor and deploy it. I find this 
methodology saves me a lot of time and a bit of frustration. Good luck. I’m 
very curious to see what your work yields.

[1] http://scikit-learn.org/stable/
[2] https://spark.apache.org/mllib/
[3] https://funnifi.blogspot.com



Andy LoPresto
[email protected]
PGP Fingerprint: 70EC B3E5 98A6 5A3F D3C4  BACE 3C6E F65B 2F7D EF69

> On Jan 27, 2017, at 5:45 AM, Aldrin Piri <[email protected]> wrote:
> 
> Hi, Aakash!
> 
> To my knowledge, there has not been any discussion of such processors on the 
> lists specifically, although I have heard people mention assorted libraries 
> that might be a good fit for the NiFi ecosystem's intended purposes.  There 
> has been some foundational work such as the following issues which allow 
> processors to make use of the state management features in NiFi for the sake 
> of managing the flow of data to do some higher level inspection/analysis.
> 
> https://issues.apache.org/jira/browse/NIFI-1582
> https://issues.apache.org/jira/browse/NIFI-1682
> https://issues.apache.org/jira/browse/NIFI-2590
> 
> If my understanding of your question is correct, I believe your notion of 
> distribution may not directly align with the intended focus of NiFi, but 
> there certainly could be some aspects that work.  Would you be willing to 
> expand in 
> greater detail how you would envision such processors interacting with data 
> and possibly provide some of the libraries you were considering in your 
> initial message?
> 
> Thanks!
> 
> --aldrin
> 
> On Fri, Jan 27, 2017 at 7:38 AM, Aakash Khochare <[email protected]> wrote:
> Greetings,
> 
> While I understand that the primary use of NiFi/MiNiFi is for secure data 
> ingress with the added benefit of Provenance, what are the views of the 
> community on writing Processors that implement Machine Learning Algorithms 
> and distribute them across Edge + Cloud using NiFi and MiNiFi? Has anyone 
> tried writing such processors?
> 
> Regards,
> 
> Aakash Khochare
> 
> 
> 
