Re: Need opinion, NIFI for real time stream processing??

2018-05-21 Thread Bobby
Thanks a lot, I managed to create a solid PoC for my project. Batch processing
is nice, but the client's requirement is to do stream processing instead.
I have another question, but I will open it in a separate thread since it has
a different context.

Thank you again for your suggestions, Joe.





Re: Need opinion, NIFI for real time stream processing??

2018-03-06 Thread Joe Witt
Bobby

What you describe is a well supported and strong use case for Apache NiFi.

1) You'll want to look into and experiment with the record-oriented
processors.  Specifically, there is a LookupRecord processor that would
probably work well for you.  With it you can plug in a lookup service,
and if you like you can script or code one to behave exactly as you
wish with regard to caching, etc.
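
For example, a coded lookup service plugged into LookupRecord could look
roughly like the sketch below. It is only a sketch: it assumes the
LookupService/StringLookupService interface shape from the NiFi 1.x
lookup-service API (exact signatures may differ by version), the coordinate
name "key" is arbitrary, and the cache-loading logic is left out.

import java.util.Collections;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.nifi.controller.AbstractControllerService;
import org.apache.nifi.lookup.LookupFailureException;
import org.apache.nifi.lookup.StringLookupService;

/**
 * Hypothetical lookup service for LookupRecord: answers lookups from an
 * in-memory map instead of querying the database once per record.
 */
public class CachedLookupService extends AbstractControllerService implements StringLookupService {

    // Populated/refreshed elsewhere (for example on a timer) so lookups stay cheap.
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    @Override
    public Optional<String> lookup(Map<String, Object> coordinates) throws LookupFailureException {
        final Object key = coordinates.get("key");
        return key == null ? Optional.empty() : Optional.ofNullable(cache.get(key.toString()));
    }

    @Override
    public Set<String> getRequiredKeys() {
        return Collections.singleton("key");
    }

    @Override
    public Class<?> getValueType() {
        return String.class;
    }
}

In the flow, LookupRecord would reference a service like this, map a record
field to the "key" coordinate via a dynamic property, and write the looked-up
value into a destination field of the record.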

2) With the record-oriented processors and the record readers/writers,
data is naturally grouped together and kept together in microbatches
(one or more records per flowfile), depending on the behavior of the
protocols used to source the data and on what happens in the flow
logic.  For instance, when polling Kafka, numerous messages are
obtained in a single call.  We'd have them together in one flowfile
but framed as records.

3) You could leverage this with the scripting processors, custom
processors you might write, or the Spring context processors.
However, it is possible you won't need or want to as you learn more
about NiFi.  Not that those things aren't good in their own right, but
rather you might find NiFi handles the cases you might have used them
for well enough on its own.

Thanks
Joe



Need opinion, NIFI for real time stream processing??

2018-03-06 Thread Bobby
Good day,

I would like to know about the possibilities of NiFi for real-time processing.

I'm thinking of a simple messaging application:
1. A processor will accept messages and do a simple mapping of their
attributes against a database, e.g. if message_body.startsWith("123"), then
get the data from the table with ID = 123 and add the value as a new attribute
2. A processor will act as a router
3. Two processors will act as dispatchers
As a simple illustration:

|RECEIVER| -- |ROUTER| -- |DISPATCHER A|
                      \-- |DISPATCHER B|


Now my questions are:
1. What is the best practice / proper way to do database mapping with complex
criteria? In my experience, the best way to do this in Java is to fetch the
data from a specific table (let's say Table A) into a collection (a Map) on a
regular schedule (every 5 minutes or so); a sketch of this approach follows
after the questions. The reason I do this is to reduce the connection overhead
when querying: imagine if for each message we had to open a connection, query,
and close the connection. I've tested this approach and it was much faster,
even with connection pooling. The downside is that I need memory, which grows
along with my database size. I use MySQL and haven't tested an in-memory
database; I just want to know the common, fast approach to do this. If there
is no other way, I might create a standalone module using Spring Boot to
handle this and have it communicate with the NiFi processor via a REST API.
2. Is it OK to use one flow file for one message?
3. Is it possible and recommended to use a dependency injection library like
Spring?
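
A minimal sketch of the periodic-refresh approach from question 1, assuming
plain JDBC plus a ScheduledExecutorService; the table name table_a, its
columns, and the connection settings are made-up placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Keeps Table A cached in memory and refreshes it on a schedule, so the
 * per-message lookup never opens a connection or queries MySQL.
 */
public class TableACache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final String jdbcUrl;   // placeholder, e.g. "jdbc:mysql://localhost:3306/mydb"
    private final String user;
    private final String password;

    public TableACache(String jdbcUrl, String user, String password) {
        this.jdbcUrl = jdbcUrl;
        this.user = user;
        this.password = password;
    }

    /** Load once, then refresh every 5 minutes. */
    public void start() {
        refresh();
        scheduler.scheduleAtFixedRate(this::refresh, 5, 5, TimeUnit.MINUTES);
    }

    /** Replace the cache contents with a fresh snapshot of the table. */
    private void refresh() {
        final Map<String, String> snapshot = new ConcurrentHashMap<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl, user, password);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, value FROM table_a")) {
            while (rs.next()) {
                snapshot.put(rs.getString("id"), rs.getString("value"));
            }
            cache.clear();
            cache.putAll(snapshot);
        } catch (Exception e) {
            // On failure, keep serving the previous snapshot.
            e.printStackTrace();
        }
    }

    /** Per-message lookup, e.g. get("123") when message_body.startsWith("123"). */
    public String get(String id) {
        return cache.get(id);
    }

    public void stop() {
        scheduler.shutdownNow();
    }
}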

Thank you,


Bobby


