Re: Usecase for flink

2021-09-10 Thread Timo Walther
If your graphs fit in memory (at least after an initial partitioning), 
you could use any external library for graph processing within a single 
node in a Flink ProcessFunction.


Flink is a general-purpose data processor that allows arbitrary logic wherever user code runs.


Regards,
Timo
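Timo's suggestion above — running a whole, memory-sized graph through an ordinary graph algorithm inside a single per-record function — can be sketched framework-agnostically. The sketch below uses plain Python with connected components as a stand-in algorithm; the record shape and the algorithm choice are illustrative assumptions, not Flink API:

```python
from collections import defaultdict, deque

def connected_components(nodes, edges):
    # Plain BFS over an in-memory adjacency list. Inside a Flink
    # ProcessFunction, this could equally be a call into any external
    # graph library, as long as the graph fits in one task's memory.
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        queue, comp = deque([start]), set()
        seen.add(start)
        while queue:
            v = queue.popleft()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(comp)
    return components

# One incoming "graph" record, processed entirely within one function call.
comps = connected_components(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
```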


Re: Usecase for flink

2021-09-10 Thread Dipanjan Mazumder
Good point. What is the better option for graph processing with Flink? Any suggestions?

Re: Usecase for flink

2021-09-10 Thread Martijn Visser
Hi,

Please keep in mind that Gelly is approaching end-of-life [1].

Regards,

Martijn

[1] https://flink.apache.org/roadmap.html


Re: Usecase for flink

2021-09-10 Thread Timo Walther

Hi Dipanjan,

Gelly is built on top of the DataSet API, a batch-only API that is slowly being phased out.

It is not possible to connect a DataStream API program to a DataSet API program unless you go through a connector (such as CSV files) in between.


Regards,
Timo
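The "connector in between" workaround can be sketched framework-agnostically: a batch stage writes its result as CSV, and a separate streaming stage reads that CSV back in. Plain Python with an in-memory buffer stands in for the two Flink jobs and the shared file; the record layout is an illustrative assumption:

```python
import csv
import io

# Batch stage (stands in for a Gelly/DataSet job): write results as CSV.
batch_output = io.StringIO()
csv.writer(batch_output).writerows([("g1", "node", "web"), ("g1", "node", "db")])

# Streaming stage (stands in for a DataStream job): ingest the same CSV.
batch_output.seek(0)
records = [tuple(row) for row in csv.reader(batch_output)]
```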



Re: Usecase for flink

2021-09-10 Thread Dipanjan Mazumder
Hi Jing,

Thanks for the input. Another question I had: can Gelly be used to process the graph that Flink receives through Kafka, so that I decompose the graph into its nodes and edges with Gelly, process them individually through substreams, and then write the final output of the graph processing somewhere?

I saw that Gelly is for batch processing, but if it supports the above, it would solve my entire use case.

Regards
Dipanjan

Re: Usecase for flink

2021-09-09 Thread JING ZHANG
Hi Dipanjan,
Based on your description, I think Flink can handle this use case.
Don't worry that Flink can't handle data at this scale: Flink is a distributed engine. As long as data skew is carefully avoided, the input throughput can be handled with appropriate resources.
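The data-skew caveat usually comes down to how records are keyed: one hot key sends all of its records to a single parallel subtask. A minimal, framework-agnostic Python sketch of key salting (the bucket count, hash choice, and record shapes are illustrative assumptions, not Flink API):

```python
import hashlib
from collections import Counter

def subtask_for(key: str, parallelism: int) -> int:
    # Deterministic hash -> subtask index, mimicking a keyBy-style partitioner.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % parallelism

def salted_key(key: str, record_id: int, buckets: int = 4) -> str:
    # Spread a hot key over several salted sub-keys; partial results must be
    # re-merged per original key in a second aggregation step.
    return f"{key}#{record_id % buckets}"

# 900 records for one hot key, 100 spread over distinct keys.
records = [("hot-graph", i) for i in range(900)] + [(f"g{i}", i) for i in range(100)]

plain = Counter(subtask_for(k, 8) for k, _ in records)
salted = Counter(subtask_for(salted_key(k, i), 8) for k, i in records)
print("max subtask load without salting:", max(plain.values()))
print("max subtask load with salting:", max(salted.values()))
```

Note the trade-off: salting requires a second aggregation to merge the per-bucket partial results back into one result per original key.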

Best,
JING ZHANG

Dipanjan Mazumder  于2021年9月10日周五 上午11:11写道:

> Hi,
>
> I am working on a use case and thinking of using Flink for it. I will have
> many large resource graphs; I need to parse each graph into its nodes and
> edges and evaluate each of them against some Siddhi rules. The
> implementation for evaluating individual entities with Flink and Siddhi is
> already in place, but I am unsure whether I should do the graph processing
> in Flink as well.
> This is what I am planning to do:
>
> From Kafka I will fetch the graph, decompose it into nodes and edges, fetch
> additional metadata for each node and edge from different REST APIs, and
> then pass the individual nodes and edges (which are resources) to the
> different substreams already in place. Rules will run on the individual
> substreams to process the nodes and edges and finally emit the rule output
> into a stream. I will collate the results by graph id from that stream
> using another operator and send the final result to an output stream.
>
> This is what I am thinking. I need input from all of you on whether this is
> a fair use case for Flink, and whether Flink can handle this level of
> processing at scale and volume.
>
> Any input will ease my understanding and help me go ahead with this idea.
>
> Regards
> Dipanjan
>
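The pipeline described above (decompose the graph, evaluate each node and edge, collate by graph id) can be sketched framework-agnostically. The record shapes, the `graph_id` field, and the trivial stand-in rule are illustrative assumptions; in the original design the evaluation step would be Siddhi rules and each stage a Flink operator:

```python
from collections import defaultdict

def decompose(graph):
    # Flatten one graph message into individual node/edge records,
    # each tagged with the graph id so results can be re-collated later.
    gid = graph["id"]
    for node in graph["nodes"]:
        yield {"graph_id": gid, "kind": "node", "payload": node}
    for edge in graph["edges"]:
        yield {"graph_id": gid, "kind": "edge", "payload": edge}

def evaluate(record):
    # Stand-in for per-entity rule evaluation: flag nodes named "db".
    verdict = "alert" if record["payload"].get("name") == "db" else "ok"
    return {"graph_id": record["graph_id"], "verdict": verdict}

def collate(results):
    # Group rule outputs back together per graph id, mimicking the final
    # keyed operator that assembles one result per graph.
    by_graph = defaultdict(list)
    for r in results:
        by_graph[r["graph_id"]].append(r["verdict"])
    return dict(by_graph)

graph = {"id": "g1",
         "nodes": [{"name": "web"}, {"name": "db"}],
         "edges": [{"src": "web", "dst": "db"}]}
out = collate(evaluate(rec) for rec in decompose(graph))
```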


Re: Usecase for Flink

2015-12-19 Thread igor.berman
Thanks Stephan,
yes, you got the use case right.



--
View this message in context: 
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Usecase-for-Flink-tp4076p4092.html
Sent from the Apache Flink User Mailing List archive. mailing list archive at 
Nabble.com.


Re: Usecase for Flink

2015-12-18 Thread Stephan Ewen
If I understand you correctly, you want to write something like:

--

    [cassandra]
         ^
         |
         v
(event source) ---> (add event and lookup) ---> (further ops)

--

That should work with Flink, yes. You can communicate with an external
Cassandra service inside functions.

We are also working on making larger-than-memory state easily supported in
Flink, so future versions may allow you to do this without any external
service.
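The store-and-lookup step in the diagram can be sketched without any framework. An in-memory dict stands in for the external Cassandra service, and the trigger rule (three or more events for one id) is an illustrative assumption:

```python
class EventProcessor:
    def __init__(self):
        self.store = {}  # device id -> full persisted event history

    def process(self, device_id, event):
        # 1. Persist the incoming event for its id.
        history = self.store.setdefault(device_id, [])
        history.append(event)
        # 2. Compute over the whole history; this may or may not trigger
        #    a follow-up action (e.g. sending an email).
        if len(history) >= 3:
            return {"device_id": device_id, "action": "notify",
                    "count": len(history)}
        return None

proc = EventProcessor()
results = [proc.process("dev-1", {"n": i}) for i in range(4)]
```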






On Thu, Dec 17, 2015 at 8:54 AM, igor.berman  wrote:

> Hi,
> We are looking at Flink and trying to understand whether our use case is a
> good fit.
>
> We need to process a stream of events. Each event belongs to some id (e.g.
> a device id), and for each event:
> 1. the event should be stored in some persistent storage (e.g. Cassandra)
> 2. previously persisted events should be fetched, and some computation over
> the whole history may or may not trigger other events (e.g. sending an
> email)
>
> So yes, we have a stream of events, but we need a persistent store (i.e. an
> external service) in the middle, and there is no aggregation of those
> events into something smaller that could be stored in memory: the number of
> ids might be huge and the event history per id can be considerable, so
> there is no way to store everything in memory.
>
> I was wondering if Akka Streams would be an alternative solution too.
>
> Please share your ideas :)
> Thanks in advance,
> Igor
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Usecase-for-Flink-tp4076.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>