Re: Do GraphFrames support streaming?

2018-07-15 Thread kant kodali
I have tried this sort of approach in other streaming cases, and I believe
it has two problems:

1) One stream (say stream1) is written to disk, for example HDFS or a
database, and for every row in a second stream (say stream2) we make an I/O
call to see whether it joins with a row or rows in stream1. One I/O call per
row is too many.
2) If we instead make one I/O call per RDD partition in stream2, we may run
into full table scans as the data from stream1 grows.

So I wonder whether anyone has implemented this approach successfully in
production (by which I mean without it being resource intensive)?

Thanks!
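
The trade-off between the two options in the message above can be sketched
in plain Python. This is only an illustration under stated assumptions, not
Spark code: `KeyedStore`, `get_many`, `join_per_row` and `join_per_partition`
are hypothetical names, and the sketch assumes stream1 is persisted in a
store that is indexed by the join key, so a batched lookup fetches only the
requested keys instead of scanning everything.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str]  # (join_key, payload)


class KeyedStore:
    """Hypothetical stand-in for stream1 persisted in a key-indexed store."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = defaultdict(list)
        self.calls = 0  # counts simulated I/O round trips

    def append(self, rows: Iterable[Row]) -> None:
        for key, payload in rows:
            self._rows[key].append(payload)

    def get_many(self, keys: Iterable[str]) -> Dict[str, List[str]]:
        """One round trip that returns only the requested keys."""
        self.calls += 1
        return {k: list(self._rows[k]) for k in set(keys) if k in self._rows}


def join_per_row(store: KeyedStore, partition: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Option 1: one lookup per stream2 row."""
    out = []
    for key, payload in partition:
        matches = store.get_many([key])
        out.extend((key, payload, m) for m in matches.get(key, []))
    return out


def join_per_partition(store: KeyedStore, partition: Iterable[Row]) -> List[Tuple[str, str, str]]:
    """Option 2: one batched lookup for all keys in the partition."""
    rows = list(partition)
    matches = store.get_many(k for k, _ in rows)
    return [(k, p, m) for k, p in rows for m in matches.get(k, [])]
```

Both functions return the same joined rows; they differ only in how many
round trips they make (one per row versus one per partition). Whether the
batched lookup avoids a full scan in practice depends on how stream1 is laid
out on disk, which this sketch simply assumes away.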

On Sat, Jul 14, 2018 at 9:18 AM, Jörn Franke wrote:

> No, a streaming dataframe needs to be written to disk or similar (or to an
> in-memory backend). Then, when the next stream arrives, join them, create
> the graph, and store the next stream together with the existing stream on
> disk, etc.


Re: Do GraphFrames support streaming?

2018-07-14 Thread Jörn Franke
No, a streaming dataframe needs to be written to disk or similar (or to an
in-memory backend). Then, when the next stream arrives, join them, create the
graph, and store the next stream together with the existing stream on disk,
etc.

> On 14. Jul 2018, at 17:19, kant kodali wrote:
>
> The question now is whether it can be done in a streaming fashion. Are you
> talking about the union of two streaming dataframes and then constructing a
> GraphFrame (also during streaming)?


Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
The question now is whether it can be done in a streaming fashion. Are you
talking about the union of two streaming dataframes and then constructing a
GraphFrame (also during streaming)?

On Sat, Jul 14, 2018 at 8:07 AM, Jörn Franke wrote:

> For your use case one might indeed be able to work simply with incremental
> graph updates. However, they are not straightforward in Spark. You can
> union the new data with the existing dataframes that represent your graph
> and create a new GraphFrame from the result.
>
> However, I am not sure whether this will fully meet your requirement for
> incremental graph updates.


Re: Do GraphFrames support streaming?

2018-07-14 Thread Jörn Franke
For your use case one might indeed be able to work simply with incremental
graph updates. However, they are not straightforward in Spark. You can union
the new data with the existing dataframes that represent your graph and
create a new GraphFrame from the result.

However, I am not sure whether this will fully meet your requirement for
incremental graph updates.
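
The "union, then rebuild" idea above can be sketched in plain Python. This is
a stand-in for the logic only, not GraphFrames code: sets of tuples play the
role of the vertex and edge dataframes, and `Graph`, `union_batch` and
`EMPTY` are hypothetical names introduced for illustration.

```python
from typing import FrozenSet, Iterable, NamedTuple, Tuple

Edge = Tuple[str, str, str]  # (src, dst, relationship)


class Graph(NamedTuple):
    """Immutable snapshot: a rebuilt graph never mutates the previous one."""
    vertices: FrozenSet[str]
    edges: FrozenSet[Edge]


EMPTY = Graph(frozenset(), frozenset())


def union_batch(graph: Graph, new_edges: Iterable[Edge]) -> Graph:
    """Union the new batch with the existing edges and build a new graph."""
    edges = set(graph.edges) | set(new_edges)
    vertices = set(graph.vertices)
    for src, dst, _ in edges:
        # Re-derive vertices so every edge endpoint is present.
        vertices.update((src, dst))
    return Graph(frozenset(vertices), frozenset(edges))
```

Each batch yields a whole new graph snapshot, which is why this is not truly
incremental: the cost of rebuilding grows with the accumulated data. Note
also that the set union here removes duplicate edges, whereas Spark's
DataFrame `union` keeps duplicates, so an explicit de-duplication step would
be needed there to get the same result.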

> On 14. Jul 2018, at 15:59, kant kodali wrote:
>
> "You want to update incrementally an existing graph and run incrementally a
> graph algorithm suitable for this - you have to implement yourself as far as
> I am aware"
>
> I want to update the graph incrementally and run some graph queries similar
> to Cypher, such as "give me all the vertices that are connected by a
> specific set of edges", and so on. I don't really intend to run graph
> algorithms like ConnectedComponents or anything else at this point, but of
> course they would be great to have.
>
> If I were to do this myself, should I extend GraphFrame? Any suggestions?


Re: Do GraphFrames support streaming?

2018-07-14 Thread kant kodali
"You want to update incrementally an existing graph and run incrementally a
graph algorithm suitable for this - you have to implement yourself as far
as I am aware"

I want to update the graph incrementally and run some graph queries similar
to Cypher, such as "give me all the vertices that are connected by a
specific set of edges", and so on. I don't really intend to run graph
algorithms like ConnectedComponents or anything else at this point, but of
course they would be great to have.

If I were to do this myself, should I extend GraphFrame? Any suggestions?
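
The kind of query described above ("all the vertices that are connected by a
specific set of edges") can be sketched in plain Python over an edge list.
This is an illustration only: `connected_by` and `two_hop` are hypothetical
helpers, and the `(src, dst, relationship)` edge layout is an assumption. In
GraphFrames itself, the motif-finding API (`find`) is the usual place to look
for pattern queries of this sort.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str, str]  # (src, dst, relationship)


def connected_by(edges: Iterable[Edge], relationships: Set[str]) -> Set[Tuple[str, str]]:
    """Return (src, dst) pairs linked by an edge whose relationship is in the set."""
    return {(src, dst) for src, dst, rel in edges if rel in relationships}


def two_hop(edges: Iterable[Edge], first: str, second: str) -> Set[Tuple[str, str]]:
    """Return (a, c) pairs where a -[first]-> b -[second]-> c for some b."""
    edges = list(edges)
    # Index the second-hop edges by their source vertex.
    by_src: Dict[str, Set[str]] = {}
    for src, dst, rel in edges:
        if rel == second:
            by_src.setdefault(src, set()).add(dst)
    return {(a, c) for a, b, rel in edges if rel == first for c in by_src.get(b, ())}
```

Run against the latest graph snapshot, queries like these would need no
changes to the graph class itself; they only read the edge data.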


On Sun, Apr 29, 2018 at 3:24 AM, Jörn Franke wrote:

> What is the use case you are trying to solve?
> You want to load graph data from a streaming window into separate graphs -
> possible, but it probably requires a lot of memory.
> You want to update an existing graph with new streaming data and then
> fully rerun an algorithm -> look at JanusGraph.
> You want to update an existing graph incrementally and incrementally run a
> graph algorithm suitable for this - you have to implement it yourself, as
> far as I am aware.


Re: Do GraphFrames support streaming?

2018-04-29 Thread Jörn Franke
What is the use case you are trying to solve?
You want to load graph data from a streaming window into separate graphs -
possible, but it probably requires a lot of memory.
You want to update an existing graph with new streaming data and then fully
rerun an algorithm -> look at JanusGraph.
You want to update an existing graph incrementally and incrementally run a
graph algorithm suitable for this - you have to implement it yourself, as far
as I am aware.

> On 29. Apr 2018, at 11:43, kant kodali <kanth...@gmail.com> wrote:
> 
> Do GraphFrames support streaming?




Do GraphFrames support streaming?

2018-04-29 Thread kant kodali
Do GraphFrames support streaming?