I'm taking a look at Pregel. It seems like a good way to do it. The only downside I see is that my data isn't really one complex graph with many edges between the vertices; it's more like a lot of small, isolated graphs.
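For reference, a minimal Pregel sketch of what Robin describes below: every vertex starts out knowing only itself as its terminal, and each superstep passes the destination's terminal one hop back up the chain. The letter-to-id mapping (a=1, b=2, c=3, d=4) and the -1L sentinel are illustrative assumptions, and sc is an existing SparkContext:

import org.apache.spark.graphx._

// Hypothetical vertex ids for the letters: a=1, b=2, c=3, d=4.
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 0),  // a -> b
  Edge(2L, 3L, 0),  // b -> c
  Edge(3L, 4L, 0)   // c -> d
))

// Every vertex starts out knowing only itself as its terminal.
val graph = Graph.fromEdges(edges, 0).mapVertices((id, _) => id)

val terminals = graph.pregel(-1L, activeDirection = EdgeDirection.In)(
  // adopt an incoming terminal; -1L is the "no message yet" sentinel
  vprog = (id, attr, msg) => if (msg == -1L) attr else msg,
  // pass the destination's terminal one hop upstream when it is news to the source
  sendMsg = t =>
    if (t.srcAttr != t.dstAttr) Iterator((t.srcId, t.dstAttr)) else Iterator.empty,
  // a vertex on a simple chain receives at most one message; max is just a tie-break
  mergeMsg = (a, b) => math.max(a, b)
)

terminals.vertices.collect().foreach(println)  // expect (1,4), (2,4), (3,4), (4,4)

Since the graphs are small and isolated, this converges in as many supersteps as the longest chain, and self-loops such as y,y simply never send a message.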
2016-02-25 12:32 GMT+01:00 Robin East <robin.e...@xense.co.uk>:

> The structures you are describing look like the edges of a graph, and you
> want to follow the graph to a terminal vertex and then propagate that value
> back up the path. On this assumption it would be simple to create the
> structures as graphs in GraphX and use Pregel for the algorithm
> implementation.
>
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 25 Feb 2016, at 10:52, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>
> Oh, the letters were just an example; it could be:
> a , t
> b , o
> t , k
> k , c
>
> So a -> t -> k -> c, and the result is: a,c; t,c; k,c and b,o.
> I don't know if you were thinking about sortBy because of the other
> example, where the letters were consecutive.
>
> 2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
>
>> I don't see how sorting the data helps.
>> The answer has to include all the associations. In this case the answer
>> has to be:
>> a , b --> it was an error in the question, sorry.
>> b , d
>> c , d
>> x , y
>> y , y
>>
>> I feel like all the data that is associated should end up in the same
>> executor.
>> In this case, if I sort the inputs:
>> a , b
>> x , y
>> b , c
>> y , y
>> c , d
>> --> sorted:
>> a , b
>> b , c
>> c , d
>> x , y
>> y , y
>>
>> Now "a,b" and "b,c" land in one partition, for example, "c,d" and "x,y"
>> in another one, and so on.
>> I could get the relation among "a,b,c", but not the one between "d" and
>> "a,b,c"; am I wrong? I hope to be wrong!
>>
>> It seems it could be done with GraphX, but as you said, it seems like a
>> little bit of overhead.
>>
>> 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:
>>
>>> Guillermo,
>>> I think you're after an associative algorithm where A is ultimately
>>> associated with D, correct? Jakob would be correct if that is a typo--a
>>> sort would be all that is necessary in that case.
>>>
>>> I believe you're looking for something else though, if I understand
>>> correctly.
>>>
>>> This seems like a similar algorithm to PageRank, no?
>>> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
>>> Except it would return the "neighbor" itself, not necessarily the rank
>>> of the page.
>>>
>>> If you wanted to, you could use Scala and GraphX for this problem. It
>>> might be a bit of overhead though: construct a node for each member of
>>> each tuple, with an edge between them. Then traverse the graph for all
>>> sets of nodes that are connected. That result set could quickly explode
>>> in size, but you could restrict results to a minimum of N connections.
>>> I'm not super familiar with GraphX myself, however. My intuition is
>>> saying 'graph problem', though.
>>>
>>> Thoughts?
>>>
>>> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com>
>>> wrote:
>>>
>>>> Hi Guillermo,
>>>> assuming that the first "a,b" is a typo and you actually meant "a,d",
>>>> this is a sorting problem.
>>>>
>>>> You could easily model your data as an RDD of tuples (or as a
>>>> DataFrame/Dataset) and use the sortBy (or orderBy for
>>>> DataFrames/Datasets) methods.
>>>>
>>>> best,
>>>> --Jakob
>>>>
>>>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz
>>>> <konstt2...@gmail.com> wrote:
>>>> > I want to implement an algorithm in Spark. I know how to do it on a
>>>> > single machine where all the data is together, but I don't know a
>>>> > good way to do it in Spark.
>>>> >
>>>> > If someone has an idea...
>>>> > I have some data like this:
>>>> > a , b
>>>> > x , y
>>>> > b , c
>>>> > y , y
>>>> > c , d
>>>> >
>>>> > I want something like:
>>>> > a , d
>>>> > b , d
>>>> > c , d
>>>> > x , y
>>>> > y , y
>>>> >
>>>> > I need to know that a->b->c->d, so a->d, b->d and c->d.
>>>> > I don't want the code, just an idea of how I could deal with it.
>>>> >
>>>> > Any idea?
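For the archive, here is also a minimal plain-RDD sketch of an iterative alternative to the sorting idea debated above (no GraphX): each round, every pair follows its destination one more hop until nothing changes. The names are illustrative, it assumes each key has at most one outgoing pair as in the examples, and sc is an existing SparkContext:

import org.apache.spark.rdd.RDD

// the input pairs from the original question
val pairs: RDD[(String, String)] = sc.parallelize(Seq(
  ("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")
)).cache()

var chains = pairs
var changed = true
while (changed) {
  val step = chains
    .map { case (s, d) => (d, s) }   // key each pair by its destination
    .leftOuterJoin(pairs)            // is that destination itself a source?
    .map {
      case (d, (s, Some(n))) if n != d => ((s, n), true)   // follow one more hop
      case (d, (s, _))                 => ((s, d), false)  // terminal or self-loop
    }
    .cache()
  changed = step.values.reduce(_ || _)   // did any pair advance this round?
  chains = step.keys
}

chains.collect().foreach(println)  // expect (a,d), (b,d), (c,d), (x,y), (y,y)

Each round costs a join, so this only makes sense while the chains stay short; for anything deeper, the Pregel version sketched at the top of the thread would be the better fit.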