I'm taking a look at Pregel. It seems like a good way to do it. The only downside I see is that my data isn't really one complex graph with many edges between the vertices; it's more like a lot of small, isolated graphs.
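For reference, a minimal Pregel sketch of what Robin describes below: every vertex starts out knowing only itself as its terminal, and each superstep passes the destination's terminal one hop back up the chain. The letter-to-id mapping (a=1, b=2, c=3, d=4) and the -1L sentinel are illustrative assumptions, and sc is an existing SparkContext:

import org.apache.spark.graphx._

// Hypothetical vertex ids for the letters: a=1, b=2, c=3, d=4.
val edges = sc.parallelize(Seq(
  Edge(1L, 2L, 0),  // a -> b
  Edge(2L, 3L, 0),  // b -> c
  Edge(3L, 4L, 0)   // c -> d
))

// Every vertex starts out knowing only itself as its terminal.
val graph = Graph.fromEdges(edges, 0).mapVertices((id, _) => id)

val terminals = graph.pregel(-1L, activeDirection = EdgeDirection.In)(
  // adopt an incoming terminal; -1L is the "no message yet" sentinel
  vprog = (id, attr, msg) => if (msg == -1L) attr else msg,
  // pass the destination's terminal one hop upstream when it is news to the source
  sendMsg = t =>
    if (t.srcAttr != t.dstAttr) Iterator((t.srcId, t.dstAttr)) else Iterator.empty,
  // a vertex on a simple chain receives at most one message; max is just a tie-break
  mergeMsg = (a, b) => math.max(a, b)
)

terminals.vertices.collect().foreach(println)  // expect (1,4), (2,4), (3,4), (4,4)

Since the graphs are small and isolated, this converges in as many supersteps as the longest chain, and self-loops such as y,y simply never send a message.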
2016-02-25 12:32 GMT+01:00 Robin East <robin.e...@xense.co.uk>:

> The structures you are describing look like the edges of a graph, and you
> want to follow the graph to a terminal vertex and then propagate that value
> back up the path. On this assumption it would be simple to create the
> structures as graphs in GraphX and use Pregel for the algorithm
> implementation.
>
> -------------------------------------------------------------------------------
> Robin East
> *Spark GraphX in Action* Michael Malak and Robin East
> Manning Publications Co.
> http://www.manning.com/books/spark-graphx-in-action
>
> On 25 Feb 2016, at 10:52, Guillermo Ortiz <konstt2...@gmail.com> wrote:
>
> Oh, the letters were just an example; it could be:
> a , t
> b , o
> t , k
> k , c
>
> So a -> t -> k -> c, and the result is: a,c; t,c; k,c and b,o.
> I don't know if you were thinking about sortBy because of the other
> example, where the letters were consecutive.
>
> 2016-02-25 9:42 GMT+01:00 Guillermo Ortiz <konstt2...@gmail.com>:
>
>> I don't see how sorting the data helps.
>> The answer has to include all the associations. In this case the answer
>> has to be:
>> a , b --> it was an error in the question, sorry.
>> b , d
>> c , d
>> x , y
>> y , y
>>
>> I feel like all the data that is associated should end up in the same
>> executor.
>> In this case, if I sort the inputs:
>> a , b
>> x , y
>> b , c
>> y , y
>> c , d
>> --> sorted:
>> a , b
>> b , c
>> c , d
>> x , y
>> y , y
>>
>> Now "a,b" and "b,c" land in one partition, for example, "c,d" and "x,y"
>> in another one, and so on.
>> I could get the relation among "a,b,c", but not the one between "d" and
>> "a,b,c"; am I wrong? I hope to be wrong!
>>
>> It seems it could be done with GraphX, but as you said, it seems like a
>> little bit of overhead.
>>
>> 2016-02-25 5:43 GMT+01:00 James Barney <jamesbarne...@gmail.com>:
>>
>>> Guillermo,
>>> I think you're after an associative algorithm where A is ultimately
>>> associated with D, correct? Jakob would be correct if that is a typo--a
>>> sort would be all that is necessary in that case.
>>>
>>> I believe you're looking for something else though, if I understand
>>> correctly.
>>>
>>> This seems like a similar algorithm to PageRank, no?
>>> https://github.com/amplab/graphx/blob/master/python/examples/pagerank.py
>>> Except it would return the "neighbor" itself, not necessarily the rank
>>> of the page.
>>>
>>> If you wanted to, you could use Scala and GraphX for this problem. It
>>> might be a bit of overhead though: construct a node for each member of
>>> each tuple, with an edge between them. Then traverse the graph for all
>>> sets of nodes that are connected. That result set could quickly explode
>>> in size, but you could restrict results to a minimum of N connections.
>>> I'm not super familiar with GraphX myself, however. My intuition is
>>> saying 'graph problem', though.
>>>
>>> Thoughts?
>>>
>>> On Wed, Feb 24, 2016 at 6:43 PM, Jakob Odersky <ja...@odersky.com>
>>> wrote:
>>>
>>>> Hi Guillermo,
>>>> assuming that the first "a,b" is a typo and you actually meant "a,d",
>>>> this is a sorting problem.
>>>>
>>>> You could easily model your data as an RDD of tuples (or as a
>>>> DataFrame/Dataset) and use the sortBy (or orderBy for
>>>> DataFrames/Datasets) methods.
>>>>
>>>> best,
>>>> --Jakob
>>>>
>>>> On Wed, Feb 24, 2016 at 2:26 PM, Guillermo Ortiz
>>>> <konstt2...@gmail.com> wrote:
>>>> > I want to implement an algorithm in Spark. I know how to do it on a
>>>> > single machine where all the data is together, but I don't know a
>>>> > good way to do it in Spark.
>>>> >
>>>> > If someone has an idea...
>>>> > I have some data like this:
>>>> > a , b
>>>> > x , y
>>>> > b , c
>>>> > y , y
>>>> > c , d
>>>> >
>>>> > I want something like:
>>>> > a , d
>>>> > b , d
>>>> > c , d
>>>> > x , y
>>>> > y , y
>>>> >
>>>> > I need to know that a->b->c->d, so a->d, b->d and c->d.
>>>> > I don't want the code, just an idea of how I could deal with it.
>>>> >
>>>> > Any idea?
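For the archive, here is also a minimal plain-RDD sketch of an iterative alternative to the sorting idea debated above (no GraphX): each round, every pair follows its destination one more hop until nothing changes. The names are illustrative, it assumes each key has at most one outgoing pair as in the examples, and sc is an existing SparkContext:

import org.apache.spark.rdd.RDD

// the input pairs from the original question
val pairs: RDD[(String, String)] = sc.parallelize(Seq(
  ("a", "b"), ("x", "y"), ("b", "c"), ("y", "y"), ("c", "d")
)).cache()

var chains = pairs
var changed = true
while (changed) {
  val step = chains
    .map { case (s, d) => (d, s) }   // key each pair by its destination
    .leftOuterJoin(pairs)            // is that destination itself a source?
    .map {
      case (d, (s, Some(n))) if n != d => ((s, n), true)   // follow one more hop
      case (d, (s, _))                 => ((s, d), false)  // terminal or self-loop
    }
    .cache()
  changed = step.values.reduce(_ || _)   // did any pair advance this round?
  chains = step.keys
}

chains.collect().foreach(println)  // expect (a,d), (b,d), (c,d), (x,y), (y,y)

Each round costs a join, so this only makes sense while the chains stay short; for anything deeper, the Pregel version sketched at the top of the thread would be the better fit.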