Re: Looping over a DataSet and accesing another DataSet
I just found out that I am able to use arrays in tuple values, nvm about that question -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778p9850.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Re: Looping over a DataSet and accesing another DataSet
I did mean the iteratino yes, I currently solved the problem by rewriting the algorithm in gelly's GathersumApply model, thnx for the tips I had another question regarding the original message, about appending items to a list, how would I do that? Because afaik it's not possible to add a list or array in a Tuple element right? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778p9843.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Re: Looping over a DataSet and accesing another DataSet
By 'loop' do you refer to an iteration? The output of a bulk iteration is processed as the input of the following iteration. Values updated in an iteration are available in the next iteration just as values updated by an operator are available to the following operator. Your chosen algorithm may not be a good fit for distributed processing frameworks like Flink, Spark, and Hadoop. You may need to recast your problem into an appropriate, scalable algorithm. Both the Gelly and Machine Learning libraries have good examples of efficient, scalable algorithms (Flink's "examples" demonstrate specific functionality). Greg On Mon, Oct 31, 2016 at 8:52 AM, otherwise777 <wou...@onzichtbaar.net> wrote: > Thank you for your reply, this is new information for me, > > Regarding the algorithm, i gave it a better look and i don't think it will > work with joining. When looping over the Edge set (u,v) we need to be able > to write and read A[u] and A[v]. If i join them it will create a new > instances of that value and it doesn't matter if it's changed in one > instance. > > For example i have the following edges: > u v > 1 2 > 1 3 > > With vertices and values: > 1 a > 2 b > 3 c > > If i join them i get: > u v u' v' > 1 2 a b > 1 3 a c > > If i loop over the joined set and change the u' value of the first instance > to "d" then in my next loop step it will be 'a'. > > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Looping-over-a- > DataSet-and-accesing-another-DataSet-tp9778p9784.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >
Re: Looping over a DataSet and accesing another DataSet
Thank you for your reply, this is new information for me, Regarding the algorithm, i gave it a better look and i don't think it will work with joining. When looping over the Edge set (u,v) we need to be able to write and read A[u] and A[v]. If i join them it will create a new instances of that value and it doesn't matter if it's changed in one instance. For example i have the following edges: u v 1 2 1 3 With vertices and values: 1 a 2 b 3 c If i join them i get: u v u' v' 1 2 a b 1 3 a c If i loop over the joined set and change the u' value of the first instance to "d" then in my next loop step it will be 'a'. -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778p9784.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Re: Looping over a DataSet and accesing another DataSet
The DataSet API only supports binary joins but one can simulate an n-ary join by chaining successive join operations. Your algorithm requires a global ordering on edges, requiring a parallelism of 1, and will not scale in a distributed processing system. Flink excels at processing bulk (larger than memory) data in serial. Greg On Mon, Oct 31, 2016 at 5:54 AM, otherwise777 <wou...@onzichtbaar.net> wrote: > Thank you for your reply and explanation, I think there is one issue with > your method though, you said that i should make a join with the the key > value pair A on v and the Edge set (u,v), this would work, however i not > only need to access A[v] in one iteration but also A[u], so if i join on v > that won't be possible > > Did i understand it correctly? > > > > -- > View this message in context: http://apache-flink-user- > mailing-list-archive.2336050.n4.nabble.com/Looping-over-a- > DataSet-and-accesing-another-DataSet-tp9778p9782.html > Sent from the Apache Flink User Mailing List archive. mailing list archive > at Nabble.com. >
Re: Looping over a DataSet and accesing another DataSet
Thank you for your reply and explanation, I think there is one issue with your method though, you said that i should make a join with the the key value pair A on v and the Edge set (u,v), this would work, however i not only need to access A[v] in one iteration but also A[u], so if i join on v that won't be possible Did i understand it correctly? -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778p9782.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.
Re: Looping over a DataSet and accesing another DataSet
Hello, In Flink, one often used way to access data from multiple DataSets at the same time is to perform a join (Flink actually calls equi-joins [1] just "join"), just as in the database world. For example, in the algorithm that you linked, you access A[u] for every edge (u,v). I assume that you have stored A in a DataSet of (index, value) pairs. You can achieve this access pattern by performing a join, and in the join condition you specify that the first endpoint of the edge should be equal to the index of A. This way, you get a DataSet where every record contains an edge (u,v) and also A[u], so you can do a map on this where the UDF of your map will get (u,v) and A[u]. Your algorithm also accesses A[v], which can be achieved by performing a second join that is similar to the first (using the result of the first). However, the updating of P will be more tricky to translate to Flink. I'm not sure I undersand the linked algorithm correctly: does every element of P contain a list, and the + means appending an element to a list? (in the line P[v] = P[u] + v) Best, Gábor [1] https://en.wikipedia.org/wiki/Join_(SQL)#Equi-join 2016-10-30 8:25 GMT+01:00 otherwise777 <wou...@onzichtbaar.net>: > Currently i'm trying to implement this algorithm [1] which requires me to > loop over one DataSet (the edges) and access another DataSet (the vertices), > for this loop i use a Mapping (i'm not sure if this is the correct way of > looping over a DataSet) but i don't know how to access the elements of > another DataSet while i'm looping over one. > > I know Gelly also has iterative support for these kind of things, but they > loop over the Vertices and not the Edges > > [1] http://prntscr.com/d0qeyd > > > > -- > View this message in context: > http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778.html > Sent from the Apache Flink User Mailing List archive. mailing list archive at > Nabble.com.
Looping over a DataSet and accesing another DataSet
Currently i'm trying to implement this algorithm [1] which requires me to loop over one DataSet (the edges) and access another DataSet (the vertices), for this loop i use a Mapping (i'm not sure if this is the correct way of looping over a DataSet) but i don't know how to access the elements of another DataSet while i'm looping over one. I know Gelly also has iterative support for these kind of things, but they loop over the Vertices and not the Edges [1] http://prntscr.com/d0qeyd -- View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Looping-over-a-DataSet-and-accesing-another-DataSet-tp9778.html Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.