Hi Vasia, sorry that I should have read the archive before (it's already been posted in FLINK-1526, though with an ugly format). Now everything's clear and I think this thread should be closed here.
Thanks. @Vasia @Greg Best, Xingcan On Tue, Feb 14, 2017 at 3:55 PM, Vasiliki Kalavri <vasilikikala...@gmail.com > wrote: > Hi Xingcan, > > that's my bad, I was thinking of scatter-gather iterations in my previous > reply. You're right, in VertexCentricIteration a vertex is only active in > the next superstep if it has received at least one message in the current > superstep. Updating its value does not impact the activation. This is > intentional in the vertex-centric model. > > I agree that the current design of the iterative models is restrictive and > doesn't allow for the expression of complex iterative algorithms that > require updating edges or defining phases. We have discussed this before, > e.g. in [1]. The outcome of that discussion was that we should use for-loop > iterations for such cases, as the closed-loop iteration operators of Flink > might not provide the necessary flexibility. As you will see in the thread > though, that proposal didn't work out, as efficiently supporting for-loops > in Flink is not possible right now. > > -Vasia. > > [1]: http://apache-flink-mailing-list-archive.1008284. > n3.nabble.com/DISCUSS-Gelly-iteration-abstractions-td3949.html > > On 14 February 2017 at 08:10, Xingcan Cui <xingc...@gmail.com> wrote: > >> Hi Greg, >> >> I also found that in VertexCentricIteration.java, the message set is >> taken as the workset while the vertex set is taken as the delta for >> solution set. By doing like that, the setNewVertex method will not actually >> active a vertex. In other words, if no message is generated (the workset is >> empty) the "pact.runtime.workset-empty-aggregator" will judge >> convergence of the delta iteration and then the iteration just terminates. >> Is this a bug? >> >> Best, >> Xingcan >> >> >> On Mon, Feb 13, 2017 at 5:24 PM, Xingcan Cui <xingc...@gmail.com> wrote: >> >>> Hi Greg, >>> >>> Thanks for your attention. >>> >>> It takes me a little time to read the old PR on FLINK-1885. Though >>> the VertexCentricIteration, as well as its related classes, has been >>> refactored, I understand what Markus want to achieve. >>> >>> I am not sure if using a bulk iteration instead of a delta one could >>> eliminate the "out of memory" problem. Except for that, I think the "auto >>> update" has nothing to do with the bulk mode. Considering the compatible >>> guarantee, here is my suggestions to improve gelly's iteration API: >>> >>> 1) Add an "autoHalt" flag to the ComputeFunction. >>> >>> 2) If the flag is set true (default), apply the current mechanism . >>> >>> 3) If the flag is set false, call out.collect() to update the vertex >>> value whether the setNewVertexValue() method is called or not, unless the >>> user explicitly call a (new added) voteToHalt() method in the >>> ComputeFunction. >>> >>> By adding these, users can decide when to halt a vertex themselves. What >>> do you think? >>> >>> As for the "update edge values during vertex iterations" problem, I >>> think it needs a redesign for the gelly framework (Maybe merge the vertices >>> and edges into a single data set? Or just change the iterations' >>> implementation? I can't think it clearly now.), so that's it for now. >>> Besides, I don't think there will be someone who really would love to write >>> a graph algorithm with Flink native operators and that's why gelly is >>> designed, isn't it? >>> >>> Best, >>> Xingcan >>> >>> On Fri, Feb 10, 2017 at 10:31 PM, Greg Hogan <c...@greghogan.com> wrote: >>> >>>> Hi Xingcan, >>>> >>>> FLINK-1885 looked into adding a bulk mode to Gelly's iterative models. >>>> >>>> As an alternative you could implement your algorithm with Flink >>>> operators and a bulk iteration. Most of the Gelly library is written with >>>> native operators. >>>> >>>> Greg >>>> >>>> On Fri, Feb 10, 2017 at 5:02 AM, Xingcan Cui <xingc...@gmail.com> >>>> wrote: >>>> >>>>> Hi Vasia, >>>>> >>>>> b) As I said, when some vertices finished their work in current phase, >>>>> they have nothing to do (no value updates, no message received, just like >>>>> slept) but to wait for other vertices that have not finished (the current >>>>> phase) yet. After that in the next phase, all the vertices should go back >>>>> to work again and if there are some vertices become inactive in last >>>>> phase, >>>>> it could be hard to reactive them again by message since we even don't >>>>> know >>>>> which vertices to send to. The only solution is to keep all vertices >>>>> active, whether by updating vertices values in each super step or sending >>>>> heartbeat messages to vertices themselves (which will bring a lot of extra >>>>> work to the MessageCombiner). >>>>> >>>>> c) I know it's not elegant or even an awful idea to store the edge >>>>> info into vertex values. However, we can not change edge values or >>>>> maintain >>>>> states (even a pick or unpick mark) in edges during a vertex-centric >>>>> iteration. Then what can we do if an algorithm really need that? >>>>> >>>>> Thanks for your patience. >>>>> >>>>> Best, >>>>> Xingcan >>>>> >>>>> On Fri, Feb 10, 2017 at 4:50 PM, Vasiliki Kalavri < >>>>> vasilikikala...@gmail.com> wrote: >>>>> >>>>>> Hi Xingcan, >>>>>> >>>>>> On 9 February 2017 at 18:16, Xingcan Cui <xingc...@gmail.com> wrote: >>>>>> >>>>>>> Hi Vasia, >>>>>>> >>>>>>> thanks for your reply. It helped a lot and I got some new ideas. >>>>>>> >>>>>>> a) As you said, I did use the getPreviousIterationAggregate() >>>>>>> method in preSuperstep() of the next superstep. >>>>>>> However, if the (only?) global (aggregate) results can not be guaranteed >>>>>>> to be consistency, what should we >>>>>>> do with the postSuperstep() method? >>>>>>> >>>>>> >>>>>> The postSuperstep() method is analogous to the close() method in a >>>>>> RichFunction, which is typically used for cleanup. >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> b) Though we can active vertices by update method or messages, IMO, >>>>>>> it may be more proper for users >>>>>>> themselves to decide when to halt a vertex's iteration. Considering >>>>>>> a complex algorithm that contains different >>>>>>> phases inside a vertex-centric iteration. Before moving to the next >>>>>>> phase (that should be synchronized), >>>>>>> there may be some vertices that already finished their work in >>>>>>> current phase and they just wait for others. >>>>>>> Users may choose the finished vertices to idle until the next phase, >>>>>>> but rather than to halt them. >>>>>>> Can we consider adding the voteToHalt() method and some internal >>>>>>> variables to the Vertex/Edge class >>>>>>> (or just create an "advanced" version of them) to make the halting >>>>>>> more controllable? >>>>>>> >>>>>> >>>>>> >>>>>> I suppose adding a voteToHalt() method is possible, but I'm not sure >>>>>> I see how that would make halting more controllable. If a vertex hasn't >>>>>> changed value or hasn't received a message, it has no work to do in the >>>>>> next iteration, so why keep it active? If in a later superstep, a >>>>>> previously inactive vertex receives a message, it will become active >>>>>> again. >>>>>> Is this what you're looking for or am I missing something? >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> c) Sorry that I didn't make it clear before. Here the initialization >>>>>>> means a "global" one that executes once >>>>>>> before the iteration. For example, users may want to initialize the >>>>>>> vertices' values by their adjacent edges >>>>>>> before the iteration starts. Maybe we can add an extra >>>>>>> coGroupFunction to the configuration parameters >>>>>>> and apply it before the iteration? >>>>>>> >>>>>> >>>>>> >>>>>> You can initialize the graph by using any Gelly transformation >>>>>> methods before starting the iteration, e.g. mapVertices, mapEdges, >>>>>> reduceOnEdges, etc. >>>>>> Btw, a vertex can iterate over its edges inside the ComputeFunction >>>>>> using the getEdges() method. Initializing the vertex values with >>>>>> neighboring edges might not be a good idea if you have vertices with high >>>>>> degrees. >>>>>> >>>>>> >>>>>> Cheers, >>>>>> -Vasia. >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> What do you think? >>>>>>> >>>>>>> (BTW, I started a PR on FLINK-1526(MST Lib&Example). Considering the >>>>>>> complexity, the example is not >>>>>>> provided.) >>>>>>> >>>>>>> Really appreciate for all your help. >>>>>>> >>>>>>> Best, >>>>>>> Xingcan >>>>>>> >>>>>>> On Thu, Feb 9, 2017 at 5:36 PM, Vasiliki Kalavri < >>>>>>> vasilikikala...@gmail.com> wrote: >>>>>>> >>>>>>>> Hi Xingcan, >>>>>>>> >>>>>>>> On 7 February 2017 at 10:10, Xingcan Cui <xingc...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I got some question about the vertex-centric iteration in Gelly. >>>>>>>>> >>>>>>>>> a) It seems the postSuperstep method is called before the >>>>>>>>> superstep barrier (I got different aggregate values of the same >>>>>>>>> superstep >>>>>>>>> in this method). Is this a bug? Or the design is just like that? >>>>>>>>> >>>>>>>> >>>>>>>> The postSuperstep() method is called inside the close() method of >>>>>>>> a RichCoGroupFunction that wraps the ComputeFunction. The close() >>>>>>>> method It is called after the last call to the coGroup() after >>>>>>>> each iteration superstep. >>>>>>>> The aggregate values are not guaranteed to be consistent during the >>>>>>>> same superstep when they are computed. To retrieve an aggregate value >>>>>>>> for >>>>>>>> superstep i, you should use the getPreviousIterationAggregate() >>>>>>>> method in superstep i+1. >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> b) There is not setHalt method for vertices. When no message >>>>>>>>> received, a vertex just quit the next iteration. Should I manually >>>>>>>>> send >>>>>>>>> messages (like heartbeat) to keep the vertices active? >>>>>>>>> >>>>>>>> >>>>>>>> That's because vertex halting is implicitly controlled by the >>>>>>>> underlying delta iterations of Flink. A vertex will remain active as >>>>>>>> long >>>>>>>> as it receives a message or it updates its value, otherwise it will >>>>>>>> become >>>>>>>> inactive. The documentation on Gelly iterations [1] and DataSet >>>>>>>> iterations >>>>>>>> [2] might be helpful. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> c) I think we may need an initialization method in the >>>>>>>>> ComputeFunction. >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> There exists a preSuperstep() method for initialization. This one >>>>>>>> will be executed once per superstep before the compute function is >>>>>>>> invoked >>>>>>>> for every vertex. Would this work for you? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> Any opinions? Thanks. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Xingcan >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> I hope this helps, >>>>>>>> -Vasia. >>>>>>>> >>>>>>>> >>>>>>>> [1]: https://ci.apache.org/projects/flink/flink-docs-release-1.2/ >>>>>>>> dev/libs/gelly/iterative_graph_processing.html#vertex-centri >>>>>>>> c-iterations >>>>>>>> [2]: https://ci.apache.org/projects/flink/flink-docs-release-1.2/ >>>>>>>> dev/batch/iterations.html >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>> >>>> >>> >> >