In aggregateMessagesWithActiveSet, Spark still has to read all edges. That means a fixed cost that scales with graph size is unavoidable in each Pregel-like iteration.
But what if I have to run nearly 100 iterations, and in the last 50 of them fewer than 0.1% of the vertices need to be updated? That fixed cost makes the job finish in an unacceptable amount of time.

Alcaid

2015-04-08 1:41 GMT+08:00 Ankur Dave <ankurd...@gmail.com>:
> We thought it would be better to simplify the interface, since the
> active set is a performance optimization but the result is identical
> to calling subgraph before aggregateMessages.
>
> The active set option is still there in the package-private method
> aggregateMessagesWithActiveSet. You can actually access it publicly
> via GraphImpl, though the API isn't guaranteed to be stable:
>
> graph.asInstanceOf[GraphImpl[VD,ED]].aggregateMessagesWithActiveSet(...)
>
> Ankur
>
> On Tue, Apr 7, 2015 at 2:56 AM, James <alcaid1...@gmail.com> wrote:
> > Hello,
> >
> > The old api of GraphX "mapReduceTriplets" has an optional parameter
> > "activeSetOpt: Option[(VertexRDD[_]" that limits the input of sendMessage.
> >
> > However, in the new api "aggregateMessages" I could not find this option.
> > Why is it not offered any more?
> >
> > Alcaid
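For reference, the workaround Ankur describes might look like the sketch below. This assumes Spark 1.x GraphX internals: the method is package-private, so the cast to GraphImpl and the exact parameter list are unstable implementation details, not a supported API. The function name `degreesOfActive` and the choice of message type are mine, just for illustration.

```scala
import org.apache.spark.graphx._
import org.apache.spark.graphx.impl.GraphImpl

// Sketch only: aggregateMessagesWithActiveSet is package-private, so this
// relies on GraphX internals that may change between releases.
// `graph` is an existing Graph[Double, Double]; `active` holds the vertices
// that still need updating in the current iteration.
def degreesOfActive(
    graph: Graph[Double, Double],
    active: VertexRDD[Double]): VertexRDD[Int] = {
  graph.asInstanceOf[GraphImpl[Double, Double]]
    .aggregateMessagesWithActiveSet[Int](
      ctx => ctx.sendToDst(1),              // one message per in-edge
      _ + _,                                // merge messages by summing
      TripletFields.None,                   // no vertex/edge attributes needed
      Some((active, EdgeDirection.Either))  // scan only edges touching active vertices
    )
}
```

Note that this only skips the sendMsg calls for inactive edges; as discussed above, the edge partitions themselves are still scanned, so the per-iteration fixed cost remains.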