Hi Maja, yep, your explanation makes sense. Clear now.
Paiki On 11 September 2012 16:09, Maja Kabiljo <[email protected]> wrote: > Hi Paolo, > > Glad to hear it works :-) > > The reason why you don't see the value you set with setAggregatedValue > right away is that we want to read aggregated values from previous > superstep and change them for next one. It goes the same with vertices > where you call aggregate to give values for next superstep and read the > values from previous. This is actually the part which wasn't working well > before - it wasn't possible to get aggregated value without changes that > vertices on the same worker made in current superstep. Hope this makes it > clear for you. > > Maja > > > On 9/11/12 12:45 PM, "Paolo Castagna" <[email protected]> wrote: > >>Hi, >>the green bar is back. :-) >> >>I made multiple mistakes in relation to the new aggregators but now I >>believe I grasped how they work. >> >>For those interested the PageRankVertex, PageRankMasterCompute and >>PageRankWorkerContext are here: >>https://github.com/castagna/jena-grande/blob/9dd50837d6a13c542cce5d77a69ce >>a071a91cee8/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankV >>ertex.java >>https://github.com/castagna/jena-grande/blob/9dd50837d6a13c542cce5d77a69ce >>a071a91cee8/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankM >>asterCompute.java >>https://github.com/castagna/jena-grande/blob/9dd50837d6a13c542cce5d77a69ce >>a071a91cee8/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRankW >>orkerContext.java >> >>There might be some further improvement left, but I'll try that another >>time. >> >>For example: >> >> registerPersistentAggregator("dangling-current", >>DoubleSumAggregator.class); >> registerPersistentAggregator("error-current", >>DoubleSumAggregator.class); >> >>Could probably be registerAggregator. >> >>I also noticed that within the compute() method if I call >>setAggregatedValue("name", ...) and getAggregatedValue("name") I don't >>seem to get the value set back. But the value is sent to the worker. >>This is not important, but it confuses me. >> >>I do agree with you, now the situation around aggregators is cleaner >>than before. >> >>Thank you for your help. >> >>Paolo >> >>PS: >>There is still a known failure in the tests, that is to show that the >>SimplePageRankVertex approach is "too simple", it does not give back a >>probability distribution (i.e. sum at the end is not 1.0) and it does >>not take into account dangling nodes properly. >>On the other hand, PageRankVertex produces same results as two other >>implementations: one serial, all in memory and another one using JUNG. >> >>On 11 September 2012 11:03, Maja Kabiljo <[email protected]> wrote: >>> Hi Paolo, >>> >>> You get null for aggregated value because aggregators haven't been >>> registered yet in the moment WorkerContext.preApllication() is called. >>>But >>> I think that shouldn't be a problem since you can set initial values for >>> aggregators in MasterCompute.initialize(). >>> >>> Please also note that you are not using the new aggregator api in the >>> proper way. Function getAggregatedValue will return the value of the >>> aggregator, not the aggregator object itself. It's not possible to set >>>the >>> value of the aggregators on workers (in methods from WorkerContext and >>> Vertex), because that would produce nondeterministic results. You >>> aggregate on workers and set values on master. >>> >>> As for persistent vs regular aggregator, value of regular aggregator is >>> being reset before each superstep, while the persistent isn't. For >>> example, if you have a persistent sum aggregator its value is going to >>>be >>> the sum of all values given to it from the beginning of application. If >>> you have regular sum aggregator the value is going to be just the sum of >>> values from previous superstep. >>> >>> I can write a small tutorial about aggregators if someone can tell me >>> where and how to do that. :-) I see that for people who were using >>> aggregators before these changes will be confusing, but I think that for >>> the ones who are starting with current state it will be much easier. >>> >>> Maja >>> >>> On 9/11/12 9:49 AM, "Paolo Castagna" <[email protected]> wrote: >>> >>>>Hi, >>>>this is how I run the PageRank implementation (mine takes into account >>>>dangling nodes and checks for convergence): >>>> >>>>Map<String,String> params = new HashMap<String,String>(); >>>>params.put(GiraphJob.WORKER_CONTEXT_CLASS, >>>>"org.apache.jena.grande.giraph.pagerank.PageRankVertexWorkerContext"); >>>>params.put(GiraphJob.MASTER_COMPUTE_CLASS, >>>>"org.apache.jena.grande.giraph.pagerank.PageRankVertexMasterCompute"); >>>> >>>>String[] data = getData ( filename ); >>>>Iterable<String> results = InternalVertexRunner.run( >>>> PageRankVertex.class, >>>> PageRankVertexInputFormat.class, >>>> PageRankVertexOutputFormat.class, >>>> params, >>>> data >>>>); >>>> >>>>This used to work, however I was registering aggregators in >>>>PageRankVertexWorkerContext (see below). >>>> >>>>Now, I am trying to do the same in PageRankVertexMasterCompute which >>>>extends DefaultMasterCompute and has only one method: >>>> >>>>@Override >>>>public void initialize() throws InstantiationException, >>>>IllegalAccessException { >>>> log.debug("initialize"); >>>> registerPersistentAggregator("dangling-current", >>>>DoubleSumAggregator.class); >>>> registerPersistentAggregator("error-current", >>>>DoubleSumAggregator.class); >>>> registerPersistentAggregator("pagerank-sum", >>>>DoubleSumAggregator.class); >>>> registerPersistentAggregator("vertices-count", >>>>LongSumAggregator.class); >>>>} >>>> >>>>I am not 100% sure about registerAggregator vs. >>>>registerPersistentAggregator. >>>> >>>>The initialize() method is now being called, I see this on the console: >>>>09:34:46 DEBUG PageRankVertexMasterCompute :: initialize >>>> >>>>In PageRankVertexWorkerContext which extends WorkerContex I override >>>>the preApplication() method: >>>> >>>>@SuppressWarnings("unchecked") >>>>@Override >>>>public void preApplication() throws InstantiationException, >>>>IllegalAccessException { >>>> log.debug("preApplication()"); >>>> >>>>System.out.println(((Aggregator<DoubleWritable>)getAggregatedValue("erro >>>>r- >>>>current"))); >>>> >>>>((Aggregator<DoubleWritable>)getAggregatedValue("error-current")).setAgg >>>>re >>>>gatedValue( >>>>new DoubleWritable( Double.MAX_VALUE ) ); >>>>} >>>> >>>>The getAggregatedValue("error-current") above is null and I do not >>>>understand why. >>>> >>>>Just to make things even more clear, this is how I used to run the >>>>PageRank implementation locally: >>>>https://github.com/castagna/jena-grande/blob/2fa8a1b879a464d8e3db84e78ed >>>>d5 >>>>39c70274e7c/src/main/java/org/apache/jena/grande/giraph/pagerank/RunPage >>>>Ra >>>>nkVertexLocally.java >>>>And this is the WorkerContext I used to have: >>>>https://github.com/castagna/jena-grande/blob/2fa8a1b879a464d8e3db84e78ed >>>>d5 >>>>39c70274e7c/src/main/java/org/apache/jena/grande/giraph/pagerank/PageRan >>>>kV >>>>ertexWorkerContext.java >>>> >>>>As you can see, it used to call registerAggregator(...) in the >>>>preApplication() method: >>>> >>>>@SuppressWarnings("unchecked") >>>>@Override >>>>public void preApplication() throws InstantiationException, >>>>IllegalAccessException { >>>> log.debug("preApplication()"); >>>> registerAggregator("dangling-current", SumAggregator.class); >>>> registerAggregator("error-current", SumAggregator.class); >>>> registerAggregator("pagerank-sum", SumAggregator.class); >>>> registerAggregator("vertices-count", LongSumAggregator.class); >>>> >>>> >>>>((Aggregator<DoubleWritable>)getAggregator("error-current")).setAggregat >>>>ed >>>>Value( >>>>new DoubleWritable( Double.MAX_VALUE ) ); >>>>} >>>> >>>>The registerAggregator() method in WorkerContext is gone and I am >>>>trying to achieve the same via MasterCompute now. >>>> >>>>Regards, >>>>Paolo >>>> >>>> >>>> >>>> >>>>On 11 September 2012 00:20, Paolo Castagna <[email protected]> >>>>wrote: >>>>> Hi Gianmarco, >>>>> good, that was one problem... but I am not yet back to the green bar. >>>>> >>>>> Here is how I am running it locally now: >>>>> >>>>> Map<String,String> params = new HashMap<String,String>(); >>>>> params.put(GiraphJob.WORKER_CONTEXT_CLASS, >>>>> "org.apache.jena.grande.giraph.pagerank.PageRankVertexWorkerContext"); >>>>> params.put(GiraphJob.MASTER_COMPUTE_CLASS, >>>>> >>>>>"org.apache.jena.grande.giraph.pagerank.SimplePageRankVertexMasterCompu >>>>>te >>>>>"); >>>>> >>>>> String[] data = getData ( filename ); >>>>> Iterable<String> results = InternalVertexRunner.run( >>>>> PageRankVertex.class, >>>>> PageRankVertexInputFormat.class, >>>>> PageRankVertexOutputFormat.class, >>>>> params, >>>>> data >>>>> ); >>>>> >>>>> However, I need to learn more about the MasterComput (and its relation >>>>> with WorkerContext). >>>>> >>>>> Paolo >>>>> >>>>> On 10 September 2012 22:08, Gianmarco De Francisci Morales >>>>> <[email protected]> wrote: >>>>>> Hi Paolo, >>>>>> >>>>>> Are you setting the MasterCompute class? >>>>>> You can do it with this option of bin/giraph >>>>>> -mc,--masterCompute <arg> MasterCompute class >>>>>> >>>>>> Cheers, >>>>>> -- >>>>>> Gianmarco >>>>>> >>>>>> >>>>>> >>>>>> On Mon, Sep 10, 2012 at 9:36 PM, Paolo Castagna >>>>>><[email protected]> >>>>>> wrote: >>>>>>> >>>>>>> Hi, >>>>>>> first and foremost, thanks for all the work and improvements on >>>>>>>Giraph. >>>>>>> I went away from computers for a while (personal reasons) and >>>>>>>changed >>>>>>> job, now I am back and playing with Giraph when I can. >>>>>>> >>>>>>> I updated my little examples (overall, it was easy and quick, here >>>>>>>the >>>>>>> changes [1]. Just in case others are in a similar situation and want >>>>>>> to have a look). >>>>>>> >>>>>>> I am not sure I get the 'new' aggregators and in particular how I >>>>>>>can >>>>>>> 'register' them. My tests failing confirm my non understanding! And >>>>>>> forgive me if I come here and ask such a simple question. >>>>>>> >>>>>>> Here is what I used to do [2]: >>>>>>> >>>>>>> public class PageRankVertexWorkerContext extends WorkerContext { >>>>>>> >>>>>>> private static final Logger log = >>>>>>> LoggerFactory.getLogger(PageRankVertexWorkerContext.class); >>>>>>> >>>>>>> public static double errorPrevious = Double.MAX_VALUE; >>>>>>> public static double danglingPrevious = 0d; >>>>>>> >>>>>>> @SuppressWarnings("unchecked") >>>>>>> @Override >>>>>>> public void preApplication() throws InstantiationException, >>>>>>> IllegalAccessException { >>>>>>> log.debug("preApplication()"); >>>>>>> registerAggregator("dangling-current", SumAggregator.class); >>>>>>> registerAggregator("error-current", SumAggregator.class); >>>>>>> registerAggregator("pagerank-sum", SumAggregator.class); >>>>>>> registerAggregator("vertices-count", LongSumAggregator.class); >>>>>>> >>>>>>> >>>>>>> >>>>>>>((Aggregator<DoubleWritable>)getAggregator("error-current")).setAggre >>>>>>>ga >>>>>>>tedValue( >>>>>>> new DoubleWritable( Double.MAX_VALUE ) ); >>>>>>> } >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>> >>>>>>> Here is what I am trying to do now [3]: >>>>>>> >>>>>>> public class PageRankVertexWorkerContext extends WorkerContext { >>>>>>> >>>>>>> private static final Logger log = >>>>>>> LoggerFactory.getLogger(PageRankVertexWorkerContext.class); >>>>>>> >>>>>>> public static double errorPrevious = Double.MAX_VALUE; >>>>>>> public static double danglingPrevious = 0d; >>>>>>> >>>>>>> // TODO: double check this... how is calling initialize()? >>>>>>> public static class SimplePageRankVertexMasterCompute extends >>>>>>> DefaultMasterCompute { >>>>>>> @Override >>>>>>> public void initialize() throws InstantiationException, >>>>>>> IllegalAccessException { >>>>>>> registerAggregator("dangling-current", >>>>>>>DoubleSumAggregator.class); >>>>>>> registerAggregator("error-current", >>>>>>>DoubleSumAggregator.class); >>>>>>> registerAggregator("pagerank-sum", DoubleSumAggregator.class); >>>>>>> registerAggregator("vertices-count", LongSumAggregator.class); >>>>>>> } >>>>>>> } >>>>>>> >>>>>>> [...] >>>>>>> >>>>>>> >>>>>>> I am not convinced someone is actually calling the initialize() >>>>>>>method >>>>>>> and there must be something I am missing (yesterday was late, after >>>>>>>a >>>>>>> long day at work). >>>>>>> >>>>>>> Anyway, is there a place/example where I can learn how to use >>>>>>> Aggregators with the new Giraph? >>>>>>> >>>>>>> Thanks again and it's good to see Giraph mailing list and JIRA >>>>>>>'brewing' >>>>>>> ;-) >>>>>>> >>>>>>> Paolo >>>>>>> >>>>>>> >>>>>>> [1] >>>>>>> >>>>>>>https://github.com/castagna/jena-grande/commit/3edc0a7780f5e7c25d3795 >>>>>>>6c >>>>>>>158d878b590858b5#src/main/java/org/apache/jena/grande/giraph/pagerank >>>>>>>/P >>>>>>>ageRankVertexWorkerContext.java >>>>>>> [2] >>>>>>> >>>>>>>https://github.com/castagna/jena-grande/blob/2fa8a1b879a464d8e3db84e7 >>>>>>>8e >>>>>>>dd539c70274e7c/src/main/java/org/apache/jena/grande/giraph/pagerank/P >>>>>>>ag >>>>>>>eRankVertexWorkerContext.java >>>>>>> [3] >>>>>>> >>>>>>>https://github.com/castagna/jena-grande/blob/3edc0a7780f5e7c25d37956c >>>>>>>15 >>>>>>>8d878b590858b5/src/main/java/org/apache/jena/grande/giraph/pagerank/P >>>>>>>ag >>>>>>>eRankVertexWorkerContext.java >>>>>> >>>>>> >>> >
