Since the 'aggregator' is being used for counting the number of updated vertices as well, I think there's no bug.
Can you provide your scenario as a unit test?

On Wed, Apr 24, 2013 at 7:47 PM, Steven van Beelen <[email protected]> wrote:

> I'm sorry to say so, but the problem still arises. Additionally, I found
> that 'aggregate(v, v.getValue())'
> is called twice as often as 'aggregate(v, lastValue, v.getValue())'.
> I cannot find in the AggregationRunner or GraphJobRunner why this
> is so.
> In a case where five vertices exist, aggregate(v, v.getValue()) will
> be called five times, directly followed by the finalizeAggregation() call.
> After this, five pairs of 'aggregate(v, v.getValue())' and
> 'aggregate(v, lastValue, v.getValue())' are called, as logically follows from the
> public void aggregateVertex(M lastValue, Vertex<V, E, M> v) in the
> AggregationRunner class.
>
> In addition, I could give you my code; maybe some flaw in there
> causes this problem?
>
>
> On Wed, Apr 24, 2013 at 10:43 AM, Edward J. Yoon <[email protected]> wrote:
>
>> Steven,
>>
>> Could you please try your application again with
>> http://people.apache.org/~edwardyoon/dist/test/ and give me feedback on
>> whether it works correctly as you expected?
>>
>> On Wed, Apr 24, 2013 at 4:53 PM, Edward J. Yoon <[email protected]> wrote:
>> > Thanks for your report. It could be a bug. I'll have a look at it now.
>> >
>> > On Wed, Apr 24, 2013 at 4:48 PM, Steven van Beelen <[email protected]> wrote:
>> >> I'm running version 0.6.1.
>> >> Looking at the results I found through testing,
>> >>
>> >> public void aggregateVertex(M lastValue, Vertex<V, E, M> v)
>> >>
>> >> doesn't seem to be the problem. Both 'aggregate(v, v.getValue())' and
>> >> 'aggregate(v, lastValue, v.getValue())'
>> >> are called correctly and work on the same values.
>> >>
>> >> However, when finalizing through 'finalizeAggregation()' in the
>> >> 'public void doMasterAggregation(MapWritable updatedCnt)' method,
>> >>
>> >> the value aggregated upon by 'aggregate(v, lastValue, v.getValue())'
>> >> is lost.
>> >> That is what happens for me.
>> >>
>> >> Could it be that I'm implementing the aggregate methods incorrectly?
>> >>
>> >> In the end, however, I cannot find a direct bug in TRUNK[1], although
>> >> it is not clear to me which part of the code was changed through
>> >> the ticket on JIRA.
>> >>
>> >>
>> >> On Wed, Apr 24, 2013 at 2:41 AM, Edward J. Yoon <[email protected]> wrote:
>> >>
>> >>> I found the ticket on JIRA -
>> >>> https://issues.apache.org/jira/browse/HAMA-659
>> >>>
>> >>> And it seems already fixed.
>> >>>
>> >>> What is your version of Hama here? And can you find some bug in TRUNK[1]?
>> >>>
>> >>> 1. http://svn.apache.org/repos/asf/hama/trunk/graph/src/main/java/org/apache/hama/graph/AggregationRunner.java
>> >>>
>> >>> On Tue, Apr 23, 2013 at 9:41 PM, Steven van Beelen <[email protected]> wrote:
>> >>> > Could anyone tell me if I'm correct concerning the possible problem I
>> >>> > posted and replied on in the previous two emails?
>> >>> >
>> >>> >
>> >>> > On Wed, Apr 17, 2013 at 5:08 PM, Steven van Beelen <[email protected]> wrote:
>> >>> >
>> >>> >> Additionally, I found this in the mail archives:
>> >>> >>
>> >>> >> http://mail-archives.apache.org/mod_mbox/hama-user/201210.mbox/%3CCAJ-=ys=W8F5W4aduV+=+yfsvh41xsa22-wnqqrkapadzd+q...@mail.gmail.com%3E
>> >>> >> This covers my point exactly. Is this still considered a bug,
>> >>> >> calling two different aggregate functions in a row?
>> >>> >>
>> >>> >>
>> >>> >> On Wed, Apr 17, 2013 at 2:35 PM, Steven van Beelen <[email protected]> wrote:
>> >>> >>
>> >>> >>> Hi Thomas,
>> >>> >>>
>> >>> >>> Then I guess I did not explain myself clearly.
>> >>> >>> What you describe is indeed how I expect the AverageAggregator to work,
>> >>> >>> but if I use the AverageAggregator in my own PageRank implementation it
>> >>> >>> does not return
>> >>> >>> the average of all absolute differences but just the average of the sum
>> >>> >>> of all values.
>> >>> >>>
>> >>> >>> The (very) small example graph I use has only five vertices, where the
>> >>> >>> sum of all vertex values is always 1.0.
>> >>> >>> When I use the AverageAggregator it will always return 0.2 when calling
>> >>> >>> the getLastAggregatedValue method.
>> >>> >>> It shouldn't do that, right?
>> >>> >>>
>> >>> >>>
>> >>> >>> On Wed, Apr 17, 2013 at 1:18 PM, Thomas Jungblut <[email protected]> wrote:
>> >>> >>>
>> >>> >>>> Hi Steven,
>> >>> >>>>
>> >>> >>>> the AverageAggregator is used to determine the average of all absolute
>> >>> >>>> differences between old pagerank and new pagerank for every vertex.
>> >>> >>>> This is documented like it should behave in the javadoc of the given
>> >>> >>>> classes and suffices to track if pagerank values have yet converged or
>> >>> >>>> not.
>> >>> >>>>
>> >>> >>>> What you describe is a perfectly valid way to track the pagerank
>> >>> >>>> difference throughout all supersteps. But this is not how (imho) the
>> >>> >>>> AverageAggregator should behave, so you have to write your own.
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> 2013/4/17 Steven van Beelen <[email protected]>
>> >>> >>>>
>> >>> >>>> > The values in my case are the DoubleWritable values each vertex has
>> >>> >>>> > and the aggregators aggregate on.
>> >>> >>>> > My tests showed that, when the aggregator was set to AverageAggregator,
>> >>> >>>> > the average of all the vertex values from the past compute step was
>> >>> >>>> > returned.
>> >>> >>>> > Actually, AverageAggregator should return the average difference of
>> >>> >>>> > all the old-new value pairs of every vertex instead of the mean.
>> >>> >>>> > The average difference is then used to check whether convergence is
>> >>> >>>> > reached, which is relevant for all tasks of course.
>> >>> >>>> >
>> >>> >>>> > Hence, the convergence point, for which the Aggregator is used, will
>> >>> >>>> > not be reached.
>> >>> >>>> > As a result, the algorithm will just run the maximum number
>> >>> >>>> > of iterations set (30 iterations in the PageRank example) in every
>> >>> >>>> > case.
>> >>> >>>> > I experienced the same with my own PageRank implementation.
>> >>> >>>> >
>> >>> >>>> > I think it has something to do with the finalizeAggregation step taken.
>> >>> >>>> > Next to that, both the 'aggregate(VERTEX vertex, M value)' and
>> >>> >>>> > 'aggregate(VERTEX vertex, M oldValue, M newValue)' methods are called
>> >>> >>>> > every time, where one would think only the second (with old/new values)
>> >>> >>>> > would suffice.
>> >>> >>>> > Because of this, the global variable 'absoluteDifference' in the
>> >>> >>>> > 'AbsDiffAggregator' class is overwritten/overruled by the first
>> >>> >>>> > aggregate.
>> >>> >>>> > Additionally, if one makes one's own Aggregation class in the same
>> >>> >>>> > fashion as AbsDiffAggregator and AverageAggregator, but leaves out the
>> >>> >>>> > 'aggregate(VERTEX vertex, M value)', my output turned out to be 0.0000
>> >>> >>>> > every time.
>> >>> >>>> >
>> >>> >>>> > I hope I made myself clear.
>> >>> >>>> > Regards
>> >>> >>>> >
>> >>> >>>> >
>> >>> >>>> > On Wed, Apr 17, 2013 at 11:57 AM, Edward J. Yoon <[email protected]> wrote:
>> >>> >>>> >
>> >>> >>>> > > Thanks for your report.
>> >>> >>>> > >
>> >>> >>>> > > What's the meaning of 'all the values'?
>> >>> >>>> > > Please give me more details about your problem.
>> >>> >>>> > >
>> >>> >>>> > > I didn't look at the 'dangling links & aggregators' part of the PageRank
>> >>> >>>> > > example closely, but I think there's no bug. Aggregators are just used
>> >>> >>>> > > for global communication. For example, finding the max value[1] can be
>> >>> >>>> > > done in only one iteration using MaxValueAggregator.
>> >>> >>>> > >
>> >>> >>>> > > 1. http://cdn.dejanseo.com.au/wp-content/uploads/2011/06/supersteps.png
>> >>> >>>> > >
>> >>> >>>> > > On Wed, Apr 17, 2013 at 6:27 PM, Steven van Beelen <[email protected]> wrote:
>> >>> >>>> > > > Hello,
>> >>> >>>> > > >
>> >>> >>>> > > > I'm creating my own PageRank in Hama for testing and I think I found
>> >>> >>>> > > > a problem with the AverageAggregator. I'm not sure if it is me or the
>> >>> >>>> > > > AverageAggregator class in general, but I believe it just returns the
>> >>> >>>> > > > mean of all the values instead of the average difference between the
>> >>> >>>> > > > old and new value as intended.
>> >>> >>>> > > >
>> >>> >>>> > > > For testing, I created my own AbsDiffAggregator and AverageAggregator
>> >>> >>>> > > > classes, using FloatWritable instead of DoubleWritable. The same
>> >>> >>>> > > > problem still occurred: I got a mean of all the values in the graph
>> >>> >>>> > > > instead of an average difference.
>> >>> >>>> > > >
>> >>> >>>> > > > Could someone tell me if I'm doing something wrong or what I should
>> >>> >>>> > > > provide to better explain my problem?
>> >>> >>>> > > >
>> >>> >>>> > > > Regards,
>> >>> >>>> > > > Steven van Beelen, Vrije Universiteit of Amsterdam
>> >>> >>>> > >
>> >>> >>>> > > --
>> >>> >>>> > > Best Regards, Edward J. Yoon
>> >>> >>>> > > @eddieyoon
>> >>> >>>> > >
>> >>> >>>> >
>> >>> >>>>
>> >>> >>>
>> >>> >>
>> >>>
>> >>> --
>> >>> Best Regards, Edward J. Yoon
>> >>> @eddieyoon
>> >>>
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> > @eddieyoon
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>

--
Best Regards, Edward J. Yoon
@eddieyoon
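
For readers following the thread, the difference between the two quantities under discussion can be sketched in plain Java. This is only an illustration under stated assumptions: the class AggregatorSketch and its methods are hypothetical and are not part of Hama, and mixedMean merely models the concern raised above (both aggregate overloads feeding a single accumulator); it does not claim to reproduce what AbsDiffAggregator or AverageAggregator actually do.

```java
// Hypothetical, self-contained sketch; none of these names are Hama classes.
class AggregatorSketch {

    // Plain mean of the current vertex values.
    // (An empty array yields NaN; callers are assumed to pass at least one value.)
    static double meanOfValues(double[] values) {
        double sum = 0.0;
        for (double v : values) {
            sum += v;
        }
        return sum / values.length;
    }

    // Mean of |old - new| over all vertices: the convergence measure
    // that the thread expects an "average absolute difference" to report.
    static double meanAbsDiff(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < oldValues.length; i++) {
            sum += Math.abs(oldValues[i] - newValues[i]);
        }
        return sum / oldValues.length;
    }

    // Assumed failure mode, for illustration only: a single accumulator
    // receives both the raw value and the absolute difference per vertex.
    static double mixedMean(double[] oldValues, double[] newValues) {
        if (oldValues.length != newValues.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double accumulator = 0.0;
        for (int i = 0; i < newValues.length; i++) {
            accumulator += newValues[i];                            // one-value call
            accumulator += Math.abs(oldValues[i] - newValues[i]);   // old/new call
        }
        return accumulator / newValues.length;
    }

    public static void main(String[] args) {
        // Five vertices whose ranks sum to 1.0, as in the example graph above.
        double[] oldRanks = {0.2, 0.2, 0.2, 0.2, 0.2};
        double[] newRanks = {0.1, 0.3, 0.2, 0.25, 0.15};
        System.out.printf("mean of values:     %.3f%n", meanOfValues(newRanks));
        System.out.printf("mean abs diff:      %.3f%n", meanAbsDiff(oldRanks, newRanks));
        System.out.printf("mixed accumulator:  %.3f%n", mixedMean(oldRanks, newRanks));
    }
}
```

With five ranks that always sum to 1.0, the plain mean stays at 0.2 however the individual ranks move, which matches the constant value reported in the thread. Only the mean absolute difference shrinks toward zero as the ranks settle, so it is the one that can serve as a convergence check.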
