Thanks for the comments. I'll consider making it clearer how the ranking actually works.
For the uptrends-this-week page I linked below, the algorithm is:

  score = abs(h2 - h1) * log((h2 + 1) / (h1 + 1))

  h2 = hits during the last week (1 week ago until now)
  h1 = (hits from 3 weeks ago to 1 week ago) / 2

The other time spans work similarly. It sounds easy, but the dataset is
_huge_: 3 months of data is around 500GB uncompressed.

The motivation is that I think using just score = h2 - h1 or
score = h2 / h1 would filter out many interesting increases. With h2 - h1
you'll miss pages going from, for example, 1 000 to 20 000, since they'll
drown in the daily fluctuations of the >100k hits pages. With h2 / h1 it's
the opposite: pages going from 100k to 500k will drown in the daily
fluctuations of the <1k pages. The logarithm is there to attenuate newly
created pages.

On Mon, Mar 21, 2011 at 18:12, Pete Forsyth <[email protected]> wrote:
> Fascinating, Johan.
>
> Can you describe the ranking a bit? It's very interesting to see that the
> Chernobyl disaster had a 388% increase, but I don't understand why it would
> be in a top 10 list among others whose upticks were in the thousands and
> millions of percentage points.
>
> I do see on your "About Wikitrends" page that "Ranking is a measurement based
> on both absolute and relative increase of page views."
>
> I would suggest that having that statement (perhaps with a tiny bit more
> detail) in the header for the Wikitrends page itself (above the ranked
> articles) would be very helpful; and that on the "About" page, it would be
> nice to have a more detailed explanation of how the articles are ranked.
>
> Regardless, a very interesting tool, highlighting a revealing collection of
> articles people are reading.
>
> -Pete
>
>
> On Mar 21, 2011, at 10:04 AM, Johan Gunnarsson wrote:
>
>> Cool. See also:
>>
>> http://toolserver.org/~johang/wikitrends/english-uptrends-this-week.html
>>
>> It has more languages, longer time spans and a bit more sophisticated
>> ranking algorithm.
>>
>> On Mon, Mar 21, 2011 at 17:14, Magnus Manske
>> <[email protected]> wrote:
>>> Top 50 viewed articles per hour, now aggregated and browsable:
>>>
>>> http://toolserver.org/~magnus/toptopics.php
>>>
>>> Currently en.wp and de.wp only. Backfilled 5 days. Will be updated
>>> every hour automatically from now on. API coming soon-ish.
>>>
>>> Cheers,
>>> Magnus
>>>
>>> _______________________________________________
>>> Wikipedia-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>>>
>
> Pete Forsyth
> [email protected]
> 503-383-9454 mobile
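P.S. For anyone who wants to experiment with the ranking, the score formula given at the top of this message can be sketched in a few lines of Python. This is only an illustration, not the actual Wikitrends code: the function names and the three example pages are made up, and log is assumed to be the natural logarithm (another base would rescale every score by the same factor and leave the ranking unchanged).

```python
import math


def trend_score(h2, h1):
    """score = abs(h2 - h1) * log((h2 + 1) / (h1 + 1)).

    The +1 terms keep the ratio finite when a page had no hits at all.
    Natural log is an assumption; the formula only says "log".
    """
    return abs(h2 - h1) * math.log((h2 + 1) / (h1 + 1))


def weekly_trend_score(hits_last_week, hits_prior_two_weeks):
    """Hypothetical helper: halve the two-week baseline, as in the h1 definition."""
    h1 = hits_prior_two_weeks / 2
    return trend_score(hits_last_week, h1)


# Made-up (h1, h2) pairs, chosen to mirror the cases discussed in the message.
pages = {
    "mid-size page, real jump": (1_000, 20_000),
    "huge page, ordinary fluctuation": (500_000, 520_000),
    "tiny page, ordinary fluctuation": (10, 300),
}


def top_by(metric):
    """Return the name of the page that ranks first under metric(h1, h2)."""
    return max(pages, key=lambda name: metric(*pages[name]))


print("difference only:", top_by(lambda h1, h2: h2 - h1))
print("ratio only:     ", top_by(lambda h1, h2: h2 / h1))
print("combined score: ", top_by(lambda h1, h2: trend_score(h2, h1)))
```

With these made-up numbers, the plain difference puts the huge page first and the plain ratio puts the tiny page first, while the combined score puts the 1 000 to 20 000 jump on top. A page whose hits went down gets a negative score, because the logarithm of a ratio below 1 is negative.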
