Thanks for the comments. I'll consider making it clearer how it really works.

For the uptrends-this-week page I linked below, the algorithm is like this:

score = abs(h2 - h1) * log((h2 + 1) / (h1 + 1));
h2 = hits from now to 1 week ago
h1 = (hits from 1 week ago to 3 weeks ago) / 2

The other time spans work similarly. Sounds easy, but the dataset is
_huge_. 3 months of data is around 500GB uncompressed.
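
As a minimal sketch of the formula above in Python (the function and
argument names are made up for illustration and this is not the actual
Wikitrends code; I'm assuming the natural logarithm, though the base only
rescales the scores and doesn't change the ranking):

```python
import math

def uptrend_score(hits_last_week, hits_prior_two_weeks):
    """Score one page from raw hit counts, following the formula above."""
    h2 = hits_last_week             # hits from now to 1 week ago
    h1 = hits_prior_two_weeks / 2   # weekly average of the two weeks before that
    # The +1 terms keep the ratio finite when a page had no earlier hits.
    return abs(h2 - h1) * math.log((h2 + 1) / (h1 + 1))

if __name__ == "__main__":
    # Hypothetical page: 2 000 hits over the earlier two weeks, 20 000 last week.
    print(uptrend_score(20_000, 2_000))
```

Note that a page whose hits went down gets a negative score, because the
logarithm of a ratio below 1 is negative while the abs() factor stays
positive.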

The motivation is that I think using just score = h2 - h1 or score =
h2 / h1 will filter out many interesting increases. With h2 - h1
you'll miss pages going from, for example, 1 000 to 20 000, since
they'll drown in the daily fluctuations of the >100k hits pages. With
h2 / h1 it's the opposite; pages going from 100k to 500k will drown in
the daily fluctuations of the <1k pages. The logarithm is there to
attenuate newly created pages.
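
To illustrate the argument, here is a rough Python sketch that ranks a
few pages under each of the three scores. The page names and hit counts
are invented for the example, not taken from the real data, and I've
added +1 smoothing to the plain ratio only so that a new page with
h1 = 0 doesn't divide by zero:

```python
import math

# Invented (h1, h2) pairs: h1 = weekly baseline, h2 = hits in the last week.
pages = {
    "small_jump":  (1_000, 20_000),
    "big_jump":    (100_000, 500_000),
    "large_noise": (1_000_000, 1_030_000),
    "tiny_noise":  (100, 700),
    "new_page":    (0, 300),
}

def difference(h1, h2):
    return h2 - h1

def ratio(h1, h2):
    # +1 smoothing is an assumption made here to avoid division by zero.
    return (h2 + 1) / (h1 + 1)

def combined(h1, h2):
    return abs(h2 - h1) * math.log((h2 + 1) / (h1 + 1))

def ranking(score):
    """Page names sorted from highest to lowest score."""
    return sorted(pages, key=lambda name: score(*pages[name]), reverse=True)

for score in (difference, ratio, combined):
    print(f"{score.__name__:>10}: {ranking(score)}")
```

With these made-up numbers, the plain difference puts large_noise ahead
of small_jump, the plain ratio puts tiny_noise ahead of big_jump, and the
combined score ranks both real jumps first.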

On Mon, Mar 21, 2011 at 18:12, Pete Forsyth <[email protected]> wrote:
> Fascinating, Johan.
>
> Can you describe the ranking a bit? It's very interesting to see that the 
> Chernobyl disaster had a 388% increase, but I don't understand why it would 
> be in a top 10 list among others whose upticks were in the thousands and 
> millions of percentage points.
>
> I do see on your "About Wikitrends" page that "Ranking is a measurement based 
> on both absolute and relative increase of page views."
>
> I would suggest that having that statement (perhaps with a tiny bit more 
> detail) in the header for the Wikitrends page itself (above the ranked 
> articles) would be very helpful; and that on the "About" page, it would be 
> nice to have a more detailed explanation of how the articles are ranked.
>
> Regardless, a very interesting tool, highlighting a revealing collection of 
> articles people are reading.
>
> -Pete
>
>
> On Mar 21, 2011, at 10:04 AM, Johan Gunnarsson wrote:
>
>> Cool. See also:
>>
>> http://toolserver.org/~johang/wikitrends/english-uptrends-this-week.html
>>
>> It has more languages, longer time spans and a bit more sophisticated
>> ranking algorithm.
>>
>> On Mon, Mar 21, 2011 at 17:14, Magnus Manske
>> <[email protected]> wrote:
>>> Top 50 viewed articles per hour, now aggregated and browsable:
>>>
>>> http://toolserver.org/~magnus/toptopics.php
>>>
>>> Currently en.wp and de.wp only. Backfilled 5 days. Will be updated
>>> every hour automatically from now on. API coming soon-ish.
>>>
>>> Cheers,
>>> Magnus
>>>
>>> _______________________________________________
>>> Wikipedia-l mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/wikipedia-l
>>>
>>
>
> Pete Forsyth
> [email protected]
> 503-383-9454 mobile
>
>
