Re: [Wikitech-l] [Analytics] Wikiscan statistics tool for Wikimedia projects

2017-07-31 Thread Akeron
Thanks Erik, it looks interesting. Currently I am able to maintain a full
dataset for users but not for pages on big wikis, so it may be a good
alternative for displaying the approximate number of edited pages over a
month or more.

2017-07-31 17:22 GMT+02:00 Erik Bernhardson :

> On Mon, Jul 31, 2017 at 7:18 AM, Akeron  wrote:
>
>> Hi Igal,
>> All suggestions are welcome :)
>> Supporting this feature shouldn't be too difficult in theory because the
>> tool already works with this kind of aggregation (months are built from
>> days, years from months...). The main problem is scalability for stats
>> that require uniqueness, like the number of users or the number of edits
>> *per page*. That's why yearly stats can currently be disabled on some big
>> wikis. So it would be feasible, but with a limit on the number of edits
>> in the range (around 3-5 million), and it would be very slow to load with
>> lots of edits.
>>
>
> One way to handle the scalability problem is to use HyperLogLog counters.
> HyperLogLog is an approximate algorithm for counting distinct items: you
> can store one counter per day and then merge the counters to get
> weekly/monthly/etc. figures, avoiding the cost of redoing the calculation
> over something like an entire year just for the one stat. Of course,
> because these counts are approximate, they may not be exactly what you are
> looking for; just an idea.
>
>
>>
>> Akeron
>>
>> 2017-07-31 14:29 GMT+02:00 יגאל חיטרון :
>>
>>> Hello. It's amazing, thank you very much!
>>> Could I suggest one more feature, please? With it, the tool will be
>>> perfect. I'm talking about aggregation. Any kind of historical statistics
>>> for some day, month or year could also be shown for a range of time. For
>>> example, if we have monthly statistics, we could set the From field to
>>> Jan 2008 and the To field to May 2011, and get the aggregated numbers for
>>> this range. Is it possible?
>>> Thank you very much again,
>>> Igal (User:IKhitron)
>>>
>>> On Jul 30, 2017 22:18, "Pine W"  wrote:
>>>
>>> > Wikiscan is an interesting tool for statistics fans. I suggest briefly
>>> > reading this IEG page, then playing with the tool on
>>> > https://wikiscan.org/
>>> >
>>> > Pine
>>> > ___
>>> > Wikitech-l mailing list
>>> > Wikitech-l@lists.wikimedia.org
>>> > https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>>
>>
>>
>> ___
>> Analytics mailing list
>> analyt...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>

Re: [Wikitech-l] [Analytics] Wikiscan statistics tool for Wikimedia projects

2017-07-31 Thread Erik Bernhardson
On Mon, Jul 31, 2017 at 7:18 AM, Akeron  wrote:

> Hi Igal,
> All suggestions are welcome :)
> Supporting this feature shouldn't be too difficult in theory because the
> tool already works with this kind of aggregation (months are built from
> days, years from months...). The main problem is scalability for stats
> that require uniqueness, like the number of users or the number of edits
> *per page*. That's why yearly stats can currently be disabled on some big
> wikis. So it would be feasible, but with a limit on the number of edits
> in the range (around 3-5 million), and it would be very slow to load with
> lots of edits.
>

One way to handle the scalability problem is to use HyperLogLog counters.
HyperLogLog is an approximate algorithm for counting distinct items: you
can store one counter per day and then merge the counters to get
weekly/monthly/etc. figures, avoiding the cost of redoing the calculation
over something like an entire year just for the one stat. Of course,
because these counts are approximate, they may not be exactly what you are
looking for; just an idea.
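
To make the merge idea concrete, here is a minimal sketch in Python. It is
not Wikiscan's code and not a production implementation, just an
illustration of why daily counters can be combined cheaply. The class name
HyperLogLog, the helper daily_counter, the choice of 2**12 registers and the
made-up page IDs are all assumptions for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch: 2**p registers, each storing a max rank."""

    def __init__(self, p=12):
        if not 7 <= p <= 16:
            raise ValueError("p must be between 7 and 16 in this sketch")
        self.p = p
        self.m = 1 << p                # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash of the item (first 8 bytes of SHA-1).
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)    # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def merge(self, other):
        # The union of two counters is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("cannot merge counters with different precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b)
                            for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)    # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


def daily_counter(page_ids):
    """Build one counter from the (hypothetical) page IDs edited in a day."""
    hll = HyperLogLog()
    for page_id in page_ids:
        hll.add(str(page_id))
    return hll


# Made-up data: 30 days, 1,000 edited pages per day, each day sharing 500
# pages with the next one, so 15,500 distinct pages over the whole month.
days = [daily_counter(range(d * 500, d * 500 + 1000)) for d in range(30)]

month = days[0]
for day in days[1:]:
    month = month.merge(day)

print("sum of daily estimates:", round(sum(d.count() for d in days)))
print("merged monthly estimate:", round(month.count()))
```

Summing the daily estimates counts overlapping pages several times, while
the merged counter estimates the distinct pages for the month. Each counter
here is a fixed 4,096 registers no matter how many edits it has seen, and
the expected error at this size is on the order of 1-2%.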


>
> Akeron
>
> 2017-07-31 14:29 GMT+02:00 יגאל חיטרון :
>
>> Hello. It's amazing, thank you very much!
>> Could I suggest one more feature, please? With it, the tool will be
>> perfect. I'm talking about aggregation. Any kind of historical statistics
>> for some day, month or year could also be shown for a range of time. For
>> example, if we have monthly statistics, we could set the From field to
>> Jan 2008 and the To field to May 2011, and get the aggregated numbers for
>> this range. Is it possible?
>> Thank you very much again,
>> Igal (User:IKhitron)
>>
>> On Jul 30, 2017 22:18, "Pine W"  wrote:
>>
>> > Wikiscan is an interesting tool for statistics fans. I suggest briefly
>> > reading this IEG page, then playing with the tool on
>> > https://wikiscan.org/
>> >
>> > Pine
>
>
>