https://bugzilla.wikimedia.org/show_bug.cgi?id=67411

[email protected] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Whiteboard|                            |u=Community
                   |                            |c=General/Unknown p=0
                   |                            |s=2014-06-26

--- Comment #6 from [email protected] ---
(In reply to Andre Klapper from comment #1)
> So is there an indicator that the issue is with Wikimedia's data, and not
> with stats.grok.se processing it?

stats.grok.se typically reads our files without problems.
webstatscollector (which produces those files) is hairier. It has
broken before, especially around non-Latin characters.

I'll check the files.

However, the way things look ... I am not sure if something is
broken. Given that we're measuring against edits and how
webstatscollector is filtering, everything might just be fine^Wwithin
expectations.

Checking it nonetheless.



(In reply to Oliver Keyes from comment #2)
> *Direct comparisons with edit events isn't possible because multiple edit
> events can be launched from a single pageview, [...]

Right. And bots, for example, need not do a pageview (in the
webstatscollector sense). They can edit right away.



> and edit events themselves
> are excluded from the counter (wrong MIME type)

webstatscollector does not care about MIME types; it counts requests
regardless of them.

However, webstatscollector cares about "/wiki/" being in the URL.
Edits are typically made through the API or directly through
/w/index.php. Neither has "/wiki/" in the URL, so neither gets counted
by webstatscollector.
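
To illustrate the "/wiki/" rule, here is a minimal Python sketch. It is
not webstatscollector's actual code: the function name and the example
URLs are made up for illustration, and the real filter may apply
further conditions that are not covered here.

```python
def counts_as_pageview(url: str) -> bool:
    """Hypothetical approximation of the filter described above:
    a request is counted only if "/wiki/" appears in its URL."""
    return "/wiki/" in url


# Illustrative URLs (assumptions, not taken from real logs).
examples = [
    "http://en.wikipedia.org/wiki/Main_Page",                        # article view
    "http://en.wikipedia.org/w/index.php?title=Foo&action=submit",   # edit via index.php
    "http://en.wikipedia.org/w/api.php?action=edit&title=Foo",       # edit via the API
]

for url in examples:
    print(counts_as_pageview(url), url)
```

Under this approximation only the first URL would be counted, which is
why edit counts and pageview counts cannot be compared directly.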



(In reply to Bawolff (Brian Wolff) from comment #3)
> Aren't these stats based on sampling as well?

It's one of the few parts that is unsampled :-)
stats.grok.se is driven by

  http://dumps.wikimedia.org/other/pagecounts-raw/

which is the output of webstatscollector, which consumes the full
unsampled firehose (well ... there is some packet loss).
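
For anyone who wants to check the files by hand, here is a rough Python
sketch for reading one line of an hourly pagecounts-raw file. It
assumes the commonly documented layout of four space-separated fields
(project code, percent-encoded page title, request count, bytes
transferred). The names PageCount and parse_line are made up for this
example, and the sample line is invented.

```python
from typing import NamedTuple
from urllib.parse import unquote


class PageCount(NamedTuple):
    project: str     # e.g. "en" or "de" (assumed field meaning)
    title: str       # page title, percent-decoded
    views: int       # number of requests counted in that hour
    bytes_sent: int  # total bytes transferred


def parse_line(line: str) -> PageCount:
    """Parse one line, assuming exactly four whitespace-separated fields.

    Raises ValueError if the line does not have that shape.
    """
    project, raw_title, views, size = line.split()
    return PageCount(project, unquote(raw_title), int(views), int(size))


# Invented sample line with a non-Latin-1-safe, percent-encoded title.
record = parse_line("de %C3%9Cbersicht 3 12345\n")
print(record.title, record.views)
```

Decoding the title is where non-Latin characters tend to matter, so
comparing the raw and decoded titles is a reasonable first check when
counts for a page look off.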

-- 
You are receiving this mail because:
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
