Addshore created this task.
Herald added subscribers: Zppix, Aklapper.
TASK DESCRIPTION
When running a query on the webrequest table for 2016/05/10 on Wikidata,
looking at the page Special:RecentChangesLinked and grouping by agent type,
I get:
> count agent_type
> 299 user
> 123347 spider
The same timeframe through the pageview API shows very different numbers:
> {"items":[{"project":"wikidata","article":"Special:RecentChangesLinked","granularity":"daily","timestamp":"2016051000","access":"all-access","agent":"all-agents","views":91400}]}
> {"items":[{"project":"wikidata","article":"Special:RecentChangesLinked","granularity":"daily","timestamp":"2016051000","access":"all-access","agent":"spider","views":72}]}
> {"items":[{"project":"wikidata","article":"Special:RecentChangesLinked","granularity":"daily","timestamp":"2016051000","access":"all-access","agent":"bot","views":0}]}
> {"items":[{"project":"wikidata","article":"Special:RecentChangesLinked","granularity":"daily","timestamp":"2016051000","access":"all-access","agent":"user","views":91328}]}
As can be seen, there should be very few actual user hits here, but the API
reports tens of thousands.
https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article/wikidata/all-access/user/Special%3ARecentChangesLinked/daily/20160510/20160510
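For anyone wanting to reproduce the four responses quoted above, here is a
minimal sketch that rebuilds the per-article URL for each agent bucket. It only
constructs the URLs and makes no request. The helper name `per_article_url` and
its default arguments are illustrative assumptions; the URL layout is copied
from the link above and the agent values from the quoted responses.

```python
from urllib.parse import quote

# Base path copied from the URL in this task.
BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(agent, project="wikidata", access="all-access",
                    article="Special:RecentChangesLinked", day="20160510"):
    """Build the daily per-article URL for one agent bucket (hypothetical helper)."""
    # safe="" so that ":" is percent-encoded as %3A, matching the URL above.
    title = quote(article, safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/daily/{day}/{day}"


# Agent buckets as they appear in the quoted API responses.
urls = {agent: per_article_url(agent)
        for agent in ("all-agents", "user", "spider", "bot")}

for agent, url in urls.items():
    print(agent, url)
```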
It seems the data is correct in webrequest but is getting messed up when
being further processed for the page view API.
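To make the size of the mismatch concrete, here is a small sketch that compares
the agent split in the two sources, using only the figures quoted in this task.
The variable and function names are illustrative. Note the assumption that the
two sources may not count exactly the same set of requests, so the sketch
compares the user/spider split and not the absolute totals.

```python
# Figures copied from this task (2016-05-10, Special:RecentChangesLinked).
webrequest_counts = {"user": 299, "spider": 123347}
api_views = {"all-agents": 91400, "spider": 72, "bot": 0, "user": 91328}


def per_agent_sum(views):
    """Sum every per-agent bucket, skipping the 'all-agents' total."""
    return sum(v for agent, v in views.items() if agent != "all-agents")


# The API is internally consistent: the buckets add up to the total.
api_consistent = per_agent_sum(api_views) == api_views["all-agents"]

# How many times more "user" views the API reports than webrequest.
user_inflation = api_views["user"] / webrequest_counts["user"]

# Share of traffic tagged as spider in each source.
spider_share_webrequest = webrequest_counts["spider"] / sum(webrequest_counts.values())
spider_share_api = api_views["spider"] / api_views["all-agents"]

print(f"API buckets sum to total: {api_consistent}")
print(f"user views, API vs webrequest: {user_inflation:.1f}x")
print(f"spider share in webrequest: {spider_share_webrequest:.2%}")
print(f"spider share in API: {spider_share_api:.2%}")
```

The buckets in the API add up to its own total, so the problem looks like a
labelling issue (spider traffic ending up under "user") and not lost rows.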
The query used on webrequest is below (sorry if it looks terrible):
SELECT
count(*) as count, agent_type
FROM
webrequest
WHERE
year = 2016
AND month = 5
AND day = 10
AND uri_host = "www.wikidata.org"
AND http_status = 200
AND http_method = "GET"
AND uri_path LIKE "/wiki/Special:RecentChangesLinked%"
GROUP BY agent_type
ORDER BY count
LIMIT 999999;
TASK DETAIL
https://phabricator.wikimedia.org/T135164
To: Addshore
Cc: Lydia_Pintscher, JAllemandou, Aklapper, madhuvishy, Addshore, Zppix,
D3r1ck01, Izno, Wikidata-bugs, aude, Mbch331, jeremyb
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs