JAllemandou added a comment.
Hi, quick questions on that:
Is this a recurring need, or would one-off queries be enough?
Also, what level of aggregation do you need? Is daily good enough?
Below is a Hive query that aggregates daily over what I thought were the
interesting dimensions.
DISCLAIMER: These queries need to scan a BIG volume of data (about 500 GB per
day), so let's discuss how to handle this if you need regular updates.
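If regular updates turn out to be needed, one option (a sketch, not an existing job) is to run the query once per day partition, so each run scans only that day's data instead of re-scanning history. The helper names `day_partitions`, `partition_predicate` and `estimated_scan_gb` are hypothetical, and the 500 GB/day figure is just the rough estimate from the disclaimer.

```python
from datetime import date, timedelta

# Rough per-day scan volume quoted in the disclaimer; an estimate, not a measured constant.
GB_PER_DAY = 500

def day_partitions(start, end):
    """Yield (year, month, day) tuples for every day from start to end, inclusive."""
    d = start
    while d <= end:
        yield d.year, d.month, d.day
        d += timedelta(days=1)

def partition_predicate(year, month, day):
    """Build the year/month/day partition filter used in the WHERE clause for one day."""
    return f"year = {year} AND month = {month} AND day = {day}"

def estimated_scan_gb(start, end):
    """Approximate volume scanned when (re)computing every day in [start, end]."""
    return GB_PER_DAY * sum(1 for _ in day_partitions(start, end))

# Print the filters for the first three days of November 2015,
# then estimate the cost of a full-month backfill.
for y, m, d in day_partitions(date(2015, 11, 1), date(2015, 11, 3)):
    print(partition_predicate(y, m, d))
print(estimated_scan_gb(date(2015, 11, 1), date(2015, 11, 30)), "GB")
```

Processing one partition per run keeps each job's cost bounded at roughly one day of data, whereas a one-off monthly backfill would scan on the order of 15,000 GB.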
SELECT
  -- Build a YYYY-MM-DD day key from the partition columns (pad with the string '0').
  CONCAT(LPAD(year, 4, '0'), '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS day,
  -- regexp_extract returns an empty string when the pattern does not match.
  regexp_extract(uri_path, '^/entity/.+(\\..+)$', 1) AS entity_format,
  regexp_extract(uri_path, '^/wiki/Special:EntityData/.+(\\..+)$', 1) AS special_entity_format,
  access_method,
  agent_type,
  http_status,
  COUNT(1) AS count
FROM wmf.webrequest
WHERE webrequest_source = 'text'
  AND year = 2015
  AND month = 11
  AND day = 16
  AND normalized_host.project_class = 'wikidata'
  AND uri_path RLIKE '^(/entity/|/wiki/Special:EntityData/).*$'
GROUP BY
  year, month, day,
  access_method,
  agent_type,
  http_status,
  regexp_extract(uri_path, '^/entity/.+(\\..+)$', 1),
  regexp_extract(uri_path, '^/wiki/Special:EntityData/.+(\\..+)$', 1)
ORDER BY
  day, entity_format, special_entity_format, access_method, agent_type, http_status
LIMIT 100000;
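To make the two `regexp_extract` columns easier to read, here is a small Python approximation of what they compute. It assumes Hive's behavior of returning an empty string when the pattern does not match; Python's `re` agrees with Java regexes on these simple patterns, but this is an illustration, not Hive itself. The capture group takes everything from the last dot to the end of the path, which is one plausible way a value like `.org/resource/` ends up in a "format" column; the path used for that case below is hypothetical, as are the helper names.

```python
import re

# Same patterns as in the query; HiveQL's doubled backslash is a single one in the regex.
ENTITY_RE = re.compile(r'^/entity/.+(\..+)$')
SPECIAL_RE = re.compile(r'^/wiki/Special:EntityData/.+(\..+)$')
FILTER_RE = re.compile(r'^(/entity/|/wiki/Special:EntityData/).*$')

def regexp_extract(text, pattern, index):
    """Rough stand-in for Hive's regexp_extract: the requested group, or '' if no match."""
    m = pattern.search(text)
    if m is None:
        return ''
    return m.group(index) or ''

def day_key(year, month, day):
    """Equivalent of the CONCAT/LPAD day key in the SELECT clause."""
    return f"{year:04d}-{month:02d}-{day:02d}"

def classify(uri_path):
    """Return (entity_format, special_entity_format), or None if the RLIKE filter drops the path."""
    if not FILTER_RE.search(uri_path):
        return None
    return (regexp_extract(uri_path, ENTITY_RE, 1),
            regexp_extract(uri_path, SPECIAL_RE, 1))

print(classify("/wiki/Special:EntityData/Q42.json"))
print(classify("/entity/Q42"))
print(classify("/entity/http://www.wikidata.org/resource/"))
```

Paths without any dot after the prefix (such as `/entity/Q42`) fall into the rows where both format columns are empty.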
| day | entity_format | special_entity_format | access_method | agent_type | http_status | count |
| --- | --- | --- | --- | --- | --- | --- |
| 2015-11-16 | | | desktop | spider | 200 | 2 |
| 2015-11-16 | | | desktop | spider | 301 | 345473 |
| 2015-11-16 | | | desktop | spider | 302 | 75 |
| 2015-11-16 | | | desktop | spider | 303 | 312186 |
| 2015-11-16 | | | desktop | spider | 400 | 21 |
| 2015-11-16 | | | desktop | spider | 503 | 2 |
| 2015-11-16 | | | desktop | user | 200 | 18 |
| 2015-11-16 | | | desktop | user | 301 | 1398 |
| 2015-11-16 | | | desktop | user | 302 | 38 |
| 2015-11-16 | | | desktop | user | 303 | 2714 |
| 2015-11-16 | | | desktop | user | 400 | 25 |
| 2015-11-16 | | | desktop | user | 429 | 2 |
| 2015-11-16 | | .json | desktop | spider | 200 | 719297 |
| 2015-11-16 | | .json | desktop | spider | 301 | 501004 |
| 2015-11-16 | | .json | desktop | spider | 304 | 10315 |
| 2015-11-16 | | .json | desktop | spider | 400 | 7 |
| 2015-11-16 | | .json | desktop | spider | 404 | 1777 |
| 2015-11-16 | | .json | desktop | spider | 503 | 4 |
| 2015-11-16 | | .json | desktop | user | 200 | 7675 |
| 2015-11-16 | | .json | desktop | user | 301 | 97 |
| 2015-11-16 | | .json | desktop | user | 302 | 10 |
| 2015-11-16 | | .json | desktop | user | 304 | 1017 |
| 2015-11-16 | | .json | desktop | user | 400 | 1 |
| 2015-11-16 | | .json | desktop | user | 404 | 2 |
| 2015-11-16 | | .json | desktop | user | 429 | 42 |
| 2015-11-16 | | .n3 | desktop | spider | 200 | 65982 |
| 2015-11-16 | | .n3 | desktop | spider | 301 | 1952 |
| 2015-11-16 | | .n3 | desktop | spider | 304 | 17417 |
| 2015-11-16 | | .n3 | desktop | spider | 404 | 13 |
| 2015-11-16 | | .n3 | desktop | spider | 503 | 1 |
| 2015-11-16 | | .n3 | desktop | user | 200 | 169 |
| 2015-11-16 | | .n3 | desktop | user | 302 | 12 |
| 2015-11-16 | | .nt | desktop | spider | 200 | 65717 |
| 2015-11-16 | | .nt | desktop | spider | 301 | 2045 |
| 2015-11-16 | | .nt | desktop | spider | 304 | 7927 |
| 2015-11-16 | | .nt | desktop | spider | 404 | 13 |
| 2015-11-16 | | .nt | desktop | spider | 503 | 1 |
| 2015-11-16 | | .nt | desktop | user | 200 | 203 |
| 2015-11-16 | | .nt | desktop | user | 302 | 15 |
| 2015-11-16 | | .org/resource/ | desktop | user | 400 | 4 |
| 2015-11-16 | | .php | desktop | spider | 200 | 65394 |
| 2015-11-16 | | .php | desktop | spider | 301 | 2048 |
| 2015-11-16 | | .php | desktop | spider | 304 | 7745 |
| 2015-11-16 | | .php | desktop | spider | 404 | 22 |
| 2015-11-16 | | .php | desktop | user | 200 | 168 |
| 2015-11-16 | | .php | desktop | user | 302 | 14 |
| 2015-11-16 | | .rdf | desktop | spider | 200 | 66840 |
| 2015-11-16 | | .rdf | desktop | spider | 301 | 2088 |
| 2015-11-16 | | .rdf | desktop | spider | 304 | 13343 |
| 2015-11-16 | | .rdf | desktop | spider | 404 | 17 |
| 2015-11-16 | | .rdf | desktop | spider | 503 | 4 |
| 2015-11-16 | | .rdf | desktop | user | 200 | 182 |
| 2015-11-16 | | .rdf | desktop | user | 302 | 10 |
| 2015-11-16 | | .ttl | desktop | spider | 200 | 867900 |
| 2015-11-16 | | .ttl | desktop | spider | 301 | 2069 |
| 2015-11-16 | | .ttl | desktop | spider | 304 | 17560 |
| 2015-11-16 | | .ttl | desktop | spider | 404 | 35 |
| 2015-11-16 | | .ttl | desktop | spider | 503 | 24 |
| 2015-11-16 | | .ttl | desktop | user | 200 | 181 |
| 2015-11-16 | | .ttl | desktop | user | 302 | 13 |
| 2015-11-16 | | .ttl | desktop | user | 304 | 1 |
| 2015-11-16 | .json | | desktop | spider | 301 | 9089 |
| 2015-11-16 | .json | | desktop | spider | 303 | 4551 |
| 2015-11-16 | .json | | desktop | user | 303 | 20 |
| 2015-11-16 | .org/resource/ | | desktop | user | 301 | 4 |
| 2015-11-16 | .org/resource/ | | desktop | user | 303 | 4 |
| 2015-11-16 | .ttl | | desktop | user | 303 | 2 |
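On the aggregation-level question: daily rows like these are easy to roll up further on the consumer side. As an illustration only, the sketch below takes the HTTP 200 rows for `Special:EntityData` from the table above, sums them per format, and computes the share served to spiders; the row selection and the `summarize` helper are choices made for this example.

```python
from collections import defaultdict

# (special_entity_format, agent_type, count) for the http_status = 200
# Special:EntityData rows on 2015-11-16, copied from the table above.
ROWS_200 = [
    (".json", "spider", 719297), (".json", "user", 7675),
    (".n3", "spider", 65982), (".n3", "user", 169),
    (".nt", "spider", 65717), (".nt", "user", 203),
    (".php", "spider", 65394), (".php", "user", 168),
    (".rdf", "spider", 66840), (".rdf", "user", 182),
    (".ttl", "spider", 867900), (".ttl", "user", 181),
]

def summarize(rows):
    """Return per-format totals and the overall fraction of requests made by spiders."""
    totals = defaultdict(int)
    spider = 0
    for fmt, agent, count in rows:
        totals[fmt] += count
        if agent == "spider":
            spider += count
    grand_total = sum(totals.values())
    return dict(totals), spider / grand_total

totals, spider_share = summarize(ROWS_200)
for fmt, n in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{fmt:6} {n:>8}")
print(f"spider share: {spider_share:.2%}")
```

For this single day, `.ttl` and `.json` dominate the successful responses, and more than 99% of them went to agents classified as spiders.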
TASK DETAIL
https://phabricator.wikimedia.org/T64874