JAllemandou added a comment.
Heya - I'm sorry I completely missed the ping :S
Quick analysis:
spark.sql("""
  SELECT (http.request_headers['referer'] IS NOT NULL) AS defined_referer,
         count(1) AS c
  FROM event.wdqs_external_sparql_query
  WHERE year = 2020 AND month = 9
  GROUP BY (http.request_headers['referer'] IS NOT NULL)
  LIMIT 100
""").show(100, false)
+---------------+---------+
|defined_referer|c        |
+---------------+---------+
|false          |165201676|
|true           |5613278  |
+---------------+---------+
--> Only 3.3% of requests in September have a referer defined.
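For anyone wanting to double-check the 3.3% figure, here is a minimal sketch that recomputes it from the two counts in the table. It is plain Scala (it can be pasted into spark-shell but does not need Spark); the variable names are mine and purely illustrative.

```scala
// Counts copied from the result table above (September 2020).
val withoutReferer = 165201676L
val withReferer = 5613278L
val total = withoutReferer + withReferer

// Share of requests carrying a referer header, in percent.
val refererShare = 100.0 * withReferer / total
println(f"$refererShare%.1f%% of $total requests have a referer defined")
```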
Among those 3.3%, here are the top 10 referers:
spark.sql("""
  SELECT http.request_headers['referer'] AS referer, count(1) AS c
  FROM event.wdqs_external_sparql_query
  WHERE year = 2020 AND month = 9
    AND http.request_headers['referer'] IS NOT NULL
  GROUP BY http.request_headers['referer']
  ORDER BY c DESC LIMIT 10
""").show(10, false)
+-------------------------------------------------+-------+
|referer                                          |c      |
+-------------------------------------------------+-------+
|https://query.wikidata.org/                      |2730003|
|https://labs.minutelabs.io/Tree-of-Life-Explorer/|307426 |
|https://www.wikidata.org/                        |212431 |
|https://labs.minutelabs.io/                      |138757 |
|https://ru.wikipedia.org/                        |107558 |
|https://query.wikidata.org/embed.html            |102165 |
|https://wlmuk.toolforge.org/                     |96946  |
|https://maps.wikilovesmonuments.org/             |89894  |
|https://wikishootme.toolforge.org/               |87632  |
|https://en.wikipedia.org/                        |62147  |
+-------------------------------------------------+-------+
--> Going by the referer header over the whole month, requests with referer
https://query.wikidata.org/ represent 1.6% of all queries.
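As a sanity check on the 1.6%, a small sketch using the counts from the two tables. It also derives the share of the top referer among requests that do have a referer, which is not stated above but follows directly from the same numbers. Variable names are illustrative only, and https://query.wikidata.org/embed.html is a separate row and is not included here.

```scala
// Counts copied from the two result tables above (September 2020).
val withoutReferer = 165201676L
val withReferer = 5613278L
val topReferer = 2730003L // referer = https://query.wikidata.org/

val totalRequests = withoutReferer + withReferer

// Share of the top referer among all requests, and among requests with a referer.
val shareOfAll = 100.0 * topReferer / totalRequests
val shareOfReferred = 100.0 * topReferer / withReferer
println(f"$shareOfAll%.1f%% of all requests, $shareOfReferred%.1f%% of requests with a referer")
```

So roughly half of the requests that carry a referer come from that single referer.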
Only 3.3% of requests having a referer seems small. If someone with a better
gut feeling for that could chime in, that'd be great; otherwise I'm going to
do a more advanced user-agent analysis on the data and try to judge whether
the traffic looks organic or not.
TASK DETAIL
https://phabricator.wikimedia.org/T261841