JAllemandou added a comment.
I continued my analysis today, looking at the top-100 parsed user-agents from both the queries-with-referer subset and the queries-without-referer subset, over the month of September. See https://phabricator.wikimedia.org/P12933

- The queries-with-referer have a defined user-agent, meaning that the user-agent-parser we use to extract structured information from the user-agent line provides values for many of its fields. The top-100 user-agents cover more than 90% of requests made with a referer.
- The queries-without-referer have either an undefined or `Spider` user-agent, meaning that the user-agent line is either not parseable or is parsed as a bot. I manually inspected the user-agent lines and can confirm that most of them look like bots (particularly the ones making the most requests). The top-100 user-agents also cover more than 90% of requests made without a referer.

This confirms that the requests providing a referer, despite being a small subset, seem trustworthy. There is therefore nothing more to do for this task; the data is already available.

TASK DETAIL
  https://phabricator.wikimedia.org/T261841

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Zbyszko, JAllemandou
Cc: CBogen, JAllemandou, Aklapper, Gehel, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
