JAllemandou added a comment.

  Heya - I'm sorry I completely missed the ping :S
  Quick analysis:
  
    spark.sql("""
      SELECT (http.request_headers['referer'] IS NOT NULL) AS defined_referer,
             count(1) AS c
      FROM event.wdqs_external_sparql_query
      WHERE year = 2020 AND month = 9
      GROUP BY (http.request_headers['referer'] IS NOT NULL)
      LIMIT 100
    """).show(100, false)
    +---------------+---------+
    |defined_referer|c        |
    +---------------+---------+
    |false          |165201676|
    |true           |5613278  |
    +---------------+---------+
  
  --> 3.3% of requests had a referer defined in September 2020
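  To double-check the 3.3% figure, here is the arithmetic as a small standalone
  Scala sketch (Scala to match the spark-shell snippets; the object and method
  names are made up for illustration). It only reuses the two counts above.

```scala
// Hypothetical sketch: recompute the share of September 2020 requests that
// carried a referer header, using only the counts from the output above.
object RefererShare {
  val withoutReferer: Long = 165201676L // defined_referer = false
  val withReferer: Long = 5613278L      // defined_referer = true

  // Fraction of `total` represented by `part`.
  def share(part: Long, total: Long): Double = part.toDouble / total.toDouble

  def main(args: Array[String]): Unit = {
    val total = withoutReferer + withReferer
    val pct = 100.0 * share(withReferer, total)
    println(f"$withReferer of $total requests had a referer ($pct%.2f%%)")
  }
}
```

  Running it prints 3.29%, which rounds to the 3.3% quoted above.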
  
  Among those 3.3%, here are the top 10 referers:
  
    spark.sql("""
      SELECT http.request_headers['referer'] AS referer, count(1) AS c
      FROM event.wdqs_external_sparql_query
      WHERE year = 2020 AND month = 9
        AND http.request_headers['referer'] IS NOT NULL
      GROUP BY http.request_headers['referer']
      ORDER BY c DESC
      LIMIT 10
    """).show(10, false)
    +-------------------------------------------------+-------+
    |referer                                          |c      |
    +-------------------------------------------------+-------+
    |https://query.wikidata.org/                      |2730003|
    |https://labs.minutelabs.io/Tree-of-Life-Explorer/|307426 |
    |https://www.wikidata.org/                        |212431 |
    |https://labs.minutelabs.io/                      |138757 |
    |https://ru.wikipedia.org/                        |107558 |
    |https://query.wikidata.org/embed.html            |102165 |
    |https://wlmuk.toolforge.org/                     |96946  |
    |https://maps.wikilovesmonuments.org/             |89894  |
    |https://wikishootme.toolforge.org/               |87632  |
    |https://en.wikipedia.org/                        |62147  |
    +-------------------------------------------------+-------+
  
  --> Based on referer headers over the month, queries with referer
  https://query.wikidata.org/ represent 1.6% of all queries (2,730,003 out of
  170,814,954).
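  The same arithmetic for the query UI referer, again as a hypothetical
  standalone Scala sketch using only counts copied from the two outputs above.
  It counts only the exact referer https://query.wikidata.org/, not
  https://query.wikidata.org/embed.html.

```scala
// Hypothetical sketch: share of September 2020 requests whose referer was
// exactly https://query.wikidata.org/ (embed.html is a separate row).
object QueryUiShare {
  val totalRequests: Long = 165201676L + 5613278L // both rows of the first query
  val withReferer: Long = 5613278L                // defined_referer = true
  val fromQueryUi: Long = 2730003L                // top row of the second query

  // Percentage of `whole` represented by `part`.
  def pct(part: Long, whole: Long): Double = 100.0 * part / whole

  def main(args: Array[String]): Unit = {
    val ofAll = pct(fromQueryUi, totalRequests)
    val ofReferred = pct(fromQueryUi, withReferer)
    println(f"query UI: $ofAll%.2f%% of all requests, $ofReferred%.2f%% of requests with a referer")
  }
}
```

  It reports about 1.60% of all requests, and also shows the query UI makes up
  about 48.6% of the requests that do carry a referer.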
  
  A 3.3% referer rate seems small. If someone with a better gut feeling for
  this could chime in, that'd be great; otherwise I'm gonna do a more advanced
  user-agent analysis on the data and try to judge whether the traffic looks
  organic or not.
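  One possible first step for the user-agent analysis, sketched in plain Scala
  under loud assumptions: it expects (userAgent, requestCount) pairs already
  aggregated from the table (extracting the user-agent header is not shown and
  is assumed), and it uses the share of traffic from the N busiest user agents
  as a rough, hypothetical proxy for non-organic traffic. The object and method
  names are made up, and this is not an established test for organic traffic.

```scala
// Hypothetical sketch: how concentrated is traffic among user agents?
// Input is assumed to be (userAgent, requestCount) pairs; a high share for a
// small n would hint that a few (possibly automated) clients dominate.
object UserAgentConcentration {
  def topNShare(rows: Seq[(String, Long)], n: Int): Double = {
    // Merge duplicate user-agent strings before ranking them.
    val perAgent: Map[String, Long] =
      rows.groupBy(_._1).map { case (ua, rs) => ua -> rs.map(_._2).sum }
    val total = perAgent.values.sum
    if (total == 0L) 0.0
    else {
      // Sum the n largest per-agent counts and divide by all traffic.
      val top = perAgent.values.toSeq.sortBy(c => -c).take(n).sum
      top.toDouble / total
    }
  }
}
```

  Even a very high share would only be a hint; the actual user-agent strings
  would still need a manual look.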

TASK DETAIL
  https://phabricator.wikimedia.org/T261841

To: Zbyszko, JAllemandou
_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs