AndrewTavis_WMDE added a comment.

  The following query, which checks for automated traffic via IsSpiderUDF 
<https://github.com/wikimedia/analytics-refinery-source/blob/master/refinery-hive/src/main/java/org/wikimedia/analytics/refinery/hive/IsSpiderUDF.java>,
  shows that `91.36%` of the `#tool: scholia` queries are automated:
  
    -- Classify each Scholia-tagged query by its user agent.
    WITH automate_or_not AS (
        SELECT
            is_spider(http['request_headers']['user-agent']) AS is_spider
        FROM
            event.wdqs_external_sparql_query
        WHERE
            query LIKE '%# tool: scholia%'
    )
    
    -- Count the queries in each class (spider vs. not).
    SELECT
        is_spider,
        count(*) AS total_queries
    FROM
        automate_or_not
    GROUP BY
        is_spider
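
  The query returns one count per `is_spider` value, so a share like the 
  one quoted above still has to be derived from those two rows. A minimal 
  sketch of that arithmetic, assuming the result is read back as 
  `(is_spider, total_queries)` pairs; the helper name `spider_share` and 
  the sample counts are hypothetical placeholders, not the real result:

```python
def spider_share(rows):
    """Return the percentage of queries flagged as spiders.

    rows: iterable of (is_spider, total_queries) pairs, i.e. the shape of
    the grouped query result. A NULL/None flag is counted as non-spider,
    which is an assumption of this sketch.
    """
    totals = {True: 0, False: 0}
    for flag, count in rows:
        totals[bool(flag)] += count

    all_queries = totals[True] + totals[False]
    if all_queries == 0:
        raise ValueError("no queries to summarise")
    return 100.0 * totals[True] / all_queries


# Placeholder counts for illustration only; NOT the measured data.
example_rows = [(True, 900), (False, 100)]
print(f"{spider_share(example_rows):.2f}% automated")
```

  The same figure could likely be computed inside the query itself, but 
  keeping the raw counts makes it easy to report both numbers.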
  
  @dcausse and I found the aforementioned UDF for this. Note for reporting: the 
UDF is based on user agents, so a similar comparison for queries that have the 
user agent `"Scholia"` will not work, as they'd either all be automated or none 
of them would be.
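
  To illustrate why a user-agent-based check cannot split a group that 
  shares a single user agent, here is a small sketch. `looks_automated` is 
  a hypothetical stand-in with made-up rules, not the logic of the real 
  IsSpiderUDF; the point is only that any function of the user agent 
  alone gives one verdict per distinct user agent:

```python
def looks_automated(user_agent: str) -> bool:
    # Hypothetical stand-in classifier; the real IsSpiderUDF rules differ.
    markers = ("bot", "spider", "crawler")
    return any(marker in user_agent.lower() for marker in markers)


def verdicts_by_user_agent(user_agents):
    """Map each distinct user agent to the set of verdicts seen for it."""
    seen = {}
    for ua in user_agents:
        seen.setdefault(ua, set()).add(looks_automated(ua))
    return seen


# Made-up sample: every request in the group carries the same user agent.
sample = ["Scholia"] * 5
result = verdicts_by_user_agent(sample)

# Exactly one verdict per user agent, so the group is all-or-nothing.
for ua, verdicts in result.items():
    print(ua, "distinct verdicts:", len(verdicts))
```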
  
  Now shifting to inspecting queries for the following comparisons:
  
  - `#tool: scholia` queries vs. queries whose user agent is `"Scholia"`
  - For `#tool: scholia` queries, those that are spiders vs. those that aren't

TASK DETAIL
  https://phabricator.wikimedia.org/T353453

To: AndrewTavis_WMDE
Cc: Lydia_Pintscher, dcausse, Aklapper, Manuel, Danny_Benjafield_WMDE, 
Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, 
Nandana, Lahi, Gq86, GoranSMilovanovic, QZanden, KimKelting, LawExplorer, 
_jensen, rosalieper, Scott_WUaS, Wikidata-bugs, aude, Mbch331