AndrewTavis_WMDE added a comment.

  I've added the numbers for February to the sheet based on the first DAG run 
and also just went through the query job one final time to check. The queries 
that are being run by the job are taken directly from the original queries, 
with only a few minor changes:
  
  For counting the filtered user agents we're doing the following:
  
    count(
        DISTINCT CASE
            WHEN user_agent
            NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%'
            THEN user_agent
        END
    ) AS total_filtered_user_agents,
  
  ... instead of:
  
    SELECT
        count(DISTINCT user_agent) AS total_filtered_user_agents
    
    ...
    
    WHERE
        AND user_agent NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%'
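
  As a quick sanity check that the two forms count the same thing, here is a 
minimal sketch using Python's built-in `sqlite3` rather than the actual query 
engine the job runs on. The `requests` table and the sample rows are 
hypothetical; only the `LIKE` pattern is taken from the queries above. The 
conditional form has the practical advantage that the filter applies to one 
aggregate only, so other counts can presumably be computed in the same query 
without being filtered.

```python
import sqlite3

# Pattern copied from the queries above.
PATTERN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% "
    "(KHTML, like Gecko) Chrome/% Safari/%"
)

# Hypothetical sample rows for illustration, not real webrequest data.
ROWS = [
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",),
    ("Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0",),
    ("Mozilla/5.0 (X11; Linux x86_64; rv:122.0) Gecko/20100101 Firefox/122.0",),
    ("ExampleBot/1.0",),
    (None,),  # a missing user agent is ignored by both forms
]


def count_both_ways(rows):
    """Return (conditional_count, where_filtered_count) for the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE requests (user_agent TEXT)")  # hypothetical table
    conn.executemany("INSERT INTO requests VALUES (?)", rows)

    # Form used by the job: the filter lives inside the aggregate.
    conditional = conn.execute(
        "SELECT count(DISTINCT CASE WHEN user_agent NOT LIKE ? "
        "THEN user_agent END) FROM requests",
        (PATTERN,),
    ).fetchone()[0]

    # Original form: the filter lives in the WHERE clause.
    filtered = conn.execute(
        "SELECT count(DISTINCT user_agent) FROM requests "
        "WHERE user_agent NOT LIKE ?",
        (PATTERN,),
    ).fetchone()[0]

    conn.close()
    return conditional, filtered


if __name__ == "__main__":
    print(count_both_ways(ROWS))
```

  Both forms skip rows where the `CASE` or the `WHERE` predicate does not 
evaluate to true, so the distinct counts agree on this sample. SQLite's `LIKE` 
is case-insensitive for ASCII by default, which may differ from the production 
engine, so this only illustrates the logic of the rewrite.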
  
  Within the `WHERE` clause we are also adding `webrequest_source = 'text'` 
as discussed. This was suggested by WMF data engineering and means that we 
are not losing any information: we are querying a subset of the data that 
still includes all of our original results.
  
  I'll update the numbers for March once the next DAG run is finished at the 
start of next week!

TASK DETAIL
  https://phabricator.wikimedia.org/T342559
