AndrewTavis_WMDE added a comment.
I've added the numbers for February to the sheet based on the first DAG run
and also just went through the query job one final time to check. The queries
that are being run by the job are taken directly from the original queries,
with only a few minor changes:
For counting the filtered user agents we're doing the following:
    count(
        DISTINCT CASE
            WHEN user_agent NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%'
            THEN user_agent
        END
    ) AS total_filtered_user_agents,
... instead of:
    SELECT
        count(DISTINCT user_agent) AS total_filtered_user_agents
    ...
    WHERE
        ...
        AND user_agent NOT LIKE 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%'
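As a sanity check on why the two forms give the same count, here is a minimal sketch using Python's built-in sqlite3 rather than the actual job's engine. The `requests` table and the user agent rows are made up for illustration. Rows that fail the `CASE` condition become NULL, and `count(DISTINCT ...)` ignores NULLs, so the conditional form matches the `WHERE` form while presumably leaving room for unfiltered aggregates in the same `SELECT`.

```python
import sqlite3

# The LIKE pattern from the queries above, kept on one logical line.
PATTERN = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/% (KHTML, like Gecko) Chrome/% Safari/%"
)

# Made-up sample data; the table name `requests` is hypothetical.
ROWS = [
    ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
     "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",),
    ("curl/8.4.0",),
    ("curl/8.4.0",),
    ("python-requests/2.31.0",),
    (None,),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (user_agent TEXT)")
conn.executemany("INSERT INTO requests VALUES (?)", ROWS)

# Conditional aggregation: non-matching rows turn into NULL, which
# count(DISTINCT ...) skips, so an unfiltered count can sit alongside it.
case_count, all_count = conn.execute(
    """
    SELECT
        count(DISTINCT CASE WHEN user_agent NOT LIKE ? THEN user_agent END),
        count(DISTINCT user_agent)
    FROM requests
    """,
    (PATTERN,),
).fetchone()

# Original form: the filter lives in the WHERE clause instead.
(where_count,) = conn.execute(
    "SELECT count(DISTINCT user_agent) FROM requests WHERE user_agent NOT LIKE ?",
    (PATTERN,),
).fetchone()

print(case_count, where_count, all_count)
```

Note that a NULL `user_agent` is excluded by both forms, since `NULL NOT LIKE ...` evaluates to NULL rather than true.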
Within the `WHERE` clause we are also adding `webrequest_source = 'text'`,
as discussed. This was suggested by WMF data engineering and means that we
are not losing any information: we are querying a subset of the data that
still includes our original results.
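One way to confirm that such a filter is lossless is to compare the result with and without it, and to check that no matching rows fall outside the subset. The sketch below does this with sqlite3 on a toy table. The `is_target` column is a hypothetical stand-in for the other `WHERE` conditions of the real queries, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy stand-in table; `is_target` is a hypothetical flag representing
# the original queries' other WHERE conditions.
conn.execute(
    "CREATE TABLE webrequest (webrequest_source TEXT, is_target INTEGER)"
)
conn.executemany(
    "INSERT INTO webrequest VALUES (?, ?)",
    [("text", 1), ("text", 1), ("text", 0), ("upload", 0), ("upload", 0)],
)

def count_where(condition: str) -> int:
    """Return the number of rows matching a trusted, hard-coded condition."""
    (n,) = conn.execute(
        f"SELECT count(*) FROM webrequest WHERE {condition}"
    ).fetchone()
    return n

without_filter = count_where("is_target = 1")
with_filter = count_where("is_target = 1 AND webrequest_source = 'text'")
outside_text = count_where("is_target = 1 AND webrequest_source <> 'text'")

print(without_filter, with_filter, outside_text)
```

If the last count is zero on real data, the extra filter only narrows what is scanned and does not change the results.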
I'll update the numbers for March once the next DAG run is finished at the
start of next week!
TASK DETAIL
https://phabricator.wikimedia.org/T342559