[Wikidata-bugs] [Maniphest] T344052: [Analytics] Join information from access-tables (like webrequest) with edit-tables tables (that include oldids)

2023-09-01 Thread Manuel
Manuel added a comment.


  I have moved this comment from T336361#9134815 
,
  
  In T336361#9134815 , 
@AndrewTavis_WMDE wrote:
  
  > As far as totals for this task are concerned, @Manuel, what I'm getting is:
  >
  > **Population for the following is**:
  >
  > All views of `www.wikidata.org` and `m.wikidata.org` inclusively between 
01/7/2023 and 31/7/2023 that are further `agent_type != 'spider'` and mobile 
users with `user_agent_map.os_family = 'Android'`, `iOS` or `KaiOS`.
  >
  >   WITH pageview_actor_os_families AS (
  >   SELECT
  >   user_agent_map.os_family AS os_family
  >   
  >   FROM 
  >   wmf.pageview_actor
  >   
  >   WHERE
  >   year = 2023
  >   AND month = 7
  >   AND '2023-07-01' <= dt
  >   AND dt < '2023-08-01'
  >   AND uri_host IN ('www.wikidata.org', 'm.wikidata.org')
  >   AND is_pageview = True
  >   AND agent_type = 'user'
  >   )
  >   
  >   SELECT
  >   COUNT(*) AS total_views
  >   
  >   FROM
  >   pageview_actor_os_families
  >   
  >   WHERE
  >   os_family = 'Android'
  >   OR os_family = 'iOS'
  >   OR os_family = 'KaiOS'
  >
  > ... gives us `699,968`.
  >
  > **Population for the following is**:
  >
  > All edits of `wiki_db = 'wikidatawiki'` inclusively between 01/7/2023 and 
31/7/2023 that are further mobile users with `user_agent_map.os_family = 
'Android'`, `iOS` or `KaiOS`.
  >
  > For edits I need some help filtering out bots as we don't have the option 
of going from webrequests as mentioned above and I'm having a hard time 
figuring out how to access MediaWiki tables directly for `user_groups`. With 
that being said, I currently have the following that we just need to get the 
bots out of:
  >
  >   WITH cu_changes_os_families AS (
  >   SELECT
  >   ua_parser(cuc_agent) AS user_agent_map
  >   
  >   FROM 
  >   wmf_raw.mediawiki_private_cu_changes
  >   
  >   WHERE
  >   wiki_db = 'wikidatawiki'
  >   AND month = '2023-07'
  >   AND '20230701' <= cuc_timestamp
  >   AND cuc_timestamp < '20230801'
  >   )
  >   
  >   SELECT
  >   COUNT(*) AS total_edits
  >   
  >   FROM
  >   cu_changes_os_families
  >   
  >   WHERE
  >   user_agent_map.os_family = 'Android'
  >   OR user_agent_map.os_family = 'iOS'
  >   OR user_agent_map.os_family = 'KaiOS'
  >
  > ... gives us `221,073`. Let me know if you have any thoughts about how to 
make the connection to the MediaWiki tables or if not also where we could get 
bot status from! :)

TASK DETAIL
  https://phabricator.wikimedia.org/T344052

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manuel
Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T344052: [Analytics] Join information from access-tables (like webrequest) with edit-tables tables (that include oldids)

2023-09-01 Thread Manuel
Manuel added a subscriber: AndrewTavis_WMDE.
Manuel added a comment.


  I have moved this comment from T336361#9134372 
.
  
  In T336361#9134372 , 
@AndrewTavis_WMDE wrote:
  
  > @Manuel, I think we'll need to abondon the idea of creating an edits subset 
of `wmf.webrequests`, sadly :( The following would be where we'd find the 
various actions that we'd need to collect to define as edits fully: 
https://www.wikidata.org/w/api.php. We know at the very least that we'd want 
`uri_query LIKE '?action=edit%'` and similarly `wbsetclaim` and `wbeditentity`, 
but figuring out what else needs to be added seems to be prohibitive given the 
discrepancy:
  >
  >   
  >   SELECT
  >   COUNT(*) AS total_edits
  >   
  >   FROM 
  >   wmf.webrequest
  >   
  >   WHERE
  >   year = 2023
  >   AND month = 7
  >   AND day = 31
  >   AND uri_host IN ('www.wikidata.org', 'm.wikidata.org')
  >   AND (
  >   uri_query LIKE '?action=edit%'
  >   OR uri_query LIKE '?action=wbsetclaim%'
  >   OR uri_query LIKE '?action=wbeditentity%'
  >   )
  >
  > ... gives us `25,374` (`26,329` if we do `%wbsetclaim%` and 
`%wbeditentity%`), and the following:
  >
  >   SELECT
  >   COUNT(*) AS total_edits
  >   
  >   FROM 
  >   wmf_raw.mediawiki_private_cu_changes
  >   
  >   WHERE
  >   wiki_db = 'wikidatawiki'
  >   AND month = '2023-07'
  >   AND '20230731' <= cuc_timestamp
  >   AND cuc_timestamp < '20230801'
  >
  > ... gives us `657,347`, with `25374/657347` being `3.86%`. This way of 
doing this is gonna get really messy as well as there are apparently 
combinations of actions and some actions also have `edit` in their names.
  >
  > There definitely should be a combination of those actions that gets us a 
similar number, but this would be something that we'd need to loop WMF into, 
and the easiest route would likely be to talk to them about getting an edits 
subset similar to `pageview_actor` as a table in the Data Lake.

TASK DETAIL
  https://phabricator.wikimedia.org/T344052

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manuel
Cc: AndrewTavis_WMDE, Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, 
karapayneWMDE, Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T344052: [Analytics] Join information from access-tables (like webrequest) with edit-tables tables (that include oldids)

2023-09-01 Thread Manuel
Manuel renamed this task from "[Analytics] Join edits in `webrequest` with 
other tables" to "[Analytics] Join information from access-tables (like 
webrequest) with edit-tables tables (that include oldids) ".

TASK DETAIL
  https://phabricator.wikimedia.org/T344052

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Manuel
Cc: Aklapper, Manuel, Danny_Benjafield_WMDE, Astuthiodit_1, karapayneWMDE, 
Invadibot, maantietaja, ItamarWMDE, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org