[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
Nuria added a comment.

Are there any docs we can look at with metrics?

TASK DETAIL
https://phabricator.wikimedia.org/T177354

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: chelsyx, Nuria
Cc: Nuria, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, Lahi, PDrouin-WMF, Gq86, E1presidente, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Jane023, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Steinsplitter, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

Good idea! Thanks @Nuria!
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
Nuria added a comment.

@chelsyx That makes sense, thank you. I was also trying to make a meta point, though: since prior work and statistics exist for Commons, it will be worth documenting (on meta?) these numbers and why/how they differ from other numbers the community might have access to. I know some of your discovery work has been going to meta, but I doubt it is looked at by our community there. Findings should probably be documented here: https://meta.wikimedia.org/wiki/Structured_Data_on_Commons (even if your more technical work remains on GitHub).
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

Hi @Nuria, the numbers I showed above are cumulative sums at the end of each month, while the numbers you talked about are new uploads for each month. From my query, for Dec 2016, the number of newly uploaded files by bots is 392,566, and by users 392,786. This is close to what is shown on https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm. I think the differences come from two sources:

1. I assume the numbers on https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm are computed at the end of each month, and files could be deleted afterwards. For the numbers above, I used the image table and only counted the files that were still there on Oct 12, 2017.
2. According to Commons bots, not all accounts operated as bots have a bot flag, so I also included accounts whose user pages are in categories matching the keywords "bot_flag" or "bots" (see the query below).

Query for counting newly uploaded files on Commons:

    SELECT
      LEFT(img_timestamp, 6) AS yr_month,
      user_group,
      COUNT(*) AS n_files
    FROM (
      -- Get active/inactive bots
      SELECT ug_user AS user_id, ug_group AS user_group
      FROM user_groups
      WHERE ug_group = 'bot'
      UNION
      SELECT ufg_user AS user_id, ufg_group AS user_group
      FROM user_former_groups
      WHERE ufg_group = 'bot'
      UNION
      -- Get user ids with bot categories in their user pages
      SELECT user.user_id, 'bot' AS user_group
      FROM user
      INNER JOIN (
        -- all user page names with bot category
        SELECT REPLACE(page.page_title, '_', ' ') AS user_name
        FROM page
        INNER JOIN (
          -- page ids with bot categories
          SELECT DISTINCT cl_from AS page_id
          FROM categorylinks
          WHERE cl_to REGEXP '_(bot_flag|bots)(_|$)'
            AND cl_type = 'page'
        ) AS bot_cat ON page.page_id = bot_cat.page_id
        WHERE page_namespace = 2
      ) AS bot_name ON user.user_name = bot_name.user_name
    ) AS bots
    RIGHT JOIN image ON bots.user_id = image.img_user
    GROUP BY LEFT(img_timestamp, 6), user_group;
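The monthly grouping in the query above, and the difference between monthly and cumulative counts, can be tried on toy data. This is a minimal sketch, not the query that was actually run. It assumes Python's built-in sqlite3 module in place of the MySQL replica, a simplified hypothetical `bots` table standing in for the bot subquery, made-up sample rows, `substr` in place of MySQL's `LEFT`, and a `LEFT JOIN` from `image` instead of the original `RIGHT JOIN`. Unlike the original, which leaves `user_group` as NULL for non-bot uploaders, the sketch labels them 'user'.

```python
import sqlite3
from itertools import accumulate

# In-memory database with made-up rows (hypothetical data, not Commons data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (img_name TEXT, img_user INTEGER, img_timestamp TEXT);
CREATE TABLE bots (user_id INTEGER PRIMARY KEY);  -- stand-in for the bot subquery
INSERT INTO bots VALUES (1);
INSERT INTO image VALUES
  ('A.jpg', 1, '20161201000000'),
  ('B.jpg', 1, '20161215000000'),
  ('C.jpg', 2, '20161220000000'),
  ('D.jpg', 2, '20170105000000');
""")

# New uploads per month, split by bot versus non-bot uploader.
rows = conn.execute("""
SELECT substr(image.img_timestamp, 1, 6) AS yr_month,
       CASE WHEN bots.user_id IS NULL THEN 'user' ELSE 'bot' END AS user_group,
       COUNT(*) AS n_files
FROM image
LEFT JOIN bots ON bots.user_id = image.img_user
GROUP BY yr_month, user_group
ORDER BY yr_month, user_group
""").fetchall()

# Monthly counts for users, and the running (cumulative) total of those counts.
monthly_user = [n for _, group, n in rows if group == "user"]
cumulative_user = list(accumulate(monthly_user))

print(rows)
print(monthly_user, cumulative_user)
```

The monthly list is what a "new uploads per month" table reports, while the accumulated list corresponds to an end-of-month cumulative sum, so the two series should not be compared directly.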
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
Nuria added a comment.

Is the user versus bot percentage overall? I am not sure that is of value to quantify usage as of 2017, right? See the timeseries of uploads by bots/users at https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm (scroll down).

The most recent monthly numbers (for December 2016) differ quite a bit from the percentages @chelsyx lists above; they are more like 50/50. So it will be worth checking whether the select above can reproduce those numbers, and maybe look at the data monthly?

Dec 2016: total=806,459 bots=392,565 user=413,894
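As a quick check of the "more like 50/50" reading, the December 2016 figures quoted in this comment can be converted to shares. This is a minimal sketch in Python that uses only the three numbers from the comment. The variable names are illustrative.

```python
# December 2016 upload counts as quoted in the comment above.
total = 806_459
bots = 392_565
users = 413_894

# The two groups should add up to the reported total.
groups_match_total = (bots + users == total)

# Percentage share of each group, rounded to two decimals.
bot_share = round(100 * bots / total, 2)
user_share = round(100 * users / total, 2)

print(groups_match_total, bot_share, user_share)
```

The result is roughly 49% bots and 51% users for that month, which is far from the overall split discussed elsewhere in the thread.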
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

Codebase and output: https://github.com/wikimedia-research/SDoC-Initial-Metrics/tree/master/T177354
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

@mpopov yup, I will put my stuff in the repo.
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
mpopov added a comment.

@chelsyx do you wanna add your stuff to https://github.com/wikimedia-research/SDoC-Initial-Metrics ?
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

The following two graphs break down the numbers by month:

F10169825: nfile_bot_month.png
F10169827: nfile_bot_month_prop.png
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

Updated: On Oct 12, 2017, the number of files uploaded by bots is 9,390,721 (22.03%), and the number of files uploaded by users is 33,241,541 (77.97%). The following table breaks down the counts by media type:

Media Type   User Group   Number of Files   Proportion
bitmap       user         31355343          73.55%
bitmap       bot          8843447           20.74%
drawing      user         905964            2.13%
drawing      bot          270516            0.63%
audio        user         698566            1.64%
audio        bot          95646             0.22%
video        user         71738             0.17%
video        bot          36329             0.09%
multimedia   user         4                 0%
office       user         209926            0.49%
office       bot          144783            0.34%
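The proportions in this comment can be recomputed from the raw counts. The sketch below is a plain Python check using the file counts reported above. The dictionary layout and variable names are illustrative and not part of the original analysis.

```python
# File counts by (media type, user group), as reported for Oct 12, 2017.
counts = {
    ("bitmap", "user"): 31_355_343,
    ("bitmap", "bot"): 8_843_447,
    ("drawing", "user"): 905_964,
    ("drawing", "bot"): 270_516,
    ("audio", "user"): 698_566,
    ("audio", "bot"): 95_646,
    ("video", "user"): 71_738,
    ("video", "bot"): 36_329,
    ("multimedia", "user"): 4,
    ("office", "user"): 209_926,
    ("office", "bot"): 144_783,
}

total = sum(counts.values())

# Sum the counts per user group (bot versus user).
by_group = {}
for (_media_type, group), n_files in counts.items():
    by_group[group] = by_group.get(group, 0) + n_files

# Percentage of all files, per row and per user group.
shares = {key: round(100 * n / total, 2) for key, n in counts.items()}
group_shares = {g: round(100 * n / total, 2) for g, n in by_group.items()}

print(total, by_group, group_shares)
```

Each proportion is a share of all files, not of the files within one media type, which is why the bitmap rows alone account for most of the total.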
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

@mpopov Looks like the file type categorization on Commons is messier than we thought... For example, File:Krazy_Kat_Bugolist_1916_silent.ogv is an ogv file, but its img_minor_mime is ogg, its img_major_mime is application, and its img_media_type is video. The same is true of other ogv files. For ogg files like File:Whitenoisesound.ogg, img_minor_mime is ogg, img_major_mime is application, and img_media_type is audio. Not sure if the field img_media_type is more trustworthy...
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
mpopov added a comment.

In T177354#3676545, @chelsyx wrote:
> Unfortunately, the mediawiki snapshot doesn't have the image table, which describes images and other uploaded files.

Ah, yeah. I missed the reference to image in your query. But it looks like we can use img_timestamp, although those queries will take some time.

Also something to note is that img_major_mime shows up as "application" for .ogg files (which are audio files) and .pdf files:

    SELECT DISTINCT img_major_mime, img_minor_mime FROM commonswiki.image;

    img_major_mime   img_minor_mime
    image            gif
    image            jpeg
    image            png
    image            tiff
    image            vnd.djvu
    image            webp
    image            x-xcf
    image            svg+xml
    application      ogg
    audio            midi
    audio            wav
    audio            webm
    audio            x-flac
    video            webm
    application      pdf

I recommend adding a CASE that returns "audio" for ogg files and "document" (for example) for PDFs.
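The recommended CASE could look roughly like the following. This is a minimal sketch under stated assumptions, not the query used in the analysis. It runs against an in-memory SQLite database through Python's sqlite3 module rather than the commonswiki replica, the sample rows are made up apart from the file name mentioned in this thread, and the alias `file_category` is hypothetical. Note that, as pointed out elsewhere in this thread, .ogv video files also carry application/ogg, so this simple mapping would label them as audio.

```python
import sqlite3

# Toy stand-in for the image table (hypothetical rows, only three columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image (img_name TEXT, img_major_mime TEXT, img_minor_mime TEXT)"
)
conn.executemany(
    "INSERT INTO image VALUES (?, ?, ?)",
    [
        ("Example.jpg", "image", "jpeg"),
        ("Whitenoisesound.ogg", "application", "ogg"),
        ("Example.pdf", "application", "pdf"),
        ("Example.webm", "video", "webm"),
    ],
)

# Remap the two "application" MIME pairs; keep the major MIME type otherwise.
QUERY = """
SELECT
  CASE
    WHEN img_major_mime = 'application' AND img_minor_mime = 'ogg' THEN 'audio'
    WHEN img_major_mime = 'application' AND img_minor_mime = 'pdf' THEN 'document'
    ELSE img_major_mime
  END AS file_category,
  COUNT(*) AS n_files
FROM image
GROUP BY file_category
ORDER BY file_category
"""

category_counts = dict(conn.execute(QUERY).fetchall())
print(category_counts)
```

The CASE expression itself is standard SQL, so the same expression should carry over to the MySQL replica, but that has not been verified here.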
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
chelsyx added a comment.

> Hey @chelsyx - what time frame does this cover?
> Jumping in to say this looks like it's from launch of Commons to now.

Thanks @mpopov! Yes, these are the file counts on Oct 10.

> Can we also get a count of how this has changed over the last week and compare that to the last 30 days? It'd be interesting to see if the numbers are fairly consistent (individual vs institution) or if they have changed quite a bit when extending the time scope.

> @chelsyx this may be useful: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits as it contains monthly snapshots of the page & user tables as of April 2017

Unfortunately, the mediawiki snapshot doesn't have the image table, which describes images and other uploaded files.
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
mpopov added a comment.

In T177354#3675988, @debt wrote:
> Hey @chelsyx - what time frame does this cover?

Jumping in to say this looks like it's from launch of Commons to now.

> Can we also get a count of how this has changed over the last week and compare that to the last 30 days? It'd be interesting to see if the numbers are fairly consistent (individual vs institution) or if they have changed quite a bit when extending the time scope.

@chelsyx this may be useful: https://wikitech.wikimedia.org/wiki/Analytics/Data_Lake/Edits as it contains monthly snapshots of the page & user tables as of April 2017
[Wikidata-bugs] [Maniphest] [Commented On] T177354: Metrics for SDoC: look at contributions
debt added a comment.

Hey @chelsyx - what time frame does this cover?

Can we also get a count of how this has changed over the last week and compare that to the last 30 days? It'd be interesting to see if the numbers are fairly consistent (individual vs institution) or if they have changed quite a bit when extending the time scope.