chelsyx added a comment.

Hi @Nuria , the numbers I showed above are cumulative sum at the end of each month, while the numbers you talked about are newly uploads for each month. From my query, for Dec 2016, the number of newly uploaded files by bots are 392,566, by users = 392,786. This is closed to what is shown on https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm.

I think the differences came from two sources:
1, I assume the numbers on https://stats.wikimedia.org/wikispecial/EN/TablesWikipediaCOMMONS.htm are computed at the end of each month and files could be deleted afterwards. For the numbers above, I used the image table and only counts the files that are still there on Oct 12, 2017.
2, According to commons bots, not all accounts being operated as bots has a bot flag, so I also include accounts with the keywords "bot_flag" or "bots" (see the query below).

Query for counting newly uploaded files on commons:

SELECT LEFT(img_timestamp, 6) AS yr_month, user_group, COUNT(*) AS n_files
FROM (
-- Get active/inactive bots
SELECT ug_user AS user_id, ug_group AS user_group
FROM user_groups
WHERE ug_group = 'bot'
UNION
SELECT ufg_user AS user_id, ufg_group AS user_group
FROM user_former_groups
WHERE ufg_group = 'bot'
UNION
-- Get user ids with bot categories in their user pages
SELECT user.user_id, 'bot' AS user_group
FROM user INNER JOIN (
  -- all user page names with bot category
  SELECT REPLACE(page.page_title, '_', ' ') AS user_name
  FROM page INNER JOIN (
    -- page ids with bot categories 
    SELECT DISTINCT cl_from AS page_id
    FROM categorylinks
    WHERE cl_to REGEXP '_(bot_flag|bots)(_|$)'
      AND cl_type = 'page'
  ) AS bot_cat ON page.page_id=bot_cat.page_id
  WHERE page_namespace = 2
) AS bot_name ON user.user_name=bot_name.user_name
) AS bots RIGHT JOIN image ON bots.user_id = image.img_user
GROUP BY LEFT(img_timestamp, 6), user_group;

TASK DETAIL
https://phabricator.wikimedia.org/T177354

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: chelsyx
Cc: Nuria, Liuxinyu970226, Aklapper, mpopov, chelsyx, Abit, SandraF_WMF, Ramsey-WMF, Capt_Swing, debt, E1presidente, Jmmuguerza, GoranSMilovanovic, QZanden, EBjune, Acer, Avner, Gehel, FloNight, Susannaanas, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Steinsplitter, Mbch331
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs

Reply via email to