chelsyx added a comment.
Hello @thiemowmde! The purpose of T177353 and its parent ticket, T174519: [epic] SDoC: Determine baseline for metrics, is to establish a baseline for metrics on Commons so that we can measure the future success of the #structured-data-commons (SDoC) project. The SDoC team and we (#discovery-analysis) came up with a list of things that would be interesting to measure, and created T177353 and other child tickets (see T174519 for more details). This work is exploratory in nature: some metrics in the list are clearly defined, while others are not -- for example, the exact meaning of "unhelpful" is still open. Any ideas and comments are very welcome!
The Titleblacklist is used to block certain file names (generic, spam, etc.) through mw:Extension:Title blacklist when users try to upload files with these invalid names. However, regular expressions are not perfect, and some files with "unhelpful" names still get uploaded -- e.g. File:Img-071129152243-0001.png, and those in the move log whose change reason is meaningless or ambiguous, which currently require a human to identify. That's why I'm thinking about using a machine learning model to help identify these files.
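To make the regex limitation concrete, here is a minimal Python sketch of pattern-based filename screening. The patterns and the helper names (`strip_title`, `looks_unhelpful`) are hypothetical and only for illustration; they are not the actual Titleblacklist entries on Commons, and this is not how the extension itself is implemented.

```python
import re

# Hypothetical patterns, for illustration only. These are NOT the real
# Titleblacklist entries used on Commons.
UNHELPFUL_PATTERNS = [
    # A generic camera/scanner prefix followed only by digits and separators.
    re.compile(r"(img|image|dsc|dscn|pict|photo|scan)[\s_-]*[\d\s_-]*", re.IGNORECASE),
    # A name made up of nothing but digits and separators.
    re.compile(r"[\d\s_-]+"),
]


def strip_title(title):
    """Drop the 'File:' namespace prefix and the file extension, if present."""
    name = title.split(":", 1)[1] if title.startswith("File:") else title
    return name.rsplit(".", 1)[0] if "." in name else name


def looks_unhelpful(title):
    """Return True if the whole base name matches any hypothetical pattern."""
    base = strip_title(title)
    return any(p.fullmatch(base) for p in UNHELPFUL_PATTERNS)


if __name__ == "__main__":
    for t in [
        "File:Img-071129152243-0001.png",
        "File:Eiffel Tower at night.jpg",
        "File:Untitled.png",
    ]:
        print(t, looks_unhelpful(t))
```

With these assumed patterns the first title is flagged and the second is not, but "File:Untitled.png" also slips through, even though it is arguably just as unhelpful. Every such case needs another hand-written pattern, which is the gap a learned model might help close.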
Cc: thiemowmde, Aklapper, Abit, Ramsey-WMF, mpopov, chelsyx, Lahi, PDrouin-WMF, Gq86, E1presidente, SandraF_WMF, GoranSMilovanovic, QZanden, Tramullas, Acer, Susannaanas, Aschroet, Jane023, Wikidata-bugs, PKM, Base, matthiasmullie, aude, Ricordisamoa, Fabrice_Florin, Raymond, Steinsplitter, Mbch331
