en:WP:DYK uses a threshold of 1,500+ characters of prose, which is a useful cutoff. There is weaponised JavaScript to measure that at en:WP:Did you know/DYKcheck.
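For anyone who wants to try the cutoff on dump wikitext, here is a rough Python sketch of the idea. It is not a port of DYKcheck: the markup stripping is a crude regex approximation, and the names `rough_prose_length`, `looks_like_stub` and `PROSE_THRESHOLD` are made up for illustration. The threshold is a parameter because 1,500 is an English Wikipedia convention and would need tuning for other languages.

```python
import re

# Assumption: 1,500 characters is the en:WP:DYK cutoff; other wikis may need another value.
PROSE_THRESHOLD = 1500


def rough_prose_length(wikitext: str) -> int:
    """Approximate the number of prose characters in a wikitext string."""
    text = wikitext
    # Strip templates from the inside out so nested {{...}} are handled.
    previous = None
    while previous != text:
        previous = text
        text = re.sub(r"\{\{[^{}]*\}\}", "", text)
    # Drop references, then any remaining HTML-like tags.
    text = re.sub(r"<ref[^>]*/>", "", text)
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.DOTALL)
    text = re.sub(r"<[^>]+>", "", text)
    # Drop category and file links (English namespace names only; an assumption).
    text = re.sub(r"\[\[(?:Category|File|Image):[^\]]*\]\]", "", text,
                  flags=re.IGNORECASE)
    # [[target|label]] -> label, [[target]] -> target.
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]*)\]\]", r"\1", text)
    # Keep only lines that do not start like headings, lists, or tables.
    lines = [
        line for line in text.splitlines()
        if line.strip() and not re.match(r"\s*(=|\*|#|\{\||\||!|:)", line)
    ]
    prose = " ".join(lines).replace("'''", "").replace("''", "")
    prose = re.sub(r"\s+", " ", prose).strip()
    return len(prose)


def looks_like_stub(wikitext: str, threshold: int = PROSE_THRESHOLD) -> bool:
    """Flag a page as a likely stub when its prose is shorter than the threshold."""
    return rough_prose_length(wikitext) < threshold


sample = (
    "{{Infobox|a={{b}}}}'''Foo''' is a [[town|small town]] in [[Bar]]."
    "<ref>cite</ref>\n== History ==\n* item\n[[Category:Towns]]"
)
print(rough_prose_length(sample), looks_like_stub(sample))
```

The appeal of a length cutoff is that the same code runs on every language edition without knowing the local stub templates; the cost is that the threshold itself is language-dependent.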
Probably doesn't translate to CJK languages which have radically different information content per character.

cheers
stuart

--
...let us be heard from red core to black sky

On Tue, Sep 20, 2016 at 9:26 PM, Robert West <w...@cs.stanford.edu> wrote:

> Hi everyone,
>
> Does anyone know if there's a straightforward (ideally
> language-independent) way of identifying stub articles in Wikipedia?
>
> Whatever works is ok, whether it's publicly available data or data
> accessible only on the WMF cluster.
>
> I've found lists for various languages (e.g., Italian
> <https://it.wikipedia.org/wiki/Categoria:Stub> or English
> <https://en.wikipedia.org/wiki/Category:All_stub_articles>), but the
> lists are in different formats, so separate code is required for each
> language, which doesn't scale.
>
> I guess in the worst case, I'll have to grep for the respective stub
> templates in the respective wikitext dumps, but even this requires to know
> for each language what the respective template is. So if anyone could point
> me to a list of stub templates in different languages, that would also be
> appreciated.
>
> Thanks!
> Bob
>
> --
> Up for a little language game? -- http://www.unfun.me
>
> _______________________________________________
> Wiki-research-l mailing list
> Wiki-research-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l