[Forwarding my answer from the Analytics mailing list; I forgot to subscribe to this list too.]

Hi Robert,
one solution may be to use a query on Wikidata to retrieve the name
of the stub category in all the different languages. Then you could
use a tool like PetScan to retrieve all the pages in those categories,
or write your own tool using either a database query or the
MediaWiki API.
You can find a sample solution here:
http://paws-public.wmflabs.org/paws-public/3270/Stub%20categories.ipynb

I wrote that thing while on a train, so it may be messy and/or suboptimal.
I would like to thank Alex Monk and Yuvi Panda for their help with SQL
on PAWS today.
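To make the two steps above concrete (Wikidata sitelinks for the category, then the pages in each localized category via the MediaWiki API), here is a minimal sketch in Python using only the standard library. It is not the notebook linked above. Assumptions: the seed title "Category:Stubs" on enwiki, the User-Agent string, and the NON_WIKIPEDIA set are illustrative choices of mine. The dbname-to-URL mapping is simplified and may miss edge cases. Also, only direct article members are listed; on many wikis most stubs sit in subcategories, which PetScan can traverse by depth but this sketch does not.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
# Assumption: the English top-level stub category used as the seed.
SEED_SITE = "enwiki"
SEED_TITLE = "Category:Stubs"
# Assumption: non-Wikipedia wikis whose dbname also ends in "wiki" (not exhaustive).
NON_WIKIPEDIA = {"commonswiki", "metawiki", "specieswiki", "wikidatawiki",
                 "mediawikiwiki", "sourceswiki", "incubatorwiki"}
# Hypothetical User-Agent; replace with a real contact address.
USER_AGENT = "stub-sketch/0.1 (example)"


def api_get(url, params):
    """GET a MediaWiki API endpoint and decode the JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    req = urllib.request.Request(f"{url}?{query}",
                                 headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def parse_sitelinks(response):
    """Map Wikipedia dbname -> localized title from wbgetentities output."""
    result = {}
    for entity in response.get("entities", {}).values():
        # Missing entities have no "sitelinks" key, so .get() skips them.
        for dbname, link in entity.get("sitelinks", {}).items():
            if dbname.endswith("wiki") and dbname not in NON_WIKIPEDIA:
                result[dbname] = link["title"]
    return result


def api_url_for(dbname):
    """Simplified mapping, e.g. 'itwiki' -> https://it.wikipedia.org/w/api.php.

    Unusual dbnames (e.g. be_x_oldwiki) would be better resolved via SiteMatrix.
    """
    lang = dbname[: -len("wiki")].replace("_", "-")
    return f"https://{lang}.wikipedia.org/w/api.php"


def stub_categories():
    """Step 1: localized titles of the seed stub category, keyed by dbname."""
    data = api_get(WIKIDATA_API, {"action": "wbgetentities",
                                  "sites": SEED_SITE,
                                  "titles": SEED_TITLE,
                                  "props": "sitelinks"})
    return parse_sitelinks(data)


def category_pages(api_url, category):
    """Step 2: yield direct article members (namespace 0) of one category."""
    params = {"action": "query", "list": "categorymembers",
              "cmtitle": category, "cmnamespace": "0", "cmlimit": "500"}
    while True:
        data = api_get(api_url, params)
        for member in data.get("query", {}).get("categorymembers", []):
            yield member["title"]
        if "continue" not in data:
            break
        # Follow API continuation until all members have been returned.
        params.update(data["continue"])
```

For example, iterating over `stub_categories().items()` and calling `category_pages(api_url_for(dbname), title)` for each entry should yield the article titles per language. A database query on the replicas would avoid the per-wiki HTTP round trips at the cost of schema-specific SQL.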

Best,
Giuseppe

2016-09-20 11:26 GMT+02:00 Robert West <w...@cs.stanford.edu>:
> Hi everyone,
>
> Does anyone know if there's a straightforward (ideally language-independent)
> way of identifying stub articles in Wikipedia?
>
> Whatever works is ok, whether it's publicly available data or data
> accessible only on the WMF cluster.
>
> I've found lists for various languages (e.g., Italian or English), but the
> lists are in different formats, so separate code is required for each
> language, which doesn't scale.
>
> I guess in the worst case, I'll have to grep for the respective stub
> templates in the respective wikitext dumps, but even this requires knowing
> for each language what the respective template is. So if anyone could point
> me to a list of stub templates in different languages, that would also be
> appreciated.
>
> Thanks!
> Bob
>
> --
> Up for a little language game? -- http://www.unfun.me
>
> _______________________________________________
> Analytics mailing list
> analyt...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>

_______________________________________________
Wiki-research-l mailing list
Wiki-research-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
