Hi everyone,
Does anyone know if there's a straightforward (ideally
language-independent) way of identifying stub articles in Wikipedia?
Whatever works is ok, whether it's publicly available data or data
accessible only on the WMF cluster.
I've found lists for various languages (e.g., Italian).
en:WP:DYK has a measure of 1,500+ characters of prose, which is a useful
cutoff. There is weaponised JavaScript to measure that at en:WP:Did you
know/DYKcheck.
It probably doesn't translate to CJK languages, which have radically
different information content per character.
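If you want to replicate that cutoff outside the browser, here is a
minimal Python sketch, assuming the requests and mwparserfromhell
libraries are available. Stripping wikitext markup only approximates
DYKcheck's rendered-prose count, and the article title below is just a
placeholder.

import requests
import mwparserfromhell

PROSE_CUTOFF = 1500  # characters of prose, per en:WP:DYK

def prose_length(title, lang="en"):
    """Fetch an article's wikitext and return its markup-stripped length."""
    resp = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query", "prop": "revisions", "titles": title,
            "rvprop": "content", "rvslots": "main",
            "format": "json", "formatversion": "2",
        },
    )
    page = resp.json()["query"]["pages"][0]
    wikitext = page["revisions"][0]["slots"]["main"]["content"]
    # strip_code() drops templates and <ref> contents, leaving body text.
    return len(mwparserfromhell.parse(wikitext).strip_code())

print(prose_length("Ruthenium") >= PROSE_CUTOFF)  # placeholder title

For CJK wikis you would want a per-language threshold rather than the
1,500-character figure.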
cheers
stuart
I don't know of a clean, language-independent way of grabbing all stubs.
Stuart's suggestion is quite sensible, at least for English Wikipedia. When
I last checked a few years ago, the mean length of an English-language stub
(on a log scale) was around 1kB (including all markup).
You _really_ need to exclude markup and include only body text when
measuring stubs. It's not uncommon for mass-produced articles with only
one or two sentences of text to approach 1K characters once you include
maintenance templates, content templates, categories, the infobox,
references, etc., etc.
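To make that concrete, here is a small sketch with invented wikitext
(assuming mwparserfromhell again) showing how far the raw byte count and
the prose count can diverge on exactly that kind of article:

import mwparserfromhell

# Invented wikitext for a one-sentence, mass-produced-looking article.
wikitext = """{{Infobox settlement|name=Foo|population=42}}
'''Foo''' is a village in Bar.<ref>{{cite web|url=http://example.org|title=Foo}}</ref>
{{Bar-geo-stub}}
[[Category:Villages in Bar]]
"""

raw_len = len(wikitext)  # counts templates, refs, categories, the lot
# strip_code() drops templates and <ref> contents; category links can
# leak through as plain text, so filter those separately if it matters.
prose_len = len(mwparserfromhell.parse(wikitext).strip_code())
print(raw_len, prose_len)  # raw count is several times the prose count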
Hi all,
I'd strongly caution against using the stub categories without *also*
doing some kind of filtering on size. There's a real problem with
"stub lag": articles get tagged, incrementally improve, and no-one thinks
they've done enough to justify removing the tag (or even notices the tag
is there).
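One way to combine the two signals is to take category membership from
the API and keep only pages under a byte threshold. A sketch, assuming
requests; the category name is just an example, the "length" field is
raw wikitext bytes (so pair it with a prose check for borderline cases),
and continuation handling is omitted for brevity:

import requests

def small_members(category, lang="en", max_bytes=2000):
    """Titles in a category whose raw wikitext is at most max_bytes."""
    resp = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "generator": "categorymembers",
            "gcmtitle": category,
            "gcmnamespace": "0",   # articles only, skip subcategories
            "gcmlimit": "500",     # one batch; real code would paginate
            "prop": "info",        # "length" = raw wikitext bytes
            "format": "json", "formatversion": "2",
        },
    )
    pages = resp.json().get("query", {}).get("pages", [])
    return [p["title"] for p in pages if p.get("length", 0) <= max_bytes]

print(small_members("Category:Physics stubs"))  # example category name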
[forwarding my answer from the analytics mailing list; I forgot to subscribe to this list too]
Hi Robert,
One solution may be to use a query on Wikidata to retrieve the name
for the stubs category in all the different languages. Then you could
use a tool like PetScan to retrieve all the pages in such categories.
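A rough sketch of the Wikidata half of that, assuming requests and the
Wikidata Query Service. The QID below is a placeholder: look up the real
item via the "Wikidata item" link on any wiki's stub category page. Each
sitelink gives the category's local name, which you can then feed to
PetScan.

import requests

# Placeholder QID; replace with the item for the stubs category.
QUERY = """
SELECT ?sitelink ?wiki WHERE {
  ?sitelink schema:about wd:Q000000 ;
            schema:isPartOf ?wiki .
}
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "stub-survey-sketch/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["wiki"]["value"], row["sitelink"]["value"])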