I think there are two routes we can follow: partly an automated one, and
partly one that involves communities helping out. With many thousands of
items, I hope we can first do the bulk in an automated way, because over a
million items is too labour-intensive to handle by hand. Such high numbers
(20,000+) can also demotivate communities from starting on them. Then a
second round with community input? I suspect that most items will get a P31
(instance of), and that those which need a subclass of (P279) are fewer and
more complex, so community input is welcome there.

Dividing the work into parts that form a coherent group is certainly a good
approach. I did the same when adding countries: I usually work per
identifier or per sitelink to one Wikipedia, working through that group
before I move on to another. For countries, many Wikipedias have infoboxes
containing a row like "Country | Foo Bar". I hope infoboxes can be used for
P31 too. Does anyone know how to extract data from infoboxes and add it to
Wikidata?
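
As a starting point for the infobox question, here is a minimal Python
sketch of the idea, not a finished tool. It assumes the article wikitext has
already been fetched, that the infobox uses a parameter literally named
"country" (the name differs per Wikipedia and per template), and that each
value fits on one line. COUNTRY_QIDS, infobox_params, link_target and
country_statement are hypothetical names made up for this example. It only
proposes a (item, P17, value) candidate; it does not write to Wikidata.

```python
import re

# Hypothetical lookup from infobox text to Wikidata item IDs; a real run
# would need a much larger, reviewed mapping.
COUNTRY_QIDS = {"Netherlands": "Q55", "Belgium": "Q31"}

# Matches one "| key = value" row of a template, one row per line.
PARAM_RE = re.compile(
    r"^[ \t]*\|[ \t]*(?P<key>[^=|\n]+?)[ \t]*=[ \t]*(?P<value>.*?)[ \t]*$",
    re.MULTILINE,
)
# Matches [[Target]] or [[Target|label]] and captures the target.
LINK_RE = re.compile(r"\[\[([^\]|]*)(?:\|[^\]]*)?\]\]")


def infobox_params(wikitext):
    """Return {lowercased parameter name: raw value} for simple one-line rows."""
    return {m["key"].lower(): m["value"] for m in PARAM_RE.finditer(wikitext)}


def link_target(value):
    """Replace each wiki link in the value by its link target."""
    return LINK_RE.sub(r"\1", value).strip()


def country_statement(qid, wikitext):
    """Return (item, 'P17', country item), or None if nothing usable is found."""
    raw = infobox_params(wikitext).get("country")
    if not raw:
        return None
    country = COUNTRY_QIDS.get(link_target(raw))
    return (qid, "P17", country) if country else None


# Q4115189 is used here as a stand-in item ID for the example.
sample = """{{Infobox settlement
| name    = Foo Bar
| country = [[Netherlands]]
}}"""
print(country_statement("Q4115189", sample))
```

Nested templates and multi-line values are not handled by this regex
approach, so a proper wikitext parser would probably be needed for real
articles, and the resulting candidates would still need review before being
added. The same pattern could be tried for P31 with a different parameter
and mapping.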

Romaine



PS: Yes, the elephant in the china shop also exists, but that is another
expression with a different meaning.
For the current stage of working to get this issue solved, I prefer the
idea of Schrödinger's data
<https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat>: until we add the
data to an item, we do not know whether a source we can extract the data
from exists or not.





On Fri, 28 Feb 2025 at 23:35, James Heald <[email protected]> wrote:

> Breakdown by wiki of the number of items with sitelinks but no statements:
>    https://w.wiki/DFDh
>
> Led by
> * English wikipedia (68,000 articles),
> * Kazakh wiki (42,000 articles),
> * Polish wiki (30,000 articles),
> * Nepalese Newari wiki (25,000 articles),
> * Chinese wiki (21,500 articles)
> * Spanish wiki (21,500 articles)
>
> Note that some of these "articles" are in fact redirects.
>
>    -- James.
>
>
>
> On 28/02/2025 22:18, James Heald wrote:
> > Further to the below, this query https://w.wiki/DFDJ using a random
> > sample finds that
> >
> > * about 9.7% of items without statements have no wikipedia links.
> >
> > That's about 80,000 items, which is more than Yaron Koren found --
> > probably because I'm only including sitelinks to actual Wikipedias, not
> > Wikisource, Wikivoyage or Wikimedia Commons.
> >
> > * about 86% have one wikipedia link  (713,000 items)
> >
> > * 3.5% have two wikipedia links (29,000 items), 0.3% (about 2500 items)
> > have three wikipedia links
> >
> > So to fix these items will require analysing what information can be
> > extracted from the wiki articles.
> >
> >    -- James.
> >
> _______________________________________________
> Wikidata mailing list -- [email protected]
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/7ROBPGG7RVBFUNVK3T5JZETRCYLV6TPC/
> To unsubscribe send an email to [email protected]
>
