I think there are two routes we can follow: partly an automated one, and partly one involving communities to help out. With many thousands of items, I hope we can first do the bulk in an automated way, because over a million items is far too labour-intensive by hand. Such high numbers (20,000+) can also be demotivating for communities to start on. Then a second round with community input? I suspect that most items will get a P31; those that need a "subclass of" are fewer and more complex, so community input is especially welcome there.
Dividing it into parts that somehow form a group together is certainly a good approach. I also did this when adding countries: I often work per identifier, or per sitelink to one Wikipedia, working a group down before moving on to the next. With countries, many Wikipedias have infoboxes containing a row like "Country | Foo Bar". I hope infoboxes can be used for P31 too. Does anyone know how to extract data from infoboxes and add it to Wikidata?

Romaine

PS: Yes, the elephant in the china shop also exists, but that is another expression with a different meaning. For the current stage of working to get this issue solved, I prefer the idea of Schrödinger's data <https://en.wikipedia.org/wiki/Schr%C3%B6dinger%27s_cat>: until we add the data to an item, we do not know whether a source we can extract the data from exists or not.

On Fri 28 Feb 2025 at 23:35, James Heald <[email protected]> wrote:

> Breakdown by wiki of the number of items with sitelinks but no statements:
> https://w.wiki/DFDh
>
> Led by:
> * English Wikipedia (68,000 articles)
> * Kazakh Wikipedia (42,000 articles)
> * Polish Wikipedia (30,000 articles)
> * Nepalese Newari Wikipedia (25,000 articles)
> * Chinese Wikipedia (21,500 articles)
> * Spanish Wikipedia (21,500 articles)
>
> Note that some of these "articles" are in fact redirects.
>
>  -- James.
>
>
> On 28/02/2025 22:18, James Heald wrote:
> > Further to the below, this query https://w.wiki/DFDJ using a random
> > sample finds that:
> >
> > * about 9.7% of items without statements have no Wikipedia links.
> >
> > That's about 80,000 items, which is more than Yaron Koren found --
> > probably because I'm only including sitelinks to actual Wikipedias, not
> > Wikisource, Wikivoyage, or Wikimedia Commons.
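To make the infobox question above concrete: a minimal sketch of pulling one parameter out of an infobox. This uses a plain regex over raw wikitext for illustration only; real wikitext has nested templates and needs a proper parser such as mwparserfromhell, and the `infobox_param` helper, parameter name, and sample text are all hypothetical, not an existing tool.

```python
import re

def infobox_param(wikitext: str, param: str):
    """Rough sketch: find a '| param = value' line in an infobox template.

    A regex is NOT robust against nested templates, links, or multi-line
    values; this only illustrates the idea of extracting infobox data.
    """
    m = re.search(r"\|\s*%s\s*=\s*(.+)" % re.escape(param), wikitext)
    return m.group(1).strip() if m else None

# Hypothetical wikitext, resembling the "Country | Foo Bar" row mentioned above:
sample = """{{Infobox settlement
| name    = Foo Bar
| country = Ruritania
}}"""

print(infobox_param(sample, "country"))  # Ruritania
```

Writing the extracted value back as a P31 (or country) statement would then be a separate step, e.g. via Pywikibot or QuickStatements, with human review for anything ambiguous.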
> >
> > * about 86% have one Wikipedia link (713,000 items)
> >
> > * 3.5% have two Wikipedia links (29,000 items), and 0.3% (about 2,500
> > items) have three Wikipedia links.
> >
> > So fixing these items will require analysing what information can be
> > extracted from the wiki articles.
> >
> >  -- James.
>
> _______________________________________________
> Wikidata mailing list -- [email protected]
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/7ROBPGG7RVBFUNVK3T5JZETRCYLV6TPC/
> To unsubscribe send an email to [email protected]
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/734J2AN46KYICME45DI25TWS7EWNXADZ/
To unsubscribe send an email to [email protected]
