Perhaps I count as a SPARQL expert now, but I do see one easy way to list all the Wikidata items that have no statements (and are not lexemes):
https://query.wikidata.org/#SELECT%20%3Fitem%20%3Fwiki%0AWHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20.%20%23%20exclude%20lexemes%0A%20%20%0A%7D

Also, here is a query to get the true "duds" - no statements, no lexemes, and no Wikipedia/Wikimedia articles. It looks like there are about 8,000 of these, so thankfully not really an "elephant":
https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fitem%20WHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20.%20%23%20exclude%20lexemes%0A%20%20MINUS%20%7B%20%5B%5D%20schema%3Aabout%20%3Fitem%3B%20schema%3AisPartOf%20%5B%5D%20%7D%20%23%20exclude%20items%20with%20no%20Wikipedia%2C%20etc.%20article%0A%7D

Finally - on a minor note, it looks like there are still about 2,000 human settlements in Wikidata without a country:
https://wikidatawalkabout.org/?c=Q486972&lang=en&f.P17=novalue

This is not meant to sound like a criticism - Romaine, you have obviously made an enormous improvement there! And perhaps the remaining ones are difficult to categorize.

-Yaron

On Fri, Feb 28, 2025 at 9:57 AM Nicolas VIGNERON <[email protected]> wrote:

> Hi y'all,
>
> Good ideas.
>
> Queries over such a big number of items often time out.
> Here is a working QLever query for items with more than 10 sitelinks:
> https://qlever.cs.uni-freiburg.de/wikidata/VdiLsm. There is only one
> result; you can decrease the value for more results.
> Reminder: QLever results are not updated in real time; they are based on
> dumps (which lag behind - right now, results are from 29.01.2025).
>
> Cheers,
> Nicolas
>
> On Fri, Feb 28, 2025 at 18:05, Amir E. Aharoni <
> [email protected]> wrote:
>
>> I tried some queries, and they all timed out :(
>>
>> I'm not very good at SPARQL.
>>
>> But I agree with Andy: dividing hundreds of thousands of items into
>> small groups that can be processed by people who are likely to know
>> something relevant about those items is probably a better way to handle
>> it than just looking at a huge pile of items.
>>
>> Some ways to divide them that I can think of immediately:
>> 1. Having a sitelink to particular languages.
>> 2. Having a label or a description in a particular language.
>> 3. Having certain characteristics in the label, like length, or the
>> presence of certain characters (even a mostly arbitrary characteristic,
>> like "label starts with the letters 'Mi'" or "has digits in the label",
>> is better than nothing).
>>
>> If someone can make a bunch of queries that do something like this and
>> actually work (and don't time out), this could be a nice beginning.
>>
>> On Fri, Feb 28, 2025 at 11:42, Andy Mabbett <
>> [email protected]> wrote:
>>
>>> On Fri, 28 Feb 2025 at 16:06, Romaine Wiki <[email protected]>
>>> wrote:
>>>
>>> > There are another 493k items with only one identifier and no other
>>> > statement.
>>> > https://qlever.cs.uni-freiburg.de/wikidata/Z8OkZi?exec=true
>>> > Often that single identifier is just the Google Knowledge Graph ID
>>> > (P2671).
>>>
>>> The first half-dozen or so I checked all also have a Wikipedia link in
>>> one or more languages.
>>>
>>> Maybe it would be worth making a query for each of the top, say,
>>> twenty languages and posting it on the relevant Village Pump?
>>>
>>> Or having an article or talk page template added by a bot to each
>>> affected article?
>>>
>>> --
>>> Andy Mabbett
>>> https://pigsonthewing.org.uk
>>> _______________________________________________
>>> Wikidata mailing list -- [email protected]
>>> Public archives at
>>> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/4QUE62RXUBELM5HGQRNSZDKWR2QKSH4D/
>>> To unsubscribe send an email to [email protected]

--
WikiWorks · MediaWiki Consulting · http://wikiworks.com
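For anyone who wants the raw SPARQL rather than the encoded links, the query text can be recovered by URL-decoding the fragment (the part after `#`) of each query.wikidata.org link. A minimal Python sketch, using the first link from Yaron's message above:

```python
# Recover the SPARQL text embedded in a query.wikidata.org share link
# by URL-decoding the URL fragment (everything after '#').
from urllib.parse import unquote

url = ("https://query.wikidata.org/#SELECT%20%3Fitem%20%3Fwiki%0A"
       "WHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A"
       "%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20."
       "%20%23%20exclude%20lexemes%0A%20%20%0A%7D")

# Split off the fragment and decode the percent-escapes.
query = unquote(url.split("#", 1)[1])
print(query)
```

Decoded, this is the "no statements, not a lexeme" query: `SELECT ?item ?wiki WHERE { ?item wikibase:statements 0 . MINUS { ?item dct:language [] } }`. The same decoding works for the other query links in the thread.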
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/CRXPNWV2XOWKWT7JIFNV44JMPUMLPHBF/
To unsubscribe send an email to [email protected]
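Amir's idea of dividing the backlog by language, combined with Andy's suggestion of one query per top-twenty language, amounts to templating a sitelink filter into the "no statements" query. A hedged Python sketch - the language list is illustrative, and the generated queries are untested assumptions built from the query patterns discussed in the thread:

```python
# Hypothetical sketch: generate one per-language query so each batch can
# be posted to that wiki's Village Pump, as Andy suggests. The query
# shape (wikibase:statements 0 plus the schema:about/schema:isPartOf
# sitelink pattern) follows the queries discussed above; the language
# codes here are an illustrative sample, not the actual top twenty.
LANGS = ["en", "de", "fr", "es", "ja"]

TEMPLATE = """SELECT ?item WHERE {{
  ?item wikibase:statements 0 .
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://{lang}.wikipedia.org/> .
}}"""

# Build one query string per language.
queries = {lang: TEMPLATE.format(lang=lang) for lang in LANGS}
print(queries["en"])
```

Each resulting query restricts the result set to items with a sitelink on one wiki, which should also help with the timeouts Amir and Nicolas ran into, since every batch is much smaller than the full pile.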
