Perhaps I count as a SPARQL expert now, but I do see one easy way to list all the Wikidata items that have no statements (and are not lexemes):
https://query.wikidata.org/#SELECT%20%3Fitem%20%3Fwiki%0AWHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20.%20%23%20exclude%20lexemes%0A%20%20%0A%7D

Also, here is a query to get the true "duds" - no statements, no lexemes, and no Wikipedia/Wikimedia articles. It looks like there are about 8,000 of these, so thankfully not really an "elephant":
https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fitem%20WHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20.%20%23%20exclude%20lexemes%0A%20%20MINUS%20%7B%20%5B%5D%20schema%3Aabout%20%3Fitem%3B%20schema%3AisPartOf%20%5B%5D%20%7D%20%23%20exclude%20items%20with%20no%20Wikipedia%2C%20etc.%20article%0A%7D

Finally - on a minor note, it looks like there are still about 2,000 human settlements in Wikidata without a country:
https://wikidatawalkabout.org/?c=Q486972&lang=en&f.P17=novalue

This is not meant to sound like a criticism - Romaine, you have obviously made an enormous improvement there! And perhaps the remaining ones are difficult to categorize.

-Yaron

On Fri, Feb 28, 2025 at 9:57 AM Nicolas VIGNERON <[email protected]> wrote:

> Hi y'all,
>
> Good ideas.
>
> Queries over such a big number of items often time out.
> Here is a working QLever query for items with more than 10 sitelinks:
> https://qlever.cs.uni-freiburg.de/wikidata/VdiLsm. There is only one
> result; you can decrease the value for more results.
> Reminder: QLever results are not updated in real time; they are based on
> dumps (which lag behind - right now, results are from 29.01.2025).
>
> Cheers,
> Nicolas
>
> On Fri, Feb 28, 2025 at 18:05, Amir E. Aharoni <
> [email protected]> wrote:
>
>> I tried some queries, and they all timed out :(
>>
>> I'm not very good at SPARQL.
>>
>> But I agree with Andy: dividing hundreds of thousands of items into
>> small groups that can be processed by people who are likely to know
>> something relevant about those items is probably a better way to handle
>> it than just looking at a huge pile of items.
>>
>> Some ways to divide them that I can think of immediately:
>> 1. Having a sitelink to particular languages.
>> 2. Having a label or a description in a particular language.
>> 3. Having certain characteristics in the label, like length, or the
>> presence of certain characters (even a mostly arbitrary characteristic,
>> like "label starts with the letters 'Mi'" or "has digits in the label",
>> is better than nothing).
>>
>> If someone can make a bunch of queries that do something like this and
>> actually work (and don't time out), this could be a nice beginning.
>>
>> On Fri, Feb 28, 2025 at 11:42, Andy Mabbett <
>> [email protected]> wrote:
>>
>>> On Fri, 28 Feb 2025 at 16:06, Romaine Wiki <[email protected]>
>>> wrote:
>>>
>>> > There are another 493k items with only one identifier and no other
>>> > statement.
>>> > https://qlever.cs.uni-freiburg.de/wikidata/Z8OkZi?exec=true
>>> > Often that single identifier is just the Google Knowledge Graph ID
>>> > (P2671).
>>>
>>> The first half-dozen or so I checked all also have a Wikipedia link in
>>> one or more languages.
>>>
>>> Maybe it would be worth making a query for each of the top, say,
>>> twenty languages and posting it on the relevant Village Pump?
>>>
>>> Or having an article or talk page template added by a bot to each
>>> affected article?
>>>
>>> --
>>> Andy Mabbett
>>> https://pigsonthewing.org.uk
>>> _______________________________________________
>>> Wikidata mailing list -- [email protected]
>>> Public archives at
>>> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/4QUE62RXUBELM5HGQRNSZDKWR2QKSH4D/
>>> To unsubscribe send an email to [email protected]

--
WikiWorks · MediaWiki Consulting · http://wikiworks.com
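For anyone who wants the raw SPARQL rather than the encoded links, the query text can be recovered by URL-decoding the fragment (the part after `#`) of each query.wikidata.org link. A minimal Python sketch, using the first link from Yaron's message above:

```python
# Recover the SPARQL text embedded in a query.wikidata.org share link
# by URL-decoding the URL fragment (everything after '#').
from urllib.parse import unquote

url = ("https://query.wikidata.org/#SELECT%20%3Fitem%20%3Fwiki%0A"
       "WHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A"
       "%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20."
       "%20%23%20exclude%20lexemes%0A%20%20%0A%7D")

# Split off the fragment and decode the percent-escapes.
query = unquote(url.split("#", 1)[1])
print(query)
```

Decoded, this is the "no statements, not a lexeme" query: `SELECT ?item ?wiki WHERE { ?item wikibase:statements 0 . MINUS { ?item dct:language [] } }`. The same decoding works for the other query links in the thread.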
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/CRXPNWV2XOWKWT7JIFNV44JMPUMLPHBF/
To unsubscribe send an email to [email protected]
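Amir's idea of dividing the backlog by language, combined with Andy's suggestion of one query per top-twenty language, amounts to templating a sitelink filter into the "no statements" query. A hedged Python sketch - the language list is illustrative, and the generated queries are untested assumptions built from the query patterns discussed in the thread:

```python
# Hypothetical sketch: generate one per-language query so each batch can
# be posted to that wiki's Village Pump, as Andy suggests. The query
# shape (wikibase:statements 0 plus the schema:about/schema:isPartOf
# sitelink pattern) follows the queries discussed above; the language
# codes here are an illustrative sample, not the actual top twenty.
LANGS = ["en", "de", "fr", "es", "ja"]

TEMPLATE = """SELECT ?item WHERE {{
  ?item wikibase:statements 0 .
  ?sitelink schema:about ?item ;
            schema:isPartOf <https://{lang}.wikipedia.org/> .
}}"""

# Build one query string per language.
queries = {lang: TEMPLATE.format(lang=lang) for lang in LANGS}
print(queries["en"])
```

Each resulting query restricts the result set to items with a sitelink on one wiki, which should also help with the timeouts Amir and Nicolas ran into, since every batch is much smaller than the full pile.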
