I think the point of filtering the items with existing sitelinks is not to
exclude them, but to use those sitelinks to gather more information about
each item and add statements accordingly; we can start with those as a way
to reduce the backlog.

Regards,
FlyingAce

On Fri, Feb 28, 2025, 8:11 PM, Romaine Wiki <[email protected]>
wrote:

> With your first query (items with no statements) I get 827,715 results.
> Combined with the items that have only one identifier statement (493k
> items), we are over a million items with too limited statements. And then
> there are still many more without both P31 and P279. So I still see an
> elephant: still way too many items with this problem. (But I am not so much
> interested in whether we have an elephant in the room or not; the point is
> that many items are empty or almost empty, and the question on the table is
> how we can reduce this issue.)
>
> You exclude items with Wikipedia/Wikimedia sitelinks, but I see no reason
> for that: they still lack the basic statements needed to run a simple
> query, and for most humans it is still impossible to tell what the item is
> about.
>
>
> ----
> Not so much related to the current discussion:
> For the items with P31 = human settlement without a country, I only
> included P31 = human settlement itself, not subclasses of human settlement.
> I did this because even the simple query (which I shared in my other e-mail
> a week ago) already gave me too many server timeouts. I had started on that
> project earlier (before that e-mail), when we had 10,000+ items without a
> country, and this has been brought back to only 93 (of which 84 relate to
> Armenia).
> For the moment I have left it there: after getting it down from 10,000+ to
> 93 (together with the help of others), I got a bit tired of the subject.
> The remaining ones you list are those 93 plus the items that are human
> settlements only via a subclass in P31. Sure, those need attention too, but
> that was not the point of what I wrote.
>
> Romaine
>
>
>
> On Fri, Feb 28, 2025 at 8:28 PM, Yaron Koren <[email protected]> wrote:
>
>> Perhaps I count as a SPARQL expert now, but I do see one easy way to list
>> all the Wikidata items with no statements (that are not lexemes):
>>
>>
>> https://query.wikidata.org/#SELECT%20%3Fitem%20%3Fwiki%0AWHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20.%20%23%20exclude%20lexemes%0A%20%20%0A%7D
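>>
>> Decoded for readability, the query behind that link is:
>>
>> SELECT ?item ?wiki
>> WHERE {
>>   ?item wikibase:statements 0 .
>>   MINUS { ?item dct:language [] } . # exclude lexemes
>> }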
>>
>> Also, here is a query to get the true "duds": no statements, not lexemes,
>> and no Wikipedia/Wikimedia articles. It looks like there are about 8,000
>> of these, so thankfully not really an "elephant":
>>
>>
>> https://query.wikidata.org/#SELECT%20DISTINCT%20%3Fitem%20WHERE%20%7B%0A%20%20%3Fitem%20wikibase%3Astatements%200%20.%0A%20%20MINUS%20%7B%20%3Fitem%20dct%3Alanguage%20%5B%5D%20%7D%20.%20%23%20exclude%20lexemes%0A%20%20MINUS%20%7B%20%5B%5D%20schema%3Aabout%20%3Fitem%3B%20schema%3AisPartOf%20%5B%5D%20%7D%20%23%20exclude%20items%20with%20no%20Wikipedia%2C%20etc.%20article%0A%7D
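>>
>> Decoded, that query is as follows (the comment on the second MINUS in the
>> link reads "exclude items with no Wikipedia, etc. article", but what the
>> clause actually removes is items that do have such an article, leaving the
>> true duds; the comment is corrected here):
>>
>> SELECT DISTINCT ?item WHERE {
>>   ?item wikibase:statements 0 .
>>   MINUS { ?item dct:language [] } . # exclude lexemes
>>   MINUS { [] schema:about ?item; schema:isPartOf [] } # exclude items that do have a Wikipedia, etc. article
>> }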
>>
>> Finally, on a minor note: it looks like there are still about 2,000 human
>> settlements in Wikidata without a country:
>>
>> https://wikidatawalkabout.org/?c=Q486972&lang=en&f.P17=novalue
>>
>> This is not meant to sound like a criticism - Romaine, you have obviously
>> made an enormous improvement there! And perhaps the remaining ones are
>> difficult to categorize.
>>
>> -Yaron
>>
>> On Fri, Feb 28, 2025 at 9:57 AM Nicolas VIGNERON <
>> [email protected]> wrote:
>>
>>> Hi y'all,
>>>
>>> Good ideas.
>>>
>>> Queries over such a big number of items often time out.
>>> Here is a working QLever query for items with more than 10 sitelinks:
>>> https://qlever.cs.uni-freiburg.de/wikidata/VdiLsm. There is only one
>>> result; you can decrease the value for more results.
>>> A reminder: QLever results are not updated in real time; they are based
>>> on dumps, which lag behind (right now, the results are from 29.01.2025).
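>>>
>>> Since the link itself is opaque, here is a minimal sketch of what such a
>>> query might look like, assuming it combines "no statements" with the
>>> sitelink count (the PREFIX is needed outside the Wikidata Query Service;
>>> the threshold 10 is the value to decrease):
>>>
>>> PREFIX wikibase: <http://wikiba.se/ontology#>
>>> SELECT ?item ?sitelinks WHERE {
>>>   ?item wikibase:statements 0 ;      # no statements at all
>>>         wikibase:sitelinks ?sitelinks .
>>>   FILTER(?sitelinks > 10)            # but more than 10 sitelinks
>>> }
>>> ORDER BY DESC(?sitelinks)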
>>>
>>> Cheers,
>>> Nicolas
>>>
>>> On Fri, Feb 28, 2025 at 6:05 PM, Amir E. Aharoni <
>>> [email protected]> wrote:
>>>
>>>> I tried some queries, and they all timed out :(
>>>>
>>>> I'm not very good at SPARQL.
>>>>
>>>> But I agree with Andy: dividing hundreds of thousands of items into
>>>> small groups that can be processed by people who are likely to know
>>>> something relevant about those items is probably a better way to handle
>>>> this than just looking at one huge pile of items.
>>>>
>>>> Some ways to divide them that I can think of immediately:
>>>> 1. Having a sitelink to a particular language's wiki.
>>>> 2. Having a label or a description in a particular language.
>>>> 3. Having certain characteristics in the label, like its length or the
>>>> presence of certain characters (even a mostly arbitrary characteristic,
>>>> like "label starts with the letters 'Mi'" or "has digits in the label",
>>>> is better than nothing).
>>>>
>>>> If someone can make a bunch of queries that do something like this and
>>>> actually work (and don't time out), that could be a nice beginning; for
>>>> example:
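>>>>
>>>> Here is a sketch of option 1 (items with no statements that have a
>>>> sitelink to a particular wiki; the English Wikipedia is just an example,
>>>> and whether this avoids the timeouts is untested):
>>>>
>>>> SELECT ?item ?article WHERE {
>>>>   ?item wikibase:statements 0 .
>>>>   MINUS { ?item dct:language [] }          # exclude lexemes
>>>>   ?article schema:about ?item ;
>>>>            schema:isPartOf <https://en.wikipedia.org/> .
>>>> }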
>>>>
>>>> On Fri, Feb 28, 2025 at 11:42, Andy Mabbett <
>>>> [email protected]> wrote:
>>>>
>>>>> On Fri, 28 Feb 2025 at 16:06, Romaine Wiki <[email protected]>
>>>>> wrote:
>>>>>
>>>>> > There are another 493k items with only one identifier and no other
>>>>> statement.
>>>>> > https://qlever.cs.uni-freiburg.de/wikidata/Z8OkZi?exec=true
>>>>> > Often that single identifier is just the Google Knowledge Graph ID
>>>>> (P2671).
>>>>>
>>>>> The first half-dozen or so I checked all also have a Wikipedia link in
>>>>> one or more languages.
>>>>>
>>>>> Maybe it would be worth making a query for each of the top twenty or
>>>>> so languages and posting it on the relevant Village Pump?
>>>>>
>>>>> Or having a bot add an article or talk page template to each affected
>>>>> article?
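>>>>>
>>>>> A sketch of such a per-language query, as a hypothetical example using
>>>>> the French Wikipedia and the Google Knowledge Graph ID mentioned above
>>>>> (whether it runs without timing out is untested):
>>>>>
>>>>> SELECT ?item ?article WHERE {
>>>>>   ?item wikibase:statements 1 ;  # exactly one statement...
>>>>>         wdt:P2671 [] .           # ...and it is the Google Knowledge Graph ID
>>>>>   ?article schema:about ?item ;
>>>>>            schema:isPartOf <https://fr.wikipedia.org/> .
>>>>> }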
>>>>>
>>>>> --
>>>>> Andy Mabbett
>>>>> https://pigsonthewing.org.uk
>>
>>
>> --
>> WikiWorks · MediaWiki Consulting · http://wikiworks.com
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/HR6ROF2XD34OXNFKNFLPQHADHOGFE232/
To unsubscribe send an email to [email protected]
