Re: [Wikidata] Differences in label searching with SPARQL and MediaWiki API

Thad Guidry Fri, 17 Jul 2020 10:38:53 -0700

Thank you so much David!

This was such a great example that I had to add this to our SPARQL Examples
page in a new section "Mediawiki API":
https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples#Mediawiki_API


The community thanks you sincerely!

Thad
https://www.linkedin.com/in/thadguidry/


On Mon, Jul 13, 2020 at 2:26 AM David Causse <[email protected]> wrote:

> On Sat, Jul 11, 2020 at 7:12 PM Thad Guidry <[email protected]> wrote:
>
>> This query times out:
>>
>> SELECT ?item ?label
>> WHERE
>> {
>>   ?item wdt:P31 ?instance ;
>>     rdfs:label ?label ;
>>     rdfs:label ?enLabel .
>>   FILTER(CONTAINS(lcase(?label), "Soriano")).
>>   FILTER(?instance != wd:Q5).
>>   SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
>> }
>> LIMIT 100
>>
>> I have this feeling that it's not actually using an index or even asking
>> the right question and so is slow and times out?
>>
>>
> Indeed, none of the criteria in your query allow the triple store to
> pick an index to follow in order to extract the results in a timely manner.
> The sole non-negative criterion would be FILTER(CONTAINS(lcase(?label),
> "Soriano")), but because it sits in a FILTER and is moreover a function
> call, it cannot be used to select an index to work on.
> The only way to speed up your query would be to introduce a discriminant
> "matching" criterion.
>
>> However the MediaWiki wbsearchentities API does seem to use an index and
>> is performant for label searching:
>>
>> https://www.wikidata.org/w/api.php?action=wbsearchentities&search=soriano&language=en
>>
>>
> wbsearchentities is backed by Elasticsearch, which is optimized for such
> lookups.
>
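For anyone who wants to call the search API from a script, here is a minimal sketch of building the same wbsearchentities request shown above. The function name build_search_url and the default limit of 10 are assumptions made for illustration; the action, search, language, limit and format parameters are standard MediaWiki API parameters. The sketch only builds the URL and does not send the request.

```python
from urllib.parse import urlencode

# Base URL of the Wikidata MediaWiki API, taken from the link in the thread.
API_URL = "https://www.wikidata.org/w/api.php"


def build_search_url(term: str, language: str = "en", limit: int = 10) -> str:
    """Return a wbsearchentities URL for a label search (hypothetical helper)."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "limit": limit,      # assumed default; adjust as needed
        "format": "json",    # ask for a JSON response instead of the HTML help view
    }
    # urlencode percent-encodes the values, so non-ASCII labels are safe.
    return f"{API_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("soriano"))
```

The resulting URL can be fetched with any HTTP client; the search term is matched by the search backend rather than by scanning labels in the triple store.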
>> How can I get my SPARQL query to be more performant or ask the right
>> question?
>>
>>
> Unfortunately I don't see an obvious way to adapt your SPARQL query and keep
> exactly the same semantics, but to illustrate the problem:
>
> SELECT ?item ?label WHERE {
>   ?item wdt:P31 ?instance ;
>         rdfs:label "Soriano"@en .
>   FILTER(?instance != wd:Q5).
> }
> LIMIT 100
>
> will return results in a timely manner, only because we helped the graph
> traversal with an initial path on ?item rdfs:label "Soriano"@en.
>
> But by combining the query service and the Wikidata API[0] backed by
> Elasticsearch I think you can extract what you want:
>
> SELECT ?item ?itemLabel WHERE {
>   ?item wdt:P31 ?instance .
>   FILTER(?instance != wd:Q5).
>   SERVICE wikibase:mwapi {
>       bd:serviceParam wikibase:endpoint "www.wikidata.org";
>         wikibase:api "EntitySearch";
>         mwapi:search "soriano";
>         mwapi:language "en".
>       ?item wikibase:apiOutputItem mwapi:item.
>   }
>   SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
> }
> LIMIT 100
>
> This query will first contact EntitySearch (an alias for wbsearchentities),
> which will pass the items it found to the triple store, which in turn can
> now query the graph in a timely manner. Obviously this solution only works
> if the number of items returned by wbsearchentities remains reasonable.
>
> 0: https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual/MWAPI
> --
> David C.
> _______________________________________________
> Wikidata mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikidata
>
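For reuse with other search terms, the combined query from David's message could be wrapped in a small script. What follows is only a sketch under stated assumptions: the helper names (escape_literal, build_query, build_request_url) are hypothetical, the query text mirrors the one quoted above, and https://query.wikidata.org/sparql is assumed as the query service endpoint. Nothing is sent over the network; the script only builds the query and the request URL.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the Wikidata Query Service.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"

# Same structure as the MWAPI query in the thread; doubled braces are
# literal braces for str.format.
QUERY_TEMPLATE = """\
SELECT ?item ?itemLabel WHERE {{
  ?item wdt:P31 ?instance .
  FILTER(?instance != wd:Q5).
  SERVICE wikibase:mwapi {{
      bd:serviceParam wikibase:endpoint "www.wikidata.org";
        wikibase:api "EntitySearch";
        mwapi:search "{term}";
        mwapi:language "{language}".
      ?item wikibase:apiOutputItem mwapi:item.
  }}
  SERVICE wikibase:label {{bd:serviceParam wikibase:language "{language}".}}
}}
LIMIT {limit}
"""


def escape_literal(text: str) -> str:
    """Escape a value for use inside a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_query(term: str, language: str = "en", limit: int = 100) -> str:
    """Fill the template with an escaped search term and language code."""
    return QUERY_TEMPLATE.format(
        term=escape_literal(term),
        language=escape_literal(language),
        limit=int(limit),
    )


def build_request_url(query: str) -> str:
    """Return a GET URL for the query service asking for JSON results."""
    return f"{SPARQL_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


if __name__ == "__main__":
    print(build_query("soriano"))
```

As noted in the thread, this approach only stays fast while the search API returns a reasonably small set of items for the triple store to join against.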
