Wikidata has a huge number of labels in a large number of languages. Would indexing strategies based on the language of the string literal be a good thing? Encoding the language in the literal is an RDF design choice, and it might indeed not be the best choice for performance. But shouldn't a query planner/rewriter be able to detect a pattern like « FILTER(LANG(?label) = "en") » and take advantage of such an index?
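For instance, the pattern such a rewriter would need to spot looks like the sketch below (a hypothetical query shape, not any tool's actual query; ?prop and ?label are illustrative names):

  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

  # The language filter a planner/rewriter could detect: instead of
  # materializing every label literal and testing its language tag,
  # it could probe a per-language label index directly (assuming such
  # an index existed).
  SELECT ?prop ?label WHERE {
    ?prop a wikibase:Property ;
          rdfs:label ?label .
    FILTER(LANG(?label) = "en")
  }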
Retrieving labels is important in general, and doing this efficiently might be something that makes a difference…

On Fri, Nov 5, 2021 at 11:55, David Causse <[email protected]> wrote:

> Hi Thad,
>
> I looked at this query and I have nothing to add to what was already
> suggested to make it run faster.
> I think the main issue is the size of the intermediate results that have
> to have the language filter applied; sadly, almost every time a FILTER is
> used on a string literal, Blazegraph might have to fetch its
> representation from its lexicon, which incurs a huge slowdown.
> Regarding indices and ordering, I believe the right indices are being
> used (otherwise the query would certainly time out), but I doubt it can
> filter all English labels before joining them to the property labels.
>
> The criterion ?prop wdt:P31/wdt:P279* wd:Q18616576 does indeed seem
> useless to me and is pulling a couple of false positives [1] into the
> join (totally harmless regarding query perf, but perhaps it should be
> cleaned up in Wikidata?).
>
> So filtering & fetching the textual data is indeed what makes this query
> slow. I tried various combinations but could not come up with reasonable
> & stable sub-second response times. Fetching the textual data (possibly
> lazily) from another service might help, but that would certainly be a
> substantial rewrite of the client relying on this query.
>
> Caching is definitely going to help, especially if this data is not
> subject to rapid/frequent changes. The WDQS infrastructure has a caching
> layer, but retention might not be long enough to be useful for this
> particular tool. The JSON output does indeed seem quite big (almost
> 5 MB); while not enormous, it's still sizeable, and if this data is
> relatively stable there might be value in refreshing it on purpose
> (daily, as you suggest) and making it available on static storage.
>
> Another note about response times: you may see varying response times
> from the query service, and the reasons might be one of the following:
> - it's cached in the query service caching layer (generally sub-100ms
>   response time)
> - the server the query hits is heavily loaded
> - the server the query hits is an old hardware generation (we have 2
>   different kinds of hardware setups in the cluster at the moment, which
>   might explain some of the variance you see).
>
> Hope it helps a bit,
>
> Regards,
>
> David.
>
>
> 1: https://w.wiki/4Lae
>
> On Wed, Nov 3, 2021 at 11:39 PM Thad Guidry <[email protected]> wrote:
>
>> Thanks Kingsley, Thomas, Jeff,
>>
>> From what I see, the live query is never sub-second, and that's likely
>> because of 2 things:
>> 1. indexing not prioritizing this kind of query and aligning to it
>> (which David Causse might know if that could be changed); essentially
>> it's metadata about Wikidata (its available properties).
>> 2. it's 2.2 MB of data
>>
>> I think that Yi Liu's Wikidata Property Explorer service might then
>> want to cache the results for 24 hours instead, for the best of both
>> worlds.
>>
>> To be fair, the raw amount of data requested seems to be approximately
>> 2.2 MB, so it should probably be cached locally by his tool for some
>> determined time (like 24 hours).
>>
>> Thad
>> https://www.linkedin.com/in/thadguidry/
>> https://calendly.com/thadguidry/
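(Coming back to David's point above about fetching the textual data from another service: WDQS already exposes the wikibase label service, which resolves labels after the join rather than filtering every literal in the main pattern. A minimal sketch, assuming the tool only needs English property labels:

  PREFIX wikibase: <http://wikiba.se/ontology#>
  PREFIX bd: <http://www.bigdata.com/rdf#>

  # Let the label service bind ?propLabel in English, instead of
  # applying FILTER(LANG(?label) = "en") over every label literal.
  SELECT ?prop ?propLabel WHERE {
    ?prop a wikibase:Property .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
  }

Whether this path actually materializes fewer literals than the FILTER does is an open question, though.)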
