Re: At which point should I consider using text-query indexes?

Laura Morales Tue, 23 May 2017 00:39:44 -0700

Oh, this is interesting. I thought that predicate values (rdfs:label in this 
case) were already sorted and that using STRSTARTS() would be fast because it 
could take advantage of binary search or something. I didn't expect that this 
function would have to scan all the predicate values.
So in which scenarios are SPARQL string functions acceptable to use (in terms of 
"reasonable performance")?
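For illustration, here is a minimal Python sketch of the binary-search idea described above. If labels were kept in a sorted list, all labels sharing a prefix would sit in one contiguous run, so a prefix lookup could jump straight to the start of that run with `bisect` instead of checking every label. This is a toy model under that assumption. It is not how TDB stores or evaluates rdfs:label values, and `prefix_search_sorted` is a hypothetical helper name.

```python
import bisect


def prefix_search_sorted(sorted_labels: list[str], prefix: str) -> list[str]:
    """Return every label starting with `prefix`.

    Assumes `sorted_labels` is already sorted (hypothetical setup, for
    illustration only). In sorted order, all strings sharing a prefix are
    contiguous and begin at the position where `prefix` itself would be inserted.
    """
    matches = []
    # Binary search: first index whose label is >= prefix.
    i = bisect.bisect_left(sorted_labels, prefix)
    # Walk forward through the contiguous run; stop at the first non-match.
    while i < len(sorted_labels) and sorted_labels[i].startswith(prefix):
        matches.append(sorted_labels[i])
        i += 1
    return matches


if __name__ == "__main__":
    labels = sorted(["Banana", "apple pie", "Apricot", "Apple"])
    print(prefix_search_sorted(labels, "Ap"))  # → ['Apple', 'Apricot']
    print(prefix_search_sorted(labels, "ap"))  # → ['apple pie']
```

The lookup is case-sensitive because the list is sorted on the original strings. A case-insensitive match in the style of LCASE(?label) could not reuse that order without a second, lowercased copy of the data.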




Laura Morales wrote on 23.05.2017 at 10:23:

> Thank you for the answer. So let's say I want to search nodes in my graph by 
> rdfs:label. Is this correct...
>
> 1) STRSTARTS(): fast by default because predicates are sorted. Only does exact, 
> case-sensitive matching.
> 2) STRSTARTS(LCASE(?label)): fast because predicates are sorted, but just a 
> little bit slower than 1) because it must apply LCASE() to the strings
> 3) REGEX(): slow because it must go through all rdfs:labels (use jena-text 
> instead)
> 4) CONTAINS(): slow because it must go through all rdfs:labels (use jena-text 
> instead)
>
> Is this correct?

I believe all of these are roughly equivalent in terms of performance.
All of them need to scan all the rdfs:label values. Obviously REGEX is a
bit more expensive than e.g. STRSTARTS but the difference is not very
big. I don't think there's any sorting of predicate values in TDB that
would help here.
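Here is a rough Python model of why the four variants perform about the same. A FILTER over ?label is evaluated once for each candidate solution, so every predicate gets applied to every label, whether the individual test is cheap or expensive. The lambdas below are only approximate Python stand-ins for the SPARQL functions: Python's `re` and `str.lower()` are not identical to SPARQL's XPath-based REGEX and LCASE. The names `filter_scan` and `FILTERS` are hypothetical.

```python
import re
from typing import Callable, Iterable


def filter_scan(labels: Iterable[str], predicate: Callable[[str], bool]) -> tuple[list[str], int]:
    """Apply `predicate` to every label, as a FILTER does for each ?label binding.

    Returns the matching labels and the number of labels examined.
    """
    matches, examined = [], 0
    for label in labels:
        examined += 1  # every label is visited; nothing lets us skip ahead
        if predicate(label):
            matches.append(label)
    return matches, examined


# Approximate Python stand-ins for the four filters in the question (not Jena code).
FILTERS: dict[str, Callable[[str], bool]] = {
    "STRSTARTS(?label, 'Ap')": lambda s: s.startswith("Ap"),
    "STRSTARTS(LCASE(?label), 'ap')": lambda s: s.lower().startswith("ap"),
    "REGEX(?label, '^ap', 'i')": lambda s: re.search(r"^ap", s, re.IGNORECASE) is not None,
    "CONTAINS(?label, 'pp')": lambda s: "pp" in s,
}

if __name__ == "__main__":
    labels = ["Apple", "Apricot", "Banana", "apple pie"]
    for name, pred in FILTERS.items():
        matches, examined = filter_scan(labels, pred)
        print(f"{name}: {matches} (examined {examined})")
```

Every variant examines all four labels. The only difference between them is the cost of each individual test.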

> If my app has an input search box where users can search an item by title (on 
> a large graph), would it be a good idea to go with 2) or should I consider 
> setting up a text-query index?

I recommend setting up a text index if you want to do partial matching
of labels from a large graph.
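To show why a text index helps, here is a toy inverted index in Python. Each word of each label maps to the set of subjects whose label contains that word, so a query looks up its words directly instead of scanning every label. This is a hypothetical sketch: `ToyTextIndex` and `tokenize` are made-up names. Real full-text engines such as Lucene, one of the backends jena-text supports, add proper analyzers, relevance scoring and prefix/wildcard queries on top of this basic idea.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Crude analyzer: lowercase, then split on runs of non-word characters.
    return [t for t in re.split(r"\W+", text.lower()) if t]


class ToyTextIndex:
    """Maps each token to the set of subject IDs whose label contains it."""

    def __init__(self) -> None:
        self.postings: defaultdict[str, set[str]] = defaultdict(set)

    def add(self, subject: str, label: str) -> None:
        for token in tokenize(label):
            self.postings[token].add(subject)

    def search(self, query: str) -> set[str]:
        # AND semantics: subjects whose label contains every query token.
        # Cost depends on the posting sets touched, not on the number of labels.
        tokens = tokenize(query)
        if not tokens:
            return set()
        result = set(self.postings.get(tokens[0], set()))
        for token in tokens[1:]:
            result &= self.postings.get(token, set())
        return result


if __name__ == "__main__":
    idx = ToyTextIndex()
    idx.add(":b1", "The Old Man and the Sea")
    idx.add(":b2", "Sea of Tranquility")
    idx.add(":b3", "Old Yeller")
    print(sorted(idx.search("sea")))      # → [':b1', ':b2']
    print(sorted(idx.search("Old sea")))  # → [':b1']
```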

-Osma

--
Osma Suominen
D.Sc. (Tech), Information Systems Specialist
National Library of Finland
P.O. Box 26 (Kaikukatu 4)
00014 HELSINGIN YLIOPISTO
Tel. +358 50 3199529
http://www.nationallibrary.fi