[Virtuoso-users] Performance weirdness REGEX vs. bif:contains

Sebastian Trüg Wed, 24 Mar 2010 17:16:41 +0000

In my rather simple query parser, which allows users to write things like
"hastag:foobar", I am currently using the following query to match
"hastag" to an actual property (the db contains a bunch of ontologies,
each loaded in its own graph, with a total of 1073 properties in a
database of about 1099726 triples):


select distinct ?p where {
   { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
     ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
     ?l bif:contains "'hastag*'" .
   }
   UNION
   { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
     FILTER(REGEX(STR(?p),'hastag*','i')) .
   }
}

It gets me the 2 relevant results:
p -> <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag>
p -> <http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#hasTag>

But the query time is rather long (this is not pure Virtuoso; there are
some round trips from Nepomuk in there): 00:00:02.951

Then I tried to only use regex filters:

select distinct ?p where {
   { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
     ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
     FILTER(REGEX(STR(?l),'hastag*','i')) .
   }
   UNION
   { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
     FILTER(REGEX(STR(?p),'hastag*','i')) .
   }
}

And suddenly the query time is close to zero:  00:00:00.201

While I am of course happy with a lower query time, I am a bit confused,
since I thought the full-text query should be much faster than the regex
filter. My only idea is that filtering over 1023 properties is faster
than any full-text query could be. Is that the case?
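
As a side note on what the regex filter actually tests: in regular
expression syntax, 'hastag*' means "hasta" followed by zero or more "g",
not "anything starting with hastag" as in the bif:contains wildcard. A
minimal sketch in Python, assuming Python's re module behaves like
SPARQL's REGEX for this simple pattern (the two example.org URIs and the
helper name regex_filter are made up for illustration):

```python
import re

# The two hasTag URIs come from the results above; the example.org
# URIs are hypothetical and only there to show the difference.
PROPERTY_URIS = [
    "http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag",
    "http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#hasTag",
    "http://example.org/onto#hasTopic",
    "http://example.org/onto#hastaValue",
]


def regex_filter(uris, pattern):
    """Rough stand-in for FILTER(REGEX(STR(?p), pattern, 'i')):
    a case-insensitive, unanchored search over each URI string."""
    compiled = re.compile(pattern, re.IGNORECASE)
    return [uri for uri in uris if compiled.search(uri)]


# 'hastag*' also accepts "hasta" with no "g" at all.
matches = regex_filter(PROPERTY_URIS, "hastag*")

# Plain 'hastag' already matches anywhere in the string.
prefix_matches = regex_filter(PROPERTY_URIS, "hastag")

print(len(matches), len(prefix_matches))
```

With this sample the first pattern also picks up the "hastaValue" URI,
while the second keeps only the two hasTag properties. It also shows how
little work the filter is: one linear pass over a list of strings, which
for roughly a thousand properties should be cheap.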

Cheers,
Sebastian
