Re: [Virtuoso-users] Performance weirdness REGEX vs. bif:contains

Sebastian Trüg Thu, 25 Mar 2010 09:11:00 +0000

Hi Ivan,

Thanks a lot for looking into this. :)


On Thursday 25 March 2010 00:29:40 Ivan Mikhailov wrote:
> That's a bit weird, but with unspecified graph I should ask first what
> indexes are in use?

Currently I am only using the default 6.1 index (what is it called, 3.1 or 
something?).

> And even before this experiment I'd measure the speed of
> 
> select distinct ?p where {
>    {
>      graph ?g {
>        ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>        ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>        ?l bif:contains "'hastag*'" .
>      }
>    }
>    UNION
>    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>      FILTER(REGEX(STR(?p),'hastag*','i')) .
>    }
> }
> 
> in hope that the declaration of a property type and property label are
> always in same graph.

They are always in the same graph and specifying it gives a slight performance 
improvement but the query is still much slower than with a REGEX.

> 1073 properties is not so much, so you have less than 1000 graphs of
> ontologies and maybe much more graphs other than ontologies. If it's a

That is the case. I have A LOT of graphs.

> case then I'd recommend to experiment with an additional graph that will
> contain list of loaded ontologies. So the query may be something like
> 
> select distinct ?p where {
>    graph <ontology-list> { ?g a <loaded-ontology> }
>    {
>      graph ?g {
>        ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>        ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>        ?l bif:contains "'hastag*'" .
>      }
>    }
>    UNION
>    {
>      graph ?g {
>        ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>        FILTER(REGEX(STR(?p),'hastag*','i')) .
>      }
>    }
> }

Impressive. This sped up the query A LOT. It is now nearly as fast as the 
REGEX alternative. :) Nice!
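
For reference, the graph-restricted query can be generated from the user's
search term roughly as follows. This is only a sketch in Python, not the
actual Nepomuk code: the function name build_property_query and the
alphanumeric-only check are assumptions made for illustration, and
<ontology-list> and <loaded-ontology> are the placeholder names from Ivan's
suggestion. Each UNION branch gets its own braces, and the REGEX branch uses
the bare term because REGEX already matches anywhere in the string.

```python
import re

RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumption: only plain alphanumeric terms are accepted, so the term can be
# embedded in the query text without any escaping.
_SAFE_TERM = re.compile(r"[A-Za-z0-9]+")


def build_property_query(term: str) -> str:
    """Build the graph-restricted property lookup for one search term."""
    if not _SAFE_TERM.fullmatch(term):
        raise ValueError(f"unsupported search term: {term!r}")
    return f"""select distinct ?p where {{
  graph <ontology-list> {{ ?g a <loaded-ontology> }}
  {{
    graph ?g {{
      ?p a <{RDF_PROPERTY}> .
      ?p <{RDFS_LABEL}> ?l .
      ?l bif:contains "'{term}*'" .
    }}
  }}
  UNION
  {{
    graph ?g {{
      ?p a <{RDF_PROPERTY}> .
      FILTER(REGEX(STR(?p), '{term}', 'i')) .
    }}
  }}
}}"""


if __name__ == "__main__":
    print(build_property_query("hastag"))
```

Rejecting anything but letters and digits is the simplest way to keep the
term from breaking out of the quoted full-text pattern; a real query parser
would probably want proper escaping instead.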

Adding the graph restrictions to the REGEX query does not change its 
performance though. But that may be because it is so fast that the whole time 
is spent in code that needs to be executed in any case and does not change.

So what did we (well, I) learn from this: always try to minimize the number of 
graphs Virtuoso needs to look at. Well, sadly querying ontologies is the only 
area in Nepomuk where that is possible. Everywhere else we use graphs to 
attach metadata to triples....
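
Restricting the query this way only works if the list graph is filled in
whenever an ontology is loaded. A minimal sketch of that step, again in
Python: register_ontology_update is a hypothetical helper, the update uses
SPARQL 1.1 INSERT DATA syntax (whether a given Virtuoso version accepts
exactly this form is an assumption), and the graph IRI in the example is made
up for illustration.

```python
# Characters that are not allowed inside an IRI reference written as <...>.
_FORBIDDEN = set('<>"{}|^`\\')


def register_ontology_update(graph_iri: str) -> str:
    """Return an update that lists graph_iri in the <ontology-list> graph."""
    if not graph_iri or any(c in _FORBIDDEN or c.isspace() for c in graph_iri):
        raise ValueError(f"not a usable graph IRI: {graph_iri!r}")
    return (
        "INSERT DATA { GRAPH <ontology-list> { "
        f"<{graph_iri}> a <loaded-ontology> . "
        "} }"
    )


if __name__ == "__main__":
    # Hypothetical graph name; the real ontology graphs are named differently.
    print(register_ontology_update("http://example.org/graphs/nao"))
```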

Thanks again,
Cheers,
Sebastian

> 
> Best Regards,
> 
> Ivan Mikhailov
> OpenLink Software
> http://virtuoso.openlinksw.com
> 
> On Wed, 2010-03-24 at 18:16 +0100, Sebastian Trüg wrote:
> > In my rather simple query parser which allows users to write stuff like
> > "hastag:foobar" I am currently using the following query to match
> > "hastag" to an actual property (the db contains a bunch of ontologies,
> > each loaded in their own graph with a total of 1073 properties in a
> > database of about 1099726 triples.)
> > 
> > select distinct ?p where {
> > 
> >    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
> >    
> >      ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
> >      ?l bif:contains "'hastag*'" .
> >    
> >    }
> >    UNION
> >    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
> >    
> >      FILTER(REGEX(STR(?p),'hastag*','i')) .
> >    
> >    }
> > 
> > }
> > 
> > It gets me the 2 relevant results:
> > p -> <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag>
> > p -> <http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#hasTag>
> > 
> > But the query time is rather long (this is not pure Virtuoso, there are
> > some roundtrips from Nepomuk in there): 00:00:02.951
> > 
> > Then I tried to only use regex filters:
> > 
> > select distinct ?p where {
> > 
> >    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
> >    
> >      ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
> >      FILTER(REGEX(STR(?l),'hastag*','i')) .
> >    
> >    }
> >    UNION
> >    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
> >    
> >      FILTER(REGEX(STR(?p),'hastag*','i')) .
> >    
> >    }
> > 
> > }
> > 
> > And suddenly the query time is close to zero:  00:00:00.201
> > 
> > While I am of course happy with a lower query time I am a bit confused
> > since I thought the full text query should be much faster than the regex
> > filter. My only idea is that filtering over 1073 properties is faster
> > than any fulltext query could be. Is that the case?
> > 
> > Cheers,
> > Sebastian
> > 
> > 
