Re: [Virtuoso-users] Performance weirdness REGEX vs. bif:contains

Ivan Mikhailov Wed, 24 Mar 2010 23:29:50 +0000

Hello Sebastian,

That's a bit weird, but since the graph is unspecified I should first
ask: what indexes are in use?


1073 properties is not that many, so you have fewer than 1000 ontology
graphs and perhaps many more graphs that are not ontologies. If that is
the case, I'd recommend experimenting with an additional graph that
contains a list of the loaded ontologies. The query might then look
something like this:

select distinct ?p where {
   graph <ontology-list> { ?g a <loaded-ontology> }
   {
     graph ?g {
       ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
       ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
       ?l bif:contains "'hastag*'" .
     }
   }
   UNION
   {
     graph ?g {
       ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
       FILTER(REGEX(STR(?p),'hastag*','i')) .
     }
   }
}
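
To make that experiment concrete, the list graph could be filled with
one triple per ontology graph. This is only a sketch: the two
http://example.org/... graph IRIs are placeholders for wherever the
ontologies were actually loaded, <ontology-list> and <loaded-ontology>
are the same shorthand names as in the query above, and the statement
uses SPARQL 1.1 Update syntax, which may need adjusting for the Virtuoso
version in use.

```sparql
# Placeholder graph IRIs; replace them with the graphs that actually
# hold the loaded ontologies.
INSERT DATA {
  GRAPH <ontology-list> {
    <http://example.org/graph/nao>  a <loaded-ontology> .
    <http://example.org/graph/pimo> a <loaded-ontology> .
  }
}
```

With the list in place, ?g is bound to a small known set of graphs
before the property patterns are evaluated.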

Even before trying that experiment, I'd measure the speed of

select distinct ?p where {
   {
     graph ?g {
       ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
       ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
       ?l bif:contains "'hastag*'" .
     }
   }
   UNION
   { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
     FILTER(REGEX(STR(?p),'hastag*','i')) .
   }
}

in the hope that a property's type declaration and its label are always
in the same graph.
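
Whether that hope holds can be checked directly. A rough sketch, using
only the two predicates from the queries above: it lists properties
whose type declaration appears in one graph while a label appears in a
different one.

```sparql
# Any row returned is a property typed in ?g1 but labelled in ?g2.
select distinct ?p ?g1 ?g2 where {
  graph ?g1 { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> }
  graph ?g2 { ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l }
  FILTER (?g1 != ?g2)
}
```

An empty result means every label sits in the same graph as the type
declaration. Since the query ranges over all graphs it may itself be
slow, so it is a one-off check rather than something to run per request.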

Best Regards,

Ivan Mikhailov
OpenLink Software
http://virtuoso.openlinksw.com

On Wed, 2010-03-24 at 18:16 +0100, Sebastian Trüg wrote:
> In my rather simple query parser, which allows users to write stuff like
> "hastag:foobar" I am currently using the following query to match
> "hastag" to an actual property (the db contains a bunch of ontologies,
> each loaded in their own graph with a total of 1073 properties in a
> database of about 1099726 triples.)
> 
> select distinct ?p where {
>    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>      ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>      ?l bif:contains "'hastag*'" .
>    }
>    UNION
>    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>      FILTER(REGEX(STR(?p),'hastag*','i')) .
>    }
> }
> 
> It gets me the 2 relevant results:
> p -> <http://www.semanticdesktop.org/ontologies/2007/08/15/nao#hasTag>
> p -> <http://www.semanticdesktop.org/ontologies/2007/11/01/pimo#hasTag>
> 
> But the query time is rather long (this is not pure Virtuoso, there are
> some roundtrips from Nepomuk in there): 00:00:02.951
> 
> Then I tried to only use regex filters:
> 
> select distinct ?p where {
>    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>      ?p <http://www.w3.org/2000/01/rdf-schema#label> ?l .
>      FILTER(REGEX(STR(?l),'hastag*','i')) .
>    }
>    UNION
>    { ?p a <http://www.w3.org/1999/02/22-rdf-syntax-ns#Property> .
>      FILTER(REGEX(STR(?p),'hastag*','i')) .
>    }
> }
> 
> And suddenly the query time is close to zero:  00:00:00.201
> 
> While I am of course happy with a lower query time I am a bit confused
> since I thought the full text query should be much faster than the regex
> filter. My only idea is that filtering over 1073 properties is faster
> than any fulltext query could be. Is that the case?
> 
> Cheers,
> Sebastian
> 
> 
