Hi everybody,

I have a rather complex SPARQL query that is executed thousands of times across 400 parallel threads. The version below is somewhat simplified for readability (namespaces, properties and variables have been reduced), but its complexity (unions, number of graphs, etc.) is left untouched. The query runs against 4 graphs, the largest of which contains 5,561,181 triples.

    PREFIX graphA: <GraphABaseURI:>

    ASK
    FROM NAMED <GraphBURI>
    FROM NAMED <GraphCURI>
    FROM NAMED <GraphABaseURI>
    FROM NAMED <GraphDBaseURI>
    WHERE {
      {
        GRAPH <GraphABaseURI> {
          ?variableA a graphA:ClassA .
          ?variableA graphA:propertyA ?variableB .
          ?variableB dcterms:title ?variableC .
          ?variableA graphA:propertyB ?variableD .
          ?variableL <GraphABaseURI:propertyB> ?variableD .
          ?variableD <propertyBURI> ?variableE
        } .
        GRAPH <GraphBURI> {
          ?variableF <propertyCURI>/<propertyDURI> ?variableG .
          ?variableF <propertyEURI> ?variableH
        } .
        GRAPH <GraphCURI> {
          ?variableI <http://www.w3.org/2004/02/skos/core#notation> ?variableJ .
          ?variableI <http://www.w3.org/2004/02/skos/core#prefLabel> ?variableK .
          FILTER (isLiteral(?variableK) && REGEX(?variableK, "literalA", "i"))
        } .
        FILTER (isLiteral(?variableJ) && ?variableG = ?variableJ) .
        FILTER (?variableE = ?variableH)
      }
      UNION
      {
        GRAPH <GraphABaseURI> {
          ?variableA a graphA:ClassA .
          ?variableA graphA:propertyA ?variableB .
          ?variableB dcterms:title ?variableC .
          ?variableA graphA:propertyB ?variableD .
          ?variableL <propertyBURI> ?variableE .
          ?variableL <propertyFURI> ?variableD .
        } .
        GRAPH <GraphDBaseURI> {
          ?variableM <propertyGURI> ?variableN .
          ?variableM <propertyHURI> ?variableO .
          FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
        } .
        FILTER (?variableE = ?variableN) .
      }
      UNION
      {
        GRAPH <GraphABaseURI> {
          ?variableA a graphA:ClassA .
          ?variableA graphA:propertyA ?variableB .
          ?variableB dcterms:title ?variableC .
          ?variableA graphA:propertyB ?variableD .
          ?variableL <propertyBURI> ?variableE .
          ?variableL <propertyIURI> ?variableD .
        } .
        GRAPH <GraphDBaseURI> {
          ?variableM <propertyGURI> ?variableN .
          ?variableM <propertyHURI> ?variableO .
          FILTER (isLiteral(?variableO) && REGEX(?variableO, "literalA", "i"))
        } .
        FILTER (?variableE = ?variableN) .
      } .
      FILTER (isLiteral(?variableC) && REGEX(?variableC, "literalB", "i")) .
    }

I would not expect someone to transform the above query (of course...).
I am only posting the query to demonstrate the complexity and all the SPARQL structures used.

My questions:

1. Would I gain performance if I had all my triples in one graph? This way I would avoid unions and simplify my query, but would it also improve performance?

2. Are there any indexes that I could build that would help with the above query? I am not very confident about data indexing; however, reading http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFPerformanceTuning#RDF Index Scheme, I wonder whether Virtuoso 7's default indexing scheme is suitable for queries like the above. While the predicates are defined in the query's triple patterns, many of the triple patterns have no defined subject or object. Could this be a major performance problem?

3. Perhaps there is a SPARQL syntax structure that I am not aware of that could be of great help in the above query. Could you suggest something? For example, I have already improved performance by removing STR() casts and using the isLiteral() function. Could you suggest anything else?

4. Perhaps you could point out whether I am overusing some complex SPARQL syntax structure?

Please note that I use Virtuoso Open Source Edition, built on Ubuntu, Version: 07.20.3214, Build: Oct 14 2015.

Regards,
Pantelis Natsiavas
------------------------------------------------------------------------------
_______________________________________________
Virtuoso-users mailing list
Virtuoso-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/virtuoso-users