Re: Slow SPARQL query

Mikael Pesonen Fri, 26 Aug 2016 04:15:09 -0700


I haven't made any changes there :-)


-Mikael


On 26.8.2016 13:55, A. Soroka wrote:

Yes, unless you have made some changes in configuration that you would not have 
made without noticing. {grin} TDB is the default storage for Fuseki.

---
A. Soroka
The University of Virginia Library

On Aug 26, 2016, at 6:40 AM, Mikael Pesonen <[email protected]> wrote:


Hi Rob,

I'm using this command to start the database:

/usr/bin/java -Xmx3600M -jar apache-jena-fuseki-2.3.1/fuseki-server.jar \
    --update --port 3030 --loc=../apache-jena-3.0.1/DB /ds

and the s-* command-line tools to make queries. The documentation mentions
tdbquery, which I'm not using. Is TDB still in use?

Thanks,
Mikael
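Whichever client sends the query, Fuseki answers it from the database given by --loc, so s-* queries over HTTP still hit TDB. A minimal sketch of such a SPARQL-protocol request, assuming Fuseki 2.x exposes its default query service for the dataset as /ds/query on port 3030; it only builds the request, so no server is needed to run it.

```python
import urllib.parse
import urllib.request

# Assumption: Fuseki 2.x started as in the command above, serving the dataset
# "/ds" on port 3030 with its default query service at /ds/query.
ENDPOINT = "http://localhost:3030/ds/query"


def build_query_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL protocol query request: HTTP GET with a 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


req = build_query_request(ENDPOINT, "ASK { ?s ?p ?o }")
print(req.get_method(), req.full_url)

# Sending it needs a running server, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```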



On 26.8.2016 13:27, Rob Vesse wrote:

Mikael

If you're using Fuseki then you are already using TDB. TDB is a native RDF
database that uses memory-mapped files to make data access as fast as possible.

SDB is a legacy system built on top of relational databases, so queries have to
be compiled into SQL, submitted to the underlying relational database, and
their results translated back into RDF appropriately. More complex queries
cannot be translated directly into a single SQL query, due to the differing
semantics of the two query languages, and may require many SQL queries to
answer. SDB is no longer actively developed and receives only minor bug fixes.

As for when you would not use TDB, there are probably three main criteria:

1 - When the amount of data you will store runs into the billions of triples.
TDB will scale pretty well into the millions of triples, although this will
depend on the complexity of the queries.
2 - When you need clustering for load balancing, failover, etc. TDB is a
single-node system; while there are ways to do load balancing, these typically
rely on layering additional services on top of it.
3 - When you need reasoning support. TDB does not natively support reasoning.
You can use other Jena APIs to add this, but they will substantially degrade
performance because they require all the data to be in heap memory. If your
data is static then you can compute the inference closure once and persist that
into the database, but if you need dynamic inference or extremely large-scale
inference then TDB will not be suitable.
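The "compute the inference closure once and persist it" option can be illustrated on a toy hierarchy. This is a minimal in-memory Python sketch of materializing a transitive closure (here of hypothetical narrower edges); it uses no Jena API, and in practice the inferred pairs would be written into TDB as ordinary triples. Note that the stored closure covers one-or-more steps, so a zero-or-more lookup such as skos:narrower* adds the start node itself.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Materialize every (ancestor, descendant) pair reachable in one or more steps.

    Done once (e.g. at load time) and stored, so queries can look pairs up
    directly instead of re-deriving them on every request.
    """
    children = defaultdict(set)
    for parent, child in edges:
        children[parent].add(child)

    closure = set()
    for start in list(children):
        seen = set()
        stack = list(children[start])
        while stack:
            node = stack.pop()
            if node in seen:  # guards against cycles in the hierarchy
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(children.get(node, ()))
    return closure


def descendants_or_self(closure, root):
    """Zero-or-more lookup answered from the stored closure (linear scan here)."""
    return {root} | {d for a, d in closure if a == root}


# Hypothetical narrower edges: a -> b -> c -> d
closure = transitive_closure([("a", "b"), ("b", "c"), ("c", "d")])
print(sorted(descendants_or_self(closure, "b")))  # ['b', 'c', 'd']
```

The trade-off matches the one described above: the closure is cheap to query but must be recomputed whenever the hierarchy changes, which is why it suits static data.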

There are plenty of commercial options that do address the above three criteria,
and people can probably provide recommendations if you think you need a
commercial option.

It is also worth noting that some queries are simply hard for any query
engine to answer.

Rob

On 26/08/2016 10:46, "Mikael Pesonen" <[email protected]> wrote:

     Hi, I'm still wondering what I should do to make the performance better.
     I read that TDB is faster. What is the reason not to use TDB? I can't
     find any comparison of SDB and TDB in that regard.
     Br,
     Mikael

     On 16.8.2016 13:13, Andy Seaborne wrote:
     > On 15/08/16 09:47, Mikael Pesonen wrote:
     >>
     >> Hi,
     >>
     >> What do you mean by masking? It should remove duplicates, and it makes
     >> the query run in half the time compared to without DISTINCT. The result
     >> count, at least, is the same.
     >>
     >> Mikael
     >
     > If DISTINCT causes a lot of results to be turned into a few, it is
     > hiding a lot of work by the query engine.
     >
     > If it's the inner DISTINCT that halves the execution time, then the
     > improvements (in dev builds) to property* may help you.
     >
     > If it's the outer one, it's a serialization issue (which I doubt at
     > this scale).
     >
     >     Andy
     >
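Andy's masking point can be seen on a toy hierarchy: each UNION branch returns its own matches, so a concept reachable from several of the listed roots appears once per branch, and DISTINCT then collapses those rows, hiding the work spent producing them. The gap between the raw and distinct counts is what "how many without the two DISTINCTs?" measures. A minimal Python sketch with hypothetical concept names; it models the result multiset, not how Jena evaluates the query internally.

```python
def narrower_star(children, root):
    """Nodes reachable from root in zero or more steps, each reported once."""
    seen = {root}
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def union_rows(children, roots):
    """Concatenate per-root results, as UNION branches do before any DISTINCT."""
    rows = []
    for root in roots:
        rows.extend(narrower_star(children, root))
    return rows


# Hypothetical hierarchy in which one listed root ("income") lies under another ("tax").
children = {"tax": ["income", "vat"], "income": ["salary", "pension"]}
rows = union_rows(children, ["tax", "income"])
print(len(rows), len(set(rows)))  # 8 5
```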
     >>
     >>
     >> On 12.8.2016 13:53, Andy Seaborne wrote:
     >>> On 08/08/16 11:56, Mikael Pesonen wrote:
     >>>>
     >>>> Hi Andy,
     >>>>
     >>>> Storage is started like this:
     >>>>
     >>>> /usr/bin/java -Xmx3600M -jar \
     >>>>     /home/text/tools/apache-jena-fuseki-2.3.1/fuseki-server.jar \
     >>>>     --update --port 3030 --loc=../apache-jena-3.0.1/DB /ds
     >>>>
     >>>> Ontology data is simple SKOS, and document data is also simple DC
     >>>> metadata triples. The query returns ~15k triples.
     >>>>
     >>>> I tested the SKOS part, and this executed in less than one second,
     >>>> returning ~50 items:
     >>>
     >>> How many results without the two DISTINCTs?
     >>>
     >>> I am wondering if the DISTINCT (the inner one) is masking a lot of
     >>> results.
     >>>
     >>>>
     >>>> SELECT DISTINCT *
     >>>> WHERE {
     >>>>     GRAPH ?graph {
     >>>>         SELECT DISTINCT ?child WHERE {
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/c16e9937a515bda6> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/e56f6309f0d86b95> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/b393055ac0f3a0bc> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/642194686a67f935> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/a9beeb4bf0b0af70> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/ce3598292f301cec> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/26aa300e4c033981> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/bd07d765f36ea88f> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/bcf9e082e2ae8c9b> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/78d3955357a8ac10> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/369b1a9c822f55db> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/7098a84669b9feca> skos:narrower* ?child }
     >>>>             UNION
     >>>>               { <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/b7cb30c4efed996a> skos:narrower* ?child }
     >>>>         }
     >>>>     }
     >>>> }
     >>>>
     >>>> Br,
     >>>> Mikael
     >>>>
     >>>>
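One alternative formulation worth timing: SPARQL 1.1 VALUES can bind ?root to each of the 13 concepts, replacing the chain of UNION branches with a single property-path pattern and returning the same distinct (?graph, ?child) pairs as the SKOS query above. Whether Jena 3.0.1 evaluates it faster is not established here; this is a sketch to try, generated from Python so the root list lives in one place. The skos: namespace is the standard SKOS one, spelled out because the queries in the thread use the prefix without showing its declaration.

```python
# Assumption: the standard SKOS namespace; the thread's queries use skos:
# without showing the PREFIX line, so it is declared explicitly here.
SKOS_PREFIX = "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>"


def narrower_query(roots):
    """Build a query binding ?root with VALUES instead of one UNION branch per root."""
    values = " ".join(f"<{iri}>" for iri in roots)
    return (
        f"{SKOS_PREFIX}\n"
        "SELECT DISTINCT ?graph ?child WHERE {\n"
        "  GRAPH ?graph {\n"
        f"    VALUES ?root {{ {values} }}\n"
        "    ?root skos:narrower* ?child\n"
        "  }\n"
        "}"
    )


BASE = "http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/"
roots = [BASE + "c16e9937a515bda6", BASE + "e56f6309f0d86b95"]  # first two of the 13
print(narrower_query(roots))
```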
     >>>> On 8.8.2016 13:43, Andy Seaborne wrote:
     >>>>> There is a certain amount of "it depends" here: what's the data
     >>>>> stored in? What shape is the data? (Which Jena version?)
     >>>>>
     >>>>> In the next release, and already available in development builds, is:
     >>>>>
     >>>>> https://issues.apache.org/jira/browse/JENA-1195
     >>>>>
     >>>>> where property* was sped up recently. Usually it took moderately
     >>>>> unusual data to show this up, but the repeated use of an expensive
     >>>>> operation in property* may be happening here too.
     >>>>>
     >>>>> Mikael - are you able to try out a SNAPSHOT build?
     >>>>>
     >>>>>     Andy
     >>>>>
     >>>>>
     >>>>> On 08/08/16 11:37, Håvard Ottestad wrote:
     >>>>>> Is this any better?
     >>>>>>
     >>>>>> SELECT DISTINCT ?s ?p ?o WHERE {
     >>>>>>
     >>>>>>   GRAPH <http://www.lingsoft.fi/resource-meta/> {
     >>>>>>     ?s <http://purl.org/dc/terms/isPartOf> <http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
     >>>>>>     ?s <http://purl.org/dc/terms/subject> ?child .
     >>>>>>     ?s ?p ?o
     >>>>>>   }
     >>>>>>
     >>>>>>   GRAPH <http://www.lingsoft.fi/> {
     >>>>>>     SELECT DISTINCT ?child WHERE {
     >>>>>>             { <http://www.lingsoft.fi/c16e9937a515bda6> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/e56f6309f0d86b95> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/b393055ac0f3a0bc> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/642194686a67f935> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/a9beeb4bf0b0af70> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/ce3598292f301cec> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/26aa300e4c033981> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/bd07d765f36ea88f> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/bcf9e082e2ae8c9b> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/78d3955357a8ac10> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/369b1a9c822f55db> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/7098a84669b9feca> skos:narrower* ?child }
     >>>>>>       UNION { <http://www.lingsoft.fi/b7cb30c4efed996a> skos:narrower* ?child }
     >>>>>>     }
     >>>>>>   }
     >>>>>>
     >>>>>> }
     >>>>>>
     >>>>>> Regards,
     >>>>>> Håvard M. Ottestad
     >>>>>>
     >>>>>>> On 08 Aug 2016, at 11:25, Mikael Pesonen
     >>>>>>> <[email protected]> wrote:
     >>>>>>>
     >>>>>>>
     >>>>>>> Hi,
     >>>>>>>
     >>>>>>> I'm not sure if this is the correct forum to ask, but I hope you can help.
     >>>>>>> This query takes over 20 seconds with Jena:
     >>>>>>>
     >>>>>>> SELECT DISTINCT ?s ?p ?o WHERE {
     >>>>>>>   GRAPH <http://www.lingsoft.fi/> {
     >>>>>>>     SELECT DISTINCT ?child WHERE {
     >>>>>>>             { <http://www.lingsoft.fi/c16e9937a515bda6> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/e56f6309f0d86b95> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/b393055ac0f3a0bc> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/642194686a67f935> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/a9beeb4bf0b0af70> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/ce3598292f301cec> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/26aa300e4c033981> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/bd07d765f36ea88f> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/bcf9e082e2ae8c9b> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/78d3955357a8ac10> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/369b1a9c822f55db> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/7098a84669b9feca> skos:narrower* ?child }
     >>>>>>>       UNION { <http://www.lingsoft.fi/b7cb30c4efed996a> skos:narrower* ?child }
     >>>>>>>     }
     >>>>>>>   }
     >>>>>>>   GRAPH <http://www.lingsoft.fi/resource-meta/> {
     >>>>>>>     ?s <http://purl.org/dc/terms/subject> ?child .
     >>>>>>>     ?s <http://purl.org/dc/terms/isPartOf> <http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
     >>>>>>>     ?s ?p ?o
     >>>>>>>   }
     >>>>>>> }
     >>>>>>>
     >>>>>>> The first graph query is for getting keywords from an ontology graph;
     >>>>>>> the second is for querying documents having those keywords. Is there a
     >>>>>>> better way/order to make this query?
     >>>>>>>
     >>>>>>> Thank you for the help,
     >>>>>>> Mikael
     >>>>>>>
     >>>>>>> --
     >>>>>>> www.lingsoft.fi
     >>>>>>>
     >>>>>>> Speech Applications - Language Management - Translation - Reader's
     >>>>>>> and Writer's Tools - Text Tools - E-books and M-books
     >>>>>>>
     >>>>>>> Mikael Pesonen
     >>>>>>> System Engineer
     >>>>>>>
     >>>>>>> e-mail: [email protected]
     >>>>>>> Tel. +358 2 279 3300
     >>>>>>>
     >>>>>>> Time zone: GMT+2
     >>>>>>>
     >>>>>>> Helsinki Office
     >>>>>>> Eteläranta 10
     >>>>>>> FI-00130 Helsinki
     >>>>>>> FINLAND
     >>>>>>>
     >>>>>>> Turku Office
     >>>>>>> Linnankatu 10 A
     >>>>>>> FI-20100 Turku
     >>>>>>> FINLAND
     >>>>>>>
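The two steps Mikael describes (keywords from the ontology graph, then documents having those keywords) can be mirrored over in-memory triples to make the query's logic concrete. A minimal Python sketch with hypothetical toy data (the ex: names are made up); it restricts to the collection before checking subjects, in the spirit of Håvard's reordering, but it says nothing about how Jena/TDB actually plans or executes the query.

```python
DCT_SUBJECT = "http://purl.org/dc/terms/subject"
DCT_IS_PART_OF = "http://purl.org/dc/terms/isPartOf"
SKOS_NARROWER = "http://www.w3.org/2004/02/skos/core#narrower"


def keywords(ontology, roots):
    """Step 1: concepts reachable from any root via skos:narrower* (zero or more steps)."""
    children = {}
    for s, p, o in ontology:
        if p == SKOS_NARROWER:
            children.setdefault(s, []).append(o)
    found = set(roots)
    stack = list(roots)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def document_triples(meta, collection, wanted):
    """Step 2: all ?s ?p ?o for documents in `collection` whose dct:subject is wanted."""
    # Selective pattern first: restrict to the one collection before
    # checking subjects (the reordering idea from Håvard's reply).
    members = {s for s, p, o in meta if p == DCT_IS_PART_OF and o == collection}
    hits = {s for s, p, o in meta
            if s in members and p == DCT_SUBJECT and o in wanted}
    return {(s, p, o) for s, p, o in meta if s in hits}


# Hypothetical toy data standing in for the two named graphs.
ontology = [("ex:tax", SKOS_NARROWER, "ex:income"),
            ("ex:income", SKOS_NARROWER, "ex:salary")]
meta = [("ex:doc1", DCT_SUBJECT, "ex:salary"), ("ex:doc1", DCT_IS_PART_OF, "ex:coll"),
        ("ex:doc2", DCT_SUBJECT, "ex:salary"), ("ex:doc2", DCT_IS_PART_OF, "ex:other"),
        ("ex:doc3", DCT_SUBJECT, "ex:vat"), ("ex:doc3", DCT_IS_PART_OF, "ex:coll")]
result = document_triples(meta, "ex:coll", keywords(ontology, ["ex:tax"]))
print(sorted(result))
```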
     >>>>>
     >>>>
     >>>
     >>
     >