Hi Osma!
Unfortunately I need to match those specific subjects: the application
is a document search by selected keywords. And even if
?s <http://purl.org/dc/terms/subject> ?child .
is left out, the search still takes about 10 seconds.
Br,
Mikael
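One way to keep the subject restriction without multiplying rows is sketched here, under assumptions. Run the SKOS query first (it reportedly returns ~50 concepts in under a second), then pass those concepts to the document query as a VALUES list inside FILTER EXISTS. EXISTS only tests whether a matching subject triple is present, so each ?s ?p ?o row is emitted once and DISTINCT should not be needed. The two concept URIs are placeholders for the full list, ?wanted is a name introduced for illustration, and whether this is actually faster on this dataset is untested.

```sparql
PREFIX dct: <http://purl.org/dc/terms/>

SELECT ?s ?p ?o WHERE {
  GRAPH <http://www.lingsoft.fi/resource-meta/> {
    ?s dct:isPartOf <http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
    ?s ?p ?o .
    # Keep ?s only if it has at least one of the wanted subjects.
    # EXISTS is a yes/no test, so it adds no extra rows per subject.
    FILTER EXISTS {
      # Placeholder list: substitute the concepts returned by the SKOS query.
      VALUES ?wanted {
        <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/c16e9937a515bda6>
        <http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/e56f6309f0d86b95>
      }
      ?s dct:subject ?wanted .
    }
  }
}
```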
On 16.8.2016 15:37, Osma Suominen wrote:
Hi Mikael,
This query will internally create an expensive cross product join if
there are many subjects (many different values for ?child), even
though the DISTINCT will then mask this internal duplication in the
final results.
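To see how much duplication DISTINCT is hiding, the row count before and after de-duplication can be compared. The two queries below are a sketch using the graph and collection URIs from this thread; the names ?rows and ?distinctRows are arbitrary. The first counts every row the join produces:

```sparql
SELECT (COUNT(*) AS ?rows) WHERE {
  GRAPH <http://www.lingsoft.fi/resource-meta/> {
    ?s <http://purl.org/dc/terms/isPartOf>
       <http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
    ?s <http://purl.org/dc/terms/subject> ?child .
    ?s ?p ?o
  }
}
```

The second counts what is left once DISTINCT has collapsed the duplicates:

```sparql
SELECT (COUNT(*) AS ?distinctRows) WHERE {
  # Inner query: the de-duplicated result set.
  SELECT DISTINCT ?s ?p ?o WHERE {
    GRAPH <http://www.lingsoft.fi/resource-meta/> {
      ?s <http://purl.org/dc/terms/isPartOf>
         <http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
      ?s <http://purl.org/dc/terms/subject> ?child .
      ?s ?p ?o
    }
  }
}
```

If a document has k subjects, each of its triples appears k times in the first count and once in the second, so a large ratio between the two points to the join rather than to DISTINCT itself.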
There are many ways to avoid this. I assume that you don't need to
know the actual subject values (they are not returned by the query
anyway), so you could for example replace ?child with []. Then you
also shouldn't need DISTINCT anymore. Like this:
SELECT ?s ?p ?o WHERE {
GRAPH <http://www.lingsoft.fi/resource-meta/> {
?s <http://purl.org/dc/terms/isPartOf>
<http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
?s <http://purl.org/dc/terms/subject> [] .
?s ?p ?o
}
}
-Osma
On 16/08/16 13:36, Mikael Pesonen wrote:
The inner DISTINCT halves the execution time, but the entire query is
slow even with that. I also tested without the ontology search, like this:
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH <http://www.lingsoft.fi/resource-meta/> {
?s <http://purl.org/dc/terms/isPartOf>
<http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
?s <http://purl.org/dc/terms/subject> ?child .
?s ?p ?o
}
}
and it takes almost 10 seconds, so the issue is not with the property*
query.
Is it possible to optimize this basic query, or is the only option to
move the data from the hard drive to RAM to get faster queries?
Mikael
On 16.8.2016 13:13, Andy Seaborne wrote:
On 15/08/16 09:47, Mikael Pesonen wrote:
Hi,
What do you mean by masking? It should remove duplicates, and it makes
the query run in half the time compared to without DISTINCT. The result
count at least is the same.
Mikael
If DISTINCT causes a lot of results to be turned into a few, it is
hiding a lot of work by the query engine.
If it's the inner DISTINCT that halves the execution time, then the
improvements (in dev builds) to property* may help you.
If it's the outer one, it's a serialization issue (which I doubt at
this scale).
Andy
On 12.8.2016 13:53, Andy Seaborne wrote:
On 08/08/16 11:56, Mikael Pesonen wrote:
Hi Andy,
storage is started like this:
/usr/bin/java -Xmx3600M -jar
/home/text/tools/apache-jena-fuseki-2.3.1/fuseki-server.jar --update
--port 3030 --loc=../apache-jena-3.0.1/DB /ds
Ontology data is simple SKOS, and document data is also simple DC
metadata triples. The query returns ~15k triples.
I tested the SKOS part, and this executed in less than one second,
returning ~50 items:
How many without the two DISTINCT?
I am wondering if the DISTINCT (the inner one) is masking a lot of
results.
SELECT DISTINCT *
WHERE {
GRAPH ?graph {
SELECT DISTINCT ?child WHERE {
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/c16e9937a515bda6>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/e56f6309f0d86b95>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/b393055ac0f3a0bc>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/642194686a67f935>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/a9beeb4bf0b0af70>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/ce3598292f301cec>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/26aa300e4c033981>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/bd07d765f36ea88f>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/bcf9e082e2ae8c9b>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/78d3955357a8ac10>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/369b1a9c822f55db>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/7098a84669b9feca>
skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ontologies/VerohallintoAsiakaskirjeet/b7cb30c4efed996a>
skos:narrower* ?child}
}
}
}
Br,
Mikael
On 8.8.2016 13:43, Andy Seaborne wrote:
There is a certain amount of "it depends" here: what's the data
stored in? What shape is the data? (Which Jena version?)
In the next release, and already available in development builds, is:
https://issues.apache.org/jira/browse/JENA-1195
where property* was sped up recently. Usually it took moderately
unusual data to show this up, but the repeated use of an expensive
operation in property* may be happening here too.
Mikael - are you able to try out a SNAPSHOT build?
Andy
On 08/08/16 11:37, Håvard Ottestad wrote:
Is this any better?
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH <http://www.lingsoft.fi/resource-meta/> {
?s <http://purl.org/dc/terms/isPartOf>
<http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
?s <http://purl.org/dc/terms/subject> ?child .
?s ?p ?o
}
GRAPH <http://www.lingsoft.fi/> {
SELECT DISTINCT ?child WHERE {
{<http://www.lingsoft.fi/c16e9937a515bda6> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/e56f6309f0d86b95> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/b393055ac0f3a0bc> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/642194686a67f935> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/a9beeb4bf0b0af70> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ce3598292f301cec> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/26aa300e4c033981> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/bd07d765f36ea88f> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/bcf9e082e2ae8c9b> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/78d3955357a8ac10> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/369b1a9c822f55db> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/7098a84669b9feca> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/b7cb30c4efed996a> skos:narrower* ?child}
}
}
}
Regards,
Håvard M. Ottestad
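The thirteen-branch UNION can also be written more compactly with VALUES, which binds a variable to each starting concept in turn. This is a sketch of the subquery only, not a tested optimisation. ?root is a name introduced here, the skos: prefix declaration is assumed (the queries in the thread omit it), and only the first three concepts are listed. It should return the same ?child values as the UNION form, provided every root concept actually occurs in the ontology graph.

```sparql
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?child WHERE {
  GRAPH <http://www.lingsoft.fi/> {
    # One binding per starting concept; the remaining URIs go here too.
    VALUES ?root {
      <http://www.lingsoft.fi/c16e9937a515bda6>
      <http://www.lingsoft.fi/e56f6309f0d86b95>
      <http://www.lingsoft.fi/b393055ac0f3a0bc>
    }
    # Zero or more skos:narrower steps, so a root present in the graph
    # is returned itself along with its descendants.
    ?root skos:narrower* ?child .
  }
}
```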
On 08 Aug 2016, at 11:25, Mikael Pesonen
<[email protected]> wrote:
Hi,
I'm not sure if this is the correct forum to ask, but I hope you can
help.
This query takes over 20 seconds with Jena:
SELECT DISTINCT ?s ?p ?o WHERE {
GRAPH <http://www.lingsoft.fi/> {
SELECT DISTINCT ?child WHERE {
{<http://www.lingsoft.fi/c16e9937a515bda6> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/e56f6309f0d86b95> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/b393055ac0f3a0bc> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/642194686a67f935> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/a9beeb4bf0b0af70> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/ce3598292f301cec> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/26aa300e4c033981> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/bd07d765f36ea88f> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/bcf9e082e2ae8c9b> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/78d3955357a8ac10> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/369b1a9c822f55db> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/7098a84669b9feca> skos:narrower* ?child}
UNION
{<http://www.lingsoft.fi/b7cb30c4efed996a> skos:narrower* ?child}
}
}
GRAPH <http://www.lingsoft.fi/resource-meta/> {
?s <http://purl.org/dc/terms/subject> ?child .
?s <http://purl.org/dc/terms/isPartOf>
<http://www.lingsoft.fi/rdf/uid/574ef1a40236a> .
?s ?p ?o
}
}
The first graph query gets keywords from an ontology graph; the second
queries documents having those keywords. Is there a better way/order to
write this query?
Thank you for the help,
Mikael
--
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's
Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: [email protected]
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Linnankatu 10 A
FI-20100 Turku
FINLAND