Thanks for explaining! The SPARQL query is still as slow as before, so
it is fetching rdf:type that slows it down.
Br,
On 7.11.2017 16:51, [email protected] wrote:
Yes. That is exactly the expected behavior. Please read the entire page.
It explains that the query optimizer can use the stats file to
optimize the execution of queries. Any change you would expect to see
in behavior will occur at query time. Try your queries again and see
if there are changes in the execution times or query explanations.
ajs6f
Mikael Pesonen wrote on 11/7/17 9:43 AM:
Thanks for the help. So I wrote the stats output to a temporary file and
moved it to stats.opt in the index folder.
Rerunning tdbstats still seems to give the same result:
(stats
(meta
(timestamp
"2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(run@ "2017/11/07 16:39:53 EET")
(count 165865))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Work>)
2)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
1097)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Manifestation>)
896)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2004/02/skos/core#Concept>)
36)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/ns/dcat#CatalogRecord>)
29284)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Text>)
1622)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Document>)
1097)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Collection>)
5)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount>
57)
(<http://resource.lingsoft.fi/rdf/resource/producer> 3)
(<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
(<http://purl.org/dc/elements/1.1/format> 4696)
(<http://purl.org/dc/terms/created> 723)
(<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
(<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
(<http://purl.org/dc/elements/1.1/description> 6)
(<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
(<http://purl.org/dc/terms/type> 1623)
(<http://purl.org/dc/terms/accessRights> 1)
(<http://purl.org/dc/terms/identifier> 78)
(<http://purl.org/dc/terms/hasFormat> 3016)
(<http://purl.org/dc/terms/modified> 30899)
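As a reading aid for the output above, here is a minimal Python sketch that pulls the per-predicate counts out of tdbstats output and compares them with the total. The function name parse_stats and the shortened SAMPLE text are illustrative and not part of Jena; the (count ...) value is assumed to be the total number of triples or quads scanned. It only shows relative frequencies and says nothing about which plan the optimizer picks.

```python
import re

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Shortened excerpt of the stats output quoted in this thread.
SAMPLE = """
(stats
 (meta
  (count 165865))
 ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
   <http://www.w3.org/ns/dcat#CatalogRecord>)
  29284)
 (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
 (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
 (<http://purl.org/dc/terms/modified> 30899)
 (<http://purl.org/dc/terms/created> 723))
"""


def parse_stats(text):
    """Return (total, {predicate IRI: count}) from tdbstats-style text."""
    total_match = re.search(r"\(count\s+(\d+)\)", text)
    total = int(total_match.group(1)) if total_match else None
    # Matches entries of the form (<IRI> N); the ((VAR ...) N) entries are
    # skipped because they start with "((" rather than "(<".
    pairs = re.findall(r"\(\s*<([^>]+)>\s+(\d+)\s*\)", text)
    predicates = {iri: int(n) for iri, n in pairs}
    return total, predicates


total, predicates = parse_stats(SAMPLE)
for iri, n in sorted(predicates.items(), key=lambda kv: -kv[1]):
    print(f"{n:>7}  {n / total:6.1%}  {iri}")
```

In this data rdf:type is the most frequent predicate listed, while dcterms:created is far rarer, so a pattern on rdf:type evaluated on its own touches many more triples than one on dcterms:created.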
On 7.11.2017 16:27, [email protected] wrote:
Take a look at the link Rob sent you again. Please read the _entire_
page carefully. Under:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
You will see: "The command line tdbstats will scan the data and
produce a rules file based on the frequency of
properties. The output should first go to a temporary file, then
that file moved into the database location."
You need to actually use the output of tdbstats by moving it into
your database directory.
ajs6f
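The two steps described above (write the tdbstats output to a temporary file, then move it into the database location) could be scripted roughly as follows. This is a sketch under assumptions: tdbstats is on the PATH and is called with --loc=<database directory>, the helper name regenerate_stats is made up for illustration, and Fuseki has been stopped first, since only one process may access a TDB database at a time.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def regenerate_stats(db_dir, command=None):
    """Capture the stats output in a temp file, then move it to db_dir/stats.opt."""
    db_dir = Path(db_dir)
    # Assumption: tdbstats is on PATH; pass another command to override.
    command = command or ["tdbstats", f"--loc={db_dir}"]
    # Write outside the database directory first, so that a half-written
    # stats.opt is never sitting in the database location.
    with tempfile.NamedTemporaryFile("w", suffix=".opt", delete=False) as tmp:
        subprocess.run(command, stdout=tmp, check=True)
        tmp_path = Path(tmp.name)
    target = db_dir / "stats.opt"
    shutil.move(str(tmp_path), str(target))
    return target
```

Calling regenerate_stats("/path/to/DB") with a hypothetical database path would leave the new file at /path/to/DB/stats.opt. Any effect shows up at query time, not in the tdbstats output itself.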
Mikael Pesonen wrote on 11/7/17 6:30 AM:
Hi,
Sorry, I don't understand how tdbstats works. I ran it against the
same graph that the slow query runs against and got the
result below (some lines removed).
Br,
Mikael
(stats
(meta
(timestamp
"2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(run@ "2017/11/07 13:24:16 EET")
(count 165911))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Work>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
1098)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Manifestation>)
897)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2004/02/skos/core#Concept>)
36)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
1)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Expression>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/ns/dcat#CatalogRecord>)
29284)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Text>)
1623)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
2)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Document>)
1100)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Collection>)
5)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount>
59)
...
(<http://purl.org/dc/elements/1.1/format> 4697)
(<http://purl.org/dc/terms/created> 725)
(<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
(<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version>
1)
(<http://purl.org/dc/elements/1.1/description> 6)
(<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
(<http://purl.org/dc/terms/type> 1624)
(<http://purl.org/dc/terms/accessRights> 2)
(<http://purl.org/dc/terms/identifier> 78)
...
On 30.10.2017 17:10, Andy Seaborne wrote:
Mikael,
I can't find anything that makes rdf:type special. Maybe some
distribution of data is the cause but I'm not seeing it.
Did you get a chance to get some stats?
Andy
On 27/10/17 12:27, Mikael Pesonen wrote:
I also tried this with other properties such as dcterms:created,
and the query didn't slow down with them.
-Mikael
On 27.10.2017 13:02, Andy Seaborne wrote:
In this case, stats won't help. The <some resource> should be
the starting point.
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
?type)
)
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s ?p2 ?o2)
)))
Are you using inference as well?
Is it the same <some resource>?
Is the timing for the rdf:type variant on a cold system?
Andy
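To illustrate why the pattern with the bound <some resource> is the better starting point, here is a toy Python sketch. It is not how TDB executes queries: it uses a plain in-memory list of triples, made-up names (ex:s0, ex:ref, ex:target) and invented data, and simply counts how many rows the first pattern produces under each join order.

```python
RDF_TYPE = "rdf:type"


def match(triples, s=None, p=None, o=None):
    """Return triples matching the bound positions (None means variable)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def join_resource_first(triples, resource):
    # Start from "?s ?p <resource>", then look up rdf:type for each ?s found.
    first = match(triples, o=resource)
    rows = [(s, p, typ) for (s, p, _) in first
            for (_, _, typ) in match(triples, s=s, p=RDF_TYPE)]
    return rows, len(first)


def join_type_first(triples, resource):
    # Start from "?s rdf:type ?type", then check each ?s against <resource>.
    first = match(triples, p=RDF_TYPE)
    rows = [(s, p, typ) for (s, _, typ) in first
            for (_, p, _) in match(triples, s=s, o=resource)]
    return rows, len(first)


# Invented data: 1000 typed subjects, 3 of which reference ex:target.
triples = [(f"ex:s{i}", RDF_TYPE, "ex:Record") for i in range(1000)]
triples += [(f"ex:s{i}", "ex:ref", "ex:target") for i in range(3)]

rows_a, first_a = join_resource_first(triples, "ex:target")
rows_b, first_b = join_type_first(triples, "ex:target")
print("resource first:", first_a, "intermediate rows,", len(rows_a), "results")
print("type first:    ", first_b, "intermediate rows,", len(rows_b), "results")
```

Both orders return the same results, but the type-first order has to visit every typed subject before filtering, which is the kind of cost difference the questions above are trying to pin down.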
On 27/10/17 10:22, Mikael Pesonen wrote:
Hi,
Thanks! I'll try that when I get a chance to stop Jena. Yes, we are
using TDB.
On 26.10.2017 16:15, Rob Vesse wrote:
Is TDB the underlying database?
If so is there a stats.opt file in your database directory?
I remember there being issues in the past with the statistics
for rdf:type triples being wrongly prioritised. You
might want to look at that file, assuming that it exists, and
try adjusting the values associated with rdf:type
based upon the guidance in the documentation:
http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
Also if this is a database which is being updated then the
statistics can get out of date relative to the
database. You can use the command-line tdbstats tool to try
regenerating it:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
Note that you will need to stop Fuseki in order to run this, as
only a single process is permitted to access a TDB
database at a time.
Rob
On 26/10/2017 13:47, "Mikael Pesonen"
<[email protected]> wrote:
Hi, I have trouble understanding why the first query is slow and the
second one is fast. I'm using Jena Fuseki 3.4.0.
I want to get all resources that reference <some resource>, and their
types:
SELECT * WHERE
{
GRAPH ?g
{
?s ?p <some resource> .
?s a ?type
}
}
SELECT * WHERE
{
GRAPH ?g
{
?s ?p <some resource> .
?s ?p2 ?o2
}
}
The first one takes 5 seconds, which is too slow for our application.
Can it be rearranged somehow to make it fast? Sorry if this is not
the correct forum for this.
Thanks!