Take a look at the link Rob sent you again. Please read the _entire_ page
carefully. Under:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
You will see: "The command line tdbstats will scan the data and produce a rules file based on the frequency of
properties. The output should first go to a temporary file, then that file moved into the database location."
You need to actually use the output of tdbstats by moving it into your database
directory.
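As a rough sketch of those two steps (assuming a POSIX shell, that tdbstats is on your PATH, and that nothing else, Fuseki included, has the database open; the function name and the path in the usage comment are only placeholders):

```shell
# Sketch only: regenerate TDB statistics and install them as stats.opt.
# install_tdb_stats is a hypothetical helper name, not part of Jena.
install_tdb_stats() {
    db_dir="$1"
    tmp_file="$(mktemp)" || return 1
    # Write to a temporary file first, as the documentation advises,
    # so a failed run never leaves a half-written stats.opt behind.
    if tdbstats --loc="$db_dir" > "$tmp_file"; then
        mv "$tmp_file" "$db_dir/stats.opt"
    else
        rm -f "$tmp_file"
        return 1
    fi
}

# Usage (placeholder path, stop Fuseki first):
# install_tdb_stats /path/to/your/DB
```

The optimizer should only pick up the statistics once stats.opt sits in the database directory; output that is merely printed to the terminal has no effect.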
ajs6f
Mikael Pesonen wrote on 11/7/17 6:30 AM:
Hi,
sorry, I don't understand how tdbstats works. I ran it against the same graph
that produces the slow query and got the
result below (some lines removed).
Br,
Mikael
(stats
(meta
(timestamp
"2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(run@ "2017/11/07 13:24:16 EET")
(count 165911))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Work>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
1098)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Manifestation>)
897)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2004/02/skos/core#Concept>)
36)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
1)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Expression>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/ns/dcat#CatalogRecord>)
29284)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Text>)
1623)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
2)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Document>)
1100)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Collection>)
5)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
...
(<http://purl.org/dc/elements/1.1/format> 4697)
(<http://purl.org/dc/terms/created> 725)
(<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
(<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
(<http://purl.org/dc/elements/1.1/description> 6)
(<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
(<http://purl.org/dc/terms/type> 1624)
(<http://purl.org/dc/terms/accessRights> 2)
(<http://purl.org/dc/terms/identifier> 78)
...
On 30.10.2017 17:10, Andy Seaborne wrote:
Mikael,
I can't find anything that makes rdf:type special. Maybe some distribution of
data is the cause but I'm not seeing it.
Did you get a chance to get some stats?
Andy
On 27/10/17 12:27, Mikael Pesonen wrote:
I also tried this with other properties such as dcterms:created, and it didn't
slow down with them.
-Mikael
On 27.10.2017 13:02, Andy Seaborne wrote:
In this case, stats won't help. The <some resource> should be the starting
point.
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s ?p2 ?o2)
)))
Are you using inference as well?
Is it the same <some resource>?
Is the timing for the rdf:type variant on a cold system?
Andy
On 27/10/17 10:22, Mikael Pesonen wrote:
Hi,
thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.
On 26.10.2017 16:15, Rob Vesse wrote:
Is TDB the underlying database?
If so is there a stats.opt file in your database directory?
I remember there being issues in the past with the statistics for rdf:type
triples being wrongly prioritised. You
might want to look at that file, assuming that it exists, and try adjusting
the values associated with rdf:type
based upon the guidance in the documentation:
http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
Also if this is a database which is being updated then the statistics can get
out of date relative to the
database. You can use the commandline tdbstats tool to try regenerating this:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
Note that you will need to stop Fuseki in order to run this as only a single
process is permitted to access a TDB
database at a time.
Rob
On 26/10/2017 13:47, "Mikael Pesonen" wrote:
Hi, I have trouble understanding why the first query is slow and the second
one is fast. Using Jena Fuseki 3.4.0.
So I want to get all resources that reference <some resource>, and their
types:
SELECT * WHERE
{
GRAPH ?g
{
?s ?p <some resource> .
?s a ?type
}
}
SELECT * WHERE
{
GRAPH ?g
{
?s ?p <some resource> .
?s ?p2 ?o2
}
}
The first one takes 5 seconds, which is too slow for our application. Can it
be rearranged somehow to make it faster? Sorry if this is not the correct forum
for this.
Thanks!
--