Hi,
sorry, I don't understand how tdbstats works. I ran it against the same
graph that produces the slow query and got the result below (some lines
removed).
Br,
Mikael
(stats
(meta
(timestamp
"2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
(run@ "2017/11/07 13:24:16 EET")
(count 165911))
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Work>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
1098)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Manifestation>)
897)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/2004/02/skos/core#Concept>)
36)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
1)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/vocab/frbr/core#Expression>)
3)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.w3.org/ns/dcat#CatalogRecord>)
29284)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Text>)
1623)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
2)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://xmlns.com/foaf/0.1/Document>)
1100)
((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
<http://purl.org/dc/dcmitype/Collection>)
5)
(<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
(<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
(<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
...
(<http://purl.org/dc/elements/1.1/format> 4697)
(<http://purl.org/dc/terms/created> 725)
(<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
(<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
(<http://purl.org/dc/elements/1.1/description> 6)
(<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
(<http://purl.org/dc/terms/type> 1624)
(<http://purl.org/dc/terms/accessRights> 2)
(<http://purl.org/dc/terms/identifier> 78)
...
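
A rough guide to reading the output above, assuming the TDB optimizer
documentation is being read correctly here: (count N) in the meta block
is the total number of triples, ((VAR rdf:type <Class>) N) is the number
of triples matching that pattern, and (<predicate> N) is the number of
triples using that predicate. In the listing above the per-class counts
add up to the rdf:type total (34052). The sketch below is not part of
Jena; it only pulls those numbers out of a trimmed excerpt with regular
expressions, and the names parse_stats and STATS are made up for
illustration.

```python
import re

# Trimmed excerpt of the tdbstats output quoted above (most entries omitted).
STATS = """
(stats
  (meta
    (count 165911))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://www.w3.org/ns/dcat#CatalogRecord>)
   29284)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type>
        <http://xmlns.com/foaf/0.1/Document>)
   1100)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://purl.org/dc/terms/created> 725))
"""

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def parse_stats(text):
    """Return (total, per-class counts, per-predicate counts) from stats text."""
    # (count N): total number of triples covered by the statistics.
    total = int(re.search(r"\(count\s+(\d+)\)", text).group(1))
    # ((VAR <predicate> <Class>) N): in this output the predicate is always rdf:type.
    by_class = {
        cls: int(n)
        for cls, n in re.findall(
            r"\(\(VAR\s+<[^>]+>\s+<([^>]+)>\)\s+(\d+)\)", text)
    }
    # (<predicate> N): number of triples using that predicate.
    by_predicate = {
        pred: int(n)
        for pred, n in re.findall(r"\(<([^>]+)>\s+(\d+)\)", text)
    }
    return total, by_class, by_predicate


total, by_class, by_predicate = parse_stats(STATS)

# Print predicates from most to least frequent, with their share of all triples.
for pred, n in sorted(by_predicate.items(), key=lambda kv: -kv[1]):
    print(f"{n:>8} {n / total:6.1%} {pred}")
```

Read this way, rdf:type accounts for roughly a fifth of the triples in
the graph, and almost all of those are dcat:CatalogRecord.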
On 30.10.2017 17:10, Andy Seaborne wrote:
Mikael,
I can't find anything that makes rdf:type special. Maybe some
distribution of data is the cause but I'm not seeing it.
Did you get a chance to get some stats?
Andy
On 27/10/17 12:27, Mikael Pesonen wrote:
Tried this also with other properties such as dcterms:created, and it
didn't slow down with them.
-Mikael
On 27.10.2017 13:02, Andy Seaborne wrote:
In this case, stats won't help. The <some resource> should be the
starting point.
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)
(quadpattern
(quad ?g ?s ?p <some:resource>)
(quad ?g ?s ?p2 ?o2)
)))
Are you using inference as well?
Is it the same <some resource>?
Is the timing for the rdf:type variant on a cold system?
Andy
On 27/10/17 10:22, Mikael Pesonen wrote:
Hi,
thanks! I'll try that when I get a chance to stop Jena. Yes, we are
using TDB.
On 26.10.2017 16:15, Rob Vesse wrote:
Is TDB the underlying database?
If so is there a stats.opt file in your database directory?
I remember there being issues in the past with the statistics for
rdf:type triples being wrongly prioritised. You might want to look
at that file, assuming that it exists, and try adjusting the
values associated with rdf:type based upon the guidance in the
documentation:
http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
Also, if this is a database which is being updated, then the
statistics can get out of date relative to the database. You can
use the command-line tdbstats tool to try regenerating them:
http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file
Note that you will need to stop Fuseki in order to run this, as
only a single process is permitted to access a TDB database at a time.
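
The regeneration step can be sketched roughly as below. This assumes
the tdbstats script from the Jena distribution is on the PATH and that
it accepts --loc and --graph as described in the TDB documentation; the
helper names tdbstats_command and regenerate_stats are made up for
illustration. The output goes to a temporary file first and is only
then moved to stats.opt in the database directory, so the command never
reads a half-written statistics file.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def tdbstats_command(db_dir, graph=None, tdbstats="tdbstats"):
    """Build the argv for tdbstats (assumed to be on PATH)."""
    cmd = [tdbstats, f"--loc={db_dir}"]
    if graph is not None:
        # Restrict the statistics to one named graph.
        cmd.append(f"--graph={graph}")
    return cmd


def regenerate_stats(db_dir, graph=None):
    """Write fresh statistics to a temp file, then move it to <db_dir>/stats.opt.

    Fuseki must be stopped first: only one process may open the database.
    """
    db_dir = Path(db_dir)
    with tempfile.NamedTemporaryFile("w", suffix=".opt", delete=False) as tmp:
        # check=True raises CalledProcessError if tdbstats exits non-zero.
        subprocess.run(tdbstats_command(db_dir, graph), stdout=tmp, check=True)
    shutil.move(tmp.name, str(db_dir / "stats.opt"))
```

For example, regenerate_stats("/path/to/DB") would run
"tdbstats --loc=/path/to/DB", where the path is a placeholder for the
real database directory.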
Rob
On 26/10/2017 13:47, "Mikael Pesonen" <[email protected]>
wrote:
Hi, I have trouble understanding why the first query is slow and the
second one is fast. Using Jena Fuseki 3.4.0.
So I want to get all resources that reference <some resource>, and
their types:
SELECT * WHERE
{
  GRAPH ?g
  {
    ?s ?p <some resource> .
    ?s a ?type
  }
}
SELECT * WHERE
{
  GRAPH ?g
  {
    ?s ?p <some resource> .
    ?s ?p2 ?o2
  }
}
The first one takes 5 seconds, which is too slow for our application.
Can it be rearranged somehow to make it fast? Sorry if this is not the
correct forum for this.
Thanks!
--
Lingsoft - 30 years of Leading Language Management
www.lingsoft.fi
Speech Applications - Language Management - Translation - Reader's and Writer's
Tools - Text Tools - E-books and M-books
Mikael Pesonen
System Engineer
e-mail: [email protected]
Tel. +358 2 279 3300
Time zone: GMT+2
Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND
Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND