Re: Slow query when getting rdf:type

ajs6f Tue, 07 Nov 2017 06:28:28 -0800

Take a look at the link Rob sent you again. Please read the _entire_ page 
carefully. Under:


http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

You will see: "The command line tdbstats will scan the data and produce a rules file based on the frequency of properties. The output should first go to a temporary file, then that file moved into the database location."


You need to actually use the output of tdbstats by moving it into your database 
directory.
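A minimal sketch of that workflow as a POSIX shell function. It assumes Jena's `tdbstats` command is on the PATH and that Fuseki has already been stopped, since only one process may access a TDB database at a time. The function name `regen_stats` is hypothetical; `--loc` is the tdbstats option for a database directory, and `stats.opt` is the statistics file name in that directory.

```shell
# Hypothetical helper: regenerate TDB optimizer statistics for the database
# directory given as $1. Assumes `tdbstats` (Apache Jena) is on PATH and that
# Fuseki, or any other process using this database, is stopped.
regen_stats() {
  db="$1"
  if [ ! -d "$db" ]; then
    echo "regen_stats: not a directory: $db" >&2
    return 1
  fi
  # Send the tdbstats output to a temporary file first...
  tmp="$(mktemp "${TMPDIR:-/tmp}/stats.XXXXXX")" || return 1
  if ! tdbstats --loc="$db" > "$tmp"; then
    echo "regen_stats: tdbstats failed" >&2
    rm -f "$tmp"
    return 1
  fi
  # mktemp creates the file readable only by its owner; widen that so the
  # server process can read it (an assumption; adjust to your setup).
  chmod 644 "$tmp"
  # ...then move that file into the database location as stats.opt.
  mv "$tmp" "$db/stats.opt"
}
```

Usage, with `/path/to/DB` standing in for your database directory: stop Fuseki, run `regen_stats /path/to/DB`, then start Fuseki again so it opens the database with the new file in place.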


ajs6f

Mikael Pesonen wrote on 11/7/17 6:30 AM:


Hi,

Sorry, I don't understand how tdbstats works. I ran it against the same graph
that the slow query runs on and got the result below (some lines removed).

Br,
Mikael

(stats
  (meta
    (timestamp "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
    (run@ "2017/11/07 13:24:16 EET")
    (count 165911))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>) 3)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>) 1098)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>) 897)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>) 36)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>) 1)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Expression>) 3)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>) 29284)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>) 1623)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>) 2)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>) 1100)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>) 5)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
...
  (<http://purl.org/dc/elements/1.1/format> 4697)
  (<http://purl.org/dc/terms/created> 725)
  (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
  (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
  (<http://purl.org/dc/elements/1.1/description> 6)
  (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
  (<http://purl.org/dc/terms/type> 1624)
  (<http://purl.org/dc/terms/accessRights> 2)
  (<http://purl.org/dc/terms/identifier> 78)
...
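To make output like this easier to read, here is a minimal POSIX shell sketch (the function name `top_predicates` is hypothetical) that keeps only the per-predicate totals, the `(<predicate> count)` lines, and sorts them by count, highest first. It assumes each such entry sits on a single line of the stats file; pattern entries such as `((VAR ...) n)` and the `meta` section are skipped. Among the lines shown above, rdf:type (34052) and foaf:primaryTopic (29264) would come out on top.

```shell
# Hypothetical helper: list per-predicate counts from a tdbstats output file,
# most frequent first. Only lines of the form "(<predicate-IRI> N)" are kept,
# printed as "N <predicate-IRI>"; pattern lines starting with "((" are ignored.
top_predicates() {
  sed -n -E 's/^[[:space:]]*\((<[^>]+>) ([0-9]+)\)[[:space:]]*$/\2 \1/p' "$1" \
    | sort -rn
}
```

For example, `top_predicates /path/to/DB/stats.opt | head`, where the path is a placeholder for your own database directory.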

On 30.10.2017 17:10, Andy Seaborne wrote:

Mikael,

I can't find anything that makes rdf:type special.  Maybe some distribution of 
data is the cause but I'm not seeing it.

Did you get a chance to get some stats?

    Andy


On 27/10/17 12:27, Mikael Pesonen wrote:


I also tried this with other properties, such as dcterms:created, and it didn't
slow down with them.

-Mikael


On 27.10.2017 13:02, Andy Seaborne wrote:

In this case, stats won't help.  The <some resource> should be the starting
point.

(quadpattern
  (quad ?g ?s ?p <some:resource>)
  (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)

(quadpattern
  (quad ?g ?s ?p <some:resource>)
  (quad ?g ?s ?p2 ?o2)
)

Are you using inference as well?

Is it the same <some resource>?

Is the timing for the rdf:type variant on a cold system?

    Andy



On 27/10/17 10:22, Mikael Pesonen wrote:


Hi,

Thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.



On 26.10.2017 16:15, Rob Vesse wrote:

Is TDB the underlying database?

If so, is there a stats.opt file in your database directory?

I remember there being issues in the past with the statistics for rdf:type
triples being wrongly prioritised. You might want to look at that file,
assuming that it exists, and try adjusting the values associated with rdf:type
based upon the guidance in the documentation:

http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file

Also, if this database is being updated, the statistics can get out of date
relative to the data. You can use the command-line tdbstats tool to regenerate
them:

http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

Note that you will need to stop Fuseki in order to run this, as only a single
process is permitted to access a TDB database at a time.

Rob

On 26/10/2017 13:47, "Mikael Pesonen" <[email protected]> wrote:

     Hi, I have trouble understanding why the first query is slow and the second
     one is fast. Using Jena Fuseki 3.4.0.
     I want to get all resources that reference <some resource>, and their
     types:
     SELECT * WHERE
     {
         GRAPH ?g
         {
             ?s ?p <some resource> .
             ?s a ?type
         }
     }

     SELECT * WHERE
     {
         GRAPH ?g
         {
             ?s ?p <some resource> .
             ?s ?p2 ?o2
         }
     }
     The first one takes 5 seconds, which is too slow for our application. Can it
     be rearranged somehow to make it fast? Sorry if this is not the correct forum
     for this.
     Thanks!
     --
