Thanks for explaining! The SPARQL query is still as slow as before, so it really is fetching rdf:type that slows it down.

Br,

On 7.11.2017 16:51, [email protected] wrote:
Yes. That is exactly the expected behavior. Please read the entire page.

It explains that the query optimizer can use the stats file to optimize the execution of queries. Any change you would expect to see in behavior will occur at query time. Try your queries again and see if there are changes in the execution times or query explanations.
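To make "the optimizer can use the stats file" more concrete, here is a minimal Python sketch of statistics-driven ordering: each triple pattern gets an estimated match count, and cheaper patterns are evaluated first. This is a simplified model for intuition only, not TDB's actual reordering code. The tuple representation, the `estimate` and `order_patterns` helpers, and the rule that a concrete subject or object counts as 1 are all assumptions; the counts in the example are taken from the tdbstats output posted in this thread.

```python
# Simplified model of statistics-driven pattern ordering (an assumption,
# NOT TDB's real algorithm). A pattern is a (subject, predicate, object)
# tuple; terms starting with "?" are variables.

RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def is_var(term):
    return term.startswith("?")


def estimate(pattern, pred_counts, total):
    """Hypothetical cost estimate: a smaller number means 'evaluate earlier'."""
    s, p, o = pattern
    if not is_var(s) or not is_var(o):
        # Assumption: a concrete subject or object matches very few triples.
        return 1
    if not is_var(p):
        # Per-predicate frequency as recorded in stats.opt; unknown -> total.
        return pred_counts.get(p, total)
    return total  # (?s ?p ?o) matches every triple


def order_patterns(patterns, pred_counts, total):
    """Greedy ordering: cheapest estimated pattern first (ties keep order)."""
    return sorted(patterns, key=lambda pat: estimate(pat, pred_counts, total))


if __name__ == "__main__":
    stats = {RDF_TYPE: 34039}  # rdf:type count from the tdbstats output
    query = [("?s", RDF_TYPE, "?type"), ("?s", "?p", "<some:resource>")]
    print(order_patterns(query, stats, 165865)[0])
    # prints ('?s', '?p', '<some:resource>')
```

A real optimizer also has to account for variables already bound by earlier patterns (once the first pattern runs, ?s is bound, so the rdf:type lookup happens per subject); this sketch ignores that.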


ajs6f

Mikael Pesonen wrote on 11/7/17 9:43 AM:

Thanks for the help. So I wrote the stats output to a temporary file and moved it to stats.opt in the index folder.

Rerunning tdbstats still seems to give the same result:

(stats
  (meta
    (timestamp "2017-11-07T16:39:53.665+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
    (run@ "2017/11/07 16:39:53 EET")
    (count 165865))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
   2)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
   1097)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>)
   896)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>)
   36)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>)
   29284)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>)
   1622)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>)
   1097)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>)
   5)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 57)
  (<http://resource.lingsoft.fi/rdf/resource/producer> 3)
  (<http://resource.lingsoft.fi/rdf/resource/applicationVersion> 1)
  (<http://purl.org/dc/elements/1.1/format> 4696)
  (<http://purl.org/dc/terms/created> 723)
  (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
  (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
  (<http://purl.org/dc/elements/1.1/description> 6)
  (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
  (<http://purl.org/dc/terms/type> 1623)
  (<http://purl.org/dc/terms/accessRights> 1)
  (<http://purl.org/dc/terms/identifier> 78)
  (<http://purl.org/dc/terms/hasFormat> 3016)
  (<http://purl.org/dc/terms/modified> 30899)

On 7.11.2017 16:27, [email protected] wrote:
Take a look at the link Rob sent you again. Please read the _entire_ page carefully. Under:

http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

You will see: "The command line tdbstats will scan the data and produce a rules file based on the frequency of properties. The output should first go to a temporary file, then that file moved into the database location."

You need to actually use the output of tdbstats by moving it into your database directory.
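The "temporary file first, then move it into the database location" step could be scripted roughly as below. This is a minimal Python sketch under assumptions: `install_stats` is a hypothetical helper, `stats_text` is the already captured output of tdbstats, and Fuseki (or any other process using the database) has been stopped, since only one process may access a TDB database at a time.

```python
import os
import shutil
import tempfile


def install_stats(stats_text, db_dir):
    """Hypothetical helper: write stats to a temporary file, then move it
    into the TDB database directory as stats.opt (replacing any old copy)."""
    # Step 1: write the complete output to a temporary file first.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".opt", delete=False, encoding="utf-8"
    ) as tmp:
        tmp.write(stats_text)
        tmp_path = tmp.name
    # Step 2: only a fully written file is moved into the database location.
    target = os.path.join(db_dir, "stats.opt")
    shutil.move(tmp_path, target)
    return target
```

Writing the whole output before moving it means the database directory never holds a half-written stats.opt. The optimizer then picks the file up when queries are run.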


ajs6f

Mikael Pesonen wrote on 11/7/17 6:30 AM:

Hi,

sorry, I don't understand how tdbstats works. I ran it against the same graph that the slow query runs on and got the
result below (some lines removed):

Br,
Mikael

(stats
  (meta
    (timestamp "2017-11-07T13:24:16.438+02:00"^^<http://www.w3.org/2001/XMLSchema#dateTime>)
    (run@ "2017/11/07 13:24:16 EET")
    (count 165911))
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Work>)
   3)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#FileDataObject>)
   1098)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Manifestation>)
   897)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept>)
   36)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo#InformationElement>)
   1)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/vocab/frbr/core#Expression>)
   3)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/ns/dcat#CatalogRecord>)
   29284)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Text>)
   1623)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#InformationElement>)
   2)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Document>)
   1100)
  ((VAR <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/dc/dcmitype/Collection>)
   5)
  (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34052)
  (<http://xmlns.com/foaf/0.1/primaryTopic> 29264)
  (<http://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#wordCount> 59)
...
  (<http://purl.org/dc/elements/1.1/format> 4697)
  (<http://purl.org/dc/terms/created> 725)
  (<http://www.w3.org/2004/02/skos/core#topConceptOf> 1)
  (<http://www.semanticdesktop.org/ontologies/2007/01/19/nie/#version> 1)
  (<http://purl.org/dc/elements/1.1/description> 6)
  (<http://www.w3.org/2004/02/skos/core#hiddenLabel> 35)
  (<http://purl.org/dc/terms/type> 1624)
  (<http://purl.org/dc/terms/accessRights> 2)
  (<http://purl.org/dc/terms/identifier> 78)
...

On 30.10.2017 17:10, Andy Seaborne wrote:
Mikael,

I can't find anything that makes rdf:type special. Maybe the distribution of the data is the cause, but I'm not seeing it.

Did you get a chance to get some stats?

    Andy


On 27/10/17 12:27, Mikael Pesonen wrote:

I also tried this with other properties, such as dcterms:created, and the query didn't slow down with them.

-Mikael


On 27.10.2017 13:02, Andy Seaborne wrote:
In this case, stats won't help. The <some resource> should be the starting point.

(quadpattern
  (quad ?g ?s ?p <some:resource>)
  (quad ?g ?s <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> ?type)
)

(quadpattern
  (quad ?g ?s ?p <some:resource>)
  (quad ?g ?s ?p2 ?o2)
)
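The point that <some resource> should be the starting point can be illustrated with a toy index nested-loop join. In this Python sketch, `ToyQuadStore` and its two dictionary indexes are hypothetical stand-ins for TDB's real quad indexes; it only shows why evaluating `(quad ?g ?s ?p <some:resource>)` first keeps the work proportional to the number of referring subjects, rather than to all rdf:type triples (34,039 in the stats posted in this thread).

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


class ToyQuadStore:
    """Hypothetical in-memory quad store with two lookup indexes."""

    def __init__(self, quads):
        self.by_object = defaultdict(list)  # o -> [(g, s, p), ...]
        self.by_gsp = defaultdict(list)     # (g, s, p) -> [o, ...]
        for g, s, p, o in quads:
            self.by_object[o].append((g, s, p))
            self.by_gsp[(g, s, p)].append(o)

    def referrers_with_types(self, resource):
        """Evaluate (?g ?s ?p <resource>) first, then, for each binding,
        look up (?g ?s rdf:type ?type) in the same graph."""
        results = []
        for g, s, p in self.by_object.get(resource, []):
            for t in self.by_gsp.get((g, s, RDF_TYPE), []):
                results.append({"g": g, "s": s, "p": p, "type": t})
        return results


if __name__ == "__main__":
    store = ToyQuadStore([
        ("g1", "ex:a", "ex:cites", "ex:r"),
        ("g1", "ex:a", RDF_TYPE, "foaf:Document"),
        ("g1", "ex:b", RDF_TYPE, "foaf:Document"),
    ])
    print(store.referrers_with_types("ex:r"))
    # [{'g': 'g1', 's': 'ex:a', 'p': 'ex:cites', 'type': 'foaf:Document'}]
```

The type of ex:b is never examined, because ex:b does not reference ex:r.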

Are you using inference as well?

Is it the same <some resource>?

Is the timing for the rdf:type variant on a cold system?

    Andy



On 27/10/17 10:22, Mikael Pesonen wrote:

Hi,

thanks! I'll try that when I get a chance to stop Jena. Yes, we are using TDB.



On 26.10.2017 16:15, Rob Vesse wrote:
Is TDB the underlying database?

If so, is there a stats.opt file in your database directory?

I remember there being issues in the past with the statistics for rdf:type triples being wrongly prioritised. You might want to look at that file, assuming that it exists, and try adjusting the values associated with rdf:type
based on the guidance in the documentation:

http://jena.apache.org/documentation/tdb/optimizer.html#statistics-rule-file
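Before editing stats.opt by hand, it can help to check what the file currently says about rdf:type. The Python sketch below reads and rewrites the simple `(<predicate> count)` entries. It is line-oriented and assumes one entry per line, as in the tdbstats output posted in this thread; `predicate_counts` and `set_predicate_count` are hypothetical helpers, not part of Jena, and a real SSE parser would be more robust. Which value to choose should follow the documentation linked above.

```python
import re

# Matches a predicate-frequency line such as
#   (<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> 34039)
# optionally followed by extra ")" that close the enclosing (stats ...).
# Class-count entries start with "((VAR" and are deliberately not matched.
PRED_LINE = re.compile(r"^(\s*)\((<[^>\s]+>)\s+(\d+)\)(\)*)\s*$")


def predicate_counts(stats_text):
    """Return {predicate IRI: count} for the simple predicate entries."""
    counts = {}
    for line in stats_text.splitlines():
        m = PRED_LINE.match(line)
        if m:
            counts[m.group(2)] = int(m.group(3))
    return counts


def set_predicate_count(stats_text, predicate, new_count):
    """Return the stats text with the count for `predicate` replaced."""
    out = []
    for line in stats_text.splitlines():
        m = PRED_LINE.match(line)
        if m and m.group(2) == predicate:
            # Keep the original indentation and any trailing closing parens.
            line = f"{m.group(1)}({predicate} {new_count}){m.group(4)}"
        out.append(line)
    return "\n".join(out)
```

Edit the file only while Fuseki is stopped, for the same single-process reason that applies to running tdbstats.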

Also, if this database is being updated, the statistics can get out of date relative to the data. You can use the command-line tdbstats tool to regenerate them:

http://jena.apache.org/documentation/tdb/optimizer.html#generating-a-statistics-file

Note that you will need to stop Fuseki in order to run this, as only a single process is permitted to access a TDB
database at a time.

Rob

On 26/10/2017 13:47, "Mikael Pesonen" <[email protected]> wrote:

     Hi, I have trouble understanding why the first query is slow and the second
     one is fast. I'm using Jena Fuseki 3.4.0.
     I want to get all resources that reference <some resource>, and their
     types:
     SELECT * WHERE
     {
         GRAPH ?g
         {
             ?s ?p <some resource> .
             ?s a ?type
         }
     }

     SELECT * WHERE
     {
         GRAPH ?g
         {
             ?s ?p <some resource> .
             ?s ?p2 ?o2
         }
     }
     The first one takes 5 seconds, which is too slow for our application. Can it
     be rearranged somehow to make it fast? Sorry if this is not the correct forum
     for this.
     Thanks!
     --

--
Lingsoft - 30 years of Leading Language Management

www.lingsoft.fi

Speech Applications - Language Management - Translation - Reader's and Writer's Tools - Text Tools - E-books and M-books

Mikael Pesonen
System Engineer

e-mail: [email protected]
Tel. +358 2 279 3300

Time zone: GMT+2

Helsinki Office
Eteläranta 10
FI-00130 Helsinki
FINLAND

Turku Office
Kauppiaskatu 5 A
FI-20100 Turku
FINLAND
