Looking for Pari* with your SPARQL query on DBpedia takes 4 seconds on my supposedly efficient laptop CPU:
$ lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 94
Model name:            Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
Stepping:              3
CPU MHz:               2644.789
CPU max MHz:           3500,0000
CPU min MHz:           800,0000
BogoMIPS:              5181.67

I should try with an SSD.
I don't know whether TDB can exploit a multi-core CPU.
Also, I don't know whether I can pre-compile the query with a parameter supplied at runtime.

Anyway, I'll implement the ordering by triple count in Semantic_forms.
Maybe later it can be helpful within Jena-text.

2016-11-03 14:30 GMT+01:00 Osma Suominen <[email protected]>:

> Hi Jean-Marc!
>
>> AFAIK using the weights to order results is intimately linked to the text
>> index querying.
>> If I want the top 10 results, the search must have the weights beforehand
>> otherwise I must get all the results to filter later.
>> This is the reason for using AnalyzingInfixSuggester.
>> Lucene 4_9_1
>> https://lucene.apache.org/core/4_9_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>> Lucene 6_2_1
>> https://lucene.apache.org/core/6_2_1/suggest/org/apache/lucene/search/suggest/analyzing/AnalyzingInfixSuggester.html
>>
>> I guess this is what you call "performance reasons" .
>
> I don't see why you couldn't, in principle, do something like this:
>
> SELECT ?s (COUNT(*) as ?count)
> WHERE {
>   ?s text:query "édu*" .
>   ?s ?p ?o .
> }
> GROUP BY ?s
> ORDER BY DESC(?count)
> LIMIT 10
>
> (note: untested query)
>
> I'm sure it will get slow if the number of hits from the text index is
> more than a few dozen. But for a small number of results at a time, it
> might work.
>
>> As I wrote in the original post, "I'll have to implement also the callback
>> for updates
>> like class TextDocProducerTriples in Jena-text." .
>> http://jena.apache.org/documentation/javadoc/text/org/apache/jena/query/text/TextDocProducerTriples.html
>
> Isn't that called only when the indexed triple changes (e.g. the one with
> rdfs:label or skos:prefLabel or whatever property you are indexing), but
> not when other data related to the same subject changes? So if new triples
> are added for the same subject, but its label is unchanged, then the text
> index won't see the update and thus the count of references/triples won't
> be updated either.
>
> I may be wrong here, I'm not sure how the update tracking works.
>
> -Osma
>
> --
> Osma Suominen
> D.Sc. (Tech), Information Systems Specialist
> National Library of Finland
> P.O. Box 26 (Kaikukatu 4)
> 00014 HELSINGIN YLIOPISTO
> Tel. +358 50 3199529
> [email protected]
> http://www.nationallibrary.fi

--
Jean-Marc Vanel
Profile: http://163.172.179.125:9111/display?displayuri=http%3A%2F%2Fjmvanel.free.fr%2Fjmv.rdf%23me
Déductions SARL - Consulting, services, training,
Rule-based programming, Semantic Web
+33 (0)6 89 16 29 52
Twitter: @jmvanel , @jmvanel_fr ; chat: irc://irc.freenode.net#eulergui
