You haven’t really given enough details for anyone to provide a specific analysis.
How did you load the dataset? It is possible that it took the first hour to load the data, though that would be quite slow. How many triples are in the dataset? Do you have a spinning or flash disk? A flash disk will be much faster.

A long-running load would also explain why the query timed out: if the database was locked for a write transaction, no queries (which are read transactions) would be allowed to proceed during that time, as the concurrency model is Multiple Readers OR Single Writer (MRSW). Did other queries work in the same timeframe? I suspect not.

Since you mention Fuseki, I assume you use the default setup whereby you create a TDB database, i.e. you specify a --loc argument at the command line?

The performance problem here is most likely caused by the use of the regular expression. In order to answer the query, the query engine first has to find all matches for the triple patterns and then, for each possible match, evaluate the regular expression against it. This requires looking up the full value of the variable being filtered, which is held in a separate lookup table from the main indices, since the main indices store only internal identifiers for efficiency. This is known as dictionary encoding and is standard across most RDF databases.

Try removing the filter and simply counting the number of results; this will tell you how many times the engine has to attempt the regular expression.

For simpler searches you will likely be better served by using alternative functions, e.g. CONTAINS(?label, "entity") for a substring match or SAMETERM(?label, "entity") for an exact match, although on their own those won't give you case insensitivity.

If you intend to do a lot of text search you would be better off using the text indexing extensions.
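To make the last two suggestions concrete, here is a rough sketch that reuses the prefixes and triple patterns from your query. I have not run it against your data, and ?candidates is just a variable name I picked. The first query drops the filter and counts how many solutions the regular expression would have to be tested against:

```sparql
PREFIX wno:  <http://wordnet-rdf.princeton.edu/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Same pattern as the original query, minus the FILTER.
# The count is the number of rows the regex would be evaluated on.
SELECT (COUNT(*) AS ?candidates)
WHERE {
  ?synset a wno:Synset ;
          rdfs:label ?label ;
          wno:gloss ?gloss .
}
```

If you need case insensitivity without a regular expression, one option is to lower-case the value before the substring test. This assumes the search term is itself written in lower case; STR() is there only to strip any language tag before comparing:

```sparql
PREFIX wno:  <http://wordnet-rdf.princeton.edu/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT *
WHERE {
  ?synset a wno:Synset ;
          rdfs:label ?label ;
          wno:gloss ?gloss .
  # Substring match on the lower-cased lexical form of the label.
  FILTER CONTAINS(LCASE(STR(?label)), "entity")
}
LIMIT 10
```

Note that this still has to look up every label value, so it avoids the regular expression machinery but not the per-row lookup cost.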
Rob

On 27/04/2017 11:36, "Laura Morales" <[email protected]> wrote:

    -------------------------------------
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX wn31: <http://wordnet-rdf.princeton.edu/wn31/>
    PREFIX wno: <http://wordnet-rdf.princeton.edu/ontology#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

    SELECT *
    WHERE {
      ?synset a wno:Synset ;
              rdfs:label ?label ;
              wno:gloss ?gloss .
      FILTER regex(?label, "entity", "i")
    }
    LIMIT 10
    -------------------------------------

    what happened:

    - for a long time after loading the dataset (more than 1h), this last query timed out all the times I submitted it. So I thought it was a problem with indexes, and that I should read more about Jena indexes
    - but now all of a sudden it seems to work, albeit the query seems to take a few seconds to complete (which still feels a bit too slow since the database is local on the same machine, and the dataset is not *that* huge)

    Does anybody know what I've run into? Do indexes have anything to do with this, or maybe some jena/fuseki cache/bootstrap activity, or it's just some monkey business going on with my computer? This feels so strange because I don't think I have done anything relevant with my computer that could have influenced this query. I just loaded the dataset into Fuseki, then started querying...
