I just realized I forgot to mention that I'm using Jena TDB. I'm also
thinking of testing Virtuoso through the Jena API. Would the increase in
performance be worth the switch?
On 2017-10-11 12:57, George News wrote:
> Hi all,
>
> The project I'm working on currently has a TDB with approximately 100M
> triples and the size is increasing quite quickly. When I make a typical
> SPARQL query to get data from the system, it takes ages, sometimes
> more than 10-20 minutes. Performance-wise, I think this is not really
> user friendly. Therefore I need to know how I can increase the speed.
>
> I'm running the whole system on a machine with an Intel Xeon E312xx and
> 32 GB of RAM, and I often get OutOfMemory exceptions. The google.cache
> that Jena handles is the one that seems to be causing the
> problem.
>
> Are the figures I'm reporting normal (machine specs, response time,
> etc.)? Is it too big or too small?
>
> For the moment, we have decided to split the graph into pieces, that is,
> to generate a new named graph every now and then so that the amount of
> information stored in the "current" graph is smaller. When we then
> restrict the query to a set of graphs, things work better.
>
> Although this solution works, when we merge the graphs for historical
> queries, we face the same problem as before. So, how can we
> increase the speed?
>
> I cannot disclose the dataset or any part of it, but I will try to
> explain it somehow.
>
> - IDs for entities are approximately 255 random ASCII characters. Does
> the size of the IDs affect the speed of the SPARQL queries? If so, can
> I apply a Lucene index to the IDs in order to reduce the query time?
>
> - The depth of the graph, or of the information relationships, is around
> 7-8 levels at most, but most of the time queries only need to link 3-4
> levels.
>
> - Most of the queries include several patterns like:
> ?x myont:hasattribute ?b.
> ?a rdf:type ?b.
>
> They therefore check the class and subclasses of entities. Is there any
> way to speed up the inference, given that when I ask for the parent class
> I also get the child classes defined in my ontology?
>
> - I know the "." in a query acts more or less like a logical AND
> operation. Does the order of the statements have implications for
> performance? Should I start with the most restrictive ones? Should I
> start with the simplest ones, i.e. checking number values, etc.?
>
> - Some of the queries use spatial and time filtering. Is it worth
> implementing support for spatial searches with SPARQL? Is there any
> kind of index for time searches?
>
> Any help is more than welcome.
>
> Regards,
> Jorge
>
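
For reference, the workaround described in the quoted message (restricting a query to a set of named graphs) could be sketched roughly as below. This is only an illustration: the helper name `build_graph_restricted_query` and the `http://example.org/...` graph IRIs are made up for the example and are not part of the actual dataset or of the Jena API. It relies on standard SPARQL `FROM` clauses, which make the default graph of the query the merge of the listed graphs; how efficiently a given store evaluates that is a separate question.

```python
from typing import Iterable

RDF_PREFIX = "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>"


def build_graph_restricted_query(graph_iris: Iterable[str], pattern: str) -> str:
    """Build a SELECT query whose default graph is the merge of graph_iris.

    Hypothetical helper: it only assembles the query text and does not
    talk to any triple store.
    """
    iris = list(graph_iris)
    if not iris:
        raise ValueError("at least one graph IRI is required")
    for iri in iris:
        # Reject characters that may not appear inside a SPARQL IRI reference.
        if any(ch in iri for ch in '<>"{}|^`\\ '):
            raise ValueError(f"invalid IRI: {iri!r}")
    # One FROM clause per graph; SPARQL merges them into the default graph.
    from_clauses = "\n".join(f"FROM <{iri}>" for iri in iris)
    return f"{RDF_PREFIX}\nSELECT *\n{from_clauses}\nWHERE {{\n  {pattern}\n}}"


if __name__ == "__main__":
    # Example: query only the two most recent (hypothetical) graphs.
    print(build_graph_restricted_query(
        ["http://example.org/graph/2017-09", "http://example.org/graph/2017-10"],
        "?a rdf:type ?b .",
    ))
```

For a historical query, the same helper would simply be given the full list of graph IRIs, which is the case where the slowdown described above reappears.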