Hi,
minor comments inline:
On 23.10.2017 22:44, George News wrote:
> Hi Rob,
>
> Thanks for your really helpful comments.
>
> As you mention, the stack trace is not the one from the query. I actually
> don't have the query that originated that stack trace, as this was in
> production and I was not logging every query.
>
> My idea with the previous email was to see if we can understand what
> causes the memory issue by analysing the stack trace and then, with the
> examples, try to understand how I can speed up the system.
>
> 1) Stack trace
> I almost reached the same conclusion as you. In this sense I have tried
> to implement HTTP streaming on my server and write the resultset
> directly to an outputstream. Although the time performance is quite
> similar to generating a full string with the whole serialized resultset,
> I guess in terms of memory consumption it should be better (I don't know
> how to measure the memory being used in real time).
There are lots of Java profilers that could be used.
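If you just want a rough real-time number from inside your own code (an approximation only, since it ignores the off-heap memory-mapped files that TDB relies on), the standard library can give you the current heap usage:

```java
// Rough snapshot of current JVM heap usage via the standard library.
// Note: this only sees the Java heap, not TDB's memory-mapped files.
public class MemorySnapshot {
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        // totalMemory() = heap currently claimed by the JVM,
        // freeMemory()  = unused portion of that claim.
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // ... run the query / serialization you want to measure here ...
        byte[] payload = new byte[10 * 1024 * 1024]; // stand-in workload
        long after = usedHeapBytes();
        System.out.println("Approx. delta (MB): "
                + (after - before) / (1024.0 * 1024.0)
                + " (allocated " + payload.length + " bytes)");
    }
}
```

Keep in mind the garbage collector can run between the two snapshots, so deltas are only indicative; a real profiler remains the better tool.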
>
> 2) Inference
> The SPARQL example and the dataset were attached to try to understand if
> inference could be the cause of such a big delay in getting the
> responses. If you consider the format of the data and query normal, then
> we are working well.
> We have made some tests with and without inference and, as expected, the
> difference in performance is noticeable. However, the results are not as
> expected, as the rdf:type subclass hierarchy is not considered and some
> of the resources are not properly identified.
>
> In this sense, for instance considering a resource data like:
>
> ...
> {
> "@id" :
> "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity",
> "@type" : "http://purl.org/iot/vocab/m3-lite#AirTemperature"
> },
> ...
>
> We have modified it to :
> ...
> {
> "@id" :
> "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity",
> "@type" : [ "http://purl.org/iot/vocab/m3-lite#AirTemperature",
> "http://purl.oclc.org/NET/ssnx/ssn#SensingDevice"
> ]
> },
> ...
>
> This way we can ask for SensingDevice and for AirTemperature. But
> considering that in the ontology AirTemperature is a subclass of
> SensingDevice, and taking into account that the ontology model is not
> used in the query model, how can I infer the subClassOf? Do I have to
> manually include "AirTemperature rdfs:subClassOf SensingDevice" in my
> resource description? Isn't that the same as including the ontology
> model merged with the data model (for instance by using a union) when
> launching the select query?
Given that your model doesn't change over time, you could materialize
some/all inferences and write them back to TDB. Depending on which kind
of inferences you really need, this could even be done by some SPARQL
Update queries.
Admittedly, this would increase the number of triples in your dataset
and thus consume more disk space. On the other hand, inference then does
not have to be done at query time, i.e. querying should be faster.
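For the subclass case above, such a materialization could look roughly like this single SPARQL 1.1 Update (a sketch; you would run it once against the dataset, and restrict it to the classes you actually query for if the full closure is too large):

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# Assert every superclass of each resource's asserted types, so a query
# for the generic class (e.g. SensingDevice) matches without inference.
INSERT { ?s rdf:type ?super }
WHERE {
  ?s rdf:type ?type .
  ?type rdfs:subClassOf+ ?super .
}
```

This assumes the ontology (the rdfs:subClassOf triples) is loaded alongside the data when the update runs.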
Cheers,
Lorenz
>
> Again I really have to thank you guys. This list and the help you
> provide are great. I hope sometime I can pay it back.
>
> Regards,
> Jorge
>
>
>
> On 2017-10-23 17:33, Rob Vesse wrote:
>> Note that attachments are generally stripped from Apache mailing lists, it
>> is usually better to just cut and paste inline
>>
>> Since you CC’d me directly I did get the attachments, the most interesting
>> of which is the stack trace, included here for the benefit of the rest of
>> the list with some analysis interspersed.
>>
>> default task-38
>> at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
>> at
>> org.apache.jena.ext.com.google.common.cache.LocalCache$Strength$1.referenceValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$Segment;Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;I)Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ValueReference;
>> (LocalCache.java:382)
>> at
>>
>> As I expected it does appear to be in the cache where the stack trace
>> originates. However, since this is the standard Google Guava cache,
>> which is happily used in many production installations across many
>> different companies and many different projects far beyond Jena, I
>> would suspect that it is actually something else that is the root cause.
>>
>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.setValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;Ljava/lang/Object;J)V
>> (LocalCache.java:2165)
>> at
>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.put(Ljava/lang/Object;ILjava/lang/Object;Z)Ljava/lang/Object;
>> (LocalCache.java:2883)
>> at
>> org.apache.jena.ext.com.google.common.cache.LocalCache.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>> (LocalCache.java:4149)
>> at
>> org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.put(Ljava/lang/Object;Ljava/lang/Object;)V
>> (LocalCache.java:4754)
>> at
>> org.apache.jena.atlas.lib.cache.CacheGuava.put(Ljava/lang/Object;Ljava/lang/Object;)V
>> (CacheGuava.java:76)
>> at
>> org.apache.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(Lorg/apache/jena/graph/Node;Lorg/apache/jena/tdb/store/NodeId;)V
>> (NodeTableCache.java:207)
>> at
>> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>> (NodeTableCache.java:129)
>> at
>> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>> (NodeTableCache.java:82)
>> at
>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>> (NodeTableWrapper.java:50)
>> at
>> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>> (NodeTableInline.java:67)
>> at
>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>> (NodeTableWrapper.java:50)
>> at
>> org.apache.jena.tdb.solver.BindingTDB.get1(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node;
>> (BindingTDB.java:122)
>> at
>> org.apache.jena.sparql.engine.binding.BindingBase.get(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node;
>> (BindingBase.java:121)
>> at
>> org.apache.jena.sparql.expr.ExprLib.evalOrElse(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;Lorg/apache/jena/sparql/expr/NodeValue;)Lorg/apache/jena/sparql/expr/NodeValue;
>> (ExprLib.java:70)
>> at
>> org.apache.jena.sparql.expr.ExprLib.evalOrNull(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)Lorg/apache/jena/sparql/expr/NodeValue;
>> (ExprLib.java:38)
>> at
>> org.apache.jena.sparql.expr.aggregate.AccumulatorExpr.accumulate(Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)V
>> (AccumulatorExpr.java:50)
>> at
>> org.apache.jena.sparql.engine.iterator.QueryIterGroup$1.initializeIterator()Ljava/util/Iterator;
>> (QueryIterGroup.java:111)
>>
>> This shows that you are using grouping. This wasn’t in the example query you
>> sent, so this is not a stack trace associated with that specific query.
>>
>> at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init()V
>> (IteratorDelayedInitialization.java:40)
>> at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext()Z
>> (IteratorDelayedInitialization.java:50)
>> at
>> org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding()Z
>> (QueryIterPlainWrapper.java:53)
>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>> (QueryIteratorBase.java:114)
>> at
>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding()Z
>> (QueryIterProcessBinding.java:66)
>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>> (QueryIteratorBase.java:114)
>> at
>> org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding()Z
>> (QueryIterConvert.java:58)
>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>> (QueryIteratorBase.java:114)
>> at
>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z
>> (QueryIteratorWrapper.java:39)
>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>> (QueryIteratorBase.java:114)
>> at
>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z
>> (QueryIteratorWrapper.java:39)
>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>> (QueryIteratorBase.java:114)
>> at org.apache.jena.sparql.engine.ResultSetStream.hasNext()Z
>> (ResultSetStream.java:74)
>> at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext()Z
>> (ResultSetCheckCondition.java:55)
>> at
>> es.semantic.project.serialize.ResultSetSerializer.asJSON()Ljava/lang/String;
>> (ResultSetSerializer.java:161)
>>
>> This looks suspect to me: this is a method from your own code that produces
>> a JSON string encoding a result set. Depending on the result set, the
>> string could be very large and occupy a large portion of memory. Equally,
>> if you are handling many requests in parallel, storing many otherwise
>> reasonably sized strings could exhaust memory.
>>
>> The general practice is to stream results directly back to the client; the
>> details of how you do that will be specific to the framework you are using.
>> This avoids buffering the entire result set in memory, which may also explain
>> some of your perceived performance issues, because your users are always
>> forced to wait for the complete result set to be calculated before getting
>> any response.
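As a stdlib-only illustration of that difference (this is not Jena's API; a real implementation would iterate the Jena ResultSet and escape the values properly): writing each row as it is produced keeps memory bounded by one row instead of the whole serialized result set.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Sketch: serialize rows to the output stream incrementally instead of
// building one huge JSON String first.
public class StreamingJsonWriter {
    public static void writeRows(Iterator<String> rows, OutputStream out) {
        try {
            Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            w.write("{\"results\":[");
            boolean first = true;
            while (rows.hasNext()) {
                if (!first) w.write(",");
                // Each row is written as soon as it is available, so memory
                // use is bounded by a single row, not the full result set.
                w.write("\"" + rows.next() + "\"");
                first = false;
            }
            w.write("]}");
            w.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a JAX-RS service like the one in the stack trace, such a writer would typically be handed to the container via a StreamingOutput, so the framework pulls the data instead of receiving one finished String.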
>>
>> at
>> es.semantic.project.serialize.Serializer.writeAs(Ljava/lang/String;)Ljava/lang/String;
>> (Serializer.java:98)
>> at
>> es.semantic.project.serialize.Serializer.writeAs(Ljavax/ws/rs/core/MediaType;)Ljava/lang/String;
>> (Serializer.java:69)
>> at
>> es.semantic.project.rest.QueryRestService.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response;
>> (QueryRestService.java:340)
>> at
>> es.semantic.project.rest.QueryRestService$Proxy$_$$_WeldClientProxy.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response;
>> (Unknown Source)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>> (Native Method)
>> at
>> sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>> (NativeMethodAccessorImpl.java:62)
>> at
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>> (DelegatingMethodAccessorImpl.java:43)
>> at
>> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>> (Method.java:498)
>> at
>> org.jboss.resteasy.core.MethodInjectorImpl.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Ljava/lang/Object;
>> (MethodInjectorImpl.java:139)
>> at
>> org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>> (ResourceMethodInvoker.java:295)
>> at
>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>> (ResourceMethodInvoker.java:249)
>> at
>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>> (ResourceMethodInvoker.java:236)
>> at
>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Lorg/jboss/resteasy/core/ResourceInvoker;)V
>> (SynchronousDispatcher.java:395)
>> at
>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)V
>> (SynchronousDispatcher.java:202)
>> at
>> org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;Z)V
>> (ServletContainerDispatcher.java:221)
>> at
>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>> (HttpServletDispatcher.java:56)
>> at
>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>> (HttpServletDispatcher.java:51)
>> at
>> javax.servlet.http.HttpServlet.service(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>> (HttpServlet.java:790)
>> at
>> io.undertow.servlet.handlers.ServletHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (ServletHandler.java:85)
>> at
>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>> (FilterHandler.java:129)
>> at
>> org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V
>> (Log4jServletFilter.java:71)
>> at
>> io.undertow.servlet.core.ManagedFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V
>> (ManagedFilter.java:60)
>> at
>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>> (FilterHandler.java:131)
>> at
>> io.undertow.servlet.handlers.FilterHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (FilterHandler.java:84)
>> at
>> io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (ServletSecurityRoleHandler.java:62)
>> at
>> io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (ServletDispatchingHandler.java:36)
>> at
>> org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (SecurityContextAssociationHandler.java:78)
>> at
>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (PredicateHandler.java:43)
>> at
>> io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (SSLInformationAssociationHandler.java:131)
>> at
>> io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (ServletAuthenticationCallHandler.java:57)
>> at
>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (PredicateHandler.java:43)
>> at
>> io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (AbstractConfidentialityHandler.java:46)
>> at
>> io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (ServletConfidentialityConstraintHandler.java:64)
>> at
>> io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (AuthenticationMechanismsHandler.java:60)
>> at
>> io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (CachedAuthenticatedSessionHandler.java:77)
>> at
>> io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (NotificationReceiverHandler.java:50)
>> at
>> io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (AbstractSecurityContextAssociationHandler.java:43)
>> at
>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (PredicateHandler.java:43)
>> at
>> org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (JACCContextIdHandler.java:61)
>> at
>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (PredicateHandler.java:43)
>> at
>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (PredicateHandler.java:43)
>> at
>> io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletChain;Lio/undertow/servlet/handlers/ServletRequestContext;Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>> (ServletInitialHandler.java:284)
>> at
>> io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V
>> (ServletInitialHandler.java:263)
>> at
>> io.undertow.servlet.handlers.ServletInitialHandler.access$000(Lio/undertow/servlet/handlers/ServletInitialHandler;Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V
>> (ServletInitialHandler.java:81)
>> at
>> io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>> (ServletInitialHandler.java:174)
>> at
>> io.undertow.server.Connectors.executeRootHandler(Lio/undertow/server/HttpHandler;Lio/undertow/server/HttpServerExchange;)V
>> (Connectors.java:202)
>> at io.undertow.server.HttpServerExchange$1.run()V
>> (HttpServerExchange.java:793)
>> at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>> (ThreadPoolExecutor.java:1142)
>> at java.util.concurrent.ThreadPoolExecutor$Worker.run()V
>> (ThreadPoolExecutor.java:617)
>> at java.lang.Thread.run()V (Thread.java:745)
>>
>> Nothing looks obviously wrong with either your query or your sample data,
>> though as I noted you didn’t provide the exact query that triggered this
>> particular stack trace.
>>
>> Rob
>>
>> On 23/10/2017 15:59, "George News" <[email protected]> wrote:
>>
>> On 2017-10-11 15:47, Rob Vesse wrote:
>> > Comments inline:
>> >
>> > On 11/10/2017 11:57, "George News" <[email protected]> wrote:
>> >
>> > Hi all,
>> >
>> > The project I'm working on currently has a TDB with approximately 100M
>> > triples and the size is increasing quite quickly. When I make a typical
>> > SPARQL query for getting data from the system, it takes ages, sometimes
>> > more than 10-20 minutes. Performance-wise I think this is not really
>> > user friendly. Therefore I need to know how I can increase the speed, etc.
>>
>> We have made a tdbdump of the current TDB, and the size for the figures
>> we pointed out is about 70GB in RDF/XML format.
>>
>> > I'm running the whole system on a machine with an Intel Xeon E312xx
>> > with 32GB RAM, and many times I'm getting OutOfMemory exceptions; the
>> > Google cache that Jena handles seems to be the one causing the
>> > problem.
>> >
>> > Specific stack traces would be useful to understand where the cache
>> is being blown up. Certain kinds of queries may use the cache more heavily
>> than others, so some elaboration on the general construction of the queries
>> would be interesting.
>>
>> Find the attached ExceptionStackTrace.txt file as an example. Most of
>> the time the error is quite similar.
>>
>> >
>> > Are the figures I'm pointing normal (machine specs, response time,
>> > etc.)? Is it too big/too small?
>> >
>> > The size of the data seems small relative to the size of the machine.
>> You don’t specify whether you changed the JVM heap size. Most memory usage in
>> TDB is off-heap via memory-mapped files, so setting too large a heap can
>> negatively impact performance.
>> >
>> > The response times seem very poor, but that may be due to the nature of your
>> queries and data structure; however, since you are unable to share those, we
>> can only provide generalisations.
>> >
>> > For the moment, we have decided to split the graph into pieces, that is,
>> > generating a new named graph every now and then so that the amount of
>> > information stored in the "current" graph is smaller. Then, restricting
>> > the query to a set of graphs, things work better.
>> >
>> > Although this solution works, when we merge the graphs for historical
>> > queries, we are facing the same problem as before. Then, how can we
>> > increase the speed?
>> >
>> > I cannot disclose the dataset or part of it, but I will try to
>> > somehow explain it.
>> >
>> > - Ids for entities are approximately 255 random ASCII characters. Does
>> > the size of the ids affect the speed of the SPARQL queries? If yes, can
>> > I apply a Lucene index to the IDs in order to reduce the query time?
>> >
>> > It depends on the nature of the query. All terms are mapped into
>> 64-bit internal identifiers; these are only mapped back to the original
>> terms as and when the query engine and/or result serialisation requires
>> it. A cache is used to speed up the mapping in both directions, so depending
>> on the nature of the queries and your system load you may be thrashing this
>> cache.
>> >
>> > - The depth level of the graph or the information relationships is
>> > around 7-8 levels at most, but most of the time it is required to link
>> > 3-4 levels.
>> >
>> > Difficult to say how this impacts performance because it really
>> depends on how you are querying that structure
>> >
>> > - Most of the queries include several:
>> > ?x myont:hasattribute ?b.
>> > ?a rdf:type ?b.
>> >
>> > Therefore checking the class and subclasses of entities. Is there any
>> > way to speed up the inference so that if I'm asking for the parent
>> > class I will also get the child ones defined in my ontology?
>> >
>> > So are you actively using inference? If you are, that will
>> significantly degrade performance, because the inference closure is done
>> entirely in memory, i.e. not in TDB. With inference turned on you will
>> get minimal performance benefit from using TDB.
>> >
>> > If you only need simple inference like class and property hierarchy
>> you may be better served by asserting those statically using SPARQL updates
>> and not using dynamic inference
>>
>> Sorry for the delay in providing examples of the data and SPARQL
>> queries we usually make.
>>
>> The data we are using follows the ontology that is publicly
>> available at [1].
>>
>> Using this ontology, a sample of a semantic document in JSON-LD format
>> can be found in the files attached (Observation.jsonld and
>> Resource.jsonld). These individuals are stored in a TDB using Jena, in
>> different graphs that can be merged within the code using MultiUnion in
>> order to make queries. Then we request data using SPARQL Select queries
>> with quite a lot of inference required (SPARQL.txt).
>>
>> As you suggested, we have made some attempts at including more properties
>> (mainly rdf:type) in the individual descriptions in order to avoid
>> inference in the requests. For instance, whenever I register an
>> m3-lite#AirThermometer I always also state that it is a
>> ssn#SensingDevice. This way the device can be easily discovered by its
>> more descriptive name and/or by its generic one.
>>
>> However, the results are not as expected using the same SPARQL
>> sentences, and I have to create specific SPARQL queries to properly
>> discover the data. Is this the way you suggested to work? Should I then
>> inform the users of our system about the way we are registering data?
>>
>> [1]: http://ontology.fiesta-iot.eu/ontologyDocs/fiesta-iot/doc
>>
>>
>> > - I know the "." in a query acts more or less like an AND logical
>> > operation. Does the order of sentences have implications for the
>> > performance? Should I start with the most restrictive ones? Should I
>> > start with the simplest ones, i.e. checking number values, etc.?
>> >
>> > Yes and no. TDB will attempt to do the necessary scans in an optimal
>> order based on its knowledge of the statistics of the data. However this
>> only applies within a single query pattern i.e. { } so depending on the
>> structure of your query you may need to do some manual reordering. Also if
>> inference is involved then that may interact.
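To illustrate the manual reordering point (with hypothetical prefixes and predicates, not taken from your ontology): placing the most selective pattern first inside a group can shrink the intermediate bindings early when the optimizer's statistics are off.

```sparql
PREFIX rdf:   <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myont: <http://example.org/myont#>  # hypothetical namespace

# The literal-valued pattern matches far fewer triples than the broad
# rdf:type pattern, so evaluating it first keeps the bindings small.
SELECT ?obs WHERE {
  ?obs myont:hasValue "42.0" .        # most selective pattern first
  ?obs rdf:type myont:Observation .   # broad pattern afterwards
}
```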
>> >
>> > - Some of the queries use spatial and time filtering. Is it worth
>> > implementing support for spatial searches with SPARQL? Is there any
>> > kind of index for time searches?
>> >
>> > There is a geospatial indexing extension but there is no temporal
>> indexing provided by Jena.
>>
>> As you can see from Resource.jsonld, we are using location. Do you
>> think indexing will help in locating the individuals?
>> >
>> > Any help is more than welcome.
>> >
>> > Without more detail it is difficult to provide more detailed help.
>> >
>> > Rob
>> >
>> > Regards,
>> > Jorge
>> >
>> >
>> >
>> >
>> >
>> >
>>
>>
>>
>>
>>
>>