On 2017-10-24 16:56, Lorenz Buehmann wrote:
> Hi,
>
> minor comments inline:
>
>
> On 23.10.2017 22:44, George News wrote:
>> Hi Rob,
>>
>> Thanks for your really helpful comments.
>>
>> As you mention, the stack trace is not the one of the example query. I
>> actually don't have the query that originated that stack trace, as this
>> was in production and I was not logging every query.
>>
>> My idea with the previous email was to see if we can understand what
>> causes the memory issue by analysing the stack trace and then, with the
>> examples, to work out how I can speed up the system.
>>
>> 1) Stack trace
>> I almost reached the same conclusion as you. In this sense I have tried
>> to implement HTTP streaming on my server and to write the result set
>> directly to an output stream. Although in terms of time the performance
>> is quite similar to generating a full string with the whole serialized
>> result set, I guess in terms of memory consumption it should be better
>> (I don't know how to measure the memory being used in real time).
> There are lots of Java profilers that could be used.
>>
>> 2) Inference
>> The SPARQL example and the dataset were attached to try to understand
>> whether inference could be the cause of such a big delay in getting the
>> responses. If you consider the format of the data and the query normal,
>> then we are working well.
>> We have made some tests with and without inference and, as expected,
>> the difference in performance is noticeable. However, the results are
>> not the expected ones, as rdf:type subclasses are not considered and
>> some of the resources are not properly identified.
>>
>> In this sense, consider for instance a resource description like:
>>
>> ...
>> {
>> "@id" :
>> "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity",
>> "@type" : "http://purl.org/iot/vocab/m3-lite#AirTemperature"
>> },
>> ...
>>
>> We have modified it to:
>> ...
>> {
>> "@id" :
>> "http://virtual-testbed.org/1/R3081091219.temperature-ambient-sensor-0.quantity",
>> "@type" : [ "http://purl.org/iot/vocab/m3-lite#AirTemperature",
>> "http://purl.oclc.org/NET/ssnx/ssn#SensingDevice"
>> ]
>> },
>> ...
>>
>> This way we can ask for SensingDevice and for AirTemperature. But
>> considering that in the ontology AirTemperature is a subclass of
>> SensingDevice, and taking into account that the ontology model is not
>> used in the query model, how can I infer the subClassOf relation? Do I
>> have to manually include "AirTemperature rdfs:subClassOf SensingDevice"
>> in my resource description? Isn't that the same as including the
>> ontology model merged with the data model (for instance by using a
>> union) when launching the SELECT query?
> Given that your model doesn't change over time, you could materialize
> some/all inferences and write them back to TDB. Depending on which kind
> of inferences you really need, this could even be done with some SPARQL
> Update queries.
> Of course, this would increase the number of triples in your dataset
> and thus consume more disk space. On the other hand, inference does not
> have to be done at query time, i.e. querying should be faster.
>
I'm now having a deeper look at [1] in order to understand how to
extract the inferred triples. This way I can include them in the models
stored in TDB and later instruct the users of the system on how to query
properly.
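To convince myself of what has to be asserted, here is a toy sketch in
plain Java (not the Jena API; with Jena I would build an RDFS InfModel
over the ontology plus the data and write its deductions back to TDB, as
described in [1]). The materialization boils down to computing the
transitive closure of rdfs:subClassOf and asserting the extra rdf:type
triples; the class and prefix names below are only illustrative:

```java
import java.util.*;

// Toy sketch of type materialization, independent of Jena: for every
// instance, assert rdf:type for each superclass reachable through
// rdfs:subClassOf. With Jena the same result comes from an RDFS InfModel
// whose deductions are written back to TDB, so the closure is paid once
// at load time instead of at every query.
public class TypeMaterializer {

    // rdfs:subClassOf edges: child class -> direct parent classes
    private static final Map<String, Set<String>> SUB_CLASS_OF = new HashMap<>();

    static void addSubClassOf(String child, String parent) {
        SUB_CLASS_OF.computeIfAbsent(child, k -> new HashSet<>()).add(parent);
    }

    // All superclasses reachable from cls (transitive closure, cycle-safe).
    static Set<String> superClasses(String cls) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(cls);
        while (!todo.isEmpty()) {
            for (String p : SUB_CLASS_OF.getOrDefault(todo.pop(), Set.of())) {
                if (seen.add(p)) {
                    todo.push(p);
                }
            }
        }
        return seen;
    }

    // Asserted types of an instance plus all inferred supertypes.
    static Set<String> materializedTypes(Set<String> assertedTypes) {
        Set<String> all = new LinkedHashSet<>(assertedTypes);
        for (String t : assertedTypes) {
            all.addAll(superClasses(t));
        }
        return all;
    }

    public static void main(String[] args) {
        // Hierarchy as described in the thread: AirTemperature is a
        // subclass of SensingDevice in the ontology.
        addSubClassOf("m3-lite:AirTemperature", "ssn:SensingDevice");
        System.out.println(
            materializedTypes(Set.of("m3-lite:AirTemperature")));
    }
}
```

Once those extra rdf:type triples are stored, a plain SELECT over TDB
finds the AirTemperature sensors when asking for SensingDevice, with no
reasoner in the query path.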
Hope that works ;)
Jorge
[1]: https://jena.apache.org/documentation/inference/
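As a fallback I may also compare against a SPARQL 1.1 property path,
which walks the class hierarchy at query time without any inference
engine. This is only a sketch; it assumes the ontology's rdfs:subClassOf
triples are loaded in the same union graph as the data, and the prefixes
are illustrative:

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ssn:  <http://purl.oclc.org/NET/ssnx/ssn#>

# Find every resource whose asserted type is SensingDevice or any
# (transitive) subclass of it, without enabling an inference model.
SELECT ?dev WHERE {
  ?dev rdf:type/rdfs:subClassOf* ssn:SensingDevice .
}
```

TDB evaluates this directly against the stored triples, so nothing needs
materializing, at the cost of walking the hierarchy on every query.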
>
> Cheers,
>
> Lorenz
>>
>> Again, I really have to thank you guys. This list and the help you
>> provide are great. I hope I can pay it back sometime.
>>
>> Regards,
>> Jorge
>>
>>
>>
>> On 2017-10-23 17:33, Rob Vesse wrote:
>>> Note that attachments are generally stripped from Apache mailing lists;
>>> it is usually better to just cut and paste inline.
>>>
>>> Since you CC’d me directly I did get the attachments, the most
>>> interesting of which is the stack trace, included here for the benefit
>>> of the rest of the list with some analysis interspersed.
>>>
>>> default task-38
>>> at java.lang.OutOfMemoryError.<init>()V (OutOfMemoryError.java:48)
>>> at
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$Strength$1.referenceValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$Segment;Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;I)Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ValueReference;
>>> (LocalCache.java:382)
>>> at
>>>
>>> As I expected, the stack trace does originate in the cache. However,
>>> since this is the standard Google Guava cache, which is happily used in
>>> many production installations across many different companies and
>>> projects far beyond Jena, I would suspect that something else is
>>> actually the root cause.
>>>
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.setValue(Lorg/apache/jena/ext/com/google/common/cache/LocalCache$ReferenceEntry;Ljava/lang/Object;Ljava/lang/Object;J)V
>>> (LocalCache.java:2165)
>>> at
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$Segment.put(Ljava/lang/Object;ILjava/lang/Object;Z)Ljava/lang/Object;
>>> (LocalCache.java:2883)
>>> at
>>> org.apache.jena.ext.com.google.common.cache.LocalCache.put(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;
>>> (LocalCache.java:4149)
>>> at
>>> org.apache.jena.ext.com.google.common.cache.LocalCache$LocalManualCache.put(Ljava/lang/Object;Ljava/lang/Object;)V
>>> (LocalCache.java:4754)
>>> at
>>> org.apache.jena.atlas.lib.cache.CacheGuava.put(Ljava/lang/Object;Ljava/lang/Object;)V
>>> (CacheGuava.java:76)
>>> at
>>> org.apache.jena.tdb.store.nodetable.NodeTableCache.cacheUpdate(Lorg/apache/jena/graph/Node;Lorg/apache/jena/tdb/store/NodeId;)V
>>> (NodeTableCache.java:207)
>>> at
>>> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>> (NodeTableCache.java:129)
>>> at
>>> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>> (NodeTableCache.java:82)
>>> at
>>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>> (NodeTableWrapper.java:50)
>>> at
>>> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>> (NodeTableInline.java:67)
>>> at
>>> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(Lorg/apache/jena/tdb/store/NodeId;)Lorg/apache/jena/graph/Node;
>>> (NodeTableWrapper.java:50)
>>> at
>>> org.apache.jena.tdb.solver.BindingTDB.get1(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node;
>>> (BindingTDB.java:122)
>>> at
>>> org.apache.jena.sparql.engine.binding.BindingBase.get(Lorg/apache/jena/sparql/core/Var;)Lorg/apache/jena/graph/Node;
>>> (BindingBase.java:121)
>>> at
>>> org.apache.jena.sparql.expr.ExprLib.evalOrElse(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;Lorg/apache/jena/sparql/expr/NodeValue;)Lorg/apache/jena/sparql/expr/NodeValue;
>>> (ExprLib.java:70)
>>> at
>>> org.apache.jena.sparql.expr.ExprLib.evalOrNull(Lorg/apache/jena/sparql/expr/Expr;Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)Lorg/apache/jena/sparql/expr/NodeValue;
>>> (ExprLib.java:38)
>>> at
>>> org.apache.jena.sparql.expr.aggregate.AccumulatorExpr.accumulate(Lorg/apache/jena/sparql/engine/binding/Binding;Lorg/apache/jena/sparql/function/FunctionEnv;)V
>>> (AccumulatorExpr.java:50)
>>> at
>>> org.apache.jena.sparql.engine.iterator.QueryIterGroup$1.initializeIterator()Ljava/util/Iterator;
>>> (QueryIterGroup.java:111)
>>>
>>> This shows that you are using grouping; this wasn’t in the example
>>> query you sent, so this is not a stack trace associated with that
>>> specific query.
>>>
>>> at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init()V
>>> (IteratorDelayedInitialization.java:40)
>>> at
>>> org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext()Z
>>> (IteratorDelayedInitialization.java:50)
>>> at
>>> org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding()Z
>>> (QueryIterPlainWrapper.java:53)
>>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>>> (QueryIteratorBase.java:114)
>>> at
>>> org.apache.jena.sparql.engine.iterator.QueryIterProcessBinding.hasNextBinding()Z
>>> (QueryIterProcessBinding.java:66)
>>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>>> (QueryIteratorBase.java:114)
>>> at
>>> org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding()Z
>>> (QueryIterConvert.java:58)
>>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>>> (QueryIteratorBase.java:114)
>>> at
>>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z
>>> (QueryIteratorWrapper.java:39)
>>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>>> (QueryIteratorBase.java:114)
>>> at
>>> org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding()Z
>>> (QueryIteratorWrapper.java:39)
>>> at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext()Z
>>> (QueryIteratorBase.java:114)
>>> at org.apache.jena.sparql.engine.ResultSetStream.hasNext()Z
>>> (ResultSetStream.java:74)
>>> at org.apache.jena.sparql.engine.ResultSetCheckCondition.hasNext()Z
>>> (ResultSetCheckCondition.java:55)
>>> at
>>> es.semantic.project.serialize.ResultSetSerializer.asJSON()Ljava/lang/String;
>>> (ResultSetSerializer.java:161)
>>>
>>> This looks suspect to me: this is a method from your own code that
>>> produces a JSON string encoding a result set. Depending on the result
>>> set, the string could be very large and occupy a large portion of
>>> memory. Equally, if you are handling many requests in parallel, storing
>>> many otherwise reasonably sized strings could exhaust memory.
>>>
>>> The general practice is to stream results directly back to the client;
>>> the details of how you do that will be specific to the framework you
>>> are using. This avoids buffering the entire result set in memory, and
>>> it also explains some of your perceived performance issues, because
>>> your users are currently forced to wait for the complete result set to
>>> be calculated before getting any response.
>>>
>>> at
>>> es.semantic.project.serialize.Serializer.writeAs(Ljava/lang/String;)Ljava/lang/String;
>>> (Serializer.java:98)
>>> at
>>> es.semantic.project.serialize.Serializer.writeAs(Ljavax/ws/rs/core/MediaType;)Ljava/lang/String;
>>> (Serializer.java:69)
>>> at
>>> es.semantic.project.rest.QueryRestService.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response;
>>> (QueryRestService.java:340)
>>> at
>>> es.semantic.project.rest.QueryRestService$Proxy$_$$_WeldClientProxy.executeQuery(Les/semantic/project/storage/sql/SparqlQuery$Scope;Ljava/lang/String;Ljavax/ws/rs/core/Request;)Ljavax/ws/rs/core/Response;
>>> (Unknown Source)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke0(Ljava/lang/reflect/Method;Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>> (Native Method)
>>> at
>>> sun.reflect.NativeMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>> (NativeMethodAccessorImpl.java:62)
>>> at
>>> sun.reflect.DelegatingMethodAccessorImpl.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>> (DelegatingMethodAccessorImpl.java:43)
>>> at
>>> java.lang.reflect.Method.invoke(Ljava/lang/Object;[Ljava/lang/Object;)Ljava/lang/Object;
>>> (Method.java:498)
>>> at
>>> org.jboss.resteasy.core.MethodInjectorImpl.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Ljava/lang/Object;
>>> (MethodInjectorImpl.java:139)
>>> at
>>> org.jboss.resteasy.core.ResourceMethodInvoker.invokeOnTarget(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>>> (ResourceMethodInvoker.java:295)
>>> at
>>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Ljava/lang/Object;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>>> (ResourceMethodInvoker.java:249)
>>> at
>>> org.jboss.resteasy.core.ResourceMethodInvoker.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)Lorg/jboss/resteasy/specimpl/BuiltResponse;
>>> (ResourceMethodInvoker.java:236)
>>> at
>>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;Lorg/jboss/resteasy/core/ResourceInvoker;)V
>>> (SynchronousDispatcher.java:395)
>>> at
>>> org.jboss.resteasy.core.SynchronousDispatcher.invoke(Lorg/jboss/resteasy/spi/HttpRequest;Lorg/jboss/resteasy/spi/HttpResponse;)V
>>> (SynchronousDispatcher.java:202)
>>> at
>>> org.jboss.resteasy.plugins.server.servlet.ServletContainerDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;Z)V
>>> (ServletContainerDispatcher.java:221)
>>> at
>>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljava/lang/String;Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>>> (HttpServletDispatcher.java:56)
>>> at
>>> org.jboss.resteasy.plugins.server.servlet.HttpServletDispatcher.service(Ljavax/servlet/http/HttpServletRequest;Ljavax/servlet/http/HttpServletResponse;)V
>>> (HttpServletDispatcher.java:51)
>>> at
>>> javax.servlet.http.HttpServlet.service(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>> (HttpServlet.java:790)
>>> at
>>> io.undertow.servlet.handlers.ServletHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (ServletHandler.java:85)
>>> at
>>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>> (FilterHandler.java:129)
>>> at
>>> org.apache.logging.log4j.web.Log4jServletFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V
>>> (Log4jServletFilter.java:71)
>>> at
>>> io.undertow.servlet.core.ManagedFilter.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;Ljavax/servlet/FilterChain;)V
>>> (ManagedFilter.java:60)
>>> at
>>> io.undertow.servlet.handlers.FilterHandler$FilterChainImpl.doFilter(Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>> (FilterHandler.java:131)
>>> at
>>> io.undertow.servlet.handlers.FilterHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (FilterHandler.java:84)
>>> at
>>> io.undertow.servlet.handlers.security.ServletSecurityRoleHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (ServletSecurityRoleHandler.java:62)
>>> at
>>> io.undertow.servlet.handlers.ServletDispatchingHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (ServletDispatchingHandler.java:36)
>>> at
>>> org.wildfly.extension.undertow.security.SecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (SecurityContextAssociationHandler.java:78)
>>> at
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (PredicateHandler.java:43)
>>> at
>>> io.undertow.servlet.handlers.security.SSLInformationAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (SSLInformationAssociationHandler.java:131)
>>> at
>>> io.undertow.servlet.handlers.security.ServletAuthenticationCallHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (ServletAuthenticationCallHandler.java:57)
>>> at
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (PredicateHandler.java:43)
>>> at
>>> io.undertow.security.handlers.AbstractConfidentialityHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (AbstractConfidentialityHandler.java:46)
>>> at
>>> io.undertow.servlet.handlers.security.ServletConfidentialityConstraintHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (ServletConfidentialityConstraintHandler.java:64)
>>> at
>>> io.undertow.security.handlers.AuthenticationMechanismsHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (AuthenticationMechanismsHandler.java:60)
>>> at
>>> io.undertow.servlet.handlers.security.CachedAuthenticatedSessionHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (CachedAuthenticatedSessionHandler.java:77)
>>> at
>>> io.undertow.security.handlers.NotificationReceiverHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (NotificationReceiverHandler.java:50)
>>> at
>>> io.undertow.security.handlers.AbstractSecurityContextAssociationHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (AbstractSecurityContextAssociationHandler.java:43)
>>> at
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (PredicateHandler.java:43)
>>> at
>>> org.wildfly.extension.undertow.security.jacc.JACCContextIdHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (JACCContextIdHandler.java:61)
>>> at
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (PredicateHandler.java:43)
>>> at
>>> io.undertow.server.handlers.PredicateHandler.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (PredicateHandler.java:43)
>>> at
>>> io.undertow.servlet.handlers.ServletInitialHandler.handleFirstRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletChain;Lio/undertow/servlet/handlers/ServletRequestContext;Ljavax/servlet/ServletRequest;Ljavax/servlet/ServletResponse;)V
>>> (ServletInitialHandler.java:284)
>>> at
>>> io.undertow.servlet.handlers.ServletInitialHandler.dispatchRequest(Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V
>>> (ServletInitialHandler.java:263)
>>> at
>>> io.undertow.servlet.handlers.ServletInitialHandler.access$000(Lio/undertow/servlet/handlers/ServletInitialHandler;Lio/undertow/server/HttpServerExchange;Lio/undertow/servlet/handlers/ServletRequestContext;Lio/undertow/servlet/handlers/ServletChain;Ljavax/servlet/DispatcherType;)V
>>> (ServletInitialHandler.java:81)
>>> at
>>> io.undertow.servlet.handlers.ServletInitialHandler$1.handleRequest(Lio/undertow/server/HttpServerExchange;)V
>>> (ServletInitialHandler.java:174)
>>> at
>>> io.undertow.server.Connectors.executeRootHandler(Lio/undertow/server/HttpHandler;Lio/undertow/server/HttpServerExchange;)V
>>> (Connectors.java:202)
>>> at io.undertow.server.HttpServerExchange$1.run()V
>>> (HttpServerExchange.java:793)
>>> at
>>> java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V
>>> (ThreadPoolExecutor.java:1142)
>>> at java.util.concurrent.ThreadPoolExecutor$Worker.run()V
>>> (ThreadPoolExecutor.java:617)
>>> at java.lang.Thread.run()V (Thread.java:745)
>>>
>>> Nothing looks obviously wrong with either your query or your sample
>>> data, though as I noted you didn’t provide the exact query that
>>> triggered this particular stack trace.
>>>
>>> Rob
>>>
>>> On 23/10/2017 15:59, "George News" <[email protected]> wrote:
>>>
>>> On 2017-10-11 15:47, Rob Vesse wrote:
>>> > Comments inline:
>>> >
>>> > On 11/10/2017 11:57, "George News" <[email protected]> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > The project I'm working on currently has a TDB with approximately
>>> > 100M triples and its size is increasing quite quickly. When I make a
>>> > typical SPARQL query for getting data from the system, it takes
>>> > ages, sometimes more than 10-20 minutes. Performance-wise I think
>>> > this is not really user friendly. Therefore I need to know how I can
>>> > increase the speed, etc.
>>>
>>> We have made a tdbdump of the current TDB, and the size for the figures
>>> we pointed out is about 70GB in RDF/XML format.
>>>
>>> > I'm running the whole system on a machine with an Intel Xeon E312xx
>>> > with 32GB RAM, and many times I'm getting OutOfMemory exceptions;
>>> > the google.cache that Jena handles seems to be the one causing the
>>> > problem.
>>> >
>>> > Specific stack traces would be useful to understand where the cache
>>> > is being exploded. Certain kinds of query may use the cache more
>>> > heavily than others, so some elaboration on the general construction
>>> > of your queries would be interesting.
>>>
>>> Find the attached ExceptionStackTrace.txt file as an example. Most of
>>> the time the error is quite similar.
>>>
>>> >
>>> > Are the figures I'm pointing out normal (machine specs, response
>>> > time, etc.)? Is it too big/too small?
>>> >
>>> > The size of the data seems small relative to the size of the
>>> > machine. You don’t specify whether you changed the JVM heap size;
>>> > most memory usage in TDB is off-heap via memory-mapped files, so
>>> > setting too large a heap can negatively impact performance.
>>> >
>>> > The response times seem very poor, but that may be the nature of
>>> > your queries and data structure; however, since you are unable to
>>> > show those, we can only provide generalisations.
>>> >
>>> > For the moment, we have decided to split the graph into pieces,
>>> > that is, generating a new named graph every now and then so that the
>>> > amount of information stored in the "current" graph is smaller.
>>> > Then, restricting the query to a set of graphs, things work better.
>>> >
>>> > Although this solution works, when we merge the graphs for
>>> > historical queries we are facing the same problem as before. How,
>>> > then, can we increase the speed?
>>> >
>>> > I cannot disclose the dataset or part of it, but I will try to
>>> > somehow explain it.
>>> >
>>> > - Ids for entities are approximately 255 random ASCII characters.
>>> > Does the size of the ids affect the speed of the SPARQL queries? If
>>> > yes, can I apply a Lucene index to the IDs in order to reduce the
>>> > query time?
>>> >
>>> > It depends on the nature of the query. All terms are mapped to
>>> > 64-bit internal identifiers; these are only mapped back to the
>>> > original terms as and when the query engine and/or results
>>> > serialisation requires it. A cache is used to speed up the mapping
>>> > in both directions, so depending on the nature of the queries and
>>> > your system load you may be thrashing this cache.
>>> >
>>> > - The depth of the graph or the information relationship is around
>>> > 7-8 levels at most, but most of the time it is only required to link
>>> > 3-4 levels.
>>> >
>>> > Difficult to say how this impacts performance, because it really
>>> > depends on how you are querying that structure.
>>> >
>>> > - Most of the queries include several patterns like:
>>> > ?x myont:hasattribute ?b.
>>> > ?a rdf:type ?b.
>>> >
>>> > Therefore checking the class and subclasses of entities. Is there
>>> > any way to speed up the inference, so that if I ask for the parent
>>> > class I also get the children classes defined in my ontology?
>>> >
>>> > So are you actively using inference? If you are, that will
>>> > significantly degrade performance, because the inference closure is
>>> > done entirely in memory (i.e. not in TDB) when inference is turned
>>> > on, and you will get minimal performance benefit from using TDB.
>>> >
>>> > If you only need simple inference like class and property
>>> > hierarchies, you may be better served by asserting those statically
>>> > using SPARQL Updates rather than using dynamic inference.
>>>
>>> Sorry for the delay in providing examples of the data and the SPARQL
>>> queries we usually make.
>>>
>>> The data we are using follows the ontology that is publicly available
>>> at [1].
>>>
>>> Using this ontology, a sample semantic document in JSON-LD format can
>>> be found in the attached files (Observation.jsonld and
>>> Resource.jsonld). These individuals are stored in a TDB using Jena, in
>>> different graphs that can be merged within the code using MultiUnion in
>>> order to make queries. Then we request data using SPARQL SELECT queries
>>> with quite a lot of inference required (SPARQL.txt).
>>>
>>> As you suggested, we have made some tries including more properties
>>> (mainly rdf:type) in the individual descriptions in order to avoid
>>> inference in the requests. For instance, whenever I register an
>>> m3-lite#AirThermometer I always also state that it is an
>>> ssn#SensingDevice. This way the device can easily be discovered both by
>>> its more descriptive name and by its generic one.
>>>
>>> However, the results are not the expected ones using the same SPARQL
>>> sentences, and I have to create specific SPARQL queries to properly
>>> discover the data. Is this the way you suggested to work? Should I then
>>> inform the users of our system about the way we are registering data?
>>>
>>> [1]: http://ontology.fiesta-iot.eu/ontologyDocs/fiesta-iot/doc
>>>
>>>
>>> > - I know the "." in a query acts more or less like a logical AND
>>> > operation. Does the order of sentences have implications for
>>> > performance? Should I start with the most restrictive ones? Should I
>>> > start with the simplest ones, i.e. checking number values, etc.?
>>> >
>>> > Yes and no. TDB will attempt to do the necessary scans in an optimal
>>> > order based on its knowledge of the statistics of the data. However,
>>> > this only applies within a single query pattern, i.e. { }, so
>>> > depending on the structure of your query you may need to do some
>>> > manual reordering. Also, if inference is involved, it may interact.
>>> >
>>> > - Some of the queries use spatial and time filtering. Is it worth
>>> > implementing support for spatial searches with SPARQL? Is there any
>>> > kind of index for time searches?
>>> >
>>> > There is a geospatial indexing extension, but there is no temporal
>>> > indexing provided by Jena.
>>>
>>> As you can see from Resource.jsonld, we are using location. Do you
>>> think indexing will help in locating the individuals?
>>> >
>>> > Any help is more than welcome.
>>> >
>>> > Without more detail it is difficult to provide more specific help.
>>> >
>>> > Rob
>>> >
>>> > Regards,
>>> > Jorge
>>> >