Re: Using JENA directly, instead of Fuseki/SPARQL

Dave Reynolds Fri, 22 Sep 2017 00:22:26 -0700

Sorry, missed this question ...

It depends on the scale of the data, the size of the tomcat application, the machine sizes available and how much API-side in-memory caching you want to do.

We use both styles successfully. For modest data even at high load, or for large data at modest load, having both on the same machine works fine and is slightly easier to scale out, so long as your machines have a reasonable memory footprint for the data scale. For larger data (300-400MT plus) with either significant query rates or very memory-hungry applications we split the data and front end tiers.


Dave

On 19/09/17 20:02, Dimov, Stefan wrote:

Thanks for the response!

One more question:

Would it be better if I put Tomcat on one machine and have Fuseki on another?

Provided they are both in the same network and the connection between them is 
unobstructed, wouldn’t this improve the performance, considering they don’t 
share memory/CPU?

Regards,
Stefan

On 9/19/17, 5:24 AM, "Dave Reynolds" <[email protected]> wrote:

     On 19/09/17 11:33, George News wrote:
     >
     > On 2017-09-19 09:57, Dave Reynolds wrote:
     >> On 19/09/17 01:13, Dimov, Stefan wrote:
     >>> Hi,
     >>>
     >>> I have Tomcat setup, that receives REST requests, “translates” them
     >>> into SPARQL queries, invokes them on the underlying FUSEKI and returns
     >>> the results:
     >>>
     >>>
     >>> USER AGENT
     >>> ^
     >>> REST
     >>> v
     >>> ---------------
     >>> TOMCAT
     >>> ^
     >>> REST
     >>> v
     >>> -------------
     >>> FUSEKI
     >>> ------------
     >>> JENA
     >>> -----------
     >>> TDB
     >>> ----------
     >>>
     >>> Would I be able to achieve significant performance improvement, if I
     >>> use directly the JENA libraries and bypass FUSEKI?
     >>
     >> Unlikely. We successfully use the set up you describe for dozens of
     >> services, some quite high load. We have a few which go direct to Jena
     >> for legacy reasons and they show no particular performance benefits.
     >>
     >> If your payloads can be large then make sure the way you are driving
     >> fuseki is streaming and doesn't accidentally store the entire SPARQL
     >> results in your tomcat app. This also means choosing a streamable media
     >> type for your fuseki requests.
     >
     > I'm using Jena to create my own REST service and I'm facing some issues
     > when SPARQL resultsets are big. Could you please give me a hint on the
     > streaming stuff from fuseki so I can incorporate that to my REST service?

     If you are just doing SELECTs then it should be straightforward. Of the
     sparql results media types then at least XML and TSV are streaming. We
     just use Jena's QueryExecutionFactory.sparqlService in the REST service
     to set up the execution. We wrap the ResultSet from execSelect and
     process that one row at a time. Our wrapper keeps track of the
     underlying QueryExecution so we can close that when finished or in the
     event of a problem.
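
A minimal sketch of the kind of wrapper described above, written against plain Java interfaces so it runs without Jena on the classpath. The class name ClosingRows and the helper stream are hypothetical and are not taken from the poster's code. The iterator stands in for the ResultSet returned by execSelect and the AutoCloseable stands in for the QueryExecution; in Jena 3.x those types should fit these roles, but treat that as an assumption to check against your Jena version.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.Consumer;

/** Hypothetical wrapper: iterates rows and closes the owning resource when done or on failure. */
final class ClosingRows<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> rows;     // stands in for the ResultSet from execSelect
    private final AutoCloseable owner;  // stands in for the underlying QueryExecution
    private boolean closed = false;

    ClosingRows(Iterator<T> rows, AutoCloseable owner) {
        this.rows = rows;
        this.owner = owner;
    }

    @Override
    public boolean hasNext() {
        if (closed) return false;
        try {
            boolean more = rows.hasNext();
            if (!more) close();         // exhausted: release the execution
            return more;
        } catch (RuntimeException e) {
            close();                    // problem mid-stream: release, then rethrow
            throw e;
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        try {
            return rows.next();
        } catch (RuntimeException e) {
            close();
            throw e;
        }
    }

    @Override
    public void close() {
        if (closed) return;             // idempotent: the owner is closed at most once
        closed = true;
        try {
            owner.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close query execution", e);
        }
    }

    boolean isClosed() {
        return closed;
    }

    /** Hands rows to the sink one at a time, never buffering the full result; returns the row count. */
    static <T> int stream(ClosingRows<T> rows, Consumer<T> sink) {
        int count = 0;
        try (ClosingRows<T> r = rows) {
            while (r.hasNext()) {
                sink.accept(r.next());
                count++;
            }
        }
        return count;
    }
}
```

In a REST handler the sink would typically write each row straight to the response stream, so memory use stays flat regardless of result size.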

     For DESCRIBE/CONSTRUCT queries then use a streamable media type for the
     RDF such as ntriples/nquads. We have less experience of that, we tend to
     actually execute those in batches (a SELECT provides a set of resource
     bindings and we then issue a DESCRIBE on those resources one batch at a
     time).
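
The batching approach described above could look roughly like the following sketch. It only covers turning resource URIs (for example, those bound by a prior SELECT) into one DESCRIBE query string per batch; sending each query to the endpoint is left out. The class name DescribeBatches, the method queries and the batch size are hypothetical, and the code assumes the URIs are absolute IRIs that need no escaping.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a list of resource URIs into batched DESCRIBE queries. */
final class DescribeBatches {

    /** Returns one "DESCRIBE <u1> <u2> ..." query per batch of at most batchSize URIs. */
    static List<String> queries(List<String> uris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < uris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, uris.size());
            StringBuilder query = new StringBuilder("DESCRIBE");
            for (String uri : uris.subList(start, end)) {
                // Assumption: URIs are already valid IRIs; reject characters that would break the <...> form.
                if (uri.contains("<") || uri.contains(">") || uri.contains(" ")) {
                    throw new IllegalArgumentException("not a plain IRI: " + uri);
                }
                query.append(" <").append(uri).append('>');
            }
            out.add(query.toString());
        }
        return out;
    }
}
```

Each returned string can then be executed in turn, so that only one batch of descriptions is in flight at a time.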

Dave
