On 19/09/17 11:33, George News wrote:

On 2017-09-19 09:57, Dave Reynolds wrote:
On 19/09/17 01:13, Dimov, Stefan wrote:
Hi,

I have a Tomcat setup that receives REST requests, “translates” them
into SPARQL queries, invokes them on the underlying FUSEKI and returns
the results:


USER AGENT
^
REST
v
---------------
TOMCAT
^
REST
v
-------------
FUSEKI
------------
JENA
-----------
TDB
----------

Would I be able to achieve a significant performance improvement if I
use the JENA libraries directly and bypass FUSEKI?

Unlikely. We successfully use the setup you describe for dozens of
services, some under quite high load. We have a few which go direct to
Jena for legacy reasons and they show no particular performance benefits.

If your payloads can be large then make sure the way you are driving
Fuseki is streaming and doesn't accidentally store the entire SPARQL
results in your Tomcat app. This also means choosing a streamable media
type for your Fuseki requests.
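
As a rough illustration of "streaming rather than storing", the sketch below copies an upstream response to the client in fixed-size chunks, so memory use stays constant regardless of result size. It uses only the JDK; the class name StreamRelay and the 8 KiB buffer size are assumptions made for this example, not anything from the setup described above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

/** Hypothetical helper: relays a response body without buffering it whole. */
final class StreamRelay {
    private StreamRelay() {}

    /**
     * Copies everything from upstream (e.g. the Fuseki response body) to
     * downstream (e.g. the servlet response) one chunk at a time.
     * Returns the number of bytes copied.
     */
    static long relay(InputStream upstream, OutputStream downstream) throws IOException {
        byte[] buffer = new byte[8192]; // assumed chunk size; only this much is held in memory
        long total = 0;
        int n;
        while ((n = upstream.read(buffer)) != -1) {
            downstream.write(buffer, 0, n);
            total += n;
        }
        downstream.flush();
        return total;
    }
}
```

The thing to avoid is reading the whole body into a String or byte array before writing it out, which is where large result sets usually cause trouble.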

I'm using Jena to create my own REST service and I'm facing some issues
when SPARQL result sets are big. Could you please give me a hint on the
streaming stuff from Fuseki so I can incorporate that into my REST service?

If you are just doing SELECTs then it should be straightforward. Of the SPARQL results media types, at least XML and TSV are streaming. We just use Jena's QueryExecutionFactory.sparqlService in the REST service to set up the execution. We wrap the ResultSet from execSelect and process that one row at a time. Our wrapper keeps track of the underlying QueryExecution so we can close that when finished or in the event of a problem.
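
A minimal sketch of that kind of wrapper, written against plain JDK interfaces so it stands alone. The class name ClosingRowIterator is hypothetical and this is not our actual code. It assumes the rows arrive as an Iterator and the thing to release is an AutoCloseable; to my knowledge Jena's ResultSet and QueryExecution fit those two roles, but check that against the Jena version you use.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical wrapper: hands out rows one at a time and closes the
 * underlying source when the rows run out or when iteration fails.
 */
final class ClosingRowIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> rows;       // e.g. the ResultSet from execSelect (assumption)
    private final AutoCloseable source;   // e.g. the QueryExecution behind it (assumption)
    private boolean closed = false;

    ClosingRowIterator(Iterator<T> rows, AutoCloseable source) {
        this.rows = rows;
        this.source = source;
    }

    @Override
    public boolean hasNext() {
        if (closed) return false;
        try {
            boolean more = rows.hasNext();
            if (!more) close();            // finished normally: release the source
            return more;
        } catch (RuntimeException e) {
            close();                       // problem mid-stream: release, then rethrow
            throw e;
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        try {
            return rows.next();
        } catch (RuntimeException e) {
            close();
            throw e;
        }
    }

    /** Idempotent: the source is closed at most once. */
    @Override
    public void close() {
        if (closed) return;
        closed = true;
        try {
            source.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close query source", e);
        }
    }
}
```

A caller that stops early should still call close() itself (or use try-with-resources), since the wrapper only closes automatically once the rows are exhausted or an exception is thrown.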

For DESCRIBE/CONSTRUCT queries, use a streamable media type for the RDF, such as N-Triples/N-Quads. We have less experience of that; we tend to actually execute those in batches (a SELECT provides a set of resource bindings and we then issue a DESCRIBE on those resources one batch at a time).
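
The batching step could look roughly like the sketch below, which turns a list of resource URIs (as bound by the earlier SELECT) into one DESCRIBE query string per batch. DescribeBatcher and its method name are made up for illustration. The code assumes the URIs are already valid absolute IRIs and does no escaping.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: builds one DESCRIBE query per batch of resources. */
final class DescribeBatcher {
    private DescribeBatcher() {}

    static List<String> describeQueries(List<String> resourceUris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < resourceUris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, resourceUris.size());
            StringBuilder query = new StringBuilder("DESCRIBE");
            // Assumption: each URI is a valid IRI containing no '>' or whitespace.
            for (String uri : resourceUris.subList(start, end)) {
                query.append(" <").append(uri).append('>');
            }
            queries.add(query.toString());
        }
        return queries;
    }
}
```

Each resulting string can then be sent as its own request, so no single response has to carry the descriptions of every resource at once.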

Dave
