Re: Using JENA directly, instead of Fuseki/SPARQL

Dave Reynolds Tue, 19 Sep 2017 09:02:20 -0700


On 19/09/17 16:48, George News wrote:



On 2017-09-19 14:50, Dave Reynolds wrote:

On 19/09/17 13:42, George News wrote:



On 2017-09-19 14:24, Dave Reynolds wrote:

On 19/09/17 11:33, George News wrote:


On 2017-09-19 09:57, Dave Reynolds wrote:

On 19/09/17 01:13, Dimov, Stefan wrote:

Hi,

I have a Tomcat setup that receives REST requests, “translates” them
into SPARQL queries, invokes them on the underlying FUSEKI and
returns the results:


USER AGENT
^
REST
v
---------------
TOMCAT
^
REST
v
-------------
FUSEKI
------------
JENA
-----------
TDB
----------

Would I be able to achieve significant performance improvement, if I
use directly the JENA libraries and bypass FUSEKI?


Unlikely. We successfully use the set up you describe for dozens of
services, some quite high load. We have a few which go direct to Jena
for legacy reasons and they show no particular performance benefits.

If your payloads can be large then make sure the way you are driving
fuseki is streaming and doesn't accidentally store the entire SPARQL
results in your tomcat app. This also means choosing a streamable media
type for your fuseki requests.


I'm using Jena to create my own REST service and I'm facing some issues
when SPARQL resultsets are big. Could you please give me a hint on the
streaming stuff from fuseki so I can incorporate that to my REST
service?


If you are just doing SELECTs then it should be straightforward. Of the
SPARQL results media types, at least XML and TSV are streaming. We
just use Jena's QueryExecutionFactory.sparqlService in the REST service
to set up the execution. We wrap the ResultSet from execSelect and
process that one row at a time. Our wrapper keeps track of the
underlying QueryExecution so we can close that when finished or in the
event of a problem.
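
A wrapper of the kind described above could be sketched roughly as follows. This is a minimal, library-independent illustration, not the code used in the services mentioned: the class name ClosingRows is hypothetical, and it assumes only that the rows arrive as a java.util.Iterator and that the thing to release is an AutoCloseable. With Jena the iterator would be the ResultSet from execSelect and the owner would be the QueryExecution, assuming your Jena version exposes them through those interfaces.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Pairs a row iterator with the resource that produced it and closes that
 * resource when the rows run out, when iteration fails, or when the caller
 * closes the wrapper early.
 */
final class ClosingRows<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> rows;      // e.g. the ResultSet
    private final AutoCloseable owner;   // e.g. the QueryExecution
    private boolean closed = false;

    ClosingRows(Iterator<T> rows, AutoCloseable owner) {
        this.rows = rows;
        this.owner = owner;
    }

    @Override
    public boolean hasNext() {
        if (closed) {
            return false;
        }
        try {
            boolean more = rows.hasNext();
            if (!more) {
                close();                 // finished normally
            }
            return more;
        } catch (RuntimeException e) {
            close();                     // finished because of a problem
            throw e;
        }
    }

    @Override
    public T next() {
        if (closed) {
            throw new NoSuchElementException();
        }
        try {
            return rows.next();
        } catch (RuntimeException e) {
            close();
            throw e;
        }
    }

    @Override
    public void close() {
        if (closed) {
            return;                      // closing twice is harmless
        }
        closed = true;
        try {
            owner.close();
        } catch (Exception e) {
            throw new IllegalStateException("could not close the row source", e);
        }
    }
}
```

Because the wrapper is itself AutoCloseable, the REST layer can hold it in a try-with-resources block, so an abandoned response still releases the underlying execution.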


In my case I'm handling everything using Jena and not Fuseki so I'm not
using sparqlService but execSelect().

However, your comment about streaming led me towards this new approach
of handling the ResultSet without having to store everything in memory. I
think this achieves a similar thing.

QueryExecution qExec = QueryExecutionFactory.create(query, m);
ResultSet rs = qExec.execSelect();
StreamingOutput stream = new StreamingOutput() {
     @Override
     public void write(OutputStream os) throws IOException,
                                        WebApplicationException {
         try {
             // Serialize while the query execution is still open.
             ResultSetFormatter.outputAsJSON(os, rs);
         } finally {
             // Release the execution even if writing fails part-way.
             qExec.close();
         }
     }
};

return Response.ok(stream).build();


You might want to check if ResultSetFormatter.outputAsJSON is itself
streaming, it may not be. In our case we have custom JSON and CSV
serializers which take care to stream.


From JSONOutputResultSet.java in Jena it seems they are using a stream,
therefore they are serializing to a stream. Is that what you mean by "taking
care to stream"?

I meant more that there is no unbounded state or in-memory rendering
required. At a quick glance at the code it does need to retain a map of
bNodes but that's unavoidable and unless you have very large result
sets with a very large number of bNodes I can't see that being a problem
for you.
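
To make "no unbounded state" concrete, here is a rough sketch of a row-at-a-time serializer whose only growing state is a blank-node label map. It is not Jena's JSONOutputResultSet or any real serializer: the class StreamingRowWriter, the tab-separated output, and the convention that cells starting with "_:" are blank nodes are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Writes rows as they arrive; nothing is buffered except blank-node labels. */
final class StreamingRowWriter {
    // The only state that grows with the result: one label per distinct bNode.
    private final Map<String, String> bnodeLabels = new HashMap<>();

    private String labelFor(String internalId) {
        String label = bnodeLabels.get(internalId);
        if (label == null) {
            label = "b" + bnodeLabels.size();
            bnodeLabels.put(internalId, label);
        }
        return label;
    }

    void write(Iterator<List<String>> rows, OutputStream out) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        while (rows.hasNext()) {
            List<String> row = rows.next();   // one row in memory at a time
            for (int i = 0; i < row.size(); i++) {
                if (i > 0) {
                    w.write('\t');
                }
                String cell = row.get(i);
                // Hypothetical convention: "_:" marks a blank node.
                w.write(cell.startsWith("_:") ? "_:" + labelFor(cell) : cell);
            }
            w.write('\n');
        }
        w.flush();
    }
}
```

Memory use here depends on the number of distinct blank nodes, not on the number of rows, which is the property being described.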


Dave
