Re: Using JENA directly, instead of Fuseki/SPARQL

George News Tue, 19 Sep 2017 05:43:35 -0700


On 2017-09-19 14:24, Dave Reynolds wrote:
> On 19/09/17 11:33, George News wrote:
>>
>> On 2017-09-19 09:57, Dave Reynolds wrote:
>>> On 19/09/17 01:13, Dimov, Stefan wrote:
>>>> Hi,
>>>>
>>>> I have Tomcat setup, that receives REST requests, “translates” them
>>>> into SPARQL queries, invokes them on the underlying FUSEKI and returns
>>>> the results:
>>>>
>>>>
>>>> USER AGENT
>>>> ^
>>>> REST
>>>> v
>>>> ---------------
>>>> TOMCAT
>>>> ^
>>>> REST
>>>> v
>>>> -------------
>>>> FUSEKI
>>>> ------------
>>>> JENA
>>>> -----------
>>>> TDB
>>>> ----------
>>>>
>>>> Would I be able to achieve significant performance improvement, if I
>>>> use directly the JENA libraries and bypass FUSEKI?
>>>
>>> Unlikely. We successfully use the set up you describe for dozens of
>>> services, some quite high load. We have a few which go direct to Jena
>>> for legacy reasons and they show no particular performance benefits.
>>>
>>> If your payloads can be large then make sure the way you are driving
>>> fuseki is streaming and doesn't accidentally store the entire SPARQL
>>> results in your tomcat app. This also means choosing a streamable media
>>> type for your fuseki requests.
>>
>> I'm using Jena to create my own REST service and I'm facing some issues
>> when SPARQL resultsets are big. Could you please give me a hint on the
>> streaming stuff from fuseki so I can incorporate that to my REST service?
> 
> If you are just doing SELECTs then it should be straightforward. Of the
> sparql results media types then at least XML and TSV are streaming. We
> just use Jena's QueryExecutionFactory.sparqlService in the REST service
> to set up the execution. We wrap the ResultSet from execSelect and
> process that one row at a time. Our wrapper keeps track of the
> underlying QueryExecution so we can close that when finished or in the
> event of a problem.
>


In my case I'm handling everything using Jena and not Fuseki so I'm not
using sparqlService but execSelect().

However, your comment about streaming led me towards this new approach
of handling the ResultSet without having to store everything in memory. I
think this achieves a similar thing.

QueryExecution qExec = QueryExecutionFactory.create(query, m);
ResultSet rs = qExec.execSelect();
StreamingOutput stream = new StreamingOutput() {
        @Override
        public void write(OutputStream os) throws IOException,
                                           WebApplicationException {
                try {
                        // Rows are written as the ResultSet is iterated
                        ResultSetFormatter.outputAsJSON(os, rs);
                } finally {
                        // Release the query execution even if writing fails
                        qExec.close();
                }
        }
};

return Response.ok(stream).build();
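
For reference, the wrapper Dave describes (process one row at a time, and
close the underlying QueryExecution when finished or on a problem) might
look roughly like the sketch below. This is only an illustration using plain
JDK types: ClosingIterator is a hypothetical name, the Iterator stands in
for the ResultSet and the AutoCloseable for the QueryExecution. It is not
Dave's actual code and not part of the Jena API.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical wrapper: hands out rows one at a time and closes the
 *  associated resource when rows run out, on error, or on demand. */
final class ClosingIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> rows;        // stand-in for the ResultSet
    private final AutoCloseable resource;  // stand-in for the QueryExecution
    private boolean closed = false;

    ClosingIterator(Iterator<T> rows, AutoCloseable resource) {
        this.rows = rows;
        this.resource = resource;
    }

    @Override
    public boolean hasNext() {
        if (closed) return false;
        try {
            boolean more = rows.hasNext();
            if (!more) close();            // finished: release the resource
            return more;
        } catch (RuntimeException e) {
            close();                       // problem: release the resource
            throw e;
        }
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        try {
            return rows.next();
        } catch (RuntimeException e) {
            close();
            throw e;
        }
    }

    /** Idempotent: the underlying resource is closed at most once. */
    @Override
    public void close() {
        if (closed) return;
        closed = true;
        try {
            resource.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close underlying resource", e);
        }
    }

    boolean isClosed() { return closed; }
}
```

The caller can still use try-with-resources around it, so the resource is
released even if the consumer stops reading early.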


In the next major change I make to the code I might try using Fuseki
instead of implementing everything on my own.


> For DESCRIBE/CONSTRUCT queries then use a streamable media type for the
> RDF such as ntriples/nquads. We have less experience of that, we tend to
> actually execute those in batches (a SELECT provides a set of resource
> bindings and we then issue a DESCRIBE on those resources one batch at a
> time).

I think the above will also work for this case.
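
As a rough sketch of the batching Dave mentions (a SELECT yields resource
bindings, then one DESCRIBE is issued per batch), the query strings could be
built like this. DescribeBatches and its build method are hypothetical
names, the batch size is arbitrary, and the IRIs are assumed to be valid
and already escaped; executing each query is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: groups resource IRIs into DESCRIBE queries. */
final class DescribeBatches {
    /** Builds one DESCRIBE query per batch of at most batchSize IRIs. */
    static List<String> build(List<String> resourceIris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < resourceIris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, resourceIris.size());
            StringBuilder q = new StringBuilder("DESCRIBE");
            // IRIs are assumed valid and already escaped (no validation here)
            for (String iri : resourceIris.subList(start, end)) {
                q.append(" <").append(iri).append('>');
            }
            queries.add(q.toString());
        }
        return queries;
    }
}
```

Each returned string can then be executed on its own, so only one batch of
results needs to be handled at a time.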

Thanks


> Dave
> 
>
