Re: Using JENA directly, instead of Fuseki/SPARQL

George News Tue, 19 Sep 2017 08:48:17 -0700


On 2017-09-19 14:50, Dave Reynolds wrote:
> On 19/09/17 13:42, George News wrote:
>>
>>
>> On 2017-09-19 14:24, Dave Reynolds wrote:
>>> On 19/09/17 11:33, George News wrote:
>>>>
>>>> On 2017-09-19 09:57, Dave Reynolds wrote:
>>>>> On 19/09/17 01:13, Dimov, Stefan wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a Tomcat setup that receives REST requests, “translates” them
>>>>>> into SPARQL queries, invokes them on the underlying FUSEKI and
>>>>>> returns the results:
>>>>>>
>>>>>>
>>>>>> USER AGENT
>>>>>> ^
>>>>>> REST
>>>>>> v
>>>>>> ---------------
>>>>>> TOMCAT
>>>>>> ^
>>>>>> REST
>>>>>> v
>>>>>> -------------
>>>>>> FUSEKI
>>>>>> ------------
>>>>>> JENA
>>>>>> -----------
>>>>>> TDB
>>>>>> ----------
>>>>>>
>>>>>> Would I be able to achieve a significant performance improvement if I
>>>>>> used the JENA libraries directly and bypassed FUSEKI?
>>>>>
>>>>> Unlikely. We successfully use the setup you describe for dozens of
>>>>> services, some quite high load. We have a few which go direct to Jena
>>>>> for legacy reasons and they show no particular performance benefits.
>>>>>
>>>>> If your payloads can be large, make sure the way you are driving
>>>>> Fuseki is streaming and doesn't accidentally store the entire SPARQL
>>>>> results in your Tomcat app. This also means choosing a streamable media
>>>>> type for your Fuseki requests.
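On the Tomcat side, here is a minimal sketch of consuming a streamable format row by row. It assumes the app asks Fuseki for `text/tab-separated-values` and receives the response body as an `InputStream` (for example from `HttpURLConnection.getInputStream()`). `TsvRowStreamer` and `forEachRow` are hypothetical names. The sketch only splits fields on tabs and does not decode the RDF term syntax (`<...>` IRIs, quoted literals) that SPARQL TSV uses.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

public class TsvRowStreamer {

    /**
     * Reads a SPARQL TSV result body one line at a time and hands each row
     * to the handler, so only the current row is held in memory.
     * Returns the number of data rows processed (the header is not counted).
     * Closes the input stream when done.
     */
    public static long forEachRow(InputStream in, Consumer<String[]> handler) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String header = reader.readLine(); // first line lists the ?variables
            if (header == null) {
                return 0;
            }
            long count = 0;
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty fields (unbound variables)
                handler.accept(line.split("\t", -1));
                count++;
            }
            return count;
        }
    }
}
```

The handler can write each row straight to the REST response, so neither the Tomcat app nor Fuseki ever has to hold the full result in memory.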
>>>>
>>>> I'm using Jena to create my own REST service and I'm facing some issues
>>>> when SPARQL result sets are big. Could you please give me a hint on the
>>>> streaming approach with Fuseki so I can incorporate it into my REST
>>>> service?
>>>
>>> If you are just doing SELECTs then it should be straightforward. Of the
>>> SPARQL results media types, at least XML and TSV are streaming. We
>>> just use Jena's QueryExecutionFactory.sparqlService in the REST service
>>> to set up the execution. We wrap the ResultSet from execSelect and
>>> process that one row at a time. Our wrapper keeps track of the
>>> underlying QueryExecution so we can close that when finished or in the
>>> event of a problem.
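The wrapper above is only described in prose. Here is a minimal, stdlib-only sketch of the idea: an iterator that remembers what must be closed and closes it once the rows run out or the underlying iterator throws. The class name `ClosingIterator` is hypothetical. With Jena, `rows` would presumably be the `ResultSet` from `execSelect()` (it implements `Iterator<QuerySolution>`) and `execution` could be passed as `qExec::close`; this mapping is an assumption, not the actual wrapper described in the thread.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical wrapper: iterates rows and closes their producer when done or on failure. */
public class ClosingIterator<T> implements Iterator<T>, AutoCloseable {
    private final Iterator<T> rows;        // e.g. the ResultSet from execSelect()
    private final AutoCloseable execution; // e.g. qExec::close
    private boolean closed = false;

    public ClosingIterator(Iterator<T> rows, AutoCloseable execution) {
        this.rows = rows;
        this.execution = execution;
    }

    @Override
    public boolean hasNext() {
        if (closed) {
            return false;
        }
        boolean more;
        try {
            more = rows.hasNext();
        } catch (RuntimeException e) {
            throw closeAfterFailure(e); // release the execution, then rethrow
        }
        if (!more) {
            close(); // exhausted: release resources straight away
        }
        return more;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        try {
            return rows.next();
        } catch (RuntimeException e) {
            throw closeAfterFailure(e);
        }
    }

    /** Idempotent: only the first call closes the underlying execution. */
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        try {
            execution.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close query execution", e);
        }
    }

    public boolean isClosed() {
        return closed;
    }

    private RuntimeException closeAfterFailure(RuntimeException original) {
        try {
            close();
        } catch (RuntimeException suppressed) {
            original.addSuppressed(suppressed); // keep the original error visible
        }
        return original;
    }
}
```

Because `close()` is idempotent, a caller that might stop reading early can also put the wrapper in a try-with-resources block without risking a double close.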
>>>
>>
>> In my case I'm handling everything with Jena rather than Fuseki, so I'm
>> not using sparqlService, just execSelect().
>>
>> However, your comment about streaming led me to this new approach of
>> handling the ResultSet without having to store everything in memory. I
>> think this achieves a similar thing.
>>
>> QueryExecution qExec = QueryExecutionFactory.create(query, m);
>> ResultSet rs = qExec.execSelect();
>> StreamingOutput stream = new StreamingOutput() {
>>     @Override
>>     public void write(OutputStream os) throws IOException,
>>                                        WebApplicationException {
>>         ResultSetFormatter.outputAsJSON(os, rs);
>>
>>         rs.close();
>>     }
>> };
>>
>> return Response.ok(stream).build();
> 
> You might want to check whether ResultSetFormatter.outputAsJSON is itself
> streaming; it may not be. In our case we have custom JSON and CSV
> serializers which take care to stream.


From JSONOutputResultSet.java in Jena, it seems they write to an output
stream, so they are serializing to a stream. Is that what you mean by
"taking care to stream"?
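Writing to an `OutputStream` is not, by itself, the same as streaming: a serializer can still build the whole document, or a list of all rows, before writing anything. What matters is that rows are pulled from the result set one at a time and written (and periodically flushed) as they arrive. Below is a hypothetical, stdlib-only serializer that produces the SPARQL 1.1 JSON results layout (`head.vars`, `results.bindings`). To keep it short it treats every value as a plain literal; a real one would also need to emit `uri`, `bnode`, datatype and language information. It makes no claim about how `JSONOutputResultSet` behaves internally.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class StreamingJsonResults {

    /**
     * Writes SPARQL-JSON-shaped results one row at a time and returns the row count.
     * Simplification: every bound value is written as a plain literal.
     */
    public static long write(Writer out, List<String> vars,
                             Iterator<Map<String, String>> rows, int flushEvery) throws IOException {
        if (flushEvery <= 0) {
            throw new IllegalArgumentException("flushEvery must be positive");
        }
        out.write("{\"head\":{\"vars\":[");
        for (int i = 0; i < vars.size(); i++) {
            if (i > 0) out.write(',');
            out.write(quote(vars.get(i)));
        }
        out.write("]},\"results\":{\"bindings\":[");
        long count = 0;
        while (rows.hasNext()) {
            Map<String, String> row = rows.next(); // only this row is held in memory
            if (count > 0) out.write(',');
            out.write('{');
            boolean first = true;
            for (String v : vars) {
                String value = row.get(v);
                if (value == null) continue; // unbound variables are omitted
                if (!first) out.write(',');
                first = false;
                out.write(quote(v));
                out.write(":{\"type\":\"literal\",\"value\":");
                out.write(quote(value));
                out.write('}');
            }
            out.write('}');
            count++;
            if (count % flushEvery == 0) out.flush(); // push bytes to the client now
        }
        out.write("]}}");
        out.flush();
        return count;
    }

    /** Minimal JSON string escaping. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }
}
```

Flushing after every row keeps latency low at some cost in throughput; a larger `flushEvery` (an illustrative parameter name) batches writes instead.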

> As I say, we explicitly track the qExec and ensure that is closed when
> the formatting is completed, using try/finally to be sure. I'm not sure
> whether ResultSet.close() will ensure that for you. [Not saying it
> won't, just not sure.]

It is a simplified example. We also check that the dataset transaction is
ended. And I guess I should catch IOException/WebApplicationException
somewhere in order to abort the transaction if the client closes the
connection or there is some other connection error.
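One way to arrange that, sketched with stdlib-only placeholder types: `Txn` and `Body` are hypothetical interfaces standing in for the dataset transaction and for the serializer call (for example `ResultSetFormatter.outputAsJSON(os, rs)`), and `qExec` is anything closeable, such as `qExec::close`. Any failure while writing, including an `IOException` from a client that has gone away or a `WebApplicationException` (a `RuntimeException`), triggers an abort. The `finally` block always closes the execution and ends the transaction. For a read-only transaction, `end()` alone may be enough, so the abort step mainly matters for writes.

```java
import java.io.IOException;
import java.io.OutputStream;

public class GuardedStreamingWrite {

    /** Hypothetical stand-in for the dataset transaction. */
    public interface Txn {
        void abort();
        void end();
    }

    /** Hypothetical stand-in for the serialization step. */
    public interface Body {
        void writeTo(OutputStream os) throws IOException;
    }

    /**
     * Runs the serializer; aborts the transaction if writing fails,
     * and always closes the query execution and ends the transaction.
     */
    public static void write(OutputStream os, Body body, AutoCloseable qExec, Txn txn) throws IOException {
        try {
            body.writeTo(os);
        } catch (IOException | RuntimeException e) {
            txn.abort(); // e.g. broken pipe when the client disconnects
            throw e;
        } finally {
            try {
                qExec.close();
            } catch (Exception e) {
                // a real service would log this; the transaction must still be ended
            } finally {
                txn.end();
            }
        }
    }
}
```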

> Dave
>
