Re: Using JENA directly, instead of Fuseki/SPARQL

George News Tue, 19 Sep 2017 09:11:33 -0700

On 2017-09-19 18:01, Dave Reynolds wrote:
> 
> On 19/09/17 16:48, George News wrote:
>>
>>
>> On 2017-09-19 14:50, Dave Reynolds wrote:
>>> On 19/09/17 13:42, George News wrote:
>>>>
>>>>
>>>> On 2017-09-19 14:24, Dave Reynolds wrote:
>>>>> On 19/09/17 11:33, George News wrote:
>>>>>>
>>>>>> On 2017-09-19 09:57, Dave Reynolds wrote:
>>>>>>> On 19/09/17 01:13, Dimov, Stefan wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a Tomcat setup that receives REST requests, “translates” them
>>>>>>>> into SPARQL queries, invokes them on the underlying FUSEKI and
>>>>>>>> returns the results:
>>>>>>>>
>>>>>>>>
>>>>>>>> USER AGENT
>>>>>>>> ^
>>>>>>>> REST
>>>>>>>> v
>>>>>>>> ---------------
>>>>>>>> TOMCAT
>>>>>>>> ^
>>>>>>>> REST
>>>>>>>> v
>>>>>>>> -------------
>>>>>>>> FUSEKI
>>>>>>>> ------------
>>>>>>>> JENA
>>>>>>>> -----------
>>>>>>>> TDB
>>>>>>>> ----------
>>>>>>>>
>>>>>>>> Would I be able to achieve a significant performance improvement
>>>>>>>> if I use the JENA libraries directly and bypass FUSEKI?
>>>>>>>
>>>>>>> Unlikely. We successfully use the setup you describe for dozens of
>>>>>>> services, some quite high load. We have a few which go direct to Jena
>>>>>>> for legacy reasons and they show no particular performance benefits.
>>>>>>>
>>>>>>> If your payloads can be large then make sure the way you are driving
>>>>>>> fuseki is streaming and doesn't accidentally store the entire SPARQL
>>>>>>> results in your tomcat app. This also means choosing a streamable
>>>>>>> media type for your fuseki requests.
>>>>>>
>>>>>> I'm using Jena to create my own REST service and I'm facing some
>>>>>> issues when SPARQL result sets are big. Could you please give me a
>>>>>> hint on the streaming stuff from fuseki so I can incorporate that
>>>>>> into my REST service?
>>>>>
>>>>> If you are just doing SELECTs then it should be straightforward. Of
>>>>> the SPARQL results media types, at least XML and TSV are streaming. We
>>>>> just use Jena's QueryExecutionFactory.sparqlService in the REST
>>>>> service to set up the execution. We wrap the ResultSet from execSelect
>>>>> and process that one row at a time. Our wrapper keeps track of the
>>>>> underlying QueryExecution so we can close that when finished or in the
>>>>> event of a problem.
>>>>>
>>>>
>>>> In my case I'm handling everything using Jena and not Fuseki so I'm not
>>>> using sparqlService but execSelect().
>>>>
>>>> However, your comment about streaming led me towards this new approach
>>>> of handling the ResultSet without having to store everything in memory.
>>>> I think this achieves a similar thing.
>>>>
>>>> QueryExecution qExec = QueryExecutionFactory.create(query, m);
>>>> ResultSet rs = qExec.execSelect();
>>>> StreamingOutput stream = new StreamingOutput() {
>>>>      @Override
>>>>      public void write(OutputStream os) throws IOException,
>>>>                                         WebApplicationException {
>>>>          try {
>>>>              ResultSetFormatter.outputAsJSON(os, rs);
>>>>          } finally {
>>>>              qExec.close();  // release the execution, even on error
>>>>          }
>>>>      }
>>>> };
>>>>
>>>> return Response.ok(stream).build();
>>>
>>> You might want to check if ResultSetFormatter.outputAsJSON is itself
>>> streaming, it may not be. In our case we have custom JSON and CSV
>>> serializers which take care to stream.
>>
>> From JSONOutputResultSet.java in Jena it seems they are using a stream,
>> so they are serializing to a stream. Is that what you mean by "taking
>> care to stream"?
> 
> I meant more that there is no unbounded state or in-memory rendering
> required. At a quick glance at the code it does need to retain a map of
> bNodes but that's unavoidable and unless you have very large result
> sets with a very large number of bNodes I can't see that being a problem
> for you.
> 
> Dave
>
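
To make the point about bounded state concrete: a streaming serializer can
drop each row once it is written, but it presumably has to remember which
label it gave each blank node so the same node gets the same label every time
it appears. A minimal sketch of that idea in plain Java follows. BNodeLabels
and labelFor are hypothetical names for illustration only, not Jena's actual
code, and the "b0", "b1" label scheme is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical label table: the only state kept across rows while streaming. */
final class BNodeLabels {
    private final Map<String, String> labels = new HashMap<>();

    /** Returns the same short label every time the same internal id is seen. */
    String labelFor(String internalId) {
        String label = labels.get(internalId);
        if (label == null) {
            label = "b" + labels.size();   // assumed labelling scheme
            labels.put(internalId, label);
        }
        return label;
    }

    /** Number of entries retained: grows with distinct bNodes, not with rows. */
    int retained() {
        return labels.size();
    }

    public static void main(String[] args) {
        BNodeLabels table = new BNodeLabels();
        // Three rows mentioning two distinct blank nodes.
        for (String id : new String[] {"node-A", "node-B", "node-A"}) {
            System.out.println(id + " -> " + table.labelFor(id));
        }
        System.out.println("entries retained: " + table.retained());
    }
}
```

The map only becomes a memory concern when the number of distinct bNodes is
itself very large, which matches the caveat above.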


Thanks a lot ... I'm learning quite a lot from you guys. Hope I manage
to solve my issues.
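
For the archive, a minimal sketch of the wrapper idea described earlier in the
thread: process one row at a time and always close the underlying execution,
whether the loop finishes or fails. It uses plain Java stand-ins so it runs
without Jena. RowSource and streamRows are hypothetical names; in a real
service the iterator would be backed by the ResultSet and the close hook would
call QueryExecution.close().

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.List;

final class StreamingRows {

    /** Hypothetical stand-in for a result set tied to a closeable execution. */
    static final class RowSource implements Iterator<List<String>>, AutoCloseable {
        private final Iterator<List<String>> rows;
        private final Runnable onClose;   // assumed hook, e.g. qExec::close
        private boolean closed = false;

        RowSource(Iterator<List<String>> rows, Runnable onClose) {
            this.rows = rows;
            this.onClose = onClose;
        }

        @Override public boolean hasNext() { return !closed && rows.hasNext(); }
        @Override public List<String> next() { return rows.next(); }

        /** Idempotent: the close hook runs at most once. */
        @Override public void close() {
            if (!closed) {
                closed = true;
                onClose.run();
            }
        }
    }

    /** Writes each row as a TSV line as soon as it is read; returns the row count. */
    static int streamRows(RowSource source, OutputStream os) {
        PrintWriter out = new PrintWriter(new OutputStreamWriter(os, StandardCharsets.UTF_8));
        int count = 0;
        try (RowSource s = source) {      // closed on success and on exception
            while (s.hasNext()) {
                out.print(String.join("\t", s.next()));
                out.print('\n');
                count++;
            }
        } finally {
            out.flush();                  // flush only; the caller owns the stream
        }
        return count;
    }

    public static void main(String[] args) {
        Iterator<List<String>> rows =
                List.of(List.of("s1", "o1"), List.of("s2", "o2")).iterator();
        RowSource source = new RowSource(rows, () -> System.err.println("execution closed"));
        int n = streamRows(source, System.out);
        System.err.println(n + " rows written");
    }
}
```

Only the current row is held at any point, so memory use does not grow with
the size of the result.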
