Re: Using JENA directly, instead of Fuseki/SPARQL

Dimov, Stefan Fri, 22 Sep 2017 16:32:56 -0700

Thanks Dave,

Can you clarify, please, what you mean by “reasonable memory footprint for 
the data scale. For larger data (300-400MT plus) ”?


“300-400MT”: do you mean 300-400MB of RAM, does “MT” stand for 
“Mega-Transfer”, or do you mean disk space?

If 300-400MB is the RAM you are using, what is the corresponding disk size TDB 
takes in your particular case?

Or if this is the disk-space, what is the size of your RAM?

Regards,
Stefan


On 9/22/17, 1:22 PM, "Dimov, Stefan" <[email protected]> wrote:

    Thanks, Dave!
    
    S.
    
    On 9/22/17, 12:21 AM, "Dave Reynolds" <[email protected]> wrote:
    
        Sorry, missed this question ...
        
        It depends on the scale of the data, the size of the Tomcat application,
        the machine sizes available and how much API-side in-memory caching you
        want to do.
        
        We use both styles successfully. For modest data even at high load, or
        for large data at modest load, having both on the same machine works
        fine and is slightly easier to scale out, so long as your machines have
        a reasonable memory footprint for the data scale. For larger data
        (300-400MT plus) with either significant query rates or very memory
        hungry applications we split the data and front end tiers.
        
        Dave
        
        On 19/09/17 20:02, Dimov, Stefan wrote:
        > Thanks for the response!
        > 
        > One more question:
        > 
        > Would it be better if I put Tomcat on one machine and have Fuseki on another?
        > 
        > Provided they are both in the same network and the connection between
        > them is unobstructed, wouldn’t this improve the performance, considering
        > they don’t share memory/CPU?
        > 
        > Regards,
        > Stefan
        > 
        > On 9/19/17, 5:24 AM, "Dave Reynolds" <[email protected]> wrote:
        > 
        >      On 19/09/17 11:33, George News wrote:
        >      >
        >      > On 2017-09-19 09:57, Dave Reynolds wrote:
        >      >> On 19/09/17 01:13, Dimov, Stefan wrote:
        >      >>> Hi,
        >      >>>
        >      >>> I have a Tomcat setup that receives REST requests, “translates” them
        >      >>> into SPARQL queries, invokes them on the underlying FUSEKI and returns
        >      >>> the results:
        >      >>>
        >      >>>
        >      >>> USER AGENT
        >      >>> ^
        >      >>> REST
        >      >>> v
        >      >>> ---------------
        >      >>> TOMCAT
        >      >>> ^
        >      >>> REST
        >      >>> v
        >      >>> -------------
        >      >>> FUSEKI
        >      >>> ------------
        >      >>> JENA
        >      >>> -----------
        >      >>> TDB
        >      >>> ----------
        >      >>>
        >      >>> Would I be able to achieve a significant performance improvement if I
        >      >>> use the JENA libraries directly and bypass FUSEKI?
        >      >>
        >      >> Unlikely. We successfully use the set up you describe for dozens of
        >      >> services, some quite high load. We have a few which go direct to Jena
        >      >> for legacy reasons and they show no particular performance benefits.
        >      >>
        >      >> If your payloads can be large then make sure the way you are driving
        >      >> fuseki is streaming and doesn't accidentally store the entire SPARQL
        >      >> results in your tomcat app. This also means choosing a streamable media
        >      >> type for your fuseki requests.
        >      >
        >      > I'm using Jena to create my own REST service and I'm facing some issues
        >      > when SPARQL resultsets are big. Could you please give me a hint on the
        >      > streaming stuff from fuseki so I can incorporate that to my REST service?
        >      
        >      If you are just doing SELECTs then it should be straightforward. Of the
        >      SPARQL results media types, at least XML and TSV are streaming. We
        >      just use Jena's QueryExecutionFactory.sparqlService in the REST service
        >      to set up the execution. We wrap the ResultSet from execSelect and
        >      process that one row at a time. Our wrapper keeps track of the
        >      underlying QueryExecution so we can close that when finished or in the
        >      event of a problem.
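
[Editorial sketch] The row-at-a-time wrapper described above can be illustrated without Jena at all. The class below is a stand-in, not Dave's wrapper and not Jena's ResultSet: it assumes the results arrive as SPARQL TSV text on a java.io.Reader (for example an HTTP response body), hands out one row per call, and owns the underlying reader so it can be closed when finished or on failure. The name TsvRowStream and the example IRIs are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Hypothetical wrapper: yields TSV result rows one at a time and owns the reader. */
final class TsvRowStream implements Iterator<List<String>>, AutoCloseable {
    private final BufferedReader in;
    private final List<String> header;
    private String nextLine;   // one line of look-ahead; never the whole result
    private boolean closed;

    TsvRowStream(Reader source) {
        this.in = new BufferedReader(source);
        String first = readLine();
        // The first TSV line carries the variable names, e.g. "?s\t?o".
        this.header = first == null ? List.of() : Arrays.asList(first.split("\t", -1));
        this.nextLine = readLine();
    }

    private String readLine() {
        try {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    List<String> header() { return header; }

    @Override public boolean hasNext() { return nextLine != null; }

    @Override public List<String> next() {
        if (nextLine == null) throw new NoSuchElementException();
        List<String> row = Arrays.asList(nextLine.split("\t", -1));
        nextLine = readLine();
        return row;
    }

    /** Releases the underlying reader; call when finished or after an error. */
    @Override public void close() {
        closed = true;
        nextLine = null;
        try {
            in.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    boolean isClosed() { return closed; }

    public static void main(String[] args) {
        String tsv = "?s\t?o\n<http://example.org/a>\t\"1\"\n<http://example.org/b>\t\"2\"\n";
        // try-with-resources closes the reader even if row processing throws.
        try (TsvRowStream rows = new TsvRowStream(new StringReader(tsv))) {
            while (rows.hasNext()) {
                System.out.println(rows.next());
            }
        }
    }
}
```

With Jena itself, the same shape would presumably apply to the ResultSet returned by execSelect, with the wrapper closing the QueryExecution instead of a reader.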
        >      
        >      For DESCRIBE/CONSTRUCT queries, use a streamable media type for the
        >      RDF such as ntriples/nquads. We have less experience of that; we tend to
        >      actually execute those in batches (a SELECT provides a set of resource
        >      bindings and we then issue a DESCRIBE on those resources one batch at a
        >      time).
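
[Editorial sketch] The batching approach mentioned above (a SELECT yields resource bindings, then one DESCRIBE is issued per batch) might look roughly like this. It only builds the query strings; sending them to the endpoint is left out. The class name DescribeBatcher, the batch size and the example IRIs are assumptions for illustration, and the IRIs are taken as already valid (no escaping or validation is done).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: turns a list of resource IRIs into batched DESCRIBE queries. */
final class DescribeBatcher {

    static List<String> describeQueries(List<String> resourceIris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < resourceIris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, resourceIris.size());
            // SPARQL allows several IRIs in one query: DESCRIBE <a> <b> ...
            StringBuilder query = new StringBuilder("DESCRIBE");
            for (String iri : resourceIris.subList(start, end)) {
                query.append(" <").append(iri).append('>');
            }
            queries.add(query.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> iris = List.of(
                "http://example.org/a", "http://example.org/b", "http://example.org/c");
        // Each printed query would be sent to the endpoint as a separate request.
        for (String query : describeQueries(iris, 2)) {
            System.out.println(query);
        }
    }
}
```

Keeping batches small bounds the size of each RDF response, which matters less if the response is also requested in a streamable type such as N-Triples.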
        >      
        >      Dave
        >      
        >      
        >
