Thanks,

So if I have a node with a TDB dataset of 300-400MT, what would be the (minimum)
appropriate RAM/disk size? Roughly speaking …

Regards,
Stefan 

On 9/22/17, 4:36 PM, "[email protected]" <[email protected]> wrote:

    Can't speak for Dave, but I would think he meant "mega-triples": 300-400
    million triples of data.
    It's a common way of talking about the size of RDF datasets.
    
    
    ajs6f
    Dimov, Stefan wrote on 9/22/17 7:32 PM:
    > Thanks Dave,
    >
    > Can you clarify, please, what you mean by “reasonable memory footprint
    > for the data scale. For larger data (300-400MT plus)”?
    >
    > “300-400MT”: do you mean 300-400MB of RAM, or by “MT” do you mean
    > “Mega-Transfer”, or do you mean disk space?
    >
    > If 300-400MB is the RAM you are using, what is the corresponding disk
    > size TDB takes in your particular case?
    >
    > Or, if this is the disk space, what is the size of your RAM?
    >
    > Regards,
    > Stefan
    >
    >
    > On 9/22/17, 1:22 PM, "Dimov, Stefan" <[email protected]> wrote:
    >
    >     Thanks, Dave!
    >
    >     S.
    >
    >     On 9/22/17, 12:21 AM, "Dave Reynolds" <[email protected]> wrote:
    >
    >         Sorry, missed this question ...
    >
    >         It depends on the scale of the data, the size of the Tomcat application,
    >         the machine sizes available and how much API-side in-memory caching you
    >         want to do.
    >
    >         We use both styles successfully. For modest data, even at high load, or
    >         for large data at modest load, having both on the same machine works
    >         fine and is slightly easier to scale out, so long as your machines have
    >         a reasonable memory footprint for the data scale. For larger data
    >         (300-400MT plus) with either significant query rates or very memory
    >         hungry applications we split the data and front end tiers.
    >
    >         Dave
    >
    >         On 19/09/17 20:02, Dimov, Stefan wrote:
    >         > Thanks for the response!
    >         >
    >         > One more question:
    >         >
    >         > Would it be better if I put Tomcat on one machine and have Fuseki on another?
    >         >
    >         > Provided they are both in the same network and the connection between them
    >         > is unobstructed, wouldn’t this improve the performance, considering they
    >         > don’t share memory/CPU?
    >         >
    >         > Regards,
    >         > Stefan
    >         >
    >         > On 9/19/17, 5:24 AM, "Dave Reynolds" <[email protected]> wrote:
    >         >
    >         >      On 19/09/17 11:33, George News wrote:
    >         >      >
    >         >      > On 2017-09-19 09:57, Dave Reynolds wrote:
    >         >      >> On 19/09/17 01:13, Dimov, Stefan wrote:
    >         >      >>> Hi,
    >         >      >>>
    >         >      >>> I have a Tomcat setup that receives REST requests, “translates” them
    >         >      >>> into SPARQL queries, invokes them on the underlying FUSEKI and returns
    >         >      >>> the results:
    >         >      >>>
    >         >      >>>
    >         >      >>> USER AGENT
    >         >      >>> ^
    >         >      >>> REST
    >         >      >>> v
    >         >      >>> ---------------
    >         >      >>> TOMCAT
    >         >      >>> ^
    >         >      >>> REST
    >         >      >>> v
    >         >      >>> -------------
    >         >      >>> FUSEKI
    >         >      >>> ------------
    >         >      >>> JENA
    >         >      >>> -----------
    >         >      >>> TDB
    >         >      >>> ----------
    >         >      >>>
    >         >      >>> Would I be able to achieve a significant performance improvement if I
    >         >      >>> use the JENA libraries directly and bypass FUSEKI?
    >         >      >>
    >         >      >> Unlikely. We successfully use the set up you describe for dozens of
    >         >      >> services, some quite high load. We have a few which go direct to Jena
    >         >      >> for legacy reasons and they show no particular performance benefits.
    >         >      >>
    >         >      >> If your payloads can be large then make sure the way you are driving
    >         >      >> Fuseki is streaming and doesn't accidentally store the entire SPARQL
    >         >      >> results in your Tomcat app. This also means choosing a streamable media
    >         >      >> type for your Fuseki requests.
    >         >      >
    >         >      > I'm using Jena to create my own REST service and I'm facing some issues
    >         >      > when SPARQL result sets are big. Could you please give me a hint on the
    >         >      > streaming stuff from Fuseki so I can incorporate that into my REST service?
    >         >
    >         >      If you are just doing SELECTs then it should be straightforward. Of the
    >         >      SPARQL results media types, at least XML and TSV are streaming. We
    >         >      just use Jena's QueryExecutionFactory.sparqlService in the REST service
    >         >      to set up the execution. We wrap the ResultSet from execSelect and
    >         >      process that one row at a time. Our wrapper keeps track of the
    >         >      underlying QueryExecution so we can close that when finished or in the
    >         >      event of a problem.
    >         >
    >         >      For DESCRIBE/CONSTRUCT queries, use a streamable media type for the
    >         >      RDF such as ntriples/nquads. We have less experience of that; we tend to
    >         >      actually execute those in batches (a SELECT provides a set of resource
    >         >      bindings and we then issue a DESCRIBE on those resources one batch at a
    >         >      time).
    >         >
    >         >      Dave
    >         >
    >         >
    >         >
    >
    >
    >
    >
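
For anyone reading this thread later, here is a minimal sketch of the row-at-a-time handling of SELECT results that Dave describes above. It is not his code and it does not use Jena's QueryExecutionFactory: it swaps in the JDK's own HTTP client (Java 11+) and requests the TSV results type, in which each solution sits on its own line. The class name, the method names and the idea of passing a row handler are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical helper, not part of Jena or Fuseki.
final class StreamingSelect {

    // Reads SPARQL TSV results one line at a time and hands each row to the
    // handler, so the full result set is never held in memory.
    // Returns the number of data rows processed (the header is not counted).
    static long forEachRow(Reader in, Consumer<String[]> rowHandler) throws IOException {
        long rows = 0;
        // try-with-resources closes the reader (and the stream beneath it)
        // when finished, or if the handler throws part-way through.
        try (BufferedReader reader = new BufferedReader(in)) {
            String header = reader.readLine(); // first line lists the variables
            if (header == null) {
                return 0;
            }
            String line;
            while ((line = reader.readLine()) != null) {
                // limit -1 keeps trailing empty fields (unbound variables)
                rowHandler.accept(line.split("\t", -1));
                rows++;
            }
        }
        return rows;
    }

    // Sends a SELECT query to a SPARQL endpoint over HTTP GET and streams
    // the TSV response body through forEachRow.
    static long select(String endpoint, String sparql, Consumer<String[]> rowHandler)
            throws IOException, InterruptedException {
        String url = endpoint + "?query=" + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "text/tab-separated-values")
                .GET()
                .build();
        // ofInputStream() exposes the body as a stream instead of buffering it.
        HttpResponse<InputStream> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() != 200) {
            response.body().close();
            throw new IOException("SPARQL endpoint returned HTTP " + response.statusCode());
        }
        return forEachRow(new InputStreamReader(response.body(), StandardCharsets.UTF_8), rowHandler);
    }
}
```

A call might look like StreamingSelect.select("http://localhost:3030/ds/query", "SELECT * WHERE { ?s ?p ?o } LIMIT 10", row -> System.out.println(row[0])), where the endpoint URL is only an assumed local Fuseki dataset. With Jena itself the shape is the same: iterate the ResultSet from execSelect and close the QueryExecution in a finally block or try-with-resources.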
    
