Thanks. So if I have a node with a TDB of 300-400MT, what would be the (minimum) appropriate RAM/disk size? Roughly speaking …
Regards,
Stefan

On 9/22/17, 4:36 PM, "[email protected]" <[email protected]> wrote:

Can't speak for Dave, but I would think he meant "mega-triples"; 300-400 million triples of data. It's a common way of talking about the size of RDF datasets.

ajs6f

Dimov, Stefan wrote on 9/22/17 7:32 PM:
> Thanks Dave,
>
> Can you clarify, please, what do you mean by “reasonable memory footprint for the data scale. For larger data (300-400MT plus) ”?
>
> “300-400MT” – do you mean 300-400MB RAM or probably by “MT” you mean “Mega-Transfer” or do you mean disk space?
>
> If 300-400MB is the RAM you are using, what is the corresponding disk size TDB takes in your particular case?
>
> Or if this is the disk-space, what is the size of your RAM?
>
> Regards,
> Stefan
>
>
> On 9/22/17, 1:22 PM, "Dimov, Stefan" <[email protected]> wrote:
>
> Thanks, Dave!
>
> S.
>
> On 9/22/17, 12:21 AM, "Dave Reynolds" <[email protected]> wrote:
>
> Sorry, missed this question ...
>
> It depends on the scale of the data, the size of the tomcat application,
> the machine sizes available and how much API-side in-memory caching you
> want to do.
>
> We use both styles successfully. For modest data even at high load, or
> for large data at modest load then having both on the same machine works
> fine and is slightly easier to scale out. So long as your machines have
> a reasonable memory footprint for the data scale. For larger data
> (300-400MT plus) with either significant query rates or very memory
> hungry applications we split the data and front end tiers.
>
> Dave
>
> On 19/09/17 20:02, Dimov, Stefan wrote:
> > Thanks for the response!
> >
> > One more question:
> >
> > Would it be better if I put Tomcat on one machine and have Fuseki on another?
> >
> > Provided they are both in the same network and the connection between them is unobstructed, wouldn’t this improve the performance, considering they don’t share memory/CPU?
> >
> > Regards,
> > Stefan
> >
> > On 9/19/17, 5:24 AM, "Dave Reynolds" <[email protected]> wrote:
> >
> > On 19/09/17 11:33, George News wrote:
> > >
> > > On 2017-09-19 09:57, Dave Reynolds wrote:
> > >> On 19/09/17 01:13, Dimov, Stefan wrote:
> > >>> Hi,
> > >>>
> > >>> I have Tomcat setup, that receives REST requests, “translates” them
> > >>> into SPARQL queries, invokes them on the underlying FUSEKI and returns
> > >>> the results:
> > >>>
> > >>>
> > >>> USER AGENT
> > >>> ^
> > >>> REST
> > >>> v
> > >>> ---------------
> > >>> TOMCAT
> > >>> ^
> > >>> REST
> > >>> v
> > >>> -------------
> > >>> FUSEKI
> > >>> ------------
> > >>> JENA
> > >>> -----------
> > >>> TDB
> > >>> ----------
> > >>>
> > >>> Would I be able to achieve significant performance improvement, if I
> > >>> use directly the JENA libraries and bypass FUSEKI?
> > >>
> > >> Unlikely. We successfully use the set up you describe for dozens of
> > >> services, some quite high load. We have a few which go direct to Jena
> > >> for legacy reasons and they show no particular performance benefits.
> > >>
> > >> If your payloads can be large then make sure the way you are driving
> > >> fuseki is streaming and doesn't accidentally store the entire SPARQL
> > >> results in your tomcat app. This also means choosing a streamable media
> > >> type for your fuseki requests.
> > >
> > > I'm using Jena to create my own REST service and I'm facing some issues
> > > when SPARQL resultsets are big. Could you please give me a hint on the
> > > streaming stuff from fuseki so I can incorporate that to my REST service?
> >
> > If you are just doing SELECTs then it should be straightforward. Of the
> > sparql results media types then at least XML and TSV are streaming. We
> > just use Jena's QueryExecutionFactory.sparqlService in the REST service
> > to set up the execution. We wrap the ResultSet from execSelect and
> > process that one row at a time. Our wrapper keeps track of the
> > underlying QueryExecution so we can close that when finished or in the
> > event of a problem.
> >
> > For DESCRIBE/CONSTRUCT queries then use a streamable media type for the
> > RDF such as ntriples/nquads. We have less experience of that, we tend to
> > actually execute those in batches (a SELECT provides a set of resource
> > bindings and we then issue a DESCRIBE on those resources one batch at a
> > time).
> >
> > Dave
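
The batching approach Dave describes for DESCRIBE (a SELECT supplies resource bindings, then a DESCRIBE is issued for one batch of resources at a time) might look roughly like the sketch below. It is plain Java with no Jena dependency and is not taken from Dave's code: `DescribeBatcher`, `describeQueries` and the batch size are hypothetical names, and the resource IRIs are assumed to have been collected already from the SELECT results. Each returned query string would then be sent to Fuseki with a streamable RDF media type such as N-Triples.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Jena): builds one DESCRIBE query per batch of resources. */
final class DescribeBatcher {

    private DescribeBatcher() {
    }

    /**
     * Splits the given resource IRIs into groups of at most batchSize and
     * returns one SPARQL DESCRIBE query string per group, in input order.
     * Assumption: the IRIs are absolute and need no escaping inside <...>.
     */
    static List<String> describeQueries(List<String> iris, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < iris.size(); start += batchSize) {
            int end = Math.min(start + batchSize, iris.size());
            StringBuilder query = new StringBuilder("DESCRIBE");
            // DESCRIBE accepts several resources in a single query.
            for (String iri : iris.subList(start, end)) {
                query.append(" <").append(iri).append('>');
            }
            queries.add(query.toString());
        }
        return queries;
    }
}
```

With three resources and a batch size of 2, this yields two queries: one describing the first two resources and one describing the third. Keeping the batch size small bounds how much RDF any single response has to carry.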
