On 21/03/16 13:35, Alexandra Kokkinaki wrote:
Hi Andy, thanks for your answers.
On Fri, Mar 18, 2016 at 11:43 AM, Andy Seaborne <[email protected]> wrote:
Hi,
It will depend on usage patterns. 2 x 500 million isn't unreasonable, but
validating with your expected usage is essential.
The critical factors are the usage patterns and the hardware available.
Number of queries, query complexity, and number of updates all matter. RAM is
good (which is true for any database), as are SSDs if you do lots of updates
or need fast startup from cold.
What kinds of usage patterns are considered unsuitable for big triple stores?
We are planning to use our Fuseki server for machine-to-machine
communication and also to allow independent users to express mostly spatial
queries. We plan to do indexing and have a query timeout too. Is that
enough to address performance issues?
They are a good idea; a query timeout will protect the server.
It is possible to write SPARQL queries which are fundamentally expensive.
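If you also hit the TDB store directly through the Jena API, a per-query
timeout can be set programmatically as well. A rough sketch - the class name,
the 30-second value, the "DB" location and the query are just placeholders:

    import java.util.concurrent.TimeUnit;

    import org.apache.jena.query.*;
    import org.apache.jena.tdb.TDBFactory;

    public class TimeoutExample {
        public static void main(String[] args) {
            // "DB" is a placeholder for the TDB directory.
            Dataset dataset = TDBFactory.createDataset("DB");
            String queryString = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }";

            dataset.begin(ReadWrite.READ);
            try (QueryExecution qexec = QueryExecutionFactory.create(queryString, dataset)) {
                // Abort the query if it runs for more than 30 seconds.
                qexec.setTimeout(30, TimeUnit.SECONDS);
                ResultSet results = qexec.execSelect();
                while (results.hasNext()) {
                    System.out.println(results.next());
                }
            } finally {
                dataset.end();
            }
        }
    }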
The TDB will need to be updated daily, using the Jena API, since I suppose
deleting everything and re-inserting it would take a long time. I read in (
https://lists.w3.org/Archives/Public/public-sparql-dev/2008JulSep/0029.html
) that it takes 5370 seconds for 100M triples to be loaded into TDB, which is
good.
But here <https://www.w3.org/wiki/LargeTripleStores> it says that it
took 36 hours to load 1.7B triples into TDB
... in 2008 ... with a spinning disk.
12k triples/s would be a bit slow nowadays.
At large scale tdbloader2 can be faster than tdbloader. You have to try
with your data on your hardware - it isn't a simple yes/no question,
unfortunately.
tdbloader2 only loads from empty.
tdbloader does not do anything special when loading a partial database.
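For a load into a fresh, empty database it is invoked in the same way as
tdbloader, e.g. (location and data file are placeholders):

    tdbloader2 --loc=DB <the_data>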
... which drives me towards daily updates rather than a daily delete and
re-insert.
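If you go the incremental route, the daily changes can be applied through the
Jena API inside a TDB write transaction. A minimal sketch, assuming the
database lives in a directory called "DB" and the day's changes arrive as two
files of triples to remove and to add (all names here are illustrative):

    import org.apache.jena.query.Dataset;
    import org.apache.jena.query.ReadWrite;
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.riot.RDFDataMgr;
    import org.apache.jena.tdb.TDBFactory;

    public class DailyUpdate {
        public static void main(String[] args) {
            // "DB" is a placeholder for the TDB directory behind the service.
            Dataset dataset = TDBFactory.createDataset("DB");

            // Hypothetical change files produced by the daily export.
            Model toRemove = RDFDataMgr.loadModel("removals.ttl");
            Model toAdd    = RDFDataMgr.loadModel("additions.ttl");

            dataset.begin(ReadWrite.WRITE);
            try {
                Model model = dataset.getDefaultModel();
                model.remove(toRemove);   // delete the out-of-date triples
                model.add(toAdd);         // insert the new ones
                dataset.commit();
            } finally {
                dataset.end();
            }
        }
    }

Note that a TDB directory should only be opened by one JVM at a time, so in
practice the changes would either go to Fuseki as SPARQL Update requests or be
applied while the server is not using that database.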
How long would a 500 triple DB take to be loaded in an empty database?
500M?
Just run

    tdbloader --loc=DB <the_data>

and see what rate you get - I'd be interested in seeing the log. Every data
set, every hardware setup can be different. That's why it is hard to make any
accurate predictions - just try it.
The pattern of the data makes a difference - LUBM loads very fast as it
has a high triples-to-nodes ratio, so fewer bytes are being loaded. All
triple stores report better figures on that data - a factor of 2x faster
is common - but it's not typical data.
Andy
Multiple requests, whether to the same service or different services,
compete for the same machine resources. Fuseki runs requests
independently and in parallel. There are per-database transactions
supporting multiple, truly parallel readers.
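The same behaviour can be seen with an embedded TDB dataset: several threads
can each hold a read transaction at the same time. A rough sketch (the
location, query and thread count are placeholders):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.jena.query.*;
    import org.apache.jena.tdb.TDBFactory;

    public class ParallelReaders {
        public static void main(String[] args) throws InterruptedException {
            Dataset dataset = TDBFactory.createDataset("DB");   // placeholder location
            String q = "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }";

            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 4; i++) {
                pool.submit(() -> {
                    // Each thread holds its own read transaction; readers run in parallel.
                    dataset.begin(ReadWrite.READ);
                    try (QueryExecution qexec = QueryExecutionFactory.create(q, dataset)) {
                        ResultSet rs = qexec.execSelect();
                        while (rs.hasNext()) {
                            System.out.println(Thread.currentThread().getName() + ": " + rs.next());
                        }
                    } finally {
                        dataset.end();
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.MINUTES);
        }
    }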
Andy
Many thanks,
Alexandra
On 18/03/16 09:35, Alexandra Kokkinaki wrote:
Hi,
after researching TDB performance with big data, I would still like to
know:
We have one Fuseki server exposing 2 SPARQL endpoints (2 million triples
each) as data services. We are planning to add one more, but with big data
(500 million triples):
- For big data, is it better to use many installations of the Fuseki server,
  or
- many data services under the same Fuseki server?
Could Fuseki cope with two or more services with more than 500 million
triples each?
How does Fuseki cope when it has to serve concurrent queries to the
different data services?
Many thanks,
Alexandra