Thanks Andy, so you are suggesting breaking the updates (deletions and insertions) into smaller requests to avoid any memory issues. I suppose that we will make daily updates so that our triple store stays up to date, which will probably result in some hundreds of triples per day. I have seen the commit command but not the rollback. Is there any safety net to roll back if something goes wrong during the update procedure? Are there any good examples of "TDB transactions" that I could start looking at?
Many thanks,
Alexandra

On Thu, Apr 14, 2016 at 12:32 PM, Andy Seaborne <[email protected]> wrote:

> On 12/04/16 14:39, Alexandra Kokkinaki wrote:
>
>> Hi Andy, thanks for your answers. So would it be feasible to add/delete
>> triples in an existing database?
>
> Updates are supported.
>
> However, changing large amounts (deleting or adding or a mix) - 10's of
> millions of triples - in a single transaction (single HTTP request) will
> consume too much memory. Such a large change would need to be broken up
> into multiple requests.
>
> Andy
>
>> Thanks,
>>
>> Alexandra
>>
>> On Tue, Mar 29, 2016 at 9:58 AM, Andy Seaborne <[email protected]> wrote:
>>
>>> On 21/03/16 13:35, Alexandra Kokkinaki wrote:
>>>
>>>> Hi Andy, thanks for your answers.
>>>>
>>>> On Fri, Mar 18, 2016 at 11:43 AM, Andy Seaborne <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> it will depend on usage patterns. 2 * 500 million isn't unreasonable,
>>>>> but validating with your expected usage is essential.
>>>>> The critical factors are the usage patterns and the hardware available.
>>>>> Number of queries, query complexity, number of updates, all matter.
>>>>> RAM is good (which is true for any database), as are SSDs if you do
>>>>> lots of updates or need fast startup from cold.
>>>>
>>>> What kind of usage patterns are considered not valid for big triple
>>>> stores?
>>>> We are planning to use our Fuseki server to allow machine-to-machine
>>>> communication and also to allow independent users to express mostly
>>>> spatial queries. We plan to do indexing and have a query timeout too.
>>>> Is that enough to address performance issues?
>>>
>>> They are a good idea. It will protect the server.
>>>
>>> It is possible to write SPARQL queries which are fundamentally expensive.
>>>
>>>> The TDB will need to get updated daily, using the Jena API, since I
>>>> suppose deleting and inserting everything back would take a long time.
>>>> I read in
>>>> https://lists.w3.org/Archives/Public/public-sparql-dev/2008JulSep/0029.html
>>>> that it takes 5370 secs for 100M triples to be loaded in TDB, which is
>>>> good. But here <https://www.w3.org/wiki/LargeTripleStores> it is said
>>>> that it took 36 hours to load 1.7B triples in TDB
>>>
>>> ... in 2008 ... with a spinning disk.
>>>
>>> 12k triples/s would be a bit slow nowadays.
>>>
>>> At large scale tdbloader2 can be faster than tdbloader. You have to try
>>> with your data on your hardware - it isn't a simple yes/no question,
>>> unfortunately.
>>>
>>> tdbloader2 only loads from empty.
>>>
>>> tdbloader does not do anything special when loading a partial database.
>>>
>>>> ..., which drives me towards the daily updates rather than daily
>>>> delete and insert.
>>>> How long would a 500 triple DB take to be loaded in an empty database?
>>>
>>> 500M?
>>>
>>> Just run
>>>
>>> tdbloader --loc=DB <data>
>>>
>>> and see what rate you get - I'd be interested in seeing the log. Every
>>> data set, every hardware set can be different. That's why it is hard to
>>> make any accurate predictions - just try it.
>>>
>>> tdbloader --loc=DB <the_data>
>>>
>>> The pattern of the data makes a difference - LUBM loads very fast as it
>>> has a high triples-to-nodes ratio, so fewer bytes are being loaded. All
>>> triple stores report better figures on that data - a factor of x2
>>> faster is common - but it's not typical data.
>>>
>>> Andy
>>>
>>>>> Multiple requests, whether same service or different service, are
>>>>> competing for the same machine resources. Fuseki runs requests
>>>>> independently and in parallel. There are per-database transactions
>>>>> supporting multiple, truly parallel readers.
>>>>> Andy

>>>> Many thanks,
>>>>
>>>> Alexandra

>>>>> On 18/03/16 09:35, Alexandra Kokkinaki wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> after researching TDB performance with big data, I would still like
>>>>>> to know:
>>>>>> We have one Fuseki server exposing 2 SPARQL endpoints (2 million
>>>>>> triples each) as data services. We are planning to add one more, but
>>>>>> with big data (500 million triples).
>>>>>>
>>>>>> - For big data, is it better to use many installations of Fuseki
>>>>>>   server, or
>>>>>> - many data services under the same Fuseki server?
>>>>>>
>>>>>> Could Fuseki cope with two or more services with more than 500
>>>>>> million triples each?
>>>>>>
>>>>>> How does Fuseki cope when it has to serve concurrent queries to the
>>>>>> different data services?
>>>>>>
>>>>>> Many thanks,
>>>>>>
>>>>>> Alexandra
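The "safety net" asked about at the top of the thread is the transaction abort. A minimal sketch of the standard Jena TDB transaction pattern (assuming Jena 3.x on the classpath; "DB" is a placeholder directory, and the update logic is elided):

```java
import org.apache.jena.query.Dataset;
import org.apache.jena.query.ReadWrite;
import org.apache.jena.rdf.model.Model;
import org.apache.jena.tdb.TDBFactory;

public class TdbUpdateExample {
    public static void main(String[] args) {
        // "DB" is a placeholder for the on-disk TDB location.
        Dataset dataset = TDBFactory.createDataset("DB");
        dataset.begin(ReadWrite.WRITE);
        try {
            Model model = dataset.getDefaultModel();
            // ... apply the day's deletions and insertions to the model ...
            dataset.commit();   // make this batch of changes durable
        } catch (Exception e) {
            dataset.abort();    // rollback: nothing from this transaction is kept
        } finally {
            dataset.end();
        }
    }
}
```

If anything throws before `commit()`, `abort()` discards the whole write transaction, which is the rollback behaviour being asked about.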

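Andy's advice above, to break a large change into multiple requests, can be sketched as follows. The class and method names here are my own invention for illustration; each resulting string would be POSTed to the dataset's SPARQL update endpoint as its own request/transaction:

```java
import java.util.ArrayList;
import java.util.List;

public class UpdateBatcher {

    // Split a list of N-Triples lines into SPARQL "INSERT DATA" requests of
    // at most batchSize triples each, so no single HTTP request to Fuseki
    // carries the whole change set (which is what exhausts memory).
    public static List<String> toInsertBatches(List<String> triples, int batchSize) {
        List<String> requests = new ArrayList<>();
        for (int i = 0; i < triples.size(); i += batchSize) {
            int end = Math.min(i + batchSize, triples.size());
            StringBuilder sb = new StringBuilder("INSERT DATA {\n");
            for (String triple : triples.subList(i, end)) {
                sb.append("  ").append(triple).append("\n");
            }
            sb.append("}");
            requests.add(sb.toString());
        }
        return requests;
    }
}
```

A daily update of some hundreds of triples would then be one or a handful of such requests; deletions can be batched the same way using `DELETE DATA`.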