Re: performance measures

ajs6f Sun, 24 Dec 2017 10:24:32 -0800

> Any measurements would be unreliable at best and probably worthless.
> 1/ Different data gives different answers to queries.
> 2/ Caching matters a lot for databases and a different setup will cache 
> differently.


This is so true, and it's not even a complete list. It might be better to 
approach the problem from the application layer. Are you able to put together a 
good suite of test data, queries, and updates, accompanied by a good 
understanding of the kinds of load the triplestore will experience in 
production?
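
As a starting point for that kind of application-level measurement, here is a minimal sketch of a timing harness. It is an illustration under assumptions, not a recommended benchmark design: the class name `WorkloadTimerSketch`, its methods, and the placeholder task are all hypothetical, and the `Runnable` would need to be replaced by code that runs one of your real queries or updates against your real store. The warm-up runs are there because, as noted above, caching strongly affects the numbers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: times a repeatable task after a number of untimed warm-up runs. */
final class WorkloadTimerSketch {

    /**
     * Runs the task `warmups` times without timing (to populate caches),
     * then `runs` times with timing. Returns the samples in nanoseconds,
     * sorted in ascending order.
     */
    static List<Long> time(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) {
            task.run();
        }
        List<Long> samples = new ArrayList<>(runs);
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            samples.add(System.nanoTime() - start);
        }
        Collections.sort(samples);
        return samples;
    }

    /** Middle element of an already sorted, non-empty list (upper middle if the size is even). */
    static long median(List<Long> sorted) {
        return sorted.get(sorted.size() / 2);
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): stands in for executing one real query.
        Runnable placeholderQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unreachable");
            }
        };
        List<Long> samples = time(placeholderQuery, 5, 20);
        System.out.println("median ns: " + median(samples));
        System.out.println("slowest ns: " + samples.get(samples.size() - 1));
    }
}
```

Comparing the timed samples with and without warm-up runs gives a rough feel for how much of a result is due to caching rather than to the query itself.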

Adam Soroka

> On Dec 24, 2017, at 1:21 PM, Andy Seaborne <[email protected]> wrote:
> 
> On 24/12/17 14:11, Andrew U. Frank wrote:
>> thank you for the information; i take it that, using the indexes, a one-variable 
>> query would be (close to) linear in the number of triples found. i saw that 
>> TDB does build indexes and assumed they use hashes.
>> i still have the following questions:
>> 1. is performance different for a named or the default graph?
> 
> Query performance is approximately the same for GRAPH.
> Update is slower.
> 
>> 2. can i simplify measurements by putting pieces of the dataset in 
>> different graphs and then adding more or fewer of these graphs to take a 
>> measure? say i have 5 named graphs, each with 10 million triples, do queries 
>> over 2, 3, 4 and 5 graphs give the same (or very similar) results as when 
>> i load 20, 30, 40 and 50 million triples into a single named graph?
> 
> Any measurements would be unreliable at best and probably worthless.
> 
> 1/ Different data gives different answers to queries.
> 
> 2/ Caching matters a lot for databases and a different setup will cache 
> differently.
> 
>    Andy
> 
>> thank you for your help!
>> andrew
>> On 12/23/2017 06:20 AM, ajs6f wrote:
>>> For example, the TIM in-memory dataset impl uses 3 indexes on triples and 6 
>>> on quads to ensure that all one-variable queries (i.e. for triples ?s <p> 
>>> <o>, <s> ?p <o>, <s> <p> ?o) will be as direct as possible. The indexes are 
>>> hashmaps (e.g. Map<Node, Map<Node, Set<Node>>>) and don't use the kind of 
>>> node directory that TDB does.
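
To make the nested-hashmap idea concrete, here is a minimal sketch of three triple indexes, one per one-variable pattern. It is not Jena's TIM code: the class name `TripleIndexSketch`, the use of `String` in place of `Node`, and the choice of SPO, POS and OSP as the three key orderings are assumptions made for illustration only.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: strings stand in for RDF nodes. */
final class TripleIndexSketch {
    // One nested hash map per access pattern; the first two keys are the bound terms.
    private final Map<String, Map<String, Set<String>>> spo = new HashMap<>(); // <s> <p> ?o
    private final Map<String, Map<String, Set<String>>> pos = new HashMap<>(); // ?s <p> <o>
    private final Map<String, Map<String, Set<String>>> osp = new HashMap<>(); // <s> ?p <o>

    /** Stores the triple in all three indexes (three times the storage, faster lookups). */
    void add(String s, String p, String o) {
        put(spo, s, p, o);
        put(pos, p, o, s);
        put(osp, o, s, p);
    }

    /** Bindings for ?s in the pattern ?s <p> <o>. */
    Set<String> subjects(String p, String o) {
        return get(pos, p, o);
    }

    /** Bindings for ?p in the pattern <s> ?p <o>. */
    Set<String> predicates(String s, String o) {
        return get(osp, o, s);
    }

    /** Bindings for ?o in the pattern <s> <p> ?o. */
    Set<String> objects(String s, String p) {
        return get(spo, s, p);
    }

    private static void put(Map<String, Map<String, Set<String>>> index,
                            String k1, String k2, String value) {
        index.computeIfAbsent(k1, k -> new HashMap<>())
             .computeIfAbsent(k2, k -> new HashSet<>())
             .add(value);
    }

    // Two hash lookups, whatever the total number of triples stored.
    private static Set<String> get(Map<String, Map<String, Set<String>>> index,
                                   String k1, String k2) {
        return index.getOrDefault(k1, Collections.emptyMap())
                    .getOrDefault(k2, Collections.emptySet());
    }

    public static void main(String[] args) {
        TripleIndexSketch index = new TripleIndexSketch();
        index.add("alice", "knows", "bob");
        index.add("carol", "knows", "bob");
        index.add("alice", "likes", "bob");
        System.out.println(index.subjects("knows", "bob"));   // alice and carol, in no particular order
        System.out.println(index.predicates("alice", "bob")); // knows and likes, in no particular order
        System.out.println(index.objects("alice", "knows"));  // [bob]
    }
}
```

With this layout, finding the matching set costs two hash lookups, and the remaining work is proportional to the number of matches returned. The price is that every triple is stored three times.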
>>> 
>>> There are lots of other ways to play that out, according to the desired 
>>> balance of time costs and storage costs and the expected types of queries.
>>> 
>>> Adam
>>> 
>>>> On Dec 23, 2017, at 2:56 AM, Lorenz Buehmann 
>>>> <[email protected]> wrote:
>>>> 
>>>> 
>>>> On 23.12.2017 00:47, Andrew U. Frank wrote:
>>>>> are there some rules for which queries are linear in the amount of data in
>>>>> the graph? is it correct to assume that searching for triples based
>>>>> on a single condition (?p a X) is logarithmic in the size of the data
>>>>> collection?
>>>> Why should it be logarithmic? The complexity of matching a single BGP
>>>> depends on the implementation. I could search for matches by doing a
>>>> scan on the whole dataset - that would for sure be not logarithmic but
>>>> linear. Usually, if one exists, a triple store would use the POS index in
>>>> order to find bindings for variable ?p.
>>>> 
>>>> Cheers,
>>>> Lorenz
