> Any measurements would be unreliable at best and probably worthless.
>
> 1/ Different data gives different answers to queries.
>
> 2/ Caching matters a lot for databases and a different setup will cache
> differently.
This is so true, and it's not even a complete list. It might be better to approach the problem from the application layer. Are you able to put together a good suite of test data, queries, and updates, accompanied by a good understanding of the kinds of load the triplestore will experience in production?

Adam Soroka

> On Dec 24, 2017, at 1:21 PM, Andy Seaborne <[email protected]> wrote:
>
> On 24/12/17 14:11, Andrew U. Frank wrote:
>> thank you for the information; i take it that with the indexes a one-variable
>> query would be (close to) linear in the number of triples found. i saw that
>> TDB does build indexes and assumed they use hashes.
>> i still have the following questions:
>> 1. is performance different for a named graph or the default graph?
>
> Query performance is approximately the same for GRAPH.
> Update is slower.
>
>> 2. can i simplify measurements by putting pieces of the dataset in
>> different graphs and then adding more or fewer of these graphs to take a
>> measurement? say i have 5 named graphs, each with 10 million triples, do queries
>> over 2, 3, 4 and 5 graphs give the same (or very similar) results as when
>> i load 20, 30, 40 and 50 million triples into a single named graph?
>
> Any measurements would be unreliable at best and probably worthless.
>
> 1/ Different data gives different answers to queries.
>
> 2/ Caching matters a lot for databases and a different setup will cache
> differently.
>
> Andy
>
>> thank you for help!
>> andrew
>> On 12/23/2017 06:20 AM, ajs6f wrote:
>>> For example, the TIM in-memory dataset impl uses 3 indexes on triples and 6
>>> on quads to ensure that all one-variable queries (i.e. for triples ?s <p>
>>> <o>, <s> ?p <o>, <s> <p> ?o) will be as direct as possible. The indexes are
>>> hashmaps (e.g. Map<Node, Map<Node, Set<Node>>>) and don't use the kind of
>>> node directory that TDB does.
>>>
>>> There are lots of other ways to play that out, according to the balance of
>>> time costs and storage costs desired and the expected types of queries.
>>>
>>> Adam
>>>
>>>> On Dec 23, 2017, at 2:56 AM, Lorenz Buehmann
>>>> <[email protected]> wrote:
>>>>
>>>>
>>>> On 23.12.2017 00:47, Andrew U. Frank wrote:
>>>>> are there some rules for which queries are linear in the amount of data in
>>>>> the graph? is it correct to assume that searching for a triple based
>>>>> on a single condition (?p a X) is logarithmic in the size of the data
>>>>> collection?
>>>> Why should it be logarithmic? The complexity of matching a single BGP
>>>> depends on the implementation. I could search for matches by doing a
>>>> scan on the whole dataset - that would certainly not be logarithmic but
>>>> linear. Usually, if one exists, a triple store would use the POS index in
>>>> order to find bindings for variable ?p.
>>>>
>>>>
>>>> Cheers,
>>>> Lorenz
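To make the index discussion in this thread concrete, here is a minimal sketch of the idea Adam describes for TIM: nested hash maps (first key to second key to a set of third keys), one per index order, so each one-variable pattern is a direct lookup rather than a scan. This is not Jena's actual TIM code. Plain `String`s stand in for Jena's `Node`, and the class name `HashTripleStore`, its methods, and the choice of SPO/POS/OSP as the three index orders are assumptions made for illustration. The `subjectsByScan` method shows the alternative Lorenz mentions, a full scan whose cost grows linearly with the number of stored triples, while `subjects` answers the same `?s <p> <o>` pattern (for example `?p a X`) through the POS index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Demo entry point, declared first so a single-file launcher picks up its main().
class TimStyleIndexSketch {
    public static void main(String[] args) {
        HashTripleStore store = new HashTripleStore();
        store.add("ex:alice", "rdf:type", "ex:Person");
        store.add("ex:bob", "rdf:type", "ex:Person");
        store.add("ex:alice", "ex:knows", "ex:bob");
        // Pattern "?p a ex:Person": answered via the POS index, then via a full scan.
        System.out.println(store.subjects("rdf:type", "ex:Person"));
        System.out.println(store.subjectsByScan("rdf:type", "ex:Person"));
    }
}

// Hypothetical in-memory triple store; Strings stand in for RDF nodes.
class HashTripleStore {
    // Each index maps first key -> second key -> set of third keys.
    private final Map<String, Map<String, Set<String>>> spo = new HashMap<>();
    private final Map<String, Map<String, Set<String>>> pos = new HashMap<>();
    private final Map<String, Map<String, Set<String>>> osp = new HashMap<>();
    // Flat list kept only so a linear scan can be compared with the indexes.
    private final List<String[]> all = new ArrayList<>();

    private static void put(Map<String, Map<String, Set<String>>> idx,
                            String a, String b, String c) {
        idx.computeIfAbsent(a, k -> new HashMap<>())
           .computeIfAbsent(b, k -> new HashSet<>())
           .add(c);
    }

    void add(String s, String p, String o) {
        if (objects(s, p).contains(o)) {
            return; // triple already stored; RDF graphs are sets
        }
        put(spo, s, p, o);
        put(pos, p, o, s);
        put(osp, o, s, p);
        all.add(new String[] {s, p, o});
    }

    int size() {
        return all.size();
    }

    // ?s <p> <o>: two hash lookups in the POS index, no scan of unrelated triples.
    Set<String> subjects(String p, String o) {
        return Collections.unmodifiableSet(
            pos.getOrDefault(p, Map.of()).getOrDefault(o, Set.of()));
    }

    // <s> ?p <o>: two hash lookups in the OSP index.
    Set<String> predicates(String s, String o) {
        return Collections.unmodifiableSet(
            osp.getOrDefault(o, Map.of()).getOrDefault(s, Set.of()));
    }

    // <s> <p> ?o: two hash lookups in the SPO index.
    Set<String> objects(String s, String p) {
        return Collections.unmodifiableSet(
            spo.getOrDefault(s, Map.of()).getOrDefault(p, Set.of()));
    }

    // Same answer as subjects(p, o), but examines every stored triple.
    Set<String> subjectsByScan(String p, String o) {
        Set<String> out = new HashSet<>();
        for (String[] t : all) {
            if (t[1].equals(p) && t[2].equals(o)) {
                out.add(t[0]);
            }
        }
        return out;
    }
}
```

The trade-off Adam points to is visible here: every triple is stored three times (plus once in the comparison list), buying constant-time hash lookups per bound position at the cost of memory. A store like TDB makes different choices (sorted on-disk indexes and a node table), so these costs do not carry over directly.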
