On 21/07/13 13:34, Marco Neumann wrote:
what are the most critical issues that prevent TDB from handling larger
data sets at the moment iyo? jvm? index?

Hi Marco,

Probably the lack of some degree of clustering.

Even improved, more compact indexing (simple run-length encoding of the B+Tree leaves, for example) doesn't make the jump, IMO. More RAM and more CPU do.

There is a limitation on system bus I/O to main RAM: there is only one bus on commodity hardware, and the processor isn't actually kept 100% busy. Database code does a lot of data structure walking and not a lot of CPU-intensive compute. [1]

So scaling with a single machine means one with special (= expensive) interconnect and RAM. Not so commodity any more.

Several machines, by which I mean a few, say 4-10, seem a more effective way to go.

There are some consequences of this: MVCC data structures are better for transactions across a cluster, because the demands of multi-machine transaction coordination are easier to meet.

(MVCC data structures are ones where, when you write, you also copy all the tree nodes from the root to the active block, so the structure has one root per transaction. Transactions also become one-write, not two (once to the log, once to the main DB). See CouchDB or Mulgara, for example.)
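
To make the copy-on-write idea concrete, here is a rough sketch in Python. It is not how TDB, CouchDB or Mulgara actually implement it: it uses a plain binary search tree instead of a B+Tree, keeps everything in memory, and the names Node, Store, insert and lookup are made up for illustration. It only shows the two properties described above: a write copies the nodes from the root down to the changed node, and each committed transaction gets its own root.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    # Immutable tree node (hypothetical; a real store would use B+Tree blocks).
    key: int
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, value):
    """Return a new root. Only nodes on the root-to-key path are copied;
    every untouched subtree is shared with the previous version."""
    if root is None:
        return Node(key, value)
    if key < root.key:
        return Node(root.key, root.value, insert(root.left, key, value), root.right)
    if key > root.key:
        return Node(root.key, root.value, root.left, insert(root.right, key, value))
    return Node(key, value, root.left, root.right)  # overwrite existing key


def lookup(root, key):
    while root is not None:
        if key == root.key:
            return root.value
        root = root.left if key < root.key else root.right
    return None


class Store:
    def __init__(self):
        # roots[i] is the tree as of committed version i; version 0 is empty.
        self.roots = [None]

    def snapshot(self):
        # A reader just holds on to a root; later commits never modify it.
        return self.roots[-1]

    def commit(self, writes):
        root = self.roots[-1]
        for key, value in writes:
            root = insert(root, key, value)
        self.roots.append(root)  # publishing the new root is the commit
        return len(self.roots) - 1


store = Store()
v1 = store.commit([(50, "a"), (30, "b"), (70, "c")])
reader_view = store.snapshot()      # reader pinned to version 1
v2 = store.commit([(30, "B")])      # writer creates version 2
```

The reader that took its snapshot before the second commit keeps seeing the old value for key 30, and the subtree under key 70 is the same object in both versions because the write never touched it. Since the commit is just the publication of a new root, there is no separate log write followed by an update in place.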

And "just doing it" so there is a working architecture that can be improved rather than trying to be perfect first time.

        Andy

[1]

See
http://highscalability.com/blog/2013/6/13/busting-4-modern-hardware-myths-are-memory-hdds-and-ssds-rea.html

and the 45min video

http://www.youtube.com/watch?v=MC1EKLQ2Wmg

