On 14/01/16 17:57, Adam Kimball wrote:
Hi all,

I’m working with a vendor to purchase a taxonomy management tool.  The vendor uses 
SDB and we’ll need to set up a database with sufficient CPU/RAM to support the 
tool.  The vendor, perhaps surprisingly, doesn’t have any data that can help me 
with sizing the DB.  I’m planning on storing 50M triples off the bat.  About 
1/3 of the properties will be datatype properties, with values generally shorter 
than 256 characters.

My hunch is:


   *   This is a relatively small amount of data from a disk perspective – 
256GB of SSD would be more than enough
   *   CPU-wise, a standard 2-processor, 4-core machine would be more than enough

Thoughts?  Is there a better way of getting these questions answered?

Thanks!
-Adam


Hi Adam,

(1) Yes and (2) yes.


1/ Normal data is very roughly 5M triples per 1GB of disk, so you should have plenty of room.

Caveats:
  It assumes the data is not dominated by long literals.
  It can be less (e.g. when the ratio of triples to distinct nodes is high).
  It assumes no compression.

(I did a very, very quick check with SDB/MySQL and with TDB on some BSBM data and got numbers in that range.)
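
Quick arithmetic with that figure: 50e6 triples / 5e6 triples per GB ≈ 10GB on disk, so 256GB of SSD leaves a very large margin even if the real ratio turns out several times worse.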


2/ This one is more dependent on workload and environment.


The amount of RAM matters, but at 50e6 triples (roughly 10G on disk, per the figure above),
a machine with 32G of RAM or more can cache the working set. (Database tuning may be needed.)
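
With MySQL as the backing store (an assumption on my part - the vendor may use a different SQL database), the main tuning knob is usually innodb_buffer_pool_size: size it so the triple and node tables plus their indexes largely fit in memory. On a 32G machine, something in the 8-16G range is a plausible, purely illustrative starting point.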

It's a SQL database - all the usual considerations apply.
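
To make that concrete: SDB talks to its database over plain JDBC, so standard SQL connection and server tuning applies. A minimal sketch of opening a store (the JDBC URL, database name, credentials and layout choice are placeholders, and package names assume a Jena 3.x SDB release):

    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.sdb.SDBFactory;
    import org.apache.jena.sdb.Store;
    import org.apache.jena.sdb.StoreDesc;
    import org.apache.jena.sdb.sql.JDBC;
    import org.apache.jena.sdb.sql.SDBConnection;
    import org.apache.jena.sdb.store.DatabaseType;
    import org.apache.jena.sdb.store.LayoutType;

    public class SDBSizingCheck {
        public static void main(String[] args) {
            // Hash-based triple layout over MySQL - an illustrative choice.
            StoreDesc storeDesc =
                new StoreDesc(LayoutType.LayoutTripleNodesHash, DatabaseType.MySQL);
            JDBC.loadDriverMySQL();

            // An ordinary JDBC connection - this is where the usual SQL tuning applies.
            String jdbcURL = "jdbc:mysql://localhost/taxonomy_db";   // placeholder
            SDBConnection conn = SDBFactory.createConnection(jdbcURL, "user", "password");

            Store store = SDBFactory.connectStore(conn, storeDesc);
            Model model = SDBFactory.connectDefaultModel(store);
            System.out.println("Triples in the default graph: " + model.size());

            store.close();
        }
    }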

If the machine runs any other applications or databases, database performance will be affected.

If the machine is VM'ed, then that can cause poor performance.

If the machine is VM'ed and running on hardware supporting several VMs, then performance can be poor, erratic and thoroughly mysterious.

If the database is a long way away from the SDB application/engine (e.g. different datacenters), it can impact performance.

Having an SSD is good - the cold start performance is better.

        Andy
