Hi,

On Tue, Mar 6, 2012 at 5:01 PM, Jukka Zitting <[email protected]> wrote:
> Rather than discuss this issue in the abstract, I suggest that we
> define a set of relevant performance benchmarks, and use them for
> evaluating potential alternatives.
In addition to this specific case, I think it's important that we define
and implement a good set of performance and scalability benchmarks as
early as possible. That allows us to get a good picture of where we are
and which areas and potential bottlenecks need more focus. Such a set of
benchmarks should also make it easy to evaluate alternative designs and
produce hard evidence to help resolve potential disagreements.

So what should we benchmark then? Here's one idea to get us started:

* Large, flat hierarchy (selected pages-articles dump from Wikipedia)
* Time it takes to load all articles (ideally as a single transaction)
* Amount of disk space used
* Time it takes to iterate over all articles
* Number of reads by X clients in Y seconds (power-law distribution)
* Number of writes by X clients in Y seconds (power-law distribution)

Ideally we'd design the benchmarks so that they can be run not just
against different configurations of Oak, but also against Jackrabbit 2.x
and other databases (SQL and NoSQL) like Oracle, PostgreSQL, CouchDB
and MongoDB.

To start with, I'd target the following basic deployment configurations:

* 1 node, MB-range test sets (small embedded or development/testing deployment)
* 4 nodes, GB-range test sets (mid-size non-cloud deployment)
* 16 nodes, TB-range test sets (low-end cloud deployment)

WDYT?

BR,

Jukka Zitting
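[Editor's note: the power-law access pattern mentioned for the read/write benchmarks could be driven by a simple Zipf sampler like the sketch below. This is an illustrative example only, using plain JDK classes; the class name, exponent, and item count are assumptions, not part of any actual Oak benchmark code.]

```java
import java.util.Random;

// Sketch: sample article indices under a Zipf (power-law) distribution,
// so a few "hot" articles receive most of the simulated client traffic.
class ZipfSampler {
    private final double[] cdf;   // cumulative probabilities over n items
    private final Random random;

    ZipfSampler(int n, double exponent, long seed) {
        cdf = new double[n];
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 / Math.pow(i + 1, exponent);  // rank-based weight
            cdf[i] = sum;
        }
        for (int i = 0; i < n; i++) {
            cdf[i] /= sum;  // normalize so cdf[n-1] == 1.0
        }
        random = new Random(seed);  // fixed seed keeps runs reproducible
    }

    // Returns a 0-based item index; low indices are sampled most often.
    int next() {
        double u = random.nextDouble();
        int lo = 0, hi = cdf.length - 1;
        while (lo < hi) {  // binary search for the first cdf[i] >= u
            int mid = (lo + hi) >>> 1;
            if (cdf[mid] < u) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

Each simulated client would call next() to pick which article to read or write, giving the skewed access pattern the benchmark asks for without needing a real traffic trace.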
