Re: Role of RDF on the Web and within enterprise applications. was: AW: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-30 Thread Kingsley Idehen


Chris Bizer wrote:

Hi Orri,

  
It is my feeling that RDF has a dual role:

1. Interchange format: this is like what XML does, except that RDF has
more semantics and expressivity.

2. Database storage format for cases where data must be integrated and is
too heterogeneous to easily fall into one relational schema. This is for
example the case in the open web conversation and social space.

The first case is for mapping, the second for warehousing. Aside from
this, there is potential for more expressive queries through the query
language dealing with inferencing, like subclass/subproperty/transitive
etc. These do not go very well with SQL views.

I cannot agree more with what you say :-)

We are seeing the first RDF use case emerge within initiatives like the
Linking Open Data effort, where, besides being more expressive, RDF is also
playing to its strength by providing data links between records in different
databases.
  


Chris,

Talking with people from industry, I get the feeling that more and more
people also understand the second use case and that RDF is increasingly used
as a technology for something like "poor man's data integration". You don't
have to spend a lot of time and money on designing a comprehensive data
warehouse. You just throw data having different schemata from different
sources together and instantly get the benefit that you can browse and query
the data and that you have proper provenance tracking (using Named Graphs).
Depending on how much data integration you need, you then start to apply
some identity resolution and schema mapping techniques. We have been talking
to some pharma and media companies that have done data warehousing for years
and they all seem to be very interested in this quick and dirty approach.
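The "throw it together, keep the graph name" idea above can be sketched with a toy quad store; the table layout, URIs, and vocabulary below are invented for illustration and stand in for no particular product's storage scheme:

```python
import sqlite3

# Toy quad store: each statement carries the named graph (here, its
# source URL), so provenance tracking comes for free.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE quads (s TEXT, p TEXT, o TEXT, g TEXT)")

# Two sources describing the same drug under different schemata.
con.executemany("INSERT INTO quads VALUES (?, ?, ?, ?)", [
    ("ex:aspirin", "pharma:tradeName", "Aspirin", "http://source-a.example/"),
    ("ex:aspirin", "chem:formula",     "C9H8O4",  "http://source-b.example/"),
])

# Everything known about ex:aspirin, with per-statement provenance.
rows = con.execute(
    "SELECT p, o, g FROM quads WHERE s = ?", ("ex:aspirin",)
).fetchall()
for p, o, g in rows:
    print(p, "=", o, "(asserted by", g + ")")
```

No schema design happens up front: a third source with yet another vocabulary is just more rows, and identity resolution and schema mapping can be layered on later, as the message describes.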
  
Quick & Dirty is simply not how I would characterize this matter. I 
prefer to describe this as step 1 in a multi-phased approach to RDF-based 
data integration.

For both use cases, inferencing is a nice add-on but not essential. Within
the first use case, inferencing usually does not work, as data published by
various autonomous sources tends to be too dirty for reasoning engines.
  
Inferencing is not a nice add-on; it is essential (in varying degrees) 
once you get beyond the initial stages of heterogeneous data 
integration. As with all things, these matters are connected and 
inherently symbiotic: you can't do inferencing without having something you 
want to reason about available in palatable form, which goes back to the 
phased approach I refer to above.


In my eyes, and experience, RDF is a powerful vehicle for implementing 
conceptual-level data access that sits atop heterogeneous data sources. 
Its novelty comes from the platform independence that it injects into 
the data integration technology realm.



Kingsley

Cheers,

Chris


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On
Behalf Of Orri Erling
Sent: Tuesday, 30 September 2008 00:16
To: 'Seaborne, Andy'; 'Story Henry'
Cc: [EMAIL PROTECTED]
Subject: RE: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena
TDB, D2R Server, and MySQL


From Henry Story:

  

As a matter of interest, would it be possible to develop RDF stores
that optimize the layout of the data by analyzing the queries to the
database? A bit like a Java Just In Time compiler analyses the usage
of the classes in order to decide how to optimize the compilation.



From Andy Seaborne:

On a similar note, by mining the query logs it would be possible to create
parameterised queries and associated plan fragments without the client
needing to notify the server of the templates.  Coupled with automatically
calculating possible materialized views or other layout optimizations, the
poor, overworked client application writer doesn't get brought into
optimizing the server.

Andy

  
 
Orri here:


With the BSBM workload, using parameterised queries at a small scale saves
roughly 1/3 of the execution time.  It is possible to remember query plans
and to notice if the same query text is submitted with only changes in
literal values.  If the first query ran quickly, one may presume the query
with substitutions will also run quickly.  There are of course exceptions,
but detecting these would mean running most of the optimizer cost model and
would eliminate any benefit from caching.
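The plan-reuse idea Orri describes can be sketched as follows; the normalisation regexes and the string "plan" placeholder are invented for illustration — a real store would cache actual execution plans produced by its cost model:

```python
import re

PLAN_CACHE = {}

def template_of(query: str) -> str:
    """Reduce a query to its shape by masking literal values."""
    t = re.sub(r'"[^"]*"', '?', query)   # mask string literals
    return re.sub(r'\b\d+\b', '?', t)    # mask numeric literals

def get_plan(query: str):
    # Reuse the cached plan when only literal values changed.
    key = template_of(query)
    if key not in PLAN_CACHE:
        PLAN_CACHE[key] = "plan:" + key  # stand-in for real optimization
    return PLAN_CACHE[key]

p1 = get_plan('SELECT ?x WHERE { ?x ex:price 42 }')
p2 = get_plan('SELECT ?x WHERE { ?x ex:price 99 }')
assert p1 is p2  # the second query skipped the "optimizer"
```

As the message notes, the risk lies exactly in the exceptions: a substituted literal with very different selectivity can make the cached plan a poor fit, and detecting that costs as much as re-optimizing.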


The other optimizations suggested have a larger upside but are far harder.  
I would say that if we have a predictable workload, then mapping 
relational to RDF is a lot easier than expecting the DBMS to figure out 
materialized views to do the same.  If we do not have a predictable 
workload, then making too many materialized views based on transient usage 
patterns is a large downside because it grows the database, meaning less 
working set.  The difference between an in-memory random access and a 
random access that goes to disk is about 5000-fold.  Plus there is a high 
cost to making

Re: Role of RDF on the Web and within enterprise applications. was: AW: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-30 Thread François-Paul Servant

Kingsley Idehen wrote:

Chris Bizer wrote:

...
Depending on how much data integration you need, you then start to apply
some identity resolution and schema mapping techniques. We have been 
talking

to some pharma and media companies that have done data warehousing for years
and they all seem to be very interested in this quick and dirty approach.
  
Quick & Dirty is simply not how I would characterize this matter. I 
prefer to describe this as step 1 in a multi-phased approach to RDF-based 
data integration.


Kingsley, I agree. I find it clean to make the basic but necessary first steps 
towards data integration (identification of things, mapping, etc.). Building 
upon legacy systems that you just have to adapt doesn't make the approach 
dirty: it makes it possible ;-)


cheers,

fps




Re: AW: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-25 Thread Kingsley Idehen


Chris Bizer wrote:

Hi Kingsley and Paul,

Yes, I completely agree with you that different storage solutions fit
different use cases and that one of the main strengths of the RDF data model
is its flexibility and the possibility to mix different schemata.

Nevertheless, I think it is useful to give application developers an
indicator of what performance they can expect when they choose a specific
architecture, which is what the benchmark is trying to do.
  

Chris,

Yes, but the user profile has to be a little clearer. If you separate 
the results in the narrative you achieve the goal. You can use SQL 
numbers as a sort of benchmark if you clearly explain the natural skew 
that SQL enjoys due to the nature of the schema. 

We plan to run the benchmark again in January and it would be great to also
test Tucana/Kowari/Mulgara in this run.

As the performance of RDF stores is constantly improving, let's also hope
that the picture will not look that bad for them anymore then.
  
But at the current time, there is no clear sense of what "better" means 
:-) What's the goal?


What I fundamentally take from the benchmarks are the following:

1. Native RDF and RDF Views/Mapper scalability is becoming less of an 
issue (of course depending on your choice of product) and we are already 
at the point where this technology can be used for real-world solutions 
that have enterprise level scalability demands and expectations


2. It's impractical to create RDF warehouses from existing SQL Data 
Sources when you can put RDF Views / Wrappers in front of the SQL Data 
Sources (SQL cost optimization technology has evolved significantly over 
the years across RDBMS engines).
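The views/wrappers approach in point 2 can be sketched as generating triples on demand from the relational source instead of copying the data out; the mapping convention here (row URI from the key column, one predicate per column) is invented for illustration and is not D2RQ's or Virtuoso's actual mapping language:

```python
import sqlite3

# The relational data stays where it is...
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE products (id INTEGER, label TEXT, price REAL)")
con.execute("INSERT INTO products VALUES (1, 'Widget', 9.99)")

def triples_for(table, id_col, uri_prefix):
    """Project each row as triples on the fly -- no RDF warehouse."""
    cur = con.execute(f"SELECT * FROM {table}")
    cols = [d[0] for d in cur.description]
    for row in cur:
        subj = uri_prefix + str(row[cols.index(id_col)])
        for col, val in zip(cols, row):
            if col != id_col:
                yield (subj, "ex:" + col, val)

view = list(triples_for("products", "id", "ex:product/"))
print(view)
```

The SQL engine's cost optimizer still does the heavy lifting underneath; the wrapper only rewrites between the two data models, which is the scenario the mapper results in the benchmark measure.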



And yes, I would also like to see Mulgara and other RDF Stores in the 
next round of benchmarks :-)


Kingsley

Cheers,

Chris


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Kingsley Idehen
Sent: Wednesday, 24 September 2008 20:57
To: Paul Gearon
Cc: [EMAIL PROTECTED]; public-lod@w3.org
Subject: Re: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena
TDB, D2R Server, and MySQL


Paul Gearon wrote:
  

On Mon, Sep 22, 2008 at 3:47 AM, Eyal Oren [EMAIL PROTECTED] wrote:
  


On 09/19/08 23:12 +0200, Orri Erling wrote:

Has there been any analysis on whether there is a *fundamental*
reason for such performance difference? Or is it simply a question of
maturity; in other words, relational db technology has been around for a
very long time and is very mature, whereas RDF implementations are still
quite recent, so this gap will surely narrow ...?

This is a very complex subject.  I will offer some analysis below, but
this I fear will only raise further questions.  This is not the end of the
road, far from it.

As far as I understand, another issue is relevant: this benchmark is
somewhat unfair as the relational stores have one advantage compared to the
native triple stores: the relational data structure is fixed (Products,
Producers, Reviews, etc. with given columns), while the triple representation
is generic (arbitrary s,p,o).

This point has an effect on several levels.

For instance, the flexibility afforded by triples means that objects
stored in this structure require processing just to piece it all
together, whereas the RDBMS has already encoded the structure into the
table. Ironically, this is exactly the reason we
(Tucana/Kowari/Mulgara) ended up building an RDF database instead of
building on top of an RDBMS: the flexibility in table structure was
less efficient than a system that just knew it only had to deal with
3 columns. Obviously the shape of the data (among other things)
dictates which type of storage is the better one to use.

A related point is that processing RDF to create an object means you
have to move around a lot in the graph. This could mean a lot of
seeking on disk, while an RDBMS will usually find the entire object in
one place on the disk. And seeks kill performance.

This leads to the operations used to build objects from an RDF store.
A single object often requires the traversal of several statements,
where the object of one statement becomes the subject of the next.
Since the tables are typically represented as
Subject/Predicate/Object, this means that the main table will be
joined against itself. Even RDBMSs are notorious for not doing this
efficiently.

One of the problems with self-joins is that efficient operations like
merge-joins (when they can be identified) will still result in lots of
seeking, since simple iteration on both sides of the join means
seeking around in the same data. Of course, there ARE ways to optimize
some of this, but the various stores are only just starting to get to
these optimizations now.

Relational databases suffer similar problems, but joins are usually
only required for complex structures between different tables, which
can be stored on different spindles. Contrast this to RDF, which needs
to do many of these joins for all but the simplest of data.

Re: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-24 Thread Paul Gearon

On Mon, Sep 22, 2008 at 3:47 AM, Eyal Oren [EMAIL PROTECTED] wrote:

 On 09/19/08 23:12 +0200, Orri Erling wrote:

 Has there been any analysis on whether there is a *fundamental*
 reason for such performance difference? Or is it simply a question of
 maturity; in other words, relational db technology has been around for a
 very long time and is very mature, whereas RDF implementations are still
 quite recent, so this gap will surely narrow ...?

 This is a very complex subject.  I will offer some analysis below, but
 this I fear will only raise further questions.  This is not the end of the
 road, far from it.

 As far as I understand, another issue is relevant: this benchmark is
 somewhat unfair as the relational stores have one advantage compared to the
 native triple stores: the relational data structure is fixed (Products,
 Producers, Reviews, etc with given columns), while the triple representation
 is generic (arbitrary s,p,o).

This point has an effect on several levels.

For instance, the flexibility afforded by triples means that objects
stored in this structure require processing just to piece it all
together, whereas the RDBMS has already encoded the structure into the
table. Ironically, this is exactly the reason we
(Tucana/Kowari/Mulgara) ended up building an RDF database instead of
building on top of an RDBMS: the flexibility in table structure was
less efficient than a system that just knew it only had to deal with
3 columns. Obviously the shape of the data (among other things)
dictates which type of storage is the better one to use.

A related point is that processing RDF to create an object means you
have to move around a lot in the graph. This could mean a lot of
seeking on disk, while an RDBMS will usually find the entire object in
one place on the disk. And seeks kill performance.

This leads to the operations used to build objects from an RDF store.
A single object often requires the traversal of several statements,
where the object of one statement becomes the subject of the next.
Since the tables are typically represented as
Subject/Predicate/Object, this means that the main table will be
joined against itself. Even RDBMSs are notorious for not doing this
efficiently.

One of the problems with self-joins is that efficient operations like
merge-joins (when they can be identified) will still result in lots of
seeking, since simple iteration on both sides of the join means
seeking around in the same data. Of course, there ARE ways to optimize
some of this, but the various stores are only just starting to get to
these optimizations now.

Relational databases suffer similar problems, but joins are usually
only required for complex structures between different tables, which
can be stored on different spindles. Contrast this to RDF, which needs
to do many of these joins for all but the simplest of data.
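The self-join cost described above is easy to see with a toy triple table; the schema and data below are invented purely for illustration:

```python
import sqlite3

# A generic s/p/o table: every hop along the graph joins this one
# table against itself, which is what makes object assembly expensive.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
con.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:order1", "ex:hasItem", "ex:item1"),
    ("ex:item1",  "ex:product", "ex:prod7"),
    ("ex:prod7",  "ex:label",   "Widget"),
])

# A three-statement traversal (order -> item -> product -> label)
# becomes two self-joins; an RDBMS would read one row of one table.
(label,) = con.execute("""
    SELECT t3.o
    FROM triples t1
    JOIN triples t2 ON t2.s = t1.o
    JOIN triples t3 ON t3.s = t2.o
    WHERE t1.s = 'ex:order1' AND t1.p = 'ex:hasItem'
      AND t2.p = 'ex:product' AND t3.p = 'ex:label'
""").fetchone()
print(label)  # Widget
```

Each JOIN follows the object-of-one-statement-becomes-subject-of-the-next pattern from the paragraph above, and each one is a fresh probe into the same index or table.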

 One can question whether such flexibility is relevant in practice, and if
 so, one may try to extract such structured patterns from data on-the-fly.
 Still, it's important to note that we're comparing somewhat different things
 here between the relational and the triple representation of the benchmark.

This is why I think it is very important to consider the type of data
being stored before choosing the type of storage to use. For some
applications an RDBMS is going to win hands down every time. For other
applications, an RDF store is definitely the way to go. Understanding
the flexibility and performance constraints of each is important. This
kind of benchmarking helps with that. It also helps identify where RDF
databases need to pick up their act.

Regards,
Paul Gearon



Re: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-24 Thread Kingsley Idehen


Paul Gearon wrote:

On Mon, Sep 22, 2008 at 3:47 AM, Eyal Oren [EMAIL PROTECTED] wrote:
  

On 09/19/08 23:12 +0200, Orri Erling wrote:


Has there been any analysis on whether there is a *fundamental*
reason for such performance difference? Or is it simply a question of
maturity; in other words, relational db technology has been around for a
very long time and is very mature, whereas RDF implementations are still
quite recent, so this gap will surely narrow ...?


This is a very complex subject.  I will offer some analysis below, but
this I fear will only raise further questions.  This is not the end of the
road, far from it.
  

As far as I understand, another issue is relevant: this benchmark is
somewhat unfair as the relational stores have one advantage compared to the
native triple stores: the relational data structure is fixed (Products,
Producers, Reviews, etc with given columns), while the triple representation
is generic (arbitrary s,p,o).



This point has an effect on several levels.

For instance, the flexibility afforded by triples means that objects
stored in this structure require processing just to piece it all
together, whereas the RDBMS has already encoded the structure into the
table. Ironically, this is exactly the reason we
(Tucana/Kowari/Mulgara) ended up building an RDF database instead of
building on top of an RDBMS: the flexibility in table structure was
less efficient than a system that just knew it only had to deal with
3 columns. Obviously the shape of the data (among other things)
dictates which type of storage is the better one to use.

A related point is that processing RDF to create an object means you
have to move around a lot in the graph. This could mean a lot of
seeking on disk, while an RDBMS will usually find the entire object in
one place on the disk. And seeks kill performance.

This leads to the operations used to build objects from an RDF store.
A single object often requires the traversal of several statements,
where the object of one statement becomes the subject of the next.
Since the tables are typically represented as
Subject/Predicate/Object, this means that the main table will be
joined against itself. Even RDBMSs are notorious for not doing this
efficiently.

One of the problems with self-joins is that efficient operations like
merge-joins (when they can be identified) will still result in lots of
seeking, since simple iteration on both sides of the join means
seeking around in the same data. Of course, there ARE ways to optimize
some of this, but the various stores are only just starting to get to
these optimizations now.

Relational databases suffer similar problems, but joins are usually
only required for complex structures between different tables, which
can be stored on different spindles. Contrast this to RDF, which needs
to do many of these joins for all but the simplest of data.

  

One can question whether such flexibility is relevant in practice, and if
so, one may try to extract such structured patterns from data on-the-fly.
Still, it's important to note that we're comparing somewhat different things
here between the relational and the triple representation of the benchmark.



This is why I think it is very important to consider the type of data
being stored before choosing the type of storage to use. For some
applications an RDBMS is going to win hands down every time. For other
applications, an RDF store is definitely the way to go. Understanding
the flexibility and performance constraints of each is important. This
kind of benchmarking helps with that. It also helps identify where RDF
databases need to pick up their act.

Regards,
Paul Gearon


  

Paul,

You make valid points; the problem here is that the benchmark has been 
released without enough clarity about its prime purpose. To even 
compare RDF Quad Stores with an RDBMS engine when the schema is 
Relational in itself is kinda twisted.


The role of mappers (D2RQ & Virtuoso RDF Views), for instance, should 
have been made much clearer, maybe in separate results tables. I say 
this because these mappers offer different approaches to projecting 
RDBMS-based data in RDF Linked Data form, on the fly, and their purpose 
in this benchmark is all about raw performance and scalability as it 
relates to the following RDF Linked Data generation and deployment conditions:


1. Schema is Relational
2. RDF warehouse is impractical

As I am sure you know, we could invert this whole benchmark Open World 
style, and then bring RDBMS engines to their knees by incorporating 
SPARQL query patterns comprised of ?p's and subclasses.
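One way to see why such patterns hurt a fixed relational schema: under subclass reasoning, a single "?x a ex:Product" pattern fans out into a test against the whole transitive closure of the class hierarchy. The hierarchy below is invented for illustration:

```python
# Hypothetical hierarchy: each class maps to its direct superclass.
SUPER = {"ex:DSLR": "ex:Camera", "ex:Camera": "ex:Product", "ex:Product": None}

def ancestors(c):
    """Walk up the subclass chain from c."""
    while SUPER.get(c):
        c = SUPER[c]
        yield c

def matching_classes(cls):
    # Classes whose instances satisfy "?x a <cls>" under RDFS semantics.
    return {c for c in SUPER if c == cls or cls in ancestors(c)}

# Asking for ex:Product really asks about three classes:
print(sorted(matching_classes("ex:Product")))
# ['ex:Camera', 'ex:DSLR', 'ex:Product']
```

With variable predicates (?p) the blowup is worse still, since no single fixed column or index answers the pattern; a relational schema has no native way to express either query shape.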


To conclude, the quad store numbers should simply be a comparison of 
the quad stores themselves, and not the quad stores vs. the mappers or 
native SQL. This clarification really needs to make its way into the 
benchmark narrative.



--


Regards,

Kingsley Idehen   Weblog: http://www.openlinksw.com/blog/~kidehen
President & CEO 

AW: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-24 Thread Chris Bizer

Hi Kingsley and Paul,

Yes, I completely agree with you that different storage solutions fit
different use cases and that one of the main strengths of the RDF data model
is its flexibility and the possibility to mix different schemata.

Nevertheless, I think it is useful to give application developers an
indicator of what performance they can expect when they choose a specific
architecture, which is what the benchmark is trying to do.

We plan to run the benchmark again in January and it would be great to also
test Tucana/Kowari/Mulgara in this run.

As the performance of RDF stores is constantly improving, let's also hope
that the picture will not look that bad for them anymore then.

Cheers,

Chris


-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
Of Kingsley Idehen
Sent: Wednesday, 24 September 2008 20:57
To: Paul Gearon
Cc: [EMAIL PROTECTED]; public-lod@w3.org
Subject: Re: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena
TDB, D2R Server, and MySQL


Paul Gearon wrote:
 On Mon, Sep 22, 2008 at 3:47 AM, Eyal Oren [EMAIL PROTECTED] wrote:
   
 On 09/19/08 23:12 +0200, Orri Erling wrote:
 
 Has there been any analysis on whether there is a *fundamental*
 reason for such performance difference? Or is it simply a question of
 maturity; in other words, relational db technology has been around for a
 very long time and is very mature, whereas RDF implementations are still
 quite recent, so this gap will surely narrow ...?
 
 This is a very complex subject.  I will offer some analysis below, but
 this I fear will only raise further questions.  This is not the end of the
 road, far from it.
 
 As far as I understand, another issue is relevant: this benchmark is
 somewhat unfair as the relational stores have one advantage compared to the
 native triple stores: the relational data structure is fixed (Products,
 Producers, Reviews, etc. with given columns), while the triple representation
 is generic (arbitrary s,p,o).
 

 This point has an effect on several levels.

 For instance, the flexibility afforded by triples means that objects
 stored in this structure require processing just to piece it all
 together, whereas the RDBMS has already encoded the structure into the
 table. Ironically, this is exactly the reason we
 (Tucana/Kowari/Mulgara) ended up building an RDF database instead of
 building on top of an RDBMS: the flexibility in table structure was
 less efficient than a system that just knew it only had to deal with
 3 columns. Obviously the shape of the data (among other things)
 dictates which type of storage is the better one to use.

 A related point is that processing RDF to create an object means you
 have to move around a lot in the graph. This could mean a lot of
 seeking on disk, while an RDBMS will usually find the entire object in
 one place on the disk. And seeks kill performance.

 This leads to the operations used to build objects from an RDF store.
 A single object often requires the traversal of several statements,
 where the object of one statement becomes the subject of the next.
 Since the tables are typically represented as
 Subject/Predicate/Object, this means that the main table will be
 joined against itself. Even RDBMSs are notorious for not doing this
 efficiently.

 One of the problems with self-joins is that efficient operations like
 merge-joins (when they can be identified) will still result in lots of
 seeking, since simple iteration on both sides of the join means
 seeking around in the same data. Of course, there ARE ways to optimize
 some of this, but the various stores are only just starting to get to
 these optimizations now.

 Relational databases suffer similar problems, but joins are usually
 only required for complex structures between different tables, which
 can be stored on different spindles. Contrast this to RDF, which needs
 to do many of these joins for all but the simplest of data.

   
 One can question whether such flexibility is relevant in practice, and if
 so, one may try to extract such structured patterns from data on-the-fly.
 Still, it's important to note that we're comparing somewhat different things
 here between the relational and the triple representation of the benchmark.
 

 This is why I think it is very important to consider the type of data
 being stored before choosing the type of storage to use. For some
 applications an RDBMS is going to win hands down every time. For other
 applications, an RDF store is definitely the way to go. Understanding
 the flexibility and performance constraints of each is important. This
 kind of benchmarking helps with that. It also helps identify where RDF
 databases need to pick up their act.

 Regards,
 Paul Gearon


   
Paul,

You make valid points; the problem here is that the benchmark has been 
released without enough clarity about its prime purpose. To even 
compare RDF Quad Stores with an RDBMS engine when the schema is 
Relational in itself is kinda twisted.

Re: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-24 Thread Story Henry


As a matter of interest, would it be possible to develop RDF stores  
that optimize the layout of the data by analyzing the queries to the  
database? A bit like a Java Just In Time compiler analyses the usage  
of the classes in order to decide how to optimize the compilation.


Henry

On 24 Sep 2008, at 20:30, Paul Gearon wrote:


A related point is that processing RDF to create an object means you
have to move around a lot in the graph. This could mean a lot of
seeking on disk, while an RDBMS will usually find the entire object in
one place on the disk. And seeks kill performance.

This leads to the operations used to build objects from an RDF store.
A single object often requires the traversal of several statements,
where the object of one statement becomes the subject of the next.
Since the tables are typically represented as
Subject/Predicate/Object, this means that the main table will be
joined against itself. Even RDBMSs are notorious for not doing this
efficiently.





Re: Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-23 Thread Ivan Mikhailov

Hello Eyal,

 this benchmark is somewhat unfair as the relational stores have one advantage 
 compared to the 
 native triple stores: the relational data structure is fixed (Products, 
 Producers, Reviews, etc with given columns), while the triple 
 representation is generic (arbitrary s,p,o).
 
 One can question whether such flexibility is relevant in practice, and if 
 so, one may try to extract such structured patterns from data on-the-fly.

That will be our next big extension -- updateable RDF Views, as proposed
in http://esw.w3.org/topic/UpdatingRelationalDataViaSPARUL . So we will
be able to load BSBM data as RDF and query it via a SPARQL web service
endpoint; thus we will masquerade the relational storage entirely.

Best Regards,

Ivan Mikhailov,
OpenLink Software
http://virtuoso.openlinksw.com





Berlin SPARQL Benchmark V2 - Results for Sesame, Virtuoso, Jena TDB, D2R Server, and MySQL

2008-09-17 Thread Chris Bizer


Hi all,

Over the last weeks, we have extended the Berlin SPARQL Benchmark 
(BSBM) to a multi-client scenario, fine-tuned the benchmark dataset 
and the query mix, and implemented a SQL version of the benchmark in 
order to be able to compare SPARQL stores with classical SQL stores.


Today, we have released the results of running the BSBM Benchmark 
Version 2 against:


+ three RDF stores (Virtuoso Version 5.0.8, Sesame Version 2.2, Jena 
TDB Version 0.53) and
+ two relational database-to-RDF wrappers (D2R Server Version 0.4 and 
Virtuoso - RDF Views Version 5.0.8)

for datasets ranging from 250,000 triples to 100,000,000 triples.

In order to set the SPARQL query performance into context, we also 
report the results of running the SQL version of the benchmark against 
two relational database management systems (MySQL 5.1.26 and 
Virtuoso - RDBMS Version 5.0.8).


A comparison of the performance for a single client working against 
the stores is found here:


http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html#comparison

A comparison of the performance for 1 to 16 clients simultaneously 
executing query mixes against the stores is found here:


http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html#multiResults

The complete benchmark results, including the setup of the experiment 
and the configuration of the different stores, are found here:


http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/results/index.html

The current specification of the Berlin SPARQL Benchmark is found 
here:


http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/20080912/

It is interesting to see:

1. that relational database-to-RDF wrappers generally outperform RDF 
stores for larger dataset sizes.
2. that no store outperforms the others for all queries and dataset 
sizes.
3. that the query throughput still varies widely within the 
multi-client scenario.
4. that the fastest RDF store is still 7 times slower than a 
relational database.


Thanks a lot to

+ Eli Lilly and Company and especially Susie Stephens for making this 
work possible through a research grant.
+ Orri Erling, Andy Seaborne, Arjohn Kampman, Michael Schmidt, Richard 
Cyganiak, Ivan Mikhailov, Patrick van Kleef, and Christian Becker for 
their feedback on the benchmark design and their help with configuring 
the stores and running the benchmark experiment.


Without all your help it would not have been possible to conduct this 
experiment.


We highly welcome feedback on the benchmark design and the results of 
the experiment.


Cheers,

Chris Bizer and Andreas Schultz

--
Prof. Dr. Chris Bizer
Freie Universität Berlin
Phone: +49 30 838 55509
Mail: [EMAIL PROTECTED]
Web: www.bizer.de