Re: Fwd: ANN: New Berlin SPARQL Benchmark results

Kingsley Idehen Mon, 29 Apr 2013 08:38:40 -0700

On 4/29/13 8:05 AM, Marco Neumann wrote:

some interesting numbers here. I am sure Kingsley is going to be
delighted with some of the findings :)

Yes, he's had to keep quiet about these results and findings for too loooong :-)

Key takeaway: you can have open-ended SPARQL endpoints that scale, massively.

Our mission with Virtuoso 7.0 was simple: put to rest the misconception that open-ended SPARQL endpoints can't scale. Again, you can build and deploy massively scalable RDF-based Linked Data solutions that go where Hadoop, NewSQL, NoSQL, and even conventional RDBMS engines can't venture.

What the industry is still struggling to grasp about RDF-based Linked Data is that performance, scale, integration, and access controls aren't hurdles unique to it. These problems still hound conventional RDBMS, NewSQL, and NoSQL products, which do not attend to the "integration" (open data connectivity) and "access controls" issues. For instance, they simply don't have URIs as native data types, a feature that (by implication) makes every data object (RDF resource) accessible to user agents on a public or private HTTP network.

Linked Data HTTP URIs are extremely powerful Super Keys for heterogeneous data virtualization, integration, and management. It's these core features that will constructively tweak the DBMS world as we all used to know it :-)



Kingsley

Marco

---------- Forwarded message ----------
From: Christian Bizer <[email protected]>
Date: Mon, Apr 29, 2013 at 7:54 AM
Subject: ANN: New Berlin SPARQL Benchmark results for datasets ranging
from 10 million to 150 billion RDF triples
To: [email protected], [email protected], [email protected]

Hi all,

Berlin SPARQL Benchmark (BSBM) is a benchmark for measuring the
performance of storage systems that expose SPARQL endpoints. The
benchmark is built around an e-commerce use case in which a set of
products is offered by different vendors. The benchmark defines two
query mixes:
1. The query mix of the Explore use case
<http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/ExploreUseCase/index.html> illustrates
the search and navigation pattern of a consumer looking for a product
via some web portal.
2. The query mix of the Business Intelligence use case
<http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/spec/BusinessIntelligenceUseCase/index.html> simulates
different stakeholders asking analytical questions against the
dataset. The query mix relies heavily on SPARQL 1.1 constructs like
GROUP BY and COUNT() and is designed to touch large portions of the
benchmark dataset.
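
As a rough illustration of the kind of GROUP BY / COUNT() aggregation
mentioned above, here is a minimal Python sketch. It is not an actual
BSBM query: the ex: prefix, the ex:hasFeature property, and the toy
triples are all assumptions made up for this example, and the helper
only emulates the aggregation in memory rather than talking to a
SPARQL endpoint.

```python
from collections import Counter

# Hypothetical SPARQL 1.1 query text. The ex: names are illustrative
# placeholders, not the BSBM vocabulary.
QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?feature (COUNT(?product) AS ?n)
WHERE { ?product ex:hasFeature ?feature . }
GROUP BY ?feature
ORDER BY DESC(?n)
"""

# Toy (subject, predicate, object) triples standing in for a dataset.
TRIPLES = [
    ("ex:p1", "ex:hasFeature", "ex:f1"),
    ("ex:p2", "ex:hasFeature", "ex:f1"),
    ("ex:p2", "ex:hasFeature", "ex:f2"),
    ("ex:p3", "ex:vendor", "ex:v1"),
]


def count_products_per_feature(triples):
    """Emulate the GROUP BY / COUNT in QUERY over an in-memory triple list."""
    # Each matching triple is one solution; group solutions by feature.
    counts = Counter(o for _s, p, o in triples if p == "ex:hasFeature")
    # Descending count, then feature name so ties have a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for feature, n in count_products_per_feature(TRIPLES):
        print(feature, n)
```

Even in this toy form, the aggregation has to visit every matching
triple before it can return a single row, which is why such queries
touch large portions of a dataset.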

I'm happy to announce the results of a new BSBM benchmark experiment.
The experiment compares the performance of

1. BigData
2. BigOwlim
3. Jena TDB
4. Virtuoso

on a single machine using datasets ranging from 10 million to 1
billion RDF triples (Explore and Business Intelligence query mixes).

In addition, it compares the performance of

1. BigOwlim
2. Virtuoso

on a cluster of 8 machines using datasets ranging from 10 billion to
150 billion RDF triples (Explore and Business Intelligence query
mixes).

The results of the experiment are found at

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/results/V7/

I think that the results are quite impressive and demonstrate that
SPARQL stores have matured considerably over the last few years.

A year ago, many RDF stores still had problems with the SPARQL 1.1
constructs GROUP BY and COUNT() and were thus not able to execute the
Business Intelligence query mix. Now, all systems pass this test and
some of the systems show impressive performance on grouping and
aggregating the data.

The 150 billion triples experiment has shown that given proper
hardware, it is possible to run analytical queries on amounts of data
that are beyond most (all?) of today's use cases: The whole LOD Cloud
[1] is estimated to consist only of 31 billion triples; the RDFa,
Microdata and Microformat dataset extracted by the WebDataCommons [2]
project from 3 billion HTML pages only consists of 7.3 billion
triples. So, 150 billion triples leave quite some room for the further
growth of structured data on the Web ;-)

More information about the Berlin SPARQL benchmark, the exact
specification of the benchmark query mixes, as well as results from
previous benchmarking experiments are found at

http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/

Many thanks to Peter Boncz and Minh-Duc Pham, who conducted the new
experiment as part of the EU project LOD2 and provided their results
for publication on the BSBM website.

Cheers,

Chris

[1] http://lod-cloud.net/state/
[2] http://www.webdatacommons.org/

--

---
Marco Neumann
KONA



--

Regards,

Kingsley Idehen 
Founder & CEO
OpenLink Software
Company Web: http://www.openlinksw.com
Personal Weblog: http://www.openlinksw.com/blog/~kidehen
Twitter/Identi.ca handle: @kidehen
Google+ Profile: https://plus.google.com/112399767740508618350/about
LinkedIn Profile: http://www.linkedin.com/in/kidehen
