On Wed, Jun 12, 2019 at 7:11 PM, Stas Malyshev <[email protected]>
wrote:

> Hi!
>
> >> So there needs to be some smarter solution, one that we're unlikely to
> > develop in-house
> >
> > Big cat, small fish. As Wikidata continues to grow, it will have
> > specific needs.
> > Needs that are unlikely to be solved by off-the-shelf solutions.
>
> Here I think it's a good place to remind ourselves that we're not Google,
> and developing a new database engine in-house is probably a bit beyond
> our resources and budget.


Today, the problem is not the same as the one MySQL, PostgreSQL, Blazegraph,
and OpenLink faced when they started working on their respective databases.
See below.


> Fitting an existing solution to our goals - sure, but developing something
> new of that scale is probably not going to happen.
>

It will.

> > FoundationDB and WiredTiger are respectively used at Apple (among other
> > companies) and by MongoDB since 3.2, all over the world. WiredTiger is
> > also used at Amazon.
>
> I believe they are, but I think for our particular goals we have to
> limit ourselves to a set of solutions that are a proven good match for
> our case.
>

See the other mail I just sent. We are at a turning point in database
engineering history. The very latest database systems that were built are
all based on an Ordered Key-Value Store; see the Google Spanner paper [0].

Thanks to WT/MongoDB and Apple, those are readily available, in widespread
use, and fully open source. Only a few pieces are missing to make it work
in a fully backward-compatible way with WDQS (at scale).

[0] https://ai.google/research/pubs/pub39966


> > That will be vendor lock-in for wikidata and wikimedia along all the
> > poor souls that try to interop with it.
>
> Since Virtuoso uses standard SPARQL, it won't be too much of a
> vendor lock-in, though of course the standard does not cover everything,
> so some corners are different in all SPARQL engines.


There is a big chance that the same thing that happened with the WWW will
happen with RDF: one big player owning all the implementations.


> This is why even migration between SPARQL engines, even excluding
> operational aspects, is non-trivial.


I agree.


> Of course, migration to any non-SPARQL engine would be an order of
> magnitude more disruptive, so right now we do not seriously consider
> doing that.
>

I also agree.

>
> As I already mentioned, there's a difference between "you can do it" and
> "you can do it efficiently". [...] The tricky part starts when you need
> to run millions of queries on a 10B-triple database. If your backend is
> not optimal for that task, it's not going to perform.
>

I already ran small benchmarks against Blazegraph. I will run more intensive
benchmarks using Wikidata (and reduce the SSD requirements).


Thanks for the reply.
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata