On Wed, Jun 12, 2019 at 19:11, Stas Malyshev <[email protected]> wrote:
> Hi!
>
> >> So there needs to be some smarter solution, one that we'd be unlikely
> >> to develop in-house
> >
> > Big cat, small fish. As Wikidata continues to grow, it will have
> > specific needs. Needs that are unlikely to be solved by off-the-shelf
> > solutions.
>
> Here I think it's a good place to remember that we're not Google, and
> developing a new database engine in-house is probably a bit beyond our
> resources and budgets.

Today, the problem is not the same as the one MySQL, PostgreSQL, Blazegraph and OpenLink faced when they started working on their respective databases. See below.

> Fitting an existing solution to our goals - sure, but developing
> something new of that scale is probably not going to happen.

It will.

> > FoundationDB and WiredTiger are respectively used at Apple (among
> > other companies) and in MongoDB since 3.2, all over the world.
> > WiredTiger is also used at Amazon.
>
> I believe they are, but I think for our particular goals we have to
> limit ourselves to a set of solutions that are a proven good match for
> our case.

See the other mail I just sent. We are at a turning point in database engineering history. The latest database systems are all built on top of an ordered key-value store; see the Google Spanner paper [0]. Thanks to WiredTiger/MongoDB and Apple, those stores are readily available, in widespread use, and fully open source. Only a few pieces are missing to make them work in a fully backward-compatible way with WDQS (at scale).

[0] https://ai.google/research/pubs/pub39966

> > That will be vendor lock-in for Wikidata and Wikimedia, along with
> > all the poor souls that try to interoperate with it.
>
> Since Virtuoso is using standard SPARQL, it won't be too much of a
> vendor lock-in, though of course the standard does not cover
> everything, so some corners are different in all SPARQL engines.

There is a big chance that the same thing that happened with the web will happen with RDF: one big player owning all the implementations.
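To make the ordered key-value store point more concrete, here is a minimal, purely illustrative Python sketch (not WDQS, WiredTiger or FoundationDB code; all names are made up) of the core idea: store each RDF triple under several key orderings (SPO, POS, OSP) in an ordered store, so that any triple pattern with a bound prefix becomes a single prefix scan:

```python
import bisect

# Toy ordered key-value store: a sorted list of tuple keys. Real
# engines (WiredTiger, FoundationDB) expose the same core operations:
# ordered insert and range/prefix scans.
class OrderedKV:
    def __init__(self):
        self.keys = []

    def insert(self, key):
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)

    def scan_prefix(self, prefix):
        # Yield every key that starts with `prefix`, in order.
        i = bisect.bisect_left(self.keys, prefix)
        while i < len(self.keys) and self.keys[i][:len(prefix)] == prefix:
            yield self.keys[i]
            i += 1

# Each triple is written under three orderings, so any pattern with a
# bound subject, predicate, or object prefix hits one index directly.
class TripleStore:
    def __init__(self):
        self.spo, self.pos, self.osp = OrderedKV(), OrderedKV(), OrderedKV()

    def add(self, s, p, o):
        self.spo.insert((s, p, o))
        self.pos.insert((p, o, s))
        self.osp.insert((o, s, p))

    def objects(self, s, p):
        # Pattern (s, p, ?o) -> prefix scan on the SPO index.
        return [o for (_, _, o) in self.spo.scan_prefix((s, p))]

store = TripleStore()
store.add("Q42", "P31", "Q5")       # Douglas Adams is a human
store.add("Q42", "P106", "Q36180")  # occupation: writer
print(store.objects("Q42", "P31"))  # -> ['Q5']
```

A production engine would add transactions, compression, and a query planner on top, but the indexing scheme itself is this simple, which is what makes OKVS-based stores an attractive foundation.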
> This is why even migration between SPARQL engines, even excluding
> operational aspects, is non-trivial.

I agree.

> Of course, migration to any non-SPARQL engine would be an order of
> magnitude more disruptive, so right now we do not seriously consider
> doing that.

I also agree.

> As I already mentioned, there's a difference between "you can do it"
> and "you can do it efficiently". [...] The tricky part starts when you
> need to run millions of queries on a 10B-triple database. If your
> backend is not optimal for that task, it's not going to perform.

I already did small benchmarks against Blazegraph. I will do more intensive benchmarks using Wikidata (and reduce the requirements in terms of SSD).

Thanks for the reply.
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata
