Hannah_Bast added a comment.
In T206560#7562538 <https://phabricator.wikimedia.org/T206560#7562538>, @Fnielsen wrote:

> I am taking the liberty to polute the thread with a reference to "MillenniumDB: A Persistent, Open-Source, Graph Database" https://arxiv.org/pdf/2111.01540.pdf from November 2021. Millennium may have some serious limitations in terms of requirements that can be setup, but interestingly they write "However, MillenniumDB was designed with the complete version of Wikidata – including qualifiers, references, etc. – in mind." and their benchmarks seems strong. They compare against Blazegraph, Jena, Virtuoso and Neo4J.

Thanks for the pointer! Here are my first impressions from reading the paper:

1. The engine is based on ideas similar to those behind QLever. However, QLever has been around for 5 years already, which the authors fail to acknowledge. I am sure they didn't do it on purpose though. I wrote to them.

2. Like QLever, their engine is currently read-only and does not support SPARQL Update operations. Given the design of their engine, this is not something that will be easy to add.

3. Their engine is currently very far away from SPARQL 1.1 support. In the current version, even basic features like GROUP BY and mathematical expressions are missing. I am not sure whether they actually strive for SPARQL 1.1 support, since the motivation expressed in the paper goes more in the direction of a more general data model that is independent of a particular query language. In any case, adding full SPARQL 1.1 support would be a lot of work, as we know from experience.

4. I find the evaluation misleading. Right at the beginning of their evaluation section, in Section 5.1, they claim that their engine is 30 times faster than Virtuoso for very simple queries (consisting of a single triple). We know Virtuoso very well and have compared it with QLever extensively. Virtuoso is a very mature and efficient engine and hard to beat, even on more complex queries.
On simple queries, there are natural barriers to what can be achieved, and Virtuoso often (though not always) does the optimal thing. I think the authors either did not configure Virtuoso optimally or they stumbled on an artefact without being aware of it. Namely, Virtuoso is rather slow when it has to produce a very large output. That is not a weakness of Virtuoso's query processing engine, but of the way it translates its internal IDs to output IRIs and literals.

@KingsleyIdehen maybe you can provide some feedback concerning point 4, in particular, the last two sentences.

TASK DETAIL
https://phabricator.wikimedia.org/T206560
