[Wikidata-bugs] [Maniphest] T206560: [Epic] Evaluate alternatives to Blazegraph

Hannah_Bast Sat, 11 Dec 2021 02:12:47 -0800

Hannah_Bast added a comment.


  In T206560#7562538 <https://phabricator.wikimedia.org/T206560#7562538>, 
@Fnielsen wrote:
  
  > I am taking the liberty to pollute the thread with a reference to 
"MillenniumDB: A Persistent, Open-Source, Graph Database" 
https://arxiv.org/pdf/2111.01540.pdf from November 2021. Millennium may have 
some serious limitations in terms of requirements that can be set up, but 
interestingly they write "However, MillenniumDB was designed with the complete 
version of Wikidata – including qualifiers, references, etc. – in mind." and 
their benchmarks seem strong. They compare against Blazegraph, Jena, Virtuoso 
and Neo4J.
  
  Thanks for the pointer! Here are my first impressions from reading the paper:
  
  1. The engine is based on ideas similar to those behind QLever. However, 
QLever has been around for five years already, which the authors fail to 
acknowledge. I am sure they didn't do it on purpose though. I wrote to them.
  
  2. Like QLever, their engine currently is read-only and does not support 
SPARQL Update operations. Given the design of their engine, this is not 
something that will be easy to add.
  
  3. Their engine is currently very far away from SPARQL 1.1 support. In the 
current version, even basic features like GROUP BY and mathematical expressions 
are missing. I am not sure whether they actually strive for SPARQL 1.1 support, 
since the motivation expressed in the paper goes more in the direction of a 
more general data model that is independent of a particular query language. 
Anyway, adding full SPARQL 1.1 support would be a lot of work, as we know from 
experience.
  
  4. I find the evaluation misleading. Right at the beginning of their 
evaluation section, in Section 5.1, they claim that their engine is 30 times 
faster than Virtuoso for very simple queries (consisting of a single triple). 
We know Virtuoso very well and have compared it with QLever extensively. 
Virtuoso is a very mature and efficient engine and hard to beat, even on more 
complex queries. On simple queries, there are natural barriers to what can be 
achieved, and Virtuoso often (though not always) does the optimal thing. I 
think the authors either did not configure Virtuoso optimally or they stumbled 
on an artefact without being aware of it. Namely, Virtuoso is rather slow when 
it has to produce a very large output. That is not a weakness of Virtuoso's 
query processing engine, but of the way Virtuoso translates its internal IDs 
to output IRIs and literals.
  
  @KingsleyIdehen maybe you can provide some feedback concerning point 4, in 
particular, the last two sentences.
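  
  To make the distinction in point 4 concrete, here is a minimal toy sketch 
that times the two phases separately: matching a single triple pattern on 
internal IDs, and translating the result IDs to output strings. Everything in 
it (the in-memory lists, the example.org IRIs, the function names) is 
hypothetical and for illustration only; it is not how Virtuoso, QLever, or 
MillenniumDB are implemented, and the timings say nothing about any of them.

```python
import time

# Hypothetical toy dictionary: internal integer ID -> IRI string.
id_to_iri = [f"http://example.org/entity/Q{i}" for i in range(200_000)]

# Toy triples stored as (subject, predicate, object) tuples of internal IDs.
PREDICATE = 0
triples = [(s, PREDICATE, (s * 7) % len(id_to_iri)) for s in range(len(id_to_iri))]


def match_ids(predicate_id):
    """Evaluate the single triple pattern ?s <p> ?o purely on internal IDs."""
    return [(s, o) for (s, p, o) in triples if p == predicate_id]


def materialize(rows):
    """Translate internal IDs to output strings (the export step)."""
    return [(id_to_iri[s], id_to_iri[o]) for (s, o) in rows]


t0 = time.perf_counter()
rows = match_ids(PREDICATE)    # query processing on IDs
t1 = time.perf_counter()
output = materialize(rows)     # ID-to-string translation of the full result
t2 = time.perf_counter()

print(f"matching on IDs:          {t1 - t0:.4f} s")
print(f"ID-to-string translation: {t2 - t1:.4f} s")
```

  A benchmark that only reports the sum of the two phases cannot tell whether 
a slow query is due to query processing or to result export. Reporting them 
separately (or limiting the output size) makes the comparison fairer.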

TASK DETAIL
  https://phabricator.wikimedia.org/T206560
