Smalyshev added a comment.
Due to operational issues, Virtuoso has only been partially evaluated.
Installation went fine, with only minor adjustments required. Performance for loading and for some queries looked fine; in fact, parallel loading was faster than with Blazegraph, though it required some changes to the data because Virtuoso has much stricter requirements for geographical data. In general, I think there's a good chance performance is OK.
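To illustrate the kind of data adjustment meant here, below is a minimal sketch of a pre-load check for WKT point literals. The exact rules Virtuoso enforces are not listed in this comment, so the range check (longitude within [-180, 180], latitude within [-90, 90]) is an assumption, and the function name `is_plausible_wkt_point` is hypothetical.

```python
import re

# Matches a simple WKT point such as "Point(13.4 52.5)" (longitude first).
# Deliberately conservative: plain decimal numbers only, no exponents.
_POINT_RE = re.compile(
    r"^\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE,
)


def is_plausible_wkt_point(literal: str) -> bool:
    """Return True if the literal is a WKT point with in-range coordinates.

    The ranges are an assumed example of a "stricter requirement";
    they are not taken from Virtuoso documentation.
    """
    match = _POINT_RE.match(literal)
    if match is None:
        return False
    lon, lat = float(match.group(1)), float(match.group(2))
    return -180.0 <= lon <= 180.0 and -90.0 <= lat <= 90.0
```

A filter like this could be run over the dump before loading, so that rejected literals are logged or corrected instead of aborting the load.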
Functionality-wise, there's a significant delta between what we provide now and what Virtuoso can support. I have outlined it here: https://docs.google.com/document/d/1PSVIwuKrc1yeQwXgZmKxP6cqhby4Dfnz-JdQs0pkj8M/edit#
Virtuoso supports all standard SPARQL 1.1 syntax, as far as I could see, but beyond that there are many differences, of course.
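As a rough illustration of what should carry over unchanged, here is a sketch of sending a query that uses only standard SPARQL 1.1 syntax over the standard SPARQL protocol (a GET request with a `query` parameter). The endpoint URL is an assumption for a local test install, and the helper name `build_query_url` is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint of a local test install; adjust to the actual setup.
ENDPOINT = "http://localhost:8890/sparql"

# Uses only standard SPARQL 1.1 syntax, no vendor-specific extensions.
STANDARD_QUERY = """
SELECT ?item WHERE {
  ?item <http://www.wikidata.org/prop/direct/P31> <http://www.wikidata.org/entity/Q146> .
} LIMIT 5
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL protocol GET URL carrying the query string."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(ENDPOINT, STANDARD_QUERY)
```

The resulting URL can be fetched with any HTTP client. Queries that rely on extensions beyond the standard are the ones that would need rewriting.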
Most of the missing functionality seems possible to fill (at least in theory), as Virtuoso provides custom types, custom functions and procedures and a large ecosystem supporting ingestion of data from other sources into RDF and integration capabilities.
However, it is not very likely that we will be able to support these capabilities in the same way and with the same syntax as our current solution does, so script migrations would be necessary. It would also require significant investment of time to develop these solutions in any case.
The open-source edition of Virtuoso 7.x has only experimental support on recent Debian platforms, but it seems to work fine. The code is written in C, so we could contribute to it if necessary.
Clustering and HA setups are not available in the open-source edition, and neither is data replication. Graph-based access controls are also unsupported there.
In general, I think Virtuoso can be a viable platform if Blazegraph stops being sustainable, and it will support basic functionality (standard SPARQL, updating, etc.) adequately. Extensions and some additional capabilities, however, may require significant effort to develop and will cause some migration pain for users. Unfortunately, since all clustering and replication solutions seem to be non-open-source, we would not be able to use any clustering paradigm other than the one we're using now, at least not without either using the commercial version or getting some kind of special solution or special deal.
Further testing would be necessary to evaluate the performance of querying and of the Updater. The latter would probably require some modifications to run against Virtuoso, but that does not seem too hard. Since we do not have a testing platform for it now and there is renewed activity on the Blazegraph side, I am not planning to continue the testing in the short term.
