Aha, hello Jerven :) I should have remembered your earlier comment; delighted you are here.

Thank you again for sharing your promising experience, benchmarks, and suggestions, and for highlighting both similarities and differences.

SJ

On Tue, Aug 24, 2021 at 2:18 AM Jerven Bolleman <[email protected]> wrote:
> Hi Samuel, All,
>
> I am the software engineer responsible for sparql.uniprot.org.
> I already offered to help in https://phabricator.wikimedia.org/T206561,
> so no need to ask Andra or Egon ;)
>
> We are happy users of Virtuoso and strongly suggest that it be
> evaluated, as it is in general a good product that does scale.[1]
>
> One of the things we did differently from WDQS is to introduce a
> controlled layer between the "public" and the "database". This allows
> things like query rewriting/redirection upon data model changes, as
> well as rewriting some schema rediscovery queries to a known faster
> query. We also parse the queries with RDF4J before handing them to
> Virtuoso. This makes sure that the queries we accept are only valid
> SPARQL 1.1, so that users do not get used to near-SPARQL dialects
> (i.e. we retain the flexibility to move to a different endpoint). We
> are in the process of updating this code and contributing it to
> RDF4J, with the first contribution in the develop/4.0.0 branch.
>
> I think a number of current customizations in WDQS can be moved to a
> front RDF4J layer. The RDF4J sail/repository layer can then be used to
> preserve flexibility, so that WDQS can more easily switch between
> backend databases in the future.
>
> One large difference between UniProt and WDQS is that Wikidata is
> continually updated, while UniProt is batch-released a few times a
> year. Because of that, WDQS is somewhat easier in some areas and more
> difficult in others.
>
> Regards,
> Jerven
>
> [1] No database is perfect, but Virtuoso does scale a lot better than
> Blazegraph, which we also evaluated in the past. There is still a
> lot of potential in Virtuoso to scale even better in the future.
>
> On 23/08/2021 21:36, Samuel Klein wrote:
> > Ah, that's lovely. Thanks for the update, Kingsley! UniProt is a good
> > parallel to keep in mind.
> >
> > For Egon, Andra, and others who work with them: is there someone you'd
> > recommend chatting with at UniProt?
> > "Scaling alongside UniProt", or at least engaging them on how to solve
> > shared and comparable issues (they also offer authentication-free SPARQL
> > querying), sounds like a compelling option.
> >
> > S.
> >
> > On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata
> > <[email protected]> wrote:
> > > On 8/18/21 5:07 PM, Mike Pham wrote:
> > > > Wikidata community members,
> > > >
> > > > Thank you for all of your work helping Wikidata grow and improve
> > > > over the years. In the spirit of better communication, we would
> > > > like to take this opportunity to share some of the current
> > > > challenges Wikidata Query Service (WDQS) is facing, and some
> > > > strategies we have for dealing with them.
> > > >
> > > > WDQS currently risks failing to provide acceptable service quality
> > > > due to the following reasons:
> > > >
> > > > 1. Blazegraph scaling
> > > >
> > > >    1. Graph size. WDQS uses Blazegraph as our graph backend.
> > > >       While Blazegraph can theoretically support 50 billion
> > > >       edges <https://blazegraph.com/>, in reality Wikidata is
> > > >       the largest graph we know of running on Blazegraph (~13
> > > >       billion triples
> > > >       <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
> > > >       and there is a risk that we will reach a size limit
> > > >       <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
> > > >       of what it can realistically support
> > > >       <https://phabricator.wikimedia.org/T213210>. Once
> > > >       Blazegraph is maxed out, WDQS can no longer be updated.
> > > >       This will also break Wikidata tools that rely on WDQS.
> > > >
> > > >    2. Software support. Blazegraph is end-of-life software,
> > > >       which is no longer actively maintained, making it an
> > > >       unsustainable backend to keep using in the long term.
> > > >
> > > > Blazegraph maxing out in size poses the greatest risk of
> > > > catastrophic failure, as it would effectively prevent WDQS from
> > > > being updated further, so it would inevitably fall out of date.
> > > > Our long-term strategy to address this is to move to a new graph
> > > > backend that best meets our WDQS needs and is actively maintained,
> > > > and to begin the migration off Blazegraph as soon as a viable
> > > > alternative is identified
> > > > <https://phabricator.wikimedia.org/T206560>.
> > >
> > > Hi Mike,
> > >
> > > Do bear in mind that, both before and after Blazegraph was selected
> > > for Wikidata, we have always offered an RDF-based DBMS that can handle
> > > current and future requirements for Wikidata, just as we do for
> > > DBpedia.
> > >
> > > At the time of our first rendezvous, handling 50 billion triples
> > > would typically have required our Cluster Edition, which is a
> > > commercial-only offering; basically, that was the deal breaker
> > > back then.
> > >
> > > Anyway, in recent times, our Open Source Edition has evolved to
> > > handle some 80+ billion triples (exemplified by the live UniProt
> > > instance), where performance and scale are primarily a function of
> > > available memory.
> > >
> > > I hope this helps.
> > >
> > > Related:
> > >
> > > [1] https://wikidata.demo.openlinksw.com/sparql -- Our Live Wikidata
> > >     SPARQL Query Endpoint
> > > [2] https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
> > >     -- Google Spreadsheet about various Virtuoso configurations
> > >     associated with some well-known public endpoints
> > > [3] https://t.co/EjAAO73wwE -- this query doesn't complete with the
> > >     current Blazegraph-based Wikidata endpoint
> > > [4] https://t.co/GTATPPJNBI -- the same query completing when applied
> > >     to the Virtuoso-based endpoint
> > > [5] https://t.co/X7mLmcYC69 -- about loading Wikidata's datasets into
> > >     a Virtuoso instance
> > > [6] https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
> > >     -- various demos shared via Twitter over the years regarding
> > >     Wikidata
> > >
> > > --
> > > Regards,
> > >
> > > Kingsley Idehen
> > > Founder & CEO
> > > OpenLink Software
> > > Home Page: http://www.openlinksw.com
> > > Community Support: https://community.openlinksw.com
> > >
> > > _______________________________________________
> > > Wikidata mailing list -- [email protected]
> > > To unsubscribe send an email to [email protected]
> >
> > --
> > Samuel Klein @metasj w:user:sj +1 617 529 4266
>
> --
> *Jerven Tjalling Bolleman*
> Principal Software Developer
> *SIB | Swiss Institute of Bioinformatics*
> 1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
> t +41 22 379 58 85
> [email protected] - www.sib.swiss
