Aha, hello Jerven :) I should have remembered your earlier comment;
delighted you are here.

Thank you again for sharing your promising experience + benchmarks +
suggestions -- and for highlighting both similarities and differences.

SJ

On Tue, Aug 24, 2021 at 2:18 AM jerven Bolleman <[email protected]>
wrote:

> Hi Samuel, All,
>
> I am the software engineer responsible for sparql.uniprot.org.
> I already offered to help in https://phabricator.wikimedia.org/T206561.
> So no need to ask Andra or Egon ;)
>
> We are satisfied users of Virtuoso and strongly suggest that it be
> evaluated, as it is in general a good product that does scale.[1]
>
> One of the things we did differently from WDQS is to introduce a
> controlled layer between the "public" and the "database". This allows
> things like query rewriting/redirection upon data model changes, as
> well as rewriting some schema rediscovery queries to a known faster
> query. We also parse the queries with RDF4J before handing them to
> Virtuoso. This makes sure that the queries we accept are only valid
> SPARQL 1.1, which keeps users from getting used to almost-SPARQL
> dialects (i.e. we retain the flexibility to move to a different
> endpoint). We are in the process of updating this code and
> contributing it to RDF4J, with the first contribution in the
> develop/4.0.0 branch.
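The controlled layer described here can be illustrated with a small Python sketch. It is not UniProt's or RDF4J's code: `QueryGateway`, `normalize`, `toy_validate`, the example queries and the `urn:example:schema-summary` graph are all hypothetical. In particular, `toy_validate` is only a placeholder where a real SPARQL 1.1 parser (RDF4J's, in the setup described above) would go; it merely rejects Virtuoso's non-standard `DEFINE` pragma.

```python
import re
from typing import Callable, Dict


def normalize(query: str) -> str:
    """Collapse runs of whitespace so trivially reformatted copies of a query match."""
    return " ".join(query.split())


class QueryGateway:
    """Hypothetical front layer that sits between public clients and the store."""

    def __init__(self, rewrites: Dict[str, str], validate: Callable[[str], None]) -> None:
        # Keys are known queries (matched after normalization); values are the
        # queries that should be sent to the backend instead.
        self.rewrites = {normalize(k): v for k, v in rewrites.items()}
        self.validate = validate

    def prepare(self, query: str) -> str:
        """Return the query to forward to the backend; raise ValueError if rejected."""
        # 1. Refuse anything the validator does not accept.
        self.validate(query)
        # 2. Swap a known expensive query for its registered replacement,
        #    otherwise forward the client's query unchanged.
        return self.rewrites.get(normalize(query), query)


def toy_validate(query: str) -> None:
    """Stand-in only, NOT a SPARQL 1.1 parser: it rejects queries that start with
    Virtuoso's non-standard DEFINE pragma. A real gateway would run a full parser."""
    if re.match(r"\s*DEFINE\b", query, re.IGNORECASE):
        raise ValueError("query uses a non-standard SPARQL extension")


# Hypothetical example: a slow schema-discovery query and a faster replacement
# that reads from a precomputed summary graph (the graph IRI is made up).
SLOW = "SELECT DISTINCT ?class WHERE { ?s a ?class }"
FAST = ("SELECT ?class WHERE { GRAPH <urn:example:schema-summary> "
        "{ ?class a <http://www.w3.org/2000/01/rdf-schema#Class> } }")

gateway = QueryGateway({SLOW: FAST}, toy_validate)
```

Matching on normalized text keeps the rewrite table small, and validating before rewriting means only client input is checked while the curated replacements are trusted.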
>
> I think a number of current customizations in WDQS can be moved to a
> front RDF4J layer. The RDF4J sail/repository layer can then be used to
> preserve flexibility, so that WDQS can more easily switch between
> backend databases in the future.
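The idea of a front layer coded against an abstract repository interface, so that the database behind it can be swapped, can be sketched in Python as follows. `SparqlBackend`, `CannedBackend` and `FrontLayer` are hypothetical names chosen for illustration; in RDF4J itself (Java) this role is played by its `Repository` abstraction, for example a `SPARQLRepository` pointing at a remote endpoint. The example IRIs are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Protocol

Row = Dict[str, str]  # one solution: variable name -> value


class SparqlBackend(Protocol):
    """Anything that can answer a SPARQL SELECT query with rows of bindings."""

    name: str

    def select(self, query: str) -> List[Row]: ...


@dataclass
class CannedBackend:
    """Hypothetical stand-in for a real store; it only replays recorded answers
    so the sketch runs offline."""

    name: str
    answers: Dict[str, List[Row]] = field(default_factory=dict)

    def select(self, query: str) -> List[Row]:
        return self.answers.get(query, [])


class FrontLayer:
    """Public-facing layer that depends only on the SparqlBackend interface."""

    def __init__(self, backend: SparqlBackend) -> None:
        self.backend = backend

    def switch_backend(self, backend: SparqlBackend) -> None:
        # Replacing the store happens here and is invisible to clients.
        self.backend = backend

    def query(self, query: str) -> List[Row]:
        return self.backend.select(query)


# Example with made-up IRIs: two backends that hold the same data.
Q = "SELECT ?item WHERE { ?item a <http://example.org/Cat> } LIMIT 1"
ANSWER = [{"item": "http://example.org/item1"}]
old_store = CannedBackend("old-store", {Q: ANSWER})
new_store = CannedBackend("new-store", {Q: ANSWER})

front = FrontLayer(old_store)
```

Because clients and front-layer customizations never name a concrete store, migrating to a different database reduces to supplying another `SparqlBackend` implementation and calling `switch_backend`.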
>
> One large difference between UniProt and WDQS is that Wikidata is
> continually updated, while UniProt is batch-released a few times a year.
> WDQS is somewhat easier in some areas and more difficult in others
> because of that.
>
> Regards,
> Jerven
>
> [1] No database is perfect, but Virtuoso does scale a lot better than
> Blazegraph did, which we also evaluated in the past. There is still a
> lot of potential in Virtuoso to scale even better in the future.
>
>
> On 23/08/2021 21:36, Samuel Klein wrote:
> > Ah, that's lovely. Thanks for the update, Kingsley! UniProt is a good
> > parallel to keep in mind.
> >
> > For Egon, Andra, and others who work with them: is there someone you'd
> > recommend chatting with at UniProt?
> > "Scaling alongside UniProt", or at least engaging them on how to solve
> > shared and comparable issues (they also offer authentication-free SPARQL
> > querying), sounds like a compelling option.
> >
> > S.
> >
> > On Thu, Aug 19, 2021 at 4:32 PM Kingsley Idehen via Wikidata
> > <[email protected] <mailto:[email protected]>> wrote:
> >
> >     On 8/18/21 5:07 PM, Mike Pham wrote:
> >>
> >>     Wikidata community members,
> >>
> >>
> >>     Thank you for all of your work helping Wikidata grow and improve
> >>     over the years. In the spirit of better communication, we would
> >>     like to take this opportunity to share some of the current
> >>     challenges Wikidata Query Service (WDQS) is facing, and some
> >>     strategies we have for dealing with them.
> >>
> >>
> >>     WDQS currently risks failing to provide acceptable service quality
> >>     due to the following reasons:
> >>
> >>     1. Blazegraph scaling
> >>
> >>         1. Graph size. WDQS uses Blazegraph as our graph backend.
> >>            While Blazegraph can theoretically support 50 billion
> >>            edges <https://blazegraph.com/>, in reality Wikidata is
> >>            the largest graph we know of running on Blazegraph (~13
> >>            billion triples
> >>            <https://grafana.wikimedia.org/d/000000489/wikidata-query-service?viewPanel=7&orgId=1&refresh=1m>),
> >>            and there is a risk that we will reach a size
> >>            <https://www.w3.org/wiki/LargeTripleStores#Bigdata.28R.29_.2812.7B.29>
> >>            limit of what it can realistically support
> >>            <https://phabricator.wikimedia.org/T213210>. Once
> >>            Blazegraph is maxed out, WDQS can no longer be updated.
> >>            This will also break Wikidata tools that rely on WDQS.
> >>
> >>         2. Software support. Blazegraph is end-of-life software
> >>            that is no longer actively maintained, making it an
> >>            unsustainable backend to keep relying on in the long
> >>            term.
> >>
> >>
> >>     Blazegraph maxing out in size poses the greatest risk of
> >>     catastrophic failure, as it would effectively prevent WDQS from
> >>     being updated further, so WDQS would inevitably fall out of date.
> >>     Our long-term strategy to address this is to move to a new graph
> >>     backend that best meets our WDQS needs and is actively maintained,
> >>     and to begin the migration off of Blazegraph as soon as a viable
> >>     alternative is identified
> >>     <https://phabricator.wikimedia.org/T206560>.
> >>
> >
> >     Hi Mike,
> >
> >     Do bear in mind that, both before and after Blazegraph was selected
> >     for Wikidata, we've always offered an RDF-based DBMS that can handle
> >     current and future requirements for Wikidata, just as it does for
> >     DBpedia.
> >
> >     At the time of our first rendezvous, handling 50 billion triples
> >     would typically have required our Cluster Edition, which is a
> >     commercial-only offering -- basically, that was the deal breaker
> >     back then.
> >
> >     Anyway, in recent times our Open Source Edition has evolved to
> >     handle some 80+ billion triples (exemplified by the live UniProt
> >     instance), where performance and scale are primarily a function of
> >     available memory.
> >
> >     I hope this helps.
> >
> >     Related:
> >
> >     [1] https://wikidata.demo.openlinksw.com/sparql -- Our Live Wikidata
> >     SPARQL Query Endpoint
> >     [2] https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0
> >     -- Google Spreadsheet about various Virtuoso Configurations
> >     associated with some well-known public endpoints
> >     [3] https://t.co/EjAAO73wwE -- this query doesn't complete with the
> >     current Blazegraph-based Wikidata endpoint
> >     [4] https://t.co/GTATPPJNBI -- same query completing when applied to
> >     the Virtuoso-based endpoint
> >     [5] https://t.co/X7mLmcYC69 -- about loading Wikidata's datasets
> >     into a Virtuoso instance
> >     [6] https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live
> >     -- various demos shared via Twitter over the years regarding Wikidata
> >
> >     --
> >     Regards,
> >
> >     Kingsley Idehen
> >     Founder & CEO
> >     OpenLink Software
> >
> >     _______________________________________________
> >     Wikidata mailing list -- [email protected]
> >     <mailto:[email protected]>
> >     To unsubscribe send an email to [email protected]
> >     <mailto:[email protected]>
> >
> >
> >
> > --
> > Samuel Klein          @metasj           w:user:sj          +1 617 529 4266
> >
> >
>
> --
>
>         *Jerven Tjalling Bolleman*
> Principal Software Developer
> *SIB | Swiss Institute of Bioinformatics*
> 1, rue Michel Servet - CH 1211 Geneva 4 - Switzerland
> t +41 22 379 58 85
> [email protected] - www.sib.swiss
>

