[Wikidata-bugs] [Maniphest] T206560: [Epic] Evaluate alternatives to Blazegraph

2021-09-10 Thread KingsleyIdehen
KingsleyIdehen added a comment.


  A few things:
  
  1. This behavior is configurable i.e., we just have a restrictive timeout 
associated with the current public instance
  2. We could also include HTML-level messaging to complement the HTTP-level 
messaging
  
  This feature exists because of a fundamental challenge you face when you 
publish an endpoint for ad-hoc query access on the Web:
  
  1. Unpredictable Query Complexity
  2. Unpredictable Query Solution Size
  3. Unpredictable Number of User Agents triggering any combination of the 
above.
  
  Conventional DBMS products can't handle this problem, hence the creation of 
this feature at the time Virtuoso was created circa 1998.
  
  Alternatively, as many do, you can simply throw lots of machines at the 
problem via horizontal partitioning using shards which sets you on the path to 
massive data centers (the norm worldwide these days).
  
  > SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }
  
  Is a typical expensive query that falls into the list I presented above. It's 
no different to
  
  > SELECT * FROM {Some-Table}
  
  in a typical SQL-compliant RDBMS, which is why you don't see those published 
to the Web -- although Virtuoso instances handle this too, using the same 
"Anytime Query" functionality with a configurable timeout.
  
  Getting a result vs an actual latest and greatest result are different 
things. There is no way on earth that
  
  > SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }
  
  is happening in realtime on the Blazegraph instance. We could return a value 
from our statistics table too, but that isn't the global interpretation of a 
solution for that query. For instance, we could put all the relevant stats in 
a VoID graph, with scheduled updates to it if need be.
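  
  To illustrate the difference, a client could read a precomputed count from 
such a statistics graph instead of scanning every triple. This is only a 
minimal sketch: the helper name and the graph IRI passed to it are 
hypothetical, and it assumes the stats are published with the standard VoID 
terms `void:Dataset` and `void:triples`.

```python
def void_triple_count_query(stats_graph: str) -> str:
    """Build a SPARQL query that reads a precomputed triple count.

    `stats_graph` is a placeholder IRI for wherever the VoID
    description is kept; it is not a real graph name.
    """
    return (
        "PREFIX void: <http://rdfs.org/ns/void#>\n"
        "SELECT ?dataset ?count\n"
        f"FROM <{stats_graph}>\n"
        "WHERE { ?dataset a void:Dataset ; void:triples ?count }"
    )


if __name__ == "__main__":
    # Print the query so it can be pasted into a SPARQL endpoint form.
    print(void_triple_count_query("http://example.org/void"))
```

  Such a query only touches the handful of triples in the stats graph, so its 
cost is independent of the dataset size; the trade-off is that the count is 
only as fresh as the last scheduled update.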
  
  With Virtuoso, everything can be configured to suit the interaction behavior 
desired. It just so happens that ad-hoc querying for global 24/7 access, 
irrespective of solution size, complexity, or origin, is a fundamental 
challenge that isn't generally understood, since it's pegged to the emergence 
of the Web :)

TASK DETAIL
  https://phabricator.wikimedia.org/T206560

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: KingsleyIdehen
Cc: Hannah_Bast, RShigapov, Izno, KingsleyIdehen, Daniel_Mietchen, Majavah, 
karapayneWMDE, MarioGom, Mohammed_Sadat_WMDE, Hjfocs, danshick-wmde, 
Thadguidry, Tpt, TallTed, Sj, Afandian, Justin0x2004, Jerven, TheKtk, 
Ivanhercaz, Jneubert, DanBri, Lydia_Pintscher, Tagishsimon, 
Samantha_Alipio_WMDE, Ostrzyciel, GreenReaper, WMDE-leszek, Salgo60, So9q, 
Krabina, Jecummings4, TomT0m, Akuckartz, Susannaanas, Addshore, Andrawaag, 
Gehel, Lucas_Werkmeister_WMDE, Aklapper, Smalyshev, Invadibot, MPhamWMF, 
Jtm-lis, maantietaja, NavinRizwi, CBogen, Isaacandy, Demian, Olson.jared.m, 
Nandana, Namenlos314, Lahi, Gq86, Bryandamon, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
Steko, Samwilson, PhotographerTom, suriyaa, Psychoslave, tosfos, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Darenwelsh, Dinoguy1000, 
Manybubbles, brion, Mbch331, MarkAHershberger
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


[Wikidata-bugs] [Maniphest] T206560: [Epic] Evaluate alternatives to Blazegraph

2021-09-09 Thread KingsleyIdehen
KingsleyIdehen added a comment.


  That's a consequence of the "Anytime Query" feature in Virtuoso, which 
provides partial solutions in situations where a query cannot be completed 
within a specific timeframe. This timeframe takes the form of a configurable 
timeout, and is a critical feature for enabling global ad-hoc query access, 
24/7, 365 days a year, re the likes of DBpedia and Wikidata.
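  
  As a rough illustration of the idea (a toy sketch, not Virtuoso's actual 
implementation), a deadline-bounded scan can return whatever it has counted so 
far, along with a flag saying whether it finished. The function name 
`anytime_count` and the injectable `clock` parameter are hypothetical, 
introduced only for this example.

```python
import time
from typing import Callable, Iterable, Tuple


def anytime_count(
    rows: Iterable,
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, bool]:
    """Count rows until done or until the timeout expires.

    Returns (count, complete). When complete is False the count is a
    partial answer covering only the rows seen before the deadline.
    """
    deadline = clock() + timeout_s
    count = 0
    for _ in rows:
        # Check the deadline before doing the work for each row.
        if clock() >= deadline:
            return count, False
        count += 1
    return count, True


if __name__ == "__main__":
    count, complete = anytime_count(range(1_000_000), timeout_s=0.01)
    print(count, "complete" if complete else "partial")
```

  The caller always gets an answer within roughly the configured timeout; the 
flag plays the role that the HTTP-level messaging plays on a real endpoint.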
  
  When a partial result is returned, that fact is reported via the HTTP 
response headers, as per:
  
  curl -I "https://wikidata.demo.openlinksw.com/sparql?default-graph-uri=http%3A%2F%2Fwww.wikidata.org%2F&query=PREFIX+parl%3A+%3Chttps%3A%2F%2Fid.parliament.uk%2Fschema%2F%3E%0D%0APREFIX+rdfs%3A+%3Chttp%3A%2F%2Fwww.w3.org%2F2000%2F01%2Frdf-schema%23%3E%0D%0APREFIX+wd%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2F%3E%0D%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0D%0APREFIX+wikibase%3A+%3Chttp%3A%2F%2Fwikiba.se%2Fontology%23%3E%0D%0APREFIX+bd%3A+%3Chttp%3A%2F%2Fwww.bigdata.com%2Frdf%23%3E%0D%0APREFIX+dct%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E%0D%0APREFIX+dbpedia%3A+%3Chttp%3A%2F%2Fdbpedia.org%2F%3E+%0D%0APREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E%0D%0APREFIX+wds%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fentity%2Fstatement%2F%3E%0D%0APREFIX+wdv%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fvalue%2F%3E%0D%0APREFIX+wdt%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fdirect%2F%3E%0D%0APREFIX+p%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2F%3E%0D%0APREFIX+ps%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fstatement%2F%3E%0D%0APREFIX+pq%3A+%3Chttp%3A%2F%2Fwww.wikidata.org%2Fprop%2Fqualifier%2F%3E%0D%0A%0D%0ASELECT+%3Fperson_id+%3Fperson+%28COUNT%28%3Fprofession_id%29+AS+%3Fcount%29+%28GROUP_CONCAT%28%3Fprofession%3B+separator%3D%22%2C+%22%29+AS+%3Fprofessions%29+WHERE+%7B%0D%0A++%3Fperson_id+wdt%3AP31+wd%3AQ5+.%0D%0A++%3Fperson_id+wdt%3AP106+%3Fprofession_id+.%0D%0A++%3Fprofession_id+rdfs%3Alabel+%3Fprofession+.%0D%0A++%3Fperson_id+rdfs%3Alabel+%3Fperson+.%0D%0A++FILTER+%28LANG%28%3Fperson%29+%3D+%22en%22%29+.%0D%0A++FILTER+%28LANG%28%3Fprofession%29+%3D+%22en%22%29%0D%0A%7D%0D%0AGROUP+BY+%3Fperson_id+%3Fperson%0D%0AORDER+BY+DESC%28%3Fcount%29&format=text%2Fx-html%2Btr&timeout=3000&signal_void=on&signal_unconnected=on"
  
  Which returns:
  
  HTTP/1.1 200 OK
  Date: Thu, 09 Sep 2021 17:53:51 GMT
  Content-Type: text/html; charset=UTF-8
  Content-Length: 479290
  Connection: keep-alive
  Vary: Accept-Encoding
  Server: Virtuoso/08.03.3320 (Linux) x86_64-generic-linux-glibc25  VDB
  Accept-Ranges: bytes
  X-SPARQL-default-graph: http://www.wikidata.org/
  X-SQL-State: S1TAT
  X-SQL-Message: RC...: Returning incomplete results, query interrupted by result timeout.  Activity:  1.075M rnd  530.1K seq  16.78K same seg   109.7K same pg  14.13K same par  0 disk  0 spec disk  0B /  0
  X-Exec-Milliseconds: 2031
  X-Exec-DB-Activity: 1.075M rnd  530.1K seq  16.78K same seg   109.7K same pg  14.13K same par  0 disk  0 spec disk  0B /  0 messages  0 fork
  Content-disposition: filename=sparql_2021-09-09_17-53-51Z.html
  Expires: Thu, 09 Sep 2021 18:53:51 GMT
  Cache-Control: max-age=3600
  Strict-Transport-Security: max-age=15768000
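  
  A client can use these headers to tell a partial answer from a complete one. 
The following is a minimal sketch that assumes the `X-SQL-State: S1TAT` value 
shown above is what marks a timed-out, incomplete result; the function name is 
hypothetical, and other states or server versions may need extra handling.

```python
from typing import Mapping

# SQL state reported in the response above for an interrupted query.
PARTIAL_SQL_STATE = "S1TAT"


def is_partial_result(headers: Mapping[str, str]) -> bool:
    """Return True when the response headers flag an incomplete result."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value.strip() for name, value in headers.items()}
    return normalized.get("x-sql-state") == PARTIAL_SQL_STATE


if __name__ == "__main__":
    sample = {
        "Content-Type": "text/html; charset=UTF-8",
        "X-SQL-State": "S1TAT",
        "X-Exec-Milliseconds": "2031",
    }
    print(is_partial_result(sample))
```

  With the standard library, the headers of a live response can be passed in 
as `dict(response.headers)` from `urllib.request.urlopen`.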
  
  **Related**
  
  [1] DBpedia Fair Use Note <https://www.dbpedia.org/resources/sparql/>
  [2] Virtuoso Anytime Query Tips & Tricks Note 
<http://vos.openlinksw.com/owiki/wiki/VOS/VirtTipsAndTricksAnytimeSPARQLQuery>



[Wikidata-bugs] [Maniphest] T206560: [Epic] Evaluate alternatives to Blazegraph

2021-08-26 Thread KingsleyIdehen
KingsleyIdehen added a comment.


  For the record.
  
  At the time of our first rendezvous re Wikidata hosting, handling 20 billion+ 
triples would typically have required our Cluster Edition (a Commercial Only 
offering). That was the deal-breaker at the time Blazegraph was initially 
selected for Wikidata, i.e., Blazegraph offered an Open Source based Cluster 
Edition.
  
  Anyway, in recent times, our Open Source Edition has evolved to handle some 
80 Billion+ triples (exemplified by the live Uniprot instance 
<https://sparql.uniprot.org/>), where performance and scale are primarily a 
function of available memory. Fundamentally, the current 13 Billion Triples 
size of Wikidata, and its future growth, lie well within the range of 
Virtuoso's Open Source Edition.
  
  Also note, based on our experience hosting live DBpedia and Wikidata 
instances, we do have configuration best practices in place for uptime and 
scalability without the need for our Cluster Edition (which is really for 
dealing with massive setups in the 100 Billion Triples or higher range).
  
  I hope this helps.
  
  **Related**
  
  [1] Our Live Wikidata SPARQL Query Endpoint 
<https://wikidata.demo.openlinksw.com/sparql>
  [2] Google Spreadsheet about various Virtuoso Configurations associated with 
some well-known public endpoints 
<https://docs.google.com/spreadsheets/d/15AXnxMgKyCvLPil_QeGC0DiXOP-Hu8Ln97fZ683ZQF0/edit#gid=0>
  [3] This query doesn't complete with the current Blazegraph-based Wikidata 
endpoint <https://t.co/EjAAO73wwE>
  [4] Same query completing when applied to the Virtuoso-based endpoint 
<https://t.co/GTATPPJNBI>
  [5] About loading Wikidata's datasets into a Virtuoso instance 
<https://t.co/X7mLmcYC69>
  [6] Various demos shared via Twitter over the years regarding Wikidata 
<https://twitter.com/search?q=%23Wikidata%20%23VirtuosoRDBMS%20%40kidehen&src=typed_query&f=live>
 
  [7] Uniprot SPARQL Endpoint Presentation <https://t.co/EpuP27TFRE?amp=1>



[Wikidata-bugs] [Maniphest] T206561: Evaluate Virtuoso as alternative to Blazegraph

2021-08-25 Thread KingsleyIdehen
KingsleyIdehen added a comment.


  In T206561#7304519 <https://phabricator.wikimedia.org/T206561#7304519>, @So9q 
wrote:
  
  > I took a glance at Virtuoso.
  >
  > I found nothing about scaling Virtuoso to a cluster (which is IMO what WMF 
needs because of growing amounts of data and reaching the limits of what 1 
machine can handle)
  >
  > A snippet from WP:
  > "Virtuoso is designed to take advantage of operating system threading 
support and multiple CPUs. It consists of a single process with an adjustable 
pool of threads shared between clients. Multiple threads may work on a single 
index tree with minimal interference with each other. One cache of database 
pages is shared among all threads and old dirty pages are written back to disk 
as a background process."
  >
  > Virtuoso IMO is not the way forward for WMF. We need a distributed 
graph/column database with SPARQL on top. See 
https://phabricator.wikimedia.org/T289561 for an application that has exactly 
that (but seems abandoned since dec 2020 unfortunately)
  
  Again, Virtuoso 7.x Open Source Edition scales up to 80 Billion Triples, as 
demonstrated by Uniprot's live instance.
  
  You don't need the Virtuoso Cluster Edition until the scalability of the 
single-server edition is exhausted. Wikidata is a long way from reaching 80 
Billion+ triples.
  
  Virtuoso has also hosted DBpedia for the last 14 years, i.e., since its 
inception.
  
  I hope that helps.
  
  Kingsley

TASK DETAIL
  https://phabricator.wikimedia.org/T206561
