Hello all,

It has been exciting to see the increased adoption and usage of the
Wikidata Query Service (WDQS) in the past year. To support this
growing demand, on 15 March 2021 the Search Platform team released a new
Streaming Updater to a test server https://query-preview.wikidata.org for
feedback before going to production on 15 April 2021 (pending any major
blockers discovered during testing). Once in production, WDQS will become
less of a bottleneck for Wikidata updates, and we’re looking forward to
better facilitating Wikidata’s continued growth as a more complete
knowledge graph.

Your relevant feedback on the following changes is important to us to
ensure we continue to best support your needs while scaling up the service
in production:

   1. New Streaming Updater: [1]

      - This improvement to the Updater will allow WDQS to better handle
        the volume of edits to Wikidata, improving data consistency and
        decreasing update latency: while the existing Updater fluctuates
        between 5–15 updates/sec (averaging 10 updates/sec), the new
        Updater will be able to handle a throughput of 40–130 updates/sec
        (88 updates/sec on average). Without these performance
        improvements, edits to Wikidata were being throttled
        <https://phabricator.wikimedia.org/T243701>, approaching the point
        where they could become impossible. With the new Updater, edits to
        Wikidata will on the whole be more consistent and have less lag,
        reducing the WDQS bottleneck to improving Wikidata content.

      - We don’t anticipate this to adversely affect workflows or usage,
        but it is a big update, and we would like you to let us know if
        you find any related bugs or problems so that we can properly
        address them.

   2. Blank node skolemization: [2]

      - To reliably use the new Streaming Updater to minimize the
        throttling of edits to Wikidata, skolemization of blank nodes was
        required, as detailed in the Phabricator ticket. For more detail
        on why this was necessary, you can also refer to another attempt
        to design a “diff” format for RDF
        <https://www.w3.org/2001/sw/wiki/TurtlePatch>, where the solution
        suggested to handle blank nodes is also skolemization. We
        understand that this solution may unfortunately introduce breaking
        changes to your usage of WDQS, RDF dumps, and Special:EntityData;
        however, given the severe risk of edits to Wikidata becoming
        impossible, we felt this was the best course of action to take in
        the timeframe we had. We acknowledge that this approach has its
        shortcomings, and invite you to provide us with feedback on how we
        can improve future usage of Wikidata and WDQS while maintaining
        their scalability and reliability.

      - From a user perspective, this change means that (1) queries using
        isBlank() will need to be rewritten; (2) queries using isIRI/isURI
        will need to be verified; (3) WDQS results will no longer include
        blank nodes. If these changes affect your workflows, and/or you
        need to know how to modify your workflows to account for the blank
        node skolemization, please let us know what your specific use case
        is.

      - For more detail on how to modify your workflows, including
        examples, please refer to the following page:
        https://www.mediawiki.org/wiki/Wikidata_Query_Service/Blank_Node_Skolemization

   3. Constraint fetching: [3]

      - Constraints are a Wikibase concept that allows entities to be
        validated based on definable properties: e.g. all astronauts must
        be human. Ideally, constraint fetching would be used to ensure
        data quality for Wikidata edits. The reality is that the current
        implementation of constraint fetching does not meet our production
        quality standards and was generating detrimental noise in our
        logs.

      - As a result of the sub-par implementation, and the fact that the
        new Flink-based Streaming Updater doesn’t support it, the current
        constraint fetching functionality will be disabled with the new
        Updater release, until we can expose constraint violations in a
        more production-ready way [4][5][6]. We recognize that even
        functionality that doesn’t meet our production quality standards
        is still potentially useful for some, and we would like to hear
        your feedback if you are adversely affected by this change.
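
For the blank node skolemization change in item 2, client code that
inspects WDQS results may need to tell skolemized nodes apart from
ordinary IRIs. The sketch below is illustrative only. It assumes results
in the standard SPARQL JSON results format, and it assumes that skolem
IRIs follow the RDF 1.1 "/.well-known/genid/" path convention, which this
announcement does not state. The helper names are hypothetical; the
Blank_Node_Skolemization page linked above remains the authoritative
reference.

```python
from urllib.parse import urlparse

# Assumption: skolem IRIs use the RDF 1.1 ".well-known/genid" path
# convention. This prefix is not taken from the announcement itself.
SKOLEM_PATH_PREFIX = "/.well-known/genid/"


def is_skolem_iri(value: str) -> bool:
    """Return True if `value` looks like a skolem IRI (hypothetical helper)."""
    parsed = urlparse(value)
    return (
        parsed.scheme in ("http", "https")
        and parsed.path.startswith(SKOLEM_PATH_PREFIX)
    )


def classify_binding(binding: dict) -> str:
    """Classify one value from a SPARQL JSON results binding.

    A binding is a dict such as {"type": "uri", "value": "..."}.
    """
    kind = binding.get("type")
    value = binding.get("value", "")
    if kind == "bnode":
        # Expected to stop appearing in WDQS results after the change.
        return "blank-node"
    if kind == "uri":
        return "skolem-iri" if is_skolem_iri(value) else "iri"
    return "literal"
```

A check of this kind can stand in for client-side logic that previously
relied on a result value being a blank node.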

We’re looking forward to these new changes improving WDQS, and your
relevant feedback on these updates will help us make sure we can continue
to support your needs. If you have any questions, issues or suggestions,
feel free to reach out to us on the WDQS contact page
<https://www.wikidata.org/wiki/Wikidata:Contact_the_development_team/Query_Service_and_search#New_WDQS_Streaming_Updater_feedback>
.

Original announcement on Wikidata Project Chat:
https://www.wikidata.org/wiki/Wikidata:Project_chat#New_WDQS_Streaming_Updater_now_available_on_pre-production_test_server_for_feedback

[1] - https://phabricator.wikimedia.org/T244590
[2] - https://phabricator.wikimedia.org/T244341
[3] - https://phabricator.wikimedia.org/T274982
[4] - https://phabricator.wikimedia.org/T204024
[5] - https://phabricator.wikimedia.org/T201147
[6] - https://phabricator.wikimedia.org/T201150




—

Mike Pham (he/him)
Sr Product Manager, Search Platform
Wikimedia Foundation <https://wikimediafoundation.org/>
_______________________________________________
Wikidata mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata