Hi Guillaume,

Which file system is used with Blazegraph?  Is it NFS, ext4, etc.?
Specifically, which file system is used where the journal files are written
and read? [1]
I ask because, looking at the code, it seems there are cases where
unreported errors can occur around file locking.

[1]
https://github.com/blazegraph/database/blob/master/bigdata-core/bigdata/src/java/com/bigdata/journal/FileMetadata.java
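To illustrate the failure mode I have in mind (this is a minimal sketch, not Blazegraph's actual code; the class and method names below are hypothetical): java.nio file locks are advisory, tryLock() can return null or throw IOException, and on some network file systems such as NFS a lock can appear to succeed without providing a real exclusion guarantee.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class JournalLockCheck {

    /**
     * Attempts an exclusive advisory lock on the given file, similar in
     * spirit to what a journal implementation does on open. Returns true
     * only if the JVM reports a valid lock.
     *
     * Caveat: on some network file systems (NFS in particular) locking
     * may be advisory-only or unsupported, so a "true" here is not a
     * hard guarantee that no other process is writing the file.
     */
    public static boolean tryExclusiveLock(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // tryLock() returns null if another process holds the lock,
            // and throws IOException if the file system rejects locking.
            FileLock lock = ch.tryLock();
            if (lock == null) {
                return false;
            }
            boolean valid = lock.isValid();
            lock.release();
            return valid;
        }
    }

    public static void main(String[] args) throws IOException {
        Path journal = Files.createTempFile("journal", ".jnl");
        try {
            System.out.println("lock acquired: " + tryExclusiveLock(journal));
        } finally {
            Files.deleteIfExists(journal);
        }
    }
}
```

If callers ignore the null return or swallow the IOException, a second writer can corrupt the journal without any error being reported, which is why the underlying file system matters.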

Thad
https://www.linkedin.com/in/thadguidry/
https://calendly.com/thadguidry/


On Wed, Feb 22, 2023 at 5:06 AM Guillaume Lederrey <[email protected]>
wrote:

> Hello all!
>
> TL;DR: We expect to complete the recent data reload on Wikidata Query
> Service successfully soon, but we've encountered multiple failures related
> to the size of the graph and anticipate that this issue may worsen in the
> future. Although we succeeded this time, given the current trend of the
> data reload process we cannot guarantee that future reload attempts will
> succeed. Thank you for your understanding and patience.
>
> Longer version:
>
> WDQS is updated from a stream of recent changes on Wikidata, with a
> maximum delay of ~2 minutes. This process was improved as part of the WDQS
> Streaming Updater project to ensure data coherence [1]. However, the
> update process is still imperfect and can lead to data inconsistencies in
> some cases [2][3]. To address this, we reload the data from dumps a few
> times per year to reinitialize the system from a known good state.
>
> The recent reload of data from dumps started in mid-December and initially
> ran into download issues and instabilities in Blazegraph, the database
> used by WDQS [4]. Loading the data into Blazegraph takes a couple of weeks
> due to the size of the graph, and we had multiple attempts where the
> reload failed after >90% of the data had been loaded. Our understanding is
> that a "race condition" [5] in Blazegraph is to blame: subtle timing
> changes lead to corruption of the journal in some rare cases [6].
>
> We want to reassure you that the last reload job was successful on one of
> our servers. The data still needs to be copied over to all of the WDQS
> servers, which will take a couple of weeks, but should not bring any
> additional issues. However, reloading the full data from dumps is becoming
> more complex as the data size grows, and we wanted to let you know why the
> process took longer than expected. We understand that data inconsistencies
> can be problematic, and we appreciate your patience and understanding while
> we work to ensure the quality and consistency of the data on WDQS.
>
> Thank you for your continued support and understanding!
>
>
>     Guillaume
>
>
> [1] https://phabricator.wikimedia.org/T244590
> [2] https://phabricator.wikimedia.org/T323239
> [3] https://phabricator.wikimedia.org/T322869
> [4] https://phabricator.wikimedia.org/T323096
> [5] https://en.wikipedia.org/wiki/Race_condition#In_software
> [6] https://phabricator.wikimedia.org/T263110
>
> --
> *Guillaume Lederrey* (he/him)
> Engineering Manager
> Wikimedia Foundation <https://wikimediafoundation.org/>
> _______________________________________________
> Wikidata mailing list -- [email protected]
> Public archives at
> https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/7QTJBRU2T3J22SNV4TGBRML4QNBGCEOU/
> To unsubscribe send an email to [email protected]
>
_______________________________________________
Wikidata mailing list -- [email protected]
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/[email protected]/message/U2T6JKVJFJK7HNQCXNPYBFGSHK4AJQTX/
To unsubscribe send an email to [email protected]