alaa_wmde added a comment.
Suggested text:
Why is migrating wb_terms going to be a little more complicated?
----------------------------------------------------------------
The migration of `wb_terms` into the new schema is going to be a little more
complicated than usual. Due to the massive amount of data involved, our current
master database node does not have enough disk space to hold two
copies of the data, which rules out the usual process of:
1. read old, write old
2. read old, write both
3. read new, write both
4. read new, write new
The above process is still possible if we first migrate only property terms,
as they constitute a negligible percentage of the data (< 0.01%) and thus add
negligible overhead on disk-space usage (<20MB extra needed).
For Item terms, the required extra disk space will be around 170GB, which is
too much to ask for on the current master node. So we will migrate Item
terms a little differently, as explained later in this section.
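As a rough illustration of the usual four-step process listed above, the sketch below models each step as a stage that determines where reads and writes go. The names `Stage`, `read_target` and `write_targets` are hypothetical and exist only for this example; they are not part of Wikibase.

```python
from enum import Enum


class Stage(Enum):
    # The four steps of the usual migration process, in order.
    READ_OLD_WRITE_OLD = 1
    READ_OLD_WRITE_BOTH = 2
    READ_NEW_WRITE_BOTH = 3
    READ_NEW_WRITE_NEW = 4


def read_target(stage: Stage) -> str:
    """Return which schema is read from in the given stage."""
    if stage in (Stage.READ_OLD_WRITE_OLD, Stage.READ_OLD_WRITE_BOTH):
        return "old"
    return "new"


def write_targets(stage: Stage) -> tuple:
    """Return which schemas are written to in the given stage."""
    if stage is Stage.READ_OLD_WRITE_OLD:
        return ("old",)
    if stage is Stage.READ_NEW_WRITE_NEW:
        return ("new",)
    # Both middle stages keep the two copies in sync,
    # which is what doubles the disk usage.
    return ("old", "new")
```

The two middle stages are the ones that need both copies of the data at the same time, which is why disk space becomes the limiting factor for Item terms.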
What are the implications for queries that used to run against `wb_terms` during
migration?
----------------------------------------------------------------------------------------
During the migration, in order to continue accessing the full range of data
needed by your tool, both schemas will have to be queried, depending on some
conditions.
Those conditions differ per entity type (property or item) and
apply only during certain periods of the migration process.
In which periods do I need to be running queries against which schemas?
-----------------------------------------------------------------------
The following timeline shows the checkpoints of the migration in the
**production** environment and how they affect queries in general (dates are
approximate at the moment but will be fixed 2 weeks prior to ):
... May 29th: Property Terms migration starts
.
. nothing needs to be changed here. All queries that fetch
property terms can still read from `wb_terms` as they used to.
.
... June 5th: Read property terms from new schema on Wikidata
.
. tools must begin querying the new schema for property
terms here.
. terms will still be written to old `wb_terms` but that
is only for the sake of recoverability in case of problems.
.
... June 12th: Item terms migration begins
.
. we will migrate only the first 2,000,000 items (with
their terms, that's ~1% of the total amount).
. nothing needs to be changed here yet. All queries
that fetch item terms can still read from `wb_terms` as they used to.
.
... June 19th: Read item terms from one of the two schemas - read one
.
. tools must read item terms from one of the two schemas
based on the item ID:
. - if the numeric part of the Item ID is less than 2,000,000
then it should be read from the new schema
. - otherwise, it should be read from `wb_terms`
.
... TBD: Item terms migration continues for all remaining items
.
. this will be delayed until we have more capacity on
the database master node to continue the migration.
. this will be announced separately, with dates when tools
should start reading all item terms from the new schema
.
... TBD: Drop wb_terms table
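To make the June 5th and June 19th checkpoints concrete, here is a minimal sketch of how a tool might decide which schema to query for a given entity once both checkpoints have passed. The function name `schema_for_terms`, the constant name and the returned labels are hypothetical and chosen only for illustration; the cutoff value and the "less than" comparison are taken from the timeline above.

```python
import re

# Cutoff from the June 19th checkpoint: items whose numeric ID is
# below this value are read from the new schema.
MIGRATED_ITEM_ID_CUTOFF = 2_000_000

# Entity IDs look like "P31" (property) or "Q42" (item).
_ENTITY_ID = re.compile(r"([PQ])([1-9][0-9]*)")


def schema_for_terms(entity_id: str) -> str:
    """Return "new" or "wb_terms" for the given property or item ID.

    Assumes the state after the June 19th checkpoint.
    """
    match = _ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a property or item ID: {entity_id!r}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix == "P":
        # All property terms are read from the new schema (June 5th).
        return "new"
    # Item terms: route by the numeric part of the ID (June 19th).
    return "new" if number < MIGRATED_ITEM_ID_CUTOFF else "wb_terms"
```

For example, `schema_for_terms("Q42")` selects the new schema, while `schema_for_terms("Q2000000")` still selects `wb_terms`. Keeping the cutoff in a single constant should make it easier to adjust once the remaining item terms are migrated.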
A **test** environment will be provided by May 15th. Its state will
mimic a production state in which:
- tools will have to read property terms from the new schema (as explained in
the June 5th checkpoint above)
- tools will have to read item terms from one of the two schemas (as
explained in the June 19th checkpoint above)
Related phab task where we developed the migration plan:
https://phabricator.wikimedia.org/T220480
TASK DETAIL
https://phabricator.wikimedia.org/T221746
To: alaa_wmde
Cc: Aklapper, Lea_Lacroix_WMDE, alaa_wmde, Nandana, Lahi, Gq86,
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs,
aude, Lydia_Pintscher, JeroenDeDauw, Mbch331