alaa_wmde added a comment.

  Suggested text:
  
  Why is migrating wb_terms going to be a little more complicated?
  ----------------------------------------------------------------
  
  The migration of `wb_terms` into the new schema is going to be a little more 
complicated than usual. Due to the massive amount of data involved, our current 
master database nodes do not have enough disk space to hold two different 
copies of the data, which the usual process requires:
  
  1. read old, write old
  2. read old, write both
  3. read new, write both
  4. read new, write new
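  
  The four steps above can be sketched as data. This is only an illustration: 
the `Stage` type, the `STAGES` list and the `keeps_two_copies` helper are 
hypothetical names, not part of Wikibase. The sketch shows that steps 2 and 3 
write to both schemas, which is where the extra disk space is needed.

```python
from typing import NamedTuple, Tuple


class Stage(NamedTuple):
    """One step of the usual migration process (hypothetical type)."""
    read_from: str
    write_to: Tuple[str, ...]


# The four steps listed above, in order.
STAGES = [
    Stage(read_from="old", write_to=("old",)),        # 1. read old, write old
    Stage(read_from="old", write_to=("old", "new")),  # 2. read old, write both
    Stage(read_from="new", write_to=("old", "new")),  # 3. read new, write both
    Stage(read_from="new", write_to=("new",)),        # 4. read new, write new
]


def keeps_two_copies(stage: Stage) -> bool:
    """True while a step writes to both schemas, so both copies must fit on disk."""
    return len(stage.write_to) == 2


for number, stage in enumerate(STAGES, start=1):
    print(number, stage.read_from, stage.write_to, keeps_two_copies(stage))
```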
  
  That above process is still possible if we first migrate only property terms, 
as they constitute a negligible percentage of the data (< 0.01%) and thus add 
negligible overhead on disk-space usage (< 20MB extra needed).
  
  For Item terms, the required extra disk space will be around 170GB, which is 
too much to ask for on the current master node. So we will migrate Item terms 
a little differently, as explained later in this section.
  
  What are the implications for queries that run against `wb_terms` during migration?
  -----------------------------------------------------------------------------------
  
  During the migration, in order to continue accessing the full range of data 
your tool needs, both schemas will have to be queried, depending on some 
conditions.
  
  Those conditions differ per entity type (property or item) and apply only 
during certain periods of the migration process.
  
  In which periods do I need to be running queries against which schemas?
  -----------------------------------------------------------------------
  
  The following timeline shows the checkpoints of the migration in the 
**production** environment and how they affect queries in general (dates are 
approximate at the moment but will be fixed 2 weeks prior to ):
  
    ... May 29th:   Property Terms migration starts
    .
    .                   nothing needs to be changed here. all queries that fetch property terms can still read from `wb_terms` as they used to.
    .
    ... June 5th:   Read property terms from new schema on Wikidata
    .
    .                   tools must begin querying the new schema for property terms here.
    .                   terms will still be written to the old `wb_terms`, but only for the sake of recoverability in case of problems.
    .
    ... June 12th:  Item terms migration begins
    .
    .                   we will migrate only the first 2,000,000 items (with their terms, that's ~1% of the total amount).
    .                   nothing needs to be changed here yet. all queries that fetch item terms can still read from `wb_terms` as they used to.
    .
    ... June 19th:  Read item terms from one of the two schemas
    .
    .                   tools must read item terms from one of the two schemas based on the item ID:
    .                   - if the integer part of the Item ID is less than 2,000,000 then it should be read from the new schema
    .                   - otherwise, it should be read from `wb_terms`
    .                   
    ... TBD:        Item terms migration continues for all remaining items
    .
    .                   this will be delayed until we have more capacity on the database master node to continue the migration.
    .                   this will be announced separately, with dates for when tools should start reading all item terms from the new schema
    .
    ... TBD:        Drop wb_terms table
  
  A **test** environment will be provided by May 15th. Its state will mimic a 
production state in which:
  
  - tools will have to read property terms from the new schema (as explained 
in the June 5th checkpoint above)
  - tools will have to read item terms from one of the two schemas (as 
explained in the June 19th checkpoint above)
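  
  A tool could encode these two rules roughly as follows. This is a minimal 
sketch, not an official helper: the function name `schema_for`, the constant 
`ITEM_ID_CUTOFF` and the return labels `"new"` and `"wb_terms"` are made up for 
illustration, and it assumes entity IDs arrive in the usual form of `P` or `Q` 
followed by a number (e.g. `P31`, `Q42`).

```python
import re

# Cut-off taken from the June 19th checkpoint: items whose numeric ID is below
# this value are read from the new schema. (Constant name is hypothetical.)
ITEM_ID_CUTOFF = 2_000_000

# Assumed ID shape: "P" (property) or "Q" (item) followed by a positive integer.
_ENTITY_ID = re.compile(r"([PQ])([1-9][0-9]*)")


def schema_for(entity_id: str) -> str:
    """Return "new" or "wb_terms" for the state mimicked by the test environment."""
    match = _ENTITY_ID.fullmatch(entity_id)
    if match is None:
        raise ValueError(f"not a property or item ID: {entity_id!r}")
    kind, number = match.group(1), int(match.group(2))
    if kind == "P":
        # Property terms are all read from the new schema (June 5th checkpoint).
        return "new"
    # Item terms: pick the schema by the integer part of the ID (June 19th checkpoint).
    return "new" if number < ITEM_ID_CUTOFF else "wb_terms"


for example in ("P31", "Q42", "Q1999999", "Q2000000"):
    print(example, "->", schema_for(example))
```

  Keeping the cut-off in a single constant means only one value has to change 
once the remaining item terms are migrated and new dates are announced.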
  
  Related phab task where we developed the migration plan: 
https://phabricator.wikimedia.org/T220480

TASK DETAIL
  https://phabricator.wikimedia.org/T221746

To: alaa_wmde
Cc: Aklapper, Lea_Lacroix_WMDE, alaa_wmde, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Wikidata-bugs, 
aude, Lydia_Pintscher, JeroenDeDauw, Mbch331