Addshore added a comment.

  I'm generating some new lists to work from to reduce the amount of time the 
rest of the migration will take.
  
  The first list identifies "holes" in the tables: items that have rows in 
wbt_item_terms, but whose related rows in other tables have gone missing due 
to bugs:
  
    addshore@stat1007:~$ analytics-mysql wikidatawiki -e "
        SELECT DISTINCT wbit_item_id as id
        FROM wbt_item_terms
        LEFT JOIN wbt_term_in_lang ON wbit_term_in_lang_id = wbtl_id
        LEFT JOIN wbt_type ON wbtl_type_id = wby_id
        LEFT JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id
        LEFT JOIN wbt_text ON wbxl_text_id = wbx_id
        WHERE wbx_text IS NULL
        ORDER BY wbit_item_id ASC;" -N -B > 4march1740-holes-nulls.list
  
  The second list covers every item ID from 0 to 86,000,000 that has no 
records in wbt_item_terms yet (it has not been migrated):
  
    addshore@stat1007:~$ cat 4march1740-holes-86000000.sql
    SELECT n
    FROM
    (
    SELECT (a.digit + (10 * b.digit) + (100 * c.digit) + (1000 * d.digit)
            + (10000 * e.digit) + (100000 * f.digit) + (1000000 * g.digit)
            + (10000000 * h.digit)) as n
        from (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as a
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as b
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as c
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as d
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as e
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as f
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as g
        cross join (select 0 as digit union all select 1 union all select 2
              union all select 3 union all select 4 union all select 5
              union all select 6 union all select 7 union all select 8
              union all select 9) as h
    ) as a
    LEFT JOIN
    ( SELECT DISTINCT wbit_item_id from wbt_item_terms
      where wbit_item_id BETWEEN 0 AND 86000000 ) as b
    ON wbit_item_id = n
    WHERE wbit_item_id IS NULL
    AND n BETWEEN 0 AND 86000000
    ;
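  
  The query above builds every number up to 8 digits from a cross join of 
digit tables and keeps those with no matching row in wbt_item_terms. As a 
rough illustration of that "candidates minus present" idea only, here is a 
miniature sketch using seq and comm instead of SQL. This is a swapped-in 
technique, not what was run; the file names and the sample IDs are made up.

```shell
# Hypothetical miniature: candidate IDs 1..10, of which 2, 3, 5 and 8
# already have rows (stand-in for the DISTINCT wbit_item_id subquery).
workdir=$(mktemp -d)
printf '%s\n' 2 3 5 8 > "$workdir/present.list"

# comm expects lexically sorted input, so sort both sides the same way.
seq 1 10 | LC_ALL=C sort > "$workdir/candidates.sorted"
LC_ALL=C sort -u "$workdir/present.list" > "$workdir/present.sorted"

# -23 keeps lines that appear only in the first file: candidates with no rows.
# sort -n restores numeric order; paste joins the result onto one line.
missing=$(LC_ALL=C comm -23 "$workdir/candidates.sorted" \
    "$workdir/present.sorted" | sort -n | paste -sd ' ' -)
echo "$missing"

rm -rf "$workdir"
```

  On the real data this would need a dump of the existing item IDs first, so 
it is not necessarily cheaper than the SQL anti-join.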
  
  After these lists have been migrated we will also have to deal with items 
above 86 million.
  
  When I ran these queries earlier, I identified roughly 26 million more items 
to pass over:
  
    addshore@stat1007:~$ wc -l 4march1143-holes-*.list
     19237277 4march1143-holes-87061632.list
      6753614 4march1143-holes-nulls.list
     25990891 total
    addshore@stat1007:~$ sort 4march1143-holes-*.list | uniq | wc -l
    25990891
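
  The unique count equals the combined line count (19237277 + 6753614 = 
25990891), so no item ID appears in both lists. A minimal sketch of the same 
check on made-up sample files, with a more direct overlap test via comm, 
could look like this. The file names and contents are placeholders, not the 
real lists.

```shell
# Hypothetical sample lists standing in for the two holes-*.list files.
workdir=$(mktemp -d)
printf '%s\n' 3 1 2 > "$workdir/a.list"
printf '%s\n' 5 4 > "$workdir/b.list"

# Combined line count versus number of distinct lines across both files.
total=$(cat "$workdir/a.list" "$workdir/b.list" | wc -l | tr -d ' ')
unique=$(sort -u "$workdir/a.list" "$workdir/b.list" | wc -l | tr -d ' ')

# comm -12 prints only lines common to both (sorted) files.
LC_ALL=C sort -u "$workdir/a.list" > "$workdir/a.sorted"
LC_ALL=C sort -u "$workdir/b.list" > "$workdir/b.sorted"
overlap=$(LC_ALL=C comm -12 "$workdir/a.sorted" "$workdir/b.sorted" \
    | wc -l | tr -d ' ')

echo "total=$total unique=$unique overlap=$overlap"
rm -rf "$workdir"
```

  If total and unique match and overlap is 0, the lists are disjoint and each 
file has no internal duplicates.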

TASK DETAIL
  https://phabricator.wikimedia.org/T219123

To: Addshore
Cc: jcrespo, WMDE-leszek, Ladsgroup, Addshore, Jdforrester-WMF, ArielGlenn, 
Aklapper, alaa_wmde, Hazizibinmahdi, Iflorez, darthmon_wmde, Nandana, Lahi, 
Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Jonas, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331