Addshore added a comment.
I'm generating some new lists to work from to reduce the amount of time the
rest of the migration will take.
The first list identifies "holes" in the tables: items for which some records
exist, but the related records in other tables have gone missing due to bugs:
addshore@stat1007:~$ analytics-mysql wikidatawiki -e "SELECT DISTINCT
wbit_item_id as id FROM wbt_item_terms LEFT JOIN wbt_term_in_lang ON
wbit_term_in_lang_id = wbtl_id LEFT JOIN wbt_type ON wbtl_type_id = wby_id LEFT
JOIN wbt_text_in_lang ON wbtl_text_in_lang_id = wbxl_id LEFT JOIN wbt_text ON
wbxl_text_id = wbx_id WHERE wbx_text IS NULL ORDER BY wbit_item_id ASC;" -N -B
> 4march1740-holes-nulls.list
A second query identifies all items that have no records yet (that is, items
that have not been migrated):
addshore@stat1007:~$ cat 4march1740-holes-86000000.sql
SELECT n
FROM
(
    SELECT (a.digit + (10 * b.digit) + (100 * c.digit) + (1000 * d.digit)
        + (10000 * e.digit) + (100000 * f.digit) + (1000000 * g.digit)
        + (10000000 * h.digit)) AS n
    FROM (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS a
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS b
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS c
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS d
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS e
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS f
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS g
    CROSS JOIN (SELECT 0 AS digit UNION ALL SELECT 1 UNION ALL SELECT 2
        UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL SELECT 5
        UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8
        UNION ALL SELECT 9) AS h
) AS a
LEFT JOIN
(
    SELECT DISTINCT wbit_item_id FROM wbt_item_terms
    WHERE wbit_item_id BETWEEN 0 AND 86000000
) AS b
ON wbit_item_id = n
WHERE wbit_item_id IS NULL
AND n BETWEEN 0 AND 86000000
;
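
The query above generates every candidate ID from eight digit tables and keeps
the ones with no matching row. As a rough illustration only (this is not what
was run on stat1007), the same "all IDs in a range minus the IDs that exist"
idea can be sketched in shell against an exported list. The function name
missing_ids and the input file are hypothetical, and the sketch assumes one
numeric ID per line and that seq, sort and comm are available.

```shell
# Hypothetical helper (not part of the original session): print every integer
# in [lo, hi] that does NOT appear in ids_file (one numeric ID per line).
missing_ids() {
  mi_lo="$1"; mi_hi="$2"; mi_file="$3"
  mi_tmp=$(mktemp)
  # comm requires both inputs in the same collation order, so sort both
  # sides bytewise (LC_ALL=C) rather than numerically.
  LC_ALL=C sort -u "$mi_file" > "$mi_tmp"
  # -23 drops lines unique to the ID list and lines common to both, leaving
  # only the candidates with no match; re-sort those numerically at the end.
  seq "$mi_lo" "$mi_hi" | LC_ALL=C sort | LC_ALL=C comm -23 - "$mi_tmp" | sort -n
  rm -f "$mi_tmp"
}
```

For example, with a file containing 0, 2 and 3, calling missing_ids 0 5 on it
prints 1, 4 and 5, one per line.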
After these lists have been migrated, we will also have to deal with items
above 86 million.
When running these queries earlier, I identified roughly 26 million more items
to pass over:
addshore@stat1007:~$ wc -l 4march1143-holes-*.list
19237277 4march1143-holes-87061632.list
6753614 4march1143-holes-nulls.list
25990891 total
addshore@stat1007:~$ sort 4march1143-holes-*.list | uniq | wc -l
25990891
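
The deduplicated count matching the total means no ID appears twice, so the
two lists do not overlap. A more direct check could look like the sketch
below. It is only an illustration: overlap_count is a hypothetical helper
name, and it assumes one ID per line in each file.

```shell
# Hypothetical helper (not part of the original session): count the IDs that
# appear in both list files.
overlap_count() {
  oc_a=$(mktemp); oc_b=$(mktemp)
  # Deduplicate and sort bytewise so that comm sees consistently ordered input.
  LC_ALL=C sort -u "$1" > "$oc_a"
  LC_ALL=C sort -u "$2" > "$oc_b"
  # -12 suppresses lines unique to either file, leaving only the shared lines.
  # tr strips the padding that some wc implementations add.
  LC_ALL=C comm -12 "$oc_a" "$oc_b" | wc -l | tr -d ' '
  rm -f "$oc_a" "$oc_b"
}
```

A result of 0 for the two holes lists would agree with the counts shown above.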
TASK DETAIL
https://phabricator.wikimedia.org/T219123