[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
Addshore added a comment.

@daniel is there something further we want to action here as a result of the investigation?

TASK DETAIL
https://phabricator.wikimedia.org/T165311

EMAIL PREFERENCES
https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Addshore
Cc: Ladsgroup, PokestarFan, Addshore, Lydia_Pintscher, Lea_Lacroix_WMDE, Aklapper, daniel, Cinemantique, GoranSMilovanovic, QZanden, Thibaut120094, Izno, Wikidata-bugs, aude, GPHemsley, Shizhao, Nemo_bis, Darkdadaah, Mbch331, Krenair
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

ruwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = -235854953179375905
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key
    limit 30;
+-----------+-----------+
| cgti_raw  | cgti_raw  |
+-----------+-----------+
| misk'i    | misk’i    |
| sil'm     | sil’m     |
| arc’hant  | arc'hant  |
| маловір’я | маловір'я |
| erc’h     | erc'h     |
| мар’      | мар'      |
| п’ятниця  | п'ятниця  |
| saba’     | saba'     |
| хэм'      | хэм’      |
+-----------+-----------+
9 rows in set (35.87 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

zhwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = 396207730596080646
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key
    limit 30;
+--------------------------------+--------------------------------+
| cgti_raw                       | cgti_raw                       |
+--------------------------------+--------------------------------+
| D'Arsonval_galvanometer        | D’Arsonval_galvanometer        |
| earth's_magnetic_field         | earth’s_magnetic_field         |
| subscriber's_extension_station | subscriber’s_extension_station |
| Maxwell’s_equation             | Maxwell's_equation             |
| 9’s_complement                 | 9's_complement                 |
| driller’s_log                  | driller's_log                  |
| Joule’s_law                    | Joule's_law                    |
| Fick’s_equation                | Fick's_equation                |
| Babbage’s_analytical_engine    | Babbage's_analytical_engine    |
| Maxwell's_law                  | Maxwell’s_law                  |
| Coulomb’s_law                  | Coulomb's_law                  |
| Cramer’s_rule                  | Cramer's_rule                  |
| Joule's_equivalent             | Joule’s_equivalent             |
| Loschmidt's_numeral            | Loschmidt’s_numeral            |
| Avogadro's_number              | Avogadro’s_number              |
| Ruhmkorff’s_coil               | Ruhmkorff's_coil               |
| Duddell's_thermo-galvanometer  | Duddell’s_thermo-galvanometer  |
| McMillan's_inequality          | McMillan’s_inequality          |
| Lenz's_law                     | Lenz’s_law                     |
| Kirchhoff’s_law                | Kirchhoff's_law                |
| Ampere's_law                   | Ampere’s_law                   |
| Ohm’s_law                      | Ohm's_law                      |
| Weber’s_theory_of_magnetism    | Weber's_theory_of_magnetism    |
| 10's_complement                | 10’s_complement                |
| 1’s_complement                 | 1's_complement                 |
| Kelvin’s_law                   | Kelvin's_law                   |
| Steinmetz's_law                | Steinmetz’s_law                |
| 2's_complement                 | 2’s_complement                 |
+--------------------------------+--------------------------------+
28 rows in set (37.23 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

eswiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = 2916682937954058841
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+----------+
| cgti_raw | cgti_raw |
+----------+----------+
| ...      | …        |
| ik’      | ik'      |
+----------+----------+
2 rows in set (37.16 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

Count for shwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT count(*)
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = 4903199207837476164
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (45.44 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

Count for mgwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT count(*)
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = 8120841685256385134
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = p.cgpa_site
    WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+
| count(*) |
+----------+
|      146 |
+----------+
1 row in set (43.34 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

Count for viwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT count(*)
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = 4760335324028501060
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = 4760335324028501060
    WHERE a.cgti_raw_key < b.cgti_raw_key;
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (43.01 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

For frwiktionary:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = 2097444195020099748
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = 2097444195020099748
    WHERE a.cgti_raw_key < b.cgti_raw_key;
+-------------------------+-------------------------+
| cgti_raw                | cgti_raw                |
+-------------------------+-------------------------+
| Palazzolo_sull’Oglio    | Palazzolo_sull'Oglio    |
| Urago_d'Oglio           | Urago_d’Oglio           |
| ...                     | …                       |
| Vezza_d’Oglio           | Vezza_d'Oglio           |
| sms’en                  | sms'en                  |
| 'e                      | ’e                      |
| 'o                      | ’o                      |
| Monteleone_d'Orvieto    | Monteleone_d’Orvieto    |
| ’                       | '                       |
| Quinzano_d'Oglio        | Quinzano_d’Oglio        |
| o’                      | o'                      |
| ’a                      | 'a                      |
| Robecco_d'Oglio         | Robecco_d’Oglio         |
| Scandolara_Ripa_d’Oglio | Scandolara_Ripa_d'Oglio |
+-------------------------+-------------------------+
14 rows in set (37.42 sec)
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

I checked dewiktionary and found 45 clashes. All the ones I checked are duplicates and should be fixed. Most seem to be transcriptions of Korean characters.

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles as a
    JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
        and p.cgpa_namespace = 0 and p.cgpa_site = -3742436511788647340
    JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
        and q.cgpa_namespace = 0 and q.cgpa_site = -3742436511788647340
    WHERE a.cgti_raw_key < b.cgti_raw_key
    -> ;
+--------------+--------------+
| cgti_raw     | cgti_raw     |
+--------------+--------------+
| p’ya         | p'ya         |
| ch'a         | ch’a         |
| ch’ŏ         | ch'ŏ         |
| ch’o         | ch'o         |
| p'ae         | p’ae         |
| yujach'a     | yujach’a     |
| p'u          | p’u          |
| ch'e         | ch’e         |
| p’yo         | p'yo         |
| p'e          | p’e          |
| ch'ŏl        | ch’ŏl        |
| ch’i         | ch'i         |
| p'urŭda      | p’urŭda      |
| mach’ŏllu    | mach'ŏllu    |
| t’i          | t'i          |
| p'yŏ         | p’yŏ         |
| t’a          | t'a          |
| t'yu         | t’yu         |
| ch’ŏngdong   | ch'ŏngdong   |
| p'yu         | p’yu         |
| ch’ae        | ch'ae        |
| ch'ŏng       | ch’ŏng       |
| p’o          | p'o          |
| ch’ŏldo      | ch'ŏldo      |
| p'wi         | p’wi         |
| ch’ŏlto      | ch'ŏlto      |
| p’ŏ          | p'ŏ          |
| t'u          | t’u          |
| Saint_John’s | Saint_John's |
| p'oe         | p’oe         |
| t'o          | t’o          |
| p’a          | p'a          |
| ellibeit'ŏ   | ellibeit’ŏ   |
| t'anso       | t’anso       |
| t'wi         | t’wi         |
| t'ae         | t’ae         |
| t'ŏ          | t’ŏ          |
| p'ŭ          | p’ŭ          |
| p’i          | p'i          |
| ch’u         | ch'u         |
| t'ŭ          | t’ŭ          |
| kimch'i      | kimch’i      |
| t’oe         | t'oe         |
| ch'ŭ         | ch’ŭ         |
| t'e          | t’e          |
+--------------+--------------+
45 rows in set (36.65 sec)
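
The pairs reported in this thread differ only in the apostrophe (U+0027 vs. U+2019) or in the ellipsis ("..." vs. "…"). As a rough illustration of how such titles end up sharing one normalized key, here is a minimal Python sketch. The normalize() function and its replacement table are assumptions inferred from the query results; they are not taken from the Cognate extension's actual normalization code, and find_clashes() is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical replacement table, inferred from the clashing pairs in this
# thread. The real normalization may cover more (or different) characters.
REPLACEMENTS = {
    "\u2019": "'",    # RIGHT SINGLE QUOTATION MARK -> ASCII apostrophe
    "\u2026": "...",  # HORIZONTAL ELLIPSIS -> three full stops
}


def normalize(title: str) -> str:
    """Return the assumed normalized form of a page title."""
    for src, dst in REPLACEMENTS.items():
        title = title.replace(src, dst)
    return title


def find_clashes(titles):
    """Group titles by normalized form; keep groups with more than one title."""
    groups = defaultdict(set)
    for title in titles:
        groups[normalize(title)].add(title)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


if __name__ == "__main__":
    sample = ["kimch'i", "kimch\u2019i", "Saint_John\u2019s",
              "Saint_John's", "t'e"]
    for key, variants in find_clashes(sample).items():
        print(key, variants)
```

Titles that appear in only one spelling (here "t'e") are not reported, which matches the idea that a clash needs two distinct raw titles on the same site.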
[Wikidata-bugs] [Maniphest] [Commented On] T165311: Investigate title normalization clashes
daniel added a comment.

Checked for enwiktionary, found 6 pairs of pages:

mysql:wikiadmin@10.64.16.18 [cognate_wiktionary]> SELECT a.cgti_raw, b.cgti_raw
    -> FROM cognate_titles as a
    -> JOIN cognate_titles as b ON a.cgti_normalized_key = b.cgti_normalized_key
    -> JOIN cognate_pages as p ON p.cgpa_title = a.cgti_raw_key
    -> and p.cgpa_namespace = 0 and p.cgpa_site = 8711873510529828948
    -> JOIN cognate_pages as q ON q.cgpa_title = b.cgti_raw_key
    -> and q.cgpa_namespace = 0 and q.cgpa_site = 8711873510529828948
    -> WHERE a.cgti_raw_key < b.cgti_raw_key
    -> LIMIT 10;
+---------------+---------------+
| cgti_raw      | cgti_raw      |
+---------------+---------------+
| дев'ятнадцять | дев’ятнадцять |
| ...           | …             |
| '_'           | ’_’           |
| ’             | '             |
| lu’um         | lu'um         |
| ni'           | ni’           |
+---------------+---------------+
6 rows in set (39.45 sec)
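
To show what the self-join does, the following Python sketch runs the same query shape against an in-memory SQLite database. Table and column names are copied from the query; the column types, the small integer keys, the site IDs 1 and 2, and the helpers build_toy_db() and clashing_pairs() are all assumptions made for illustration, not the real Cognate schema or data.

```python
import sqlite3

# Same join shape as the query in this thread, with the site ID as a parameter.
CLASH_SQL = """
    SELECT a.cgti_raw, b.cgti_raw
    FROM cognate_titles AS a
    JOIN cognate_titles AS b ON a.cgti_normalized_key = b.cgti_normalized_key
    JOIN cognate_pages AS p ON p.cgpa_title = a.cgti_raw_key
        AND p.cgpa_namespace = 0 AND p.cgpa_site = :site
    JOIN cognate_pages AS q ON q.cgpa_title = b.cgti_raw_key
        AND q.cgpa_namespace = 0 AND q.cgpa_site = :site
    WHERE a.cgti_raw_key < b.cgti_raw_key
    ORDER BY a.cgti_raw_key
"""


def build_toy_db():
    """Create a toy database; the schema here is assumed, not the real one."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE cognate_titles (
            cgti_raw_key INTEGER PRIMARY KEY,
            cgti_raw TEXT NOT NULL,
            cgti_normalized_key INTEGER NOT NULL
        );
        CREATE TABLE cognate_pages (
            cgpa_site INTEGER NOT NULL,
            cgpa_namespace INTEGER NOT NULL,
            cgpa_title INTEGER NOT NULL
        );
    """)
    conn.executemany("INSERT INTO cognate_titles VALUES (?, ?, ?)", [
        (1, "lu'um", 100),
        (2, "lu\u2019um", 100),  # same normalized key as "lu'um"
        (3, "ni'", 200),
        (4, "ni\u2019", 200),    # same normalized key as "ni'"
        (5, "word", 300),
    ])
    conn.executemany("INSERT INTO cognate_pages VALUES (?, ?, ?)", [
        (1, 0, 1), (1, 0, 2),  # site 1 has both spellings of lu'um: a clash
        (1, 0, 3),             # site 1 has only one spelling of ni'
        (2, 0, 4),             # the other spelling lives on site 2: no clash
        (1, 0, 5),
    ])
    return conn


def clashing_pairs(conn, site):
    """Return pairs of distinct raw titles on one site sharing a normalized key."""
    return conn.execute(CLASH_SQL, {"site": site}).fetchall()


if __name__ == "__main__":
    print(clashing_pairs(build_toy_db(), 1))
```

The condition a.cgti_raw_key < b.cgti_raw_key drops self-matches and reports each unordered pair once. Restricting both cognate_pages joins to the same site means that two spellings existing on different wikis are not counted as a clash.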