ArielGlenn added a comment.
I took the brute-force approach of writing all queries to a log file by
adding the appropriate fopen/fputs/fclose calls in Database::select (live on
snapshot1010, a testbed host). I then ran:
dumpsgen@snapshot1010:/srv/mediawiki$ /usr/bin/php7.2
/srv/mediawiki/multiversion/MWScript.php maintenance/categoryChangesAsRdf.php
--wiki=commonswiki -s 20200815200001 -e 20200817050001 | gzip >
/srv/tmp/categories-out.gz
I examined the output and found numerous examples of queries containing the
empty string literal ('').
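For anyone repeating this check, here is a minimal sketch of how such a log could be scanned for empty-string list items. It assumes one query per log line and simple quote-doubling for escapes; the function name and the sample log lines are made up for illustration and are not taken from the actual log.

```python
import re

# Matches an empty SQL string literal ('') used as a list item:
# preceded by "(" or "," and followed by "," or ")".
EMPTY_ITEM = re.compile(r"[(,]\s*''\s*[,)]")

def queries_with_empty_item(lines):
    """Return the logged queries that contain '' as an item of a list."""
    return [line.rstrip("\n") for line in lines if EMPTY_ITEM.search(line)]

# Hypothetical log content, one query per line.
sample_log = [
    "SELECT cl_from,cl_to FROM `categorylinks` WHERE cl_from IN (6788554,'',62189827)",
    "SELECT cl_from,cl_to FROM `categorylinks` WHERE cl_from IN (1,2,3)",
    "SELECT page_id FROM `page` WHERE page_title IN ('It''s_a_title','Other')",
]

suspicious = queries_with_empty_item(sample_log)
print(suspicious)
```

The same function accepts an open file object, since it only iterates over lines.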
The following two queries were back-to-back, indicating that one was used to
generate input for the next:
SELECT page_id,cat_title AS
`rc_title`,pp_propname,cat_pages,cat_subcats,cat_files FROM `category` LEFT
JOIN `page` ON ((page_title = cat_title) AND page_namespace = 14) LEFT JOIN
`page_props` ON (pp_propname = 'hiddencat' AND (pp_page = page_id)) WHERE
cat_title IN
('Bridges_over_Kunar_River_(Pakistan)','People_of_the_University_of_Wyoming','University_of_Wyoming','Bus_routes_numbered_144','Churches_in_the_Roman_Catholic_Archdiocese_of_Benevento','August_2020_in_Cardiff','Cardiff_Coach_Station,_Sophia_Gardens','Bus_stations_in_Cardiff','Sophia_Gardens','Logos_of_companies_based_in_Mecklenburg-Vorpommern','Rameswaram','Media_needing_categories_as_of_18_March_2018','All_media_needing_categories_as_of_2018','Pages_with_local_object_coordinates_and_missing_SDC_coordinates','CC-BY-SA-4.0','Self-published_work','Photographs_by_LigaDue','Civitella_Marittima','Pages_with_maps','Scans_from_the_Internet_Archive','CC-PD-Mark','PD_US_Government','FEDLINK_-_United_States_Federal_Collection','Books_uploaded_by_Fæ','Files_with_no_machine-readable_author','Former_bus_lines_in_Budapest','Bus_lines_in_Budapest','Plzeň_1','Plzeň','Plzeň-City_District','Kaufland_Plzeň-Roudná','Epta_Piges_(Rhodes)','PD_US_expired','Books_in_the_Library_of_Congress','Trains_at_Inuyama_Yuen_Station','Inuyama_Yuen_Station','People_in_1910','2_men','OCR_detected_cover_page','1910_photographs','Iwakura_Station_(Aichi)','Unidentified_subjects_in_Japan','名古屋鉄道の画像','駅名板画像','Alumni_of_the_University_of_Wyoming','Lety_memorial','Cultural_buildings_in_Burgos','Iwateken_Kotsu','岩手県交通の画像','Piet_Retief,_Mpumalanga','Quality_images_missing_SDC_source_of_file','Quality_images_missing_SDC_copyright_status','Quality_images_missing_SDC_copyright_license','Quality_images_missing_SDC_inception','Media_requiring_renaming','Media_requiring_renaming_-_rationale_6','Bus_routes_numbered_148','Stained-glass_windows_in_Burgenland','Stained-glass_windows_in_Austria_by_district','Rust_(Burgenland)','PD_NASA','Tropical_Storm_Josephine_(2020)','Quality_images_missing_SDC_Commons_quality_assessment','PD-old-100-expired','Medical_Heritage_Library','Nominated_valued_image_candidates','Iwate_Kyūkō_Bus','バス画像','Bus_routes_numbered_149','Quality_images_missing_SDC_creator','Bus_routes_numbered_150','1926-03-27','Breda,_Netherlands','一関市の画像','Bus_routes_numbered_147','Hernán_Cortés','Augusto_Belvedere','Ichinoseki_Station','1926_photographs','Items_with_OTRS_permission_confirmed','Files_with_PermissionOTRS_template_but_without_P6305_SDC_statement','Stolpersteine_in_Oslo-Gamle','Images_uploaded_by_Donna_Gedenk','Pages_with_local_camera_coordinates_and_missing_SDC_coordinates','1926_photographs_of_the_United_States','Schools_in_Quebec_City','Railway_photographs_by_Geof_Sheppard','Photographs_by_Geof_Sheppard')
SELECT cl_from,cl_to FROM `categorylinks` WHERE cl_type = 'subcat' AND
cl_from IN
(16427435,77160905,29237265,5273988,93171207,8292833,49598671,48452708,73514884,73514913,93141746,73514933,73514942,5229557,65375295,89119256,49325694,2371050,11740061,71765819,2581799,12178689,16468547,3355416,92207293,56860321,45788180,4127763,47563334,102952,4314089,25108543,93119689,5062995,2255349,6788554,'',62189827,93056961)
ORDER BY cl_from ASC,cl_to ASC LIMIT 200
And lo and behold, when I run the first query, what do I get:
+----------+----------------------------------------------+-------------+-----------+-------------+-----------+
| page_id  | rc_title                                     | pp_propname | cat_pages | cat_subcats | cat_files |
+----------+----------------------------------------------+-------------+-----------+-------------+-----------+
|  8057514 | 1910_photographs                             | NULL        |       509 |          12 |       497 |
| 93176596 | 1926-03-27                                   | NULL        |         3 |           0 |         3 |
... other normal-looking stuff ...
| 93137742 | Stained-glass_windows_in_Austria_by_district | NULL        |        98 |          98 |         0 |
| 24821400 | Stained-glass_windows_in_Burgenland          | NULL        |         7 |           7 |         0 |
|     NULL | Stolpersteine_in_Oslo-Gamle                  | NULL        |        51 |           0 |        51 |
| 62189827 | Trains_at_Inuyama_Yuen_Station               | NULL        |        15 |           0 |        15 |
| 93056961 | Tropical_Storm_Josephine_(2020)              | NULL        |        37 |           0 |        37 |
| 18050441 | Unidentified_subjects_in_Japan               | NULL        |       494 |          13 |       481 |
|  4892503 | University_of_Wyoming                        | NULL        |        65 |           7 |        58 |
|     NULL | バス画像                                     | NULL        |         1 |           0 |         1 |
|     NULL | 一関市の画像                                 | NULL        |         2 |           0 |         2 |
|     NULL | 名古屋鉄道の画像                             | NULL        |         3 |           0 |         3 |
|     NULL | 駅名板画像                                   | NULL        |        14 |           0 |        14 |
+----------+----------------------------------------------+-------------+-----------+-------------+-----------+
85 rows in set (0.03 sec)
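Note the NULLs for page_id. Here's a sample:
https://commons.wikimedia.org/wiki/Category:Stolpersteine_in_Oslo-Gamle
There are indeed things in the category, but the category page does not yet
exist. Those NULL page ids are presumably what shows up as '' in the IN list
of the second query, so we need to filter them out before building it.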
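To make the proposed filtering concrete, here is a minimal sketch using Python and an in-memory SQLite database as a stand-in for the real MediaWiki/MariaDB setup. The table layouts are cut down to the columns used above, the categorylinks row is made-up sample data, and the actual fix would of course live in the PHP maintenance script rather than in code like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE category (cat_title TEXT, cat_pages INT, cat_subcats INT, cat_files INT);
    CREATE TABLE page (page_id INT, page_namespace INT, page_title TEXT);
    CREATE TABLE categorylinks (cl_from INT, cl_to TEXT, cl_type TEXT);
    INSERT INTO category VALUES ('University_of_Wyoming', 65, 7, 58);
    INSERT INTO category VALUES ('Stolpersteine_in_Oslo-Gamle', 51, 0, 51);
    -- Only the first category has a page in namespace 14.
    INSERT INTO page VALUES (4892503, 14, 'University_of_Wyoming');
    -- Made-up link row so the second query returns something.
    INSERT INTO categorylinks VALUES (4892503, 'Example_parent_category', 'subcat');
""")

# First query: the LEFT JOIN yields a NULL page_id (None in Python)
# for a category whose page does not exist.
rows = conn.execute("""
    SELECT page_id, cat_title
    FROM category
    LEFT JOIN page ON page_title = cat_title AND page_namespace = 14
    WHERE cat_title IN (?, ?)
    ORDER BY cat_title
""", ("University_of_Wyoming", "Stolpersteine_in_Oslo-Gamle")).fetchall()

# Filter out the missing pages before building the second query.
page_ids = [page_id for page_id, _title in rows if page_id is not None]

links = []
if page_ids:  # skip the query entirely when nothing is left
    placeholders = ",".join("?" * len(page_ids))
    links = conn.execute(
        "SELECT cl_from, cl_to FROM categorylinks "
        f"WHERE cl_type = 'subcat' AND cl_from IN ({placeholders}) "
        "ORDER BY cl_from ASC, cl_to ASC LIMIT 200",
        page_ids,
    ).fetchall()

print(rows)
print(page_ids)
print(links)
```

Filtering on the caller's side keeps the first query unchanged, so the category counts for page-less categories are still available if the script needs them.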
TASK DETAIL
https://phabricator.wikimedia.org/T260232