ArielGlenn added a comment.

  I took the brute-force approach of writing all queries to a log file by 
adding the appropriate fopen/fputs/fclose calls in Database::select (live on 
snapshot1010, testbed host). I then ran:
  
    dumpsgen@snapshot1010:/srv/mediawiki$ /usr/bin/php7.2 \
        /srv/mediawiki/multiversion/MWScript.php maintenance/categoryChangesAsRdf.php \
        --wiki=commonswiki -s 20200815200001 -e 20200817050001 \
        | gzip > /srv/tmp/categories-out.gz
  
  I examined the output and found numerous examples of queries containing the 
empty-string literal '' (two single quotes with nothing between them).
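
  Queries of this kind can be picked out of such a log with a small script. 
The following is only a sketch: the function name is made up for illustration, 
and it assumes the log holds one query per line, which may not match the 
actual log file.

```python
import re

# Matches an empty string literal used as a list element, e.g. "(1,'',2)",
# "('',2)" or "(1,'')". Requiring a "(" or "," before it and a "," or ")"
# after it avoids most false matches on ordinary quoted values such as
# 'subcat' or on escaped quotes such as 'it''s'.
EMPTY_ELEMENT = re.compile(r"[(,]\s*''\s*[,)]")


def queries_with_empty_element(lines):
    """Return the queries that contain an empty-string list element."""
    return [line.strip() for line in lines if EMPTY_ELEMENT.search(line)]


if __name__ == "__main__":
    # Shortened versions of the kind of query seen in the log.
    sample = [
        "SELECT cl_from,cl_to FROM `categorylinks` WHERE cl_from IN (6788554,'',62189827)",
        "SELECT cl_from,cl_to FROM `categorylinks` WHERE cl_type = 'subcat' AND cl_from IN (1,2)",
    ]
    for query in queries_with_empty_element(sample):
        print(query)
```

  Passing an open file object instead of the sample list works the same way, 
since the function only iterates over lines.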
  
  The following two queries were back-to-back, indicating that one was used to 
generate input for the next:
  
    SELECT  page_id,cat_title AS `rc_title`,pp_propname,cat_pages,cat_subcats,cat_files
    FROM `category`
    LEFT JOIN `page` ON ((page_title = cat_title) AND page_namespace = 14)
    LEFT JOIN `page_props` ON (pp_propname = 'hiddencat' AND (pp_page = page_id))
    WHERE cat_title IN ('Bridges_over_Kunar_River_(Pakistan)','People_of_the_University_of_Wyoming','University_of_Wyoming','Bus_routes_numbered_144','Churches_in_the_Roman_Catholic_Archdiocese_of_Benevento','August_2020_in_Cardiff','Cardiff_Coach_Station,_Sophia_Gardens','Bus_stations_in_Cardiff','Sophia_Gardens','Logos_of_companies_based_in_Mecklenburg-Vorpommern','Rameswaram','Media_needing_categories_as_of_18_March_2018','All_media_needing_categories_as_of_2018','Pages_with_local_object_coordinates_and_missing_SDC_coordinates','CC-BY-SA-4.0','Self-published_work','Photographs_by_LigaDue','Civitella_Marittima','Pages_with_maps','Scans_from_the_Internet_Archive','CC-PD-Mark','PD_US_Government','FEDLINK_-_United_States_Federal_Collection','Books_uploaded_by_Fæ','Files_with_no_machine-readable_author','Former_bus_lines_in_Budapest','Bus_lines_in_Budapest','Plzeň_1','Plzeň','Plzeň-City_District','Kaufland_Plzeň-Roudná','Epta_Piges_(Rhodes)','PD_US_expired','Books_in_the_Library_of_Congress','Trains_at_Inuyama_Yuen_Station','Inuyama_Yuen_Station','People_in_1910','2_men','OCR_detected_cover_page','1910_photographs','Iwakura_Station_(Aichi)','Unidentified_subjects_in_Japan','名古屋鉄道の画像','駅名板画像','Alumni_of_the_University_of_Wyoming','Lety_memorial','Cultural_buildings_in_Burgos','Iwateken_Kotsu','岩手県交通の画像','Piet_Retief,_Mpumalanga','Quality_images_missing_SDC_source_of_file','Quality_images_missing_SDC_copyright_status','Quality_images_missing_SDC_copyright_license','Quality_images_missing_SDC_inception','Media_requiring_renaming','Media_requiring_renaming_-_rationale_6','Bus_routes_numbered_148','Stained-glass_windows_in_Burgenland','Stained-glass_windows_in_Austria_by_district','Rust_(Burgenland)','PD_NASA','Tropical_Storm_Josephine_(2020)','Quality_images_missing_SDC_Commons_quality_assessment','PD-old-100-expired','Medical_Heritage_Library','Nominated_valued_image_candidates','Iwate_Kyūkō_Bus','バス画像','Bus_routes_numbered_149','Quality_images_missing_SDC_creator','Bus_routes_numbered_150','1926-03-27','Breda,_Netherlands','一関市の画像','Bus_routes_numbered_147','Hernán_Cortés','Augusto_Belvedere','Ichinoseki_Station','1926_photographs','Items_with_OTRS_permission_confirmed','Files_with_PermissionOTRS_template_but_without_P6305_SDC_statement','Stolpersteine_in_Oslo-Gamle','Images_uploaded_by_Donna_Gedenk','Pages_with_local_camera_coordinates_and_missing_SDC_coordinates','1926_photographs_of_the_United_States','Schools_in_Quebec_City','Railway_photographs_by_Geof_Sheppard','Photographs_by_Geof_Sheppard')
    
    SELECT  cl_from,cl_to
    FROM `categorylinks`
    WHERE cl_type = 'subcat' AND cl_from IN (16427435,77160905,29237265,5273988,93171207,8292833,49598671,48452708,73514884,73514913,93141746,73514933,73514942,5229557,65375295,89119256,49325694,2371050,11740061,71765819,2581799,12178689,16468547,3355416,92207293,56860321,45788180,4127763,47563334,102952,4314089,25108543,93119689,5062995,2255349,6788554,'',62189827,93056961)
    ORDER BY cl_from ASC,cl_to ASC LIMIT 200
  
  And lo and behold, when I run the first query, what do I get:
  
    
    +----------+----------------------------------------------+-------------+-----------+-------------+-----------+
    | page_id  | rc_title                                     | pp_propname | cat_pages | cat_subcats | cat_files |
    +----------+----------------------------------------------+-------------+-----------+-------------+-----------+
    |  8057514 | 1910_photographs                             | NULL        |       509 |          12 |       497 |
    | 93176596 | 1926-03-27                                   | NULL        |         3 |           0 |         3 |
    ... other normal-looking stuff ...
    | 93137742 | Stained-glass_windows_in_Austria_by_district | NULL        |        98 |          98 |         0 |
    | 24821400 | Stained-glass_windows_in_Burgenland          | NULL        |         7 |           7 |         0 |
    |     NULL | Stolpersteine_in_Oslo-Gamle                  | NULL        |        51 |           0 |        51 |
    | 62189827 | Trains_at_Inuyama_Yuen_Station               | NULL        |        15 |           0 |        15 |
    | 93056961 | Tropical_Storm_Josephine_(2020)              | NULL        |        37 |           0 |        37 |
    | 18050441 | Unidentified_subjects_in_Japan               | NULL        |       494 |          13 |       481 |
    |  4892503 | University_of_Wyoming                        | NULL        |        65 |           7 |        58 |
    |     NULL | バス画像                                     | NULL        |         1 |           0 |         1 |
    |     NULL | 一関市の画像                                 | NULL        |         2 |           0 |         2 |
    |     NULL | 名古屋鉄道の画像                             | NULL        |         3 |           0 |         3 |
    |     NULL | 駅名板画像                                   | NULL        |        14 |           0 |        14 |
    +----------+----------------------------------------------+-------------+-----------+-------------+-----------+
    85 rows in set (0.03 sec)
  
  Note the NULLs for page_id. Here's a sample: 
https://commons.wikimedia.org/wiki/Category:Stolpersteine_in_Oslo-Gamle 
There are indeed things in the category, but the category page itself does not 
yet exist. These NULL page ids are what show up as '' in the IN list of the 
second query, so we need to filter them out before running it.
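
  To make the failure mode and one possible fix concrete, here is a minimal 
sketch. It is not MediaWiki code: it uses Python's sqlite3 with an in-memory 
database as a stand-in, the two helper functions are hypothetical, and the 
categorylinks row is toy data. The real fix in categoryChangesAsRdf.php may 
well look different, for example by adding a page_id IS NOT NULL condition to 
the first query instead.

```python
import sqlite3


def existing_page_ids(rows):
    """Keep only real page ids, dropping NULLs (None) left by the LEFT JOIN."""
    return [page_id for page_id, _title in rows if page_id is not None]


def build_in_clause(ids):
    """Return placeholders such as "?,?,?" for a parameterised IN (...)."""
    return ",".join("?" for _ in ids)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE category (cat_title TEXT);
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
        CREATE TABLE categorylinks (cl_from INTEGER, cl_to TEXT, cl_type TEXT);
        INSERT INTO category VALUES ('University_of_Wyoming'), ('Stolpersteine_in_Oslo-Gamle');
        -- Only one of the two categories has a category page.
        INSERT INTO page VALUES (4892503, 14, 'University_of_Wyoming');
        -- Toy row, invented for this example.
        INSERT INTO categorylinks VALUES (4892503, 'Example_parent_category', 'subcat');
        """
    )

    # Same shape as the first logged query: a category without a page
    # still produces a row, with page_id set to NULL.
    rows = db.execute(
        "SELECT page_id, cat_title FROM category "
        "LEFT JOIN page ON page_title = cat_title AND page_namespace = 14 "
        "ORDER BY cat_title"
    ).fetchall()
    print(rows)

    # Filter before building the second query, so no NULL (or '') reaches IN (...).
    ids = existing_page_ids(rows)
    if ids:  # nothing to look up if no category has a page
        sql = (
            "SELECT cl_from, cl_to FROM categorylinks "
            f"WHERE cl_type = 'subcat' AND cl_from IN ({build_in_clause(ids)}) "
            "ORDER BY cl_from, cl_to"
        )
        print(db.execute(sql, ids).fetchall())
```

  Filtering in the caller keeps the first query's rows for page-less 
categories available to anything else that needs them, whereas filtering in 
SQL would drop those rows entirely; which is preferable depends on how the 
script uses the first result.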

TASK DETAIL
  https://phabricator.wikimedia.org/T260232
