marcmiquel added a comment.
Thank you @ArielGlenn and @Lucas_Werkmeister_WMDE. To explain what I am doing (https://pastebin.com/kPrwQ0Lb): I first collect all the categories from the page dump and put them into dictionaries. Then I parse the categorylinks dump and add the page_ids that these categories contain. The problem is with category titles that contain special characters. The first dump seems to work, but the second shows these hex bytes. Perhaps it has to do with how the second dump must be opened or read, but I cannot find a way to read it as 'utf-8'. I just added a print('error') and I see many of them. What could I do? Thanks.

TASK DETAIL https://phabricator.wikimedia.org/T264850
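One thing that might be worth trying, sketched below under assumptions: if the second dump is a gzip-compressed file that is being read in binary mode, or if some of its columns contain bytes that are not valid UTF-8, opening it in text mode with an explicit encoding and a tolerant error handler keeps the titles readable and makes the bad bytes visible instead of raising. The names `iter_dump_lines` and `decode_title` are hypothetical helpers, not taken from the pastebin script, and the real cause of the hex bytes is not confirmed here.

```python
import gzip


def iter_dump_lines(path):
    """Yield text lines from a gzip-compressed dump, decoded as UTF-8.

    Assumption: the dump is gzip-compressed. Bytes that are not valid
    UTF-8 are kept as visible escapes (e.g. \\xff) instead of raising
    UnicodeDecodeError.
    """
    with gzip.open(path, mode="rt", encoding="utf-8",
                   errors="backslashreplace") as handle:
        for line in handle:
            yield line.rstrip("\n")


def decode_title(raw):
    """Return a str title whether the input is bytes or already str."""
    if isinstance(raw, bytes):
        return raw.decode("utf-8", errors="backslashreplace")
    return raw
```

With this, a loop such as `for line in iter_dump_lines(path): ...` yields `str` lines, so category titles from both dumps can be compared as text. Escaped sequences left in a title point to the rows that need a closer look.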