marcmiquel added a comment.

  Thank you @ArielGlenn and @Lucas_Werkmeister_WMDE,
  
  To explain what I am doing (see https://pastebin.com/kPrwQ0Lb):
  
  First, I collect all the categories from the page dump and put them into
  some dictionaries.
  Then I parse the categorylinks dump and add the page_ids that each of these
  categories contains.
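  A minimal sketch of these two steps follows. It assumes the (cl_from, cl_to) rows have already been extracted from the categorylinks dump, and that cl_to arrives as raw bytes. The function name, sample titles and page ids are hypothetical and are not taken from the pastebin code. The key detail is that cl_to is decoded to str before the lookup, so it can match the titles taken from the page dump.

```python
from collections import defaultdict

# Hypothetical inputs: category titles from the page dump (without the
# "Category:" prefix) and (cl_from, cl_to) rows from the categorylinks dump.
# cl_to is assumed to arrive as raw UTF-8 bytes.
category_titles = {"Música", "Física"}
categorylinks_rows = [
    (101, "Música".encode("utf-8")),
    (102, "Física".encode("utf-8")),
    (103, "Música".encode("utf-8")),
]


def build_category_members(titles, rows):
    """Map each known category title to the page_ids listed under it."""
    members = defaultdict(list)
    for page_id, cl_to in rows:
        # Decode bytes to str so the key can match the page-dump titles;
        # values that are already str are used unchanged.
        title = cl_to.decode("utf-8") if isinstance(cl_to, bytes) else cl_to
        if title in titles:
            members[title].append(page_id)
    return dict(members)


members = build_category_members(category_titles, categorylinks_rows)
# members == {'Música': [101, 103], 'Física': [102]}
```

  If the bytes were used as keys without decoding, b'M\xc3\xbasica' would never equal 'Música', and every such lookup would fail.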
  
  The problem is with category titles that contain these special characters.
  With the first dump (page) they seem to work, but with the second
  (categorylinks) they show up as hex bytes.
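  One common way hex bytes end up in titles is converting bytes with str() instead of decoding them. The sketch below illustrates that difference and one way to open a gzipped SQL dump as UTF-8 text. Whether this is what happens in the pastebin code is an assumption. The helper name open_dump_text is hypothetical, and errors="replace" is a judgment call that keeps parsing instead of raising UnicodeDecodeError.

```python
import gzip

raw = "Música".encode("utf-8")  # the bytes a dump stores for this title

# str() on bytes yields their repr, so the escaped hex bytes end up in the text.
wrong = str(raw)             # "b'M\\xc3\\xbasica'"
# Decoding the bytes as UTF-8 recovers the real title.
right = raw.decode("utf-8")  # "Música"


def open_dump_text(path):
    """Hypothetical helper: open a .sql.gz dump as UTF-8 text, not bytes.

    errors="replace" substitutes U+FFFD for any invalid byte sequence
    instead of raising UnicodeDecodeError.
    """
    return gzip.open(path, mode="rt", encoding="utf-8", errors="replace")
```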
  
  Perhaps it has to do with how the second dump must be opened or read, but I
  cannot find a way to read it as UTF-8 ('utf-8'). I added a print('error') for
  these cases, and it is printed many times.
  What could I do? Thanks.

TASK DETAIL
  https://phabricator.wikimedia.org/T264850

To: JAllemandou, marcmiquel
Cc: Lucas_Werkmeister_WMDE, ArielGlenn, Milimetric, Aklapper, marcmiquel, 
Strainu, jannee_e, CBogen, Akuckartz, 4748kitoko, darthmon_wmde, Nandana, 
Namenlos314, Akovalyov, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, EBjune, 
merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, gnosygnu, 
JAllemandou, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, 
Manybubbles, Mbch331, jeremyb
_______________________________________________
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs