[Wikidata-bugs] [Maniphest] [Created] T191639: Wikidata JSON dumps do not have the 'ns' (namespace)

2018-04-06 Thread marcmiquel
marcmiquel created this task. marcmiquel added projects: Wikidata, Dumps-Generation, Datasets-General-or-Unknown. TASK DESCRIPTION: I want to process the Wikidata dump and filter out the qitems whose namespace is not 0 (article). It seems that the JSON provided in the dump does

[Wikidata-bugs] [Maniphest] [Commented On] T191639: Wikidata JSON dumps do not have the 'ns' (namespace)

2019-02-08 Thread marcmiquel
marcmiquel added a comment. The use case is to process the dumps and filter out qitems that do not relate to articles; this is why we asked for NS0. The JSON dump sample says there is an ns field, but the final dump has no such field. TASK DETAIL: https://phabricator.wikimedia.org/T191639
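A minimal Python sketch of the filtering described above, assuming the layout documented for Wikidata's JSON dumps (a JSON array with "[" and "]" on their own lines and one entity per line, each followed by a comma except the last). Because the dump carries no 'ns' field, it uses a sitelink to a chosen wiki as a proxy for "relates to a Wikipedia page". The function names and the default site "enwiki" are illustrative assumptions, not part of any Wikimedia tool.

```python
import json
from typing import Iterator, TextIO


def iter_entities(lines: TextIO) -> Iterator[dict]:
    """Yield one entity dict per line of a Wikidata JSON dump.

    Assumes one entity per line inside a top-level JSON array,
    so the dump can be streamed without loading it all at once.
    """
    for raw in lines:
        line = raw.strip()
        # Skip the array brackets and any blank lines.
        if line in ("[", "]", ""):
            continue
        # Every entity line except the last ends with a comma.
        if line.endswith(","):
            line = line[:-1]
        yield json.loads(line)


def items_with_sitelink(lines: TextIO, site: str = "enwiki") -> Iterator[str]:
    """Yield the IDs of items that have a sitelink to `site`.

    The sitelink is only a proxy: it says the item is linked to a page
    on `site`, not which namespace that page is in.
    """
    for entity in iter_entities(lines):
        # Properties (and other entity types) are skipped.
        if entity.get("type") != "item":
            continue
        if site in entity.get("sitelinks", {}):
            yield entity["id"]
```

To run it over a downloaded dump, open the file in text mode, e.g. `gzip.open(path, "rt", encoding="utf-8")`, and pass the handle to `items_with_sitelink`; streaming line by line keeps memory use flat even for the full dump.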

[Wikidata-bugs] [Maniphest] [Commented On] T191639: Wikidata JSON dumps do not have the 'ns' (namespace)

2019-02-15 Thread marcmiquel
marcmiquel added a comment. I need all the Wikidata qitems that relate to Wikipedia articles. If I understand correctly, these are the qitems with namespace 0, although not every qitem with namespace 0 necessarily has sitelinks (some are just qitems without any article). The thing

[Wikidata-bugs] [Maniphest] [Commented On] T191639: Wikidata JSON dumps do not have the 'ns' (namespace)

2020-05-04 Thread marcmiquel
marcmiquel added a comment. I assumed that Categories and Wikipedia: pages coming from language editions would keep the ns of their origin wiki. Now it is clear, so we can close this. Thanks, Addshore.
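On wikidata.org every item page itself lives in namespace 0, so an 'ns' value describes the Wikidata page, not the linked page on the origin wiki. A rough alternative is to inspect sitelink titles for namespace prefixes. The sketch below assumes a hypothetical, English-only prefix list; real wikis localize namespace names (e.g. "Kategorie:" on dewiki), so treat it as a heuristic rather than a complete solution.

```python
from typing import List

# Hypothetical, English-only list of non-article namespace prefixes.
# Real wikis localize these names, so a complete version would need
# each wiki's own namespace names.
NON_ARTICLE_PREFIXES = (
    "Category:", "Template:", "Wikipedia:", "Portal:",
    "Help:", "Module:", "File:",
)


def is_article_sitelink(title: str) -> bool:
    """Return True if the sitelink title has no known non-article prefix.

    Checking known prefixes (rather than any colon) keeps article titles
    such as "Star Wars: Episode IV" classified as articles.
    """
    return not title.startswith(NON_ARTICLE_PREFIXES)


def article_sites(entity: dict) -> List[str]:
    """Return the sites whose sitelink title looks like an article."""
    return [
        site
        for site, link in entity.get("sitelinks", {}).items()
        if is_article_sitelink(link["title"])
    ]
```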

[Wikidata-bugs] [Maniphest] [Commented On] T191639: Wikidata JSON dumps do not have the 'ns' (namespace)

2020-04-30 Thread marcmiquel
marcmiquel added a comment. The Wikidata dump still does not include the namespace field. It is specified in the JSON DataModel, and it would be useful for the same use case I explained in this task. https://www.mediawiki.org/wiki/Wikibase/DataModel/JSON Could you give me an update

[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding

2020-10-09 Thread marcmiquel
marcmiquel added a comment. Thank you, @ArielGlenn and @Lucas_Werkmeister_WMDE. To explain what I am doing (https://pastebin.com/kPrwQ0Lb): I first collect all the categories from the page dump and put them into some dictionaries. Then I parse the categorylinks
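A minimal sketch of the two-pass approach described above (collect categories from the page dump, then resolve categorylinks against them), assuming both dumps have already been parsed into tuples; the SQL parsing is not shown, and the tuple shapes and function names are hypothetical. In the MediaWiki schema both page_title and cl_to are binary columns, so this sketch decodes both as UTF-8 in one place; mixing decoded and undecoded titles is an easy way to get lookups that silently fail.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

CATEGORY_NS = 14  # MediaWiki's Category namespace


def build_category_index(
    page_rows: Iterable[Tuple[int, int, bytes]],
) -> Dict[str, int]:
    """Map category title -> page_id from (page_id, namespace, title) rows.

    Only rows in the Category namespace are kept; titles are decoded
    from raw bytes to str here, once.
    """
    index: Dict[str, int] = {}
    for page_id, namespace, title in page_rows:
        if namespace == CATEGORY_NS:
            index[title.decode("utf-8")] = page_id
    return index


def link_pages_to_categories(
    link_rows: Iterable[Tuple[int, bytes]],
    index: Dict[str, int],
) -> Tuple[Dict[int, List[int]], List[str]]:
    """Group (cl_from, cl_to) rows by member page.

    Returns a mapping page_id -> list of category page_ids, plus the
    cl_to titles with no matching category page. Unmatched titles are
    not necessarily encoding errors: categorylinks can point to
    categories that have no page.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    unmatched: List[str] = []
    for cl_from, cl_to in link_rows:
        title = cl_to.decode("utf-8")
        if title in index:
            members[cl_from].append(index[title])
        else:
            unmatched.append(title)
    return dict(members), unmatched
```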