I checked the files directly, both the page.sql.gz and the
categorylinks.sql.gz files for 20170920.  The page is listed:

$ zcat enwiki-20170920-page.sql.gz | sed -e 's/),/),\n/g;' | grep Computational_creativity | more
(16300571,0,'Computational_creativity','',0,0,0,0.718037721126,'20170903222622','20170903222623',798803037,59318,'wikitext',NULL),
(16390036,1,'Computational_creativity','',0,0,0,0.20741249006,'20170831064438','20170831084246',786288354,107057,'wikitext',NULL),

The first entry is the page, the second is the talk page.
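
If you would rather do the same lookup from Python, here is a minimal sketch of the idea behind the pipeline above. The function names are made up for illustration, and it uses the same naive split on "),", so a title that itself contains that sequence would be cut in the wrong place. Treat it as a quick sanity check, not as a real SQL parser.

```python
import gzip


def iter_rows(lines):
    """Yield one row tuple per chunk, like the sed step that breaks after each '),'."""
    for line in lines:
        # Only the INSERT statements carry row data in the dump.
        if not line.startswith("INSERT INTO"):
            continue
        for chunk in line.split("),"):
            yield chunk.strip()


def find_rows(lines, needle):
    """Return every row chunk that contains the search string (the grep step)."""
    return [row for row in iter_rows(lines) if needle in row]


def find_in_dump(path, needle):
    """Stream a gzipped SQL dump and return the matching row chunks."""
    # errors="replace" because some columns (e.g. sort keys) hold binary data.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return find_rows(fh, needle)
```

For example, find_in_dump("enwiki-20170920-page.sql.gz", "Computational_creativity") should return the same two rows as the command above if the download is intact.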

$ zcat enwiki-20170920-categorylinks.sql.gz | sed -e 's/),/),\n/g;' | grep 16300571 | cat -vte
(16300571,'All_NPOV_disputes','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2013-01-27 10:43:57','','uca-default-u-kn','page'),$
(16300571,'All_articles_needing_additional_references','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2013-05-19 16:52:06','','uca-default-u-kn','page'),$
(16300571,'All_articles_with_dead_external_links','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-11-29 07:32:22','','uca-default-u-kn','page'),$
(16300571,'All_articles_with_unsourced_statements','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2008-11-21 10:36:21','','uca-default-u-kn','page'),$
(16300571,'Areas_of_computer_science','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-04-15 15:40:40','','uca-default-u-kn','page'),$
(16300571,'Articles_needing_additional_references_from_May_2013','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2013-05-19 16:52:06','','uca-default-u-kn','page'),$
(16300571,'Articles_with_French-language_external_links','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2013-06-20 04:05:59','','uca-default-u-kn','page'),$
(16300571,'Articles_with_dead_external_links_from_November_2016','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-11-29 07:32:22','','uca-default-u-kn','page'),$
(16300571,'Articles_with_permanently_dead_external_links','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-11-29 07:32:22','','uca-default-u-kn','page'),$
(16300571,'Articles_with_unsourced_statements_from_April_2015','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-04-15 15:40:40','','uca-default-u-kn','page'),$
(16300571,'Articles_with_unsourced_statements_from_April_2016','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-04-15 15:40:40','','uca-default-u-kn','page'),$
(16300571,'Articles_with_unsourced_statements_from_December_2015','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2015-12-01 14:40:27','','uca-default-u-kn','page'),$
(16300571,'Articles_with_unsourced_statements_from_January_2010','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2010-01-09 05:50:15','','uca-default-u-kn','page'),$
(16300571,'Articles_with_unsourced_statements_from_October_2016','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-10-10 21:27:12','','uca-default-u-kn','page'),$
(16300571,'Artificial_intelligence','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2008-03-19 03:45:58','','uca-default-u-kn','page'),$
(16300571,'Arts','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-04-15 15:40:40','','uca-default-u-kn','page'),$
(16300571,'CS1_maint:_Extra_text:_authors_list','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2017-06-04 08:45:09','','uca-default-u-kn','page'),$
(16300571,'Cognitive_psychology','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-04-15 15:40:40','','uca-default-u-kn','page'),$
(16300571,'Computational_fields_of_study','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-11-10 15:53:12','','uca-default-u-kn','page'),$
(16300571,'Creativity_techniques','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2016-04-15 15:40:40','','uca-default-u-kn','page'),$
(16300571,'NPOV_disputes_from_January_2013','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2013-05-19 15:48:55','','uca-default-u-kn','page'),$
(16300571,'Philosophical_movements','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2017-01-07 20:24:38','','uca-default-u-kn','page'),$
(16300571,'Webarchive_template_wayback_links','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2017-01-27 20:04:18','','uca-default-u-kn','page'),$
(16300571,'Wikipedia_articles_needing_clarification_from_November_2008','+C?EOM\'M7CA\'=^D+I/\'M7Q7MW^A^\^AM-^O^[','2009-02-13 10:49:28','','uca-default-u-kn','page'),$

That list of categorylinks entries matches your results.
Is it possible that your download of the page.sql.gz file is corrupted?  Do
the md5 sums check out?  Or perhaps it is an issue with the tools.
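
In case it helps, here is a rough sketch of how the checksum comparison could be done in Python. It assumes you have the md5sums text file published alongside the dump and that each of its lines has the form "HASH  FILENAME"; the helper names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def parse_md5sums(text):
    """Map filename -> expected hash, assuming 'HASH  FILENAME' lines."""
    sums = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            sums[parts[1]] = parts[0]
    return sums


def download_is_intact(path, name, md5sums_text):
    """Return True if the local file's MD5 matches the published one for `name`."""
    expected = parse_md5sums(md5sums_text).get(name)
    return expected is not None and expected == md5_of_file(path)
```

If download_is_intact returns False for enwiki-20170920-page.sql.gz, re-downloading that file is the first thing to try.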

Ariel

On Wed, Nov 1, 2017 at 7:40 PM, Tilman Bayer <tba...@wikimedia.org> wrote:

> CCing the data dumps mailing list, which is the recommended venue for
> questions like this
> (https://meta.wikimedia.org/wiki/Data_dumps#Where_to_go_for_help).
>
> On Wed, Nov 1, 2017 at 8:44 AM, Shubhanshu Mishra <
> shubhanshumis...@gmail.com> wrote:
>
>> Also, important categories like Computer Architecture, Human-based
>> computation, Programming language theory, Software Engineering, and Theory
>> of Computation are missing from the subcategories of Areas of Computer
>> Science.
>>
>>
>> *Regards,*
>> *Shubhanshu Mishra*
>> Research Assistant,
>> iSchool at University of Illinois at Urbana-Champaign
>> --------------------------------------------------
>> *Website:* http://shubhanshu.com
>> *LinkedIn Profile: *http://www.linkedin.com/in/shubhanshumishra
>>
>> Blog <http://shubhanshu.com/blog>  || Facebook
>> <http://www.facebook.com/shubhanshu.mishra>  ||  Twitter
>> <http://www.twitter.com/TheShubhanshu>  || LinkedIn
>> <http://www.linkedin.com/in/shubhanshumishra>
>>
>> On Wed, Nov 1, 2017 at 10:42 AM, Shubhanshu Mishra <
>> shubhanshumis...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> When using the Wikipedia dump files, I am unable to find many categories
>>> and pages in the dump.
>>>
>>> E.g. under the Areas_of_computer_science category I get only 13
>>> subcategories and 2 pages instead of 17 subcategories and 2 pages.
>>> Furthermore, one page, "Computational_creativity", is not present as a
>>> subcategory.
>>>
>>> I am using the following Wikipedia dump files to extract the
>>> categorylinks and page details:
>>>
>>> 1.6G Sep 21   00:45 enwiki-20170920-page.sql.gz
>>> 21M Sep 21    00:45 enwiki-20170920-category.sql.gz
>>> 113M Sep 21   00:55 enwiki-20170920-redirect.sql.gz
>>> 2.2G Sep 21   03:10 enwiki-20170920-categorylinks.sql.gz
>>> 221M Sep 21   03:13 enwiki-20170920-page_props.sql.gz
>>>
>>>
>>> I use https://github.com/napsternxg/WikiUtils to parse the sql.gz dump
>>> files, but I also tried searching in the sql.gz files and couldn't find any
>>> entry for 16300571 in the page.sql.gz and in category.sql.gz
>>> files. 16300571 supposedly refers to the Computational_creativity page as
>>> the following categories are linked to this page:
>>>
>>> 16300571 'All_NPOV_disputes'    'page'
>>> 16300571 'All_articles_needing_additional_references'   'page'
>>> 16300571 'All_articles_with_dead_external_links'        'page'
>>> 16300571 'All_articles_with_unsourced_statements'       'page'
>>> 16300571 'Areas_of_computer_science'    'page'
>>> 16300571 'Articles_needing_additional_references_from_May_2013' 'page'
>>> 16300571 'Articles_with_French-language_external_links' 'page'
>>> 16300571 'Articles_with_dead_external_links_from_November_2016' 'page'
>>> 16300571 'Articles_with_permanently_dead_external_links'        'page'
>>> 16300571 'Articles_with_unsourced_statements_from_April_2015'   'page'
>>> 16300571 'Articles_with_unsourced_statements_from_April_2016'   'page'
>>> 16300571 'Articles_with_unsourced_statements_from_December_2015' 'page'
>>> 16300571 'Articles_with_unsourced_statements_from_January_2010' 'page'
>>> 16300571 'Articles_with_unsourced_statements_from_October_2016' 'page'
>>> 16300571 'Artificial_intelligence'      'page'
>>> 16300571 'Arts' 'page'
>>> 16300571 'CS1_maint:_Extra_text:_authors_list'  'page'
>>> 16300571 'Cognitive_psychology' 'page'
>>> 16300571 'Computational_fields_of_study'        'page'
>>> 16300571 'Creativity_techniques'        'page'
>>> 16300571 'NPOV_disputes_from_January_2013'      'page'
>>> 16300571 'Philosophical_movements'      'page'
>>> 16300571 'Webarchive_template_wayback_links'    'page'
>>> 16300571 'Wikipedia_articles_needing_clarification_from_November_2008' 'page'
>>>
>>> More details can be found at:
>>> https://twitter.com/TheShubhanshu/status/925736635572072449
>>>
>>> Is there something I am doing wrong, or are these rows just missing
>>> from the dumps?
>>>
>>>
>>
>>
>> _______________________________________________
>> Analytics mailing list
>> analyt...@lists.wikimedia.org
>> https://lists.wikimedia.org/mailman/listinfo/analytics
>>
>>
>
>
> --
> Tilman Bayer
> Senior Analyst
> Wikimedia Foundation
> IRC (Freenode): HaeB
>
> _______________________________________________
> Xmldatadumps-l mailing list
> Xmldatadumps-l@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
>
>
