CCing the data dumps mailing list, which is the recommended venue for
questions like this
(https://meta.wikimedia.org/wiki/Data_dumps#Where_to_go_for_help).

On Wed, Nov 1, 2017 at 8:44 AM, Shubhanshu Mishra <
shubhanshumis...@gmail.com> wrote:

> Also, important categories like Computer Architecture, Human-based
> computation, Programming language theory, Software Engineering, and Theory
> of Computation are missing from the subcategories of Areas of Computer
> Science.
>
>
> *Regards,*
> *Shubhanshu Mishra*
> Research Assistant,
> iSchool at University of Illinois at Urbana-Champaign
> --------------------------------------------------
> *Website:* http://shubhanshu.com
> *LinkedIn Profile: *http://www.linkedin.com/in/shubhanshumishra
>
> Blog <http://shubhanshu.com/blog>  || Facebook
> <http://www.facebook.com/shubhanshu.mishra>  ||  Twitter
> <http://www.twitter.com/TheShubhanshu>  || LinkedIn
> <http://www.linkedin.com/in/shubhanshumishra>
>
> On Wed, Nov 1, 2017 at 10:42 AM, Shubhanshu Mishra <
> shubhanshumis...@gmail.com> wrote:
>
>> Hi,
>>
>> When using the Wikipedia dump files, I am unable to find many categories
>> and pages in the dump.
>>
>> E.g. under the Areas_of_computer_science category I get only 13
>> subcategories and 2 pages instead of 17 subcategories and 2 pages.
>> Furthermore, one page, "Computational_creativity", is not present as a
>> subcategory.
>>
>> I am using the following Wikipedia dump files to extract the
>> categorylinks and page details:
>>
>> 1.6G Sep 21   00:45 enwiki-20170920-page.sql.gz
>> 21M Sep 21    00:45 enwiki-20170920-category.sql.gz
>> 113M Sep 21   00:55 enwiki-20170920-redirect.sql.gz
>> 2.2G Sep 21   03:10 enwiki-20170920-categorylinks.sql.gz
>> 221M Sep 21   03:13 enwiki-20170920-page_props.sql.gz
>>
>>
>> I use https://github.com/napsternxg/WikiUtils to parse the sql.gz dump
>> files, but I also tried searching the sql.gz files directly and couldn't
>> find any entry for 16300571 in the page.sql.gz or category.sql.gz
>> files. 16300571 supposedly refers to the Computational_creativity page, as
>> the following categories are linked to this page:
>>
>> 16300571 'All_NPOV_disputes'    'page'
>> 16300571 'All_articles_needing_additional_references'   'page'
>> 16300571 'All_articles_with_dead_external_links'        'page'
>> 16300571 'All_articles_with_unsourced_statements'       'page'
>> 16300571 'Areas_of_computer_science'    'page'
>> 16300571 'Articles_needing_additional_references_from_May_2013' 'page'
>> 16300571 'Articles_with_French-language_external_links' 'page'
>> 16300571 'Articles_with_dead_external_links_from_November_2016' 'page'
>> 16300571 'Articles_with_permanently_dead_external_links'        'page'
>> 16300571 'Articles_with_unsourced_statements_from_April_2015'   'page'
>> 16300571 'Articles_with_unsourced_statements_from_April_2016'   'page'
>> 16300571 'Articles_with_unsourced_statements_from_December_2015' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_January_2010' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_October_2016' 'page'
>> 16300571 'Artificial_intelligence'      'page'
>> 16300571 'Arts' 'page'
>> 16300571 'CS1_maint:_Extra_text:_authors_list'  'page'
>> 16300571 'Cognitive_psychology' 'page'
>> 16300571 'Computational_fields_of_study'        'page'
>> 16300571 'Creativity_techniques'        'page'
>> 16300571 'NPOV_disputes_from_January_2013'      'page'
>> 16300571 'Philosophical_movements'      'page'
>> 16300571 'Webarchive_template_wayback_links'    'page'
>> 16300571 'Wikipedia_articles_needing_clarification_from_November_2008' 'page'
>>
>> More details can be found at:
>> https://twitter.com/TheShubhanshu/status/925736635572072449
>>
>> Is there something I am doing wrong, or are these rows just missing from
>> the dumps?
>>
>>
>
>
> _______________________________________________
> Analytics mailing list
> analyt...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics
>
>
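
For anyone who wants to check the dump files directly rather than through a parsing library, here is a minimal sketch of how one might scan a .sql.gz table dump for rows with a given value in a given column. It is not the WikiUtils code. The function names (parse_insert_values, find_rows) are made up for illustration, and it assumes the usual mysqldump layout of one "INSERT INTO ... VALUES (...),(...);" statement per line with backslash-escaped quotes.

```python
import gzip
from typing import Iterator, List


def parse_insert_values(line: str) -> Iterator[List[str]]:
    """Yield every (...) row of one 'INSERT INTO ... VALUES' line as strings.

    Assumes mysqldump-style escaping (\\' inside quoted strings). Escape
    sequences are reduced to the escaped character, so this is meant for
    matching ids and titles, not for faithful round-tripping.
    """
    start = line.find(" VALUES ")
    if not line.startswith("INSERT INTO") or start == -1:
        return
    i = start + len(" VALUES ")
    n = len(line)
    while i < n:
        if line[i] != "(":          # skip separators between rows
            i += 1
            continue
        i += 1                      # step past "("
        row: List[str] = []
        field: List[str] = []
        while i < n:
            ch = line[i]
            if ch == "'":           # quoted string: read to the closing quote
                i += 1
                while i < n and line[i] != "'":
                    if line[i] == "\\" and i + 1 < n:
                        i += 1      # keep the escaped character itself
                    field.append(line[i])
                    i += 1
                i += 1              # step past the closing quote
            elif ch == ",":         # end of one field
                row.append("".join(field))
                field = []
                i += 1
            elif ch == ")":         # end of the row
                row.append("".join(field))
                i += 1
                break
            else:                   # unquoted value (number, NULL)
                field.append(ch)
                i += 1
        yield row


def find_rows(path: str, column: int, value: str) -> Iterator[List[str]]:
    """Yield rows from a gzipped SQL dump whose `column` equals `value`."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        for line in fh:
            for row in parse_insert_values(line):
                if len(row) > column and row[column] == value:
                    yield row
```

With this, find_rows("enwiki-20170920-page.sql.gz", 0, "16300571") would list any page row with that id, and find_rows("enwiki-20170920-categorylinks.sql.gz", 1, "Areas_of_computer_science") would list the members of the category, whose last field should be 'page', 'subcat' or 'file'. The column positions (page_id first in page; cl_from first and cl_to second in categorylinks) are my assumption about the table layout and are worth checking against the CREATE TABLE statement at the top of each file. Also, as far as I know, the id in category.sql.gz is a separate cat_id and not a page id, so a page id would not be expected to match there.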


-- 
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
