CCing the data dumps mailing list, which is the recommended venue for questions like this (https://meta.wikimedia.org/wiki/Data_dumps#Where_to_go_for_help).
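One way to check whether a given page ID really is absent from page.sql.gz, independently of any parser, is to scan the INSERT statements directly. The following is only a minimal sketch, not a full SQL parser: it assumes each row of the `page` table dump starts with page_id, page_namespace and a quoted page_title, and that rows sit on lines beginning with "INSERT INTO". The names `find_page_rows`, `find_page_in_dump` and `SAMPLE` are made up for illustration, and the sample rows are abbreviated, invented data rather than actual dump content.

```python
import gzip
import re

# Start of one row tuple: (page_id,page_namespace,'page_title'
# Assumption: the first three columns of the `page` table are in this order.
ROW_START = re.compile(
    rb"(?:VALUES \(|\),\()(\d+),(-?\d+),'((?:[^'\\]|\\.)*)'"
)


def find_page_rows(lines, page_id):
    """Yield (page_id, namespace, title) for every row with the given page_id.

    `lines` is any iterable of bytes lines from an uncompressed SQL dump.
    """
    for line in lines:
        # Only the bulk INSERT lines carry row data.
        if not line.startswith(b"INSERT INTO"):
            continue
        for match in ROW_START.finditer(line):
            if int(match.group(1)) == page_id:
                title = match.group(3).decode("utf-8", "replace")
                yield int(match.group(1)), int(match.group(2)), title


def find_page_in_dump(path, page_id):
    """Scan a gzip-compressed dump file and return all matching rows."""
    with gzip.open(path, "rb") as handle:
        return list(find_page_rows(handle, page_id))


# Invented, abbreviated sample rows (not taken from a real dump).
SAMPLE = [
    b"-- header comment\n",
    b"INSERT INTO `page` VALUES (10,0,'First_example','',0,0,0.1,NULL),"
    b"(42,14,'It\\'s_an_example','',0,0,0.2,NULL);\n",
]

if __name__ == "__main__":
    print(list(find_page_rows(SAMPLE, 42)))
```

On the real file this would be called as `find_page_in_dump("enwiki-20170920-page.sql.gz", 16300571)`. An empty list would suggest the row is missing from that dump, while a non-empty result would also show which namespace the page is in.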
On Wed, Nov 1, 2017 at 8:44 AM, Shubhanshu Mishra <shubhanshumis...@gmail.com> wrote:

> Also, important categories like Computer Architecture, Human-based
> computation, Programming language theory, Software Engineering, and Theory
> of Computation are missing from the subcategories of Areas of Computer
> Science.
>
> *Regards,*
> *Shubhanshu Mishra*
> Research Assistant,
> iSchool at University of Illinois at Urbana-Champaign
> --------------------------------------------------
> *Website:* http://shubhanshu.com
> *LinkedIn Profile: *http://www.linkedin.com/in/shubhanshumishra
>
> Blog <http://shubhanshu.com/blog> || Facebook
> <http://www.facebook.com/shubhanshu.mishra> || Twitter
> <http://www.twitter.com/TheShubhanshu> || LinkedIn
> <http://www.linkedin.com/in/shubhanshumishra>
>
> On Wed, Nov 1, 2017 at 10:42 AM, Shubhanshu Mishra <
> shubhanshumis...@gmail.com> wrote:
>
>> Hi,
>>
>> When using the Wikipedia dump files, I am unable to find many categories
>> and pages in the dump.
>>
>> E.g. under the Areas_of_computer_science category I get only 13
>> subcategories and 2 pages instead of 17 subcategories and 2 pages.
>> Furthermore, 1 page "Computational_creativity" is not present as a
>> subcategory.
>>
>> I am using the following Wikipedia dump files to extract the
>> categorylinks and page details:
>>
>> 1.6G Sep 21 00:45 enwiki-20170920-page.sql.gz
>> 21M  Sep 21 00:45 enwiki-20170920-category.sql.gz
>> 113M Sep 21 00:55 enwiki-20170920-redirect.sql.gz
>> 2.2G Sep 21 03:10 enwiki-20170920-categorylinks.sql.gz
>> 221M Sep 21 03:13 enwiki-20170920-page_props.sql.gz
>>
>> I use https://github.com/napsternxg/WikiUtils to parse the sql.gz dump
>> files, but I also tried searching in the sql.gz files directly and
>> couldn't find any entry for 16300571 in the page.sql.gz and
>> category.sql.gz files.
>> 16300571 supposedly refers to the Computational_creativity page, as
>> the following categories are linked to this page:
>>
>> 16300571 'All_NPOV_disputes' 'page'
>> 16300571 'All_articles_needing_additional_references' 'page'
>> 16300571 'All_articles_with_dead_external_links' 'page'
>> 16300571 'All_articles_with_unsourced_statements' 'page'
>> 16300571 'Areas_of_computer_science' 'page'
>> 16300571 'Articles_needing_additional_references_from_May_2013' 'page'
>> 16300571 'Articles_with_French-language_external_links' 'page'
>> 16300571 'Articles_with_dead_external_links_from_November_2016' 'page'
>> 16300571 'Articles_with_permanently_dead_external_links' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_April_2015' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_April_2016' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_December_2015' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_January_2010' 'page'
>> 16300571 'Articles_with_unsourced_statements_from_October_2016' 'page'
>> 16300571 'Artificial_intelligence' 'page'
>> 16300571 'Arts' 'page'
>> 16300571 'CS1_maint:_Extra_text:_authors_list' 'page'
>> 16300571 'Cognitive_psychology' 'page'
>> 16300571 'Computational_fields_of_study' 'page'
>> 16300571 'Creativity_techniques' 'page'
>> 16300571 'NPOV_disputes_from_January_2013' 'page'
>> 16300571 'Philosophical_movements' 'page'
>> 16300571 'Webarchive_template_wayback_links' 'page'
>> 16300571 'Wikipedia_articles_needing_clarification_from_November_2008' 'page'
>>
>> More details can be found at:
>> https://twitter.com/TheShubhanshu/status/925736635572072449
>>
>> Is there something I am doing wrong, or are these rows just missing from
>> the dumps?
>
> _______________________________________________
> Analytics mailing list
> analyt...@lists.wikimedia.org
> https://lists.wikimedia.org/mailman/listinfo/analytics

--
Tilman Bayer
Senior Analyst
Wikimedia Foundation
IRC (Freenode): HaeB
_______________________________________________
Xmldatadumps-l mailing list
Xmldatadumps-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l