https://bugzilla.wikimedia.org/show_bug.cgi?id=18919

           Summary: Provide database dumps of just article namespace
           Product: Wikimedia
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: General/Unknown
        AssignedTo: [email protected]
        ReportedBy: [email protected]


At the moment I can download "pages-meta-current", a dump of all pages, or
"pages-articles", which contains articles, templates, image descriptions and
"primary meta pages". The latter is nice if I want to redistribute Wikipedia's
content, but if I'm just trying to gather some data about articles, and I don't
want to download them all individually, I only need the articles.

Since for en.wikipedia the "pages-articles" dump contains 8,559,359 pages, and
there are only 2,892,000 articles, I'm obviously getting a lot of stuff I don't
actually need. It seems it would save gigabytes of bandwidth (and processing
time for users) if there were just a dump of article text.


_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
