https://bugzilla.wikimedia.org/show_bug.cgi?id=18919
Summary: Provide database dumps of just article namespace
Product: Wikimedia
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: Normal
Component: General/Unknown
AssignedTo: [email protected]
ReportedBy: [email protected]
At the moment I can download "pages-meta-current", a dump of all pages, or
"pages-articles", which contains articles, templates, image descriptions, and
"primary meta pages". The latter is fine if I want to redistribute Wikipedia's
content, but if I'm just trying to gather some data about articles, and I don't
want to download them all individually, the articles alone are all I need.
Since the en.wikipedia "pages-articles" dump contains 8,559,359 pages while
there are only 2,892,000 articles, I'm obviously getting a lot of material I
don't actually need. It seems a dump of just the article text would save
gigabytes of bandwidth (and processing time) for users.
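In the meantime, a workaround is to filter the existing "pages-articles" dump down to the main (article) namespace yourself. The sketch below assumes a dump format where each <page> carries an explicit <ns> child (0 for articles), as in recent MediaWiki XML export schemas; it streams the file so the multi-gigabyte dump never has to fit in memory. The function name and the tiny inline sample are illustrative, not part of any dump tooling.

```python
import io
import xml.etree.ElementTree as ET

def iter_article_pages(xml_file):
    """Stream a MediaWiki pages-articles dump and yield the titles of
    pages in the main (article) namespace, i.e. <ns>0</ns>.
    Tag names are matched with the XML namespace prefix stripped, so the
    exact export schema version does not matter."""
    strip = lambda tag: tag.rsplit('}', 1)[-1]
    for event, elem in ET.iterparse(xml_file, events=('end',)):
        if strip(elem.tag) == 'page':
            ns = next((c.text for c in elem if strip(c.tag) == 'ns'), None)
            if ns == '0':
                title = next((c.text for c in elem if strip(c.tag) == 'title'),
                             None)
                yield title
            elem.clear()  # free finished pages; real dumps are tens of GB

# Example with a tiny hand-written dump fragment:
sample = io.BytesIO(b"""<mediawiki>
  <page><title>Foo</title><ns>0</ns></page>
  <page><title>Talk:Foo</title><ns>1</ns></page>
  <page><title>Template:Bar</title><ns>10</ns></page>
</mediawiki>""")
print(list(iter_article_pages(sample)))  # -> ['Foo']
```

Of course this still means downloading the full dump first, which is exactly the bandwidth cost this request is asking to avoid.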
--
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.
_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l