https://bugzilla.wikimedia.org/show_bug.cgi?id=21195

           Summary: Include page count in database dumps
           Product: Wikimedia
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: Normal
         Component: General/Unknown
        AssignedTo: [email protected]
        ReportedBy: [email protected]


The dumps "pages-meta-current" and "pages-articles", as well as the
hypothetical article-namespace-only dump that I would like to see (bug 18919),
should include the total number of pages in the dump at the start of the file
in the "siteinfo" section.

Among other things, it would be useful for displaying dump search progress to
the user. Attempts to estimate the total from a small portion of the file
produce wildly inaccurate results, especially with the en.wikipedia dump: pages
are approximately ordered by creation time, and it seems the older a page is,
the larger it is (which makes sense), so extrapolating from the start of the
file underestimates the page count. Even if such estimates were more accurate,
it would be helpful to have the exact number to hand. And obviously the extra
few bytes in a 25GB file are negligible :)
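For illustration, a dump reader could use such a field for progress display
roughly as follows. This is a minimal Python sketch, assuming the count were
exposed as a <pagecount> element inside <siteinfo>; that element name is
hypothetical and not part of the current dump format. The reader streams the
file with ElementTree's iterparse and falls back to "total unknown" when the
element is absent.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    # Strip the "{namespace}" prefix ElementTree adds to namespaced tags,
    # so the code works whichever export schema version the dump declares.
    return tag.rsplit("}", 1)[-1]


def page_progress(stream):
    """Yield (pages_seen, total) after each <page> in a dump stream.

    total is read from a HYPOTHETICAL <pagecount> element in <siteinfo>;
    it stays None if the dump does not provide one.
    """
    total = None
    seen = 0
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "pagecount":  # hypothetical element, see lead-in
            total = int(elem.text)
        elif name == "page":
            seen += 1
            yield seen, total
            elem.clear()  # drop the finished page's children and text


# Tiny stand-in for a dump file; real dumps are namespaced and far larger.
sample = b"""<mediawiki>
  <siteinfo>
    <sitename>Example</sitename>
    <pagecount>3</pagecount>
  </siteinfo>
  <page><title>A</title></page>
  <page><title>B</title></page>
  <page><title>C</title></page>
</mediawiki>"""

for seen, total in page_progress(io.BytesIO(sample)):
    if total:
        print(f"{seen}/{total} pages ({100 * seen / total:.0f}%)")
    else:
        print(f"{seen} pages (total unknown)")
```

Because <siteinfo> comes before any <page>, the total is known before the
first progress update, which is the point of putting it at the start of the file.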

Something analogous could probably be done for some of the other dumps.


_______________________________________________
Wikibugs-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l