That's not how I read Brion's proposal. As I read it, the dump would contain records only for new revisions and for the revisions and pages that were updated; old data that was neither created nor updated would not be included. Either way, this is essential. I'm sure no one would disagree.
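
To make the two readings concrete, here is a minimal Python sketch. The record layout, the sample data, and the function names are all made up for illustration; none of this comes from Brion's proposal or from the actual dump format.

```python
from datetime import datetime

# Hypothetical revision records: (page_id, rev_id, timestamp).
REVISIONS = [
    (1, 100, datetime(2011, 1, 5)),
    (1, 101, datetime(2011, 3, 10)),
    (2, 200, datetime(2011, 2, 1)),
    (3, 300, datetime(2011, 3, 20)),
]


def revisions_in_span(revisions, start, end):
    """Reading A: only the revisions whose timestamp falls in [start, end)."""
    return [r for r in revisions if start <= r[2] < end]


def full_history_of_changed_pages(revisions, start, end):
    """Reading B: every revision of any page that changed in [start, end)."""
    changed_pages = {r[0] for r in revisions_in_span(revisions, start, end)}
    return [r for r in revisions if r[0] in changed_pages]


# One month of activity, March 2011.
start, end = datetime(2011, 3, 1), datetime(2011, 4, 1)
span_only = revisions_in_span(REVISIONS, start, end)
with_history = full_history_of_changed_pages(REVISIONS, start, end)

print([r[1] for r in span_only])
print([r[1] for r in with_history])
```

Under reading B the old revision 100 is dragged along because page 1 changed in March, which is how the dump size could grow with a long interval.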
-Aaron

On Fri, Apr 1, 2011 at 12:06 PM, Luca de Alfaro <[email protected]> wrote:

> Not quite... if I am reading correctly the proposal by Brion, this would
> list all the pages that changed in a specific interval. If the interval is
> large, like a month, this could be a very large size, if all the history of
> a page is provided.
> What I was suggesting is to include only the changes (the revisions) that
> occur in a specific time span.
>
> Luca
>
>
> On Thu, Mar 31, 2011 at 5:33 PM, Yuvi Panda <[email protected]> wrote:
>
>> Would incremental dumps, as described by brion long time ago
>> (http://leuksman.com/log/2007/10/14/incremental-dumps/) be what you're
>> looking for?
>>
>> On Fri, Apr 1, 2011 at 5:01 AM, Aaron Halfaker <[email protected]>
>> wrote:
>> > If periodic update dumps are being considered, information that describes
>> > changes to old data (page deletes, user renames, etc) would be very useful
>> > to have along with new revisions.
>> >
>> > -Aaron
>> >
>> > On Mar 31, 2011 6:27 PM, "Luca de Alfaro" <[email protected]> wrote:
>> >> I think I would be very interested in 3, or even, in having every month a
>> >> dump of that month's revisions. As I have built tools for the xml dumps, no
>> >> change in format is good for me (and for WikiTrust).
>> >>
>> >> I would find incremental dumps (with occasional, yearly, full dumps) much
>> >> easier to manage than full dumps.
>> >>
>> >> Luca
>> >>
>> >> On Thu, Mar 31, 2011 at 2:27 PM, Yuvi Panda <[email protected]> wrote:
>> >>
>> >>> Hi, I'm a student planning on doing GSoC this year on mediawiki.
>> >>> Specifically, I'd like to work on data dumps.
>> >>>
>> >>> I'm writing this to gauge what would be useful to the research
>> >>> community. Several ideas thrown about include:
>> >>> 1. JSON Dumps
>> >>> 2. Sqlite Dumps
>> >>> 3. Daily dumps of revisions in last 24 hours
>> >>> 4. Dumps optimized for very fast import into various external storage
>> >>> and smaller size (diffs)
>> >>> 5. JSON/CSV for Special:Import and Special:Export
>> >>>
>> >>> Would any of these be useful? Or is there anything else that I'm
>> >>> missing, that you would consider much more useful?
>> >>>
>> >>> Feedback would be invaluable :)
>> >>>
>> >>> Thanks :)
>> >>> --
>> >>> Yuvi Panda T
>> >>> http://yuvi.in/blog
>> >>>
>> >>> _______________________________________________
>> >>> Wiki-research-l mailing list
>> >>> [email protected]
>> >>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>>
>> --
>> Yuvi Panda T
>> http://yuvi.in/blog

_______________________________________________
Wiki-research-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
