That's not the way I read Brion's proposal.  It looks to me like there would
only be records for new revisions and for those revisions and pages that
were updated, and that old data that had not been changed would not be
included.  Either way, this is essential.  I'm sure no one would disagree.

-Aaron

On Fri, Apr 1, 2011 at 12:06 PM, Luca de Alfaro <[email protected]> wrote:

> Not quite... if I am reading Brion's proposal correctly, this would
> list all the pages that changed in a specific interval.  If the interval is
> large, like a month, the dump could be very large if the full history of
> each page is provided.
> What I was suggesting is to include only the changes (the revisions) that
> occur in a specific time span.
>
> Luca
>
>
> On Thu, Mar 31, 2011 at 5:33 PM, Yuvi Panda <[email protected]> wrote:
>
>> Would incremental dumps, as described by Brion a long time ago
>> (http://leuksman.com/log/2007/10/14/incremental-dumps/), be what you're
>> looking for?
>>
>> On Fri, Apr 1, 2011 at 5:01 AM, Aaron Halfaker <[email protected]> wrote:
>> > If periodic update dumps are being considered, information that describes
>> > changes to old data (page deletes, user renames, etc.) would be very useful
>> > to have along with new revisions.
>> >
>> > -Aaron
>> >
>> > On Mar 31, 2011 6:27 PM, "Luca de Alfaro" <[email protected]> wrote:
>> >> I think I would be very interested in 3, or even in having every month a
>> >> dump of that month's revisions. As I have built tools for the XML dumps,
>> >> no change in format is good for me (and for WikiTrust).
>> >>
>> >> I would find incremental dumps (with occasional, yearly, full dumps) much
>> >> easier to manage than full dumps.
>> >>
>> >> Luca
>> >>
>> >> On Thu, Mar 31, 2011 at 2:27 PM, Yuvi Panda <[email protected]> wrote:
>> >>
>> >>> Hi, I'm a student planning on doing GSoC this year on MediaWiki.
>> >>> Specifically, I'd like to work on data dumps.
>> >>>
>> >>> I'm writing this to gauge what would be useful to the research
>> >>> community. Several ideas thrown about include:
>> >>> 1. JSON Dumps
>> >>> 2. SQLite Dumps
>> >>> 3. Daily dumps of revisions in last 24 hours
>> >>> 4. Dumps optimized for very fast import into various external storage
>> >>> and smaller size (diffs)
>> >>> 5. JSON/CSV for Special:Import and Special:Export
>> >>>
>> >>> Would any of these be useful? Or is there anything else that I'm
>> >>> missing, that you would consider much more useful?
>> >>>
>> >>> Feedback would be invaluable :)
>> >>>
>> >>> Thanks :)
>> >>> --
>> >>> Yuvi Panda T
>> >>> http://yuvi.in/blog
>> >>>
>> >>> _______________________________________________
>> >>> Wiki-research-l mailing list
>> >>> [email protected]
>> >>> https://lists.wikimedia.org/mailman/listinfo/wiki-research-l
>> >>>
>> >
>> >
>> >
>>
>>
>>
>> --
>> Yuvi Panda T
>> http://yuvi.in/blog
>>
>>
>
>
>
>
