Hi Tim,

If there's a problem with viewing past versions of the main page, that's
perfectly okay -- it can be excluded from datetime content negotiation,
just as the Special: pages are.

I admit to not following the second issue completely.  A regular robot would
never send the X-Accept-Datetime header to jump back in time, so that's okay.
A regular robot would also respect the history page policy and not crawl
backwards either, as you say.  A robot that did send X-Accept-Datetime
would end up crawling old revision pages without ever hitting a history list,
but couldn't that also be forbidden via robots.txt, by excluding the revision
pages too?
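
To make the robots.txt idea concrete, here is a rough sketch of how a
well-behaved past-web crawler might check the rules before sending the header.
Everything specific in it is an assumption for illustration: the robots.txt
rule, the example.org URLs, the crawler name, and the HTTP-date format for the
header value. It is not Wikipedia's real robots.txt or the Memento extension's
actual behaviour.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.robotparser import RobotFileParser

# Hypothetical rule: old revisions are served via /w/index.php?...&oldid=...,
# so disallowing that script keeps polite robots away from them.
ROBOTS_TXT = """\
User-agent: *
Disallow: /w/index.php
"""


def may_fetch(url: str, agent: str = "past-web-crawler") -> bool:
    """Return True if the (hypothetical) robots.txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


def datetime_headers(when: datetime) -> dict:
    """Build the request header asking for a resource as of `when` (UTC).

    Assumes the header takes an HTTP-style date, e.g. '... GMT'.
    """
    return {"X-Accept-Datetime": format_datetime(when, usegmt=True)}


if __name__ == "__main__":
    article = "http://example.org/wiki/Foo"
    old_revision = "http://example.org/w/index.php?title=Foo&oldid=123"
    print(may_fetch(article))       # current article URL is not excluded
    print(may_fetch(old_revision))  # revision URL is excluded by the rule
    print(datetime_headers(datetime(2009, 11, 1, tzinfo=timezone.utc)))
```

A crawler that ignores robots.txt would of course ignore this too, so it only
helps with the polite ones.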

However, it seems like it will be a long time before people write past-web
crawlers, and a use case for doing it at all is pretty hard to come up
with. :)

Hope this addresses your concerns!

Rob

On Thu, Nov 12, 2009 at 5:15 PM, Tim Starling <[email protected]> wrote:

> Daniel Kinzler wrote:
> > Hi all
> >
> > The Memento Project <http://www.mementoweb.org/> (including the Los Alamos
> > National Laboratory (!) featuring Herbert Van de Sompel of OpenURL fame) is
> > proposing a new HTTP header, X-Accept-Datetime, to fetch old versions of a web
> > resource. They already wrote a MediaWiki extension for this
> > <http://www.mediawiki.org/wiki/Extension:Memento> - which would of course be
> > particularly interesting for use on Wikipedia.
> >
> > Do you think we could have this for Wikimedia project? I think that would be
> > very nice indeed. I recall that ways to look at last weeks main page have been
> > discussed before, and I see several issues:
> >
> You can't view the main page as it was in the past, because users
> routinely upload temporary images to display there, so that they can
> be protected, and then delete them once they're off the page.
>
> Also, we can't have people crawling Wikipedia while requesting old
> versions, because of the excessive disk seeking and CPU usage that
> would generate. That's why the history page has a robot policy of
> noindex, nofollow.
>
> -- Tim Starling
>
>
> _______________________________________________
> Wikitech-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitech-l
>
