Hello Herbert.

Herbert Van de Sompel wrote:
> 2. Let me describe the actual status and challenges faced in the  
> Memento plug-in work:
> 
> 2.1. The plug-in detects a client's X-Accept-Datetime header, and  
> returns the mediawiki page that was active at the datetime specified  
> in the header. Same for images, actually. 

> 2.2. Display history pages with the template that was active at  
> the time the history page acted as the current one. [Snip] So, we are  
> looking at the mediawiki code to see whether a history page, when  
> rendered, could itself retrieve the appropriate (old) template from  
> the database. If we are successful, we will share that code also at 
> http://www.mediawiki.org/wiki/Extension:Memento 
>   once available. It will obviously be up to the mediawiki community  
> whether they are willing to adopt the proposed change to the codebase.

Obviously it's a server issue.


> 2.3. We have looked into another issue raised by Jakob: Display  
> deleted pages as they existed at the datetime expressed in  
> X-Accept-Datetime. We have actually implemented this. There are 2 caveats:
> - as is the case with mediawiki in general, deleted pages are only  
> accessible by those with appropriate permissions;
> - as is the case with mediawiki in general, deleted pages show up in  
> Edit mode.
> This code will soon be included at 
> http://www.mediawiki.org/wiki/Extension:Memento 

Deleted pages don't always have to be shown in edit mode, since they
can be rendered (albeit not with the old templates, which would be an
interesting enhancement coming out of your work).


It is impressive how far you have gone. However, I don't think you can
do a *complete* implementation.

First, you should be aware that timemachining the pages has been tried
in the past. The discussions about FlaggedRevs are also relevant for
your project.
FlaggedRevs is an extension which allows marking the status of a page
(e.g. not vandalised) at a point in time. A naive implementation would
store the timestamp and get the old version from the archive. Instead,
they ended up storing the page content, with templates transcluded, in a
table specific to the extension.
However, FlaggedRevs is a tool to fight vandalism. Yours is an archival
one, so you could accept imperfect results under certain circumstances.


Problematic aspects:

Page moves/image moves:
*You want to see the content of Foo at epoch, but the history now at Foo
is the wrong one. Instead you need to look at the history of the page now
at Foo_(disambiguation).
You need to follow the move logs (perhaps even many times) to find out
the real page.
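
To make the move-following concrete, here is a rough Python sketch. The
Move record and the in-memory log are made up for illustration (they are
not MediaWiki's logging table or API), and it ignores redirects that were
later overwritten or deleted:

```python
from typing import NamedTuple

class Move(NamedTuple):
    # Hypothetical move-log record, not MediaWiki's real schema.
    timestamp: str  # YYYYMMDDHHMMSS, so string order is time order
    source: str
    target: str

def current_title_of(title, epoch, move_log):
    """Return the title that now holds the history which lived at
    `title` at time `epoch`, by replaying every later move in order."""
    tracked = title
    for move in sorted(move_log):  # NamedTuples sort by timestamp first
        if move.timestamp > epoch and move.source == tracked:
            tracked = move.target
    return tracked

log = [
    Move("20080101000000", "Foo", "Foo_(disambiguation)"),
    Move("20090101000000", "Foo_(disambiguation)", "Foo_(other_uses)"),
    Move("20090601000000", "Foo_(band)", "Foo"),
]

# The page that was at Foo in mid-2007 has since been moved twice.
print(current_title_of("Foo", "20070615000000", log))
```

Once you have the current title, you can pick the revision by timestamp
from that page's history instead of from the history now at Foo.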

Page merges:
*When two pages have been merged, you will want to show the revision
which was originally at the page the user wants to timemachine. You can
no longer just rely on the timestamps. You may be able to get that by
splitting the sources at the merge time and going back via
rev_parent_id. Needless to say, this is very inefficient, so this piece
wouldn't be put live at Wikipedia.
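
A minimal sketch of that walk, assuming you already know the last
revision that belonged to the original page at merge time. The dict
below is a made-up stand-in for the revision table, with None marking a
revision that has no parent:

```python
def revision_at(tip_id, epoch, revisions):
    """Walk the parent chain from `tip_id` and return the first revision
    whose timestamp is not later than `epoch`, or None if there is none.

    `revisions` maps rev_id -> (rev_parent_id, timestamp); this layout is
    an assumption for the example, not MediaWiki's real schema.
    """
    rev_id = tip_id
    while rev_id is not None:
        parent_id, timestamp = revisions[rev_id]
        if timestamp <= epoch:
            return rev_id
        rev_id = parent_id
    return None

# Histories of pages A and B, interleaved in time after a merge.
revisions = {
    1: (None, "20080101000000"),  # A
    2: (None, "20080201000000"),  # B
    3: (1, "20080301000000"),     # A
    4: (2, "20080401000000"),     # B
    5: (3, "20080501000000"),     # A, last revision before the merge
}

# A timestamp-only lookup would pick revision 4, which belonged to B.
print(revision_at(5, "20080415000000", revisions))
```

This is one lookup per step of the chain, which is where the
inefficiency comes from.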

Partial undeletions:
*When a page is undeleted, the summary shows how many revisions were
undeleted, but not *which* ones.

Case:
*Page A has two edits (#1 and #2).
*A vandal adds obscene content to it (#3).
*Admin deletes the page and restores the two first revisions.
*Several months later, the page is completely deleted.

When an admin wants to view what the page looked like during those
months, an application is unable to determine whether the two revisions
which were visible were #1 and #2 or perhaps #2 and #3.


revdelete may have similar issues.



> 2.4. We do not feel that all pages should necessarily be subject to  
> datetime content negotiation, in the same way that not all URIs are  
> subject to content negotiation in other dimensions. We feel that the  
> Special Pages fall under this category, as they do not have History.
> 
> 2.5. We have ideas regarding how to address the issue raised by  
> Daniel: the timestamp isn't a unique identifier, multiple revisions  
> *might* have the
> same timestamp. From the perspective of Memento, a datetime is  
> obviously the only "globally" recognizable value that can be used for  
> negotiation. If cases occur where multiple versions of a page exist  
> for the same second, the thing to do according to RFC 2295 would be to  
> return a "300 Multiple Choices", listing the URIs (and metadata) of  
> those versions in an Alternates header.  The client then has to take it  
> from there.
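
For what it's worth, the selection logic you describe could look roughly
like this in Python. The (timestamp, uri) tuples and the 404 for "nothing
that old" are my assumptions, and building the actual Alternates header
is left out:

```python
def negotiate(requested, revisions):
    """Pick the revision(s) active at `requested`.

    `revisions` is a list of (timestamp, uri) pairs with timestamps in
    YYYYMMDDHHMMSS form (a made-up structure for this example).
    Returns (status, uris): 200 with one URI, 300 when several revisions
    share the chosen second, 404 (an assumption) when none is old enough.
    """
    eligible = [(ts, uri) for ts, uri in revisions if ts <= requested]
    if not eligible:
        return 404, []
    latest = max(ts for ts, _ in eligible)
    candidates = [uri for ts, uri in eligible if ts == latest]
    status = 200 if len(candidates) == 1 else 300
    return status, candidates

history = [
    ("20090101120000", "/w/index.php?oldid=10"),
    ("20090302080000", "/w/index.php?oldid=11"),
    ("20090302080000", "/w/index.php?oldid=12"),  # same second as 11
]

print(negotiate("20090201000000", history))
print(negotiate("20090401000000", history))
```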


> 2.6. The caching issue is a general problem arising from introducing  
> Memento in a web that does not (yet) do Memento: when in datetime  
> content negotiation mode all caches between client and server (both  
> included) need to be bypassed. As described in our paper, we currently  
> address this problem by adding the following client headers:
> 
> Cache-Control: no-cache => to force cache revalidation, and
> If-Modified-Since: Thu, 01 Jan 1970 00:00:00 GMT => to enforce  
> validation failure
> 
> We very much understand this is not elegant but it tends to work ;-) .  


The caching issue is IMHO the bigger problem in your approach of using
the new header.
Disabling the cache on the request kind of works (although not in the
long term), but you also need to disable caching at the server, so that
someone requesting the current page through your same proxy (ignorant of
X-Accept-Datetime) doesn't get the cached page you were served earlier.
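
A toy Python model of what I mean. NaiveCache and origin are invented for
the example and are far simpler than any real proxy; the only point is
that the cache key is the URL alone:

```python
class NaiveCache:
    """Toy proxy that knows nothing about X-Accept-Datetime and keys its
    store on the URL only (a simplification for this example)."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}

    def get(self, url, headers):
        bypass = headers.get("Cache-Control") == "no-cache"
        if bypass or url not in self.store:
            # The unknown header is forwarded and the answer is stored.
            self.store[url] = self.origin(url, headers)
        return self.store[url]

def origin(url, headers):
    """Stand-in for a Memento-enabled wiki."""
    wanted = headers.get("X-Accept-Datetime")
    return f"{url} as of {wanted}" if wanted else f"{url} (current)"

proxy = NaiveCache(origin)

# You ask for an old version, bypassing the cache on the request...
old = proxy.get("/wiki/Foo", {
    "Cache-Control": "no-cache",
    "X-Accept-Datetime": "Thu, 01 Jan 2009 00:00:00 GMT",
})

# ...and the next ordinary visitor is handed that same old version.
later = proxy.get("/wiki/Foo", {})
print(later)
```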

RFC 2145 states very clearly that "A proxy MUST forward an unknown
header", but in your case it would have been preferable for the header
not to be forwarded when the proxy isn't Memento-aware.

Which leads us to another issue: it seems your server implementation
doesn't "acknowledge" Memento, so given a response to an
X-Accept-Datetime request, you don't know if what you're getting is the
version you requested or the current one (because the server ignored the
header).
The fix can be as simple as requiring Last-Modified <= X-Accept-Datetime
on X-Accept-Datetime responses (that would also allow the server to
explicitly tell since when the version is valid), but extended to all
response codes.
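
On the client side, the check could be sketched like this in Python. The
function name is made up, and it assumes both values are HTTP dates in
GMT. Note that it is only a necessary condition: a page that simply
hasn't changed since before the requested date would pass too.

```python
from email.utils import parsedate_to_datetime

def looks_acknowledged(requested, response_headers):
    """Return True if the response carries a Last-Modified that is not
    later than the datetime sent in X-Accept-Datetime.

    `requested` and the Last-Modified value are HTTP date strings;
    `response_headers` is a plain dict (an assumption for this sketch).
    """
    last_modified = response_headers.get("Last-Modified")
    if last_modified is None:
        return False  # the server told us nothing, so assume it ignored us
    return parsedate_to_datetime(last_modified) <= parsedate_to_datetime(requested)

wanted = "Thu, 01 Jan 2009 00:00:00 GMT"

# A version last modified before the requested datetime passes the check.
print(looks_acknowledged(wanted, {"Last-Modified": "Mon, 15 Dec 2008 10:00:00 GMT"}))

# A newer Last-Modified means we were probably served the current page.
print(looks_acknowledged(wanted, {"Last-Modified": "Fri, 05 Mar 2010 09:30:00 GMT"}))
```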


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l