On Mon, Aug 1, 2016 at 11:47 AM, Rob Lanphier <[email protected]> wrote:

> > HTML storage comes with its own can of worms, but it seems like a
> > solution worth thinking about in some form.
> >
> > 1. storage costs (fully rendered HTML would be 5-10 times bigger than
> > wikitext for that same page, and much larger if stored as wikitext diffs)
> > 2. evolution of HTML spec and its effect on old content (this affects the
> > entire web, so, whatever solution works there will work for us as well)
> > 3. newly discovered security holes and retroactively fixing them in
> > stored html and released dumps (not sure).
> > ... and maybe others.
>
> I think these are all reasons why I chose the word "seductive" as
> opposed to more unambiguous praise  :-)  Beyond these reasons, the
> bigger issue is that it's an invitation to be sloppy about our
> formats.  We should endeavor to make our wikitext to html conversion
> more scientifically reproducible (i.e. "Nachvollziehbarkeit" as Daniel
> Kinzler taught me).  Holding a large data store of snapshots seems
> like a crutch to avoid the hard work of specifying how this conversion
> ought to work.  Let's actually nail down the spec for this[2][3]
> rather than kidding ourselves into believing we can just store enough
> HTML snapshots to make the problem moot.
>

Specifying wikitext-to-HTML conversion sounds like a MediaWiki 2.0 type of
project (i.e. I wouldn't expect it to happen in this decade), and even then it
would not fully solve the problem. For example, very old versions relied on
the default CSS of a different MediaWiki skin; some things, such as infobox
show/hide functionality, need site scripts to work, but the standard library
those scripts rely on has changed; and the same goes for Scribunto scripts.

HTML storage is actually not that bad - browsers are very good at backwards
compatibility with older HTML specs, and there is very little security
footprint in serving static HTML from a separate domain. Storage is a
problem, but there is no need to store every page revision - monthly or
yearly snapshots would be fine IMO. (cf. T17017 - again, Kiwix seems to do
this already, so maybe it's just a matter of coordination.) The only other
practical problem I can think of is that it would preserve
deleted/oversighted information - that problem already exists with the
dumps, but those are not kept for very long (on WMF servers at least).
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
