On Mon, Aug 1, 2016 at 11:47 AM, Rob Lanphier <[email protected]> wrote:
> > HTML storage comes with its own can of worms, but it seems like a
> > solution worth thinking about in some form.
> >
> > 1. storage costs (fully rendered HTML would be 5-10 times bigger than
> > wikitext for that same page, and much larger if stored as wikitext diffs)
> > 2. evolution of HTML spec and its affect on old content (this affects the
> > entire web, so, whatever solution works there will work for us as well)
> > 3. newly discovered security holes and retroactively fixing them in
> > stored html and released dumps (not sure).
> > ... and maybe others.
>
> I think these are all reasons why I chose the word "seductive" as
> opposed to more unambiguous praise :-) Beyond these reasons, the
> bigger issue is that it's an invitation to be sloppy about our
> formats. We should endeavor to make our wikitext to html conversion
> more scientifically reproducible (i.e. "Nachvollziehbarkeit" as Daniel
> Kinzler taught me). Holding a large data store of snapshots seems
> like a crutch to avoid the hard work of specifying how this conversion
> ought to work. Let's actually nail down the spec for this[2][3]
> rather than kidding ourselves into believing we can just store enough
> HTML snapshots to make the problem moot.

Specifying wikitext-html conversion sounds like a MediaWiki 2.0 type of
project (i.e. I wouldn't expect it to happen in this decade), and even
then it would not fully solve the problem - e.g. very old versions relied
on the default CSS of a different MediaWiki skin; you need site scripts
for some things such as infobox show/hide functionality to work, but the
standard library those scripts rely on has changed; same for Scribunto
scripts.

HTML storage is actually not that bad - browsers are very good at
backwards compatibility with older HTML specs, and there is very little
security footprint in serving static HTML from a separate domain.

Storage is a problem, but there is no need to store every page revision -
monthly or yearly snapshots would be fine IMO. (cf. T17017 - again, Kiwix
seems to do this already, so maybe it's just a matter of coordination.)

The only other practical problem I can think of is that it would preserve
deleted/oversighted information - that problem already exists with the
dumps, but those are not kept for very long (on WMF servers at least).

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
