https://bugzilla.wikimedia.org/show_bug.cgi?id=16330
--- Comment #7 from Nicholas Wilson <[email protected]> 2008-12-31 21:57:16 UTC ---

Right, I should clarify what I was on about.

The real underlying problem, as has been noted, is the inconsistent use of proper parsers (a parser, in theory, is a recursive function, not a regex-based one). The current first 'parser' stage is not a full-on parser, but certain aspects of the processing do use proper parsing: templates can be nested correctly, and the magic words work fine. We could do a huge upgrade and put in some sort of tree-based parsing everywhere, but that is too idealistic to be feasible, at least, I suspect, without a major backward-compatibility-breaking release.

The #tag magic word 'circumventing' the anti-nesting code is not a bug per se; it actually shows the way to fix the problem. What we are trying to do is fix a parser with regexps, which does not work, when we already have a recursive parser stage (templates and magic words). Exploit the #tag method; don't do more regexp kludging to avoid it. As we can see by the use of reflist in preference to references in so many articles, there is nothing against wrapping naked (badly parsed) XML tags in (properly parsed) templates, so that is what we should do: say as policy to users "If you don't want to nest, the old XML syntax works and is good, but if you want to nest, use this template instead of putting low-level magic words in your wikitext". The template then wraps the XML tag. That is a 'better' solution, without re-inventing the wheel and changing the parser.

That still leaves the problem of numbering though. The way we currently number up is to assign a number straight away, sequentially.
This does not work with recursion, because (1) the inner elements get assigned their numbering first (part of the problem in the first example in the URL I originally gave above); and (2) the inner element may be in a footnote, so should in fact be numbered much later (better shown by my second example). The only way we can fix this is to delay the numbering to a later stage of processing (it is logically impossible to assign numbers immediately and not get these problems). The question is at what stage to do the numbering.

Now, I have made a patch with a bit of a rework of the processing of refs at bug 16294. I did not have this particular problem in mind at the time, but I think the approach is relevant here. The idea in that fix is that, instead of the full link (with a sup and [] etc.), only a dummy tag with an internal id is written out initially; we then run through the page with a regexp stage later and put in the correct text. This is relevant to the current bug because relatively small changes to my bug 16294 patch are needed to fix the numbering issues here. What it would involve is writing out a 'fixed' number in the second pass instead of the initially allocated one. We correct the numbering in the references tags by writing out the refs 'out of order', checking for each ref whether its id tag has already been written out (which will not be the case for ones in footnotes that have not been written out); these we place at the end of the list. The regexp code copies this behaviour, placing the same fixed numbers when it updates the dummy code written out by the parser stage. (Note: again for want of a full parser, and since some pages put ols and uls in refs, we must write out the references tag contents at the first pass.)

I think this approach could work, and tinkered with it for about an hour, but did not get it done. I now don't have enough time to look into it fully, and I see Rupert has assigned the bug to himself, so good luck.
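
To make the delayed-numbering idea concrete, here is a minimal sketch in Python (not the extension's PHP, and not the bug 16294 patch itself). Everything in it is an assumption made for illustration: the `@@REF:n@@` dummy-tag format, the `Ref` class, and the function names `first_pass`, `assign_numbers` and `second_pass` are hypothetical. It handles only simple unnamed refs, and it ignores the caveat about writing out the references tag contents at the first pass.

```python
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical dummy-tag format, chosen for readability only.
PLACEHOLDER = re.compile(r"@@REF:(\d+)@@")


@dataclass
class Ref:
    content: list  # strings and nested Ref objects


def first_pass(nodes, store):
    """Recursive stage: emit dummy tags carrying internal ids, no numbers."""
    out = []
    for node in nodes:
        if isinstance(node, Ref):
            text = first_pass(node.content, store)  # inner refs are handled first
            store.append(text)                      # internal id = index in store
            out.append(f"@@REF:{len(store) - 1}@@")
        else:
            out.append(node)
    return "".join(out)


def assign_numbers(body, store):
    """Number refs in body order; refs found only inside footnotes go last."""
    order = []
    queue = deque([body])  # scan the body first, then each footnote text
    while queue:
        text = queue.popleft()
        for match in PLACEHOLDER.finditer(text):
            ref_id = int(match.group(1))
            if ref_id not in order:
                order.append(ref_id)
                queue.append(store[ref_id])
    return {ref_id: n for n, ref_id in enumerate(order, start=1)}


def second_pass(text, numbers):
    """Regexp stage: swap each dummy tag for its final number."""
    return PLACEHOLDER.sub(lambda m: f"[{numbers[int(m.group(1))]}]", text)


store = []
tree = ["Alpha", Ref(["note A", Ref(["note B"])]), " beta", Ref(["note C"])]
body = first_pass(tree, store)
numbers = assign_numbers(body, store)
page = second_pass(body, numbers)
footnotes = [f"{n}. {second_pass(store[ref_id], numbers)}"
             for ref_id, n in numbers.items()]
print(page)
print("\n".join(footnotes))
```

In this example the nested ref ("note B") receives internal id 0 because the recursion reaches it first, so numbering at allocation time would label the body refs [2] and [3]. With the numbering deferred, the body reads `Alpha[1] beta[2]` and the nested ref is listed last, as footnote 3.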
If these thoughts are not helpful, ignore them. However, I do think that kludging a way to disable #tag is a bad idea (even the nasty regexp detecting unclosed refs is just to help usability and avoid bad wikitext, not to actually process good wikitext).

My ideas in summary: exploit the recursive nature of the proper parser handling magic words and templates, rather than kludging to eliminate it; delay writing out the final numbering to the page until we have made a pass over the whole thing and know what the correct order is; possibly do this using a regexp stage in the later parsing, as at bug 16294.
