https://bugzilla.wikimedia.org/show_bug.cgi?id=16330

--- Comment #7 from Nicholas Wilson <[email protected]>  2008-12-31 21:57:16 UTC ---
Right, I should clarify what I was on about. The real underlying problem, as
has been noted, is the inconsistent use of proper parsers (a parser, in theory,
is a recursive function, not a regex-based one). The current first 'parser'
stage is not a full parser, but certain aspects of the processing do use proper
parsing: templates can be nested correctly, and the magic words work fine.
We could do a huge upgrade and put in some sort of tree-based parsing
everywhere, but that is too idealistic to be feasible, at least, I suspect,
without a major backward-compatibility-breaking release.

The #tag magic word 'circumventing' the anti-nesting code is not a bug per se;
it actually shows the way to fix the problem. What we are trying to do is fix a
parser with regexes, which does not work, when we already have a recursive
parser stage (templates and magic words). Exploit the #tag method; don't do
more regexp kludging to avoid it. As we can see by the use of reflist in
preference to references in so many articles, there is nothing against wrapping
naked (badly parsed) XML tags in (properly parsed) templates, so that is what
we should do: say as policy to users "If you don't want to nest, the old XML
syntax works and is good, but if you want to nest, use this template instead of
putting low-level magic words in your wikitext". The template then wraps the
XML tag. That is a 'better' solution, without re-inventing the wheel or
changing the parser. It still leaves the problem of numbering, though.

The way we currently number up is to assign a number straight away,
sequentially. This does not work with recursion, because

(1) the inner elements get assigned their numbering first (part of the problem
in the first example in the URL I originally gave above); and
(2) the inner element may be in a footnote, so should in fact be numbered way
later (better shown by my second example).

The only way we can fix this is to delay the numbering to a later stage of
processing (it is logically impossible to assign numbers immediately and not
get these problems). The question is at what stage to do the numbering.
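
To make the ordering problem concrete, here is a toy model in Python (not the
Cite extension's actual PHP; the page structure and the function names
number_eagerly and number_deferred are made up for illustration). It assumes
nested content is expanded before the ref that contains it, and that a ref
nested in a footnote should be numbered after every ref in the body.

```python
from itertools import count

# Each ref is modelled as (name, [refs nested inside its footnote text]).
# Hypothetical page: "outer" contains "inner"; "next" follows in the body.
page = [("outer", [("inner", [])]), ("next", [])]


def number_eagerly(refs):
    """Assign a number the moment each ref is processed (current behaviour)."""
    numbers = {}
    counter = count(1)

    def expand(ref):
        name, nested = ref
        for child in nested:       # inner content is expanded first...
            expand(child)
        numbers[name] = next(counter)  # ...so it takes the lower number

    for ref in refs:
        expand(ref)
    return numbers


def number_deferred(refs):
    """Collect every ref first, then number body refs before nested ones."""
    queue = list(refs)             # refs in the body, in reading order
    order = []
    i = 0
    while i < len(queue):
        name, nested = queue[i]
        order.append(name)
        queue.extend(nested)       # nested refs go to the end of the list
        i += 1
    return {name: n for n, name in enumerate(order, start=1)}


eager = number_eagerly(page)       # inner=1, outer=2, next=3
deferred = number_deferred(page)   # outer=1, next=2, inner=3
```

With eager numbering the nested ref takes number 1 ahead of the ref that
contains it; deferring the numbering is what lets it move to the end.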

Now, I have made a patch that reworks the processing of refs a little, at
bug 16294. I did not have this particular problem in mind at the time, but I
think the approach is relevant here. The idea in that fix is that, instead of
the full link (with a sup and [] etc.), only a dummy tag with an internal id is
written out initially; we then run through the page with a regexp stage later
and put in the correct text. This is relevant to the current bug because only
relatively small changes to my bug 16294 patch are needed to fix the numbering
issues here.

What it would involve is writing out a 'fixed' number in the second pass
instead of the initially allocated one. We correct the numbering in the
references tags by writing out the refs 'out of order', checking for each ref
whether its id tag has already been written out (which will not be the case for
refs inside footnotes that have not yet been written out); these we place at
the end of the list. The regexp code copies this behaviour, inserting the same
fixed numbers when it updates the dummy tags written out by the parser stage.
(Note: again for want of a full parser, and since some pages put ols and uls in
refs, we must write out the references tag contents at the first pass.)
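
As a rough sketch of that two-pass idea (Python rather than the extension's
PHP, and not the code from the bug 16294 patch): the first pass is assumed to
have left dummy markers in the text, and a later regexp stage decides the
final order and fills in the numbers. The marker format [[REF:id]], the
function finalize and the sample data are all hypothetical.

```python
import re

# Hypothetical dummy-tag format left behind by the first pass.
MARKER = re.compile(r"\[\[REF:(\w+)\]\]")


def finalize(body, footnotes):
    """Second pass: fix the numbering, then replace every dummy marker.

    body      -- page text containing dummy markers
    footnotes -- dict mapping ref id to footnote text (may contain markers)
    """
    order = []

    def visit(text):
        for ref_id in MARKER.findall(text):
            if ref_id not in order:
                order.append(ref_id)

    visit(body)                    # refs seen in the body come first
    i = 0
    while i < len(order):          # refs found only inside footnotes
        visit(footnotes[order[i]])  # are appended at the end of the list
        i += 1

    number = {ref_id: n for n, ref_id in enumerate(order, start=1)}

    def fill(match):
        return "[%d]" % number[match.group(1)]

    new_body = MARKER.sub(fill, body)
    notes = ["%d. %s" % (number[r], MARKER.sub(fill, footnotes[r]))
             for r in order]
    return new_body, notes


body = "A[[REF:a]] B[[REF:b]]"
footnotes = {"a": "note a, see[[REF:c]]", "b": "note b", "c": "note c"}
new_body, notes = finalize(body, footnotes)
```

Here ref c appears only inside footnote a, so it is numbered 3, after both body
refs, and the same number is used in the body text and in the footnote list.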

I think this approach could work, and tinkered with it for about an hour, but
did not get it done. I now don't have enough time to look into it fully, and I
see Rupert has assigned the bug to himself, so good luck. If these thoughts are
not helpful, ignore them. However, I do think that kludging a way to disable
#tag is a bad idea (even the nasty regexp detecting unclosed refs is just to
help usability and avoid bad wikitext, not to actually process good wikitext).

My ideas in summary: exploit the recursive nature of the proper parser that
handles magic words and templates, rather than kludging to eliminate it; delay
writing out the final numbering to the page until we have made a pass over the
whole thing and know what the correct order is; possibly do this using a regexp
stage in the later parsing, as at bug 16294.

