2010-08-12 09:30, Andreas Jonsson skrev:
[...]
>
> However, requiring a link to be properly closed in order to be a link
> is fairly complex. What should the parser do with the link title if
> it decides that it is not really a link title after all? It may
> contain tokens. Thus, the lexer must use lookahead and not produce
> any spurious link-open tokens. To avoid the n^2 worst case, a full
> extra pass to compute hints would be necessary before doing the
> actual lexing.
Replying to myself. I may have been wrong about the complexity of
finding the closing token. The lexer hack below may actually do the
trick: a rule that matches the empty string if there is a valid
closing token ahead. Since it does not search past '[[' tokens, no
content will be scanned more than once by this rule, so the worst-case
running time is still linear.
fragment
LINK_CLOSE_LOOKAHEAD
@init {
    bool success = false;
} :
    (
        ( /*
           * List of all other lexer rules that may contain the strings
           * ']]' or '[['.
           */
          BEGIN_TABLE
        | TABLE_ROW_SEPARATOR
        | TABLE_CELL
        | TABLE_CELL_INLINE
          /*
           * Alternative: don't search beyond other block elements:
           */
//      | ({BOL}?=> '{|')=> '{|' {false}?=>
//      | (LIST_ELEMENT)=> LIST_ELEMENT {false}?=>
//      | (NEWLINE NEWLINE)=> NEWLINE NEWLINE {false}?=>
          /*
           * Otherwise, anything goes except ']]' or '[['.
           */
        | ~('['|']')
        | {!PEEK(2, '[')}?=> '['
        | {!PEEK(2, ']')}?=> ']'
        )+
        (
          ']]' {(success = true), false}?=>
        | {false}?=>
        )
    )
    |
    {success}?=>
    ;
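As a sanity check of the linearity argument, here is a hypothetical Python
sketch of the same lookahead (the function name and the simplified handling
of table rules are my own; the grammar above additionally routes ']]'/'[['
occurring inside table constructs through the corresponding lexer rules):

```python
def has_link_close_ahead(text, pos):
    """Return True if a closing ']]' occurs at or after `pos`
    before the next '[[' opener.

    Mirrors LINK_CLOSE_LOOKAHEAD: the scan never searches past
    another '[[', so the stretches of text examined for distinct
    link openers are disjoint, and each character is scanned at
    most once overall -- hence linear worst-case time.
    """
    i = pos
    n = len(text)
    while i < n:
        if text.startswith('[[', i):
            # Don't search past another link opener; give up here.
            return False
        if text.startswith(']]', i):
            return True
        i += 1
    return False
```

For example, called with `pos` just past an opener, `has_link_close_ahead("[[title]] rest", 2)` succeeds, while `has_link_close_ahead("[[no close [[inner]]", 2)` fails because the scan stops at the second `[[` (whose own lookahead would then succeed independently).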
_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l