[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Protect language converter markup in the preprocessor (take 2).
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/333997 )

Change subject: Protect language converter markup in the preprocessor (take 2).
..

Protect language converter markup in the preprocessor (take 2).

This revises 28774022769d2273be16c6c6e1cca710a1fd97ef, which was
reverted in master due to unexpected issues with `-{{...}}` markup on
translatewiki and enwiki. Test cases are added to ensure that this is
parsed as a template, not as language converter markup.

https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; this will be updated after this
patch is merged. The basic principles described in that page are
maintained in this patch:

* Rightmost opening structure has precedence: `-{{` is parsed as a
  dash followed by template opening.

* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
  `-{` `{{{` since we first grab the rightmost `{{{`.

A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.

This patch introduced some minor incompatibilities in existing
markup, in particular with chemical formulae in templates. Fixes
for these are being tracked at
https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups

Bug: T146304
Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
---
M RELEASE-NOTES-1.30
M includes/parser/Preprocessor.php
M includes/parser/Preprocessor_DOM.php
M includes/parser/Preprocessor_Hash.php
M tests/parser/parserTests.txt
5 files changed, 444 insertions(+), 35 deletions(-)

Approvals:
  Reedy: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/RELEASE-NOTES-1.30 b/RELEASE-NOTES-1.30
index aa583b8..97356fd 100644
--- a/RELEASE-NOTES-1.30
+++ b/RELEASE-NOTES-1.30
@@ -79,6 +79,10 @@
   deprecated in 1.24) were removed.
 * wfMemcKey() and wfGlobalCacheKey() were deprecated. ObjectCache::makeKey() and
   ObjectCache::makeGlobalKey() should be used instead.
+* (T146304) Preprocessor handling of LanguageConverter markup has been improved.
+  As a result of the new uniform handling, '-{' may need to be escaped
+  (for example, as '-<nowiki/>{') where it occurs inside template arguments
+  or wikilinks.

 == Compatibility ==
 MediaWiki 1.30 requires PHP 5.5.9 or later. There is experimental support for
diff --git a/includes/parser/Preprocessor.php b/includes/parser/Preprocessor.php
index 426b550..cb8e3a7 100644
--- a/includes/parser/Preprocessor.php
+++ b/includes/parser/Preprocessor.php
@@ -51,9 +51,9 @@
 		],
 		'-{' => [
 			'end' => '}-',
-			'names' => [ 1 => null ],
-			'min' => 1,
-			'max' => 1,
+			'names' => [ 2 => null ],
+			'min' => 2,
+			'max' => 2,
 		],
 	];

diff --git a/includes/parser/Preprocessor_DOM.php b/includes/parser/Preprocessor_DOM.php
index b93c617..7539307 100644
--- a/includes/parser/Preprocessor_DOM.php
+++ b/includes/parser/Preprocessor_DOM.php
@@ -223,8 +223,7 @@
 		$searchBase = "[{<\n"; # }
 		if ( !$wgDisableLangConversion ) {
-			// FIXME: disabled due to T153761
-			// $searchBase .= '-';
+			$searchBase .= '-';
 		}

 		// For fast reverse searches
@@ -277,6 +276,13 @@
 			$search = $searchBase;
 			if ( $stack->top === false ) {
 				$currentClosing = '';
+			} elseif (
+				$stack->top->close === '}-' &&
+				$stack->top->count > 2
+			) {
+				# adjust closing for -{{{...{{
+				$currentClosing = '}';
+				$search .= $currentClosing;
 			} else {
 				$currentClosing = $stack->top->close;
 				$search .= $currentClosing;
@@ -333,11 +339,15 @@
 				} elseif ( isset( $this->rules[$curChar] ) ) {
 					$found = 'open';
 					$rule = $this->rules[$curChar];
-				} elseif ( $curChar == '-' ) {
-					$found = 'dash';
 				} else {
-					# Some versions of PHP have a strcspn which stops on null characters
-					# Ignore and continue
+
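The two precedence rules quoted in the commit message can be illustrated with a small stand-alone sketch. This is not the preprocessor's code: `splitDashBraces()` is a hypothetical helper that looks only at a dash followed by one to four opening braces, whereas the real preprocessor also takes the matching closing delimiters into account. The results for one and three braces are an extrapolation from the stated rules, not something the commit message spells out.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): split "-" followed by
 * $braces "{" characters into opening tokens, following the two rules
 * from the commit message. Closing delimiters are ignored entirely.
 */
function splitDashBraces( int $braces ): array {
	if ( $braces < 1 || $braces > 4 ) {
		throw new InvalidArgumentException( 'sketch only covers 1 to 4 braces' );
	}
	$tokens = [];
	if ( $braces >= 3 ) {
		// "{{{" has precedence, so grab the rightmost "{{{" first.
		array_unshift( $tokens, '{{{' );
		$braces -= 3;
	} elseif ( $braces === 2 ) {
		// Rightmost opening structure wins: "{{" is a template opening.
		array_unshift( $tokens, '{{' );
		$braces -= 2;
	}
	// Whatever is left is either a bare dash or the "-{" opening.
	array_unshift( $tokens, '-' . str_repeat( '{', $braces ) );
	return $tokens;
}

foreach ( [ 1, 2, 3, 4 ] as $n ) {
	echo '-' . str_repeat( '{', $n ), ' => ',
		implode( ' ', splitDashBraces( $n ) ), "\n";
}
```

For two braces the sketch yields a dash followed by `{{`, and for four braces it yields `-{` followed by `{{{`, which are the two cases the commit message describes.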
[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Protect language converter markup in the preprocessor (take 2).
C. Scott Ananian has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/333997 )

Change subject: Protect language converter markup in the preprocessor (take 2).
..

Protect language converter markup in the preprocessor (take 2).

This revises 28774022769d2273be16c6c6e1cca710a1fd97ef, which was
reverted in master due to unexpected issues with `-{{...}}` markup on
translatewiki and enwiki. Test cases are added to ensure that this is
parsed as a template, not as language converter markup.

https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; this will be updated after this
patch is merged. The basic principles described in that page are
maintained in this patch:

* Rightmost opening structure has precedence: `-{{` is parsed as a
  dash followed by template opening.

* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
  `-{` `{{{` since we first grab the rightmost `{{{`.

A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.

Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
---
M includes/parser/Preprocessor.php
M includes/parser/Preprocessor_DOM.php
M includes/parser/Preprocessor_Hash.php
M tests/parser/parserTests.txt
4 files changed, 245 insertions(+), 35 deletions(-)

  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/97/333997/1

diff --git a/includes/parser/Preprocessor.php b/includes/parser/Preprocessor.php
index 426b550..cb8e3a7 100644
--- a/includes/parser/Preprocessor.php
+++ b/includes/parser/Preprocessor.php
@@ -51,9 +51,9 @@
 		],
 		'-{' => [
 			'end' => '}-',
-			'names' => [ 1 => null ],
-			'min' => 1,
-			'max' => 1,
+			'names' => [ 2 => null ],
+			'min' => 2,
+			'max' => 2,
 		],
 	];

diff --git a/includes/parser/Preprocessor_DOM.php b/includes/parser/Preprocessor_DOM.php
index 661318b..3cdd38c 100644
--- a/includes/parser/Preprocessor_DOM.php
+++ b/includes/parser/Preprocessor_DOM.php
@@ -223,8 +223,7 @@
 		$searchBase = "[{<\n"; # }
 		if ( !$wgDisableLangConversion ) {
-			// FIXME: disabled due to T153761
-			// $searchBase .= '-';
+			$searchBase .= '-';
 		}

 		// For fast reverse searches
@@ -277,6 +276,13 @@
 			$search = $searchBase;
 			if ( $stack->top === false ) {
 				$currentClosing = '';
+			} else if (
+				$stack->top->close === '}-' &&
+				$stack->top->count > 2
+			) {
+				# adjust closing for -{{{...{{
+				$currentClosing = '}';
+				$search .= $currentClosing;
 			} else {
 				$currentClosing = $stack->top->close;
 				$search .= $currentClosing;
@@ -333,11 +339,15 @@
 				} elseif ( isset( $this->rules[$curChar] ) ) {
 					$found = 'open';
 					$rule = $this->rules[$curChar];
-				} elseif ( $curChar == '-' ) {
-					$found = 'dash';
 				} else {
-					# Some versions of PHP have a strcspn which stops on null characters
-					# Ignore and continue
+					# Some versions of PHP have a strcspn which stops on
+					# null characters; ignore these and continue.
+					# We also may get '-' and '}' characters here which
+					# don't match -{ or $currentClosing. Add these to
+					# output and continue.
+					if ( $curChar == '-' || $curChar == '}' ) {
+						$accum .= $curChar;
+					}
 					++$i;
 					continue;
 				}
@@ -615,7 +625,10 @@
 			} elseif ( $found == 'open' ) {
 				# count opening brace
[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Protect language converter markup in the preprocessor.
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/311849 )

Change subject: Protect language converter markup in the preprocessor.
..

Protect language converter markup in the preprocessor.

This ensures that `{{echo|-{R|foo}-}}` is parsed correctly as a template
invocation with a single argument, not as two separate arguments split
by the `|`.

Bug: T146304
Change-Id: I709d007c70a3fd19264790055042c615999b2f67
---
M includes/parser/Preprocessor.php
M includes/parser/Preprocessor_DOM.php
M includes/parser/Preprocessor_Hash.php
M tests/parser/parserTests.txt
4 files changed, 76 insertions(+), 13 deletions(-)

Approvals:
  Tim Starling: Looks good to me, approved
  jenkins-bot: Verified

diff --git a/includes/parser/Preprocessor.php b/includes/parser/Preprocessor.php
index cc98abd..426b550 100644
--- a/includes/parser/Preprocessor.php
+++ b/includes/parser/Preprocessor.php
@@ -48,7 +48,13 @@
 			'names' => [ 2 => null ],
 			'min' => 2,
 			'max' => 2,
-		]
+		],
+		'-{' => [
+			'end' => '}-',
+			'names' => [ 1 => null ],
+			'min' => 1,
+			'max' => 1,
+		],
 	];

 	/**
diff --git a/includes/parser/Preprocessor_DOM.php b/includes/parser/Preprocessor_DOM.php
index 5da7cd7..950d66d 100644
--- a/includes/parser/Preprocessor_DOM.php
+++ b/includes/parser/Preprocessor_DOM.php
@@ -193,6 +193,8 @@
 	 * @return string
 	 */
 	public function preprocessToXml( $text, $flags = 0 ) {
+		global $wgDisableLangConversion;
+
 		$forInclusion = $flags & Parser::PTD_FOR_INCLUSION;

 		$xmlishElements = $this->parser->getStripList();
@@ -220,6 +222,10 @@
 		$stack = new PPDStack;

 		$searchBase = "[{<\n"; # }
+		if ( !$wgDisableLangConversion ) {
+			$searchBase .= '-';
+		}
+
 		// For fast reverse searches
 		$revText = strrev( $text );
 		$lengthText = strlen( $text );
@@ -298,7 +304,10 @@
 						break;
 					}
 				} else {
-					$curChar = $text[$i];
+					$curChar = $curTwoChar = $text[$i];
+					if ( ( $i + 1 ) < $lengthText ) {
+						$curTwoChar .= $text[$i + 1];
+					}
 					if ( $curChar == '|' ) {
 						$found = 'pipe';
 					} elseif ( $curChar == '=' ) {
@@ -311,11 +320,20 @@
 						} else {
 							$found = 'line-start';
 						}
+					} elseif ( $curTwoChar == $currentClosing ) {
+						$found = 'close';
+						$curChar = $curTwoChar;
 					} elseif ( $curChar == $currentClosing ) {
 						$found = 'close';
+					} elseif ( isset( $this->rules[$curTwoChar] ) ) {
+						$curChar = $curTwoChar;
+						$found = 'open';
+						$rule = $this->rules[$curChar];
 					} elseif ( isset( $this->rules[$curChar] ) ) {
 						$found = 'open';
 						$rule = $this->rules[$curChar];
+					} elseif ( $curChar == '-' ) {
+						$found = 'dash';
 					} else {
 						# Some versions of PHP have a strcspn which stops on null characters
 						# Ignore and continue
@@ -595,7 +613,8 @@
 				// input pointer.
 			} elseif ( $found == 'open' ) {
 				# count opening brace characters
-				$count = strspn( $text, $curChar, $i );
+				$curLen = strlen( $curChar );
+				$count = ( $curLen > 1 ) ? 1 : strspn( $text, $curChar, $i );

 				# we need to add to stack only if opening brace count is enough for one of the
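The effect described in the commit message, that the `|` inside `-{R|foo}-` must not split template arguments, can be shown with a minimal stand-alone sketch. It borrows the two-character lookahead idea from the `$curTwoChar` change but is not the preprocessor's implementation: `splitArgsProtectingConverter()` is a hypothetical helper that ignores nested templates, links, and extension tags.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): split the inside of a
 * template invocation on "|", leaving any "|" between "-{" and "}-"
 * untouched.
 */
function splitArgsProtectingConverter( string $inner ): array {
	$parts = [];
	$current = '';
	$depth = 0; // how many "-{" openings are currently unclosed
	$len = strlen( $inner );
	for ( $i = 0; $i < $len; $i++ ) {
		// Look at two characters at once, like $curTwoChar in the diff.
		$two = substr( $inner, $i, 2 );
		if ( $two === '-{' ) {
			$depth++;
			$current .= $two;
			$i++; // consume the second character as well
		} elseif ( $two === '}-' && $depth > 0 ) {
			$depth--;
			$current .= $two;
			$i++;
		} elseif ( $inner[$i] === '|' && $depth === 0 ) {
			// Only an unprotected pipe separates arguments.
			$parts[] = $current;
			$current = '';
		} else {
			$current .= $inner[$i];
		}
	}
	$parts[] = $current;
	return $parts;
}

// Template name followed by its arguments.
print_r( splitArgsProtectingConverter( 'echo|-{R|foo}-' ) );
```

With the input `echo|-{R|foo}-` the sketch returns the template name `echo` and a single argument `-{R|foo}-`, which is the parse the commit message asks for.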