[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Protect language converter markup in the preprocessor (take 2).

2017-05-23 Thread jenkins-bot (Code Review)
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/333997 )

Change subject: Protect language converter markup in the preprocessor (take 2).


Protect language converter markup in the preprocessor (take 2).

This revises 28774022769d2273be16c6c6e1cca710a1fd97ef, which was
reverted in master due to unexpected issues with `-{{...}}` markup
on translatewiki and enwiki.  Test cases are added to ensure that this
is parsed as a template, not as language converter markup.

https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; this will be updated after this
patch is merged.  The basic principles described in that page are
maintained in this patch:

* Rightmost opening structure has precedence: `-{{` is parsed as a
dash followed by template opening.

* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
`-{` `{{{` since we first grab the rightmost `{{{`.
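
The two principles above can be illustrated with a small standalone
sketch. This is not the code in Preprocessor_DOM.php: the helper name
`splitOpeningRun()` is hypothetical, and it looks only at a run of
opening characters, whereas the real preprocessor also takes the
available closing braces into account when it resolves a run.

```php
<?php
// Illustrative sketch only: NOT the code in Preprocessor_DOM.php.
// splitOpeningRun() is a hypothetical helper name.

/**
 * Split a run such as "-{{{{" (an optional dash followed by one or
 * more opening braces) into opening tokens, working right to left.
 *
 * @param string $run
 * @return string[] tokens in left-to-right order
 */
function splitOpeningRun( string $run ): array {
    if ( !preg_match( '/^-?\{+\z/', $run ) ) {
        throw new InvalidArgumentException( "Unexpected run: $run" );
    }
    $hasDash = $run[0] === '-';
    $braces = strlen( $run ) - ( $hasDash ? 1 : 0 );
    $tokens = [];

    // `{{{` has precedence: grab it from the right as often as possible.
    while ( $braces >= 3 ) {
        array_unshift( $tokens, '{{{' );
        $braces -= 3;
    }
    // Two braces left over form a template opening.
    if ( $braces === 2 ) {
        array_unshift( $tokens, '{{' );
        $braces = 0;
    }
    if ( $hasDash && $braces === 1 ) {
        // Exactly one brace remains next to the dash: `-{`.
        array_unshift( $tokens, '-{' );
    } else {
        // Otherwise the leftovers are treated as literal text.
        if ( $braces === 1 ) {
            array_unshift( $tokens, '{' );
        }
        if ( $hasDash ) {
            array_unshift( $tokens, '-' );
        }
    }
    return $tokens;
}

var_dump( splitOpeningRun( '-{{' ) === [ '-', '{{' ] );      // bool(true)
var_dump( splitOpeningRun( '-{{{{' ) === [ '-{', '{{{' ] );  // bool(true)
```

Under these assumptions `-{{` yields a literal dash plus a template
opening, and `-{{{{` yields `-{` followed by `{{{`, matching the two
cases named in the list.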

A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.

This patch introduced some minor incompatibilities in existing
markup, in particular with chemical formulae in templates.
Fixes for these are being tracked at
https://www.mediawiki.org/wiki/Parsoid/Language_conversion/Preprocessor_fixups

Bug: T146304
Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
---
M RELEASE-NOTES-1.30
M includes/parser/Preprocessor.php
M includes/parser/Preprocessor_DOM.php
M includes/parser/Preprocessor_Hash.php
M tests/parser/parserTests.txt
5 files changed, 444 insertions(+), 35 deletions(-)

Approvals:
  Reedy: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/RELEASE-NOTES-1.30 b/RELEASE-NOTES-1.30
index aa583b8..97356fd 100644
--- a/RELEASE-NOTES-1.30
+++ b/RELEASE-NOTES-1.30
@@ -79,6 +79,10 @@
   deprecated in 1.24) were removed.
 * wfMemcKey() and wfGlobalCacheKey() were deprecated. ObjectCache::makeKey() and
   ObjectCache::makeGlobalKey() should be used instead.
+* (T146304) Preprocessor handling of LanguageConverter markup has been improved.
+  As a result of the new uniform handling, '-{' may need to be escaped
+  (for example, as '-&#123;') where it occurs inside template arguments
+  or wikilinks.
 
 == Compatibility ==
 MediaWiki 1.30 requires PHP 5.5.9 or later. There is experimental support for
diff --git a/includes/parser/Preprocessor.php b/includes/parser/Preprocessor.php
index 426b550..cb8e3a7 100644
--- a/includes/parser/Preprocessor.php
+++ b/includes/parser/Preprocessor.php
@@ -51,9 +51,9 @@
],
'-{' => [
'end' => '}-',
-   'names' => [ 1 => null ],
-   'min' => 1,
-   'max' => 1,
+   'names' => [ 2 => null ],
+   'min' => 2,
+   'max' => 2,
],
];
 
diff --git a/includes/parser/Preprocessor_DOM.php b/includes/parser/Preprocessor_DOM.php
index b93c617..7539307 100644
--- a/includes/parser/Preprocessor_DOM.php
+++ b/includes/parser/Preprocessor_DOM.php
@@ -223,8 +223,7 @@
 
$searchBase = "[{<\n"; # }
if ( !$wgDisableLangConversion ) {
-   // FIXME: disabled due to T153761
-   // $searchBase .= '-';
+   $searchBase .= '-';
}
 
// For fast reverse searches
@@ -277,6 +276,13 @@
$search = $searchBase;
if ( $stack->top === false ) {
$currentClosing = '';
+   } elseif (
+   $stack->top->close === '}-' &&
+   $stack->top->count > 2
+   ) {
+   # adjust closing for -{{{...{{
+   $currentClosing = '}';
+   $search .= $currentClosing;
} else {
$currentClosing = $stack->top->close;
$search .= $currentClosing;
@@ -333,11 +339,15 @@
} elseif ( isset( $this->rules[$curChar] ) ) {
$found = 'open';
$rule = $this->rules[$curChar];
-   } elseif ( $curChar == '-' ) {
-   $found = 'dash';
} else {
-   # Some versions of PHP have a strcspn which stops on null characters
-   # Ignore and continue
+   

[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Protect language converter markup in the preprocessor (take 2).

2017-01-24 Thread C. Scott Ananian (Code Review)
C. Scott Ananian has uploaded a new change for review. ( https://gerrit.wikimedia.org/r/333997 )

Change subject: Protect language converter markup in the preprocessor (take 2).

Protect language converter markup in the preprocessor (take 2).

This revises 28774022769d2273be16c6c6e1cca710a1fd97ef, which was
reverted in master due to unexpected issues with `-{{...}}` markup
on translatewiki and enwiki.  Test cases are added to ensure that this
is parsed as a template, not as language converter markup.

https://www.mediawiki.org/wiki/Preprocessor_ABNF is the canonical
documentation for the preprocessor; this will be updated after this
patch is merged.  The basic principles described in that page are
maintained in this patch:

* Rightmost opening structure has precedence: `-{{` is parsed as a
dash followed by template opening.

* `{{{` has precedence over `{{` and `-{`: `-{{{{` is parsed as
`-{` `{{{` since we first grab the rightmost `{{{`.

A bunch of test cases were added to verify the "ideal precedence"
order described on that wiki page.

Bug: T153761
Change-Id: I2f0c186c75e392c95e1a3d89266cae2586349150
---
M includes/parser/Preprocessor.php
M includes/parser/Preprocessor_DOM.php
M includes/parser/Preprocessor_Hash.php
M tests/parser/parserTests.txt
4 files changed, 245 insertions(+), 35 deletions(-)


  git pull ssh://gerrit.wikimedia.org:29418/mediawiki/core refs/changes/97/333997/1

diff --git a/includes/parser/Preprocessor.php b/includes/parser/Preprocessor.php
index 426b550..cb8e3a7 100644
--- a/includes/parser/Preprocessor.php
+++ b/includes/parser/Preprocessor.php
@@ -51,9 +51,9 @@
],
'-{' => [
'end' => '}-',
-   'names' => [ 1 => null ],
-   'min' => 1,
-   'max' => 1,
+   'names' => [ 2 => null ],
+   'min' => 2,
+   'max' => 2,
],
];
 
diff --git a/includes/parser/Preprocessor_DOM.php b/includes/parser/Preprocessor_DOM.php
index 661318b..3cdd38c 100644
--- a/includes/parser/Preprocessor_DOM.php
+++ b/includes/parser/Preprocessor_DOM.php
@@ -223,8 +223,7 @@
 
$searchBase = "[{<\n"; # }
if ( !$wgDisableLangConversion ) {
-   // FIXME: disabled due to T153761
-   // $searchBase .= '-';
+   $searchBase .= '-';
}
 
// For fast reverse searches
@@ -277,6 +276,13 @@
$search = $searchBase;
if ( $stack->top === false ) {
$currentClosing = '';
+   } else if (
+   $stack->top->close === '}-' &&
+   $stack->top->count > 2
+   ) {
+   # adjust closing for -{{{...{{
+   $currentClosing = '}';
+   $search .= $currentClosing;
} else {
$currentClosing = $stack->top->close;
$search .= $currentClosing;
@@ -333,11 +339,15 @@
} elseif ( isset( $this->rules[$curChar] ) ) {
$found = 'open';
$rule = $this->rules[$curChar];
-   } elseif ( $curChar == '-' ) {
-   $found = 'dash';
} else {
-   # Some versions of PHP have a strcspn which stops on null characters
-   # Ignore and continue
+   # Some versions of PHP have a strcspn which stops on
+   # null characters; ignore these and continue.
+   # We also may get '-' and '}' characters here which
+   # don't match -{ or $currentClosing.  Add these to
+   # output and continue.
+   if ( $curChar == '-' || $curChar == '}' ) {
+   $accum .= $curChar;
+   }
++$i;
continue;
}
@@ -615,7 +625,10 @@
} elseif ( $found == 'open' ) {
# count opening brace 

[MediaWiki-commits] [Gerrit] mediawiki/core[master]: Protect language converter markup in the preprocessor.

2016-12-15 Thread jenkins-bot (Code Review)
jenkins-bot has submitted this change and it was merged. ( https://gerrit.wikimedia.org/r/311849 )

Change subject: Protect language converter markup in the preprocessor.


Protect language converter markup in the preprocessor.

This ensures that `{{echo|-{R|foo}-}}` is parsed correctly as
a template invocation with a single argument, not as two separate
arguments split by the `|`.
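
As a rough illustration of the intended behaviour, the sketch below
splits the inside of a template call on `|` while treating anything
between `-{` and `}-` as protected. It is a simplified stand-in, not
the preprocessor itself: `splitTopLevelArgs()` is a hypothetical
helper, and it ignores nested templates, links and extension tags.

```php
<?php
// Illustrative sketch only: NOT the MediaWiki preprocessor.
// splitTopLevelArgs() is a hypothetical helper name.

/**
 * Split template contents on "|", skipping pipes that sit inside
 * language converter markup (-{ ... }-).
 *
 * @param string $inner text between "{{" and "}}"
 * @return string[] template name followed by its arguments
 */
function splitTopLevelArgs( string $inner ): array {
    $parts = [ '' ];
    $depth = 0; // nesting depth of -{ ... }-
    $len = strlen( $inner );
    for ( $i = 0; $i < $len; $i++ ) {
        $last = count( $parts ) - 1;
        $two = substr( $inner, $i, 2 );
        if ( $two === '-{' ) {
            // Entering language converter markup.
            $depth++;
            $parts[$last] .= $two;
            $i++;
        } elseif ( $two === '}-' && $depth > 0 ) {
            // Leaving language converter markup.
            $depth--;
            $parts[$last] .= $two;
            $i++;
        } elseif ( $inner[$i] === '|' && $depth === 0 ) {
            // A top-level pipe starts a new argument.
            $parts[] = '';
        } else {
            $parts[$last] .= $inner[$i];
        }
    }
    return $parts;
}

var_dump( splitTopLevelArgs( 'echo|-{R|foo}-' ) === [ 'echo', '-{R|foo}-' ] ); // bool(true)
```

With the protection in place, `echo` receives the single argument
`-{R|foo}-`; without the depth check the inner `|` would split it in
two.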

Bug: T146304
Change-Id: I709d007c70a3fd19264790055042c615999b2f67
---
M includes/parser/Preprocessor.php
M includes/parser/Preprocessor_DOM.php
M includes/parser/Preprocessor_Hash.php
M tests/parser/parserTests.txt
4 files changed, 76 insertions(+), 13 deletions(-)

Approvals:
  Tim Starling: Looks good to me, approved
  jenkins-bot: Verified



diff --git a/includes/parser/Preprocessor.php b/includes/parser/Preprocessor.php
index cc98abd..426b550 100644
--- a/includes/parser/Preprocessor.php
+++ b/includes/parser/Preprocessor.php
@@ -48,7 +48,13 @@
'names' => [ 2 => null ],
'min' => 2,
'max' => 2,
-   ]
+   ],
+   '-{' => [
+   'end' => '}-',
+   'names' => [ 1 => null ],
+   'min' => 1,
+   'max' => 1,
+   ],
];
 
/**
diff --git a/includes/parser/Preprocessor_DOM.php b/includes/parser/Preprocessor_DOM.php
index 5da7cd7..950d66d 100644
--- a/includes/parser/Preprocessor_DOM.php
+++ b/includes/parser/Preprocessor_DOM.php
@@ -193,6 +193,8 @@
 * @return string
 */
public function preprocessToXml( $text, $flags = 0 ) {
+   global $wgDisableLangConversion;
+
$forInclusion = $flags & Parser::PTD_FOR_INCLUSION;
 
$xmlishElements = $this->parser->getStripList();
@@ -220,6 +222,10 @@
$stack = new PPDStack;
 
$searchBase = "[{<\n"; # }
+   if ( !$wgDisableLangConversion ) {
+   $searchBase .= '-';
+   }
+
// For fast reverse searches
$revText = strrev( $text );
$lengthText = strlen( $text );
@@ -298,7 +304,10 @@
break;
}
} else {
-   $curChar = $text[$i];
+   $curChar = $curTwoChar = $text[$i];
+   if ( ( $i + 1 ) < $lengthText ) {
+   $curTwoChar .= $text[$i + 1];
+   }
if ( $curChar == '|' ) {
$found = 'pipe';
} elseif ( $curChar == '=' ) {
@@ -311,11 +320,20 @@
} else {
$found = 'line-start';
}
+   } elseif ( $curTwoChar == $currentClosing ) {
+   $found = 'close';
+   $curChar = $curTwoChar;
} elseif ( $curChar == $currentClosing ) {
$found = 'close';
+   } elseif ( isset( $this->rules[$curTwoChar] ) ) {
+   $curChar = $curTwoChar;
+   $found = 'open';
+   $rule = $this->rules[$curChar];
} elseif ( isset( $this->rules[$curChar] ) ) {
$found = 'open';
$rule = $this->rules[$curChar];
+   } elseif ( $curChar == '-' ) {
+   $found = 'dash';
} else {
# Some versions of PHP have a strcspn which stops on null characters
# Ignore and continue
@@ -595,7 +613,8 @@
// input pointer.
} elseif ( $found == 'open' ) {
# count opening brace characters
-   $count = strspn( $text, $curChar, $i );
+   $curLen = strlen( $curChar );
+   $count = ( $curLen > 1 ) ? 1 : strspn( $text, $curChar, $i );
 
# we need to add to stack only if opening brace count is enough for one of the