https://bugzilla.wikimedia.org/show_bug.cgi?id=32712
Web browser: ---
Bug #: 32712
Summary: External links surrounded by unicode quotation marks
break search index
Product: MediaWiki
Version: 1.18
Platform: All
OS/Version: All
Status: NEW
Severity: major
Priority: Unprioritized
Component: Search
AssignedTo: [email protected]
ReportedBy: [email protected]
Classification: Unclassified
When a page contains an external link surrounded by Unicode quotation
marks (U+201E DOUBLE LOW-9 QUOTATION MARK and U+201C LEFT DOUBLE QUOTATION
MARK), the article's entry in the searchindex table (field si_text) is
an empty string.
To reproduce: add the following text to an article, then save the article or
update the full-text index:
„http://example.com“
I've done some investigation.
The first problem arises in includes/search/SearchUpdate.php, starting at
line 64, where external URLs are supposed to be stripped. preg_replace
destroys the trailing quotation mark and leaves an illegal Unicode sequence
in $text. At some later stage of processing, $text is truncated to an empty
string, presumably because of that illegal sequence.