https://bugzilla.wikimedia.org/show_bug.cgi?id=32712

       Web browser: ---
             Bug #: 32712
           Summary: External links surrounded by unicode quotation marks
                    break search index
           Product: MediaWiki
           Version: 1.18
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: Unprioritized
         Component: Search
        AssignedTo: [email protected]
        ReportedBy: [email protected]
    Classification: Unclassified


When a page contains an external link surrounded by Unicode quotation marks
(U+201E DOUBLE LOW-9 QUOTATION MARK and U+201C LEFT DOUBLE QUOTATION MARK),
the article's entry in the searchindex table (field si_text) becomes an
empty string.

To reproduce: add the following text to an article, then save it (or update
the full-text index):

„http://example.com“


I've done some investigation.
The first problem arises in includes/search/SearchUpdate.php, starting at
line 64, where external URLs are supposed to be stripped. preg_replace
destroys the trailing quotation mark and leaves an invalid Unicode sequence
in $text. At some later stage of processing, $text gets truncated to an empty
string, presumably because of that invalid sequence.
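
The kind of failure described above can be sketched outside MediaWiki. The
following Python snippet is only an illustration under stated assumptions: the
pattern is hypothetical and is NOT the one used in SearchUpdate.php, and the
helper names are made up for this example. It shows how a URL-stripping regex
that works on raw UTF-8 bytes can consume just the first byte of the trailing
quotation mark and leave orphaned continuation bytes, while the same pattern
applied to code points removes the character as a whole.

```python
import re

# Hypothetical URL-stripping pattern (an assumption for illustration, not the
# pattern from SearchUpdate.php): a URL followed by one delimiter character.
URL_CHARS = r"A-Za-z0-9_/:.,~%\-+&;#?!=()@"
PATTERN = rf"https?:[{URL_CHARS}]+[^{URL_CHARS}]"

# The reproduction string from the report: U+201E, the URL, U+201C.
text = "\u201ehttp://example.com\u201c"


def strip_urls_bytewise(s: str) -> bytes:
    """Apply the pattern to raw UTF-8 bytes, so one 'character' is one byte."""
    return re.sub(PATTERN.encode("ascii"), b" ", s.encode("utf-8"))


def strip_urls_unicode(s: str) -> str:
    """Apply the same pattern to code points instead of bytes."""
    return re.sub(PATTERN, " ", s)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


broken = strip_urls_bytewise(text)
print(broken)                     # only the lead byte of U+201C was consumed
print(is_valid_utf8(broken))      # the leftover continuation bytes are invalid
print(strip_urls_unicode(text))   # the whole quotation mark is removed
```

In PHP, the rough equivalent of the code-point variant would be running
preg_replace with the /u modifier; whether that alone fixes this bug has not
been verified here.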
