https://bugzilla.wikimedia.org/show_bug.cgi?id=18609

           Summary: Search index text is empty if page contains unmatched
                    "<"
           Product: MediaWiki
           Version: 1.15-svn
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: Normal
         Component: Search
        AssignedTo: wikibugs-l@lists.wikimedia.org
        ReportedBy: neph...@skyhighway.com


Created an attachment (id=6067)
 --> (https://bugzilla.wikimedia.org/attachment.cgi?id=6067)
Fix broken regexp in SearchUpdate.php (patch to r49794)

If an article contains a "<" symbol and there is no subsequent ">" symbol
anywhere in the article, the si_text field for that article in the searchindex
table ends up completely empty -- even the text in the article before the "<"
symbol is wiped out.  It is therefore impossible to search on any of the
article's contents.

For example, http://www.uesp.net/wiki/UESPWiki:Mirror_Plan is currently
triggering this bug; si_text is being set to ''. Although UESP is currently
running MediaWiki 1.10, the same bug occurs when the article is added to a test
wiki running r49794.

The basic problem is an incorrect pair of parentheses in a preg_replace
expression in doUpdate() in SearchUpdate.php.  The attached patch removes
those parentheses; I also did some secondary cleanup of the expression by
deleting some redundant chunks ("[A-Za-z0-9]*\\s*" is all covered equally well
by "[^>]*?", and the simpler expression doesn't mislead editors).  The revised
regexp successfully processes UESPWiki:Mirror_Plan, and also successfully
processes some test pages containing HTML tags.
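
As a rough illustration of the redundancy claim above, the sketch below checks
that a pattern with the "[A-Za-z0-9]*\\s*" chunk and the parentheses gives the
same result as one without them. Both patterns are assumptions reconstructed
from the fragments quoted in this report, not copied from SearchUpdate.php or
the attached patch, and Python's re module stands in for PHP's preg_replace,
so this does not reproduce the empty si_text failure itself.

```python
import re

# ASSUMPTION: both patterns are reconstructed from the fragments quoted in
# the report; the real expressions are in SearchUpdate.php and the patch.
ORIGINAL = re.compile(r"</?\s*[A-Za-z][A-Za-z0-9]*\s*([^>]*?)>")
SIMPLIFIED = re.compile(r"</?\s*[A-Za-z][^>]*?>")


def strip_tags(pattern, text):
    """Replace every tag-like match of `pattern` in `text` with one space."""
    return pattern.sub(" ", text)


samples = [
    "plain text with <b>bold</b> and <a href='x'>a link</a>",
    "line one<br />line two",
    "if a < b then stop",                   # unmatched "<", no ">" anywhere
    "before <span class='x' never closed",  # unmatched "<" followed by a letter
]

for sample in samples:
    # The simplified pattern should strip exactly what the original strips.
    assert strip_tags(ORIGINAL, sample) == strip_tags(SIMPLIFIED, sample)

# With no ">" after the "<", nothing matches and the text is left intact.
assert strip_tags(SIMPLIFIED, "if a < b then stop") == "if a < b then stop"
```

The comparison holds because "[A-Za-z0-9]*" and "\\s*" can only consume
characters that "[^>]*?" would accept anyway, so dropping them does not change
which strings match.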


-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
You are on the CC list for the bug.

_______________________________________________
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
