https://bugzilla.wikimedia.org/show_bug.cgi?id=18609
Summary: Search index text is empty if page contains unmatched "<"
Product: MediaWiki
Version: 1.15-svn
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Normal
Component: Search
AssignedTo: wikibugs-l@lists.wikimedia.org
ReportedBy: neph...@skyhighway.com

Created an attachment (id=6067)
 --> (https://bugzilla.wikimedia.org/attachment.cgi?id=6067)
Fix broken regexp in SearchUpdate.php (patch to r49794)

If an article contains a "<" symbol and there is no subsequent ">" symbol anywhere in the article, the si_text field for that article in the searchindex table ends up completely empty -- even the text in the article before the "<" symbol is wiped out. It is therefore impossible to search on any of the article's contents.

For example, http://www.uesp.net/wiki/UESPWiki:Mirror_Plan is currently triggering this bug; si_text is being set to ''. Although UESP is currently running MW1.10, the same bug occurs if the article is added to a test wiki running r49794.

The basic problem is an incorrect pair of parentheses in a preg_replace expression in SearchUpdate.php::doUpdate(). The attached patch file removes those parentheses; I also did some secondary cleanup of the expression by deleting some redundant chunks ("[A-Za-z0-9]*\\s*" is all covered equally well by "[^>]*?", and the simpler expression doesn't mislead editors). The revised regexp successfully processes UESPWiki:Mirror_Plan, and also successfully processes some test pages containing HTML tags.
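
To illustrate the intended behaviour of the simplified expression, here is a rough sketch. It is not the attached patch: the report does not quote the full expression, so both patterns below are assumptions assembled from the fragments mentioned above, and the sketch uses Python's re module rather than PHP's preg_replace. It only checks that text around an unmatched "<" survives tag stripping and that dropping "[A-Za-z0-9]*\s*" does not change the result on a few samples. It does not reproduce the empty si_text failure itself.

```python
import re

# Hypothetical patterns, reconstructed from the fragments quoted in the report.
# The real expressions in SearchUpdate.php and in the patch may differ.
LONG_TAG_RE = re.compile(r"</?\s*[A-Za-z][A-Za-z0-9]*\s*[^>]*?>")
SHORT_TAG_RE = re.compile(r"</?\s*[A-Za-z][^>]*?>")


def strip_tags(text: str, pattern: re.Pattern = SHORT_TAG_RE) -> str:
    """Replace anything that looks like an HTML tag with a single space."""
    return pattern.sub(" ", text)


samples = [
    "text before <b and no closing bracket",  # unmatched "<"
    "size < 10 GB of disk",                   # "<" used as a comparison sign
    "some <b>bold</b> and <br /> markup",     # ordinary tags
]

for sample in samples:
    short_result = strip_tags(sample, SHORT_TAG_RE)
    long_result = strip_tags(sample, LONG_TAG_RE)
    # The shorter pattern should agree with the longer one on every sample.
    print(repr(short_result), short_result == long_result)
```

With either pattern, the first two samples come back unchanged, because a tag match requires a closing ">" that never appears; only the third sample has its tags replaced by spaces.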