https://bugzilla.wikimedia.org/show_bug.cgi?id=39501
Web browser: ---
Bug #: 39501
Summary: Merging Unicode apostrophe-like characters in internal
search
Product: MediaWiki extensions
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: Unprioritized
Component: Lucene Search
AssignedTo: [email protected]
ReportedBy: [email protected]
CC: [email protected]
Classification: Unclassified
Mobile Platform: ---
When searching with the apostrophe character U+0027 (APOSTROPHE, the single
quote available on most keyboards), results should also match other Unicode
apostrophe-like characters, such as the preferred apostrophe U+2019.
In 2009 there was a discussion about "Different apostrophe signs and MediaWiki
internal search"; see
http://www.gossamer-threads.com/lists/wiki/wikitech/169177
This doesn't seem to have been implemented.
This is related to bug 36313 for autocompletion.
Basically, both indexing and searching should convert all apostrophe-like
characters to U+0027. Articles containing U+2019, for example, would then be
matched when searching with U+0027, U+2019 or any of the other apostrophes.
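The symmetry described above (fold at index time, fold again at query time) can be sketched with a toy inverted index. This is only an illustration under stated assumptions: foldApostrophes(), buildIndex() and search() are hypothetical names, not part of MediaWiki or the Lucene Search extension, and only U+2018 and U+2019 are folded to keep it short.

```php
<?php
// Hypothetical helper: fold two apostrophe variants to U+0027.
// The 'u' modifier makes PCRE treat pattern and subject as UTF-8.
function foldApostrophes( string $text ): string {
    return preg_replace( '/[\x{2018}\x{2019}]/u', "'", $text ) ?? $text;
}

// Toy inverted index: folded token => set of page titles.
function buildIndex( array $pages ): array {
    $index = [];
    foreach ( $pages as $title => $body ) {
        $tokens = preg_split( '/\s+/u', foldApostrophes( $body ), -1, PREG_SPLIT_NO_EMPTY );
        foreach ( $tokens as $token ) {
            $index[$token][$title] = true;
        }
    }
    return $index;
}

// The query goes through the same folding as the indexed text.
function search( array $index, string $query ): array {
    return array_keys( $index[foldApostrophes( $query )] ?? [] );
}

// The page text uses U+2019; both query spellings should find it.
$index = buildIndex( [ 'Page A' => "l\u{2019}exemple typographique" ] );
$withAscii = search( $index, "l'exemple" );
$withCurly = search( $index, "l\u{2019}exemple" );
```

Because the stored token and the query are folded by the same function, it does not matter which apostrophe the article or the searcher used.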
From the 2009 discussion, the list of apostrophes was:
U+0027 APOSTROPHE
U+2018 LEFT SINGLE QUOTATION MARK
U+2019 RIGHT SINGLE QUOTATION MARK
U+201B SINGLE HIGH-REVERSED-9 QUOTATION MARK
U+2032 PRIME
U+00B4 ACUTE ACCENT
U+0060 GRAVE ACCENT
U+FF40 FULLWIDTH GRAVE ACCENT
U+FF07 FULLWIDTH APOSTROPHE
I would add other characters for which U+0027 is often used as an accessible
substitute like some modifier letters and saltillo:
U+02B9 MODIFIER LETTER PRIME
U+02BB MODIFIER LETTER TURNED COMMA
U+02BC MODIFIER LETTER APOSTROPHE
U+02BD MODIFIER LETTER REVERSED COMMA
U+02BE MODIFIER LETTER RIGHT HALF RING
U+02BF MODIFIER LETTER LEFT HALF RING
U+0384 GREEK TONOS
U+1FBF GREEK PSILI
U+A78B LATIN CAPITAL LETTER SALTILLO
U+A78C LATIN SMALL LETTER SALTILLO
WebKit-based browsers already do this kind of stripping and merge U+0027,
U+2018, U+2019 and U+FF07. However, there are many cases where merging all the
proposed characters would help regular keyboard input.
The proposed solution in 2009 was to use a strip function:
function stripForSearch( $string ) {
    // Replace U+2019 (UTF-8 bytes E2 80 99) with U+0027, then defer to the parent.
    $s = preg_replace( '/\xe2\x80\x99/', '\'', $string );
    return parent::stripForSearch( $s );
}
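The 2009 function above only replaces U+2019. A minimal sketch of how the same idea might be extended to every character listed in this report follows; the constant APOSTROPHE_LIKE and the function normalizeApostrophes() are hypothetical names chosen for illustration, and how this would be wired into stripForSearch() or the Lucene indexer is left open.

```php
<?php
// Code points copied from the two lists in this report (U+0027 itself is the target).
const APOSTROPHE_LIKE = [
    0x2018, 0x2019, 0x201B, 0x2032, 0x00B4, 0x0060, 0xFF40, 0xFF07,
    0x02B9, 0x02BB, 0x02BC, 0x02BD, 0x02BE, 0x02BF, 0x0384, 0x1FBF,
    0xA78B, 0xA78C,
];

// Hypothetical helper: replace every listed character with U+0027.
function normalizeApostrophes( string $string ): string {
    // Build a PCRE character class such as [\x{2018}\x{2019}...].
    $class = '';
    foreach ( APOSTROPHE_LIKE as $codePoint ) {
        $class .= sprintf( '\x{%04X}', $codePoint );
    }
    // preg_replace() returns null on invalid UTF-8; fall back to the input then.
    return preg_replace( '/[' . $class . ']/u', "'", $string ) ?? $string;
}

$okina = normalizeApostrophes( "Hawai\u{02BB}i" );
$curly = normalizeApostrophes( "l\u{2019}exemple" );
```

Using code points with the 'u' modifier, rather than raw UTF-8 byte sequences as in the 2009 snippet, keeps the list readable and easy to compare against the Unicode names above.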