[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

Nik Everett neverett+bugzi...@wikimedia.org changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           Priority|Lowest  |Normal

--- Comment #11 from Nik Everett neverett+bugzi...@wikimedia.org ---
I'll have a look at this when I can. For now I'll leave the component set to CirrusSearch. It looks like PHP implements the same normalization components that I can use in Elasticsearch (http://php.net/manual/en/class.normalizer.php), so I'll have to evaluate doing that normalization there as well. I imagine that if we do it in PHP it'll have to be optional, because the Normalizer class requires PHP 5 >= 5.3.0 and PECL intl >= 1.0.0.

--
You are receiving this mail because:
You are on the CC list for the bug.
You are the assignee for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
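A note on what the normalization forms discussed in comment #11 do and do not cover, as a minimal Python sketch using the standard unicodedata module (not the PHP Normalizer class and not Elasticsearch). Standard NFKC normalization folds compatibility characters such as ligatures, but it leaves typographic apostrophes and the multiplication sign unchanged, so merging those would need an explicit mapping on top. The LOOKALIKES table and the fold() helper below are hypothetical names chosen for illustration; the table is not taken from AntiSpoof or CirrusSearch.

```python
import unicodedata

# Hypothetical look-alike table for illustration only; a real list would be
# longer and would need review per language.
LOOKALIKES = str.maketrans({
    "\u2018": "'",  # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",  # RIGHT SINGLE QUOTATION MARK
    "\u02bc": "'",  # MODIFIER LETTER APOSTROPHE
    "\u00d7": "x",  # MULTIPLICATION SIGN
})


def fold(text: str) -> str:
    """Apply NFKC, then map look-alikes that NFKC leaves untouched."""
    return unicodedata.normalize("NFKC", text).translate(LOOKALIKES)


if __name__ == "__main__":
    # NFKC alone does not change the typographic apostrophe.
    print(unicodedata.normalize("NFKC", "Hawai\u2019i") == "Hawai\u2019i")
    # With the extra table, both spellings fold to the same string.
    print(fold("Hawai\u2019i") == fold("Hawai'i"))
```

The explicit table is the part that would need a policy decision, since which characters count as "similar-looking" is exactly what this bug is about.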
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

Nik Everett neverett+bugzi...@wikimedia.org changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           See Also|        |https://bugzilla.wikimedia.org/show_bug.cgi?id=59666

--- Comment #12 from Nik Everett neverett+bugzi...@wikimedia.org ---
In case anyone comes to this from http://thedailywtf.com/Articles/Lightspeed-is-Too-Slow-for-MY-Luggage.aspx#Pic-5, they should have a look at Bug 59666, which should plug that particular embarrassing hole.
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

MZMcBride b...@mzmcbride.com changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |b...@mzmcbride.com, legoktm.wikipe...@gmail.com, matma@gmail.com

--- Comment #9 from MZMcBride b...@mzmcbride.com ---
Looks like apostrophes came up on The Daily WTF: http://thedailywtf.com/Articles/Lightspeed-is-Too-Slow-for-MY-Luggage.aspx (specifically http://img.thedailywtf.com/images/14/q1/e95/Pic-5.jpg).

(In reply to comment #6)
> Were you thinking this should be done in Cirrus for all languages by pushing analysis configuration to Elasticsearch? Something along those lines would be pretty flexible, allowing, for example, us to boost perfect matches of the typed unicode characters above the squashed ones.

We already do some input normalization at some level of the stack (for example, multiple underscores get squashed and input such as AbrAhAm LincoLn works if there's a redirect at Abraham lincoln). It's difficult to look at the provided screenshot and not think that the software has failed our readers.

Unless you think these should be MediaWiki page redirects (#REDIRECT)? I think we should do better normalization for search inputs. Any rough idea how big of a project this would be to implement?
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

MZMcBride b...@mzmcbride.com changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           See Also|        |https://bugzilla.wikimedia.org/show_bug.cgi?id=36313
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

--- Comment #10 from MZMcBride b...@mzmcbride.com ---
(In reply to comment #9)
> We already do some input normalization at some level of the stack (for example, multiple underscores get squashed and input such as AbrAhAm LincoLn works if there's a redirect at Abraham lincoln).

To be more explicit on these points:

https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=AbrAhAm+LincoLn
https://en.wikipedia.org/w/index.php?title=Special%3ASearch&search=_AbrAhAm_LincoLn_

We may be able to implement apostrophe normalization at the same level.
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

Nik Everett neverett+bugzi...@wikimedia.org changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           See Also|        |https://bugzilla.wikimedia.org/show_bug.cgi?id=57242

--- Comment #8 from Nik Everett neverett+bugzi...@wikimedia.org ---
Added a "See Also" bug. I think we should do this when we pull the Unicode plugin into Elasticsearch.
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

--- Comment #6 from Nik Everett neverett+bugzi...@wikimedia.org ---
Chad,

Were you thinking this should be done in Cirrus for all languages by pushing analysis configuration to Elasticsearch? Something along those lines would be pretty flexible, allowing, for example, us to boost perfect matches of the typed unicode characters above the squashed ones.

I'm not saying that is a good idea, just something that is possible.
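The ranking idea in comment #6 (exact matches of the typed characters scoring above "squashed" matches) can be illustrated with a small in-memory Python sketch. This is not CirrusSearch or Elasticsearch code; the fold table, the score constants and the search() function are all hypothetical and exist only to show the ordering behaviour being discussed.

```python
# Hypothetical scores: an exact match outranks a match found only after folding.
EXACT_BOOST = 2.0
FOLDED_SCORE = 1.0

# Hypothetical fold table, limited to the two examples from the bug summary.
FOLD = str.maketrans({"\u2019": "'", "\u00d7": "x"})


def fold(text: str) -> str:
    """Squash look-alike characters to a single representative."""
    return text.translate(FOLD)


def search(query: str, titles: list[str]) -> list[tuple[float, str]]:
    """Return (score, title) pairs, with exact matches ranked first."""
    hits = []
    for title in titles:
        if title == query:
            hits.append((EXACT_BOOST, title))
        elif fold(title) == fold(query):
            hits.append((FOLDED_SCORE, title))
    # Sort by descending score; Python's sort is stable, so ties keep input order.
    hits.sort(key=lambda hit: -hit[0])
    return hits


if __name__ == "__main__":
    titles = ["Rock 'n' Roll", "Rock \u2019n\u2019 Roll", "Unrelated page"]
    for score, title in search("Rock \u2019n\u2019 Roll", titles):
        print(score, title)
```

In a search engine the same effect would more likely come from indexing a field twice, once as typed and once folded, and weighting the unfolded field higher at query time; the sketch only shows the intended result ordering.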
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

--- Comment #7 from Chad H. innocentkil...@gmail.com ---
(In reply to comment #6)
> Chad,
>
> Were you thinking this should be done in Cirrus for all languages by pushing analysis configuration to Elasticsearch? Something along those lines would be pretty flexible, allowing, for example, us to boost perfect matches of the typed unicode characters above the squashed ones.

Yeah that was pretty much my thinking.

> I'm not saying that is a good idea, just something that is possible.

I think it's a good idea, eventually. I set priority so low on purpose :)
[Bug 39501] Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
https://bugzilla.wikimedia.org/show_bug.cgi?id=39501

Chad H. innocentkil...@gmail.com changed:

           What    |Removed |Added
----------------------------------------------------------------------------
           Priority|Normal |Lowest
                 CC| |innocentkil...@gmail.com, neverett+bugzilla@wikimedia.org
          Component|lucene-search-2 |CirrusSearch
            Product|Wikimedia |MediaWiki extensions
   Target Milestone|--- |Future release
            Summary|Merging Unicode apostrophe-like characters in internal search |Merging Unicode similar-looking characters in internal search (apostrophes, x and ×, etc)
           Severity|normal |enhancement

--- Comment #5 from Chad H. innocentkil...@gmail.com ---
Widening scope a tiny bit. If we're going to do this it should be done all at once. AntiSpoof's sort of the idea I'm thinking here.

Repurposing into a Cirrus bug as lsearchd has been end-of-lifed and won't be fixed further.