On 31/05/2013 13:31, Ron Aaron wrote:
On Friday, May 31, 2013 1:56:56 PM UTC+3, Mike Williams wrote:
On 31/05/2013 11:23, Ron Aaron wrote:
"ﬀ" (U+FB00) is a ligature, not a composed character. Although it has a decomposed
form, it cannot be recomposed with Unicode composing rules (f is not a
composing character). There are others, including "ﬃ" (U+FB03): should a search for
"fi" match the second and third characters?
OK, fair enough: it's a ligature. The original example I gave, of "aleph" + "vowel
patah", is a precomposed character. In either case, I think searching for a component
thereof should work. Certainly in the original case, it is reasonable to expect
searching for an "aleph" to match.
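To make the aleph case concrete, here is a minimal Python sketch, assuming the precomposed character in question is U+FB2E (HEBREW LETTER ALEF WITH PATAH). The helper name `contains_base` is made up for illustration; it only shows that a search for the base letter succeeds once the text is canonically decomposed (NFD), and says nothing about how Vim would implement it.

```python
import unicodedata

ALEF_WITH_PATAH = "\uFB2E"  # precomposed: HEBREW LETTER ALEF WITH PATAH
ALEF = "\u05D0"             # base letter: HEBREW LETTER ALEF

def contains_base(text, base):
    """Hypothetical helper: look for `base` in the NFD form of `text`."""
    return base in unicodedata.normalize("NFD", text)

# A plain code point comparison does not see the aleph inside U+FB2E.
raw_hit = ALEF in ALEF_WITH_PATAH

# After canonical decomposition the text is ALEF followed by POINT PATAH,
# so the base letter is found.
decomposed_hit = contains_base(ALEF_WITH_PATAH, ALEF)
```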
Alas I have zero knowledge of Hebrew so will have to bow to your
superior knowledge. You will know better the use case of finding base
characters with and without combining marks.
I would go as far as to say that "fi" should match "ﬃ". That is, the "ﬃ" character
should be decomposed for search purposes to "f" "f" "i".
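The decomposition to "f" "f" "i" can be checked with Python's `unicodedata` module. This is only a sketch of the Unicode behaviour under discussion: the ligature has a compatibility decomposition (reached by NFKD) but no canonical one (NFD leaves it alone), and normalizing back does not recreate the ligature. The variable names are illustrative.

```python
import unicodedata

FFI = "\uFB03"  # LATIN SMALL LIGATURE FFI

# Canonical decomposition leaves the ligature untouched.
canonical = unicodedata.normalize("NFD", FFI)

# Compatibility decomposition expands it to three plain letters.
compat = unicodedata.normalize("NFKD", FFI)

# Composing again does not bring the ligature back: "f" is not a combining
# character, so the expansion is one-way.
recomposed = unicodedata.normalize("NFKC", compat)

# In the expanded form, "fi" is found at the second and third characters.
fi_offset = compat.find("fi")
```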
Ligatures are interesting. They are purely for typographic presentation
and effectively represent their component characters. As such you could
argue that ligatures should always be treated as the sequence of their
characters, and so doing s/fi/it/ on "ﬃ" should always result in the
three-character sequence "fit" replacing the original single representative
character. Effectively, ligatures should always be treated as their
expanded equivalents. If anyone wants ligatures in their final file,
they should then do a final search-and-replace for all the combinations they
want to have appear, since other edits may have introduced such sequences
without their having been automatically converted to the equivalent ligature
characters.
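A rough Python sketch of that workflow, assuming a small hand-written table of the Latin f-ligatures (the table and both function names are hypothetical, not anything Vim provides): expand ligatures before substituting, then optionally run a final pass that puts ligatures back.

```python
# Hypothetical table: (expanded sequence, ligature), longest sequences first
# so that "ffi" is preferred over "ff" or "fi" when re-applying ligatures.
LIGATURES = [
    ("ffi", "\uFB03"),
    ("ffl", "\uFB04"),
    ("ff", "\uFB00"),
    ("fi", "\uFB01"),
    ("fl", "\uFB02"),
]

# Translation table mapping each ligature character to its expansion.
EXPAND = str.maketrans({lig: seq for seq, lig in LIGATURES})

def substitute_expanded(text, old, new):
    """Expand ligatures, then do a plain substring replacement."""
    return text.translate(EXPAND).replace(old, new)

def reapply_ligatures(text):
    """Final pass: turn plain sequences back into ligature characters."""
    for seq, lig in LIGATURES:
        text = text.replace(seq, lig)
    return text

# s/fi/it/ on the single ligature character yields three plain characters.
result = substitute_expanded("\uFB03", "fi", "it")

# The final pass re-introduces a ligature wherever the sequence occurs.
final = reapply_ligatures("office")
```

Using a table rather than full NFKD normalization keeps the expansion limited to ligatures, so other compatibility characters in the file are left as they are.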
My 2p's worth anyway.
The question is: do you want a search on the Unicode code points (e.g. a
search on "ﬃ"), or do you want a search on the semantic Unicode character
sequence (i.e. "ffi")?
As I said, I think that if one searches for "ﬃ" literally, it should be found (and
should not match "ffi"). There should also be an option to control the behavior.
Wouldn't this be "ignoreligatures" for the case above? In addition to
any "ignoreprecomposed".
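How two such switches might interact can be sketched in Python. The names `ignoreligatures` and `ignoreprecomposed` are just the ones floated above, not existing Vim options, and `fold` and `matches` are hypothetical helpers; the ligature table covers only the Latin f-ligatures.

```python
import unicodedata

# Hypothetical table of ligature characters and their expansions.
LIGATURE_TABLE = str.maketrans({
    "\uFB00": "ff",
    "\uFB01": "fi",
    "\uFB02": "fl",
    "\uFB03": "ffi",
    "\uFB04": "ffl",
})

def fold(s, ignoreligatures=False, ignoreprecomposed=False):
    """Rewrite `s` according to the enabled (hypothetical) options."""
    if ignoreligatures:
        s = s.translate(LIGATURE_TABLE)        # expand ligatures only
    if ignoreprecomposed:
        s = unicodedata.normalize("NFD", s)    # split precomposed characters
    return s

def matches(text, pattern, **opts):
    """Substring match after folding both sides the same way."""
    return fold(pattern, **opts) in fold(text, **opts)

# Options off: a literal code point search.
literal_hit = matches("\uFB03", "\uFB03")
literal_miss = matches("\uFB03", "ffi")

# Options on: components are found inside the single character.
ligature_hit = matches("\uFB03", "fi", ignoreligatures=True)
precomposed_hit = matches("\uFB2E", "\u05D0", ignoreprecomposed=True)
```

Folding the pattern as well as the text means a ligature typed into the search still matches when the option is on.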
I don't really care much what it's called, but I'm sure Bram will let us know
soon what he thinks about all this.
;-)
Mike
--
Some push the envelope to the edge. Others just lick it!