On 31/05/2013 13:31, Ron Aaron wrote:
On Friday, May 31, 2013 1:56:56 PM UTC+3, Mike Williams wrote:
On 31/05/2013 11:23, Ron Aaron wrote:

"ﬀ" (U+FB00) is a ligature, not a composed character.  Although it has a
decomposed form, it cannot be recomposed with Unicode composing rules (f is
not a composing character).  There are others, including "ﬃ" (U+FB03):
should a search for "fi" match the second and third characters?

OK, fair enough: it's a ligature.  The original example I gave, of "aleph" +
"vowel patah", is a precomposed character.  In either case, I think searching
for a component thereof should work.  Certainly in the original case, it is
reasonable to expect searching for an "aleph" to match.
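
To make the precomposed case concrete, here is a minimal Python sketch using the standard unicodedata module. It assumes the character meant is U+FB2E (HEBREW LETTER ALEF WITH PATAH); contains_base is a made-up helper name for illustration, not anything that exists in Vim.

```python
import unicodedata

ALEF = "\u05d0"             # HEBREW LETTER ALEF
PATAH = "\u05b7"            # HEBREW POINT PATAH (a combining mark)
ALEF_WITH_PATAH = "\ufb2e"  # assumed to be the precomposed character in question


def contains_base(text, base):
    """True if `base` occurs in `text` after canonical decomposition (NFD)."""
    return base in unicodedata.normalize("NFD", text)


# A plain codepoint comparison does not see the aleph inside U+FB2E.
literal_hit = ALEF in ALEF_WITH_PATAH

# After NFD the precomposed character becomes aleph + patah, so it is found.
decomposed_hit = contains_base(ALEF_WITH_PATAH, ALEF)
parts = unicodedata.normalize("NFD", ALEF_WITH_PATAH)
```

Here literal_hit is False and decomposed_hit is True, which is the difference between the two search behaviours being discussed.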

Alas I have zero knowledge of Hebrew so will have to bow to your superior knowledge. You will know better the use case of finding base characters with and without combining marks.

I would go as far as to say that "fi" should match "ﬃ".  That is, the "ﬃ"
character should be decomposed for search purposes to "f" "f" "i".
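
For reference, a short Python sketch of what the Unicode data says about these ligatures, using only the standard unicodedata module. The describe helper is a hypothetical name; the point is that the ligatures carry a compatibility decomposition only, so NFC leaves them alone and only NFKD expands them.

```python
import unicodedata

LIG_FF = "\ufb00"   # LATIN SMALL LIGATURE FF
LIG_FFI = "\ufb03"  # LATIN SMALL LIGATURE FFI


def describe(ch):
    """Return (decomposition field, NFC form, NFKD form) for one character."""
    return (
        unicodedata.decomposition(ch),
        unicodedata.normalize("NFC", ch),
        unicodedata.normalize("NFKD", ch),
    )


ff_info = describe(LIG_FF)
ffi_info = describe(LIG_FFI)

# "f" has combining class 0, so it is not a combining character and
# canonical composition never turns "ff" back into the ligature.
f_is_combining = unicodedata.combining("f") != 0
recomposed = unicodedata.normalize("NFC", "ff")
```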

Ligatures are interesting. They exist purely for typographic presentation and effectively represent their component characters. As such, you could argue that ligatures should always be treated as the sequence of their characters, so doing s/fi/it/ on "ﬃ" should always result in the three-character sequence "fit" replacing the original single character. Effectively, ligatures should always be treated as their expanded equivalents. If anyone wants ligatures in their final file, they should then do a final search-and-replace for all the combinations they want to appear, since other edits may have introduced those sequences without automatically converting them to the equivalent ligature characters.
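
The expand, edit, then re-ligature workflow described above could look roughly like this in Python. It is only a sketch under stated assumptions: it covers just the five Latin ligatures U+FB00..U+FB04, and expand_ligatures, reapply_ligatures and expansion are hypothetical helper names, not Vim functions.

```python
import re
import unicodedata

# ff, fi, fl, ffi, ffl (U+FB00..U+FB04); other ligatures are out of scope here.
LIGATURES = [chr(cp) for cp in range(0xFB00, 0xFB05)]


def expansion(lig):
    """Component characters of a ligature (its compatibility decomposition)."""
    return unicodedata.normalize("NFKD", lig)


def expand_ligatures(text):
    """Replace every known ligature with its component characters."""
    for lig in LIGATURES:
        text = text.replace(lig, expansion(lig))
    return text


def reapply_ligatures(text, wanted):
    """Final pass: reintroduce only the ligatures listed in `wanted`."""
    # Longest expansions first, so "ffi" is tried before "ff" or "fi".
    for lig in sorted(wanted, key=lambda c: len(expansion(c)), reverse=True):
        text = text.replace(expansion(lig), lig)
    return text


# s/fi/it/ applied to the single character U+FB03 after expansion.
edited = re.sub("fi", "it", expand_ligatures("\ufb03"))
```

With this, edited is the three-character string "fit", and a later reapply_ligatures call is the explicit opt-in step for anyone who wants ligatures back in the file.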

My 2p's worth anyway.

The question is: do you want a search on the Unicode codepoints (e.g. a
search on "ﬃ", U+FB03) or do you want a search on the semantic Unicode
character sequence (i.e. "f" "f" "i")?

As I said, I think that if one searches for "ﬃ" (the ligature) literally, it
should be found (and should not match the three characters "ffi").  There
should also be an option to control the behavior.

Wouldn't this be "ignoreligatures" for the case above, in addition to
any "ignoreprecomposed"?
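
Purely as an illustration of how two such switches might interact, here is a Python sketch. The flag names mirror the option names floated above, which are proposals in this thread and not existing Vim options; fold and matches are hypothetical helpers, and only the Latin ligatures U+FB00..U+FB04 are handled.

```python
import unicodedata

LATIN_LIGATURES = {chr(cp) for cp in range(0xFB00, 0xFB05)}  # ff, fi, fl, ffi, ffl


def fold(text, ignore_ligatures=False, ignore_precomposed=False):
    """Rewrite `text` according to the two (hypothetical) search options."""
    out = []
    for ch in text:
        if ignore_ligatures and ch in LATIN_LIGATURES:
            # Ligature -> its component letters (compatibility decomposition).
            ch = unicodedata.normalize("NFKD", ch)
        elif ignore_precomposed:
            # Precomposed character -> base + combining marks (canonical only).
            ch = unicodedata.normalize("NFD", ch)
        out.append(ch)
    return "".join(out)


def matches(text, pattern, ignore_ligatures=False, ignore_precomposed=False):
    """True if `pattern` occurs in `text` under the chosen options."""
    opts = dict(ignore_ligatures=ignore_ligatures,
                ignore_precomposed=ignore_precomposed)
    return fold(pattern, **opts) in fold(text, **opts)
```

With both flags off this is a plain codepoint search, so the ligature matches only itself. Turning on ignore_ligatures lets "fi" match inside U+FB03, and ignore_precomposed lets a bare aleph match U+FB2E, without either option affecting the other case.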

I don't really care much what it's called, but I'm sure Bram will let us know 
soon what he thinks about all this.

;-)

Mike
--
Some push the envelope to the edge. Others just lick it!

--
You received this message from the "vim_dev" maillist.