Matching decomposable Unicode characters

Ron Aaron Thu, 30 May 2013 22:24:51 -0700

I was puzzled when I searched some Hebrew text with this pattern:

    /ארבע\Z/

and it did not match this:

    אַרְבָּעָה

As it happens, the אַ is the Unicode precomposed form of the aleph plus the 
vowel patah.

There are two issues:

1) A normal user would expect a match here, since the symbols are semantically 
the same (even though Unicode bizarrely assigns a separate code point to the 
combined vowel+consonant).

2) The original file is CP1255 encoded and my 'enc' is set to UTF-8, so the 
file is converted to UTF-8 on read.  This is desired, but the conversion engine 
(in this case GNU iconv) is being more helpful than it should be.  I don't know 
what to do about this.
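
One way to check whether the converter is what introduces the precomposed form 
is to decode the same bytes with a different converter.  The sketch below uses 
Python's built-in cp1255 codec purely as a comparison point; the two bytes are 
the CP1255 encodings of aleph and patah, picked for illustration and not taken 
from the actual file.

```python
# CP1255 bytes for aleph (0xE0) followed by patah (0xC7); illustrative input,
# not taken from the file discussed in the message.
raw = b"\xe0\xc7"

# Decode with Python's own cp1255 codec rather than GNU iconv.
text = raw.decode("cp1255")

# List the resulting code points so composed vs. decomposed output is visible.
codepoints = [f"U+{ord(ch):04X}" for ch in text]
print(codepoints)
```

If this prints two code points (base letter plus vowel point) while the text 
loaded in Vim holds a single one, the composition happened in the conversion 
step.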

The first issue can (and should) be dealt with in Vim, probably with an option 
to "decompose multibyte".
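
A rough sketch of what such decomposition would do, written in Python rather 
than Vim.  It assumes the precomposed character in the text is U+FB2E (HEBREW 
LETTER ALEF WITH PATAH); the helper name strip_marks is made up for this 
example and is not a Vim function.

```python
import unicodedata

# Assumption: the text holds the precomposed form U+FB2E rather than
# aleph (U+05D0) followed by patah (U+05B7).
precomposed = "\uFB2E"
decomposed = unicodedata.normalize("NFD", precomposed)


def strip_marks(text):
    """Decompose text (NFD), then drop combining marks such as vowel points."""
    return "".join(
        ch
        for ch in unicodedata.normalize("NFD", text)
        if not unicodedata.combining(ch)
    )


# The word from the message, spelled with the precomposed aleph-patah.
word = "\uFB2E\u05E8\u05B0\u05D1\u05BC\u05B8\u05E2\u05B8\u05D4"
# The search string: the four bare consonants.
needle = "\u05D0\u05E8\u05D1\u05E2"

print(needle in word)               # plain substring search on the raw text
print(needle in strip_marks(word))  # search after decomposing and stripping
```

The first search fails because the word starts with a single precomposed code 
point; after decomposition the base aleph is exposed and the consonants match.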

Might Vim's internal conversion functions do better than iconv here?  Any ideas?
