I was puzzled when searching for this in some Hebrew text:
/ארבע\Z/
It did not match this:
אַרְבָּעָה
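As it happens, the אַ is the Unicode precomposed form of the aleph plus the
vowel patah: a single code point rather than a letter followed by a composing
character. So \Z, which ignores composing characters, has nothing to skip.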
There are two issues:
1) A normal user would expect a match here, since the symbols are semantically
the same (even though Unicode bizarrely assigns a separate code point to the
combined vowel+consonant).
2) The original file is CP1255 encoded and my 'enc' is set to utf-8, so the
file is converted to UTF-8 on read. This is desired, but the conversion engine
(in this case GNU iconv) is being more helpful than it should be: it composes
the consonant and vowel into the single combined character. I don't know what
to do about this.
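To make both points concrete, here is a small Python sketch. Python is used
only for illustration; it says nothing about what Vim or GNU iconv do
internally. It assumes the combined character is U+FB2E (HEBREW LETTER ALEF
WITH PATAH), which is my guess at the code point involved. It also uses
Python's built-in cp1255 codec as an example of a converter that leaves the
consonant and the vowel as two separate code points.

```python
import unicodedata

ALEF = "\u05d0"             # HEBREW LETTER ALEF
PATAH = "\u05b7"            # HEBREW POINT PATAH, a combining mark
ALEF_WITH_PATAH = "\ufb2e"  # assumed precomposed form (see note above)

# Canonical decomposition of the precomposed character.
decomposed = unicodedata.normalize("NFD", ALEF_WITH_PATAH)

# In CP1255, byte 0xE0 is alef and byte 0xC7 is patah.
decoded = b"\xe0\xc7".decode("cp1255")

print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+05D0', 'U+05B7']
print([f"U+{ord(c):04X}" for c in decoded])     # ['U+05D0', 'U+05B7']
```

So the precomposed character is canonically equivalent to the plain
letter+vowel pair, and a possible workaround for the second issue would be to
decompose the text again after conversion.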
The first issue can (and should) be dealt with in Vim, probably with an option
to "decompose multibyte".
As for the second issue: maybe Vim's internal conversion functions would do
better than iconv? Any ideas?
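On the first issue, this is roughly what I would expect a "decompose" option to
do, sketched in Python rather than in Vim's C code. The helper names
strip_marks and matches_ignoring_marks are made up for this example, and the
sample word again assumes the leading character is U+FB2E. The idea is to
decompose both pattern and text, drop the combining marks, and only then
compare.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Decompose text (NFD) and drop characters with a non-zero combining class."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if unicodedata.combining(c) == 0)


def matches_ignoring_marks(pattern: str, text: str) -> bool:
    """Literal substring search after decomposing and stripping marks (hypothetical helper)."""
    return strip_marks(pattern) in strip_marks(text)


# The four bare consonants of the search pattern: alef, resh, bet, ayin.
PATTERN = "\u05d0\u05e8\u05d1\u05e2"

# The pointed word, starting with the assumed precomposed alef+patah (U+FB2E).
WORD = "\ufb2e\u05e8\u05b0\u05d1\u05bc\u05b8\u05e2\u05b8\u05d4"

print(PATTERN in WORD)                        # False: U+FB2E is not U+05D0
print(matches_ignoring_marks(PATTERN, WORD))  # True
```

This only does a literal substring search; a real implementation would have to
do the same thing inside the regexp engine.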