Ron Aaron wrote:

> I was puzzled when searching for this in some Hebrew text:
>
>    /ארבע\Z/
>
> That it did not match this:
>
>    אַרְבָּעָה
>
> As it happens, the אַ is Unicode combined form of the aleph plus the
> vowel patah.
>
> There are two issues:
>
> 1) First, is that the normal user would expect a match here, since the
>    symbols are semantically the same (even though Unicode bizarrely
>    assigns a separate symbol for the combined vowel+consonant)
>
> 2) The original file is CP1255 encoded, and my enc is set to UTF8, so
>    the file is converted to UTF8 on read.  This is desired, but the
>    conversion engine (in this case GNU iconv) is being more helpful than
>    it should be.  I don't know what to do about this.
>
> The first issue can (and should) be dealt with in Vim, probably with
> an option to "decompose multibyte".
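
To make the mismatch concrete, here is a small Python sketch.  It is an
illustration only, not Vim code.  It assumes the converted text contains
U+FB2E (HEBREW LETTER ALEF WITH PATAH) where the pattern has a plain
U+05D0; the sample word and the helper name strip_marks are made up for
the example.  Ignoring composing characters alone cannot help, because
U+FB2E is a single non-combining code point.  Decomposing first (NFD)
and then dropping the marks does produce a match:

```python
import unicodedata

ALEF_WITH_PATAH = "\uFB2E"  # precomposed form (assumed to be in the file)
ALEF = "\u05D0"             # plain aleph, as typed in the search pattern
PATAH = "\u05B7"            # combining vowel point


def strip_marks(text):
    """Hypothetical helper: decompose to NFD, then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn")


# Sample word built by hand: precomposed aleph+patah, then pointed letters.
word = ALEF_WITH_PATAH + "\u05E8\u05B0\u05D1\u05BC\u05B8\u05E2\u05B8\u05D4"
pattern = "\u05D0\u05E8\u05D1\u05E2"  # the four consonants searched for

# The precomposed character is canonically equivalent to aleph + patah.
print(unicodedata.normalize("NFD", ALEF_WITH_PATAH) == ALEF + PATAH)

# A plain substring search fails on the raw text ...
print(pattern in word)

# ... and succeeds once the text is decomposed and the marks are removed.
print(pattern in strip_marks(word))
```

The decomposition is applied only to the text being compared, so the
buffer contents stay as they are.
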
I find it a bit annoying that Unicode has two forms for the same
character.  They should have made a choice to either use a base
character plus composing characters, or the combined form.  Now we need
to solve this in software everywhere.

> It may be that Vim's internal iconv functions are better?  Any ideas?

Perhaps iconv has a way to specify decomposing characters?  But we don't
want to convert everything.  I suppose decomposing is not an algorithm
but a matter of a very big table.

-- 
Bram Moolenaar -- http://www.Moolenaar.net
