Ron Aaron wrote:

> I was puzzled when searching for this in some Hebrew text:
> 
>     /ארבע\Z/
> 
> That it did not match this:
> 
>     אַרְבָּעָה
> 
> As it happens, the אַ is the Unicode combined form of the aleph plus the
> vowel patah.
> 
> There are two issues:
> 
> 1) The normal user would expect a match here, since the symbols are
> semantically the same (even though Unicode bizarrely assigns a
> separate symbol for the combined vowel+consonant).
> 
> 2) The original file is CP1255 encoded, and my enc is set to UTF8, so
> the file is converted to UTF8 on read.  This is desired, but the
> conversion engine (in this case GNU iconv) is being more helpful than
> it should be.  I don't know what to do about this.
> 
> The first issue can (and should) be dealt with in Vim, probably with
> an option to "decompose multibyte".

I find it a bit annoying that Unicode has two forms for the same
character.  They should have made a choice to either use a base
character plus composing characters, or the combined form.  Now we need
to solve this in software everywhere.
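
As a rough illustration of the two forms, here is a small sketch in
Python using the standard unicodedata module.  It is not what Vim does
internally, and the variable names are made up for this example.

```python
import unicodedata

# Precomposed presentation form: HEBREW LETTER ALEF WITH PATAH
precomposed = "\uFB2E"
# Base letter ALEF followed by the combining point PATAH
decomposed = "\u05D0\u05B7"

# The two spellings look the same on screen but compare unequal
same_raw = precomposed == decomposed

# After canonical decomposition (NFD) they become identical
same_nfd = (unicodedata.normalize("NFD", precomposed)
            == unicodedata.normalize("NFD", decomposed))

print(same_raw, same_nfd)
```

Any plain comparison or search sees two different strings until one
side is normalized.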

> It may be that Vim's internal iconv functions are better?  Any ideas?

Perhaps iconv has a way to specify decomposing characters?
But we don't want to convert everything.

I suppose decomposing is not an algorithm but a matter of a very big
table.
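
To make the table idea concrete, here is a hedged sketch in Python.
Its unicodedata module exposes the decomposition mapping from the
Unicode data, so the example looks up the entry for U+FB2E and then
searches after decomposing and dropping the combining characters.
The helper strip_marks is hypothetical, and the code points for the
word are my assumption about what the converted text contains.  It
only approximates ignoring composing characters; it is not how Vim
implements anything.

```python
import unicodedata

def strip_marks(text):
    """Decompose to NFD, then drop every combining character."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed
                   if unicodedata.combining(ch) == 0)

# Decomposition mapping stored in the Unicode data for U+FB2E
mapping = unicodedata.decomposition("\uFB2E")

# Assumed spelling of the word, with precomposed U+FB2E first
word = "\uFB2E\u05E8\u05B0\u05D1\u05BC\u05B8\u05E2\u05B8\u05D4"
# The search pattern: alef, resh, bet, ayin
pattern = "\u05D0\u05E8\u05D1\u05E2"

found_raw = pattern in word
found_stripped = pattern in strip_marks(word)

print(mapping, found_raw, found_stripped)
```

The raw search fails because the first character is U+FB2E rather
than U+05D0; after decomposing and stripping, the consonants line up.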


-- 
Apparently, 1 in 5 people in the world are Chinese.  And there are 5
people in my family, so it must be one of them.  It's either my mum
or my dad.  Or my older brother Colin.  Or my younger brother
Ho-Cha-Chu.  But I think it's Colin.

 /// Bram Moolenaar -- [email protected] -- http://www.Moolenaar.net   \\\
///        sponsor Vim, vote for features -- http://www.Vim.org/sponsor/ \\\
\\\  an exciting new programming language -- http://www.Zimbu.org        ///
 \\\            help me help AIDS victims -- http://ICCF-Holland.org    ///
