Re: translate unicode

Tony Mechelynck Sat, 01 Nov 2008 08:33:01 -0700

On 01/11/08 15:33, bill lam wrote:
> After converting an MS Word doc to TeX, there are some embedded strings
> such as,
> {\selectlanguage{english}\sffamily\bfseries
> To: [6052?][8C50?]}
>
> The 4 hex digits inside each pair of square brackets are the code
> points of Chinese characters, e.g. those lines should be
> {\selectlanguage{english}\sffamily\bfseries
> To: 恒豐}
>
> Instead of manually retyping the strings using ctrl-v u, is there
> any shortcut?
>
> PS. Some additional LaTeX CJK macros will still be needed to
> make this a correct Chinese TeX file, but that is another issue
> not related to Vim.
>


Well, you can use the :s[ubstitute] command to convert the codes for 
you. Let's see if I remember how I did it once, a year or two ago.

First, of course, you need a Vim compiled with +multi_byte running with 
'encoding' set to utf-8. What follows will assume that you already use 
that. If the file's on-disk encoding is something else (such as 
GB18030) that's no problem as long as 'fileencoding' is set correctly, 
either detected via 'fileencodings' or forced with ++enc. But you know 
that, I'm sure.
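
For reference, a minimal sketch of that setup. The encoding list and the 
file name letter.tex are only placeholders I made up for illustration, so 
adjust them to your own files:

```vim
" Prints 1 when this Vim was compiled with +multi_byte.
:echo has('multi_byte')

" Best placed in your vimrc: changing 'encoding' in a running session
" does not convert text that is already loaded.
set encoding=utf-8

" Encodings to try, in order, when reading a file (example list only).
set fileencodings=ucs-bom,utf-8,gb18030,latin1

" Or force the encoding for one file (letter.tex is a placeholder name).
:edit ++enc=gb18030 letter.tex

" Check which encoding Vim settled on for the current buffer.
:setlocal fileencoding?
```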

        :%s/\[\(\x\x\x\x\)?]/\=eval('"\u' . submatch(1) . '"')/g

I didn't test it today, so be ready to use undo if it doesn't work. 
Also, the above assumes that your [xxxx?] codes always include exactly 4 
hex digits, which implies no codepoint above U+FFFF (it should work 
below U+1000 if leading zeros are included). UTF-16 surrogates are also 
not handled.
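
If you do run into surrogates, something along these lines might cover 
them. This is an untested sketch, and it assumes the converter writes each 
UTF-16 code unit as its own [XXXX?] group, which I have not verified. The 
function name SurrogatePair is my own invention. It uses nr2char() and 
str2nr() rather than eval(), and the 'e' flag only silences the error when 
nothing matches:

```vim
" Hypothetical helper: combine a high and a low surrogate, given as
" hex strings, into the single character they encode.
function! SurrogatePair(hi, lo)
  let l:hi = str2nr(a:hi, 16) - 0xD800
  let l:lo = str2nr(a:lo, 16) - 0xDC00
  return nr2char(0x10000 + l:hi * 0x400 + l:lo)
endfunction

" 1. Surrogate pairs first, e.g. [D840?][DC00?] gives U+20000.
:%s/\[\([dD][89abAB]\x\x\)?\]\[\([dD][c-fC-F]\x\x\)?\]/\=SurrogatePair(submatch(1), submatch(2))/ge

" 2. Then the remaining 4-digit codes, without going through eval().
:%s/\[\(\x\{4}\)?\]/\=nr2char(str2nr(submatch(1), 16))/ge
```

The order matters: run in the other order, the second command would turn 
each half of a pair into a separate, invalid character. A lone surrogate 
with no partner would still be converted blindly by step 2.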


Best regards,
Tony.
-- 
hundred-and-one symptoms of being an internet addict:
239. You think "surfing" is something you do on dry land.

--~--~---------~--~----~------------~-------~--~----~
You received this message from the "vim_use" maillist.
For more information, visit http://www.vim.org/maillist.php
-~----------~----~----~----~------~----~------~--~---
