On 01/11/08 15:33, bill lam wrote:
> After converting m$ word doc to tex, there are some embedding strings
> such as,
> {\selectlanguage{english}\sffamily\bfseries
> To: [6052?][8C50?]}
>
> Those 4 hex digits inside square brackets are Chinese characters, e.g.,
> those lines should be
> {\selectlanguage{english}\sffamily\bfseries
> To: 恒豐}
>
> Instead of manually retyping the strings using ctrl-v u, are there
> any shortcuts?
>
> PS. some additional LaTeX CJK macros will still be needed to
> produce a correct Chinese tex file, but that is another issue
> not related to vim.
>
Well, you can use the :s[ubstitute] command to convert the codes for
you. Let's see if I remember how I did it once, a year or two ago.

First, of course, you need a Vim compiled with +multi_byte and running
with 'encoding' set to utf-8. What follows assumes that you already have
that. If the file's representation on disk is something else (such as
GB18030), that's no problem as long as 'fileencoding' is set correctly,
either detected via 'fileencodings' or forced with ++enc. But you know
that, I'm sure.
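In case it helps, a quick way to check those prerequisites might look like the sketch below. The file name letter.tex is made up for illustration, and GB18030 is only an example of a disk encoding. Note that 'encoding' is best set in your vimrc, before any file is loaded.

```vim
" Prints 1 if this Vim was compiled with +multi_byte.
:echo has('multi_byte')
" Should print: encoding=utf-8
:set encoding?
" Re-read a file while forcing its disk encoding
" (letter.tex is a hypothetical name).
:e ++enc=gb18030 letter.tex
" Shows which encoding will be used when the file is written back.
:set fileencoding?
```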

    :%s/\[\(\x\x\x\x\)?]/\=eval('"\u' . submatch(1) . '"')/g

I didn't test it today, so be ready to undo if it doesn't work.
Also, the above assumes that your [xxxx?] codes always contain exactly 4
hex digits, so it cannot produce any codepoint above U+FFFF (codepoints
below U+1000 work as long as leading zeros are included). UTF-16
surrogate pairs are not handled either.
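If the 4-digit limit ever bites, a variant along the following lines might do, though like the original it is untested. It assumes the converter would write higher codepoints as 5 or 6 hex digits in the same [xxxx?] form, which is a guess, since the sample only shows 4-digit codes. It also uses nr2char() with str2nr() in base 16 rather than eval(). Surrogate pairs are still not combined.

```vim
" Assumes 'encoding' is utf-8. Accepts 4 to 6 hex digits between '[' and '?]'
" (the 5 and 6 digit forms are an assumption about the converter's output).
:%s/\[\(\x\{4,6}\)?]/\=nr2char(str2nr(submatch(1), 16))/g
```

On the sample line, To: [6052?][8C50?] should then become To: 恒豐.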
Best regards,
Tony.
--
hundred-and-one symptoms of being an internet addict:
239. You think "surfing" is something you do on dry land.