Reply to message «Issue with substitute() when replacing 8-bit characters and encoding is utf-8 (Was previously Re: lazyredraw for encoding change?)», sent 04:38:42 21 March 2011, Monday by [email protected]:
It is not a bug. You should have read `nr2char' documentation more carefully:
this function does not emit bytes, and characters starting from 0x80 are no
longer single byte in unicode.

Original message:
> I'm not sure if there is a bug here. If I read the 8-bit character string
> from a file and attempt to replace characters, it skips characters
> (majority of the characters do get replaced). I'm attaching a simple test
> script which fails to replace the characters in the string in the attached
> text file.
>
> My current encoding is set to utf-8; if I set it to latin1, the same code
> snippet works fine.
>
> On Sun, Mar 20, 2011 at 4:05 PM, [email protected] <[email protected]> wrote:
> > ZyX,
> >
> > Thanks. I don't want to use option (1) for performance reasons. Option
> > (2) almost worked; however, it fails to replace some characters.
> >
> > I wonder if I'm doing the "mappings" table correctly. Does this look
> > right?
> >
> > map(range(128,255), 'iconv(nr2char(v:val),&enc,"latin1")')
> >
> > The character sequences that are not replaced start with these
> > characters. Should these be escaped?
> >
> > dec: 221, <M-]>
> > dec: 193, <M-A>
> > dec: 215, <M-Z>
> > ...
> >
> > If I switched from "utf-8" to "latin1", then everything works perfect. It
> > has something to do with the encoding mode.
> >
> > Thanks again!
> >
> > On Sun, Mar 20, 2011 at 2:38 PM, ZyX <[email protected]> wrote:
> >> Reply to message «Re: lazyredraw for encoding change?»,
> >> sent 00:23:41 21 March 2011, Monday
> >> by [email protected]:
> >> > I need to map each character to specific strings, for which I have a
> >> > dictionary (mappings below) for each character 128-255. For example, <97>
> >> > maps to "AA". I use the substitute() function along with submatch to
> >> > do that.
> >>
> >> I would have used something like this:
> >>     let i=0
> >>     let slen=len(str)
> >>     let r=""
> >>     while i<slen
> >>         let c=char2nr(str[i])
> >>         if c>=128
> >>             let r.=mappings[str[i]]
> >>         else
> >>             let r.=str[i]
> >>         endif
> >>         let i+=1
> >>     endwhile
> >>
> >> Though I tested another option and it works:
> >>     echo substitute(str, "[\x7F-\xFF]", '\=mappings[submatch(0)]', "g")
> >>
> >> Note the double quotes around {pattern}: they force the pattern to
> >> contain specific bytes, not unicode characters.
> >>
> >> Original message:
> >> > Hi ZyX,
> >> >
> >> > Here is a specific example:
> >> >
> >> > I have a string that has 8-bits characters in the 128-255 range
> >> >
> >> > :let a ="<97><9e>·<97>¢<96><9b>Ü<88>å<82>ÌÂ<98>Ð0"
> >> >
> >> > I need to map each character to specific strings, for which I have a
> >> > dictionary (mappings below) for each character 128-255. For example, <97>
> >> > maps to "AA". I use the substitute() function along with submatch to
> >> > do that.
> >> >
> >> > let retval = substitute(value, '\([\d128-\d255]\)',
> >> > '\=mappings[submatch(1)]', "g")
> >> >
> >> > It appears to me that substitute() seems to be running based off the
> >> > current encoding setting, even if I converted all the rest to "latin1"
> >> > using iconv() functions.
> >> >
> >> > Thanks for any help!
> >> >
> >> > On Sun, Mar 20, 2011 at 1:54 PM, ZyX <[email protected]> wrote:
> >> > > Reply to message «lazyredraw for encoding change?»,
> >> > > sent 23:18:31 20 March 2011, Sunday
> >> > > by [email protected]:
> >> > >
> >> > > What do you switch? 'encoding'? You are not supposed to do this
> >> > > ever, why don't you use `iconv()', `scriptencoding utf-8',
> >> > > 'fileencoding', `e ++enc' or some other stuff depending on what you
> >> > > actually do in your script. Can you provide more specific example?
> >> > >
> >> > > Original message:
> >> > > > Hello All,
> >> > > >
> >> > > > I have a script which needs to manipulate strings in 8-bit
> >> > > > characters. So, I switch between the current encoding and the
> >> > > > 8-bit encoding. However, the screen redraws during the change of
> >> > > > encoding. Is there a way to suppress it?
> >> > > >
> >> > > > Thanks!
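
P.S. A small sketch of the point made at the top of this reply: under utf-8, nr2char() gives a multi-byte string for codes from 0x80 up, while indexing a string with [] works on bytes. It assumes 'encoding' is utf-8, as in the thread. The names ch and b are only illustrative and are not from the original script.

```vim
" Sketch only; assumes 'encoding' is utf-8.
let ch = nr2char(0x97)

" len() counts bytes, so this shows how many bytes nr2char() produced.
echo len(ch)

" Converted to latin1, the same character should become a single byte.
" That is the form a byte-wise lookup such as mappings[str[i]] works with.
let b = iconv(ch, &encoding, 'latin1')
echo len(b)

" ch[0] is only the first byte of the character, not the whole character,
" so comparing it with ch shows whether ch is a single byte.
echo ch[0] ==# ch
```

Running the same lines after `:set encoding=latin1' should make the two lengths agree, which would fit the observation that the original code works under latin1.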
