Mark Woodward wrote:
Hi all,

I came across this [Emacs] link the other day and wondered whether Vim
can do the same.
http://steve-yegge.blogspot.com/

In summary:

Change these to Bob, Sue, Ralph, etc. (capitalised):

bob
sue
ralph
alice
jimmy
preston
billy joe jim bob

:'<,'>s/\w\+/\u\0/g
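Here is a rough Python analogue that shows what the pattern does. It is only a sketch and does not use Vim's own regex engine. `\w\+` becomes `\w+`. `\u`, which uppercases only the next character of the replacement, becomes a callback that uppercases the first letter of each match. `re.ASCII` is used because Vim's `\w` only covers `[0-9A-Za-z_]`. The function name is illustrative.

```python
import re

def capitalise_words(line: str) -> str:
    # \w\+ -> \w+ (ASCII-only, like Vim's \w); \u\0 -> first character
    # of the whole match uppercased, the rest left untouched.
    return re.sub(
        r"\w+",
        lambda m: m.group(0)[0].upper() + m.group(0)[1:],
        line,
        flags=re.ASCII,
    )

names = ["bob", "sue", "ralph", "billy joe jim bob"]
print([capitalise_words(n) for n in names])
# ['Bob', 'Sue', 'Ralph', 'Billy Joe Jim Bob']
```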

Capitalise the last letter of each word (e.g. boB):

:'<,'>s/\(\w\+\)\(\w\)/\1\u\2/g
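This trick depends on backtracking. The greedy `\(\w\+\)` first swallows the whole word, then gives one character back so `\(\w\)` can match the final letter. The Python sketch below illustrates the same behaviour with an equivalent pattern. The helper name is hypothetical.

```python
import re

def capitalise_last_letter(line: str) -> str:
    # (\w+)(\w): the greedy \w+ first takes the whole word, then gives
    # back one character so (\w) can match the last letter. One-letter
    # words cannot match, so they are left alone.
    return re.sub(
        r"(\w+)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        line,
        flags=re.ASCII,
    )

print(capitalise_last_letter("billy joe jim bob"))  # billY joE jiM boB
```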



Change each of these to getFather(), getMother(), etc.:

public Relative father() { return this.father; }
public Relative mother() { return this.mother; }
public Relative sister() { return this.sister; }
public Relative brother() { return this.brother; }
public Relative auntie() { return this.auntie; }
public Relative uncle() { return this.uncle; }


:'<,'>s/public Relative \zs\(\w\)/get\u\1/
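Vim's `\zs` marks where the replaced text starts, so "public Relative " must be present but is not itself replaced. In Python the closest analogue is a lookbehind. The sketch below assumes the method name directly follows "public Relative " with one space. `count=1` mirrors a substitute without the `g` flag. The function name is made up for illustration.

```python
import re

def add_getter_prefix(line: str) -> str:
    # The lookbehind (?<=public Relative ) plays the role of Vim's \zs:
    # the prefix must be present but is not part of the replaced text.
    return re.sub(
        r"(?<=public Relative )(\w)",
        lambda m: "get" + m.group(1).upper(),
        line,
        count=1,
    )

print(add_getter_prefix("public Relative father() { return this.father; }"))
# public Relative getFather() { return this.father; }
```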



1. Change these to numeric order starting at 1 (1, 2, 3, 4, etc.)
2. Change these to an alphabetic list (a, b, c, etc.)
[This one has me stumped, although I'm sure I've seen something
along these lines before. Something to do with sub-replace-special
and submatch()?]


1987:Bogotá
5243:Fabergé
9772:Mallarmé
12044:Paraná
12499:Poincaré
16956:abbé
19923:appliqué
20932:attaché
23704:blasé
26223:café
26511:canapé
29314:cliché
31431:consommé
38981:décolleté
42995:fiancé
43623:flambé
44996:frappé
48317:habitué
58328:macramé
58898:manqué
62514:naiveté
65243:outré
66710:passé
71609:protégé
73675:recherché
76387:risqué
76847:roué
77811:sauté
82455:soufflé
89055:touché
96268:émigré
96274:études


Any hints?
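This sketch assumes that "these" means the numeric prefixes before the colon in the list above. Under that reading, the difficulty is that each replacement depends on how many lines have already been changed, so the replacement needs some state. In Vim that points towards an expression replacement (`\=`, part of sub-replace-special) that reads a counter. The Python sketch below only illustrates the idea: a label generator is consumed by a `re.sub` callback. `renumber` and `alpha_labels` are hypothetical helpers. Continuing past z as aa, ab, ... is also an assumption, made because the list has more than 26 entries.

```python
import re
from itertools import count
from string import ascii_lowercase

def alpha_labels():
    # a, b, ..., z, aa, ab, ... (spreadsheet-style, since the list is
    # longer than 26 lines).
    n = 0
    while True:
        n += 1
        k, label = n, ""
        while k:
            k, r = divmod(k - 1, 26)
            label = ascii_lowercase[r] + label
        yield label

def renumber(lines, labels):
    # Each successful match pulls the next label, so lines without a
    # leading number do not use one up.
    it = iter(labels)
    return [re.sub(r"^\d+", lambda m: str(next(it)), line) for line in lines]

sample = ["1987:Bogotá", "5243:Fabergé", "9772:Mallarmé"]
print(renumber(sample, count(1)))        # ['1:Bogotá', '2:Fabergé', '3:Mallarmé']
print(renumber(sample, alpha_labels()))  # ['a:Bogotá', 'b:Fabergé', 'c:Mallarmé']
```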



Since only acute accents are involved, and only on a or e, we just need to replace á with a and é with e. The easiest way is to do it in two passes:

        :1,$s/á/a/g
        :1,$s/é/e/g
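Each pass is simply a global search-and-replace of one letter. A minimal Python equivalent can confirm that, for words like those in the list, nothing outside 7-bit ASCII is left afterwards. The function name is illustrative only.

```python
def strip_acute_two_pass(text: str) -> str:
    # One global replacement per accented letter, mirroring the two :s passes.
    return text.replace("á", "a").replace("é", "e")

words = ["Bogotá", "Paraná", "décolleté", "émigré", "études"]
print([strip_acute_two_pass(w) for w in words])
# ['Bogota', 'Parana', 'decollete', 'emigre', 'etudes']
```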

If you want to do it in a single pass, perhaps covering any accented letter, it might be possible with a "pure" regexp, but I believe a function would be simpler, and maybe even faster (though I'm less sure about the latter):

        :1,$s/[\x80-\xFF]/\=RemoveAccents(submatch(0))/g

where, prior to using the above substitute, we would have defined the following:

        let s:accentsDict = {
        \       "á" : "a",
        \       "â" : "a",
        \       "à" : "a",
        \       "ä" : "ae",
        \       "ã" : "a",
        \       "æ" : "ae",
        \       "å" : "aa",
        \       "ç" : "c",
        \       "ð" : "dh",
        \       "é" : "e",
        \       "ê" : "e",
        \       "è" : "e",
        \       "ë" : "e",
        \       "í" : "i",
        \       "î" : "i",
        \       "ì" : "i",
        \       "ï" : "i",
        \       "ñ" : "ny",
        \       "ó" : "o",
        \       "ô" : "o",
        \       "ò" : "o",
        \       "ö" : "oe",
        \       "õ" : "o",
        \       "ø" : "oe",
        \       "ß" : "ss",
        \       "þ" : "th",
        \       "ú" : "u",
        \       "û" : "u",
        \       "ù" : "u",
        \       "ü" : "ue",
        \       "ý" : "y",
        \       "ÿ" : "y",

... similarly for uppercase ...

        \       }
        function! RemoveAccents(byte)
                if has_key(s:accentsDict, a:byte)
                        return s:accentsDict[a:byte]
                else
                        return a:byte
                endif
        endfunction

For speed, we only touch characters from 0x80 to 0xFF (plain ASCII, below 0x80, has no accented letters), and we use a Dictionary, which is set up once when the script is sourced, rather than a chain of if... elseif... branches. However, Dictionaries are only available in Vim 7 or later. Note also that the "keys" of the Dictionary must be single characters, since the pattern matches one character at a time, but the "values" don't have to be. I have tried to cover all the lowercase accented letters in Latin1, plus æ (ae), ø (o-slash), ð (eth), þ (thorn) and ß (eszett), which are also part of Latin1 and used in some national alphabets, but are not in 7-bit US-ASCII. If I forgot any, they can easily be added.
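The first function below is a rough Python analogue of the single-pass approach. It matches any character in U+0080..U+00FF and looks it up in a table, passing unmapped characters through unchanged. `ACCENTS` is a small illustrative subset, not the full table. The second function uses a different technique, named plainly here: Unicode NFD decomposition followed by dropping combining marks. It handles any decomposable accent without a table. However, it maps ä to a rather than ae, and it leaves ø, æ, ð, þ and ß untouched because they have no decomposition. This is one reason an explicit table can still be preferable.

```python
import re
import unicodedata

# Illustrative subset of the table above (lowercase only).
ACCENTS = {"á": "a", "é": "e", "ä": "ae", "ø": "oe", "ß": "ss"}

def remove_accents(text: str) -> str:
    # Like [\x80-\xFF] with \=RemoveAccents(submatch(0)): only characters
    # U+0080..U+00FF are looked up; anything unmapped is kept as-is.
    return re.sub(r"[\x80-\xff]", lambda m: ACCENTS.get(m.group(0), m.group(0)), text)

def strip_marks(text: str) -> str:
    # Different technique: decompose (NFD) and drop combining marks.
    # Covers any accent, but gives a for ä (not ae) and leaves
    # ø, æ, ð, þ and ß alone, since they have no decomposition.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

for word in ["Bogotá", "naïve", "Straße"]:
    print(remove_accents(word), strip_marks(word))
# Bogota Bogota
# naïve naive
# Strasse Straße
```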


Best regards,
Tony.
