Mark Woodward wrote:
Hi all,
came across this [Emacs] link the other day and wondered if Vim
can do this?
http://steve-yegge.blogspot.com/
in summary...
change these to Bob, Sue, Ralph etc (capitalised)
bob
sue
ralph
alice
jimmy
preston
billy joe jim bob
:s/\w\+/\u\0/g
capitalise last letter (eg boB)
:'<,'>s/\(\w\+\)\(\w\)/\1\u\2/g
change each of these to getFather(), getMother() etc
public Relative father() { return this.father; }
public Relative mother() { return this.mother; }
public Relative sister() { return this.sister; }
public Relative brother() { return this.brother; }
public Relative auntie() { return this.auntie; }
public Relative uncle() { return this.uncle; }
:'<,'>s/public Relative \zs\(\w\)/get\u\1/
1. change these to number order starting at 1 (1, 2, 3, 4 etc)
2. change these to alpha list (a, b, c, etc)
[This one has me stumped although I'm sure I've seen something
along these lines before. ? something to do with sub-replace-special
and submatch?]
1987:Bogotá
5243:Fabergé
9772:Mallarmé
12044:Paraná
12499:Poincaré
16956:abbé
19923:appliqué
20932:attaché
23704:blasé
26223:café
26511:canapé
29314:cliché
31431:consommé
38981:décolleté
42995:fiancé
43623:flambé
44996:frappé
48317:habitué
58328:macramé
58898:manqué
62514:naiveté
65243:outré
66710:passé
71609:protégé
73675:recherché
76387:risqué
76847:roué
77811:sauté
82455:soufflé
89055:touché
96268:émigré
96274:études
any hints?,
Since only acute accents are involved, and only on a or e, we just need
to replace á by a and é by e. The easiest way is to do it in two passes:
:1,$s/á/a/g
:1,$s/é/e/g
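Since both replacements here swap one character for one character, the
built-in tr() function should give the same result as the two passes. A
small sketch, where StripAcute() is only a made-up name and the script is
assumed to be saved as UTF-8:

```vim
scriptencoding utf-8

" StripAcute() is a hypothetical helper name, used only for this sketch.
" tr() maps the n-th character of its second argument to the n-th
" character of its third, so both strings must have the same length.
function! StripAcute(text)
  return tr(a:text, 'áé', 'ae')
endfunction

" On the whole buffer it could be applied like this:
"   :%s/.*/\=StripAcute(submatch(0))/
```

This only covers one-to-one replacements; anything like ä -> ae needs a
different approach.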
If you want to do it in a single pass, and maybe for any possible
accents, it might be possible with a "pure" regexp, but I believe a
function would be simpler, and maybe even faster (but I'm less sure
about the latter):
:1,$s/[\x80-\xFF]/\=RemoveAccents(submatch(0))/g
where, prior to using the above substitute, we would have defined the
following:
let s:accentsDict = {
\ "á" : "a",
\ "â" : "a",
\ "à" : "a",
\ "ä" : "ae",
\ "ã" : "a",
\ "æ" : "ae",
\ "å" : "aa",
\ "ç" : "c",
\ "ð" : "dh",
\ "é" : "e",
\ "ê" : "e",
\ "è" : "e",
\ "ë" : "e",
\ "í" : "i",
\ "î" : "i",
\ "ì" : "i",
\ "ï" : "i",
\ "ñ" : "ny",
\ "ó" : "o",
\ "ô" : "o",
\ "ò" : "o",
\ "ö" : "oe",
\ "õ" : "o",
\ "ø" : "oe",
\ "ß" : "ss",
\ "þ" : "th",
\ "ú" : "u",
\ "û" : "u",
\ "ù" : "u",
\ "ü" : "ue",
\ "ÿ" : "y",
... similarly for uppercase ...
\ }
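function! RemoveAccents(byte)
  " Return the replacement if the character is in the table,
  " otherwise return the character unchanged.
  if has_key(s:accentsDict, a:byte)
    return s:accentsDict[a:byte]
  else
    return a:byte
  endif
endfunction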
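To try the idea on a string before running it on a buffer, something like
the following should do. It is only a sketch: s:demoDict and
DemoRemoveAccents() are made-up names, the table holds just the two
entries needed for the list above, and 'encoding' is assumed to be utf-8.

```vim
scriptencoding utf-8

" Hypothetical two-entry table, just enough for the word list above.
let s:demoDict = {'á': 'a', 'é': 'e'}

" Hypothetical helper: look the character up, fall back to the
" character itself when it is not in the table.
function! DemoRemoveAccents(char)
  return get(s:demoDict, a:char, a:char)
endfunction

" substitute() accepts the same \= expression as :s does.
echo substitute('décolleté', '[\x80-\xFF]', '\=DemoRemoveAccents(submatch(0))', 'g')
```

get() with a default does the same job as the has_key() test, in one call.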
For speed, we don't touch bytes lower than 0x80 (which never have
accents) and we use a Dictionary (which is set up once, when the
script is sourced) rather than a chain of if... elseif... etc.
However, Dictionaries are only available in Vim 7 or later. Note also
that the "keys" of the Dictionary must be single characters but the
"values" don't have to be. I have tried to mention all lowercase accented
letters present in Latin1, plus ae, o-bar, edh, thorn and eszett, which
are also part of Latin1 and present in some national alphabets, but not
in 7-bit US-ASCII. If I forgot any, they can easily be added.
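In case items 1 and 2 of the question were about renumbering the list
rather than about the accents, here is a rough sketch built on a \=
expression in the replacement (see :help sub-replace-special).
RenumberRange() is a made-up name; it assumes every line in the range
starts with a number.

```vim
" RenumberRange() is a hypothetical helper name for this sketch.
" Call it on a range, e.g.   :'<,'>call RenumberRange()
function! RenumberRange() range
  " Subtracting this offset from line('.') gives 1 for the first line.
  let l:offset = a:firstline - 1
  " Replace the leading digits of each line by its position in the range.
  let l:cmd = 's/^\d\+/\=line(".") - ' . l:offset . '/e'
  execute a:firstline . ',' . a:lastline . l:cmd
endfunction
```

For the a, b, c variant the expression could presumably use nr2char() and
char2nr('a') in the same way, but that only covers 26 lines, fewer than
the list above has.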
Best regards,
Tony.