On 03/11/12 20:42, stosss wrote:
Greetings,
Just trying to learn

On Sat, Nov 3, 2012 at 3:24 PM, Tim Chase <v...@tim.thechases.com> wrote:
On 11/03/12 14:11, Chris Lott wrote:
I have a large text file in which I need to remove all punctuation,
all special characters ("smart quotes") and the like, and a bunch of
selected words.

Can this be done within Vim?

Yes.

Oh, you want to know *how*? :-P

The smart-quotes are the hardest ones to do, but if you can enter
them in vim (or select+yank them, and then paste them into an Ex
command using control+R followed by a double-quote), they should be
usable:

  :%s/\([[:punct:]]\+\|”\|“\|selected\|words\)//g

Alternatively, you might want to specify what *is* allowed and
invert it:

   :%s/\W\+//g   " that's "everything that isn't a Word character"
or
   :%s/[^[:alnum:][:space:]]\+//g  "all but alnum & spaces"

which you can read about at

   :help :alnum:
   :help /\W
   :help /\|
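
A quick way to compare the two inverted patterns is to try them on a sample string with substitute() before touching the buffer. This is only a sketch: the sample text is made up, and note that the class name needs both colons, [:space:]. As far as I know, [:alnum:] covers only ASCII letters and digits in Vim, so accented letters would be deleted as well.

```vim
" Made-up sample line for illustration.
let sample = 'Hello, "world"!'

" \W also matches spaces, so the words run together: Helloworld
echo substitute(sample, '\W\+', '', 'g')

" Keeping [:space:] in the negated class preserves the gap: Hello world
echo substitute(sample, '[^[:alnum:][:space:]]\+', '', 'g')
```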


Asking because I don't know and I don't use smart quotes. What makes
them so difficult to remove in a s/search/replace/g ?

Aren't they just quotation marks?

Well, yes, but most keyboards haven't got them. The "usual" quotation marks (which I just used) are the same opening and closing, U+0022, and all keyboards that I know of (even US-ASCII keyboards with no accents) can easily produce them.

Smart quotes can be “smart” or “smart„ or even „smart“: there are three characters for smart double quotes, U+201C (upper 6), U+201D (upper 9) and U+201E (lower 9), and they are not the same opening and closing, though which one is opening and which one is closing varies by language and sometimes by country.

These characters are not in Latin1 and, I think, are missing from most of the ISO-8859 charsets, so in practice you need a Unicode encoding (such as UTF-8) to be able to represent them. Most keyboards either don't have them or require some unusual fingering to type them: on this Linux system with Belgian keyboard layout, „ isn't available (I paste it from Vim, where it has the digraph :9), and “ ” are AltGr-v and AltGr-b respectively (hold AltGr while hitting the letter). For single smart quotes it's even harder: ‘ is AltGr-Shift-v, ’ is AltGr-Shift-b, and ‚ (not the comma but the low-9 single quote) is not available. (AltGr is the key right of the space bar on international keyboards; if you've got a second plain Alt key there, you might try Alt together with Ctrl.)
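
If typing the characters is awkward, the code points quoted above can be used directly. The following is a small sketch, assuming 'encoding' is utf-8: ga shows the code point of the character under the cursor, and inside a [] collection \uXXXX stands for the character with that hexadecimal code point.

```vim
" In Normal mode, ga shows the code point of the character under the cursor.

" Print the code point of a character: U+201C
echo printf('U+%04X', char2nr('“'))

" Print the character for a code point (here the low-9 double quote).
echo nr2char(0x201E)

" Delete all three smart double quotes without typing any of them.
" The e flag keeps Vim quiet if the buffer contains none.
%s/[\u201c\u201d\u201e]//ge
```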

While I'm here, if you want to select your selected words only as full words (e.g. "unusual" as a word but not as part of "unusually"; "word" or "words" but not "worded") you should use \< (zero-length start of word) and \> (zero-length end of word) as part of your pattern:

        :%s/[[:punct:]“”]\|\<unusual\>\|\<words\=\>//g
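
To see what the word boundaries do, the same pattern can be tried on a made-up line with substitute() first (a sketch only; the sample text is hypothetical):

```vim
" Made-up sample line; the pattern is the one from the command above.
let pat = '[[:punct:]“”]\|\<unusual\>\|\<words\=\>'
let text = 'unusual unusually words worded'

" "unusual" and "words" go; "unusually" and "worded" stay, as do the
" spaces that surrounded the removed words.
echo substitute(text, pat, '', 'g')
```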

If you want to remove other kinds of quotes, e.g. « » ‘ ’ ‚ i.e. U+00AB U+00BB U+2018 U+2019 U+201A, the pattern can easily be extended.
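
Extended to the quote characters listed above, the command might look like this (a sketch; add or drop characters in the collection to taste, and note that the e flag only suppresses the "pattern not found" error):

```vim
" Same substitute, with the extra quote characters added to the collection.
%s/[[:punct:]“”„«»‘’‚]\|\<unusual\>\|\<words\=\>//ge
```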

To type smart quotes in Vim, if you haven't got them on your keyboard, I recommend using digraphs, they're easy to remember:
        “       Ctrl-K " 6      double 6 above
        ”       Ctrl-K " 9      double 9 above
        „       Ctrl-K : 9      double 9 below
        ‘       Ctrl-K ' 6      single 6 above
        ’       Ctrl-K ' 9      single 9 above
        ‚       Ctrl-K . 9      single 9 below
        «       Ctrl-K < <      opening French
        »       Ctrl-K > >      closing French
see :help digraph.txt; or you can input them by their Unicode codepoint in hex, see :help i_CTRL-V_digit. («French» quotes are sometimes used in »German« with the opposite meaning, BTW.)
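
For reference, entering a character by code point looks roughly like this (a sketch, assuming 'encoding' is utf-8); the last line does the same from the command line with a "\u..." string escape.

```vim
" Insert mode: press Ctrl-V, then u, then the four hex digits:
"   Ctrl-V u 2 0 1 c   inserts “
"   Ctrl-V u 2 0 1 d   inserts ”

" List all defined digraphs with their characters and decimal code points.
digraphs

" Append a smart-quoted word after the cursor without typing the quotes.
execute "normal! a\u201csmart\u201d"
```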

The substitute above will not remove spaces around the words. You may (if you want) *follow* this substitute with

        :%s/ \{2,}/ /g
to replace two or more spaces by one space, or with
        :%s/ *\ze\%( \|$\)//g
if you also want to remove any number of spaces at end of line. To remove all spaces at begin or end of line but replace them by one space elsewhere is harder to do in one operation. Hm...
        :%s/^ \+\| \+$\| \zs \+//g
should work I think, but it isn't very elegant.
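
If the one-liner feels too dense, a two-step variant can be tried on a sample string first. This is only a sketch: SquashSpaces is a hypothetical helper name, not anything built into Vim, and it handles plain spaces only (no tabs).

```vim
" Hypothetical helper: trim spaces at both ends, then collapse inner runs.
function! SquashSpaces(text) abort
  " Step 1: drop leading and trailing spaces.
  let l:trimmed = substitute(a:text, '^ \+\| \+$', '', 'g')
  " Step 2: replace every run of two or more spaces by a single space.
  return substitute(l:trimmed, ' \{2,}', ' ', 'g')
endfunction

" Try it on a made-up line: prints 'a b' (string() adds the quotes).
echo string(SquashSpaces('  a   b  '))
```

Once the result looks right, the helper can be applied to every line of the buffer with :%s/.*/\=SquashSpaces(submatch(0))/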


Best regards,
Tony.
--
"But officer, I was only trying to gain enough speed so I could coast
to the nearest gas station."
