On Fri, Mar 29, 2024 at 12:07:16PM -0400, Paul Fox wrote:
> I often want to clean up the residue you sometimes get when copying
> text from a web page -- bits of unicode, or special characters, like
> the "\?B0" my screen uses to show the "degrees" symbols in this line:
> Start Totality 01:33 pm 67.2° 178.0°
>
> I think that long, long, ago, I could find those characters using \P,
> but that was before the vile's shorthand search notations were brought
> into line with the X/Open classes. With that change, the TAB character
> lost its "printable" status, so \P finds tabs as well as true non-printables.
>
> What I think I want is a shorthand for [:ascii:] (meaning "8th bit clear").
> Is this available in some way that I'm missing?
>
> Would it be possible to add this, perhaps bound to \y or \z? Even if
> it weren't bound to a shorthand, if [[:ascii:]] were available as
> part of a search string, that would be useful enough.
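To make the distinction concrete: in the C locale [:print:] covers only bytes 0x20 through 0x7E, so its complement (what \P matches, per the table of classes) includes tab along with the high-bit bytes, while an "ASCII" class meaning "8th bit clear" would leave out only bytes 0x80 and up. The Python sketch below illustrates that under a C-locale assumption; it is not vile's implementation, and `not_print` / `not_ascii` are hypothetical names.

```python
import re

# C-locale [:print:] is 0x20..0x7E; its complement is roughly what \P matches.
NOT_PRINT = re.compile(rb"[^\x20-\x7e]")
# Hypothetical "ASCII" class (8th bit clear, 0x00..0x7F); its complement
# matches only bytes with the high bit set.
NOT_ASCII = re.compile(rb"[^\x00-\x7f]")

def not_print(data: bytes) -> list[bytes]:
    """Return every byte outside the C-locale [:print:] range (tabs included)."""
    return NOT_PRINT.findall(data)

def not_ascii(data: bytes) -> list[bytes]:
    """Return only the bytes whose 8th bit is set."""
    return NOT_ASCII.findall(data)

# A tab-separated line ending in a Latin-1 degree sign (0xB0).
line = b"Start\tTotality\t01:33 pm\t67.2\xb0"
print(not_print(line))  # [b'\t', b'\t', b'\t', b'\xb0']
print(not_ascii(line))  # [b'\xb0']
```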
It sounds simple.
I generally just type in
[^ -~^I]
to find non-printable bytes (where the "^I" is the tab character).
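That bracket expression keeps the range from space through tilde plus tab, so whatever it matches is a control character other than tab, DEL, or a byte with the 8th bit set. A rough Python equivalent over raw bytes, with `\t` standing in for the literal ^I typed in vile (`find_odd_bytes` is a hypothetical helper, shown only as a sketch):

```python
import re

# Same set as vile's [^ -~^I]: everything except space..tilde and tab.
ODD_BYTE = re.compile(rb"[^ -~\t]")

def find_odd_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for each byte outside space..tilde and tab."""
    return [(m.start(), data[m.start()]) for m in ODD_BYTE.finditer(data)]

sample = b"67.2\xb0\t178.0\xb0\x7f"
for offset, value in find_odd_bytes(sample):
    print(f"offset {offset}: 0x{value:02X}")
# offset 4: 0xB0
# offset 11: 0xB0
# offset 12: 0x7F
```

The tab at offset 5 is skipped, which is the point of adding ^I to the set.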
For zapping Unicode as well, I set the buffer's file-encoding to 8bit first:
:setl fk=8bit
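Assuming fk is vile's file-encoding mode and that 8bit stops it from decoding UTF-8, a character such as the degree sign (U+00B0) is then seen as its two UTF-8 bytes, C2 and B0, both of which fall outside the bracket expression above. A Python sketch of the same byte-level cleanup, assuming UTF-8 input and that deleting the bytes (rather than transliterating them) is what is wanted; `zap_non_printables` is a hypothetical name:

```python
import re

# Bytes that [^ -~^I] would flag once the text is viewed as 8-bit data.
ODD_BYTE = re.compile(rb"[^ -~\t]")

def zap_non_printables(text: str) -> str:
    """Encode as UTF-8, delete every byte outside space..tilde and tab, decode."""
    raw = text.encode("utf-8")
    cleaned = ODD_BYTE.sub(b"", raw)
    # Only printable ASCII and tab remain, so decoding as ASCII cannot fail.
    return cleaned.decode("ascii")

line = "Start Totality 01:33 pm 67.2\u00b0 178.0\u00b0"
print("67.2\u00b0".encode("utf-8"))  # b'67.2\xc2\xb0'
print(zap_non_printables(line))      # Start Totality 01:33 pm 67.2 178.0
```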
Oddly, perlre doesn't list anything appropriate, and \z and \Z already
have useful meanings there (end-of-string anchors). I don't see \y or \Y
(except the latter as something that one might customize).
>
> (Oddly, if I search for "[[:ascii:]]" today, it finds instances of ":]".
> Not sure why.)
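One plausible explanation, assuming vile does not recognize "ascii" as a class name and falls back to reading the text literally: the first "]" closes a bracket expression holding "[", ":", "a", "s", "c" and "i", and the final "]" is then an ordinary character, so the pattern matches any of those six characters followed by "]". The Python sketch below spells out that literal reading with the brackets escaped; it shows the effect, not vile's actual parser.

```python
import re

# Literal reading of [[:ascii:]]: a set of six characters, then a plain "]".
LITERAL_READING = re.compile(r"[\[:asci]\]")

text = "Current classes: [:alnum:] [:alpha:] [:print:]"
print(LITERAL_READING.findall(text))  # [':]', ':]', ':]']
```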
>
>
> Current classes and shorthands:
> \i \I [:alnum:]
> \a \A [:alpha:]
> \b \B [:blank:]
> \c \C [:cntrl:]
> \d \D [:digit:]
> \f \F [:file:]
> \g \G [:graph:]
> \w \W [:ident:], alphanumeric (plus '_')
> \l \L [:lower:]
> \o \O [:octal:]
> \p \P [:print:], printable (note that space is printable)
> \q \Q [:punct:]
> \s \S [:space:]
> \u \U [:upper:]
> \x \X [:xdigit:]
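For comparison, a Python sketch of which 7-bit characters those classes contain in the C locale. [:file:] is vile-specific and left out; [:ident:] and [:octal:] follow the descriptions in the list. `CLASSES` and `classes_of` are hypothetical names for illustration only.

```python
import string

# C-locale membership for the classes in the list above ([:file:] omitted).
CLASSES = {
    "alnum": string.ascii_letters + string.digits,
    "alpha": string.ascii_letters,
    "blank": " \t",
    "cntrl": "".join(chr(c) for c in range(0x20)) + "\x7f",
    "digit": string.digits,
    "graph": "".join(chr(c) for c in range(0x21, 0x7f)),
    "ident": string.ascii_letters + string.digits + "_",
    "lower": string.ascii_lowercase,
    "octal": string.octdigits,
    "print": "".join(chr(c) for c in range(0x20, 0x7f)),
    "punct": string.punctuation,
    "space": " \t\n\v\f\r",
    "upper": string.ascii_uppercase,
    "xdigit": string.hexdigits,
}

def classes_of(ch: str) -> list[str]:
    """Names of the classes above that contain the single character ch."""
    return [name for name, members in CLASSES.items() if ch in members]

print(classes_of("\t"))  # ['blank', 'cntrl', 'space']
print(classes_of(" "))   # ['blank', 'print', 'space']
```

Tab is in [:cntrl:] but not [:print:], which is why \P now reports it.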
>
>
> =----------------------
> paul fox (arlington, ma, where it's 40.1 degrees)
>
>
--
Thomas E. Dickey
https://invisible-island.net
