On Fri, Mar 29, 2024 at 12:07:16PM -0400, Paul Fox wrote:
> I often want to clean up the residue you sometimes get when copying
> text from a web page -- bits of unicode, or special characters, like
> the "\?B0" my screen uses to show the "degrees" symbols in this line:
>       Start Totality  01:33 pm        67.2°   178.0°
> 
> I think that long, long ago, I could find those characters using \P,
> but that was before vile's shorthand search notations were brought
> into line with the X/Open classes.  With that change, the TAB character
> lost its "printable" status, so \P finds tabs as well as true non-printables.
> 
> What I think I want is a shorthand for [:ascii:] (meaning "8th bit clear").
> Is this available in some way that I'm missing?
> 
> Would it be possible to add this, perhaps bound to \y or \z?  Even if
> it weren't bound to a shorthand, if [[:ascii:]] were available as
> part of a search string, that would be useful enough.

It sounds simple.

I generally just type in

        [^ -~^I]

to find non-printable bytes (where the "^I" is the tab character).
For zapping Unicode as well,

        :setl fk=8bit
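
For anyone who wants to try the same bracket expression outside the editor, here is a rough sketch using Python's "re" module on raw bytes. It is not vile's regex engine, and the function name find_nonprintable and the sample line are made up for illustration. It only shows which bytes [^ -~^I] picks out: anything that is neither in the range space through tilde nor a tab.

```python
import re

# Bytes that are neither printable ASCII (space through tilde) nor tab,
# mirroring the bracket expression [^ -~^I] from the message.
# Assumption: this uses Python's regex syntax, not vile's.
NON_PRINTABLE = re.compile(rb"[^ -~\t]")


def find_nonprintable(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, byte value) for each byte the pattern matches."""
    return [(m.start(), m.group()[0]) for m in NON_PRINTABLE.finditer(data)]


# Hypothetical sample: a Latin-1 degree sign is the single byte 0xB0.
line = b"Start Totality  01:33 pm\t67.2\xb0   178.0\xb0"
for offset, value in find_nonprintable(line):
    print(f"offset {offset}: 0x{value:02X}")
```

Note that a newline byte also counts as a match here, since it falls outside the space-to-tilde range; that matters less in an editor, where the search works within lines.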

Oddly, perlre doesn't list anything appropriate, though \z and \Z have
useful meanings.  I don't see \y or \Y (except the latter as something
that one might customize).

> 
> (Oddly, if I search for "[[:ascii:]]" today, it finds instances of ":]".
> Not sure why.)
> 
> 
> Current classes and shorthands:
>    \i \I  [:alnum:]
>    \a \A  [:alpha:]
>    \b \B  [:blank:]
>    \c \C  [:cntrl:]
>    \d \D  [:digit:]
>    \f \F  [:file:]
>    \g \G  [:graph:]
>    \w \W  [:ident:], alphanumeric (plus '_')
>    \l \L  [:lower:]
>    \o \O  [:octal:]
>    \p \P  [:print:], printable (note that space is printable)
>    \q \Q  [:punct:]
>    \s \S  [:space:]
>    \u \U  [:upper:]
>    \x \X  [:xdigit:]
> 
> 
> =----------------------
>  paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 40.1 degrees)
> 
> 

-- 
Thomas E. Dickey <dic...@invisible-island.net>
https://invisible-island.net
