On Fri, Mar 29, 2024 at 12:07:16PM -0400, Paul Fox wrote:
> I often want to clean up the residue you sometimes get when copying
> text from a web page -- bits of unicode, or special characters, like
> the "\?B0" my screen uses to show the "degrees" symbols in this line:
>     Start Totality 01:33 pm 67.2? 178.0?
>
> I think that long, long, ago, I could find those characters using \P,
> but that was before the vile's shorthand search notations were brought
> into line with the X/Open classes.  With that change, the TAB character
> lost its "printable" status, so \P finds tabs as well as true non-printables.
>
> What I think I want is a shorthand for [:ascii:] (meaning "8th bit clear").
> Is this available in some way that I'm missing?
>
> Would it be possible to add this, perhaps bound to \y or \z?  Even if
> it weren't bound to a shorthand, if [[:ascii:]] were available as
> part of a search string, that would be useful enough.

it sounds simple.  I generally just type in

    [^ -~^I]

to find non-printable bytes (where the "^I" is the tab character).

For zapping Unicode as well,

    :setl fk=8bit

Oddly, perlre doesn't list anything appropriate, though \z and \Z have useful
meanings.  I don't see \y or \Y (except the latter as something that one might
customize).

>
> (Oddly, if I search for "[[:ascii:]]" today, it finds instances of ":]".
> Not sure why.)
>
>
> Current classes and shorthands:
>     \i    \I    [:alnum:]
>     \a    \A    [:alpha:]
>     \b    \B    [:blank:]
>     \c    \C    [:cntrl:]
>     \d    \D    [:digit:]
>     \f    \F    [:file:]
>     \g    \G    [:graph:]
>     \w    \W    [:ident:], alphanumeric (plus '_')
>     \l    \L    [:lower:]
>     \o    \O    [:octal:]
>     \p    \P    [:print:], printable (note that space is printable)
>     \q    \Q    [:punct:]
>     \s    \S    [:space:]
>     \u    \U    [:upper:]
>     \x    \X    [:xdigit:]
>
>
> =----------------------
> paul fox, p...@foxharp.boston.ma.us (arlington, ma, where it's 40.1 degrees)
>

-- 
Thomas E. Dickey <dic...@invisible-island.net>
https://invisible-island.net
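
For anyone who wants to try the same character-class idea outside vile, here is a minimal sketch in Python. Python is an assumption on my part; the thread itself only discusses vile's search syntax. The pattern mirrors the `[^ -~^I]` class above: anything outside the range space through tilde that is not a tab. The helper names `find_non_printable` and `strip_non_printable` and the sample line are hypothetical. Note that Python matches on code points rather than bytes here, and that this class also flags newlines and other control characters.

```python
import re

# Mirrors the [^ -~^I] class from the message: match any character that is
# neither in the range space..tilde nor a tab.
# Hypothetical helpers for illustration; not part of vile.
NON_PRINTABLE = re.compile(r"[^ -~\t]")


def find_non_printable(text):
    """Return (offset, character) pairs for every character the class matches."""
    return [(m.start(), m.group()) for m in NON_PRINTABLE.finditer(text)]


def strip_non_printable(text, replacement=""):
    """Replace every matched character with `replacement` (default: delete it)."""
    return NON_PRINTABLE.sub(replacement, text)


if __name__ == "__main__":
    # Sample line with a tab (kept) and two degree signs (flagged).
    sample = "Start Totality\t01:33 pm 67.2\u00b0 178.0\u00b0"
    print(find_non_printable(sample))
    print(strip_non_printable(sample))
```

Passing a visible `replacement` such as `"?"` instead of deleting makes it easier to review what was found before cleaning a file for real.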