On Fri, Sep 14, 2018 at 7:49 PM ToddAndMargo <[email protected]> wrote:
>
> > On Fri, Sep 14, 2018 at 5:22 PM ToddAndMargo <[email protected]> wrote:
> >>
> >> Hi All,
> >>
> >> A tip to share.
> >>
> >> I work a lot with downloaded web pages. I cut
> >> out things like revision numbers and download
> >> locations.
> >>
> >> One of the things that used to drive me a bit nuts was that
> >> web pages can come with all kinds of weird line terminators.
> >> I'd wind up with a link location that bombed because
> >> there was some weird unprintable character at the end.
> >>
> >> Now there are routines to chop off these kinds of things,
> >> but they don't always work, depending on what the weird
> >> character is.
> >>
> >> What I had done in the past was to dump the page to a file
> >> and use a hex editor to figure out what the weird character
> >> was. I have found ASCII 0, 7, 10, 12, 13 and some other weird
> >> ones I can't remember. They often came in combinations too.
> >> Then cut the turkey out with a regex. It was a lot of work.
> >>
> >> Now-a-days, it is easy. I just get "greedy" (chuckle).
> >> I always know what the end of the string should be: .zip,
> >> .exe, .rpm, etc. So
> >>
> >> $Str ~~ s/ ".zip" .* /.zip/;
> >>
> >> $ p6 'my $x="abc.zip"~chr(7)~chr(138); $x~~s/ ".zip" .* /.zip/; say "<$x>";'
> >> <abc.zip>
> >>
> >> Problem solved. And it doesn't care what the weird character(s)
> >> at the end is/are.
> >>
> >> :-)
> >>
> >> Hope this helps someone else. Thank you for all the
> >> help you guys have given me!
> >>
> >> -T
>
>
> On 09/14/2018 05:43 PM, Brad Gilbert wrote:
> > You can just remove the control characters
> >
> > my $x="abc.zip"~chr(7)~chr(138);
> > $x .= subst(/<:Cc>+ $/,'');
> > say $x;
> >
> > Note that 13 is carriage return and 10 is newline
> >
> > If the only ending values are (13,10), 13, or 10
> > you can use .chomp to remove them
> >
> > my $x="abc.zip"~chr(13)~chr(10);
> > $x .= chomp;
> > say $x;
>
> Thank you!
>
> "chomp" was one of those routines I could only get
> to work "sometimes". It depended on what weird character(s)
> I was dealing with.
`chomp` removes a single trailing newline and nothing else, so other
trailing control characters (such as chr(7)) are left in place.
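
A small sketch of that behavior, reusing the bell character chr(7) from the example earlier in the thread. The variable names are only for illustration:

```raku
my $clean = "abc.zip";
my $crlf  = $clean ~ chr(13) ~ chr(10);   # ends in CR LF, a single newline
my $bell  = $clean ~ chr(7);              # ends in a bell, which is not a newline

# chomp strips the trailing newline ...
say $crlf.chomp eq $clean;

# ... but has nothing to remove when the string ends in some other control character
say $bell.chomp eq $bell;
```

That would explain why it seemed to work only "sometimes": it depends on whether the last character happens to be a newline.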
>
> Would you explain what you are doing with
> $x .= subst(/<:Cc>+ $/,'');
Cc is the Unicode general category for control characters.

    > say 7.uniprop;
    Cc
    > say 7.uniprop('General_Category');
    Cc
You can match things by category, for example:

    numbers              / <:N>  /
    decimal numbers      / <:Nd> /
    letter numbers       / <:Nl> /
    other numbers        / <:No> /
    letters              / <:L>  /
    lowercase letters    / <:Ll> /
    uppercase letters    / <:Lu> /
    titlecase letters    / <:Lt> /
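
As a rough illustration of matching by category, here is a sketch that pulls characters out of a string with a few of those classes. The sample string and variable names are made up for the example:

```raku
my $text = "Room 42b";

# .comb with a regex returns every match as a string
my @digits  = $text.comb(/ <:Nd> /);   # decimal digits
my @letters = $text.comb(/ <:L> /);    # letters of any case
my @upper   = $text.comb(/ <:Lu> /);   # uppercase letters only

say @digits.join;
say @letters.join;
say @upper.join;

# .uniprop also works on a string and reports the category of its first character
say "A".uniprop;
```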
The line  $x .= subst(/<:Cc>+ $/,'');  is exactly the same as

    $x ~~ s/ <:Cc>+ $ //;

Originally I was just going to return the result of .subst()
rather than mutating $x.
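
To make the difference concrete, here is a minimal sketch contrasting the two forms. The extra variables $y and $still-dirty are only there for illustration:

```raku
my $x = "abc.zip" ~ chr(7) ~ chr(138);

# .subst returns a new string and leaves $x alone
my $y = $x.subst(/ <:Cc>+ $ /, '');
my $still-dirty = $x ne $y;    # $x still carries the control characters here
say $y eq "abc.zip";
say $still-dirty;

# s/// (like .= subst) changes $x in place
$x ~~ s/ <:Cc>+ $ //;
say $x eq "abc.zip";
```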