On Fri, Sep 14, 2018 at 7:49 PM ToddAndMargo <toddandma...@zoho.com> wrote:
>
> > On Fri, Sep 14, 2018 at 5:22 PM ToddAndMargo <toddandma...@zoho.com> wrote:
> >>
> >> Hi All,
> >>
> >> A tip to share.
> >>
> >> I work a lot with downloaded web pages.  I cut
> >> out things like revision numbers and download
> >> locations.
> >>
> >> One of the things that use to drive me a bit nuts was that
> >> web pages can come with all kind of weird line terminators.
> >> I'd wind up with a link location that bombed because
> >> there was some weird unprintable character at the end.
> >>
> >> Now there are routines to chop off these kind of things,
> >> but they don't always work, depending on what the weird
> >> character is.
> >>
> >> What I had done in the past as to dump the page to a file
> >> and use a hex editor to figure out what the weird character
> >> was.  I have found ascii 0, 7, 10, 12, 13 and some other weird
> >> ones I can't remember.  They often came is combinations too.
> >> Then cut the turkey out with a regex.  It was a lot of work.
> >>
> >> Now-a-days, it is easy.  I just get "greedy" (chuckle).
> >> I always know what end of the string should be: .zip,
> >> .exe, .rpm, etc..  So
> >>
> >>     $Str ~~ s/ ".zip" .* /.zip/;
> >>
> >> $ p6 'my $x="abc.zip"~chr(7)~chr(138); $x~~s/ ".zip" .* /.zip/; say
> >> "<$x>";'
> >> <abc.zip>
> >>
> >> Problem solved.  And it doesn't care what the weird character(s)
> >> at the end is/are.
> >>
> >> :-)
> >>
> >> Hope this helps someone else.  Thank you for all the
> >> help you guys have given me!
> >>
> >> -T
>
>
> On 09/14/2018 05:43 PM, Brad Gilbert wrote:
> > You can just remove the control characters
> >
> >     my $x="abc.zip"~chr(7)~chr(138);
> >     $x .= subst(/<:Cc>+ $/,'');
> >     say $x;
> >
> > Note that 13 is carriage return and 10 is newline
> >
> > If the only ending values are (13,10), 13, or 10
> > you can use .chomp to remove them
> >
> >     my $x="abc.zip"~chr(13)~chr(10);
> >     $x .= chomp;
> >     say $x;
>
> Thank you!
>
> "chomp" was on of those routines I could only get
> to work "sometimes".  It depended on what weird character(s)
> I was dealing with.

`chomp` removes only a trailing newline, so any other trailing
control character (7 or 12, for example) is left in place.

>
> Would you explain what you are doing with
>     $x .= subst(/<:Cc>+ $/,'');

Cc is the Unicode general category for control characters

    > say 7.uniprop;
    Cc
    > say 7.uniprop('General_Category')
    Cc

You can match things by category

Like numbers

    / <:N> /

decimal numbers

    / <:Nd> /

letter numbers

    / <:Nl> /

other numbers

    / <:No> /

letters

    / <:L> /

lowercase letters

    / <:Ll> /

uppercase letters

    / <:Lu> /

titlecase letters

    / <:Lt> /

The `.= subst` line is exactly the same as

    $x ~~ s/ <:Cc>+ $ //;

Originally I was just going to return the result of .subst()
rather than mutating $x.
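
To put the pieces above side by side, here is a small sketch. The variable names ($raw, $clean, $a, $b) are made up for illustration, and the sample string is the one from earlier in the thread. It only shows the difference between returning a new string and changing the variable in place, and is not meant as the one right way to do it.

```raku
# Sample from the thread: a file name followed by BEL (7) and a C1 control (138).
my $raw = "abc.zip" ~ chr(7) ~ chr(138);

# Non-mutating: .subst returns a new string and leaves $raw untouched.
my $clean = $raw.subst(/ <:Cc>+ $ /, '');
say $clean eq "abc.zip";    # True
say $raw eq $clean;         # False

# Mutating: both of these rewrite the variable in place.
my $a = $raw;
$a .= subst(/ <:Cc>+ $ /, '');
my $b = $raw;
$b ~~ s/ <:Cc>+ $ //;
say $a eq $b;               # True

# .chomp only strips a trailing newline, so a trailing BEL survives it.
say ("abc.zip" ~ chr(7)).chomp eq "abc.zip";    # False
```

If you want to keep the original string around (for logging the odd characters, say), the non-mutating form is probably the one to reach for.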