On Fri, Sep 14, 2018 at 7:49 PM ToddAndMargo <toddandma...@zoho.com> wrote:
>
> > On Fri, Sep 14, 2018 at 5:22 PM ToddAndMargo <toddandma...@zoho.com> wrote:
> >>
> >> Hi All,
> >>
> >> A tip to share.
> >>
> >> I work a lot with downloaded web pages.  I cut
> >> out things like revision numbers and download
> >> locations.
> >>
> >> One of the things that used to drive me a bit nuts was that
> >> web pages can come with all kinds of weird line terminators.
> >> I'd wind up with a link location that bombed because
> >> there was some weird unprintable character at the end.
> >>
> >> Now there are routines to chop off these kinds of things,
> >> but they don't always work, depending on what the weird
> >> character is.
> >>
> >> What I had done in the past was to dump the page to a file
> >> and use a hex editor to figure out what the weird character
> >> was.  I have found ASCII 0, 7, 10, 12, 13 and some other weird
> >> ones I can't remember.  They often came in combinations too.
> >> Then cut the turkey out with a regex.  It was a lot of work.
> >>
> >> Now-a-days, it is easy.  I just get "greedy" (chuckle).
> >> I always know what the end of the string should be: .zip,
> >> .exe, .rpm, etc.  So
> >>
> >>      $Str ~~ s/ ".zip"  .* /.zip/;
> >>
> >>      $ p6 'my $x="abc.zip"~chr(7)~chr(138); $x~~s/ ".zip" .* /.zip/; say "<$x>";'
> >>      <abc.zip>
> >>
> >> Problem solved.  And it doesn't care what the weird character(s)
> >> at the end is/are.
> >>
> >> :-)
> >>
> >> Hope this helps someone else.  Thank you for all the
> >> help you guys have given me!
> >>
> >> -T
>
>
> On 09/14/2018 05:43 PM, Brad Gilbert wrote:
>  > You can just remove the control characters
>  >
>  >     my $x="abc.zip"~chr(7)~chr(138);
>  >     $x .= subst(/<:Cc>+ $/,'');
>  >     say $x;
>  >
>  > Note that 13 is carriage return and 10 is newline
>  >
>  > If the only ending values are (13,10), 13, or 10
>  > you can use .chomp to remove them
>  >
>  >     my $x="abc.zip"~chr(13)~chr(10);
>  >     $x .= chomp;
>  >     say $x;
>
> Thank you!
>
> "chomp" was one of those routines I could only get
> to work "sometimes".  It depended on what weird character(s)
> I was dealing with.

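`chomp` removes only a trailing newline. It leaves any other control
character, such as chr(7), in place, which is why it seemed to work
only "sometimes".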
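
A quick sketch of the difference, in case it helps. The variable names
are only for illustration, and "abc.zip" is the sample string from
earlier in the thread:

```raku
# A string ending in CR LF: chomp strips the trailing newline.
my $crlf = "abc.zip" ~ chr(13) ~ chr(10);
say $crlf.chomp.chars;    # 7

# A string ending in BEL (chr 7): chomp leaves it alone.
my $bell = "abc.zip" ~ chr(7);
say $bell.chomp.chars;    # 8
```

So whether chomp "works" depends entirely on whether the trailing junk
happens to be a newline.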

>
> Would you explain what you are doing with
>     $x .= subst(/<:Cc>+ $/,'');

Cc is the Unicode general category for control characters.

    > say 7.uniprop;
    Cc

    > say 7.uniprop('General_Category')
    Cc
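
As a rough check, the same call can be run over the code points you
mentioned finding (0, 7, 10, 12, 13) plus the 138 from your example.
The list name here is made up for the sketch:

```raku
# Code points taken from earlier in this thread.
my @code-points = 0, 7, 10, 12, 13, 138;

# Print each code point next to its Unicode general category.
# All of these are control characters, so each should report Cc.
for @code-points -> $cp {
    say $cp, ' => ', $cp.uniprop;
}
```

That is why a single `<:Cc>` test covers all of them at once.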

You can match characters by category in a regex, for example:

numbers
    / <:N> /
decimal numbers
    / <:Nd> /
letter numbers
    / <:Nl> /
other numbers
    / <:No> /

letters
    / <:L> /
lowercase letters
    / <:Ll> /
uppercase letters
    / <:Lu> /
titlecase letters
    / <:Lt> /
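
Two small examples of those categories in use (the sample strings are
made up):

```raku
# Match a run of decimal digits.
say "abc123" ~~ / <:Nd>+ /;           # ｢123｣

# Collect every uppercase letter in a string.
say "Hello World".comb(/ <:Lu> /);    # (H W)
```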

The `.= subst(...)` line you asked about has the same effect on $x as

   $x ~~ s/ <:Cc>+ $ //;

I used the method form because originally I was just going to return
the result of .subst() rather than mutate $x.
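
To make the difference concrete, here is a small sketch of the three
forms side by side. `$clean` and `$y` are names I made up for it:

```raku
my $x = "abc.zip" ~ chr(7) ~ chr(138);

# Non-mutating: .subst returns a new string and leaves $x untouched.
my $clean = $x.subst(/ <:Cc>+ $ /, '');
say $clean.chars;    # 7
say $x.chars;        # 9

# Mutating method form: assign the result back to $x.
$x .= subst(/ <:Cc>+ $ /, '');
say $x.chars;        # 7

# Mutating operator form: s/// edits the variable in place.
my $y = "abc.zip" ~ chr(7) ~ chr(138);
$y ~~ s/ <:Cc>+ $ //;
say $y eq $x;        # True
```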
