Re: tip: that annoying character at the end

2018-09-14 Thread Brad Gilbert

On Fri, Sep 14, 2018 at 10:11 PM ToddAndMargo  wrote:
>
> On 09/14/2018 07:34 PM, Brad Gilbert wrote:
> > $x ~~ s/ <:Cc>+ $ //;
>
> What exactly is <:Cc> again?

< and > inside of a regular expression are for advanced features

If the first character after the < is  :  then it looks for a Unicode property

One of those properties is GeneralCategory.

The General Category for the weird characters is Cc

( Cc is the Control category: C for "Other", c for control )

So this works

/ <:GeneralCategory<Cc>> /

As a shortcut, you can leave off the "GeneralCategory" portion

/ <:Cc> /
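A quick way to check this against the characters from the original tip: the sketch below (not from the thread) prints the general category of each code Todd listed, plus 138 from the example and the letter "a" for contrast. It assumes a reasonably recent Rakudo, where `uniprop` with no argument reports the General_Category.

```raku
# Codes from the tip (0, 7, 10, 12, 13), the 138 from the example,
# and 97 ("a") as a non-control character for contrast
for 0, 7, 10, 12, 13, 138, 97 -> $cp {
    my $char    = chr($cp);
    my $is-ctrl = ($char ~~ / <:Cc> /).so;   # True when the character is in category Cc
    say "$cp: {$char.uniprop} $is-ctrl";     # uniprop defaults to General_Category
}
```

Each control code should come back as Cc, which is why a single <:Cc>+ can strip all of them at once.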

Re: tip: that annoying character at the end

2018-09-14 Thread ToddAndMargo

On 09/14/2018 07:34 PM, Brad Gilbert wrote:
> $x ~~ s/ <:Cc>+ $ //;

What exactly is <:Cc> again?

Re: tip: that annoying character at the end

2018-09-14 Thread ToddAndMargo

Wow!  Thank you!

--
~~~
Serious error.
All shortcuts have disappeared.
Screen. Mind. Both are blank.
~~~

Re: tip: that annoying character at the end

2018-09-14 Thread Brad Gilbert

On Fri, Sep 14, 2018 at 7:49 PM ToddAndMargo  wrote:
> Thank you!
>
> "chomp" was one of those routines I could only get
> to work "sometimes".  It depended on what weird character(s)
> I was dealing with.

`chomp` removes a single trailing newline (for example 10, or 13 followed by 10). Other control characters, such as 7, are left alone.

>
> Would you explain what you are doing with
> $x .= subst(/<:Cc>+ $/,'');

Cc is the Unicode general category for control characters

    > say 7.uniprop;
    Cc

    > say 7.uniprop('General_Category')
    Cc

You can match things by category

Like numbers
/ <:N> /
decimal numbers
/ <:Nd> /
letter numbers
/ <:Nl> /
other numbers
/ <:No> /

letters
/ <:L> /
lowercase letters
/ <:Ll> /
uppercase letters
/ <:Lu> /
titlecase letters
/ <:Lt> /
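A sketch of a few of those categories in action. The sample string is made up; the ½ and Ⅻ are only there because they belong to No and Nl.

```raku
# Made-up sample text that mixes several general categories
my $s = "Tom has 2 cats, ½ a fish and Ⅻ toys";

say $s.comb(/ <:Nd> /);   # decimal numbers: the digit 2
say $s.comb(/ <:No> /);   # other numbers: ½
say $s.comb(/ <:Nl> /);   # letter numbers: the Roman numeral Ⅻ
say $s.comb(/ <:N>  /);   # any number: all three of the above
say $s.comb(/ <:Lu> /);   # uppercase letters: only the T
```

Note that Ⅻ counts as a number (Nl), not an uppercase letter, even though it looks like one.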

`$x .= subst(/<:Cc>+ $/,'');` is exactly the same as

   $x ~~ s/ <:Cc>+ $ //;

Originally I was just going to return the result of .subst()
rather than mutating $x.
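A minimal sketch of the difference, reusing the string from Brad's first reply: .subst hands back a new string, while .= and s/// both change the variable itself. The variable names are just for illustration.

```raku
my $orig = "abc.zip" ~ chr(7) ~ chr(138);

# .subst returns a new string and leaves $orig untouched
my $clean = $orig.subst(/ <:Cc>+ $ /, '');
say $orig.chars;     # 9: seven visible characters plus two controls
say $clean.chars;    # 7

# .= calls the method on the variable and assigns the result back
my $via-method = $orig;
$via-method .= subst(/ <:Cc>+ $ /, '');

# s/// changes the variable in place
my $via-s = $orig;
$via-s ~~ s/ <:Cc>+ $ //;

say $via-method eq $clean && $via-s eq $clean;    # True
```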

Re: tip: that annoying character at the end

2018-09-14 Thread ToddAndMargo


On 09/14/2018 05:43 PM, Brad Gilbert wrote:
> You can just remove the control characters
>
> my $x="abc.zip"~chr(7)~chr(138);
> $x .= subst(/<:Cc>+ $/,'');
> say $x;
>
> Note that 13 is carriage return and 10 is newline
>
> If the only ending values are (13,10), 13, or 10
> you can use .chomp to remove them
>
> my $x="abc.zip"~chr(13)~chr(10);
> $x .= chomp;
> say $x;

Thank you!

"chomp" was one of those routines I could only get
to work "sometimes".  It depended on what weird character(s)
I was dealing with.

Would you explain what you are doing with
   $x .= subst(/<:Cc>+ $/,'');

Re: tip: that annoying character at the end

2018-09-14 Thread Brad Gilbert

You can just remove the control characters

   my $x="abc.zip"~chr(7)~chr(138);
   $x .= subst(/<:Cc>+ $/,'');
   say $x;

Note that 13 is carriage return and 10 is newline

If the only ending values are (13,10), 13, or 10
you can use .chomp to remove them

   my $x="abc.zip"~chr(13)~chr(10);
   $x .= chomp;
   say $x;
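Why .chomp only worked "sometimes" can be seen by running the same endings through .chomp, .trim-trailing (a built-in not mentioned in the thread, which strips trailing whitespace) and the <:Cc> substitution. The endings below are made up.

```raku
# Made-up endings; each pair is a label and a string to clean up
my @endings =
    'LF LF'  => "abc.zip\n\n",
    'BEL'    => "abc.zip" ~ chr(7),
    'NUL CR' => "abc.zip" ~ chr(0) ~ chr(13);

for @endings -> $pair {
    my $str      = $pair.value;
    my $chomped  = $str.chomp;                      # removes at most one trailing newline
    my $trimmed  = $str.trim-trailing;              # removes trailing whitespace only
    my $stripped = $str.subst(/ <:Cc>+ $ /, '');    # removes any run of trailing control characters
    say "{$pair.key}: chomp {$chomped eq 'abc.zip'}, trim-trailing {$trimmed eq 'abc.zip'}, Cc {$stripped eq 'abc.zip'}";
}
```

Only the Cc column should print True on every line: .chomp stops after one newline, and neither it nor .trim-trailing treats 0 or 7 as something to remove.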

On Fri, Sep 14, 2018 at 5:22 PM ToddAndMargo  wrote:
>
> Hi All,
>
> A tip to share.
>
> I work a lot with downloaded web pages.  I cut
> out things like revision numbers and download
> locations.
>
> One of the things that used to drive me a bit nuts was that
> web pages can come with all kinds of weird line terminators.
> I'd wind up with a link location that bombed because
> there was some weird unprintable character at the end.
>
> Now there are routines to chop off these kind of things,
> but they don't always work, depending on what the weird
> character is.
>
> What I had done in the past was to dump the page to a file
> and use a hex editor to figure out what the weird character
> was.  I have found ASCII 0, 7, 10, 12, 13 and some other weird
> ones I can't remember.  They often came in combinations too.
> Then cut the turkey out with a regex.  It was a lot of work.
>
> Nowadays, it is easy.  I just get "greedy" (chuckle).
> I always know what the end of the string should be: .zip,
> .exe, .rpm, etc.  So
>
> $Str ~~ s/ ".zip"  .* /.zip/;
>
> $ p6 'my $x="abc.zip"~chr(7)~chr(138); $x~~s/ ".zip" .* /.zip/; say "<$x>";'
> <abc.zip>
>
> Problem solved.  And it doesn't care what the weird character(s)
> at the end is/are.
>
> :-)
>
> Hope this helps someone else.  Thank you for all the
> help you guys have given me!
>
> -T
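A sketch of the greedy approach from the tip, run over a few made-up endings. The last case is a hypothetical URL showing its one weak spot: the regex anchors on the first ".zip" it finds, not the last.

```raku
# Made-up examples: the same link with different junk after ".zip"
my @links = "abc.zip" ~ chr(7) ~ chr(138),
            "abc.zip" ~ chr(13) ~ chr(10),
            "abc.zip" ~ chr(0) ~ chr(12);

for @links -> $link is copy {
    # "." matches any character, newlines included, and ".*" is greedy,
    # so everything from ".zip" to the end of the string is replaced
    $link ~~ s/ ".zip" .* /.zip/;
    say "<$link>";
}

# Caveat: the match starts at the FIRST ".zip", so a hypothetical URL
# that contains ".zip" earlier in its path gets cut short
my $odd = "http://example.com/foo.zip.d/abc.zip" ~ chr(7);
$odd ~~ s/ ".zip" .* /.zip/;
say "<$odd>";    # <http://example.com/foo.zip>
```

If a download location can contain ".zip" earlier in the path, Brad's <:Cc>+ $ substitution avoids that problem, since it only looks at the end of the string.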
