Re: Regexps using 'after' and 'before' like ^ and $

Joseph Brenner Fri, 29 May 2020 14:47:13 -0700

I opened a github issue:

https://github.com/rakudo/rakudo/issues/3728




On 5/26/20, Joseph Brenner <doom...@gmail.com> wrote:
> Hey Brad, thanks much for the explication:
>
>> ｢<!before .>｣ should probably also prevent the position from being
>> anywhere other than the end.
>
>> It does work if you write it differently
>
>>     'abc' ~~ / b <!before( /./ )> /
>>     Nil
>
> That's pretty interesting, though I can't say I understand at all
> what's going on there.
>
>> It does seem like there could be a bug here.
>
> That was my suspicion.  I'll probably open an issue on it soon.
>
>> All of that said, I don't think it is useful to tell new Raku programmers
>> that you can use those features that way.
>
> Yes, certainly not.  Just to be clear, I'm just messing around
> with after/before to get a better sense of what they do.
>
> I tried to avoid saying the two forms are equivalent; they just
> do roughly similar things.
>
>
>
> On 5/26/20, Brad Gilbert <b2gi...@gmail.com> wrote:
>> I'm not sure that is the best way to look at ｢<before>｣ and ｢<after>｣.
>>
>>     > 'abcd123abcd' ~~ / <?before <digit>> .+ <?after <digit>> /
>>     ｢123｣
>>
>> In the above code, ｢<?before <digit>>｣ makes sure that the first thing
>> that ｢.+｣ matches is a ｢<digit>｣, and ｢<?after <digit>>｣ makes sure
>> that the last thing ｢.+｣ matches is also a ｢<digit>｣.
>>
>> The ｢<?before <digit>>｣ is written in front of the ｢.+｣, so it starts at
>> that position.
>>
>> It does the thing that ｢<digit>｣ would normally do.
>>
>>     ' a b c d 1 2 3 a b c d '
>>     ' _ _ _ _^1^_ _ _ _ _ _ '
>>
>> The thing is, ｢<before>｣ resets the position to what it was immediately
>> before the successful ｢<digit>｣ match.
>>
>>     ' a b c d 1 2 3 a b c d '
>>     ' _ _ _ _^_ _ _ _ _ _ _ '
>>
>> The ｢.+｣ then tries to grab everything
>>
>>     ' a b c d 1 2 3 a b c d '
>>     ' _ _ _ _^1 2 3 a b c d^'
>>
>> Then ｢<?after <digit>>｣ gets to tell it that it can't do that.
>>
>> The reason is that ｢<after>｣ looks backwards from the current position.
>> The current position is at the very end, and the character just before
>> it obviously isn't a ｢<digit>｣, so ｢.+｣ has to keep giving up characters
>> until its last value is a ｢<digit>｣.
>>
>>     ' a b c d 1 2 3 a b c d '
>>     ' _ _ _ _^1 2 3^_ _ _ _ '
>>
>> ---
>>
>> You can use ｢<after>｣ to check that the position is at the beginning.
>>
>>     'abc' ~~ / <!after .> b /
>>     Nil
>>
>> The reason is that if the current position is anywhere other than the
>> beginning, ｢.｣ would match.
>> Since we used ｢!｣, that won't fly.
>>
>> ｢<!before .>｣ should probably also prevent the position from being
>> anywhere other than the end.
>>
>> It does work if you write it differently
>>
>>     'abc' ~~ / b <!before( /./ )> /
>>     Nil
>>
>> Note that ｢<before>｣ and ｢<after>｣ are really just function calls.
>>
>> It does seem like there could be a bug here.
>>
>> ---
>>
>> All of that said, I don't think it is useful to tell new Raku programmers
>> that you can use those features that way.
>>
>> It makes them think that these two regexes are doing something similar.
>>
>>     / ^ ... /
>>     / <!after .> ... /
>>
>> They match the same three characters, but for entirely different reasons.
>>
>> The ｢^｣ version is basically the same as:
>>
>>     / <?{ $/.pos == 0 }> ... /
>>
>> While the other one is something like:
>>
>>     / <!{ try $/.orig.substr( $/.pos - 1, 1 ) ~~ /./ }> ... /
>>
>> (The ｢try｣ is needed because ｢.substr( -1 )｣ is a Failure.)
>>
>> So then these:
>>
>>     / ... $ /
>>     / ... <!before .> /
>>
>> Would be
>>
>>     / ... <?{ $/.pos == $/.orig.chars }> /
>>     / ... <!{ try $/.orig.substr( $/.pos, 1 ) ~~ /./ }> /
>>
>> ---
>>
>> What I think is happening is that the ｢<!after .>｣ works because the
>> ｢.substr( -1, 1)｣ creates a Failure.
>>
>> The thing is that ｢'abc'.substr( 3, 1 )｣ doesn't create a Failure; it
>> just gives you an empty Str.
>>
>> (The second argument is the maximum number of characters to return.)
>>
>> On Mon, May 25, 2020 at 4:10 PM Joseph Brenner <doom...@gmail.com> wrote:
>>
>>> Given this string:
>>>    my $str = "Romp romp ROMP";
>>>
>>> We can match just the first or last by using the usual pinning
>>> features, '^' or '$':
>>>
>>>    say $str ~~ m:i:g/^romp/;               ## (｢Romp｣)
>>>    say $str ~~ m:i:g/romp$/;               ## (｢ROMP｣)
>>>
>>> Moritz Lenz (Section 3.8 of 'Parsing', p32) makes the point you
>>> can use 'after' to do something like '^' pinning:
>>>
>>>    say $str ~~ m:i:g/ <!after .> romp /;   ## (｢Romp｣)
>>>
>>> That makes sense: the BOL is "not after any character".
>>> So I wondered if there was a way to use 'before' to do
>>> something like '$' pinning:
>>>
>>>   say $str ~~ m:i:g/ romp <!before .> /;  ## (｢Romp｣ ｢romp｣)
>>>
>>> That was unexpected: it filters out the one I was trying to
>>> match for, though the logic seemed reasonable: the EOL is "not
>>> before any character".
>>>
>>> What if we flip this and do a positive before match?
>>>
>>>   say $str ~~ m:i:g/ romp <?before .> /;  ## (｢Romp｣ ｢romp｣)
>>>
>>> That does exactly the same thing, but here the logic makes
>>> sense to me: the first two are "before some character",
>>> but the last one isn't.
>>>
>>
>
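
For anyone following along, here is a minimal sketch of the ｢.substr｣
asymmetry described in the quoted explanation above. The variable names
are hypothetical and only for illustration. It exercises just the two
calls mentioned there, and it says nothing about how ｢<before>｣ and
｢<after>｣ are actually implemented inside Rakudo.

```raku
# Start position equal to the string length: per the explanation above,
# this yields an empty Str rather than a Failure.
my $at-end = 'abc'.substr(3, 1);
say $at-end.defined;          # is the result a defined value?
say $at-end.chars;            # how many characters came back?

# Negative start position: per the explanation above, this yields a
# Failure. Storing it in a variable does not throw; it only throws if
# the value is used while still unhandled.
my $before-start = 'abc'.substr(-1, 1);
say $before-start ~~ Failure; # type check against Failure
say $before-start.defined;    # calling .defined also marks the Failure handled
```

If the two calls really do behave differently at the two edges of the
string, that would fit the suggestion that the emulated ｢<!after .>｣
and ｢<!before .>｣ checks are not symmetric.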
