Re: Working with a regex using positional captures stored in a variable
On Mon, Mar 22, 2021 at 2:50 PM yary wrote:
>
> how to get all nested captures worked for me ... not sure I agree with the
> design

I think the current flattening aspect is awkward in a couple ways:

* Having to specify ``. I half like Brad's suggestion.

* Having to write `.pairs` to extract the captures.

Are your reservations about the design related to the above awkwardnesses?
I can see scope for improvement on the above in years to come.

Some other commentary:

For regexing, it makes sense to return single match objects or flat lists.
Devs can then manually add structure to those results if they wish.

For parsing, it makes sense to go the other way around, returning nested
structures that devs can manually flatten if they wish (as I did).

For data structures in general, for some tasks, it makes sense to flatten
by default and let devs add structure if they wish, and for other tasks it
makes sense to maintain structure by default and let devs flatten if they
wish.

A PL does have to pick one or the other, either on a general basis, or on
a feature-by-feature basis.

Perl generally focused on the flat-by-default approach, with regexes
fitting in that scheme, and has stayed true to its roots, albeit with some
evolution toward improvements for structured data in recent years as devs
have contributed additions to the language.

Raku generally focused on the structured-data-by-default approach. It also
unified regexes and parsing. The natural outcome was a bias toward
structure (parse trees). I anticipate there will be improvements for
flattening structured data as time passes.

--
love raiph
Re: Working with a regex using positional captures stored in a variable
Hi all,

Thanks Bill for posting your results from my samples. Seems like we both
get lots of warnings/errors from our REPLs, me even with 2021.02.01. I
suspect there must be something going on with what the REPL is trying to
print; after all, it does want to display the results of every line. I
haven't looked at that since the first post. I was more interested in the
capturing-or-not rules displayed by the code run from a file than in
tracking down the REPL output. Maybe I'll look into it again next weekend,
need to get back to $work today!

Thanks Raiph for the continued examples, this one in particular showing
how to get all nested captures worked for me. While I'm not sure I agree
with the design, it is consistent, and perhaps I will internalize this over
time.

On Fri, Mar 19, 2021 at 7:14 PM Ralph Mellor wrote:
...
> A Raku equivalent:
>
>     my $word = '(\w+)';
>     my $AwithB = "$word ' with ' $word";
>     my $regex = "$AwithB .* 'is ' $word";
>     $_ = 'Interpolating regexes with arbitrary captures is fun!';
>     .say for m//..pairs;
>
> displays:
>
>     0 => 「regexes」
>     1 => 「arbitrary」
>     2 => 「fun」
>
> > Raku example:
> >
> > my $word = /(\w+)/;
> > my $AwithB = /$word' with '$word/;
>
> If you interpolate by using `$abc...` or `<$abc...>` instead of ``,
> Raku will by default not capture. And the non-capturing is nested, so
> throwing away those captures also throws away the corresponding
> capture within `$word`.

-y
Re: Working with a regex using positional captures stored in a variable
Hi Yary,

I ran your Raku code in a script (on MacOS) and in the REPL (MacOS with
Linenoise). All results below with Rakudo_2020.10:

#Script:

    my $word = /(\w+)/;
    my $AwithB = /$word' with '$word/;
    $_= 'Interpolating regexes with arbitrary captures is fun!';
    say "Nested rx";
    dd m/$AwithB.*'is '$word/;
    say "shallow rx";
    dd m/$word' with '$word.*'is '$word/;
    say "no interpolation";
    dd m/(\w+)' with '(\w+).*'is '(\w+)/;

#Script Result:

    admin@mbook:~$ raku yary_named.p6
    Nested rx
    Match $/ = Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(52))
    shallow rx
    Match $/ = Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(52))
    no interpolation
    Match $/ = Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(52), :list((Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(21)), Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(27), :pos(36)), Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(49), :pos(52)

REPL:

    admin@mbook:~$ raku
    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
    Built on MoarVM version 2020.10.
To exit type 'exit' or '^D' > my $word = /(\w+)/; /(\w+)/ > my $AwithB = /$word' with '$word/; Regex object coerced to string (please use .gist or .raku to do that) in any metachar at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any termseq at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any quote:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any quote at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 Regex object coerced to string (please use .gist or .raku to do that) in any metachar at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any termseq at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any quote:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any quote at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 > $_= 'Interpolating regexes with arbitrary captures is fun!'; Interpolating regexes with 
arbitrary captures is fun! > say "Nested rx"; Nested rx > dd m/$AwithB.*'is '$word/; Regex object coerced to string (please use .gist or .raku to do that) in any metachar at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any termseq at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any quote at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 Regex object coerced to string (please use .gist or .raku to do that) in any metachar at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any termseq at /Users/admin/rakudo/rakudo- 2020.10/install/share/nqp/lib/NQPP6QRegex.moarvm line 1 in any quote at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any value at /Users/admin/rakudo/rakudo-2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term:sym at /Users/admin/rakudo/rakudo- 2020.10/install/share/perl6/lib/Perl6/Grammar.moarvm line 1 in any term at
Re: Working with a regex using positional captures stored in a variable
On Wed, Mar 17, 2021 at 7:17 PM William Michels via perl6-users wrote:
>
> ("If the first character inside is anything other than an alpha it doesn't
> capture").
> It should be added to the Raku Docs ASAP.

Fyi, here's how Larry Wall expressed it 15-18 years ago:

> A leading alphabetic character means it's capturing

(from https://design.raku.org/S05.html#line_1422)

--
love, raiph
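For readers who want to see the rule in action, here is a minimal sketch. It uses the built-in `<alpha>` rule rather than anything from the thread, and the variable name `$text` is made up for the illustration.

```raku
# Sketch of the "leading alphabetic character captures" rule.
my $text = 'abc';

# <alpha> begins with a letter, so the match is stored under the key 'alpha'.
$text ~~ / <alpha> /;
say $/.hash.elems;    # 1

# <.alpha> begins with '.', so it matches the same text but stores nothing.
$text ~~ / <.alpha> /;
say $/.hash.elems;    # 0
```

Both patterns match the same character; only the first leaves a named entry in the match object.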
Re: Working with a regex using positional captures stored in a variable
On Fri, Mar 19, 2021 at 6:12 PM yary wrote:
>
> I don't know how to get the result.

> DB<1> $word = qr/(\w+)/;
> DB<2> $AwithB = qr/$word with $word/
> DB<3> $_ = 'Interpolating regexes with arbitrary captures is fun!'
> DB<4> x /$AwithB.*is $word/

A Raku equivalent:

    my $word = '(\w+)';
    my $AwithB = "$word ' with ' $word";
    my $regex = "$AwithB .* 'is ' $word";
    $_ = 'Interpolating regexes with arbitrary captures is fun!';
    .say for m//..pairs;

displays:

    0 => 「regexes」
    1 => 「arbitrary」
    2 => 「fun」

> Raku example:
>
> my $word = /(\w+)/;
> my $AwithB = /$word' with '$word/;

If you interpolate by using `$abc...` or `<$abc...>` instead of ``,
Raku will by default not capture. And the non-capturing is nested, so
throwing away those captures also throws away the corresponding
capture within `$word`.

> Where my expectation differs from the behavior in my example
> is Raku's discarding the capture groups of the interpolated regexes.

It only discards them if you tell it to discard them.

If a `<...>` construct begins with a letter, it'll capture. If not, it won't.

--
love, raiph
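A hedged, runnable sketch of the string-built approach described above. The alias name `all` is an assumption made for this illustration (the archived message lost whatever was written inside the angle brackets), so treat this as one way to apply the "begins with a letter" rule rather than the exact code that was posted.

```raku
# Build the pattern from strings so all three groups sit at one level.
my $word   = '(\w+)';
my $AwithB = "$word ' with ' $word";
my $regex  = "$AwithB .* 'is ' $word";

$_ = 'Interpolating regexes with arbitrary captures is fun!';

# <all=$regex> starts with a letter, so it captures.
# `all` is a hypothetical alias name chosen for this sketch.
if m/ <all=$regex> / {
    # The positional captures live inside the named capture.
    .say for $<all>.pairs;
}
```

If this behaves as the message describes, the loop should list the same three pairs shown in the "displays:" output.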
Re: Working with a regex using positional captures stored in a variable
My current expectations are a little different than any others previously
expressed and I don't know how to get the result. I am no longer
considering named captures from Regexes interpolated inside and am now
looking at directly interpolating them.

Perl example:

    DB<1> $word = qr/(\w+)/;
    DB<2> $AwithB = qr/$word with $word/
    DB<3> $_ = 'Interpolating regexes with arbitrary captures is fun!'
    DB<4> x /$AwithB.*is $word/
    0 'regexes'
    1 'arbitrary'
    2 'fun'

That was simple and I like the results of the capture groups being
first-level.

Raku example:

    my $word = /(\w+)/;
    my $AwithB = /$word' with '$word/;
    $_= 'Interpolating regexes with arbitrary captures is fun!';
    say "Nested rx";
    dd m/$AwithB.*'is '$word/;
    say "shallow rx";
    dd m/$word' with '$word.*'is '$word/;
    say "no interpolation";
    dd m/(\w+)' with '(\w+).*'is '(\w+)/;
    # code end results below

    Nested rx
    Match $/ = Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(52))
    shallow rx
    Match $/ = Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(52))
    no interpolation
    Match $/ = Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(52), :list((Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(14), :pos(21)), Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(27), :pos(36)), Match.new(:orig("Interpolating regexes with arbitrary captures is fun!"), :from(49), :pos(52)

Run against

    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2021.02.1.
    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
    Built on MoarVM version 2021.02.

What I see from that example code is Raku matching all the regexes as I
expect regardless of nesting them, all without the named capture grouping
angle-brackets. Which is what the documentation suggests from its example-

    my $string = 'Is this a regex or a string: 123\w+False$pattern1 ?';
    my $regex= /\w+/;
    say $string.match: / $regex /;# [4] OUTPUT: «「Is」»

Where my expectation differs from the behavior in my example is Raku's
discarding the capture groups of the interpolated regexes. The overall
match works, in all cases :from(14) to :pos(52), but Raku treats the
groupings inside the interpolations as non-capturing.

-y

On Thu, Mar 18, 2021 at 6:08 PM Ralph Mellor wrote:
> On Thu, Mar 18, 2021 at 12:59 AM yary wrote:
> >
> > As it is I get same kinds of errors in the REPL, perhaps it is MacOS
> > with Linenoise that's mucking that up.
>
> I can confirm your new test code also works fine in both program
> and repl forms in 2020.12.
>
> Though obviously the case you mark as "interesting" still doesn't do
> any sub-capturing. Which is to be expected if you know that aspect
> of Raku's regex language.
>
> > I had hoped that by directly interpolating $rd and $rw they would
> > fill in the top-level match object and fill in $0, $1 – but it has the
> > same issue as Joe's original example.
>
> Are you just saying that your original expectations were the same
> as Joe's, but you now understand that's not how Raku regexes
> work, but it's trivial to get the same result? Or are you saying you
> don't know how to get the same result?
>
> --
> love, raiph
>
Re: Working with a regex using positional captures stored in a variable
On Thu, Mar 18, 2021 at 12:59 AM yary wrote:
>
> As it is I get same kinds of errors in the REPL, perhaps it is MacOS
> with Linenoise that's mucking that up.

I can confirm your new test code also works fine in both program
and repl forms in 2020.12.

Though obviously the case you mark as "interesting" still doesn't do
any sub-capturing. Which is to be expected if you know that aspect
of Raku's regex language.

> I had hoped that by directly interpolating $rd and $rw they would
> fill in the top-level match object and fill in $0, $1 – but it has the
> same issue as Joe's original example.

Are you just saying that your original expectations were the same
as Joe's, but you now understand that's not how Raku regexes
work, but it's trivial to get the same result? Or are you saying you
don't know how to get the same result?

--
love, raiph
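For anyone reading the archive later, one possible way to "get the same result" for the "interesting case" is sketched below. It reuses `$str`, `$rd` and `$rw` from the test code under discussion; the alias names `rd` and `rw` are chosen for this sketch and are not from the thread.

```raku
my $str = ' grr huh yeah 388 boo! ';
my $rd  = /(\d+)/;
my $rw  = /(\w+)/;

# Bare $rd / $rw match but do not capture. Giving each interpolation
# a name that starts with a letter makes it capture, and the inner
# positional group is kept inside that named capture.
if $str ~~ / <rd=$rd> \s <rw=$rw> / {
    say ~$<rd>[0];   # 388
    say ~$<rw>[0];   # boo
}
```

The captures end up one level down (`$<rd>[0]` rather than `$0`), which is the structured-by-default trade-off discussed elsewhere in the thread.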
Re: Working with a regex using positional captures stored in a variable
Thanks raiph for everything! Including getting me to upgrade my Raku,
"Welcome to Rakudo(tm) v2021.02.1. Implementing the Raku(tm) programming
language v6.d. Built on MoarVM version 2021.02."

As it is I get same kinds of errors in the REPL, perhaps it is MacOS with
Linenoise that's mucking that up. The code in a file does better, matching
the docs. Still I had hoped to work around these issues by interpolating
with the bare variables not inside of angle-brackets. Here's the test code-

    my $str = ' grr huh yeah 388 boo! ';
    say $str ~~ / (\d+) \s (\w+) /; # 388 boo
    say "Match 0=$0, 1=$1"; # Match 0=388, 1=boo

    say "=== below is the interesting case";
    my $rd = /(\d+)/;
    my $rw = /(\w+)/;
    say $str.match: / $rd \s $rw /; # 388 boo
    say "Match 0=$0, 1=$1"; # Match 0=, 1=

    say "=== below shows literal string matching";
    my $sd = '(\d+)';
    my $sw = '(\w+)';
    $str ~= "$sd $sw";
    say $str.match: / ($sd) \s ($sw) /; # (\d+) (\w+)
    say "Match 0=$0, 1=$1"; # Match 0=(\d+), 1=(\w+)

I had hoped that by directly interpolating $rd and $rw they would fill in
the top-level match object and fill in $0, $1 – but it has the same issue
as Joe's original example. It matches the right text but doesn't fill in
the top-level match.

-y

On Wed, Mar 17, 2021 at 6:55 PM Ralph Mellor wrote:
> > 1. The list you posted is fantastic ("If the first character inside is
> > anything other than an alpha it doesn't capture"). It should be added
> > to the Raku Docs ASAP.
>
> Not the list, right? Just the rule. (There are dozens of kinds of
> assertions. No one is going to remember the list.) If you were to add
> just the line you suggest then you'd be able to do it ASAP.
>
> > 2. There are some shortcuts that don't seem to follow a set pattern.
> > For example a named capture can be accessed using $ instead of $/ ;
> > the "/' can be elided. Do you have a method you can share for
> > remembering these sorts of shortcuts? Or are they disfavored?
> > I know you're asking Brad, but fwiw my overall method for mnemonics is to > stay > creative and bring in visual and audio elements (like images and rhyming) > and > weave them into a little made up story that fits in with an overall > adventure story > about Raku. The elements would be ones that work for a given individual > for a > given aspect of a given feature of Raku, with the story being made up by > the > person doing the learning. > > Thus, for example, I might note that the @ symbol looks something like a > `0` > and is sounded out as "at" and have a kid tell me a little story they > imagine > getting added to the twitter profile of someone they know about programming > that involves the fact that array indexing is `0` based, thus giving > them a strong > reminder of the latter aspect. > > Fwiw I'm not seeing much value in developing one for eliding the `/`. > If a dev doesn't > know they can elide the `/` when writing code, then no harm done; just > leave it in. > If a dev is *reading* code and sees syntax of the form `$` and > wonders what > it is, they can type `$<` into the search box in the doc and get a > match. I personally > found it really easy to remember because it's so simple and used so > frequently. I > think mnemonics > > > 3. Finally, I've never seen in the Perl6/Raku literature the motto you > cite: > > "One of the mottos of Raku, is that it is ok to confuse a new programmer, > > it is not ok to confuse an expert." > > I think that's a reasonable distillation of Larry's perspective. It's > another way > of expressing that Python is good as a first language, Raku as a last one. > > Consider the options: > > * Ok to confuse a new programmer or an expert. (Not a good option.) > > * Ok to confuse an expert, not Ok to confuse a new programmer. ScratchJr? > > * Not Ok to confuse a new programmer or an expert. ScratchJr? 
> > > [ The motto I prefer is from Larry Wall: "...easy things should stay > easy, > > hard things should get easier, and impossible things should get hard... > ." > > I like that one too. I daresay I prefer it too. But for it to work, it > really needs > to be Ok to confuse a new programmer but not Ok to confuse an expert. > > Note that easy things being easy does not mean that new programmers > won't get confused. For example, a new Raku programmer who is used > to Perl might be confused that `<$foo>` does not capture in Raku, even > though it does in Perl. But it's still easy to capture; you write > ``. > > But when *experts* are *systematically* confused, i.e. *all* experts just > keep falling for the same trap, then impossible things won't just be hard, > they'll stop happening. > > Note that I say that as a non-expert in many, many areas of Raku. What > I *do* know is that whenever I encounter something that surprises me, > and keep an open mind about what's going on, no matter how annoyed > I am or convinced Raku is being stupid, I almost always eventually arrive > at an interim conclusion it's appropriate as is. > > *Almost* always.
Re: Working with a regex using positional captures stored in a variable
> 1. The list you posted is fantastic ("If the first character inside is
> anything other than an alpha it doesn't capture"). It should be added to
> the Raku Docs ASAP.

Not the list, right? Just the rule. (There are dozens of kinds of
assertions. No one is going to remember the list.) If you were to add just
the line you suggest then you'd be able to do it ASAP.

> 2. There are some shortcuts that don't seem to follow a set pattern. For
> example a named capture can be accessed using $ instead of $/ ;
> the "/' can be elided. Do you have a method you can share for remembering
> these sorts of shortcuts? Or are they disfavored?

I know you're asking Brad, but fwiw my overall method for mnemonics is to
stay creative and bring in visual and audio elements (like images and
rhyming) and weave them into a little made up story that fits in with an
overall adventure story about Raku. The elements would be ones that work
for a given individual for a given aspect of a given feature of Raku, with
the story being made up by the person doing the learning.

Thus, for example, I might note that the @ symbol looks something like a
`0` and is sounded out as "at" and have a kid tell me a little story they
imagine getting added to the twitter profile of someone they know about
programming that involves the fact that array indexing is `0` based, thus
giving them a strong reminder of the latter aspect.

Fwiw I'm not seeing much value in developing one for eliding the `/`. If a
dev doesn't know they can elide the `/` when writing code, then no harm
done; just leave it in. If a dev is *reading* code and sees syntax of the
form `$` and wonders what it is, they can type `$<` into the search box in
the doc and get a match. I personally found it really easy to remember
because it's so simple and used so frequently. I think mnemonics

> 3. Finally, I've never seen in the Perl6/Raku literature the motto you
> cite:
> "One of the mottos of Raku, is that it is ok to confuse a new programmer,
> it is not ok to confuse an expert."

I think that's a reasonable distillation of Larry's perspective. It's
another way of expressing that Python is good as a first language, Raku as
a last one.

Consider the options:

* Ok to confuse a new programmer or an expert. (Not a good option.)

* Ok to confuse an expert, not Ok to confuse a new programmer. ScratchJr?

* Not Ok to confuse a new programmer or an expert. ScratchJr?

> [ The motto I prefer is from Larry Wall: "...easy things should stay easy,
> hard things should get easier, and impossible things should get hard... ."

I like that one too. I daresay I prefer it too. But for it to work, it
really needs to be Ok to confuse a new programmer but not Ok to confuse an
expert.

Note that easy things being easy does not mean that new programmers won't
get confused. For example, a new Raku programmer who is used to Perl might
be confused that `<$foo>` does not capture in Raku, even though it does in
Perl. But it's still easy to capture; you write ``.

But when *experts* are *systematically* confused, i.e. *all* experts just
keep falling for the same trap, then impossible things won't just be hard,
they'll stop happening.

Note that I say that as a non-expert in many, many areas of Raku. What I
*do* know is that whenever I encounter something that surprises me, and
keep an open mind about what's going on, no matter how annoyed I am or
convinced Raku is being stupid, I almost always eventually arrive at an
interim conclusion it's appropriate as is.

*Almost* always. And always an *interim* conclusion at best. That is to
say, I retain an eternally open mind toward all such things.

So if someone were to add a table to the doc listing all the assertion
types, noting which ones capture and which ones don't, rather than just the
one rule ("starts with an alpha") I'd be open minded about what the outcome
would be.

Likewise if Brad reveals some master method he has for coming up with a
mnemonic covering dropping the `/` in `$`.

And if it turns out Larry has never said something directly to the effect
that Brad has mentioned, I'd be surprised but curious why I was surprised.

--
love, raiph
Re: Working with a regex using positional captures stored in a variable
And when I cut/paste from the doc, the number example works too, in both script and repl. On Wed, Mar 17, 2021 at 10:33 PM Ralph Mellor wrote: > > Er, by wfm I mean it matches 「Is」 as the code suggests. > > On Wed, Mar 17, 2021 at 10:32 PM Ralph Mellor wrote: > > > > Works for me in Rakudo 2020.12. > > > > On Wed, Mar 17, 2021 at 9:33 PM yary wrote: > > > > > > The "Interpolation" section of the raku docs use strings as the elements > > > of building up a larger regex from smaller pieces, but the example that > > > looks fruitful isn't working in my raku. This is taken from > > > https://docs.raku.org/language/regexes#Regex_interpolation > > > > > > > my $string = 'Is this a regex or a string: 123\w+False$pattern1 ?'; > > > > > > Is this a regex or a string: 123\w+False$pattern1 ? > > > > > > > my $regex= /\w+/; > > > > > > /\w+/ > > > > > > > say $string.match: / $regex /; > > > > > > Regex object coerced to string (please use .gist or .raku to do that) > > > > > > ... and more error lines, and no result when the docs show matching > > > '123': > > > > > > 「」 > > > > > > > > > $ raku -v > > > > > > Welcome to 퐑퐚퐤퐮퐝퐨™ v2020.10. > > > > > > Implementing the 퐑퐚퐤퐮™ programming language v6.d. > > > > > > Built on MoarVM version 2020.10. > > > > > > > > > > > > -y > > > > > > > > > On Wed, Mar 17, 2021 at 3:17 PM William Michels via perl6-users > > > wrote: > > >> > > >> Dear Brad, > > >> > > >> 1. The list you posted is fantastic ("If the first character inside is > > >> anything other than an alpha it doesn't capture"). It should be added to > > >> the Raku Docs ASAP. > > >> > > >> 2. There are some shortcuts that don't seem to follow a set pattern. For > > >> example a named capture can be accessed using $ instead of > > >> $/ ; the "/' can be elided. Do you have a method you can share > > >> for remembering these sorts of shortcuts? Or are they disfavored? 
> > >> > > >> > say ~$ if 'abc' ~~ / $ = [ \w+ ] /; > > >> abc > > >> > > > >> [ Above from the example at > > >> https://docs.raku.org/syntax/Named%20captures ]. > > >> > > >> 3. Finally, I've never seen in the Perl6/Raku literature the motto you > > >> cite: "One of the mottos of Raku, is that it is ok to confuse a new > > >> programmer, it is not ok to confuse an expert." Do you have a citation? > > >> > > >> [ The motto I prefer is from Larry Wall: "...easy things should stay > > >> easy, hard things should get easier, and impossible things should get > > >> hard... ." Citation: https://www.perl.com/pub/2000/10/23/soto2000.html/ > > >> ]. > > >> > > >> Best Regards, > > >> > > >> Bill. > > >> > > >> > > >> > > >> On Sat, Mar 13, 2021 at 4:47 PM Brad Gilbert wrote: > > >>> > > >>> It makes <…> more consistent precisely because <$pattern> doesn't > > >>> capture. > > >>> > > >>> If the first character inside is anything other than an alpha it > > >>> doesn't capture. > > >>> Which is a very simple description of when it captures. > > >>> > > >>> doesn't capture because of the 「?」 > > >>> doesn't capture because of the 「!」 > > >>> <.ws> doesn't capture because of the 「.」 > > >>> <> doesn't capture because of the 「&」 > > >>> <$pattern> doesn't capture because of the 「$」 > > >>> <$0> doesn't capture because of the 「$」 > > >>> <@a> doesn't capture because of the 「@」 > > >>> <[…]> doesn't capture because of the 「[」 > > >>> <-[…]> doesn't capture because of the 「-] > > >>> <:Ll> doesn't capture because of the 「:」 > > >>> > > >>> For most of those, you don't actually want it to capture. > > >>> With 「.」 the whole point is that it doesn't capture. > > >>> > > >>> does capture because it starts with an alpha > > >>> does capture because it starts with an alpha > > >>> > > >>> $0 = <$pattern> doesn't capture to $, but does capture to > > >>> $0 > > >>> $ = <$pattern> captures because of $ = > > >>> > > >>> It would be a mistake to just make <$pattern> capture. 
> > >>> Consistency is perhaps Raku's most important feature. > > >>> > > >>> One of the mottos of Raku, is that it is ok to confuse a new > > >>> programmer, it is not ok to confuse an expert. > > >>> An expert in Raku understands the deep fundamental ways that Raku is > > >>> consistent. > > >>> So breaking consistency should be very carefully considered. > > >>> > > >>> In this case, there is very little benefit. > > >>> Even worse, you then have to come up with some new syntax to prevent it > > >>> from capturing when you don't want it to. > > >>> That new syntax wouldn't be as guessible as it currently is. Which > > >>> again would confuse experts. > > >>> > > >>> If anyone seriously suggests such a change, I will vehemently fight to > > >>> prevent it from happening. > > >>> > > >>> I would be more likely to accept <=$pattern> being added as a synonym > > >>> to . > > >>> > > >>> On Sat, Mar 13, 2021 at 3:30 PM Joseph Brenner > > >>> wrote: > > > > Thanks much for your answer on this. I
Re: Working with a regex using positional captures stored in a variable
Er, by wfm I mean it matches 「Is」 as the code suggests.

On Wed, Mar 17, 2021 at 10:32 PM Ralph Mellor wrote:
> Works for me in Rakudo 2020.12.
Re: Working with a regex using positional captures stored in a variable
Works for me in Rakudo 2020.12.

On Wed, Mar 17, 2021 at 9:33 PM yary wrote:
> The "Interpolation" section of the raku docs use strings as the elements of
> building up a larger regex from smaller pieces, but the example that looks
> fruitful isn't working in my raku.
Re: Working with a regex using positional captures stored in a variable
The "Interpolation" section of the Raku docs uses strings as the elements for building up a larger regex from smaller pieces, but the example that looks fruitful isn't working in my raku. This is taken from https://docs.raku.org/language/regexes#Regex_interpolation

    > my $string = 'Is this a regex or a string: 123\w+False$pattern1 ?';
    Is this a regex or a string: 123\w+False$pattern1 ?
    > my $regex= /\w+/;
    /\w+/
    > say $string.match: / $regex /;
    Regex object coerced to string (please use .gist or .raku to do that)

... and more error lines, and no result when the docs show matching '123':

    「」

    $ raku -v
    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
    Built on MoarVM version 2020.10.

-y
Re: Working with a regex using positional captures stored in a variable
Dear Brad,

1. The list you posted is fantastic ("If the first character inside is anything other than an alpha it doesn't capture"). It should be added to the Raku Docs ASAP.

2. There are some shortcuts that don't seem to follow a set pattern. For example, a named capture can be accessed using $<myname> instead of $/<myname>; the "/" can be elided. Do you have a method you can share for remembering these sorts of shortcuts? Or are they disfavored?

    > say ~$<myname> if 'abc' ~~ / $<myname> = [ \w+ ] /;
    abc

[ Above from the example at https://docs.raku.org/syntax/Named%20captures ].

3. Finally, I've never seen in the Perl6/Raku literature the motto you cite: "One of the mottos of Raku, is that it is ok to confuse a new programmer, it is not ok to confuse an expert." Do you have a citation?

[ The motto I prefer is from Larry Wall: "...easy things should stay easy, hard things should get easier, and impossible things should get hard... ." Citation: https://www.perl.com/pub/2000/10/23/soto2000.html/ ].

Best Regards,

Bill.
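A minimal sketch of the shortcut asked about in point 2, assuming a current Rakudo: `$<name>` is shorthand for `$/<name>`, just as `$0` is shorthand for `$/[0]`. The capture name `myname` and the input string are only illustrative.

```raku
# $/ holds the most recent match; the short forms below simply leave out the "/".
if 'abc' ~~ / $<myname> = [ \w+ ] / {
    say ~$/<myname>;                  # long form:  abc
    say ~$<myname>;                   # short form: abc
    say ~$<myname> eq ~$/<myname>;    # both forms give the same text
}

# The same shortcut applies to positional captures.
if 'abc' ~~ / (\w) (\w) / {
    say ~$/[0];                       # long form:  a
    say ~$0;                          # short form: a
}
```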
Re: Working with a regex using positional captures stored in a variable
And once again, thanks much for the explication of all this...

But even after thinking it over, the current state of affairs on this really doesn't strike me as being okay.

As I'm sure everyone here knows, over in perl-land the main trick you have for creating regexes from components is lexical interpolation, so something like this:

    my $r1 = qr{ (\d+) }x;
    my $r2 = qr{ (\w+) }x;
    $str =~ m/$r1 \s+ $r2/x;

behaves exactly the same as

    $str =~ m/ (\d+) \s+ (\w+) /x;

A direct translation of this approach to Raku doesn't really work:

    my $r1 = rx{ (\d+) };
    my $r2 = rx{ (\w+) };
    $str ~~ m/<$r1> \s+ <$r2>/;

And it doesn't work in a potentially insidious way: it can *look* like it's working and it certainly doesn't throw any warnings. You might use it for some time before noticing there's a feature missing.

So as is, the /<$regex>/ construct treats the contents of $regex as a regex-- *except* that it ignores some key features of regexes. It silently throws away some information.

Now, it is true that there are other ways of doing regex composition in Raku that work much better, but I don't think that's really the issue: more than one way to do it is fine as long as they all actually work.

> I would be more likely to accept <=$pattern> being added as a synonym to
> <pattern=$pattern>.

That could be an improvement. I was thinking something like <:$pattern>, in analogy to colon pairs.

(At the very least: this alternate way would get documented, and then we'd have to distinguish between it and the other one, and explain that it's missing a feature.)
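A rough sketch of the silent loss described in this message, next to one workaround already mentioned in the thread (aliasing the interpolated regex). The alias names `first` and `second` and the sample string are assumptions made for the illustration only.

```raku
my $str = 'There are 9 million bicycles in beijing.';
my $r1  = rx{ (\d+) };
my $r2  = rx{ (\w+) };

# Interpolated with <$var>: the match succeeds, but no positional captures survive.
my $plain = $str ~~ m/ <$r1> \s+ <$r2> /;
say ~$plain;              # 9 million
say $plain.list.elems;    # 0

# Aliased: each sub-match is kept under a name, with its own group nested inside.
my $named = $str ~~ m/ <first=$r1> \s+ <second=$r2> /;
say ~$named<first>[0];    # 9
say ~$named<second>[0];   # million
```

Unlike the Perl version, the groups do not end up flat in `$0` and `$1`; they sit one level down, under whichever alias the sub-regex was given.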
Re: Working with a regex using positional captures stored in a variable
It makes <…> more consistent precisely because <$pattern> doesn't capture.

If the first character inside is anything other than an alpha it doesn't capture.
Which is a very simple description of when it captures.

    <?…> doesn't capture because of the 「?」
    <!…> doesn't capture because of the 「!」
    <.ws> doesn't capture because of the 「.」
    <&…> doesn't capture because of the 「&」
    <$pattern> doesn't capture because of the 「$」
    <$0> doesn't capture because of the 「$」
    <@a> doesn't capture because of the 「@」
    <[…]> doesn't capture because of the 「[」
    <-[…]> doesn't capture because of the 「-」
    <:Ll> doesn't capture because of the 「:」

For most of those, you don't actually want it to capture.
With 「.」 the whole point is that it doesn't capture.

    <ws> does capture because it starts with an alpha
    <pattern=$pattern> does capture because it starts with an alpha

    $0 = <$pattern> doesn't capture to $<pattern>, but does capture to $0
    $<pattern> = <$pattern> captures because of $<pattern> =

It would be a mistake to just make <$pattern> capture.
Consistency is perhaps Raku's most important feature.

One of the mottos of Raku, is that it is ok to confuse a new programmer, it is not ok to confuse an expert.
An expert in Raku understands the deep fundamental ways that Raku is consistent.
So breaking consistency should be very carefully considered.

In this case, there is very little benefit.
Even worse, you then have to come up with some new syntax to prevent it from capturing when you don't want it to.
That new syntax wouldn't be as guessable as it currently is. Which again would confuse experts.

If anyone seriously suggests such a change, I will vehemently fight to prevent it from happening.

I would be more likely to accept <=$pattern> being added as a synonym to <pattern=$pattern>.
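The "starts with an alpha" rule can be checked with a small sketch like the one below. It uses only the built-in `ws` rule; the input string and the variable names are illustrative assumptions.

```raku
my $text = 'a b';

# <ws> starts with an alpha, so the sub-match is stored under the key "ws".
my $capturing = $text ~~ / a <ws> b /;
say $capturing<ws>.defined;    # True

# <.ws> starts with a dot, so the same rule is called without capturing.
my $silent = $text ~~ / a <.ws> b /;
say $silent<ws>.defined;       # False

# Both forms match the same text; only the stored captures differ.
say ~$capturing eq ~$silent;   # True
```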
Re: Working with a regex using positional captures stored in a variable
Thanks much for your answer on this. I think this is the sort of trick I was looking for:

Brad Gilbert wrote:

> You can put it back in as a named
>
> > $input ~~ / <pattern=$pattern> /
> 「9 million」
>  pattern => 「9 million」
>   0 => 「9」
>   1 => 「million」

That's good enough, I guess, though you need to know about the issue... is there some reason it shouldn't happen automatically, using the variable name to label the captures?

I don't think this particular gotcha is all that well documented, though I guess there's a reference to this being a "known trap" in the documentation under "Regex interpolation"-- but that's the sort of remark that makes sense only after you know what it's talking about.

I have to say, my first reaction was something like "if they couldn't get this working right, why did they put it in?"
Re: Working with a regex using positional captures stored in a variable
If you interpolate a regex, it is a sub regex.

If you have something like a sigil, then the match data structure gets thrown away.

You can put it back in as a named

    > $input ~~ / <pattern=$pattern> /
    「9 million」
     pattern => 「9 million」
      0 => 「9」
      1 => 「million」

Or as a numbered:

    > $input ~~ / $0 = <$pattern> /
    「9 million」
     0 => 「9 million」
      0 => 「9」
      1 => 「million」

Or put it in as a lexical regex

    > my regex pattern { (\d+) \s+ (\w+) }
    > $input ~~ / <pattern> /
    「9 million」
     pattern => 「9 million」
      0 => 「9」
      1 => 「million」

Or just use it as the whole regex

    > $input ~~ $pattern # variable
    「9 million」
     0 => 「9」
     1 => 「million」

    > $input ~~ &pattern # my regex pattern /…/
    「9 million」
     0 => 「9」
     1 => 「million」
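The options listed in this message can be collected into one script, as a sketch only. It reuses the thread's `$input` and `$pattern`; the comments show the values implied by the REPL output quoted above.

```raku
my $input   = 'There are 9 million bicycles in beijing.';
my $pattern = rx{ (\d+) \s+ (\w+) };
my regex pattern { (\d+) \s+ (\w+) }

# Named: the sub-match is kept under the key "pattern".
if $input ~~ / <pattern=$pattern> / {
    say ~$<pattern>[0];    # 9
    say ~$<pattern>[1];    # million
}

# Numbered: the sub-match becomes $0 and its own groups nest below it.
if $input ~~ / $0 = <$pattern> / {
    say ~$0[0];            # 9
    say ~$0[1];            # million
}

# Lexical regex called by name: also captured under "pattern".
if $input ~~ / <pattern> / {
    say ~$<pattern>[1];    # million
}

# Used as the whole regex: the groups stay at the top level.
if $input ~~ $pattern {
    say ~$0;               # 9
    say ~$1;               # million
}
```

Only the last form gives the flat `$0`/`$1` layout; the other three keep the groups one level down, under the named or numbered sub-match.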
Working with a regex using positional captures stored in a variable
Does this behavior make sense to anyone? When you've got a regex with captures in it, the captures don't work if the regex is stashed in a variable and then interpolated into a regex.

Do capture groups need to be defined at the top level where the regex is used?

    { # From a code example in the "Parsing" book by Moritz Lenz, p. 48, section 5.2
        my $input = 'There are 9 million bicycles in beijing.';
        if $input ~~ / (\d+) \s+ (\w+) / {
            say $0.^name;  # Match
            say $0;        # 「9」
            say $1.^name;  # Match
            say $1;        # 「million」
            say $/;
            # 「9 million」
            #  0 => 「9」
            #  1 => 「million」
        }
    }

    say '---';

    { # Moving the pattern to var which we interpolate into match
        my $input = 'There are 9 million bicycles in beijing.';
        my $pattern = rx{ (\d+) \s+ (\w+) };
        if $input ~~ / <$pattern> / {
            say $0.^name;  # Nil
            say $0;        # Nil
            say $1.^name;  # Nil
            say $1;        # Nil
            say $/;        # 「9 million」
        }
    }

In the second case, the match clearly works, but it behaves as though the capture groups aren't there.

    raku --version

    Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
    Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.